public class Column extends Object implements Logging
A column that will be computed based on the data in a DataFrame.
A new column can be constructed based on the input columns present in a DataFrame:
df("columnName") // On a specific `df` DataFrame.
col("columnName") // A generic column not yet associated with a DataFrame.
col("columnName.field") // Extracting a struct field
col("`a.column.with.dots`") // Escape `.` in column names.
$"columnName" // Scala shorthand for a named column.
Column objects can be composed to form complex expressions:
$"a" + 1
$"a" === $"b"
The internal Catalyst expression can be accessed via expr, but this method is for
debugging purposes only and can change in any future Spark release.
Constructor and Description |
---|
Column(org.apache.spark.sql.catalyst.expressions.Expression expr) |
Column(String name) |
Modifier and Type | Method and Description |
---|---|
Column |
alias(String alias)
Gives the column an alias.
|
Column |
and(Column other)
Boolean AND.
|
Column |
apply(Object extraction)
Extracts a value or values from a complex type.
|
<U> TypedColumn<Object,U> |
as(Encoder<U> evidence$1)
Provides a type hint about the expected return value of this column.
|
Column |
as(scala.collection.Seq<String> aliases)
(Scala-specific) Assigns the given aliases to the results of a table generating function.
|
Column |
as(String alias)
Gives the column an alias.
|
Column |
as(String[] aliases)
Assigns the given aliases to the results of a table generating function.
|
Column |
as(String alias,
Metadata metadata)
Gives the column an alias with metadata.
|
Column |
as(scala.Symbol alias)
Gives the column an alias.
|
Column |
asc_nulls_first()
Returns a sort expression based on ascending order of the column,
and null values appear before non-null values.
|
Column |
asc_nulls_last()
Returns a sort expression based on ascending order of the column,
and null values appear after non-null values.
|
Column |
asc()
Returns a sort expression based on ascending order of the column.
|
Column |
between(Object lowerBound,
Object upperBound)
True if the current column is between the lower bound and upper bound, inclusive.
|
Column |
bitwiseAND(Object other)
Compute bitwise AND of this expression with another expression.
|
Column |
bitwiseOR(Object other)
Compute bitwise OR of this expression with another expression.
|
Column |
bitwiseXOR(Object other)
Compute bitwise XOR of this expression with another expression.
|
Column |
cast(DataType to)
Casts the column to a different data type.
|
Column |
cast(String to)
Casts the column to a different data type, using the canonical string representation
of the type.
|
Column |
contains(Object other)
Contains the other element.
|
Column |
desc_nulls_first()
Returns a sort expression based on the descending order of the column,
and null values appear before non-null values.
|
Column |
desc_nulls_last()
Returns a sort expression based on the descending order of the column,
and null values appear after non-null values.
|
Column |
desc()
Returns a sort expression based on the descending order of the column.
|
Column |
divide(Object other)
Divides this expression by another expression.
|
Column |
endsWith(Column other)
String ends with.
|
Column |
endsWith(String literal)
String ends with another string literal.
|
Column |
eqNullSafe(Object other)
Equality test that is safe for null values.
|
boolean |
equals(Object that) |
Column |
equalTo(Object other)
Equality test.
|
void |
explain(boolean extended)
Prints the expression to the console for debugging purposes.
|
org.apache.spark.sql.catalyst.expressions.Expression |
expr() |
Column |
geq(Object other)
Greater than or equal to an expression.
|
Column |
getField(String fieldName)
An expression that gets a field by name in a StructType.
|
Column |
getItem(Object key)
An expression that gets an item at position ordinal out of an array,
or gets a value by key key in a MapType.
|
Column |
gt(Object other)
Greater than.
|
int |
hashCode() |
Column |
isin(Object... list)
A boolean expression that is evaluated to true if the value of this expression is contained
by the evaluated values of the arguments.
|
Column |
isin(scala.collection.Seq<Object> list)
A boolean expression that is evaluated to true if the value of this expression is contained
by the evaluated values of the arguments.
|
Column |
isNaN()
True if the current expression is NaN.
|
Column |
isNotNull()
True if the current expression is NOT null.
|
Column |
isNull()
True if the current expression is null.
|
Column |
leq(Object other)
Less than or equal to.
|
Column |
like(String literal)
SQL like expression.
|
Column |
lt(Object other)
Less than.
|
Column |
minus(Object other)
Subtraction.
|
Column |
mod(Object other)
Modulo (a.k.a. remainder) expression.
|
Column |
multiply(Object other)
Multiplication of this expression and another expression.
|
Column |
name(String alias)
Gives the column a name (alias).
|
Column |
notEqual(Object other)
Inequality test.
|
Column |
or(Column other)
Boolean OR.
|
Column |
otherwise(Object value)
Evaluates a list of conditions and returns one of multiple possible result expressions.
|
Column |
over()
Defines an empty analytic clause.
|
Column |
over(WindowSpec window)
Defines a windowing column.
|
Column |
plus(Object other)
Sum of this expression and another expression.
|
Column |
rlike(String literal)
SQL RLIKE expression (LIKE with Regex).
|
Column |
startsWith(Column other)
String starts with.
|
Column |
startsWith(String literal)
String starts with another string literal.
|
Column |
substr(Column startPos,
Column len)
An expression that returns a substring.
|
Column |
substr(int startPos,
int len)
An expression that returns a substring.
|
String |
toString() |
static scala.Option<org.apache.spark.sql.catalyst.expressions.Expression> |
unapply(Column col) |
Column |
when(Column condition,
Object value)
Evaluates a list of conditions and returns one of multiple possible result expressions.
|
initializeLogging, initializeLogIfNecessary, initializeLogIfNecessary, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public Column(org.apache.spark.sql.catalyst.expressions.Expression expr)
public Column(String name)
public static scala.Option<org.apache.spark.sql.catalyst.expressions.Expression> unapply(Column col)
public Column isin(Object... list)
list - (undocumented)
public org.apache.spark.sql.catalyst.expressions.Expression expr()
public String toString()
toString
in class Object
public boolean equals(Object that)
equals
in class Object
public int hashCode()
hashCode
in class Object
public <U> TypedColumn<Object,U> as(Encoder<U> evidence$1)
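Provides a type hint about the expected return value of this column. This information can be used by operations such as select on a Dataset to automatically convert the results into the correct JVM types.
evidence$1 - (undocumented)
public Column apply(Object extraction)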
Extracts a value or values from a complex type. The following types of extraction are supported:
- Given an Array, an integer ordinal can be used to retrieve a single value.
- Given a Map, a key of the correct type can be used to retrieve an individual value.
- Given a Struct, a string fieldName can be used to extract that field.
- Given an Array of Structs, a string fieldName can be used to extract that field from every struct in the array and return an Array of fields.
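The four extraction cases above can be sketched as follows. This is a minimal illustration, not part of the original reference: the `Point` case class, the column names, and the sample values are all hypothetical, and a local `SparkSession` (Spark 2.x or later) is assumed.

```scala
import org.apache.spark.sql.SparkSession

object ApplyExtractionExample {
  // Hypothetical struct type used only for this illustration.
  case class Point(x: Int, y: Int)

  def run(spark: SparkSession): (Int, Int, Int, Seq[Int]) = {
    import spark.implicits._
    val df = Seq(
      (Seq(10, 20, 30), Map("a" -> 1, "b" -> 2), Point(3, 4), Seq(Point(1, 2), Point(5, 6)))
    ).toDF("arr", "m", "pt", "pts")

    val row = df.select(
      $"arr"(0),   // Array + integer ordinal (0-based): 10
      $"m"("b"),   // Map + key: 2
      $"pt"("x"),  // Struct + field name: 3
      $"pts"("y")  // Array of structs + field name: array of every y
    ).head()
    (row.getInt(0), row.getInt(1), row.getInt(2), row.getSeq[Int](3))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("apply-extraction").getOrCreate()
    println(run(spark))
    spark.stop()
  }
}
```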
extraction - (undocumented)
public Column equalTo(Object other)
// Scala:
df.filter( df("colA") === df("colB") )
// Java
import static org.apache.spark.sql.functions.*;
df.filter( col("colA").equalTo(col("colB")) );
other - (undocumented)
public Column notEqual(Object other)
// Scala:
df.select( df("colA") !== df("colB") )
df.select( !(df("colA") === df("colB")) )
// Java:
import static org.apache.spark.sql.functions.*;
df.filter( col("colA").notEqual(col("colB")) );
other - (undocumented)
public Column gt(Object other)
// Scala: The following selects people older than 21.
people.select( people("age") > lit(21) )
// Java:
import static org.apache.spark.sql.functions.*;
people.select( people("age").gt(21) );
other - (undocumented)
public Column lt(Object other)
// Scala: The following selects people younger than 21.
people.select( people("age") < 21 )
// Java:
people.select( people("age").lt(21) );
other - (undocumented)
public Column leq(Object other)
// Scala: The following selects people aged 21 or younger.
people.select( people("age") <= 21 )
// Java:
people.select( people("age").leq(21) );
other - (undocumented)
public Column geq(Object other)
// Scala: The following selects people aged 21 or older.
people.select( people("age") >= 21 )
// Java:
people.select( people("age").geq(21) );
other - (undocumented)
public Column eqNullSafe(Object other)
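The difference between the plain equality test and the null-safe one only shows up when a side is null. The sketch below is an illustration under assumptions (hypothetical columns `a` and `b`, a local `SparkSession`); in Scala the `<=>` operator is the operator form of `eqNullSafe`.

```scala
import org.apache.spark.sql.SparkSession

object EqNullSafeExample {
  // Returns (a === b, a <=> b) per row; None stands for a SQL null result.
  def run(spark: SparkSession): Seq[(Option[Boolean], Boolean)] = {
    import spark.implicits._
    val df = Seq[(Option[Int], Option[Int])](
      (Some(1), Some(1)), // equal values
      (Some(1), None),    // one side null
      (None, None)        // both sides null
    ).toDF("a", "b")

    df.select(($"a" === $"b").as("eq"), ($"a" <=> $"b").as("eqNullSafe"))
      .as[(Option[Boolean], Boolean)]
      .collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("eq-null-safe").getOrCreate()
    run(spark).foreach(println)
    spark.stop()
  }
}
```

With `===`, any comparison involving null yields null; with `<=>`, two nulls compare as true and a null against a value compares as false.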
other - (undocumented)
public Column when(Column condition, Object value)
// Example: encoding gender string column into integer.
// Scala:
people.select(when(people("gender") === "male", 0)
.when(people("gender") === "female", 1)
.otherwise(2))
// Java:
people.select(when(col("gender").equalTo("male"), 0)
.when(col("gender").equalTo("female"), 1)
.otherwise(2));
condition - (undocumented)
value - (undocumented)
public Column otherwise(Object value)
// Example: encoding gender string column into integer.
// Scala:
people.select(when(people("gender") === "male", 0)
.when(people("gender") === "female", 1)
.otherwise(2))
// Java:
people.select(when(col("gender").equalTo("male"), 0)
.when(col("gender").equalTo("female"), 1)
.otherwise(2));
value - (undocumented)
public Column between(Object lowerBound, Object upperBound)
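As a rough illustration of the inclusive bounds, the sketch below filters a hypothetical `people` DataFrame (names and ages are made up) and assumes a local `SparkSession`.

```scala
import org.apache.spark.sql.SparkSession

object BetweenExample {
  // Returns the ages that fall inside the inclusive range [18, 30].
  def run(spark: SparkSession): Seq[Int] = {
    import spark.implicits._
    val people = Seq(("Ann", 17), ("Bob", 18), ("Cy", 30), ("Di", 31)).toDF("name", "age")
    // Both bounds are included, so 18 and 30 are kept while 17 and 31 are dropped.
    people.filter($"age".between(18, 30)).select($"age").as[Int].collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("between").getOrCreate()
    run(spark).foreach(println)
    spark.stop()
  }
}
```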
lowerBound - (undocumented)
upperBound - (undocumented)
public Column isNaN()
public Column isNull()
public Column isNotNull()
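The three predicates above can be compared side by side. This is a sketch under assumptions: the `label` and `x` columns and their values are hypothetical, and a local `SparkSession` is assumed.

```scala
import org.apache.spark.sql.SparkSession

object NullChecksExample {
  // Returns (isNull, isNotNull, isNaN) for a regular value, a NaN, and a null.
  def run(spark: SparkSession): Seq[(Boolean, Boolean, Boolean)] = {
    import spark.implicits._
    val df = Seq[(String, Option[Double])](
      ("one", Some(1.0)),
      ("nan", Some(Double.NaN)),
      ("missing", None)
    ).toDF("label", "x")

    df.select($"x".isNull, $"x".isNotNull, $"x".isNaN)
      .as[(Boolean, Boolean, Boolean)]
      .collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("null-checks").getOrCreate()
    run(spark).foreach(println)
    spark.stop()
  }
}
```

Note that NaN is a non-null floating point value, so it satisfies isNotNull, while a null is not reported as NaN.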
public Column or(Column other)
// Scala: The following selects people that are in school or employed.
people.filter( people("inSchool") || people("isEmployed") )
// Java:
people.filter( people("inSchool").or(people("isEmployed")) );
other - (undocumented)
public Column and(Column other)
// Scala: The following selects people that are in school and employed at the same time.
people.select( people("inSchool") && people("isEmployed") )
// Java:
people.select( people("inSchool").and(people("isEmployed")) );
other - (undocumented)
public Column plus(Object other)
// Scala: The following selects the sum of a person's height and weight.
people.select( people("height") + people("weight") )
// Java:
people.select( people("height").plus(people("weight")) );
other - (undocumented)
public Column minus(Object other)
// Scala: The following selects the difference between people's height and their weight.
people.select( people("height") - people("weight") )
// Java:
people.select( people("height").minus(people("weight")) );
other - (undocumented)
public Column multiply(Object other)
// Scala: The following multiplies a person's height by their weight.
people.select( people("height") * people("weight") )
// Java:
people.select( people("height").multiply(people("weight")) );
other - (undocumented)
public Column divide(Object other)
// Scala: The following divides a person's height by their weight.
people.select( people("height") / people("weight") )
// Java:
people.select( people("height").divide(people("weight")) );
other - (undocumented)
public Column mod(Object other)
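A small sketch of the modulo expression, assuming a hypothetical integer column `n` and a local `SparkSession`. The negative input is included because the result takes the sign of the dividend, as with the JVM `%` operator.

```scala
import org.apache.spark.sql.SparkSession

object ModExample {
  // Returns (n, n mod 3) for each input row.
  def run(spark: SparkSession): Seq[(Int, Int)] = {
    import spark.implicits._
    val df = Seq(7, 8, -7).toDF("n")
    // In Scala, $"n" % 3 is the operator form of $"n".mod(3).
    df.select($"n", $"n".mod(3))
      .collect().toSeq
      .map(r => (r.getInt(0), r.getInt(1)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("mod").getOrCreate()
    run(spark).foreach(println)
    spark.stop()
  }
}
```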
other - (undocumented)
public Column isin(scala.collection.Seq<Object> list)
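Membership tests can be written with literal arguments or by expanding an existing collection. The sketch below assumes a hypothetical `people` DataFrame with a `name` column and a local `SparkSession`.

```scala
import org.apache.spark.sql.SparkSession

object IsinExample {
  // Returns the matching names twice: once via varargs, once via an expanded Seq.
  def run(spark: SparkSession): (Seq[String], Seq[String]) = {
    import spark.implicits._
    val people = Seq("Ann", "Bob", "Cy").toDF("name")

    // Literal arguments passed directly.
    val viaVarargs = people.filter($"name".isin("Ann", "Cy")).as[String].collect().toSeq

    // An existing collection expanded with `: _*`.
    val wanted = Seq("Ann", "Cy")
    val viaSeq = people.filter($"name".isin(wanted: _*)).as[String].collect().toSeq

    (viaVarargs, viaSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("isin").getOrCreate()
    println(run(spark))
    spark.stop()
  }
}
```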
list - (undocumented)
public Column like(String literal)
literal - (undocumented)
public Column rlike(String literal)
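To contrast like (SQL wildcard pattern) with rlike (regular expression), here is a minimal sketch. The column name `w` and the sample words are hypothetical, and a local `SparkSession` is assumed.

```scala
import org.apache.spark.sql.SparkSession

object LikeRlikeExample {
  // Returns (rows matching the LIKE pattern, rows matching the regex).
  def run(spark: SparkSession): (Seq[String], Seq[String]) = {
    import spark.implicits._
    val df = Seq("spark", "park", "sparrow").toDF("w")

    // SQL LIKE: % matches any sequence of characters, _ matches exactly one.
    val liked = df.filter($"w".like("spar%")).as[String].collect().toSeq

    // RLIKE: a Java regular expression; anchors restrict it to the whole string.
    val matched = df.filter($"w".rlike("^s.*k$")).as[String].collect().toSeq

    (liked, matched)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("like-rlike").getOrCreate()
    println(run(spark))
    spark.stop()
  }
}
```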
literal - (undocumented)
public Column getItem(Object key)
An expression that gets an item at position ordinal out of an array, or gets a value by key key in a MapType.
key - (undocumented)
public Column getField(String fieldName)
An expression that gets a field by name in a StructType.
fieldName - (undocumented)
public Column substr(Column startPos, Column len)
startPos - expression for the starting position.
len - expression for the length of the substring.
public Column substr(int startPos, int len)
startPos - starting position.
len - length of the substring.
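Both substr overloads can be sketched as below. The example assumes a hypothetical string column `s` and a local `SparkSession`; as with SQL substring, the starting position is 1-based.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

object SubstrExample {
  // Returns the two substrings cut out of "Apache Spark".
  def run(spark: SparkSession): (String, String) = {
    import spark.implicits._
    val df = Seq("Apache Spark").toDF("s")
    val row = df.select(
      $"s".substr(1, 6),          // int overload: 6 characters starting at position 1
      $"s".substr(lit(8), lit(5)) // Column overload: 5 characters starting at position 8
    ).head()
    (row.getString(0), row.getString(1))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("substr").getOrCreate()
    println(run(spark))
    spark.stop()
  }
}
```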
public Column contains(Object other)
other - (undocumented)
public Column startsWith(Column other)
other - (undocumented)
public Column startsWith(String literal)
literal - (undocumented)
public Column endsWith(Column other)
other
- (undocumented)public Column endsWith(String literal)
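The string predicates contains, startsWith, and endsWith can be compared on the same input. This is a sketch with a hypothetical column `m` holding made-up module names, assuming a local `SparkSession`.

```scala
import org.apache.spark.sql.SparkSession

object StringMatchExample {
  // Returns (value, startsWith "spark", endsWith "core", contains "-s") per row.
  def run(spark: SparkSession): Seq[(String, Boolean, Boolean, Boolean)] = {
    import spark.implicits._
    val df = Seq("spark-sql", "spark-core", "hadoop-core").toDF("m")
    df.select(
      $"m",
      $"m".startsWith("spark"), // prefix test against a string literal
      $"m".endsWith("core"),    // suffix test against a string literal
      $"m".contains("-s")       // substring test
    ).as[(String, Boolean, Boolean, Boolean)].collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("string-match").getOrCreate()
    run(spark).foreach(println)
    spark.stop()
  }
}
```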
literal - (undocumented)
public Column alias(String alias)
Gives the column an alias. Same as as.
// Renames colA to colB in select output.
df.select($"colA".alias("colB"))
alias - (undocumented)
public Column as(String alias)
// Renames colA to colB in select output.
df.select($"colA".as("colB"))
If the current column has metadata associated with it, this metadata will be propagated
to the new column. If this is not desired, use as with explicitly empty metadata.
alias - (undocumented)
public Column as(scala.collection.Seq<String> aliases)
// Names the two columns produced by explode on a map "key" and "value".
df.select(explode($"myMap").as("key" :: "value" :: Nil))
aliases - (undocumented)
public Column as(String[] aliases)
// Names the two columns produced by explode on a map "key" and "value".
df.select(explode($"myMap").as(Array("key", "value")))
aliases - (undocumented)
public Column as(scala.Symbol alias)
// Renames colA to colB in select output.
df.select($"colA".as('colB))
If the current column has metadata associated with it, this metadata will be propagated
to the new column. If this is not desired, use as with explicitly empty metadata.
alias - (undocumented)
public Column as(String alias, Metadata metadata)
val metadata: Metadata = ...
df.select($"colA".as("colB", metadata))
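One possible way to obtain a Metadata value for the call above is MetadataBuilder. The sketch below is an assumption about how the pieces fit together, not the original example: the `comment` key and its text are hypothetical, and a local `SparkSession` is assumed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.MetadataBuilder

object AliasWithMetadataExample {
  // Returns the metadata string read back from the renamed column's schema field.
  def run(spark: SparkSession): String = {
    import spark.implicits._
    val df = Seq(1, 2).toDF("colA")

    // Hypothetical metadata entry used only for this illustration.
    val metadata = new MetadataBuilder().putString("comment", "renamed from colA").build()
    val renamed = df.select($"colA".as("colB", metadata))

    // The metadata travels with the field in the resulting schema.
    renamed.schema("colB").metadata.getString("comment")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("alias-metadata").getOrCreate()
    println(run(spark))
    spark.stop()
  }
}
```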
alias - (undocumented)
metadata - (undocumented)
public Column name(String alias)
// Renames colA to colB in select output.
df.select($"colA".name("colB"))
If the current column has metadata associated with it, this metadata will be propagated
to the new column. If this is not desired, use as with explicitly empty metadata.
alias - (undocumented)
public Column cast(DataType to)
// Casts colA to IntegerType.
import org.apache.spark.sql.types.IntegerType
df.select(df("colA").cast(IntegerType))
// equivalent to
df.select(df("colA").cast("int"))
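to - (undocumented)
public Column cast(String to)
Casts the column to a different data type, using the canonical string representation of the type. The supported types are: string, boolean, byte, short, int, long, float, double, decimal, date, timestamp.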
// Casts colA to integer.
df.select(df("colA").cast("int"))
to - (undocumented)
public Column desc()
// Scala
df.sort(df("age").desc)
// Java
df.sort(df.col("age").desc());
public Column desc_nulls_first()
// Scala: sort a DataFrame by age column in descending order and null values appearing first.
df.sort(df("age").desc_nulls_first)
// Java
df.sort(df.col("age").desc_nulls_first());
public Column desc_nulls_last()
// Scala: sort a DataFrame by age column in descending order and null values appearing last.
df.sort(df("age").desc_nulls_last)
// Java
df.sort(df.col("age").desc_nulls_last());
public Column asc()
// Scala: sort a DataFrame by age column in ascending order.
df.sort(df("age").asc)
// Java
df.sort(df.col("age").asc());
public Column asc_nulls_first()
// Scala: sort a DataFrame by age column in ascending order and null values appearing first.
df.sort(df("age").asc_nulls_first)
// Java
df.sort(df.col("age").asc_nulls_first());
public Column asc_nulls_last()
// Scala: sort a DataFrame by age column in ascending order and null values appearing last.
df.sort(df("age").asc_nulls_last)
// Java
df.sort(df.col("age").asc_nulls_last());
public void explain(boolean extended)
extended - (undocumented)
public Column bitwiseOR(Object other)
df.select($"colA".bitwiseOR($"colB"))
other - (undocumented)
public Column bitwiseAND(Object other)
df.select($"colA".bitwiseAND($"colB"))
other - (undocumented)
public Column bitwiseXOR(Object other)
df.select($"colA".bitwiseXOR($"colB"))
other - (undocumented)
public Column over(WindowSpec window)
import org.apache.spark.sql.expressions.Window
val w = Window.partitionBy("name").orderBy("id")
df.select(
sum("price").over(w.rangeBetween(Window.unboundedPreceding, 2)),
avg("price").over(w.rowsBetween(Window.currentRow, 4))
)
window - (undocumented)
public Column over()
df.select(
sum("price").over(),
avg("price").over()
)