:: Experimental :: A base class for user-defined aggregations, which can be used in Dataset operations to take all of the elements of a group and reduce them to a single value.
A Row representing a mutable aggregation buffer.
This is not meant to be extended outside of Spark.
1.5.0
The base class for implementing user-defined aggregate functions (UDAF).
1.5.0
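A user-defined aggregate function describes its input, its aggregation buffer, and its result type, and then updates a mutable aggregation buffer row by row. The following is a minimal sketch rather than a definitive implementation: MeanUdaf and the field names "value", "sum", and "count" are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Hypothetical UDAF: arithmetic mean of a Double column.
object MeanUdaf extends UserDefinedAggregateFunction {
  // One input column of type Double.
  def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)

  // The buffer keeps a running sum and a count of non-null inputs.
  def bufferSchema: StructType = StructType(
    StructField("sum", DoubleType) :: StructField("count", LongType) :: Nil)

  // Type of the final result.
  def dataType: DataType = DoubleType

  // The same input always yields the same output.
  def deterministic: Boolean = true

  // Reset the buffer to the empty aggregation state.
  def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0.0
    buffer(1) = 0L
  }

  // Fold one input row into the buffer, skipping nulls.
  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) {
      buffer(0) = buffer.getDouble(0) + input.getDouble(0)
      buffer(1) = buffer.getLong(1) + 1L
    }
  }

  // Combine two partial buffers; the result is stored in buffer1.
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getDouble(0) + buffer2.getDouble(0)
    buffer1(1) = buffer1.getLong(1) + buffer2.getLong(1)
  }

  // Produce the final value; null when no rows were seen.
  def evaluate(buffer: Row): Any =
    if (buffer.getLong(1) == 0L) null else buffer.getDouble(0) / buffer.getLong(1)
}
```

Assuming a DataFrame df with columns "key" and "value", it could be applied as df.groupBy("key").agg(MeanUdaf(df("value"))).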
A user-defined function. To create one, use the udf functions in functions.
As an example:
    import org.apache.spark.sql.functions.udf

    // Define a UDF that returns true or false based on some numeric score.
    val predict = udf((score: Double) => score > 0.5)

    // Project a prediction column computed from the score column.
    df.select(predict(df("score")))
1.3.0
The user-defined functions must be deterministic. Due to optimization, duplicate invocations may be eliminated or the function may even be invoked more times than it is present in the query.
Utility functions for defining windows in DataFrames.
    // PARTITION BY country ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    Window.partitionBy("country").orderBy("date")
      .rowsBetween(Window.unboundedPreceding, Window.currentRow)

    // PARTITION BY country ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
    Window.partitionBy("country").orderBy("date").rowsBetween(-3, 3)
1.4.0
A window specification that defines the partitioning, ordering, and frame boundaries.
Use the static methods in Window to create a WindowSpec.
1.4.0
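A WindowSpec does nothing by itself; it takes effect when a window or aggregate function is evaluated over it. The sketch below assumes a DataFrame with "country", "date", and "amount" columns; "amount", "running_total", and the WindowSketch object are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.{Window, WindowSpec}
import org.apache.spark.sql.functions.{col, sum}

object WindowSketch {
  // PARTITION BY country ORDER BY date
  // ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  val runningWindow: WindowSpec = Window
    .partitionBy("country")
    .orderBy("date")
    .rowsBetween(Window.unboundedPreceding, Window.currentRow)

  // Adds a per-country running total of the (assumed) "amount" column.
  def withRunningTotal(df: DataFrame): DataFrame =
    df.withColumn("running_total", sum(col("amount")).over(runningWindow))
}
```

Building the specification once and reusing it keeps the partitioning, ordering, and frame consistent when several columns are computed over the same window.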
:: Experimental :: A base class for user-defined aggregations, which can be used in Dataset operations to take all of the elements of a group and reduce them to a single value. For example, an aggregator can extract an int from a specific class and add the values up.
Based loosely on Aggregator from Algebird: https://github.com/twitter/algebird
IN: The input type for the aggregation.
BUF: The type of the intermediate value of the reduction.
OUT: The type of the final output result.
1.6.0
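An aggregator that extracts an int from a class and sums the values might look like the following minimal sketch. It assumes the Spark 2.x API, in which bufferEncoder and outputEncoder must be implemented; Data, SumData, and AggregatorSketch are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical input class carrying the int to be summed.
case class Data(i: Int)

// Input type: Data, intermediate type: Int, output type: Int.
object SumData extends Aggregator[Data, Int, Int] {
  // Neutral starting value of the reduction.
  def zero: Int = 0
  // Fold one input element into the intermediate value.
  def reduce(b: Int, a: Data): Int = b + a.i
  // Combine two intermediate values computed on different partitions.
  def merge(b1: Int, b2: Int): Int = b1 + b2
  // Turn the intermediate value into the final result.
  def finish(reduction: Int): Int = reduction
  // Encoders for the intermediate and output types.
  def bufferEncoder: Encoder[Int] = Encoders.scalaInt
  def outputEncoder: Encoder[Int] = Encoders.scalaInt
}

object AggregatorSketch {
  // Reduces the whole Dataset to a single summed value.
  def total(ds: Dataset[Data]): Dataset[Int] = ds.select(SumData.toColumn)
}
```

Because zero, reduce, merge, and finish are plain functions, the reduction logic can be checked without a running Spark session, for example by merging two partial sums by hand.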