GroupedDataset

Value Members

final def !=(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def !=(arg0: Any): Boolean

Definition Classes
Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def ==(arg0: Any): Boolean

Definition Classes
Any
def agg[U1, U2, U3, U4](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3], col4: TypedColumn[V, U4]): Dataset[(K, U1, U2, U3, U4)]

Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

Since
1.6.0
def agg[U1, U2, U3](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3]): Dataset[(K, U1, U2, U3)]

Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

Since
1.6.0
def agg[U1, U2](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2]): Dataset[(K, U1, U2)]

Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

Since
1.6.0
def agg[U1](col1: TypedColumn[V, U1]): Dataset[(K, U1)]

Computes the given aggregation, returning a Dataset of tuples for each unique key and the result of computing this aggregation over all elements in the group.
Computes the given aggregation, returning a Dataset of tuples for each unique key and the result of computing this aggregation over all elements in the group.

Since
1.6.0
def aggUntyped(columns: TypedColumn[_, _]*): Dataset[_]

Internal helper function for building typed aggregations that return tuples.
Internal helper function for building typed aggregations that return tuples. For simplicity and code reuse, we do this without the help of the type system and then use helper functions that cast appropriately for the user facing interface. TODO: does not handle aggrecations that return nonflat results,

Attributes
protected
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
def cogroup[U, R](other: GroupedDataset[K, U], f: CoGroupFunction[K, V, U, R], encoder: Encoder[R]): Dataset[R]

Applies the given function to each cogrouped data.
Applies the given function to each cogrouped data. For each unique group, the function will be passed the grouping key and 2 iterators containing all elements in the group from Dataset this and other. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.

Since
1.6.0
def cogroup[U, R](other: GroupedDataset[K, U])(f: (K, Iterator[V], Iterator[U]) ⇒ TraversableOnce[R])(implicit arg0: Encoder[R]): Dataset[R]

Applies the given function to each cogrouped data.
Applies the given function to each cogrouped data. For each unique group, the function will be passed the grouping key and 2 iterators containing all elements in the group from Dataset this and other. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.

Since
1.6.0
def count(): Dataset[(K, Long)]

Returns a Dataset that contains a tuple with each key and the number of items present for that key.
Returns a Dataset that contains a tuple with each key and the number of items present for that key.

Since
1.6.0
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def flatMapGroups[U](f: FlatMapGroupsFunction[K, V, U], encoder: Encoder[U]): Dataset[U]

Applies the given function to each group of data.
Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.

Since
1.6.0
def flatMapGroups[U](f: (K, Iterator[V]) ⇒ TraversableOnce[U])(implicit arg0: Encoder[U]): Dataset[U]

Applies the given function to each group of data.
Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.

Since
1.6.0
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def hashCode(): Int

Definition Classes
AnyRef → Any
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def keyAs[L](implicit arg0: Encoder[L]): GroupedDataset[L, V]

Returns a new GroupedDataset where the type of the key has been mapped to the specified type.
Returns a new GroupedDataset where the type of the key has been mapped to the specified type. The mapping of key columns to the type follows the same rules as as on Dataset.

Since
1.6.0
def keys: Dataset[K]

Returns a Dataset that contains each unique key.
Returns a Dataset that contains each unique key.

Since
1.6.0
def mapGroups[U](f: MapGroupsFunction[K, V, U], encoder: Encoder[U]): Dataset[U]

Applies the given function to each group of data.
Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an element of arbitrary type which will be returned as a new Dataset.
This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.

Since
1.6.0
def mapGroups[U](f: (K, Iterator[V]) ⇒ U)(implicit arg0: Encoder[U]): Dataset[U]

Applies the given function to each group of data.
Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an element of arbitrary type which will be returned as a new Dataset.
This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.

Since
1.6.0
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
val queryExecution: QueryExecution
def reduce(f: ReduceFunction[V]): Dataset[(K, V)]

Reduces the elements of each group of data using the specified binary function.
Reduces the elements of each group of data using the specified binary function. The given function must be commutative and associative or the result may be non-deterministic.

Since
1.6.0
def reduce(f: (V, V) ⇒ V): Dataset[(K, V)]

Reduces the elements of each group of data using the specified binary function.
Reduces the elements of each group of data using the specified binary function. The given function must be commutative and associative or the result may be non-deterministic.

Since
1.6.0
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

class GroupedDataset[K, V] extends Serializable

Value Members

final def !=(arg0: AnyRef): Boolean

final def !=(arg0: Any): Boolean

final def ##(): Int

final def ==(arg0: AnyRef): Boolean

final def ==(arg0: Any): Boolean

def agg[U1, U2, U3, U4](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3], col4: TypedColumn[V, U4]): Dataset[(K, U1, U2, U3, U4)]

def agg[U1, U2, U3](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3]): Dataset[(K, U1, U2, U3)]

def agg[U1, U2](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2]): Dataset[(K, U1, U2)]

def agg[U1](col1: TypedColumn[V, U1]): Dataset[(K, U1)]

def aggUntyped(columns: TypedColumn[_, _]*): Dataset[_]

final def asInstanceOf[T0]: T0

def clone(): AnyRef

def cogroup[U, R](other: GroupedDataset[K, U], f: CoGroupFunction[K, V, U, R], encoder: Encoder[R]): Dataset[R]

def cogroup[U, R](other: GroupedDataset[K, U])(f: (K, Iterator[V], Iterator[U]) ⇒ TraversableOnce[R])(implicit arg0: Encoder[R]): Dataset[R]

def count(): Dataset[(K, Long)]

final def eq(arg0: AnyRef): Boolean

def equals(arg0: Any): Boolean

def finalize(): Unit

def flatMapGroups[U](f: FlatMapGroupsFunction[K, V, U], encoder: Encoder[U]): Dataset[U]

def flatMapGroups[U](f: (K, Iterator[V]) ⇒ TraversableOnce[U])(implicit arg0: Encoder[U]): Dataset[U]

final def getClass(): Class[_]

def hashCode(): Int

final def isInstanceOf[T0]: Boolean

def keyAs[L](implicit arg0: Encoder[L]): GroupedDataset[L, V]

def keys: Dataset[K]

def mapGroups[U](f: MapGroupsFunction[K, V, U], encoder: Encoder[U]): Dataset[U]

def mapGroups[U](f: (K, Iterator[V]) ⇒ U)(implicit arg0: Encoder[U]): Dataset[U]

final def ne(arg0: AnyRef): Boolean

final def notify(): Unit

final def notifyAll(): Unit

val queryExecution: QueryExecution

def reduce(f: ReduceFunction[V]): Dataset[(K, V)]

def reduce(f: (V, V) ⇒ V): Dataset[(K, V)]

final def synchronized[T0](arg0: ⇒ T0): T0

def toString(): String

final def wait(): Unit

final def wait(arg0: Long, arg1: Int): Unit

final def wait(arg0: Long): Unit

Inherited from Serializable

Inherited from Serializable

Inherited from AnyRef

Inherited from Any

Ungrouped