A data type that can be accumulated, i.e. has a commutative and associative "add" operation, but where the result type, R, may be different from the element type being added, T.
Helper object defining how to accumulate values of a particular type.
A simpler value of Accumulable where the result type being accumulated is the same as the type of the elements being merged.
A simpler version of org.apache.spark.AccumulableParam where the only data type you can add in is the same type as the accumulated value.
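The accumulation contract described in the entries above can be illustrated with a minimal plain-Scala sketch. It does not use Spark: the trait AccumSketch, its method names, and the accumulate helper are hypothetical stand-ins chosen for illustration, not the actual Accumulable or AccumulableParam API. The example assumes a result type (Set[Int]) that differs from the element type (Int).

```scala
// Hypothetical stand-in for the idea of a "helper object defining how to
// accumulate values"; names are illustrative, not Spark's real trait.
trait AccumSketch[R, T] {
  def zero: R                        // identity value for the result type
  def addElement(acc: R, elem: T): R // add one element of type T into a result of type R
  def merge(a: R, b: R): R           // combine two partial results
}

object AccumDemo {
  // Result type Set[Int] differs from element type Int.
  val distinctInts: AccumSketch[Set[Int], Int] = new AccumSketch[Set[Int], Int] {
    def zero: Set[Int] = Set.empty
    def addElement(acc: Set[Int], elem: Int): Set[Int] = acc + elem
    def merge(a: Set[Int], b: Set[Int]): Set[Int] = a ++ b
  }

  // Fold each "partition" separately, then merge the partial results.
  // Because the operation is commutative and associative, the order in
  // which partitions are processed does not change the answer.
  def accumulate[R, T](p: AccumSketch[R, T], partitions: Seq[Seq[T]]): R =
    partitions
      .map(part => part.foldLeft(p.zero)(p.addElement))
      .foldLeft(p.zero)(p.merge)
}
```

When R and T are the same type, addElement and merge collapse into a single operation, which is the simpler case the last two entries describe.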
A set of functions used to aggregate data.
A FutureAction for actions that could trigger multiple Spark jobs.
Base class for dependencies.
A future for the result of an action to support cancellation.
An org.apache.spark.Partitioner that implements hash-based partitioning using Java's Object.hashCode.
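As a rough illustration of hash-based partitioning, the sketch below maps a key to a partition index from its hashCode. It is a plain-Scala approximation, not Spark's source: the object name, the function name, and the choice to send null keys to partition 0 are assumptions made for the example.

```scala
object HashPartitionSketch {
  // Map a key to a partition index in [0, numPartitions).
  // hashCode may be negative, so the remainder is shifted into range.
  def partitionFor(key: Any, numPartitions: Int): Int = {
    require(numPartitions > 0, "numPartitions must be positive")
    if (key == null) 0 // assumption: null keys go to the first partition
    else {
      val raw = key.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw
    }
  }
}
```

Equal keys always land in the same partition, which is the property that key-based operations rely on.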
An iterator that wraps around an existing iterator to provide task killing functionality.
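The wrapping idea can be sketched in plain Scala as follows. This is only an illustration under assumptions: the class name KillableIterator and the use of an AtomicBoolean as the kill signal are hypothetical, and the real class may signal a killed task differently (for example by throwing rather than reporting exhaustion).

```scala
import java.util.concurrent.atomic.AtomicBoolean

object KillableIteratorSketch {
  // Wraps an existing iterator and stops yielding elements once the
  // shared kill flag is set. The flag is a stand-in for task state.
  final class KillableIterator[A](killed: AtomicBoolean, delegate: Iterator[A])
      extends Iterator[A] {
    // Report "no more elements" as soon as the flag is set.
    def hasNext: Boolean = !killed.get() && delegate.hasNext
    def next(): A = delegate.next()
  }
}
```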
Utility trait for classes that want to log data.
Base class for dependencies where each partition of the parent RDD is used by at most one partition of the child RDD.
Represents a one-to-one dependency between partitions of the parent and child RDDs.
A partition of an RDD.
An object that defines how the elements in a key-value pair RDD are partitioned by key.
Represents a one-to-one dependency between ranges of partitions in the parent and child RDDs.
An org.apache.spark.Partitioner that partitions sortable records by range into roughly equal ranges.
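To make "roughly equal ranges" concrete, here is a small plain-Scala sketch of range partitioning over Int keys. It is an assumption-laden illustration rather than Spark's algorithm: the object and function names are hypothetical, and the bounds are taken from a fully sorted sample instead of whatever sampling strategy the real class uses.

```scala
object RangePartitionSketch {
  // Choose numPartitions - 1 upper bounds from a sample so that each
  // range holds roughly the same number of sampled keys.
  def bounds(sample: Seq[Int], numPartitions: Int): Vector[Int] = {
    require(numPartitions > 0, "numPartitions must be positive")
    val sorted = sample.sorted.toVector
    if (sorted.isEmpty) Vector.empty
    else (1 until numPartitions).map { i =>
      val idx = math.min(sorted.length - 1, i * sorted.length / numPartitions)
      sorted(idx)
    }.toVector
  }

  // A key belongs to the first range whose upper bound is >= the key;
  // keys above every bound fall into the last partition.
  def partitionFor(key: Int, bounds: Vector[Int]): Int = {
    val i = bounds.indexWhere(key <= _)
    if (i < 0) bounds.length else i
  }
}
```

With the sample 1 to 100 and four partitions, this sketch yields the bounds 26, 51, and 76, so the four ranges hold 26, 25, 25, and 24 keys respectively.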
Represents a dependency on the output of a shuffle stage.
A FutureAction holding the result of an action that triggers a single job.
Configuration for a Spark application.
Main entry point for Spark functionality.
Holds all the runtime environment objects for a running Spark instance (either master or worker), including the serializer, Akka actor system, block manager, map output tracker, etc.
The SparkContext object contains a number of implicit conversions and parameters for use with various Spark features.
Package for broadcast variables.
Core Spark functionality. org.apache.spark.SparkContext serves as the main entry point to Spark, while org.apache.spark.rdd.RDD is the data type representing a distributed collection, and provides most parallel operations.
In addition, org.apache.spark.rdd.PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; org.apache.spark.rdd.DoubleRDDFunctions contains operations available only on RDDs of Doubles; and org.apache.spark.rdd.SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. These operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit conversions when you import org.apache.spark.SparkContext._.
Java programmers should reference the org.apache.spark.api.java package for Spark programming APIs in Java.
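The mechanism by which extra operations become available only on collections "of the right type" can be shown with a plain-Scala analogy. Nothing here is Spark code: MiniData is a hypothetical stand-in for an RDD, and PairOps is a hypothetical stand-in for a class such as PairRDDFunctions.

```scala
object PairOpsSketch {
  // Hypothetical miniature "dataset" wrapper standing in for an RDD.
  final case class MiniData[A](items: Seq[A])

  // Implicit wrapper that only applies when the element type is a pair,
  // so groupByKey exists on MiniData[(K, V)] but not on MiniData[Int].
  implicit class PairOps[K, V](val self: MiniData[(K, V)]) {
    def groupByKey: Map[K, Seq[V]] =
      self.items.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
  }

  // Inside this object the implicit is in scope; elsewhere it would be
  // brought in with `import PairOpsSketch._`.
  def demo: Map[String, Seq[Int]] =
    MiniData(Seq(("a", 1), ("b", 2), ("a", 3))).groupByKey

  // MiniData(Seq(1, 2)).groupByKey would not compile: Int is not a pair.
}
```

The import of SparkContext._ plays the same role as bringing PairOps into scope in this sketch: without it, the compiler has no conversion to apply and the pair-only methods are not found.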