A data type that can be accumulated, i.e. has a commutative and associative "add" operation, but where the result type, R, may be different from the element type being added, T.
Helper object defining how to accumulate values of a particular type.
A simpler value of Accumulable where the result type being accumulated is the same as the type of the elements being merged.
A simpler version of org.apache.spark.AccumulableParam where the only data type you can add in is the same type as the accumulated value.
:: DeveloperApi :: A set of functions used to aggregate data.
:: Experimental :: A FutureAction for actions that could trigger multiple Spark jobs.
:: DeveloperApi :: Base class for dependencies.
:: DeveloperApi :: Task failed due to a runtime exception.
:: DeveloperApi :: Task failed to fetch shuffle data from a remote node.
:: Experimental :: A future for the result of an action to support cancellation.
An org.apache.spark.Partitioner that implements hash-based partitioning using Java's Object.hashCode.
:: DeveloperApi :: An iterator that wraps around an existing iterator to provide task killing functionality.
:: DeveloperApi :: Utility trait for classes that want to log data.
:: DeveloperApi :: Base class for dependencies where each partition of the child RDD depends on a small number of partitions of the parent RDD.
:: DeveloperApi :: Represents a one-to-one dependency between partitions of the parent and child RDDs.
A partition of an RDD.
An object that defines how the elements in a key-value pair RDD are partitioned by key.
:: DeveloperApi :: Represents a one-to-one dependency between ranges of partitions in the parent and child RDDs.
An org.apache.spark.Partitioner that partitions sortable records by range into roughly equal ranges.
:: DeveloperApi :: Represents a dependency on the output of a shuffle stage.
:: Experimental :: A FutureAction holding the result of an action that triggers a single job.
Configuration for a Spark application.
Main entry point for Spark functionality.
:: DeveloperApi :: Holds all the runtime environment objects for a running Spark instance (either master or worker), including the serializer, Akka actor system, block manager, map output tracker, etc.
:: DeveloperApi :: Contextual information about a task which can be read or mutated during execution.
:: DeveloperApi :: Various possible reasons why a task ended.
:: DeveloperApi :: Various possible reasons why a task failed.
:: DeveloperApi :: Exception thrown when a task is explicitly killed.
:: DeveloperApi :: The task failed because the executor that it was running on was lost.
:: DeveloperApi :: An org.apache.spark.scheduler.ShuffleMapTask that completed successfully earlier, but we lost the executor before the stage completed.
The SparkContext object contains a number of implicit conversions and parameters for use with various Spark features.
Resolves paths to files added through SparkContext.addFile().
:: DeveloperApi :: Task succeeded.
:: DeveloperApi :: Task was killed intentionally and needs to be rescheduled.
:: DeveloperApi :: The task finished successfully, but the result was lost from the executor's block manager before it was fetched.
:: DeveloperApi :: We don't know why the task ended -- for example, because of a ClassNotFound exception when deserializing the task result.
Spark annotations to mark an API experimental or intended only for advanced usages by developers.
Bagel: An implementation of Pregel in Spark.
Spark's broadcast variables, used to broadcast immutable datasets to all nodes.
Executor components used with various cluster managers.
ALPHA COMPONENT: GraphX is a graph processing framework built on top of Spark.
IO codecs used for compression.
Spark's machine learning library.
:: Experimental ::
Provides several RDD implementations.
Spark's scheduling components.
Pluggable serializers for RDD and shuffle data.
Allows the execution of relational queries, including those expressed in SQL using Spark.
Spark Streaming functionality.
Spark utilities.
Core Spark functionality. org.apache.spark.SparkContext serves as the main entry point to Spark, while org.apache.spark.rdd.RDD is the data type representing a distributed collection, and provides most parallel operations.
In addition, org.apache.spark.rdd.PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; org.apache.spark.rdd.DoubleRDDFunctions contains operations available only on RDDs of Doubles; and org.apache.spark.rdd.SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. These operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit conversions when you import org.apache.spark.SparkContext._.
Java programmers should reference the org.apache.spark.api.java package for Spark programming APIs in Java.
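The implicit conversions described above can be illustrated with a minimal Scala sketch. It assumes the Spark version documented on this page is on the classpath and that a local master ("local[2]") is acceptable for experimentation. The object name, method names, and sample data are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Brings the implicit conversions to PairRDDFunctions, DoubleRDDFunctions, etc. into scope.
import org.apache.spark.SparkContext._

// Hypothetical object name, used only for this sketch.
object ImplicitConversionsSketch {

  // join is defined on PairRDDFunctions, so it is only available on RDDs of key-value pairs.
  def joinByKey(sc: SparkContext): Map[String, (Int, String)] = {
    val ages = sc.parallelize(Seq(("alice", 30), ("bob", 25)))
    val cities = sc.parallelize(Seq(("alice", "Paris")))
    // Only keys present in both RDDs survive the join.
    ages.join(cities).collect().toMap
  }

  // mean is defined on DoubleRDDFunctions, so it is only available on RDDs of Doubles.
  def meanOf(sc: SparkContext, values: Seq[Double]): Double =
    sc.parallelize(values).mean()

  def main(args: Array[String]): Unit = {
    // Assumed configuration: a local master with two threads.
    val conf = new SparkConf()
      .setAppName("implicit-conversions-sketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(joinByKey(sc))
      println(meanOf(sc, Seq(1.0, 2.0, 3.0)))
    } finally {
      sc.stop()
    }
  }
}
```

Neither join nor mean is declared on RDD itself. The compiler finds them by converting the RDD to PairRDDFunctions or DoubleRDDFunctions based on its element type, which is why an RDD of plain Strings would not compile with either call.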
Classes and methods marked with Experimental are user-facing features which have not been officially adopted by the Spark project. These are subject to change or removal in minor releases.
Classes and methods marked with Developer API are intended for advanced users who want to extend Spark through lower level interfaces. These are subject to change or removal in minor releases.