public class OrderedRDDFunctions<K,V,P extends scala.Product2<K,V>> extends Object implements Logging, scala.Serializable
Import org.apache.spark.SparkContext._ at the top of your program to use these functions. They work with any key type K that has an implicit Ordering[K] in scope. Ordering objects already exist for all of the standard primitive types. Users can also define their own orderings for custom types, or override the default ordering. The implicit ordering in the closest scope is used.
import org.apache.spark.SparkContext._
val rdd: RDD[(String, Int)] = ...
implicit val caseInsensitiveOrdering: Ordering[String] = new Ordering[String] {
  // Compare keys after lower-casing them, so "a" and "A" sort together.
  override def compare(a: String, b: String): Int = a.toLowerCase.compare(b.toLowerCase)
}
// Sort by key, using the above case insensitive ordering.
rdd.sortByKey()
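
The note about custom key types can be illustrated with a minimal sketch in plain Scala. The `Version` case class and the `CustomOrderingSketch` object are hypothetical names introduced only for this example, and the sort runs on an ordinary `Seq` rather than an RDD. The assumption is that `sortByKey` resolves an implicit `Ordering[K]` in the same way that `sortBy` does here.

```scala
// Hypothetical key type, used only for illustration.
final case class Version(major: Int, minor: Int)

object CustomOrderingSketch {
  // Order versions by major number first, then by minor number.
  implicit val versionOrdering: Ordering[Version] =
    Ordering.by((v: Version) => (v.major, v.minor))

  // sortBy looks up Ordering[Version] implicitly and finds versionOrdering,
  // because it is the implicit in the enclosing scope.
  def sortPairs(pairs: Seq[(Version, String)]): Seq[(Version, String)] =
    pairs.sortBy(_._1)

  def main(args: Array[String]): Unit = {
    val pairs = Seq(
      Version(1, 10) -> "b",
      Version(2, 0) -> "c",
      Version(1, 2) -> "a"
    )
    sortPairs(pairs).foreach(println)
  }
}
```

With an implicit like `versionOrdering` in scope, an RDD keyed by `Version` should be sortable with `sortByKey()` without further changes.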
Constructor and Description |
---|
OrderedRDDFunctions(RDD<P> self, scala.math.Ordering<K> evidence$1, scala.reflect.ClassTag<K> evidence$2, scala.reflect.ClassTag<V> evidence$3, scala.reflect.ClassTag<P> evidence$4) |
Modifier and Type | Method and Description |
---|---|
RDD<scala.Tuple2<K,V>> | sortByKey(boolean ascending, int numPartitions) Sort the RDD by key, so that each partition contains a sorted range of the elements. |
Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Logging: initialized, initializeIfNecessary, initializeLogging, initLock, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public RDD<scala.Tuple2<K,V>> sortByKey(boolean ascending, int numPartitions)
Sort the RDD by key, so that each partition contains a sorted range of the elements. Calling collect or save on the resulting RDD will return or output an ordered list of records (in the save case, they will be written to multiple part-X files in the filesystem, in order of the keys).
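
To make the "sorted range per partition" behaviour concrete, here is a simplified in-memory model in plain Scala. It is a sketch under stated assumptions, not Spark's implementation: `SortByKeyModel` and `sortByKeyModel` are hypothetical names, partitions are modelled as nested `Seq`s, and records are split by count into at most `numPartitions` contiguous chunks, whereas a real range partitioner chooses boundaries by key.

```scala
object SortByKeyModel {
  // Simplified model, NOT Spark's implementation: sort all records by key,
  // then cut the result into at most `numPartitions` contiguous chunks so
  // that each chunk holds a sorted range of keys.
  def sortByKeyModel[K, V](
      records: Seq[(K, V)],
      ascending: Boolean,
      numPartitions: Int
  )(implicit ord: Ordering[K]): Seq[Seq[(K, V)]] = {
    require(numPartitions > 0, "numPartitions must be positive")
    // Reverse the implicit ordering when a descending sort is requested.
    val effective = if (ascending) ord else ord.reverse
    val sorted = records.sortBy(_._1)(effective)
    // Chunk size rounded up, and at least 1 so that grouped() is well defined.
    val chunkSize = math.max(1, math.ceil(sorted.size.toDouble / numPartitions).toInt)
    sorted.grouped(chunkSize).toSeq
  }

  def main(args: Array[String]): Unit = {
    val records = Seq("b" -> 2, "d" -> 4, "a" -> 1, "c" -> 3)
    val parts = sortByKeyModel(records, ascending = true, numPartitions = 2)
    // Concatenating the partitions in order yields the fully sorted list,
    // which is what collecting the sorted RDD is described as returning.
    println(parts.flatten)
  }
}
```

In this model, reading the partitions in index order is what makes the combined output ordered, which mirrors the description of the part-X files being written in key order.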