pyspark.RDD.zip

RDD.zip(other: pyspark.rdd.RDD[U]) → pyspark.rdd.RDD[Tuple[T, U]]

Zips this RDD with another one, returning key-value pairs made of the first element in each RDD, the second element in each RDD, etc. Assumes that the two RDDs have the same number of partitions and the same number of elements in each partition (e.g. one was made through a map on the other).

New in version 1.0.0.

Parameters
other : RDD

another RDD

Returns
RDD

an RDD containing the zipped key-value pairs

Examples

>>> rdd1 = sc.parallelize(range(0, 5))
>>> rdd2 = sc.parallelize(range(1000, 1005))
>>> rdd1.zip(rdd2).collect()
[(0, 1000), (1, 1001), (2, 1002), (3, 1003), (4, 1004)]
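
The partition assumption stated in the description can be illustrated with a small pure-Python model. This is only a sketch and not Spark code: each RDD is represented as a list of partitions, and `zip_partitions_model` is a hypothetical helper introduced here for illustration. It raises `ValueError` for any layout mismatch as a simplification; how Spark itself reports a mismatch is not modelled.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def zip_partitions_model(
    left: Sequence[Sequence[T]], right: Sequence[Sequence[U]]
) -> List[List[Tuple[T, U]]]:
    """Hypothetical pure-Python model of partition-wise zipping (not Spark code)."""
    # Both inputs must have the same number of partitions.
    if len(left) != len(right):
        raise ValueError("unequal number of partitions")
    zipped = []
    for left_part, right_part in zip(left, right):
        # Corresponding partitions must hold the same number of elements.
        if len(left_part) != len(right_part):
            raise ValueError("unequal number of elements in a partition")
        # Pair elements by position within the partition.
        zipped.append(list(zip(left_part, right_part)))
    return zipped


# Two "RDDs" modelled as lists of partitions with matching layouts.
left = [[0, 1], [2, 3, 4]]
right = [[1000, 1001], [1002, 1003, 1004]]

# Flatten the partitions, similar in spirit to collect().
pairs = [pair for part in zip_partitions_model(left, right) for pair in part]
print(pairs)  # [(0, 1000), (1, 1001), (2, 1002), (3, 1003), (4, 1004)]
```

In this model, pairing happens within each partition by position, which is why two RDDs with the same total number of elements but a different split across partitions do not satisfy the assumption.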