RDD.cartesian(other)
Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in self and b is in other.
New in version 0.7.0.
Parameters
other : RDD
    another RDD
Returns
RDD
    the Cartesian product of this RDD and another one
See also
pyspark.sql.DataFrame.crossJoin()
Examples
>>> rdd = sc.parallelize([1, 2])
>>> sorted(rdd.cartesian(rdd).collect())
[(1, 1), (1, 2), (2, 1), (2, 2)]
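
The pairing rule (every a from self combined with every b from other) can be illustrated without a Spark context. The following is a minimal plain-Python sketch, not Spark code: `cartesian_local` is a hypothetical helper introduced only for illustration, and it models which pairs appear, not how Spark partitions or orders them.

```python
from itertools import product


def cartesian_local(left, right):
    """Hypothetical local stand-in for RDD.cartesian, operating on plain lists.

    Returns every pair (a, b) with a taken from `left` and b from `right`.
    """
    return [(a, b) for a, b in product(left, right)]


# Two inputs of different sizes: the result has len(left) * len(right) pairs.
pairs = cartesian_local([1, 2], ["x", "y", "z"])
print(len(pairs))  # 6
print(pairs[0])    # (1, 'x')
```

Because the result size is the product of the two input sizes, the output grows quickly when both inputs are large.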