Tuesday, May 8, 2018

flatMap & flatMapValues explained by example

In [101]: rdd = sc.parallelize([2, 3, 4])

In [102]: rdd.map(lambda x: list(range(1, x))).collect()
Out[102]: [[1], [1, 2], [1, 2, 3]]

In [103]: rdd.flatMap(lambda x: range(1, x)).collect()
Out[103]: [1, 1, 2, 1, 2, 3]
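The difference between the two results above can be modeled without Spark. The following is a minimal plain-Python sketch, not Spark's implementation: the names `data`, `f`, `mapped`, and `flattened` are hypothetical and chosen only for illustration. It assumes `map` keeps one output element per input element, while `flatMap` concatenates the iterables returned by the function into a single flat sequence.

```python
from itertools import chain

# Same input and function as in the session above (names are illustrative).
data = [2, 3, 4]
f = lambda x: range(1, x)

# map: one result per input element, so the result is a list of lists.
mapped = [list(f(x)) for x in data]

# flatMap: the per-element results are chained into one flat list.
flattened = list(chain.from_iterable(f(x) for x in data))

print(mapped)     # [[1], [1, 2], [1, 2, 3]]
print(flattened)  # [1, 1, 2, 1, 2, 3]
```

In other words, the function passed to flatMap must return an iterable, and the nesting level it introduces is removed from the output.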


In [104]: x = sc.parallelize([("a", ["x", "y", "z"]), ("b", ["p", "r"])])

In [106]: x.flatMapValues(lambda value: value).collect()
Out[106]: [('a', 'x'), ('a', 'y'), ('a', 'z'), ('b', 'p'), ('b', 'r')]
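The flatMapValues result above can also be modeled in plain Python. This is a rough sketch under the assumption that the function is applied to the value only and that each item it yields is paired with the original, unchanged key. The helper `flat_map_values` is a hypothetical name for illustration and is not part of PySpark.

```python
def flat_map_values(f, pairs):
    """Illustrative model: apply f to each value, pair every yielded item with its key."""
    return [(key, item) for key, value in pairs for item in f(value)]

pairs = [("a", ["x", "y", "z"]), ("b", ["p", "r"])]

# The identity function simply unpacks each list of values.
result = flat_map_values(lambda value: value, pairs)

print(result)  # [('a', 'x'), ('a', 'y'), ('a', 'z'), ('b', 'p'), ('b', 'r')]
```

Because the keys are never touched, Spark can keep the original RDD's partitioning when flatMapValues is used, which is not the case for a general flatMap over key-value pairs.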