Tuesday, May 8, 2018

flatMap & flatMapValues explained by example

In [101]: rdd = sc.parallelize([2, 3, 4])

In [102]: rdd.map(lambda x: list(range(1, x))).collect()
Out[102]: [[1], [1, 2], [1, 2, 3]]

In [103]: rdd.flatMap(lambda x: range(1, x)).collect()
Out[103]: [1, 1, 2, 1, 2, 3]


In [104]: x = sc.parallelize([("a", ["x", "y", "z"]), ("b", ["p", "r"])])

In [106]: x.flatMapValues(lambda value: value).collect()
Out[106]: [('a', 'x'), ('a', 'y'), ('a', 'z'), ('b', 'p'), ('b', 'r')]
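
For readers without a Spark shell at hand, the behaviour shown above can be modelled in plain Python. This is only a sketch of the semantics, not how Spark implements them; the helper names `flat_map` and `flat_map_values` are made up for illustration and are not part of PySpark.

```python
from itertools import chain


def flat_map(func, items):
    """Apply func to each item and concatenate the resulting iterables."""
    return list(chain.from_iterable(func(item) for item in items))


def flat_map_values(func, pairs):
    """Apply func to each value and pair every result with the original key."""
    return [(key, out) for key, value in pairs for out in func(value)]


numbers = [2, 3, 4]

# map keeps one output per input element, so the results stay nested
mapped = [list(range(1, x)) for x in numbers]

# flatMap flattens those per-element results into a single list
flattened = flat_map(lambda x: range(1, x), numbers)

# flatMapValues flattens only the value side and repeats the key
pairs = [("a", ["x", "y", "z"]), ("b", ["p", "r"])]
exploded = flat_map_values(lambda value: value, pairs)

print(mapped)
print(flattened)
print(exploded)
```

The difference is in the shape of the result: map returns exactly one element per input, while flatMap and flatMapValues may return zero or more elements per input.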

2 comments:

  1. How can flatMapValues be used on a dataset like [("a", ["x", "y", "z"], 10), ("b", ["p", "r"], 22)]?

    1. and get the result [('a', 'x', 10), ('a', 'y', 10), ('a', 'z', 10), ('b', 'p', 22), ('b', 'r', 22)]

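
One possible approach to the question in the comments: flatMapValues is meant for (key, value) pairs, so a three-field record probably has to be handled either with flatMap directly or by reshaping it into a pair first. The sketch below shows the per-record function in plain Python so it runs without Spark. `explode_record` is a hypothetical helper name, and the PySpark calls in the trailing comments are an assumption about how it would be wired up, not something verified against the commenter's data.

```python
def explode_record(record):
    """Return one (key, item, extra) tuple per item in the record's list."""
    key, items, extra = record
    return [(key, item, extra) for item in items]


data = [("a", ["x", "y", "z"], 10), ("b", ["p", "r"], 22)]

# Plain-Python equivalent of applying flatMap over the records
result = [row for record in data for row in explode_record(record)]
print(result)

# With an RDD this would presumably be written as:
#   sc.parallelize(data).flatMap(explode_record).collect()
#
# To stay with flatMapValues, reshape each record into a pair first,
# keeping the extra field in the key, then restore the field order:
#   (sc.parallelize(data)
#      .map(lambda r: ((r[0], r[2]), r[1]))
#      .flatMapValues(lambda v: v)
#      .map(lambda kv: (kv[0][0], kv[1], kv[0][1]))
#      .collect())
```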