rdd - pyspark: 'PipelinedRDD' object is not iterable


I am getting this error and I do not know why. The erroring code is:

    a = data.mapPartitions(helper(locations))

where data is an RDD and helper is defined as:

    def helper(iterator, locations):
        for x in iterator:
            c = locations[x]
            yield c

(locations is just an array of data points.) I do not see the problem, but I am also not the best at PySpark, so can someone please tell me why I am getting 'PipelinedRDD' object is not iterable from this code?
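
For what it's worth, a likely culprit in the snippet above is that mapPartitions expects a *function* that takes a partition iterator, while helper(locations) calls helper up front instead of handing Spark a callable. A minimal sketch of the usual wrapping, reusing data, helper, and locations from the question (the variable name mapped is just for illustration):

    # mapPartitions wants a callable of one argument (the partition iterator),
    # so wrap helper in a lambda instead of calling it immediately.
    mapped = data.mapPartitions(lambda it: helper(it, locations))

    # Transformations are lazy; an action such as collect() triggers execution.
    print(mapped.collect())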

An RDD can be iterated over using map and lambda functions. I have iterated through a PipelinedRDD using the method below:

    lines1 = sc.textFile("\..\file1.csv")
    lines2 = sc.textFile("\..\file2.csv")

    pairs1 = lines1.map(lambda s: (int(s), 'file1'))
    pairs2 = lines2.map(lambda s: (int(s), 'file2'))

    pair_result = pairs1.union(pairs2)

    # reduceByKey returns a new RDD, so keep the result
    pair_result = pair_result.reduceByKey(lambda a, b: a + ',' + b)

    result = pair_result.map(lambda l: tuple(l[:1]) + tuple(l[1].split(',')))
    result_ll = [list(elem) for elem in result]

    ---> result_ll = [list(elem) for elem in result]
    TypeError: 'PipelinedRDD' object is not iterable

Instead of iterating directly (an RDD is not a plain Python iterable), I replaced the iteration with the map function:

    result_ll = result.map(lambda elem: list(elem))
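
Note that map is itself a lazy transformation, so this line only defines the computation. If the goal is an actual Python list on the driver, an action is still needed; a small sketch using the standard RDD actions collect() and toLocalIterator():

    # Materialize everything on the driver as a list of lists
    result_ll = result.map(lambda elem: list(elem)).collect()

    # Or stream elements back one partition at a time to save driver memory
    for elem in result.toLocalIterator():
        print(list(elem))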

Hope this helps you to modify your code accordingly.

