rdd - pyspark: 'PipelinedRDD' object is not iterable
I am getting an error and I don't know why. The erroring code is:

    result = data.mapPartitions(helper(locations))

where data is an RDD and helper is defined as:

    def helper(iterator, locations):
        for x in iterator:
            c = locations[x]
            yield c

(locations is an array of data points.) I don't see the problem, and I'm not the best at PySpark. Can someone please tell me why I am getting a 'PipelinedRDD' object is not iterable error from this code?
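For what it's worth, the call above likely fails because mapPartitions expects a function of one argument (a partition iterator), while helper(locations) calls helper immediately, passing locations as the iterator. A minimal sketch in plain Python (using a hypothetical map_partitions stand-in instead of real Spark) shows the pattern and the usual lambda fix:

```python
# Stand-in sketch (no Spark needed): mapPartitions wants a *function*,
# not the result of calling one. helper(locations) binds `locations`
# to the iterator parameter and supplies nothing for the second one.

def helper(iterator, locations):
    for x in iterator:
        yield locations[x]

def map_partitions(partitions, f):
    # Hypothetical stand-in for RDD.mapPartitions: apply f to each
    # partition's iterator and materialize the results.
    return [list(f(iter(p))) for p in partitions]

locations = {0: "a", 1: "b", 2: "c"}
partitions = [[0, 1], [2]]

# The fix: wrap helper in a lambda so the framework supplies the
# iterator and the closure supplies `locations`.
result = map_partitions(partitions, lambda it: helper(it, locations))
print(result)  # [['a', 'b'], ['c']]
```

With real PySpark the equivalent call would be data.mapPartitions(lambda it: helper(it, locations)).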
An RDD can be iterated over using map and lambda functions. I have iterated through a pipelined RDD using the method below:
    lines1 = sc.textFile("\..\file1.csv")
    lines2 = sc.textFile("\..\file2.csv")
    pairs1 = lines1.map(lambda s: (int(s), 'file1'))
    pairs2 = lines2.map(lambda s: (int(s), 'file2'))
    pair_result = pairs1.union(pairs2)
    pair_result = pair_result.reduceByKey(lambda a, b: a + ',' + b)
    result = pair_result.map(lambda l: tuple(l[:1]) + tuple(l[1].split(',')))
    result_ll = [list(elem) for elem in result]

This raises:

    ---> result_ll = [list(elem) for elem in result]
    TypeError: 'PipelinedRDD' object is not iterable
Instead of iterating with a list comprehension, I replaced it with the map function:

    result_ll = result.map(lambda elem: list(elem))

Hope this helps; modify your code accordingly.
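The underlying reason is that an RDD is not a Python iterable on the driver: map() only builds another RDD, and you need collect() (or take(n)) to materialize a local list. A minimal sketch with a hypothetical FakeRDD stand-in class (not real Spark) illustrates the distinction:

```python
# Sketch: iterating an RDD directly fails because it defines no
# __iter__; map() returns another lazy "RDD", and only collect()
# brings the data back to the driver as a plain Python list.
class FakeRDD:
    def __init__(self, data):
        self._data = data

    def map(self, f):
        # Returns another "RDD", not a list
        return FakeRDD([f(x) for x in self._data])

    def collect(self):
        # Only this produces a driver-side Python list
        return list(self._data)

result = FakeRDD([(1, 'file1'), (2, 'file2')])
result_ll = result.map(lambda elem: list(elem)).collect()
print(result_ll)  # [[1, 'file1'], [2, 'file2']]
```

In real PySpark the same chain, result.map(lambda elem: list(elem)).collect(), gives you the list the original comprehension was after.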