Force loading RDD from file to memory in Spark
I have a demo application that runs a Spark computation. It loads an RDD stored in an object file, then performs tasks that depend on the user's input.
Loading the RDD using SparkContext.objectFile() is a lengthy operation. Since time is an issue, I would like to load it before the demo starts, and only perform the input-dependent calculations during the presentation. However, Spark's lazy evaluation policy means the file is only read once the entire computation is triggered.
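To illustrate the problem, a minimal sketch (the path and element type here are placeholders, not from the original setup):
val rdd = sc.objectFile[String]("/data/demo.obj") // returns immediately; no file I/O happens yet
rdd.first() // only this action actually triggers reading the file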
rdd.cache() does not do the trick on its own, since caching is a lazy operation too.
Is there a way to force-load the RDD from file? If not, is there a way to speed up the RDD load, and/or keep it in memory for future Spark jobs?
The Spark version is 1.5 and it runs in single-node standalone mode. The file is read from the local file system. I can tweak Spark's configuration or these settings if needed.
After calling cache(), call an action on the RDD (usually one uses count()) to "materialize" the cache. Further calls on the RDD will then use the cached version:
rdd.cache().count() // loads the RDD and materializes the cache
// subsequent uses of rdd hit the in-memory copy
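For completeness, a self-contained sketch of this warm-up pattern in a single-node standalone setup like the one described in the question (the app name, file path, and element type are placeholders):
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("demo-warmup").setMaster("local[*]")
val sc = new SparkContext(conf)

// lazy: nothing is read from disk here
val rdd = sc.objectFile[String]("/data/demo.obj")

// mark the RDD for caching, then force materialization with an action
rdd.cache()
println(s"warmed up ${rdd.count()} records")

// during the demo, input-dependent queries now run against the in-memory copy,
// e.g. rdd.filter(_.contains(userInput)).count()
Running the count() before the presentation starts moves the expensive disk read out of the interactive part of the demo; it keeps the RDD in memory only for the lifetime of this SparkContext, so the context must stay up between the warm-up and the demo.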