Force loading RDD from file to memory in Spark


I have a demo application that runs a Spark computation. It loads an RDD stored in an object file, and then performs tasks that depend on the user's input.

Loading the RDD using sparkContext.objectFile() is a lengthy operation. Since time is an issue, I would like to load it before the demo starts, and perform only the calculations that depend on the input during the presentation. However, Spark's lazy evaluation means the file is read only once the entire computation is triggered.

rdd.cache() does not do the trick on its own. Caching is a lazy operation too.

Is there a way to force-load an RDD from a file?

If not, is there a way to speed up the RDD load, and/or keep it in memory for future Spark jobs?

The Spark version is 1.5, and it runs in single-node standalone mode. The file is read from the local file system. I can tweak Spark's configuration or these settings if needed.

After calling cache(), call an action on the RDD (usually one uses count()) to "materialize" the cache. Further calls to this RDD will use the cached version:

rdd.cache().count() // load the RDD
// use rdd, it's cached
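For context, here is a slightly fuller sketch of how this might look in a demo application. It is only an illustration under some assumptions: the object name, the application name, the file path, the element type String, and the "userInput" value are all hypothetical placeholders, and the master URL is expected to be supplied by spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical demo driver; all names and paths here are placeholders.
object PreloadDemo {
  def main(args: Array[String]): Unit = {
    // Master URL is assumed to come from spark-submit (--master ...).
    val conf = new SparkConf().setAppName("preload-demo")
    val sc = new SparkContext(conf)

    // Lazy: nothing is read from disk at this point.
    // Element type String and the path are assumptions for illustration.
    val rdd: RDD[String] = sc.objectFile[String]("/path/to/object-file")

    // Mark the RDD for in-memory storage; cache() is shorthand for this level.
    rdd.persist(StorageLevel.MEMORY_ONLY)

    // count() is an action, so it forces the file to be read
    // and the partitions to be stored in the cache.
    val total = rdd.count()
    println(s"Preloaded $total records")

    // Later, during the presentation: this job reads from the cache
    // instead of going back to the file.
    val userInput = "example" // placeholder for the real user input
    val matches = rdd.filter(_.contains(userInput)).count()
    println(s"Records matching '$userInput': $matches")

    sc.stop()
  }
}
```

Note that the cache lives inside the SparkContext that created it, so later jobs benefit only if they run in the same application. If the data does not fit in memory with MEMORY_ONLY, the partitions that were not stored are recomputed from the file when needed; StorageLevel.MEMORY_AND_DISK is an alternative worth trying in that case.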
