machine learning - Why is my Spark MLlib ALS collaborative filtering training model so slow?


I use the ALS collaborative filtering method for the content recommendation system in my app. It seems to work fine, and the prediction part is quick, but training the model takes around 20 seconds. I need it to take at most 1 second, since I need real-time recommendations. I use a Spark cluster of 3 machines, where each node has 17 GB of RAM. I use DataStax, but that shouldn't have any influence.

I don't know why this happens or how to improve it. I'd be happy for any suggestions, thanks.

Here is the basic Spark code:

    from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating

    # Load and parse the data
    data = sc.textFile("data/mllib/als/test.data")
    ratings = data.map(lambda l: l.split(','))\
        .map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2])))

This part takes over 20 seconds but should take less than 1 second:

    # Build the recommendation model using Alternating Least Squares
    rank = 10
    numIterations = 10
    model = ALS.train(ratings, rank, numIterations)

    # Evaluate the model on training data
    testdata = ratings.map(lambda p: (p[0], p[1]))
    predictions = model.predictAll(testdata).map(lambda r: ((r[0], r[1]), r[2]))
    ratesAndPreds = ratings.map(lambda r: ((r[0], r[1]), r[2])).join(predictions)
    MSE = ratesAndPreds.map(lambda r: (r[1][0] - r[1][1])**2).mean()
    print("Mean Squared Error = " + str(MSE))

    # Save and load the model
    model.save(sc, "target/tmp/myCollaborativeFilter")
    sameModel = MatrixFactorizationModel.load(sc, "target/tmp/myCollaborativeFilter")

