machine learning - Why is my Spark MLlib ALS collaborative filtering model training so slow?
I use the ALS collaborative filtering method for a content recommendation system in my app. It seems to work fine and the prediction part is quick, but training the model takes over 20 seconds. I need it to take 1 second or less, since I need real-time recommendations. I use a Spark cluster with 3 machines, and each node has 17 GB. I also use DataStax, but this shouldn't have any influence.
I don't know why this happens or how to improve it. Happy for any suggestions, thanks.
Here is the basic Spark code:
from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating

# Load and parse the data
data = sc.textFile("data/mllib/als/test.data")
ratings = data.map(lambda l: l.split(','))\
    .map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2])))
This part takes over 20 seconds, but it should take less than 1 second.
# Build the recommendation model using Alternating Least Squares
rank = 10
numiterations = 10
model = ALS.train(ratings, rank, numiterations)

# Evaluate the model on the training data
testdata = ratings.map(lambda p: (p[0], p[1]))
predictions = model.predictAll(testdata).map(lambda r: ((r[0], r[1]), r[2]))
ratesandpreds = ratings.map(lambda r: ((r[0], r[1]), r[2])).join(predictions)
mse = ratesandpreds.map(lambda r: (r[1][0] - r[1][1])**2).mean()
print("Mean Squared Error = " + str(mse))

# Save and load the model
model.save(sc, "target/tmp/mycollaborativefilter")
samemodel = MatrixFactorizationModel.load(sc, "target/tmp/mycollaborativefilter")
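To see where the 20 seconds actually go, it may help to time each step separately. The sketch below is only an illustration: `timed` and `timings` are hypothetical helper names, not part of Spark, and the workload is a stand-in so the snippet runs without a cluster. Keep in mind that RDD transformations such as `textFile` and `map` are lazy, so the first action (here, the one inside `ALS.train`) also pays for loading and parsing the file unless `ratings` is cached and materialized beforehand.

```python
import time
from contextlib import contextmanager

timings = {}  # hypothetical registry: label -> elapsed wall-clock seconds


@contextmanager
def timed(label):
    """Record how long the enclosed block takes under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


# Stand-in workload so the sketch runs without a Spark cluster.
with timed("demo"):
    total = sum(i * i for i in range(100000))

# With Spark, the same helper could wrap each step separately, e.g.:
#
# with timed("load"):
#     ratings.cache().count()   # an action: forces the lazy load/parse now
# with timed("train"):
#     model = ALS.train(ratings, rank, numiterations)

for label, seconds in timings.items():
    print("%s: %.3f s" % (label, seconds))
```

If most of the time shows up under "load" rather than "train", the cost is in reading and parsing the data, not in ALS itself.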