How To

PySpark Word2Vec – Lessons Learned – Part 3

Another setting that I had to change was spark.driver.maxResultSize. While training the Word2Vec model, the Spark job threw a SparkException: “Job aborted due to stage failure: Total size of serialized results of x tasks (y MB) is bigger than spark.driver.maxResultSize (z MB)”. This usually happens during the collect stage, because the driver needs more memory to hold the results sent back by the tasks. […]
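As a rough illustration of changing this setting, here is a minimal sketch of how it might be applied when the session is built from PySpark. The helper names (`driver_result_conf`, `build_session`), the app name, and the "4g" value are assumptions made for this example, not values from the original job; the right limit depends on your data and on how much memory the driver has.

```python
def driver_result_conf(max_result_size="4g"):
    """Return the Spark setting that caps the serialized results sent to the driver.

    "4g" is an illustrative value only. Spark's default is 1g, and "0" removes
    the limit, which risks running the driver out of memory.
    """
    return {"spark.driver.maxResultSize": max_result_size}


def build_session(app_name, conf):
    """Create (or reuse) a SparkSession with the given settings applied."""
    # Imported lazily so the config helper above can be used without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


conf = driver_result_conf("4g")
# spark = build_session("word2vec-training", conf)  # hypothetical app name
```

The setting needs to be in place before the session is created. The same property can also be passed on the command line with `spark-submit --conf spark.driver.maxResultSize=4g`.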