PySpark Word2Vec – Lessons Learned – Part 3

Another setting that I had to change was the spark.driver.maxResultSize.

While training the Word2Vec model, the Spark job threw a SparkException: "Job aborted due to stage failure: Total size of serialized results of x tasks (y MB) is bigger than spark.driver.maxResultSize (z MB)".

This usually happens during a collect(), when the serialized results of all tasks are sent back to the driver and their combined size exceeds the configured limit.
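To make the failure mode concrete, here is an illustrative sketch (not Spark's actual code) of the check behind that error: the driver compares the total serialized result size across all tasks against the limit.

```python
# Illustrative sketch of the spark.driver.maxResultSize check.
def exceeds_max_result_size(task_result_sizes_bytes, max_result_size_bytes):
    """True if the combined serialized task results would trip the limit.

    A limit of 0 disables the check (unlimited), matching Spark's semantics.
    """
    if max_result_size_bytes == 0:
        return False
    return sum(task_result_sizes_bytes) > max_result_size_bytes


GB = 1024 ** 3
MB = 1024 ** 2

# 200 tasks, each returning ~10 MB of serialized results: ~2 GB in total.
tasks = [10 * MB] * 200

print(exceeds_max_result_size(tasks, 1 * GB))   # True: aborts at the 1 GB default
print(exceeds_max_result_size(tasks, 10 * GB))  # False: fits after raising the limit
```

Note that the check is on the total across tasks, so even modest per-task results can add up past the default once the job has enough partitions.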

My default value was 1 GB, and I was getting this error. I increased it to 10 GB, which resolved the issue.

Setting it to 0 means unlimited.
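For reference, one way to apply the change in PySpark is at session creation. This is a minimal sketch assuming pyspark is installed; the app name is illustrative, and the setting must be in place before the SparkSession is created.

```python
from pyspark.sql import SparkSession

# Raise spark.driver.maxResultSize when building the session
# (assumption: you control session creation in your script).
spark = (
    SparkSession.builder
    .appName("word2vec-training")                 # hypothetical app name
    .config("spark.driver.maxResultSize", "10g")  # default is 1g; "0" = unlimited
    .getOrCreate()
)
```

The same setting can also be passed on the command line, e.g. `spark-submit --conf spark.driver.maxResultSize=10g your_job.py`.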

From the official Spark documentation:

Having a high limit may cause out-of-memory errors in driver (depends on spark.driver.memory and memory overhead of objects in JVM). Setting a proper limit can protect the driver from out-of-memory errors.
