PySpark Word2Vec – Lessons Learned – Part 2

The next setting I had to change was spark.rpc.message.maxSize.

I changed it after running into the following error:

Serialized task XXX:XXX was XXX bytes, which exceeds max allowed: spark.rpc.message.maxSize (XXX bytes).
Consider increasing spark.rpc.message.maxSize or using broadcast variables for large values.

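This error occurs when a serialized task sent from the driver to an executor is larger than the configured limit, which happens when large amounts of data are exchanged between the driver and the executors by map and reduce tasks.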

The default value of spark.rpc.message.maxSize is 128 MB. The setting takes a plain number in MiB, so I resolved the error by increasing the value to 512 (512 MB).

This value can be increased either in the Spark configuration or by passing the new value as a command-line parameter.
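Both options can be sketched in Python as follows. This is a minimal sketch, not the exact code from my job: the helper names (rpc_conf, build_session, spark_submit_command), the application name and the script name train_word2vec.py are hypothetical and only serve the illustration. The 2047 upper bound is the limit I understand Spark to enforce for this setting; check it against your Spark version.

```python
import shlex

RPC_MAX_SIZE_KEY = "spark.rpc.message.maxSize"
RPC_MAX_SIZE_LIMIT_MIB = 2047  # assumed upper bound accepted by Spark


def rpc_conf(max_size_mib):
    """Return the (key, value) pair for the setting after a range check."""
    if not 0 < max_size_mib <= RPC_MAX_SIZE_LIMIT_MIB:
        raise ValueError(
            f"{RPC_MAX_SIZE_KEY} must be between 1 and {RPC_MAX_SIZE_LIMIT_MIB} MiB"
        )
    return RPC_MAX_SIZE_KEY, str(max_size_mib)


def build_session(max_size_mib=512, app_name="word2vec-job"):
    """Option 1: set the value in the Spark configuration of the session."""
    # Imported lazily so the rest of the sketch runs without PySpark installed.
    from pyspark.sql import SparkSession

    key, value = rpc_conf(max_size_mib)
    return SparkSession.builder.appName(app_name).config(key, value).getOrCreate()


def spark_submit_command(script, max_size_mib=512):
    """Option 2: pass the value on the command line with --conf."""
    key, value = rpc_conf(max_size_mib)
    return shlex.join(["spark-submit", "--conf", f"{key}={value}", script])


# Hypothetical script name, used only to show the resulting command line.
print(spark_submit_command("train_word2vec.py"))
# prints: spark-submit --conf spark.rpc.message.maxSize=512 train_word2vec.py
```

Note that the setting has to be in place before the Spark application starts. If a session is already running, getOrCreate() returns that session and the new value will most likely not take effect until the application is restarted.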
