Why am I seeing "FetchFailedException: Adjusted frame length" and large shuffle block sizes as reported by the error?

data-management
Question by Databricks_Support · Feb 18, 2015 at 09:26 PM
1 Answer


Answer by Databricks_Support · Feb 18, 2015 at 09:26 PM

This usually means you're exceeding Spark's 2 GB shuffle block size limit, which can be caused by the following:

  • Too few partitions (only 1, for example, when creating an RDD from a non-splittable, compressed file such as a gzip file). Call repartition() after the data is loaded so that it is redistributed (via a shuffle) across the other nodes in the cluster. This also gives you the parallelism you need for faster processing.
  • Skewed data due to a poor choice of partition key. Note: for an unskewed data source, the average block size is (total data size) / (# mappers) / (# reducers). Since (# mappers) × (# reducers) is typically on the order of 100 × 100 to 1000 × 1000, individual block sizes usually come out in the KB-to-MB range.
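To make the arithmetic above concrete, here is a minimal sketch (the function name is hypothetical; the formula is the one from the answer):

```python
def avg_shuffle_block_size(total_bytes, num_mappers, num_reducers):
    """Average shuffle block size for unskewed data:
    (total data size) / (# mappers) / (# reducers)."""
    return total_bytes / num_mappers / num_reducers

# 1 TB shuffled between 1000 mappers and 1000 reducers: ~1 MB per block.
print(avg_shuffle_block_size(10**12, 1000, 1000))  # 1000000.0

# A single gzipped input read as one partition pushes the whole dataset
# through one mapper: 1 TB / 1 / 1000 = ~1 GB per block, getting close to
# the 2 GB limit -- hence the advice to repartition() after loading.
print(avg_shuffle_block_size(10**12, 1, 1000))  # 1000000000.0
```

The same arithmetic explains why skew is dangerous even with many partitions: if one key holds most of the data, the blocks for that key behave as if the mapper or reducer count were 1.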



© Databricks 2015. All rights reserved. Apache Spark and the Apache Spark Logo are trademarks of the Apache Software Foundation.
