What does the cache do in Spark SQL?

spark sql · cache
Question by zhong zhang · Feb 15, 2016 at 09:33 PM

The official Spark documentation says:

Spark SQL can cache tables using an in-memory columnar format by calling sqlContext.cacheTable("tableName") or dataFrame.cache(). Then Spark SQL will scan only required columns and will automatically tune compression to minimize memory usage and GC pressure. You can call sqlContext.uncacheTable("tableName") to remove the table from memory.

What does caching tables in an in-memory columnar format really mean? Does it put the whole table into memory? Since cache is lazy, the table is only cached after the first action on the query. Does the choice of action or query make any difference to what ends up cached? I've googled this topic several times but failed to find any detailed articles. I would really appreciate it if anyone could provide some links or articles on this topic.

http://spark.apache.org/docs/latest/sql-programming-guide.html#caching-data-in-memory
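
To make "columnar format" and "scan only required columns" from the quoted passage a bit more concrete, here is a toy sketch in plain Python. It is not Spark code and not how Spark implements its cache; the table contents and the helper name to_columnar are made up for illustration. It only shows the general idea that a column-wise layout lets a query on one column skip the others.

```python
# Toy model in plain Python (not Spark code): the same table stored
# row-wise and column-wise. All names and values here are illustrative.

rows = [
    {"id": 1, "name": "a", "price": 10.0},
    {"id": 2, "name": "b", "price": 12.5},
    {"id": 3, "name": "c", "price": 7.5},
]


def to_columnar(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columnar(rows)

# Row layout: every whole row is visited even though only "price" is needed.
total_from_rows = sum(row["price"] for row in rows)

# Columnar layout: only the "price" list is read; "id" and "name" are untouched.
total_from_columns = sum(columns["price"])

print(total_from_rows, total_from_columns)  # 30.0 30.0
```

Keeping values of one column together also tends to make them easier to compress, since they share a type and often repeat, which fits the remark about compression in the quoted documentation.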


1 Answer

Answer by bill · Feb 19, 2016 at 08:52 PM

It means that Spark puts the whole table (or as much of it as fits) in memory in an optimized format for later queries. It does make a difference which action you use to populate the cache.

There is some more information in this post and this post.
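
Why the choice of action can matter is easier to see with a small model. The sketch below is plain Python, not Spark code, and the class LazyCachedTable with its methods is entirely hypothetical. It assumes a cache that is filled lazily, one partition at a time, as partitions are computed: an action that needs a single row touches one partition, while an action that needs every row touches all of them.

```python
# Toy model in plain Python (not Spark code) of a lazily filled,
# partition-by-partition cache. All class and method names are made up.

class LazyCachedTable:
    def __init__(self, partitions):
        self._source = partitions     # stands in for the uncached data source
        self._cache = {}              # partition index -> materialized rows
        self.cache_requested = False

    def cache(self):
        # Lazy: only records the request, nothing is stored yet.
        self.cache_requested = True
        return self

    def _read_partition(self, index):
        if index in self._cache:
            return self._cache[index]
        data = list(self._source[index])  # "compute" the partition
        if self.cache_requested:
            self._cache[index] = data
        return data

    def first(self):
        # Needs one row, so only the first partition is computed.
        return self._read_partition(0)[0]

    def count(self):
        # Needs every row, so every partition is computed.
        return sum(len(self._read_partition(i))
                   for i in range(len(self._source)))

    def cached_partitions(self):
        return sorted(self._cache)


table = LazyCachedTable([[1, 2], [3, 4], [5, 6]]).cache()
print(table.cached_partitions())  # [] -- cache() alone stores nothing
table.first()
print(table.cached_partitions())  # [0] -- only the partition that was read
table.count()
print(table.cached_partitions())  # [0, 1, 2] -- a full pass fills the cache
```

Under that assumption, an action that scans the whole table (count() is a common choice) right after cache() is the usual way to make sure everything is materialized before later queries run.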

Comment by Rahul Tokase · Jun 30, 2017 at 06:09 PM

Hi bill

I have the following situation. Could you please let me know whether caching can help with it?

1) I have a big table and created a DataFrame from it using the read.jdbc() method, say bigTableDataframe.

2) Later I want to join this DataFrame with 4 different small tables, like below:

bigTableDataframe = bigTableDataframe.join(smallTableDataFrame1);

bigTableDataframe = bigTableDataframe.join(smallTableDataFrame2); // and so on

So what do you suggest: should I call bigTableDataframe.cache() before calling the join() method?

Or should I use a broadcast join?
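
One way to reason about the cache-versus-broadcast question is with a toy model. The sketch below is plain Python, not Spark code; BigTable, broadcast_join, and the sample data are all made up. It assumes the joins are chained and evaluated once by a single action at the end, and that each small table fits in memory as a lookup map, which is the idea behind a broadcast hash join.

```python
# Toy model in plain Python (not Spark code). Table contents and all
# names are hypothetical.

class BigTable:
    """Stands in for the JDBC-backed table; counts how often it is scanned."""

    def __init__(self, rows):
        self._rows = rows
        self.scans = 0

    def scan(self):
        self.scans += 1
        yield from self._rows


def broadcast_join(left_rows, small_table, key):
    """Hash join: the small table is a dict that every worker could hold."""
    for row in left_rows:
        match = small_table.get(row[key])
        if match is not None:
            yield {**row, **match}


big = BigTable([
    {"id": 1, "country": "DE", "currency": "EUR"},
    {"id": 2, "country": "US", "currency": "USD"},
    {"id": 3, "country": "FR", "currency": "EUR"},
])
countries = {
    "DE": {"country_name": "Germany"},
    "US": {"country_name": "United States"},
    "FR": {"country_name": "France"},
}
currencies = {
    "EUR": {"currency_name": "Euro"},
    "USD": {"currency_name": "US dollar"},
}

# Two chained joins, evaluated lazily by one "action" (list) at the end.
joined = broadcast_join(
    broadcast_join(big.scan(), countries, "country"),
    currencies,
    "currency",
)
result = list(joined)

print(big.scans)  # 1 -- the big table is read once for the whole chain
print(result[0])
```

In this model the big table is streamed once through all the lookups, so caching it would not save a read; a cache starts to pay off when the same DataFrame is reused by more than one action. For small tables, Spark exposes a broadcast hint (org.apache.spark.sql.functions.broadcast in Scala/Java, pyspark.sql.functions.broadcast in Python), which is worth trying first here; whether it helps in a given job depends on the table sizes and the Spark version.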

© Databricks 2015. All rights reserved. Apache Spark and the Apache Spark Logo are trademarks of the Apache Software Foundation.
