Cache tables in Spark SQL from different Hive schemas

Tags: spark sql, cache, tableau
Question by gnolo · Nov 26, 2015 at 02:59 PM

I need to cache some tables in the Spark SQL Thrift Server so that they can be queried efficiently from a Tableau Desktop client. The data to be cached comes from Hive, where I have multiple schemas that may contain different tables with the same name.

For example, I have two schemas in Hive ("schema1" and "schema2"), and each contains a table named "mytable" with different content.

How can I cache both tables, one from each Hive schema, in the same Spark SQL Thrift Server?

It seems I cannot specify the schema name in the "CACHE TABLE <tableName>" SQL statement, so I tried to cache the tables this way:

CACHE TABLE schema1_mytable AS SELECT * FROM schema1.mytable;

CACHE TABLE schema2_mytable AS SELECT * FROM schema2.mytable;

However, when I execute a query on those tables, the in-memory relations do not seem to be used at all, and the data is read by scanning the Hive table.

Is there a way to cache, in the Spark SQL Thrift Server, two or more tables that have the same name but belong to different Hive schemas? Thanks!


1 Answer


Answer by Miklos · Dec 04, 2015 at 06:40 PM

Those cache statements should work, and you should be reading from in-memory copies of the data. Are you using the Databricks platform, and do you see the data cached on the executors page of the Spark UI?
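
Besides the Spark UI, one way to check this is to inspect the physical plan from SQL. The sketch below reuses the names from the question (schema1_mytable, schema1.mytable) and assumes the query reads the cached name. The operator names in the comments are what Spark 1.x typically prints and are an assumption here; they may differ in other versions.

```sql
-- Cache the table under a schema-qualified alias (statement from the question).
CACHE TABLE schema1_mytable AS SELECT * FROM schema1.mytable;

-- Inspect the physical plan of a query that reads the cached name.
-- An in-memory scan operator (e.g. InMemoryColumnarTableScan in Spark 1.x)
-- suggests the cached relation is used; a HiveTableScan suggests it is not.
EXPLAIN SELECT COUNT(*) FROM schema1_mytable;

-- For comparison: the same query against the original Hive table name.
-- Whether this one is served from the cache may depend on the Spark version.
EXPLAIN SELECT COUNT(*) FROM schema1.mytable;

-- Release the memory when the cached copy is no longer needed.
UNCACHE TABLE schema1_mytable;
```

If the Tableau data source still points at schema1.mytable rather than at the cached name, that may be one reason the Hive table is scanned.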
