I need to cache some tables in the Spark SQL Thrift Server so that they can be accessed efficiently from a Tableau Desktop client. The data to be cached is retrieved from Hive, where I have multiple schemas that may contain different tables with the same name.
For example, I have 2 schemas in Hive ("schema1" and "schema2"), and in both schemas there's a table named "mytable" with different content.
How can I cache both tables from the two Hive schemas in the same Spark SQL Thrift Server?
It seems I cannot specify the schema name in the "CACHE TABLE <tableName>" SQL statement, so I tried to cache the tables this way:
CACHE TABLE schema1_mytable AS SELECT * FROM schema1.mytable;
CACHE TABLE schema2_mytable AS SELECT * FROM schema2.mytable;
but when executing a query against those tables, the in-memory relations do not seem to be used at all, and the data is read by scanning the Hive table.
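For reference, this is roughly how I checked the query plan from a beeline session connected to the Thrift Server (a sketch; the exact operator names depend on the Spark version):

EXPLAIN SELECT COUNT(*) FROM schema1_mytable;
-- an in-memory hit should show an InMemoryColumnarTableScan (InMemoryTableScan on newer Spark) node
-- a miss recomputes the query and falls back to a HiveTableScan of the underlying Hive table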
Is there a way to cache, in the Spark SQL Thrift Server, two or more tables that have the same name but belong to different Hive schemas? Thanks!
Answer by Miklos · Dec 04, 2015 at 06:40 PM
Those cache statements should work, and you should be reading from in-memory copies of the data. Are you using the Databricks platform, and do you see the data cached on the executors' page of the Spark UI?
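As a rough sanity check (a sketch, assuming you run it from a beeline session against the same Thrift Server that executed the CACHE statements, using the table names from your question):

SHOW TABLES;                           -- schema1_mytable and schema2_mytable should be listed
SELECT COUNT(*) FROM schema1_mytable;  -- should be answered from the in-memory copy
UNCACHE TABLE schema1_mytable;         -- releases the cached data when it is no longer needed

Note that Tableau would then need to query schema1_mytable / schema2_mytable rather than schema1.mytable, since only those names are registered with the cached data.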