I am trying to create an EXTERNAL TABLE from Parquet using SQL DDL as follows:
CREATE EXTERNAL TABLE foo (a int, b int, c decimal(5,2))
STORED AS PARQUET
LOCATION 's3a://...:...@<bucket>/<path>'
TBLPROPERTIES ("parquet.compression"="SNAPPY")
and I get the error:
AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Parquet does not support decimal. See HIVE-6384;
However, I can create a table from the same S3 source (Parquet) using the Data UI tab in Databricks without any problems, including the decimal column. DESCRIBE foo reports:
a int
b int
c decimal(5,2)
How can I create the same result using a SQL script? I have several tables and don't want to use the UI.
I believe the problem is that my cluster configuration contains the setting
spark.sql.hive.metastore.version 0.13.0
and that version does not support the richer Parquet column types such as decimal. I am using the Databricks-provided metastore associated with the account. However, when I launch a cluster with the version set to 1.2.1, all of my SQL requests fail with the message "Cancelled".
Is there a proven way to upgrade the Databricks-provisioned metastore beyond 0.13.0?
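Before creating tables, it can help to confirm which metastore client version the cluster is actually using; in a PySpark notebook the value can be read back with spark.conf.get("spark.sql.hive.metastore.version"). The helper below is a minimal sketch that compares that string against 1.2.1, the version this thread reports as working with decimal columns. The function names and the choice of 1.2.1 as the threshold are assumptions for illustration, not a documented Databricks rule.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '0.13.0' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def metastore_older_than(version: str, minimum: str = "1.2.1") -> bool:
    """Return True if `version` sorts before `minimum`.

    Hypothetical helper: 1.2.1 is the default only because this thread
    reports that version working with Parquet decimal columns.
    """
    # Tuples of ints compare element by element, so (0, 13, 0) < (1, 2, 1).
    return parse_version(version) < parse_version(minimum)


# In a PySpark notebook the configured value could be passed in, e.g.:
#   metastore_older_than(spark.conf.get("spark.sql.hive.metastore.version"))
print(metastore_older_than("0.13.0"))
print(metastore_older_than("1.2.1"))
```

Comparing integer tuples rather than raw strings avoids the string-ordering trap where "0.13.0" would sort after "0.2.0".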
Hello,
I'm experiencing a similar issue.
What is the exact syntax for setting the hive metastore properties in pyspark?
I tried the commands below, but I still get the 'AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException' error:
spark.sql("SET spark.sql.hive.metastore.version = 1.2.1")
spark.sql("SET spark.sql.hive.metastore.jars = builtin")
Thanks.
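The answer that follows applies these two values as cluster-level Spark configuration rather than with SET in a running session. The Hive metastore client is generally created when the session first uses the metastore, so changing these keys with SET afterwards is not expected to take effect. If you keep cluster settings under version control, a small helper like the hypothetical render_spark_conf below can produce the one-pair-per-line, space-separated text shown in the answer; the function name is illustrative only.

```python
def render_spark_conf(settings: dict) -> str:
    """Render Spark settings as 'key value' lines, one pair per line.

    Hypothetical helper that mirrors the space-separated layout used for
    the cluster configuration in this thread's answer.
    """
    lines = []
    for key, value in settings.items():
        # A blank key, or one containing whitespace, would break the
        # one-pair-per-line layout, so reject it early.
        if not key or any(ch.isspace() for ch in key):
            raise ValueError(f"invalid config key: {key!r}")
        lines.append(f"{key} {value}")
    return "\n".join(lines)


hive_metastore_conf = {
    "spark.sql.hive.metastore.version": "1.2.1",
    "spark.sql.hive.metastore.jars": "builtin",
}
print(render_spark_conf(hive_metastore_conf))
```

The output can be pasted into the cluster's Spark configuration field; restart the cluster so the new values are picked up.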
Answer by stuarto · Jun 16, 2017 at 07:06 PM
Solved the issue. I needed to change two settings from the default Spark configuration of my cluster:
spark.sql.hive.metastore.version 1.2.1
spark.sql.hive.metastore.jars builtin
After that, the cluster was able to digest Parquet data containing decimal types.
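With the metastore settings in place, the original goal of scripting several tables instead of using the UI could be approached by generating one CREATE EXTERNAL TABLE statement per table and running each through spark.sql. The sketch below only builds the DDL strings. The helper name build_external_table_ddl is hypothetical, and the table definitions and S3 path in the usage lines are placeholders, not real locations.

```python
def build_external_table_ddl(name, columns, location, compression="SNAPPY"):
    """Build a Hive-style CREATE EXTERNAL TABLE statement for Parquet data.

    `columns` is a list of (column_name, sql_type) pairs, for example
    [("a", "int"), ("c", "decimal(5,2)")]. Hypothetical helper for illustration.
    """
    if not columns:
        raise ValueError("at least one column is required")
    column_sql = ", ".join(f"{col} {sql_type}" for col, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {name} ({column_sql}) "
        f"STORED AS PARQUET "
        f"LOCATION '{location}' "
        f"TBLPROPERTIES ('parquet.compression'='{compression}')"
    )


# Placeholder definitions; replace with the real tables and S3 paths.
tables = {
    "foo": ([("a", "int"), ("b", "int"), ("c", "decimal(5,2)")], "s3a://<bucket>/<path>"),
}

for table_name, (cols, path) in tables.items():
    ddl = build_external_table_ddl(table_name, cols, path)
    print(ddl)
    # In a notebook attached to the reconfigured cluster: spark.sql(ddl)
```

Keeping the definitions in a dict means adding a table is a one-line change, and the generated statements can be reviewed before anything is executed against the metastore.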