How to read a logical DOUBLE value stored in Avro format using Spark?

Tags: spark sql · dataframe · avro · schema · logical

Question by mdolgonos · Jul 25, 2016 at 06:01 PM

I have existing Hive data stored in Avro format. For whatever reason, reading this data with a SELECT is very slow; I haven't figured out why yet. So I decided to read the data directly by navigating to the partition path and using Spark's SQLContext, which works much faster. However, I have a problem reading the DOUBLE values, which are stored as the Avro decimal logical type. In the schema file they are defined as:

{"name":"ENDING_NET_RECEIVABLES_LOCAL","type":["null",{"type":"bytes","logicalType":"decimal","precision":38,"scale":18}],"doc":"Ending Net Receivables Local","default":null}

Somewhere I found a recommendation to use the following approach to convert my Avro schema into a Spark SQL schema:

import java.io.File
import org.apache.avro.Schema
import org.apache.avro.file.DataFileWriter
import org.apache.avro.generic.GenericDatumWriter
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.StructType

def getSparkSchemaForAvro(sqc: SQLContext, avroSchema: Schema): StructType = {
  // Write an empty Avro file that carries the schema, then let spark-avro infer it
  val dummyFile = File.createTempFile("avro_dummy", ".avro")
  val datumWriter = new GenericDatumWriter[Any]()
  datumWriter.setSchema(avroSchema)
  val writer = new DataFileWriter(datumWriter).create(avroSchema, dummyFile)
  writer.flush()
  writer.close()
  val df = sqc.read.format("com.databricks.spark.avro")
    .load("file://" + dummyFile.getAbsolutePath)
  df.schema
}
However, it converts the above field into StructField(ENDING_NET_RECEIVABLES_LOCAL, BinaryType, true) instead of StructField(ENDING_NET_RECEIVABLES_LOCAL, DecimalType, true).
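One possible workaround (my own sketch, not something this thread confirms): accept the BinaryType column as spark-avro produces it, then decode it with a UDF, using the scale of 18 that the schema file declares:

import org.apache.spark.sql.functions.udf

// Hypothetical post-read fix: decode the Avro decimal bytes with the schema's scale of 18
val toDecimal = udf { (bytes: Array[Byte]) =>
  if (bytes == null) null
  else new java.math.BigDecimal(new java.math.BigInteger(bytes), 18)
}
val fixed = df.withColumn(
  "ENDING_NET_RECEIVABLES_LOCAL",
  toDecimal(df("ENDING_NET_RECEIVABLES_LOCAL")))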

If the schema conversion were correct, I could have read the file like this:

import com.databricks.spark.avro._ // brings the .avro(...) reader method into scope
val df = sqlContext.read.schema(CRDataUtils.getSparkSchemaForAvro(sqlContext, avroSchema)).avro(path)

I also looked at com.databricks.spark.avro.SchemaConverters, but its conversion method def toSqlType(avroSchema: Schema): SchemaType returns a SchemaType instead of the StructType required by the above approach.
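If your spark-avro version exposes SchemaConverters publicly (newer releases do; older ones restricted it to the package), the StructType can be unwrapped from the returned SchemaType, which is just a (dataType, nullable) pair. A sketch under that assumption:

import com.databricks.spark.avro.SchemaConverters
import org.apache.spark.sql.types.StructType

// For a record schema, the wrapped DataType is a StructType
val sparkSchema = SchemaConverters.toSqlType(avroSchema)
  .dataType.asInstanceOf[StructType]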

Can anyone explain how to read Avro files with logical types in Spark?


5 Answers


Answer by subhamoy chowdhury · Mar 03, 2017 at 01:41 AM

I'm facing the same issue here. Can anyone please help?

I am using Spark 1.6.0 on CDH 5.7.



Answer by subhamoy chowdhury · Mar 03, 2017 at 08:03 PM

I am trying to write a DataFrame to Avro using the Databricks Scala API. The write is successful, but reading the data from Hive throws an exception:

Error: java.io.IOException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Failed to obtain scale value from file schema: "bytes" (state=,code=0)

In the .avsc file I have a column with type bytes:

{"name":"rate","type":["null",{"type":"bytes","logicalType":"decimal","precision":38,"scale":18}],"default":null}

Reading:

val df = sqlContext.read.format("com.databricks.spark.avro")
  .option("avroSchema", schema.toString)
  .option("inferSchema", "true")
  .avro(sourceFile)
  .filter(preparePartitionFilterClause)

Writing:

df.write.mode(SaveMode.Append)
  .format("com.databricks.spark.avro")
  .partitionBy(TrlConstants.PARTITION_COLUMN_COUNTRYCODE)
  .save(path)
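The error suggests the written file's schema carries a plain "bytes" type without the decimal precision/scale properties that Hive's AvroSerde needs. One way to verify (a diagnostic sketch; the part-file path below is hypothetical) is to dump the schema embedded in a written file with the Avro API:

import java.io.File
import org.apache.avro.file.DataFileReader
import org.apache.avro.generic.GenericDatumReader

// Print the schema actually embedded in a written Avro part file
val reader = new DataFileReader(
  new File("/path/to/part-00000.avro"), new GenericDatumReader[Any]())
println(reader.getSchema.toString(true))
reader.close()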

I am completely clueless, please help!

Comment by mdolgonos · Mar 03, 2017 at 10:00 PM

subhamoy chowdhury, how do your questions above qualify as answers?


Answer by sai krishna Pujari · Apr 10, 2017 at 12:42 PM

@subhamoy chowdhury I am also facing a similar problem.

Let me know if you found a solution.


Answer by Uthayakumar · Mar 01, 2018 at 09:41 AM

Hi Subhamoy / Pujari, the same thing is happening for me. If you have any findings, please share them. Thanks in advance.
UK


Answer by smiksha · Jun 11 at 12:29 PM

I am facing the same issue. Can someone suggest something? @Databricks_Support



