
Avro support for structured streaming write?

Tags: spark, streaming, avro
Question by manugarri · May 04, 2017 at 03:40 PM ·

With the new Structured Streaming API released in Spark, writing a stream looks as follows. As an example, I am reading from Kafka and writing to HDFS in Avro format.

```python
# `spark` is an existing SparkSession; BROKER and INPUT_QUEUES are placeholders
kafkaStream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", BROKER)
    .option("subscribe", INPUT_QUEUES)
    .load()
)
```

Now we write the stream:

```python
query = (
    kafkaStream.writeStream
    .format("com.databricks.spark.avro")
    .option("path", PATH)
    .option("checkpointLocation", "/tmp/")
    .start()
)
query.awaitTermination()  # blocks until the query stops
```

This works well if the `writeStream` format is, for example, `parquet` or `console`, but with the format `com.databricks.spark.avro` it fails with the following error:

java.lang.UnsupportedOperationException: Data source com.databricks.spark.avro does not support streamed writing 
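One possible workaround, sketched under stated assumptions: on Spark 2.4 or later (a release that postdates this question), `DataStreamWriter.foreachBatch` passes each micro-batch to a function as an ordinary, non-streaming DataFrame, so the batch Avro writer can be used inside it even though the streaming sink rejects the format. The helper names `make_avro_batch_writer` and `start_avro_query` are hypothetical, and the sketch assumes an Avro data source (the built-in `avro` module or the Databricks package) is on the classpath.

```python
# Hypothetical helpers (not from the original post). Assumes Spark 2.4+ for
# foreachBatch and an Avro data source available on the classpath.


def make_avro_batch_writer(path, avro_format="avro"):
    """Return a callback with the (DataFrame, batch_id) signature foreachBatch expects."""

    def write_batch(batch_df, batch_id):
        # Inside foreachBatch each micro-batch is a static DataFrame,
        # so the ordinary (non-streaming) Avro writer can be used.
        # batch_id is unused here; it could be used to deduplicate retried batches.
        batch_df.write.format(avro_format).mode("append").save(path)

    return write_batch


def start_avro_query(spark, broker, topics, path, checkpoint_dir):
    """Read from Kafka and append every micro-batch to `path` as Avro files."""
    kafka_stream = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", broker)
        .option("subscribe", topics)
        .load()
    )
    return (
        kafka_stream.writeStream
        .foreachBatch(make_avro_batch_writer(path))
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

Usage would look like `query = start_avro_query(spark, BROKER, INPUT_QUEUES, PATH, "/tmp/avro-checkpoint")` followed by `query.awaitTermination()`. Note that `foreachBatch` only gives at-least-once guarantees by default, so a retried micro-batch may append duplicate files unless `batch_id` is used to deduplicate.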

I also posted this as an issue on GitHub (it has been more than two months without a single reply); hopefully someone here can shed some light on the matter.


Related Questions

Spark Structured Streaming Kafka source checkpointing frequency 0 Answers

Better way to specify spark.scheduler.pool property ? 1 Answer

How can i read multiple avro directories into a single DataFrame? 3 Answers

RegisterRDDAsTable 1 Answer

Avro creation using Nested Schema in Spark 0 Answers

© Databricks 2015. All rights reserved. Apache Spark and the Apache Spark Logo are trademarks of the Apache Software Foundation.