Designing ETL in Spark

Tags: scala, tables, caching, design pattern
Question by bigdata82 · Mar 14, 2016 at 04:42 PM

Hi, I have the following ETL component written in Java. At startup, the ETL loads several lookup tables (25 tables) as hash maps into Redis, where the keys are the descriptions from the lookup table and the values are IDs. After the initial load, it fetches logs one by one. Each log is a JSON object, and each field in the JSON corresponds to one lookup table. For example, one field is Country_name; in a later stage I search the Redis cache built during the initial load for the ID corresponding to that country. This is done for every field in the JSON object of each incoming log, so that the IDs for all fields are resolved before the resulting Java object is stored in the database. What is the best approach to model this in Spark? Is it possible to load each lookup table into some efficient data structure in Spark, so that I can look up the ID for each value as efficiently as I currently can with Redis?
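One common pattern for small lookup tables in Spark (a sketch, not necessarily the best fit for every workload) is to collect each table to the driver as a `Map` and distribute it with a broadcast variable, so every executor holds a read-only in-memory copy for O(1) lookups, much like the local Redis cache described above. The table name `country_lookup`, the column names `description`/`id`, the log path, and the sentinel `-1L` below are all assumptions for illustration:

```scala
import org.apache.spark.sql.SparkSession

object LookupEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("lookup-etl").getOrCreate()
    import spark.implicits._

    // Hypothetical lookup table: description -> ID.
    // In practice this would be one of the 25 tables loaded at startup.
    val countryLookup: Map[String, Long] =
      spark.read.table("country_lookup")          // assumed table name
        .select($"description", $"id")
        .as[(String, Long)]
        .collect()
        .toMap

    // Broadcast the map once; each executor receives a read-only copy.
    val countryBc = spark.sparkContext.broadcast(countryLookup)

    // Hypothetical log source: JSON lines, one object per log.
    val logs = spark.read.json("/path/to/logs")   // assumed path

    // Resolve the ID for each log's Country_name via the broadcast map.
    val resolved = logs.map { row =>
      val name = row.getAs[String]("Country_name")
      val id   = countryBc.value.getOrElse(name, -1L)  // -1 = not found
      (name, id)
    }

    resolved.show()
    spark.stop()
  }
}
```

Broadcasting suits lookup tables that comfortably fit in executor memory; for larger tables, a join against a DataFrame (letting Spark choose a broadcast join when the table is small enough) is the usual alternative.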

Currently, in Java, we create a model for each table using Hibernate. What is the corresponding best practice in Scala?
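In Spark's Scala API, the rough analogue of a Hibernate entity is a plain case class used with a typed `Dataset`: Spark derives the schema and serialization from the case class fields, with no annotations or mapping files. A minimal sketch, again assuming a `country_lookup` table with `id` and `description` columns:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Plain case class instead of an annotated Hibernate entity;
// Spark infers the schema from the field names and types.
case class Country(id: Long, description: String)

object CountryModel {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("country-model").getOrCreate()
    import spark.implicits._

    // Read the table as a strongly typed Dataset[Country].
    val countries: Dataset[Country] =
      spark.read.table("country_lookup").as[Country]  // assumed table name

    // Typed operations work directly on the case class fields.
    countries.filter(_.description.startsWith("A")).show()

    spark.stop()
  }
}
```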

