Hi, I have the following ETL component written in Java. During startup, the ETL loads several lookup tables (25 tables) into Redis as hash maps, where the keys are the descriptions from the lookup table and the values are the IDs. After the initial load, it fetches logs one by one. Each log is a JSON object, and each field in the JSON corresponds to one lookup table; for example, one field is Country_name. In a later stage, I search the Redis cache built during the initial load for the ID corresponding to that country. This is done for every field of the JSON object of each incoming log, to resolve the ID for each field before the Java object is stored in the database.

What is the best way to model this in Spark? Is it possible to load each lookup table into some efficient data structure in Spark, so that I can look up the ID for each value as efficiently as I currently do with Redis?
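To make the question concrete, here is a rough sketch of what I imagine a broadcast-based approach could look like. All names, paths, and sample data below are placeholders I made up, not my real code; I only show one of the 25 tables:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object LogEnrichment {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("log-enrichment").getOrCreate()

    // Placeholder: one description -> ID map per lookup table. In reality,
    // these 25 maps would be built from the source tables at startup.
    val countryMap: Map[String, Long] = Map("France" -> 1L, "Japan" -> 2L)

    // Broadcast the small map so every executor keeps a local read-only copy;
    // each lookup is then a plain in-memory hash get, with no Redis round trip.
    val countryLookup = spark.sparkContext.broadcast(countryMap)

    // UDF that resolves a description to its ID (-1 for unknown/null values).
    val countryId = udf((name: String) =>
      Option(name).flatMap(countryLookup.value.get).getOrElse(-1L))

    val logs = spark.read.json("logs/*.json") // placeholder input path
    val enriched = logs.withColumn("country_id", countryId(logs("Country_name")))
    enriched.show()
  }
}
```

Is something along these lines the recommended pattern, or is there a better fit (e.g. broadcast joins)?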
Currently, in Java, we create a model for each table using Hibernate. What is the corresponding best practice in Scala?
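For context, here is the kind of thing I am imagining on the Scala side: a plain case class per table instead of a Hibernate-annotated entity. The field names and JDBC settings below are made-up placeholders:

```scala
import org.apache.spark.sql.SparkSession

// One plain case class per lookup table (fields here are invented examples).
case class Country(id: Long, description: String)

object LookupModels {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("lookup-models").getOrCreate()
    import spark.implicits._

    // A typed Dataset would take the place of the Hibernate mapping.
    val countries = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://host/db") // placeholder connection
      .option("dbtable", "country")               // placeholder table name
      .load()
      .as[Country] // column names must match the case class fields

    countries.show()
  }
}
```

Would this be the idiomatic replacement, or is there a more standard approach?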