Apply a logic for a particular column in dataframe in spark

Tags: spark · spark sql · dataframe · spark-sql · scala spark
Question by rakiuday · Sep 14, 2018 at 11:42 AM ·

I have a DataFrame that was imported from MySQL:

dataframe_mysql.show()
+----+---------+---------------------------------------------------+
|  id|accountid|                                            xmldata|
+----+---------+---------------------------------------------------+
|1001|    12346|<AccountSetup xmlns:xsi="test"><Customers test="...|
|1002|    12346|<AccountSetup xmlns:xsi="test"><Customers test="...|
|1003|    12346|<AccountSetup xmlns:xsi="test"><Customers test="...|
|1004|    12347|<AccountSetup xmlns:xsi="test"><Customers test="...|
+----+---------+---------------------------------------------------+

The xmldata column contains XML tags; I need to parse it into structured data in a separate DataFrame.

Previously I had the XML alone in a text file, and I loaded it into a Spark DataFrame using "com.databricks.spark.xml":

spark-shell --packages com.databricks:spark-xml_2.10:0.4.1,com.databricks:spark-csv_2.10:1.5.0
val sqlContext =new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.read.format("com.databricks.spark.xml").option("rowTag","Account").load("../Account.xml")

The final output I got was structured:

df.show()
+----------+--------------------+--------------------+-------------+....
|   AcctNbr|         AddlParties|           Addresses|ApplicationIn|....
+----------+--------------------+--------------------+-------------+....
|AAAAAAAAAA|[[Securzxcdd cxcs...|[WrappedArray([D,...|            T|....
+----------+--------------------+--------------------+-------------+....

Please advise how to achieve this when the XML content is inside a DataFrame column. I am new to Spark and Scala.
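For reference, one approach that has been sketched for this situation (untested here, and assuming the spark-xml `XmlReader` API is available in the version on the classpath) is to pull the xmldata column out as an RDD[String] and hand it to XmlReader, which parses the strings the same way spark-xml parses files:

```scala
import com.databricks.spark.xml.XmlReader

// dataframe_mysql and sqlContext are the objects from the question above.
// Extract only the raw XML strings from the xmldata column.
val xmlRdd = dataframe_mysql.select("xmldata").rdd.map(_.getString(0))

// Parse each XML string into a structured row; "Account" is the same
// rowTag that was used when loading Account.xml from a file.
val parsedDf = new XmlReader()
  .withRowTag("Account")
  .xmlRdd(sqlContext, xmlRdd)

parsedDf.show()
```

Note that this drops the id and accountid columns; newer spark-xml releases (0.5.0+) also expose `from_xml`/`schema_of_xml` column functions, which would let the parsed struct sit alongside the original columns instead.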




© Databricks 2015. All rights reserved. Apache Spark and the Apache Spark Logo are trademarks of the Apache Software Foundation.
