I have a DataFrame that was imported from MySQL:
    dataframe_mysql.show()
    +----+---------+---------------------------------------------------+
    |  id|accountid|                                            xmldata|
    +----+---------+---------------------------------------------------+
    |1001|    12346|<AccountSetup xmlns:xsi="test"><Customers test="...|
    |1002|    12346|<AccountSetup xmlns:xsi="test"><Customers test="...|
    |1003|    12346|<AccountSetup xmlns:xsi="test"><Customers test="...|
    |1004|    12347|<AccountSetup xmlns:xsi="test"><Customers test="...|
    +----+---------+---------------------------------------------------+
The xmldata column contains XML, and I need to parse it into structured data in a separate DataFrame.
Previously I had the XML on its own in a text file and loaded it into a Spark DataFrame using "com.databricks.spark.xml":
    spark-shell --packages com.databricks:spark-xml_2.10:0.4.1,com.databricks:spark-csv_2.10:1.5.0

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val df = sqlContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "Account")
      .load("../Account.xml")
The final output was structured:
    df.show()
    +----------+--------------------+--------------------+-------------+....
    |AcctNbr   |AddlParties         |Addresses           |ApplicationIn|....
    +----------+--------------------+--------------------+-------------+....
    |AAAAAAAAAA|[[Securzxcdd cxcs...|[WrappedArray([D,...|            T|....
    +----------+--------------------+--------------------+-------------+....

Please advise how to achieve the same result when the XML content is inside a DataFrame column. I am new to Spark and Scala.
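One direction that might work, as a rough sketch rather than a confirmed solution: extract the xmldata column as an RDD of strings and hand it to spark-xml's XmlReader instead of a file path. This assumes that the spark-xml version in use exposes XmlReader.xmlRdd, that each xmldata value is a complete XML document, and that "Account" (carried over from the file-based load) is still the right row tag for this data.

```scala
// Sketch for spark-shell started with the spark-xml package, as in the command above.
// Assumes dataframe_mysql and sqlContext already exist in the session.
import com.databricks.spark.xml.XmlReader

// Pull the XML strings out of the xmldata column, skipping null values.
val xmlRdd = dataframe_mysql
  .select("xmldata")
  .rdd
  .filter(row => !row.isNullAt(0))
  .map(row => row.getString(0))

// Let spark-xml infer the schema from the strings.
// The row tag "Account" is an assumption taken from the file-based load.
val parsed = new XmlReader()
  .withRowTag("Account")
  .xmlRdd(sqlContext, xmlRdd)

parsed.printSchema()
parsed.show()
```

Note that this sketch keeps only the parsed XML fields. The id and accountid columns are dropped, so they would need to be carried along separately if the structured rows have to be joined back to the source table.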