sqlContext.sql hangs for very simple queries

bug
Question by liuz · May 12, 2015 at 11:20 PM ·

Using Spark 1.3.1, connecting via JDBC to a remote PostgreSQL server:

SPARK_CLASSPATH=postgresql-9.4-1201.jdbc4.jar bin/spark-shell

....

It took over 100 seconds to return 2 rows, each containing a single int:

scala> sqlContext.sql("SELECT person_id FROM xxx limit 2").collect
15/05/12 18:09:01 INFO ParseDriver: Parsing command: SELECT person_id FROM xxx limit 2
15/05/12 18:09:01 INFO ParseDriver: Parse Completed
15/05/12 18:09:01 INFO SparkContext: Starting job: runJob at SparkPlan.scala:122
15/05/12 18:09:01 INFO DAGScheduler: Got job 1 (runJob at SparkPlan.scala:122) with 1 output partitions (allowLocal=false)
15/05/12 18:09:01 INFO DAGScheduler: Final stage: Stage 1(runJob at SparkPlan.scala:122)
15/05/12 18:09:01 INFO DAGScheduler: Parents of final stage: List()
15/05/12 18:09:01 INFO DAGScheduler: Missing parents: List()
15/05/12 18:09:01 INFO DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[3] at map at SparkPlan.scala:97), which has no missing parents
15/05/12 18:09:01 INFO MemoryStore: ensureFreeSpace(4096) called with curMem=6688, maxMem=277842493
15/05/12 18:09:01 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.0 KB, free 265.0 MB)
15/05/12 18:09:01 INFO MemoryStore: ensureFreeSpace(2591) called with curMem=10784, maxMem=277842493
15/05/12 18:09:01 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.5 KB, free 265.0 MB)
15/05/12 18:09:01 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:50405 (size: 2.5 KB, free: 265.0 MB)
15/05/12 18:09:01 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/05/12 18:09:01 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:839
15/05/12 18:09:01 INFO DAGScheduler: Submitting 1 missing tasks from Stage 1 (MapPartitionsRDD[3] at map at SparkPlan.scala:97)
15/05/12 18:09:01 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/05/12 18:09:01 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1062 bytes)
15/05/12 18:09:01 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
15/05/12 18:09:24 INFO BlockManager: Removing broadcast 0
15/05/12 18:09:24 INFO BlockManager: Removing block broadcast_0
15/05/12 18:09:24 INFO MemoryStore: Block broadcast_0 of size 4096 dropped from memory (free 277833214)
15/05/12 18:09:24 INFO BlockManager: Removing block broadcast_0_piece0
15/05/12 18:09:24 INFO MemoryStore: Block broadcast_0_piece0 of size 2592 dropped from memory (free 277835806)
15/05/12 18:09:24 INFO BlockManagerInfo: Removed broadcast_0_piece0 on localhost:50405 in memory (size: 2.5 KB, free: 265.0 MB)
15/05/12 18:09:24 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/05/12 18:09:24 INFO ContextCleaner: Cleaned broadcast 0
15/05/12 18:10:43 INFO JDBCRDD: closed connection
15/05/12 18:10:43 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 874 bytes result sent to driver
15/05/12 18:10:43 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 101230 ms on localhost (1/1)
15/05/12 18:10:43 INFO DAGScheduler: Stage 1 (runJob at SparkPlan.scala:122) finished in 101.232 s
15/05/12 18:10:43 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/05/12 18:10:43 INFO DAGScheduler: Job 1 finished: runJob at SparkPlan.scala:122, took 101.244593 s
res2: Array[org.apache.spark.sql.Row] = Array([1719973], [1719976])
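Note that the task itself ran for roughly 101 seconds before `JDBCRDD: closed connection` appears, so the time is spent reading from Postgres, not in Spark's scheduling. One plausible factor: in Spark 1.3 the JDBC data source pushes column pruning and filters down to the database, but not `LIMIT`, so Spark may pull far more than 2 rows and apply the limit on its own side. A hedged sketch of working around that (the URL, table, and credentials below are placeholders, and this assumes the Spark 1.3 `sqlContext.load` JDBC API) is to push the limit into the `dbtable` subquery so Postgres does the limiting:

```scala
// Sketch only: URL and database name are placeholders, not from the original post.
// Wrapping the query in a subquery makes Postgres apply the LIMIT itself,
// instead of Spark fetching rows and limiting after the fact.
val df = sqlContext.load("jdbc", Map(
  "url"     -> "jdbc:postgresql://your-host:5432/your-db",
  "dbtable" -> "(SELECT person_id FROM xxx LIMIT 2) AS t"
))
df.collect()
```

If this variant returns quickly while the original query stays slow, the limit pushdown (or lack of it) is the likely culprit rather than the network.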


1 Answer


Answer by vida · Jun 18, 2015 at 10:30 PM

Where is your postgres server? Is it in the same AWS region as your Spark cluster? If you issue the same query against your SQL database using a plain JDBC driver, how long does that take? Knowing that will help determine whether the slowness comes from the connection between your SQL server and the Spark cluster, or from the way Spark queries your database.
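To run that comparison, here is a hedged sketch of timing the same query over plain JDBC from the Scala REPL (host, database, user, and password are placeholders; it assumes the same `postgresql-9.4-1201.jdbc4.jar` is on the classpath):

```scala
import java.sql.DriverManager

// Placeholders: substitute your real host, database, and credentials.
val conn = DriverManager.getConnection(
  "jdbc:postgresql://your-host:5432/your-db", "user", "password")
val start = System.nanoTime()
val rs = conn.createStatement()
  .executeQuery("SELECT person_id FROM xxx LIMIT 2")
while (rs.next()) println(rs.getInt("person_id"))
println(s"plain JDBC took ${(System.nanoTime() - start) / 1e6} ms")
conn.close()
```

If plain JDBC returns in milliseconds, the bottleneck is how Spark reads the table; if it also takes minutes, look at the network path between the cluster and the database.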



© Databricks 2015. All rights reserved. Apache Spark and the Apache Spark Logo are trademarks of the Apache Software Foundation.