Wednesday, October 12, 2016

Query a Cassandra database using the spark-sql shell from Apache Spark

[donghua@vmxdb01 ~]$ $SPARK_HOME/bin/spark-sql --packages datastax:spark-cassandra-connector:2.0.0-M2-s_2.11 --conf spark.cassandra.connection.host=127.0.0.1
Ivy Default Cache set to: /home/donghua/.ivy2/cache
The jars for the packages stored in: /home/donghua/.ivy2/jars
:: loading settings :: url = jar:file:/home/donghua/spark-2.0.1-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
datastax#spark-cassandra-connector added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
    found datastax#spark-cassandra-connector;2.0.0-M2-s_2.11 in spark-packages
    found commons-beanutils#commons-beanutils;1.8.0 in central
    found org.joda#joda-convert;1.2 in central
    found joda-time#joda-time;2.3 in central
    found io.netty#netty-all;4.0.33.Final in central
    found com.twitter#jsr166e;1.1.0 in central
    found org.scala-lang#scala-reflect;2.11.8 in central
:: resolution report :: resolve 869ms :: artifacts dl 15ms
    :: modules in use:
    com.twitter#jsr166e;1.1.0 from central in [default]
    commons-beanutils#commons-beanutils;1.8.0 from central in [default]
    datastax#spark-cassandra-connector;2.0.0-M2-s_2.11 from spark-packages in [default]
    io.netty#netty-all;4.0.33.Final from central in [default]
    joda-time#joda-time;2.3 from central in [default]
    org.joda#joda-convert;1.2 from central in [default]
    org.scala-lang#scala-reflect;2.11.8 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   7   |   0   |   0   |   0   ||   7   |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
    confs: [default]
    0 artifacts copied, 7 already retrieved (0kB/8ms)
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/10/12 08:15:01 INFO SparkContext: Running Spark version 2.0.1
16/10/12 08:15:02 INFO SecurityManager: Changing view acls to: donghua
16/10/12 08:15:02 INFO SecurityManager: Changing modify acls to: donghua
16/10/12 08:15:02 INFO SecurityManager: Changing view acls groups to:
16/10/12 08:15:02 INFO SecurityManager: Changing modify acls groups to:
16/10/12 08:15:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(donghua); groups with view permissions: Set(); users  with modify permissions: Set(donghua); groups with modify permissions: Set()
16/10/12 08:15:02 INFO Utils: Successfully started service 'sparkDriver' on port 60027.
16/10/12 08:15:02 INFO SparkEnv: Registering MapOutputTracker
16/10/12 08:15:02 INFO SparkEnv: Registering BlockManagerMaster
16/10/12 08:15:02 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-20513a2d-c965-416d-81ae-9668df59d304
16/10/12 08:15:02 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
16/10/12 08:15:02 INFO SparkEnv: Registering OutputCommitCoordinator
16/10/12 08:15:02 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/10/12 08:15:02 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.6.151:4040
16/10/12 08:15:03 INFO SparkContext: Added JAR file:/home/donghua/.ivy2/jars/datastax_spark-cassandra-connector-2.0.0-M2-s_2.11.jar at spark://192.168.6.151:60027/jars/datastax_spark-cassandra-connector-2.0.0-M2-s_2.11.jar with timestamp 1476274503002
16/10/12 08:15:03 INFO SparkContext: Added JAR file:/home/donghua/.ivy2/jars/commons-beanutils_commons-beanutils-1.8.0.jar at spark://192.168.6.151:60027/jars/commons-beanutils_commons-beanutils-1.8.0.jar with timestamp 1476274503003
16/10/12 08:15:03 INFO SparkContext: Added JAR file:/home/donghua/.ivy2/jars/org.joda_joda-convert-1.2.jar at spark://192.168.6.151:60027/jars/org.joda_joda-convert-1.2.jar with timestamp 1476274503003
16/10/12 08:15:03 INFO SparkContext: Added JAR file:/home/donghua/.ivy2/jars/joda-time_joda-time-2.3.jar at spark://192.168.6.151:60027/jars/joda-time_joda-time-2.3.jar with timestamp 1476274503003
16/10/12 08:15:03 INFO SparkContext: Added JAR file:/home/donghua/.ivy2/jars/io.netty_netty-all-4.0.33.Final.jar at spark://192.168.6.151:60027/jars/io.netty_netty-all-4.0.33.Final.jar with timestamp 1476274503003
16/10/12 08:15:03 INFO SparkContext: Added JAR file:/home/donghua/.ivy2/jars/com.twitter_jsr166e-1.1.0.jar at spark://192.168.6.151:60027/jars/com.twitter_jsr166e-1.1.0.jar with timestamp 1476274503003
16/10/12 08:15:03 INFO SparkContext: Added JAR file:/home/donghua/.ivy2/jars/org.scala-lang_scala-reflect-2.11.8.jar at spark://192.168.6.151:60027/jars/org.scala-lang_scala-reflect-2.11.8.jar with timestamp 1476274503003
16/10/12 08:15:03 INFO Executor: Starting executor ID driver on host localhost
16/10/12 08:15:03 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44882.
16/10/12 08:15:03 INFO NettyBlockTransferService: Server created on 192.168.6.151:44882
16/10/12 08:15:03 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.6.151, 44882)
16/10/12 08:15:03 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.6.151:44882 with 366.3 MB RAM, BlockManagerId(driver, 192.168.6.151, 44882)
16/10/12 08:15:03 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.6.151, 44882)
16/10/12 08:15:03 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
16/10/12 08:15:03 INFO HiveSharedState: Warehouse path is '/home/donghua/spark-warehouse'.
16/10/12 08:15:03 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
16/10/12 08:15:04 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/10/12 08:15:04 INFO ObjectStore: ObjectStore, initialize called
16/10/12 08:15:04 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/10/12 08:15:04 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/10/12 08:15:05 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/10/12 08:15:06 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/10/12 08:15:06 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/10/12 08:15:06 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/10/12 08:15:06 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/10/12 08:15:07 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
16/10/12 08:15:07 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
16/10/12 08:15:07 INFO ObjectStore: Initialized ObjectStore
16/10/12 08:15:07 INFO HiveMetaStore: Added admin role in metastore
16/10/12 08:15:07 INFO HiveMetaStore: Added public role in metastore
16/10/12 08:15:07 INFO HiveMetaStore: No user is added in admin role, since config is empty
16/10/12 08:15:07 INFO HiveMetaStore: 0: get_all_databases
16/10/12 08:15:07 INFO audit: ugi=donghua    ip=unknown-ip-addr    cmd=get_all_databases   
16/10/12 08:15:07 INFO HiveMetaStore: 0: get_functions: db=default pat=*
16/10/12 08:15:07 INFO audit: ugi=donghua    ip=unknown-ip-addr    cmd=get_functions: db=default pat=*   
16/10/12 08:15:07 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
16/10/12 08:15:07 INFO SessionState: Created local directory: /tmp/c37fe7b1-45a3-4ce1-b115-ea341dfe013b_resources
16/10/12 08:15:07 INFO SessionState: Created HDFS directory: /tmp/hive/donghua/c37fe7b1-45a3-4ce1-b115-ea341dfe013b
16/10/12 08:15:07 INFO SessionState: Created local directory: /tmp/donghua/c37fe7b1-45a3-4ce1-b115-ea341dfe013b
16/10/12 08:15:07 INFO SessionState: Created HDFS directory: /tmp/hive/donghua/c37fe7b1-45a3-4ce1-b115-ea341dfe013b/_tmp_space.db
16/10/12 08:15:07 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /home/donghua/spark-warehouse
16/10/12 08:15:07 INFO SessionState: Created local directory: /tmp/228512c3-917e-4070-b28d-3de3f688b746_resources
16/10/12 08:15:08 INFO SessionState: Created HDFS directory: /tmp/hive/donghua/228512c3-917e-4070-b28d-3de3f688b746
16/10/12 08:15:08 INFO SessionState: Created local directory: /tmp/donghua/228512c3-917e-4070-b28d-3de3f688b746
16/10/12 08:15:08 INFO SessionState: Created HDFS directory: /tmp/hive/donghua/228512c3-917e-4070-b28d-3de3f688b746/_tmp_space.db
16/10/12 08:15:08 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /home/donghua/spark-warehouse

spark-sql> select * from mykeyspace.kv;
16/10/12 08:15:12 INFO SparkSqlParser: Parsing command: select * from mykeyspace.kv
16/10/12 08:15:13 INFO HiveMetaStore: 0: create_database: Database(name:default, description:default database, locationUri:file:/home/donghua/spark-warehouse, parameters:{})
16/10/12 08:15:13 INFO audit: ugi=donghua    ip=unknown-ip-addr    cmd=create_database: Database(name:default, description:default database, locationUri:file:/home/donghua/spark-warehouse, parameters:{})   
16/10/12 08:15:13 INFO HiveMetaStore: 0: get_database: mykeyspace
16/10/12 08:15:13 INFO audit: ugi=donghua    ip=unknown-ip-addr    cmd=get_database: mykeyspace   
16/10/12 08:15:13 WARN ObjectStore: Failed to get database mykeyspace, returning NoSuchObjectException
Error in query: Table or view not found: `mykeyspace`.`kv`; line 1 pos 14

spark-sql> create TEMPORARY TABLE kv USING org.apache.spark.sql.cassandra OPTIONS (table "kv",keyspace "mykeyspace", cluster "Test Cluster",pushdown "true");
16/10/12 08:15:19 INFO SparkSqlParser: Parsing command: create TEMPORARY TABLE kv USING org.apache.spark.sql.cassandra OPTIONS (table "kv",keyspace "mykeyspace", cluster "Test Cluster",pushdown "true")
16/10/12 08:15:20 WARN SparkStrategies$DDLStrategy: CREATE TEMPORARY TABLE kv USING... is deprecated, please use CREATE TEMPORARY VIEW viewName USING... instead
16/10/12 08:15:21 INFO NettyUtil: Found Netty's native epoll transport in the classpath, using it
16/10/12 08:15:22 INFO Cluster: New Cassandra host /127.0.0.1:9042 added
16/10/12 08:15:22 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
16/10/12 08:15:22 INFO SparkContext: Starting job: processCmd at CliDriver.java:376
16/10/12 08:15:22 INFO DAGScheduler: Got job 0 (processCmd at CliDriver.java:376) with 1 output partitions
16/10/12 08:15:22 INFO DAGScheduler: Final stage: ResultStage 0 (processCmd at CliDriver.java:376)
16/10/12 08:15:22 INFO DAGScheduler: Parents of final stage: List()
16/10/12 08:15:22 INFO DAGScheduler: Missing parents: List()
16/10/12 08:15:22 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at processCmd at CliDriver.java:376), which has no missing parents
16/10/12 08:15:22 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.2 KB, free 366.3 MB)
16/10/12 08:15:23 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1964.0 B, free 366.3 MB)
16/10/12 08:15:23 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.6.151:44882 (size: 1964.0 B, free: 366.3 MB)
16/10/12 08:15:23 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
16/10/12 08:15:23 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at processCmd at CliDriver.java:376)
16/10/12 08:15:23 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
16/10/12 08:15:23 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0, PROCESS_LOCAL, 6128 bytes)
16/10/12 08:15:23 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/10/12 08:15:23 INFO Executor: Fetching spark://192.168.6.151:60027/jars/com.twitter_jsr166e-1.1.0.jar with timestamp 1476274503003
16/10/12 08:15:23 INFO TransportClientFactory: Successfully created connection to /192.168.6.151:60027 after 62 ms (0 ms spent in bootstraps)
16/10/12 08:15:23 INFO Utils: Fetching spark://192.168.6.151:60027/jars/com.twitter_jsr166e-1.1.0.jar to /tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/fetchFileTemp1249905734998620423.tmp
16/10/12 08:15:23 INFO Executor: Adding file:/tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/com.twitter_jsr166e-1.1.0.jar to class loader
16/10/12 08:15:23 INFO Executor: Fetching spark://192.168.6.151:60027/jars/datastax_spark-cassandra-connector-2.0.0-M2-s_2.11.jar with timestamp 1476274503002
16/10/12 08:15:23 INFO Utils: Fetching spark://192.168.6.151:60027/jars/datastax_spark-cassandra-connector-2.0.0-M2-s_2.11.jar to /tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/fetchFileTemp1156901688704158624.tmp
16/10/12 08:15:24 INFO Executor: Adding file:/tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/datastax_spark-cassandra-connector-2.0.0-M2-s_2.11.jar to class loader
16/10/12 08:15:24 INFO Executor: Fetching spark://192.168.6.151:60027/jars/org.joda_joda-convert-1.2.jar with timestamp 1476274503003
16/10/12 08:15:24 INFO Utils: Fetching spark://192.168.6.151:60027/jars/org.joda_joda-convert-1.2.jar to /tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/fetchFileTemp3155581321752809377.tmp
16/10/12 08:15:24 INFO Executor: Adding file:/tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/org.joda_joda-convert-1.2.jar to class loader
16/10/12 08:15:24 INFO Executor: Fetching spark://192.168.6.151:60027/jars/commons-beanutils_commons-beanutils-1.8.0.jar with timestamp 1476274503003
16/10/12 08:15:24 INFO Utils: Fetching spark://192.168.6.151:60027/jars/commons-beanutils_commons-beanutils-1.8.0.jar to /tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/fetchFileTemp3073542072392987481.tmp
16/10/12 08:15:24 INFO Executor: Adding file:/tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/commons-beanutils_commons-beanutils-1.8.0.jar to class loader
16/10/12 08:15:24 INFO Executor: Fetching spark://192.168.6.151:60027/jars/org.scala-lang_scala-reflect-2.11.8.jar with timestamp 1476274503003
16/10/12 08:15:24 INFO Utils: Fetching spark://192.168.6.151:60027/jars/org.scala-lang_scala-reflect-2.11.8.jar to /tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/fetchFileTemp8208827726672004865.tmp
16/10/12 08:15:24 INFO Executor: Adding file:/tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/org.scala-lang_scala-reflect-2.11.8.jar to class loader
16/10/12 08:15:24 INFO Executor: Fetching spark://192.168.6.151:60027/jars/joda-time_joda-time-2.3.jar with timestamp 1476274503003
16/10/12 08:15:24 INFO Utils: Fetching spark://192.168.6.151:60027/jars/joda-time_joda-time-2.3.jar to /tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/fetchFileTemp3706194028917481017.tmp
16/10/12 08:15:24 INFO Executor: Adding file:/tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/joda-time_joda-time-2.3.jar to class loader
16/10/12 08:15:24 INFO Executor: Fetching spark://192.168.6.151:60027/jars/io.netty_netty-all-4.0.33.Final.jar with timestamp 1476274503003
16/10/12 08:15:24 INFO Utils: Fetching spark://192.168.6.151:60027/jars/io.netty_netty-all-4.0.33.Final.jar to /tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/fetchFileTemp5973767129005243936.tmp
16/10/12 08:15:24 INFO Executor: Adding file:/tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/io.netty_netty-all-4.0.33.Final.jar to class loader
16/10/12 08:15:24 INFO CodeGenerator: Code generated in 270.608582 ms
16/10/12 08:15:24 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1064 bytes result sent to driver
16/10/12 08:15:24 INFO DAGScheduler: ResultStage 0 (processCmd at CliDriver.java:376) finished in 1.701 s
16/10/12 08:15:24 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1664 ms on localhost (1/1)
16/10/12 08:15:24 INFO DAGScheduler: Job 0 finished: processCmd at CliDriver.java:376, took 2.311670 s
16/10/12 08:15:24 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
Time taken: 5.907 seconds
16/10/12 08:15:24 INFO CliDriver: Time taken: 5.907 seconds
16/10/12 08:15:29 INFO CassandraConnector: Disconnected from Cassandra cluster: Test Cluster
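
As the WARN line above points out, the `CREATE TEMPORARY TABLE ... USING` form is deprecated in Spark 2.0. A sketch of the same registration using the recommended syntax, with the identical connector options:

```sql
-- Recommended replacement for the deprecated CREATE TEMPORARY TABLE form
CREATE TEMPORARY VIEW kv
USING org.apache.spark.sql.cassandra
OPTIONS (
  table "kv",
  keyspace "mykeyspace",
  cluster "Test Cluster",
  pushdown "true"
);
```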

spark-sql> select * from kv;
16/10/12 08:15:43 INFO SparkSqlParser: Parsing command: select * from kv
16/10/12 08:15:43 INFO CassandraSourceRelation: Input Predicates: []
16/10/12 08:15:43 INFO CassandraSourceRelation: Input Predicates: []
16/10/12 08:15:43 INFO CodeGenerator: Code generated in 57.196131 ms
16/10/12 08:15:43 INFO Cluster: New Cassandra host /127.0.0.1:9042 added
16/10/12 08:15:43 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
16/10/12 08:15:44 INFO SparkContext: Starting job: processCmd at CliDriver.java:376
16/10/12 08:15:44 INFO DAGScheduler: Got job 1 (processCmd at CliDriver.java:376) with 6 output partitions
16/10/12 08:15:44 INFO DAGScheduler: Final stage: ResultStage 1 (processCmd at CliDriver.java:376)
16/10/12 08:15:44 INFO DAGScheduler: Parents of final stage: List()
16/10/12 08:15:44 INFO DAGScheduler: Missing parents: List()
16/10/12 08:15:44 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[7] at processCmd at CliDriver.java:376), which has no missing parents
16/10/12 08:15:44 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 11.6 KB, free 366.3 MB)
16/10/12 08:15:44 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 6.0 KB, free 366.3 MB)
16/10/12 08:15:44 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.6.151:44882 (size: 6.0 KB, free: 366.3 MB)
16/10/12 08:15:44 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1012
16/10/12 08:15:44 INFO DAGScheduler: Submitting 6 missing tasks from ResultStage 1 (MapPartitionsRDD[7] at processCmd at CliDriver.java:376)
16/10/12 08:15:44 INFO TaskSchedulerImpl: Adding task set 1.0 with 6 tasks
16/10/12 08:15:44 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, partition 0, NODE_LOCAL, 13410 bytes)
16/10/12 08:15:44 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, localhost, partition 1, NODE_LOCAL, 14240 bytes)
16/10/12 08:15:44 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
16/10/12 08:15:44 INFO Executor: Running task 1.0 in stage 1.0 (TID 2)
16/10/12 08:15:45 INFO Executor: Finished task 1.0 in stage 1.0 (TID 2). 1071 bytes result sent to driver
16/10/12 08:15:45 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 3, localhost, partition 2, NODE_LOCAL, 13887 bytes)
16/10/12 08:15:45 INFO Executor: Running task 2.0 in stage 1.0 (TID 3)
16/10/12 08:15:45 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 2) in 610 ms on localhost (1/6)
16/10/12 08:15:45 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1071 bytes result sent to driver
16/10/12 08:15:45 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 4, localhost, partition 3, NODE_LOCAL, 13055 bytes)
16/10/12 08:15:45 INFO Executor: Running task 3.0 in stage 1.0 (TID 4)
16/10/12 08:15:45 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 1373 ms on localhost (2/6)
16/10/12 08:15:45 INFO Executor: Finished task 2.0 in stage 1.0 (TID 3). 1333 bytes result sent to driver
16/10/12 08:15:46 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.6.151:44882 in memory (size: 1964.0 B, free: 366.3 MB)
16/10/12 08:15:46 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 5, localhost, partition 4, NODE_LOCAL, 13170 bytes)
16/10/12 08:15:46 INFO Executor: Running task 4.0 in stage 1.0 (TID 5)
16/10/12 08:15:46 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 3) in 846 ms on localhost (3/6)
16/10/12 08:15:46 INFO Executor: Finished task 3.0 in stage 1.0 (TID 4). 1362 bytes result sent to driver
16/10/12 08:15:46 INFO TaskSetManager: Starting task 5.0 in stage 1.0 (TID 6, localhost, partition 5, NODE_LOCAL, 8066 bytes)
16/10/12 08:15:46 INFO Executor: Running task 5.0 in stage 1.0 (TID 6)
16/10/12 08:15:46 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 4) in 658 ms on localhost (4/6)
16/10/12 08:15:46 INFO Executor: Finished task 4.0 in stage 1.0 (TID 5). 1071 bytes result sent to driver
16/10/12 08:15:46 INFO TaskSetManager: Finished task 4.0 in stage 1.0 (TID 5) in 476 ms on localhost (5/6)
16/10/12 08:15:46 INFO Executor: Finished task 5.0 in stage 1.0 (TID 6). 1071 bytes result sent to driver
16/10/12 08:15:46 INFO TaskSetManager: Finished task 5.0 in stage 1.0 (TID 6) in 174 ms on localhost (6/6)
16/10/12 08:15:46 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
16/10/12 08:15:46 INFO DAGScheduler: ResultStage 1 (processCmd at CliDriver.java:376) finished in 1.949 s
16/10/12 08:15:46 INFO DAGScheduler: Job 1 finished: processCmd at CliDriver.java:376, took 1.994766 s
key1    1
key4    4
key3    3
key2    2

Time taken: 3.044 seconds, Fetched 4 row(s)
16/10/12 08:15:46 INFO CliDriver: Time taken: 3.044 seconds, Fetched 4 row(s)
16/10/12 08:15:53 INFO CassandraConnector: Disconnected from Cassandra cluster: Test Cluster

spark-sql> desc kv;
16/10/12 08:16:24 INFO SparkSqlParser: Parsing command: desc kv

16/10/12 08:16:24 INFO SparkContext: Starting job: processCmd at CliDriver.java:376
16/10/12 08:16:24 INFO DAGScheduler: Got job 2 (processCmd at CliDriver.java:376) with 1 output partitions
16/10/12 08:16:24 INFO DAGScheduler: Final stage: ResultStage 2 (processCmd at CliDriver.java:376)
16/10/12 08:16:24 INFO DAGScheduler: Parents of final stage: List()
16/10/12 08:16:24 INFO DAGScheduler: Missing parents: List()
16/10/12 08:16:24 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[10] at processCmd at CliDriver.java:376), which has no missing parents
16/10/12 08:16:24 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 4.2 KB, free 366.3 MB)
16/10/12 08:16:24 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.5 KB, free 366.3 MB)
16/10/12 08:16:24 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.6.151:44882 (size: 2.5 KB, free: 366.3 MB)
16/10/12 08:16:24 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1012
16/10/12 08:16:24 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[10] at processCmd at CliDriver.java:376)
16/10/12 08:16:24 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
16/10/12 08:16:25 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 7, localhost, partition 0, PROCESS_LOCAL, 6167 bytes)
16/10/12 08:16:25 INFO Executor: Running task 0.0 in stage 2.0 (TID 7)
16/10/12 08:16:25 INFO CodeGenerator: Code generated in 16.186786 ms
16/10/12 08:16:25 INFO Executor: Finished task 0.0 in stage 2.0 (TID 7). 1051 bytes result sent to driver
16/10/12 08:16:25 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 7) in 69 ms on localhost (1/1)
16/10/12 08:16:25 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
16/10/12 08:16:25 INFO DAGScheduler: ResultStage 2 (processCmd at CliDriver.java:376) finished in 0.070 s
16/10/12 08:16:25 INFO DAGScheduler: Job 2 finished: processCmd at CliDriver.java:376, took 0.083123 s
key    string    NULL
value    int    NULL
Time taken: 0.128 seconds, Fetched 2 row(s)
16/10/12 08:16:25 INFO CliDriver: Time taken: 0.128 seconds, Fetched 2 row(s)
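
The `desc` output mirrors the underlying Cassandra schema. Based on the two columns shown (`key string`, `value int`), the source table was presumably defined in cqlsh along these lines (an assumption, not taken from this session):

```sql
-- Assumed original table definition in cqlsh
CREATE TABLE mykeyspace.kv (
  key   text PRIMARY KEY,
  value int
);
```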
spark-sql> select * from kv where value > 2;
16/10/12 08:16:34 INFO SparkSqlParser: Parsing command: select * from kv where value > 2
16/10/12 08:16:34 INFO CassandraSourceRelation: Input Predicates: [IsNotNull(value), GreaterThan(value,2)]
16/10/12 08:16:34 INFO CassandraSourceRelation: Input Predicates: [IsNotNull(value), GreaterThan(value,2)]
16/10/12 08:16:34 INFO CodeGenerator: Code generated in 14.815303 ms
16/10/12 08:16:34 INFO Cluster: New Cassandra host localhost/127.0.0.1:9042 added
16/10/12 08:16:34 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
16/10/12 08:16:35 INFO SparkContext: Starting job: processCmd at CliDriver.java:376
16/10/12 08:16:35 INFO DAGScheduler: Got job 3 (processCmd at CliDriver.java:376) with 6 output partitions
16/10/12 08:16:35 INFO DAGScheduler: Final stage: ResultStage 3 (processCmd at CliDriver.java:376)
16/10/12 08:16:35 INFO DAGScheduler: Parents of final stage: List()
16/10/12 08:16:35 INFO DAGScheduler: Missing parents: List()
16/10/12 08:16:35 INFO DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[14] at processCmd at CliDriver.java:376), which has no missing parents
16/10/12 08:16:35 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 12.4 KB, free 366.3 MB)
16/10/12 08:16:35 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 6.2 KB, free 366.3 MB)
16/10/12 08:16:35 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.6.151:44882 (size: 6.2 KB, free: 366.3 MB)
16/10/12 08:16:35 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1012
16/10/12 08:16:35 INFO DAGScheduler: Submitting 6 missing tasks from ResultStage 3 (MapPartitionsRDD[14] at processCmd at CliDriver.java:376)
16/10/12 08:16:35 INFO TaskSchedulerImpl: Adding task set 3.0 with 6 tasks
16/10/12 08:16:35 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 8, localhost, partition 0, NODE_LOCAL, 13426 bytes)
16/10/12 08:16:35 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 9, localhost, partition 1, NODE_LOCAL, 14256 bytes)
16/10/12 08:16:35 INFO Executor: Running task 0.0 in stage 3.0 (TID 8)
16/10/12 08:16:35 INFO Executor: Running task 1.0 in stage 3.0 (TID 9)
16/10/12 08:16:35 INFO Executor: Finished task 0.0 in stage 3.0 (TID 8). 1133 bytes result sent to driver
16/10/12 08:16:35 INFO TaskSetManager: Starting task 2.0 in stage 3.0 (TID 10, localhost, partition 2, NODE_LOCAL, 13903 bytes)
16/10/12 08:16:35 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 8) in 481 ms on localhost (1/6)
16/10/12 08:16:35 INFO Executor: Running task 2.0 in stage 3.0 (TID 10)
16/10/12 08:16:35 INFO Executor: Finished task 1.0 in stage 3.0 (TID 9). 1133 bytes result sent to driver
16/10/12 08:16:35 INFO TaskSetManager: Starting task 3.0 in stage 3.0 (TID 11, localhost, partition 3, NODE_LOCAL, 13071 bytes)
16/10/12 08:16:35 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 9) in 545 ms on localhost (2/6)
16/10/12 08:16:35 INFO Executor: Running task 3.0 in stage 3.0 (TID 11)
16/10/12 08:16:36 INFO Executor: Finished task 3.0 in stage 3.0 (TID 11). 1336 bytes result sent to driver
16/10/12 08:16:36 INFO TaskSetManager: Starting task 4.0 in stage 3.0 (TID 12, localhost, partition 4, NODE_LOCAL, 13186 bytes)
16/10/12 08:16:36 INFO TaskSetManager: Finished task 3.0 in stage 3.0 (TID 11) in 535 ms on localhost (3/6)
16/10/12 08:16:36 INFO Executor: Running task 4.0 in stage 3.0 (TID 12)
16/10/12 08:16:36 INFO Executor: Finished task 2.0 in stage 3.0 (TID 10). 1293 bytes result sent to driver
16/10/12 08:16:36 INFO TaskSetManager: Starting task 5.0 in stage 3.0 (TID 13, localhost, partition 5, NODE_LOCAL, 8082 bytes)
16/10/12 08:16:36 INFO Executor: Running task 5.0 in stage 3.0 (TID 13)
16/10/12 08:16:36 INFO TaskSetManager: Finished task 2.0 in stage 3.0 (TID 10) in 626 ms on localhost (4/6)
16/10/12 08:16:36 INFO Executor: Finished task 5.0 in stage 3.0 (TID 13). 1133 bytes result sent to driver
16/10/12 08:16:36 INFO TaskSetManager: Finished task 5.0 in stage 3.0 (TID 13) in 103 ms on localhost (5/6)
16/10/12 08:16:36 INFO Executor: Finished task 4.0 in stage 3.0 (TID 12). 1133 bytes result sent to driver
16/10/12 08:16:36 INFO TaskSetManager: Finished task 4.0 in stage 3.0 (TID 12) in 327 ms on localhost (6/6)
16/10/12 08:16:36 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
16/10/12 08:16:36 INFO DAGScheduler: ResultStage 3 (processCmd at CliDriver.java:376) finished in 1.406 s
16/10/12 08:16:36 INFO DAGScheduler: Job 3 finished: processCmd at CliDriver.java:376, took 1.449255 s
key4    4
key3    3

Time taken: 1.958 seconds, Fetched 2 row(s)
16/10/12 08:16:36 INFO CliDriver: Time taken: 1.958 seconds, Fetched 2 row(s)
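
Note the `Input Predicates: [IsNotNull(value), GreaterThan(value,2)]` lines above: because the table was registered with `pushdown "true"`, the connector hands the `value > 2` filter to Cassandra rather than pulling all rows back and filtering in Spark. A rough cqlsh equivalent of the server-side filter is sketched below; the connector's actual per-partition queries also carry token-range bounds, and a direct cqlsh query on a non-indexed column needs `ALLOW FILTERING`:

```sql
-- Approximate server-side form of the pushed-down predicate (illustrative)
SELECT key, value
FROM mykeyspace.kv
WHERE value > 2
ALLOW FILTERING;
```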
spark-sql> exit;
16/10/12 08:16:41 INFO SparkUI: Stopped Spark web UI at http://192.168.6.151:4040
16/10/12 08:16:41 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/10/12 08:16:41 INFO MemoryStore: MemoryStore cleared
16/10/12 08:16:41 INFO BlockManager: BlockManager stopped
16/10/12 08:16:41 INFO BlockManagerMaster: BlockManagerMaster stopped
16/10/12 08:16:41 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/10/12 08:16:41 INFO SparkContext: Successfully stopped SparkContext
16/10/12 08:16:41 INFO ShutdownHookManager: Shutdown hook called
16/10/12 08:16:41 INFO ShutdownHookManager: Deleting directory /tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139
16/10/12 08:16:41 INFO ShutdownHookManager: Deleting directory /tmp/spark-be3c499a-b678-43d1-8a0d-cdb7236428cd
16/10/12 08:16:43 INFO CassandraConnector: Disconnected from Cassandra cluster: Test Cluster
16/10/12 08:16:43 INFO SerialShutdownHooks: Successfully executed shutdown hook: Clearing session cache for C* connector



================== Turn off verbose logging to make the output cleaner ==================

[donghua@vmxdb01 ~]$ diff spark-2.0.1-bin-hadoop2.7/conf/log4j.properties.template spark-2.0.1-bin-hadoop2.7/conf/log4j.properties
19c19
< log4j.rootCategory=INFO, console
---
> log4j.rootCategory=WARN, console
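
The one-line change in the diff above can be scripted: copy the shipped template and lower the root logger from INFO to WARN. The sketch below demonstrates the substitution on a temporary copy; in practice, point `CONF_DIR` at `$SPARK_HOME/conf` and copy from the real `log4j.properties.template`:

```shell
# Sketch: derive log4j.properties from the template and switch the
# root logger from INFO to WARN, as in the diff above.
# Demonstrated on a temporary copy so it is safe to run anywhere.
CONF_DIR="$(mktemp -d)"
printf 'log4j.rootCategory=INFO, console\n' > "$CONF_DIR/log4j.properties.template"
cp "$CONF_DIR/log4j.properties.template" "$CONF_DIR/log4j.properties"
sed -i 's/^log4j.rootCategory=INFO, console/log4j.rootCategory=WARN, console/' \
    "$CONF_DIR/log4j.properties"
grep rootCategory "$CONF_DIR/log4j.properties"  # prints: log4j.rootCategory=WARN, console
```

After restarting spark-sql with the new `log4j.properties` in place, only WARN-and-above messages reach the console, which is what produces the clean session shown next.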


spark-sql> select * from kv where value > 2;
Error in query: Table or view not found: kv; line 1 pos 14
spark-sql> create TEMPORARY TABLE kv USING org.apache.spark.sql.cassandra OPTIONS (table "kv",keyspace "mykeyspace", cluster "Test Cluster",pushdown "true");
16/10/12 08:28:09 WARN SparkStrategies$DDLStrategy: CREATE TEMPORARY TABLE kv USING... is deprecated, please use CREATE TEMPORARY VIEW viewName USING... instead
Time taken: 4.008 seconds
spark-sql> select * from kv;
key1    1                                                                      
key4    4
key3    3
key2    2
Time taken: 2.253 seconds, Fetched 4 row(s)
spark-sql> select substring(key,1,3) from kv;
key                                                                            
key
key
key
Time taken: 1.328 seconds, Fetched 4 row(s)
spark-sql> select substring(key,1,3),count(*) from kv group by substring(key,1,3);
key     4                                                                      
Time taken: 3.518 seconds, Fetched 1 row(s)
spark-sql>
