Monday, October 17, 2016

Example of using datetimeformat with the cqlsh COPY command to transfer data into and out of Cassandra

donghua@cqlsh:mykeyspace> create table t3 (id int, d timestamp, primary key(id));
donghua@cqlsh:mykeyspace> desc t3;

CREATE TABLE mykeyspace.t3 (
    id int PRIMARY KEY,
    d timestamp
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
donghua@cqlsh:mykeyspace> insert into t3 (id,d) values (1,totimestamp(now()));
donghua@cqlsh:mykeyspace> insert into t3 (id,d) values (2,'2016-10-17T10:00:00');
donghua@cqlsh:mykeyspace> insert into t3 (id,d) values (3,'2016-10-17T10:00:00+0800');
donghua@cqlsh:mykeyspace> insert into t3 (id,d) values (4,'2016-10-17T10:00:00+0000');
donghua@cqlsh:mykeyspace> select * from t3;

 id | d
----+---------------------------------
  1 | 2016-10-17 10:20:35.416000+0000
  2 | 2016-10-17 02:00:00.000000+0000
  4 | 2016-10-17 10:00:00.000000+0000
  3 | 2016-10-17 02:00:00.000000+0000

(4 rows)
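The output above shows how cqlsh interprets timestamp literals: a literal without a zone offset (row 2) is taken in the session's local zone (here UTC+8), while rows 3 and 4 carry explicit offsets, so rows 2 and 3 end up as the same UTC instant. A minimal stdlib sketch of the same conversion (this illustrates the timezone arithmetic only, not cqlsh's actual code path):

```python
from datetime import datetime, timedelta, timezone

# A literal with an explicit +0800 offset, as in row 3.
local = datetime(2016, 10, 17, 10, 0, 0, tzinfo=timezone(timedelta(hours=8)))

# Cassandra stores timestamps as UTC milliseconds; converting shows
# why '2016-10-17T10:00:00+0800' is displayed as 02:00:00+0000.
utc = local.astimezone(timezone.utc)
print(utc.isoformat())  # 2016-10-17T02:00:00+00:00
```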

donghua@cqlsh:mykeyspace> copy t3 TO '/tmp/t3-1.csv' with header=true;
Reading options from the command line: {'header': 'true'}
Using 1 child processes

Starting copy of mykeyspace.t3 with columns [id, d].
Processed: 4 rows; Rate:       5 rows/s; Avg. rate:       4 rows/s
4 rows exported to 1 files in 0.945 seconds.


donghua@cqlsh:mykeyspace> copy t3 TO '/tmp/t3-2.csv' with header=true and datetimeformat='%m/%d/%Y';
Reading options from the command line: {'datetimeformat': '%m/%d/%Y', 'header': 'true'}
Using 1 child processes

Starting copy of mykeyspace.t3 with columns [id, d].
Processed: 4 rows; Rate:       5 rows/s; Avg. rate:       4 rows/s
4 rows exported to 1 files in 1.070 seconds.


[donghua@localhost ~]$ cat /tmp/t3-1.csv
id,d
2,2016-10-17 02:00:00.000+0000
1,2016-10-17 10:20:35.416+0000
3,2016-10-17 02:00:00.000+0000
4,2016-10-17 10:00:00.000+0000
[donghua@localhost ~]$ cat /tmp/t3-2.csv
id,d
2,10/17/2016
1,10/17/2016
3,10/17/2016
4,10/17/2016
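The datetimeformat option takes Python strftime directives (cqlsh is itself a Python tool). A rough stdlib sketch of what the two exports above do to a timestamp value (an approximation for illustration, not the actual cqlsh formatting code):

```python
from datetime import datetime, timezone

# The timestamp stored for id=1.
d = datetime(2016, 10, 17, 10, 20, 35, 416000, tzinfo=timezone.utc)

# The default COPY TO output keeps milliseconds and the UTC offset...
print(d.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3] + '+0000')  # 2016-10-17 10:20:35.416+0000

# ...while datetimeformat='%m/%d/%Y' keeps only the date.
print(d.strftime('%m/%d/%Y'))  # 10/17/2016
```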


donghua@cqlsh:mykeyspace> truncate table t3;
donghua@cqlsh:mykeyspace> select * from t3;

 id | d
----+---

(0 rows)
donghua@cqlsh:mykeyspace> copy t3 from '/tmp/t3-1.csv' with header=true;
Reading options from the command line: {'header': 'true'}
Using 1 child processes

Starting copy of mykeyspace.t3 with columns [id, d].
Processed: 4 rows; Rate:       4 rows/s; Avg. rate:       7 rows/s
4 rows imported from 1 files in 0.548 seconds (0 skipped).
donghua@cqlsh:mykeyspace> select * from t3;

 id | d
----+---------------------------------
  1 | 2016-10-17 10:20:35.416000+0000
  2 | 2016-10-17 02:00:00.000000+0000
  4 | 2016-10-17 10:00:00.000000+0000
  3 | 2016-10-17 02:00:00.000000+0000

(4 rows)

donghua@cqlsh:mykeyspace> truncate table t3;
donghua@cqlsh:mykeyspace> copy t3 from '/tmp/t3-2.csv' with header=true and datetimeformat='%m/%d/%Y';
Reading options from the command line: {'datetimeformat': '%m/%d/%Y', 'header': 'true'}
Using 1 child processes

Starting copy of mykeyspace.t3 with columns [id, d].
Processed: 4 rows; Rate:       5 rows/s; Avg. rate:       7 rows/s
4 rows imported from 1 files in 0.544 seconds (0 skipped).
donghua@cqlsh:mykeyspace> select * from t3;

 id | d
----+---------------------------------
  1 | 2016-10-17 00:00:00.000000+0000
  2 | 2016-10-17 00:00:00.000000+0000
  4 | 2016-10-17 00:00:00.000000+0000
  3 | 2016-10-17 00:00:00.000000+0000

(4 rows)
donghua@cqlsh:mykeyspace>
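The second import shows the trade-off: because '%m/%d/%Y' carries no time-of-day, every re-imported row lands on midnight. The same behavior reproduced with stdlib strptime:

```python
from datetime import datetime

# Parsing a date-only string leaves hours/minutes/seconds at zero,
# which is why all four rows come back as 00:00:00.000000+0000.
d = datetime.strptime('10/17/2016', '%m/%d/%Y')
print(d)  # 2016-10-17 00:00:00
```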



Thursday, October 13, 2016

CassandraSQLContext not found after spark-cassandra-connector upgrade to v2.0

scala> val cass=new CassandraSQLContext(sc)
<console>:33: error: not found: type CassandraSQLContext
       val cass=new CassandraSQLContext(sc)
                    ^

scala> import org.apache.spark.sql.cassandra.CassandraSQLContext
<console>:32: error: object CassandraSQLContext is not a member of package org.apache.spark.sql.cassandra
       import org.apache.spark.sql.cassandra.CassandraSQLContext
              ^
==================
https://github.com/datastax/spark-cassandra-connector/blob/07b47effeb10480b4b80b2686d0e6874aefa0a24/CHANGES.txt


2.0.0 M1
* Added support for left outer joins with C* table (SPARKC-181)
* Removed CassandraSqlContext and underscore based options (SPARKC-399)
* Upgrade to Spark 2.0.0-preview (SPARKC-396)
- Removed Twitter demo because there is no spark-streaming-twitter package available anymore
- Removed Akka Actor demo becaues there is no support for such streams anymore
- Bring back Kafka project and make it compile
- Update several classes to use our Logging instead of Spark Logging because Spark Logging became private
- Update few components and tests to make them work with Spark 2.0.0
- Fix Spark SQL - temporarily
- Update plugins and Scala version



Integrate Apache Spark with Apache Cassandra after enabling user authorization

[donghua@localhost conf]$  $SPARK_HOME/bin/spark-sql --hiveconf spark.cassandra.connection.host=127.0.0.1 --hiveconf spark.cassandra.auth.username=donghua --hiveconf spark.cassandra.auth.password=new_pasword
16/10/13 21:44:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/10/13 21:44:23 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/10/13 21:44:23 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/10/13 21:44:25 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 10.0.2.15 instead (on interface enp0s3)
16/10/13 21:44:25 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/10/13 21:44:27 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
spark-sql> create TEMPORARY TABLE t1 USING org.apache.spark.sql.cassandra OPTIONS (table "t1",keyspace "mykeyspace", cluster "Test Cluster",pushdown "true");
16/10/13 21:44:40 WARN SparkStrategies$DDLStrategy: CREATE TEMPORARY TABLE t1 USING... is deprecated, please use CREATE TEMPORARY VIEW viewName USING... instead
16/10/13 21:44:42 ERROR SparkSQLDriver: Failed in [create TEMPORARY TABLE t1 USING org.apache.spark.sql.cassandra OPTIONS (table "t1",keyspace "mykeyspace", cluster "Test Cluster",pushdown "true")]
com.datastax.driver.core.exceptions.UnauthorizedException: User donghua has no SELECT permission on <table system.size_estimates> or any of its parents

        at com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59)
        at com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25)
        at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
        at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
        at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:64)
        at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:47)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.datastax.spark.connector.cql.SessionProxy.invoke(SessionProxy.scala:33)
        at com.sun.proxy.$Proxy29.execute(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.datastax.spark.connector.cql.SessionProxy.invoke(SessionProxy.scala:33)
        at com.sun.proxy.$Proxy29.execute(Unknown Source)
        at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates$$anonfun$tokenRanges$1.apply(DataSizeEstimates.scala:40)
        at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates$$anonfun$tokenRanges$1.apply(DataSizeEstimates.scala:38)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:111)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:110)
        at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:140)
        at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:110)
        at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates.tokenRanges$lzycompute(DataSizeEstimates.scala:38)
        at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates.tokenRanges(DataSizeEstimates.scala:37)
        at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates.totalDataSizeInBytes$lzycompute(DataSizeEstimates.scala:88)
        at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates.totalDataSizeInBytes(DataSizeEstimates.scala:87)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:260)
        at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:55)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
        at org.apache.spark.sql.execution.datasources.CreateTempViewUsing.run(ddl.scala:87)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:186)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:682)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:62)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:331)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:247)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: com.datastax.driver.core.exceptions.UnauthorizedException: User donghua has no SELECT permission on <table system.size_estimates> or any of its parents

        at com.datastax.driver.core.Responses$Error.asException(Responses.java:134)
        at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:179)
        at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:173)
        at com.datastax.driver.core.RequestHandler.access$2500(RequestHandler.java:43)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:788)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:607)
        at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1012)
        at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:935)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:276)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:263)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:823)
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:339)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:255)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
        at java.lang.Thread.run(Thread.java:745)


Steps to fix:

cassandra@cqlsh> grant select on system.size_estimates to donghua;
cassandra@cqlsh> list all permissions of donghua;

 role             | username         | resource                      | permission
------------------+------------------+-------------------------------+------------
          donghua |          donghua |         <keyspace mykeyspace> |      ALTER
          donghua |          donghua |         <keyspace mykeyspace> |       DROP
          donghua |          donghua |         <keyspace mykeyspace> |     SELECT
          donghua |          donghua |         <keyspace mykeyspace> |     MODIFY
          donghua |          donghua |         <keyspace mykeyspace> |  AUTHORIZE
          donghua |          donghua | <table system.size_estimates> |     SELECT
 mykeyspace_owner | mykeyspace_owner |         <keyspace mykeyspace> |     CREATE
 mykeyspace_owner | mykeyspace_owner |         <keyspace mykeyspace> |      ALTER
 mykeyspace_owner | mykeyspace_owner |         <keyspace mykeyspace> |       DROP
 mykeyspace_owner | mykeyspace_owner |         <keyspace mykeyspace> |     SELECT
 mykeyspace_owner | mykeyspace_owner |         <keyspace mykeyspace> |     MODIFY
 mykeyspace_owner | mykeyspace_owner |         <keyspace mykeyspace> |  AUTHORIZE

(12 rows)



spark-sql> create TEMPORARY TABLE t1 USING org.apache.spark.sql.cassandra OPTIONS (table "t1",keyspace "mykeyspace", cluster "Test Cluster",pushdown "true");
16/10/13 21:46:46 WARN SparkStrategies$DDLStrategy: CREATE TEMPORARY TABLE t1 USING... is deprecated, please use CREATE TEMPORARY VIEW viewName USING... instead
Time taken: 2.462 seconds
spark-sql> select count(*) from t1;
445674
Time taken: 12.389 seconds, Fetched 1 row(s)
spark-sql>

How to fix OperationTimedOut when using cqlsh to query a large dataset

cqlsh:mykeyspace2> select d,min(e),max(e) from mykeyspace.t1;
OperationTimedOut: errors={'127.0.0.1': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=127.0.0.1


[donghua@localhost cluster]$ touch  ~/.cassandra/cqlshrc

[donghua@localhost cluster]$ cat  ~/.cassandra/cqlshrc
[connection]
client_timeout = 3600 # default is 10 seconds
# Can also be set to None to disable:
# client_timeout = None

[donghua@localhost cluster]$ cqlsh
Connected to SingleServer Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> select d,min(e),max(e) from mykeyspace.t1;

 d | system.min(e) | system.max(e)
---+---------------+---------------
 5 |       1.00033 |     159.99989

(1 rows)

Wednesday, October 12, 2016

How to include multiple JAR files for spark-shell and spark-sql


# Below are the additional JAR files required to connect to Apache Cassandra

[donghua@localhost ~]$ ls  -l /home/donghua/spark-casssandra-connector/*
-rw-rw-r--. 1 donghua donghua  231320 Oct 12 15:39 /home/donghua/spark-casssandra-connector/commons-beanutils-1.8.0.jar
-rw-rw-r--. 1 donghua donghua   38460 Oct 12 15:39 /home/donghua/spark-casssandra-connector/joda-convert-1.2.jar
-rw-rw-r--. 1 donghua donghua  581571 Oct 12 15:39 /home/donghua/spark-casssandra-connector/joda-time-2.3.jar
-rw-rw-r--. 1 donghua donghua   62226 Oct 12 15:40 /home/donghua/spark-casssandra-connector/jsr166e-1.1.0.jar
-rw-rw-r--. 1 donghua donghua 2112017 Oct 12 15:39 /home/donghua/spark-casssandra-connector/netty-all-4.0.33.Final.jar
-rw-rw-r--. 1 donghua donghua 4573750 Oct 12 15:41 /home/donghua/spark-casssandra-connector/scala-reflect-2.11.8.jar
-rw-rw-r--. 1 donghua donghua 6036067 Oct 12 15:39 /home/donghua/spark-casssandra-connector/spark-cassandra-connector-2.0.0-M2-s_2.11.jar

[donghua@localhost ~]$ ls spark-2.0.1-bin-hadoop2.7/conf/
docker.properties.template    log4j.properties.template     slaves.template               spark-defaults.conf.template
fairscheduler.xml.template    metrics.properties.template   spark-defaults.conf           spark-env.sh.template
[donghua@localhost ~]$ cat spark-2.0.1-bin-hadoop2.7/conf/spark-defaults.conf
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.

# Example:
# spark.master                     spark://master:7077
# spark.eventLog.enabled           true
# spark.eventLog.dir               hdfs://namenode:8021/directory
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
#
spark.driver.extraClassPath  /home/donghua/spark-casssandra-connector/*
spark.executor.extraClassPath /home/donghua/spark-casssandra-connector/*

# To execute without using --jars to include additional jar files
[donghua@localhost spark-2.0.1-bin-hadoop2.7]$ $SPARK_HOME/bin/spark-shell --conf spark.cassandra.connection.host=127.0.0.1 

Query a Cassandra database using the spark-sql shell from Apache Spark

[donghua@vmxdb01 ~]$ $SPARK_HOME/bin/spark-sql --packages datastax:spark-cassandra-connector:2.0.0-M2-s_2.11 --conf spark.cassandra.connection.host=127.0.0.1
Ivy Default Cache set to: /home/donghua/.ivy2/cache
The jars for the packages stored in: /home/donghua/.ivy2/jars
:: loading settings :: url = jar:file:/home/donghua/spark-2.0.1-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
datastax#spark-cassandra-connector added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
    found datastax#spark-cassandra-connector;2.0.0-M2-s_2.11 in spark-packages
    found commons-beanutils#commons-beanutils;1.8.0 in central
    found org.joda#joda-convert;1.2 in central
    found joda-time#joda-time;2.3 in central
    found io.netty#netty-all;4.0.33.Final in central
    found com.twitter#jsr166e;1.1.0 in central
    found org.scala-lang#scala-reflect;2.11.8 in central
:: resolution report :: resolve 869ms :: artifacts dl 15ms
    :: modules in use:
    com.twitter#jsr166e;1.1.0 from central in [default]
    commons-beanutils#commons-beanutils;1.8.0 from central in [default]
    datastax#spark-cassandra-connector;2.0.0-M2-s_2.11 from spark-packages in [default]
    io.netty#netty-all;4.0.33.Final from central in [default]
    joda-time#joda-time;2.3 from central in [default]
    org.joda#joda-convert;1.2 from central in [default]
    org.scala-lang#scala-reflect;2.11.8 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   7   |   0   |   0   |   0   ||   7   |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
    confs: [default]
    0 artifacts copied, 7 already retrieved (0kB/8ms)
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/10/12 08:15:01 INFO SparkContext: Running Spark version 2.0.1
16/10/12 08:15:02 INFO SecurityManager: Changing view acls to: donghua
16/10/12 08:15:02 INFO SecurityManager: Changing modify acls to: donghua
16/10/12 08:15:02 INFO SecurityManager: Changing view acls groups to:
16/10/12 08:15:02 INFO SecurityManager: Changing modify acls groups to:
16/10/12 08:15:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(donghua); groups with view permissions: Set(); users  with modify permissions: Set(donghua); groups with modify permissions: Set()
16/10/12 08:15:02 INFO Utils: Successfully started service 'sparkDriver' on port 60027.
16/10/12 08:15:02 INFO SparkEnv: Registering MapOutputTracker
16/10/12 08:15:02 INFO SparkEnv: Registering BlockManagerMaster
16/10/12 08:15:02 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-20513a2d-c965-416d-81ae-9668df59d304
16/10/12 08:15:02 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
16/10/12 08:15:02 INFO SparkEnv: Registering OutputCommitCoordinator
16/10/12 08:15:02 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/10/12 08:15:02 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.6.151:4040
16/10/12 08:15:03 INFO SparkContext: Added JAR file:/home/donghua/.ivy2/jars/datastax_spark-cassandra-connector-2.0.0-M2-s_2.11.jar at spark://192.168.6.151:60027/jars/datastax_spark-cassandra-connector-2.0.0-M2-s_2.11.jar with timestamp 1476274503002
16/10/12 08:15:03 INFO SparkContext: Added JAR file:/home/donghua/.ivy2/jars/commons-beanutils_commons-beanutils-1.8.0.jar at spark://192.168.6.151:60027/jars/commons-beanutils_commons-beanutils-1.8.0.jar with timestamp 1476274503003
16/10/12 08:15:03 INFO SparkContext: Added JAR file:/home/donghua/.ivy2/jars/org.joda_joda-convert-1.2.jar at spark://192.168.6.151:60027/jars/org.joda_joda-convert-1.2.jar with timestamp 1476274503003
16/10/12 08:15:03 INFO SparkContext: Added JAR file:/home/donghua/.ivy2/jars/joda-time_joda-time-2.3.jar at spark://192.168.6.151:60027/jars/joda-time_joda-time-2.3.jar with timestamp 1476274503003
16/10/12 08:15:03 INFO SparkContext: Added JAR file:/home/donghua/.ivy2/jars/io.netty_netty-all-4.0.33.Final.jar at spark://192.168.6.151:60027/jars/io.netty_netty-all-4.0.33.Final.jar with timestamp 1476274503003
16/10/12 08:15:03 INFO SparkContext: Added JAR file:/home/donghua/.ivy2/jars/com.twitter_jsr166e-1.1.0.jar at spark://192.168.6.151:60027/jars/com.twitter_jsr166e-1.1.0.jar with timestamp 1476274503003
16/10/12 08:15:03 INFO SparkContext: Added JAR file:/home/donghua/.ivy2/jars/org.scala-lang_scala-reflect-2.11.8.jar at spark://192.168.6.151:60027/jars/org.scala-lang_scala-reflect-2.11.8.jar with timestamp 1476274503003
16/10/12 08:15:03 INFO Executor: Starting executor ID driver on host localhost
16/10/12 08:15:03 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44882.
16/10/12 08:15:03 INFO NettyBlockTransferService: Server created on 192.168.6.151:44882
16/10/12 08:15:03 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.6.151, 44882)
16/10/12 08:15:03 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.6.151:44882 with 366.3 MB RAM, BlockManagerId(driver, 192.168.6.151, 44882)
16/10/12 08:15:03 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.6.151, 44882)
16/10/12 08:15:03 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
16/10/12 08:15:03 INFO HiveSharedState: Warehouse path is '/home/donghua/spark-warehouse'.
16/10/12 08:15:03 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
16/10/12 08:15:04 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/10/12 08:15:04 INFO ObjectStore: ObjectStore, initialize called
16/10/12 08:15:04 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/10/12 08:15:04 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/10/12 08:15:05 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/10/12 08:15:06 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/10/12 08:15:06 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/10/12 08:15:06 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/10/12 08:15:06 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/10/12 08:15:07 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
16/10/12 08:15:07 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
16/10/12 08:15:07 INFO ObjectStore: Initialized ObjectStore
16/10/12 08:15:07 INFO HiveMetaStore: Added admin role in metastore
16/10/12 08:15:07 INFO HiveMetaStore: Added public role in metastore
16/10/12 08:15:07 INFO HiveMetaStore: No user is added in admin role, since config is empty
16/10/12 08:15:07 INFO HiveMetaStore: 0: get_all_databases
16/10/12 08:15:07 INFO audit: ugi=donghua    ip=unknown-ip-addr    cmd=get_all_databases   
16/10/12 08:15:07 INFO HiveMetaStore: 0: get_functions: db=default pat=*
16/10/12 08:15:07 INFO audit: ugi=donghua    ip=unknown-ip-addr    cmd=get_functions: db=default pat=*   
16/10/12 08:15:07 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
16/10/12 08:15:07 INFO SessionState: Created local directory: /tmp/c37fe7b1-45a3-4ce1-b115-ea341dfe013b_resources
16/10/12 08:15:07 INFO SessionState: Created HDFS directory: /tmp/hive/donghua/c37fe7b1-45a3-4ce1-b115-ea341dfe013b
16/10/12 08:15:07 INFO SessionState: Created local directory: /tmp/donghua/c37fe7b1-45a3-4ce1-b115-ea341dfe013b
16/10/12 08:15:07 INFO SessionState: Created HDFS directory: /tmp/hive/donghua/c37fe7b1-45a3-4ce1-b115-ea341dfe013b/_tmp_space.db
16/10/12 08:15:07 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /home/donghua/spark-warehouse
16/10/12 08:15:07 INFO SessionState: Created local directory: /tmp/228512c3-917e-4070-b28d-3de3f688b746_resources
16/10/12 08:15:08 INFO SessionState: Created HDFS directory: /tmp/hive/donghua/228512c3-917e-4070-b28d-3de3f688b746
16/10/12 08:15:08 INFO SessionState: Created local directory: /tmp/donghua/228512c3-917e-4070-b28d-3de3f688b746
16/10/12 08:15:08 INFO SessionState: Created HDFS directory: /tmp/hive/donghua/228512c3-917e-4070-b28d-3de3f688b746/_tmp_space.db
16/10/12 08:15:08 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /home/donghua/spark-warehouse

spark-sql> select * from mykeyspace.kv;
16/10/12 08:15:12 INFO SparkSqlParser: Parsing command: select * from mykeyspace.kv
16/10/12 08:15:13 INFO HiveMetaStore: 0: create_database: Database(name:default, description:default database, locationUri:file:/home/donghua/spark-warehouse, parameters:{})
16/10/12 08:15:13 INFO audit: ugi=donghua    ip=unknown-ip-addr    cmd=create_database: Database(name:default, description:default database, locationUri:file:/home/donghua/spark-warehouse, parameters:{})   
16/10/12 08:15:13 INFO HiveMetaStore: 0: get_database: mykeyspace
16/10/12 08:15:13 INFO audit: ugi=donghua    ip=unknown-ip-addr    cmd=get_database: mykeyspace   
16/10/12 08:15:13 WARN ObjectStore: Failed to get database mykeyspace, returning NoSuchObjectException
Error in query: Table or view not found: `mykeyspace`.`kv`; line 1 pos 14

spark-sql> create TEMPORARY TABLE kv USING org.apache.spark.sql.cassandra OPTIONS (table "kv",keyspace "mykeyspace", cluster "Test Cluster",pushdown "true");
16/10/12 08:15:19 INFO SparkSqlParser: Parsing command: create TEMPORARY TABLE kv USING org.apache.spark.sql.cassandra OPTIONS (table "kv",keyspace "mykeyspace", cluster "Test Cluster",pushdown "true")
16/10/12 08:15:20 WARN SparkStrategies$DDLStrategy: CREATE TEMPORARY TABLE kv USING... is deprecated, please use CREATE TEMPORARY VIEW viewName USING... instead
16/10/12 08:15:21 INFO NettyUtil: Found Netty's native epoll transport in the classpath, using it
16/10/12 08:15:22 INFO Cluster: New Cassandra host /127.0.0.1:9042 added
16/10/12 08:15:22 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
16/10/12 08:15:22 INFO SparkContext: Starting job: processCmd at CliDriver.java:376
16/10/12 08:15:22 INFO DAGScheduler: Got job 0 (processCmd at CliDriver.java:376) with 1 output partitions
16/10/12 08:15:22 INFO DAGScheduler: Final stage: ResultStage 0 (processCmd at CliDriver.java:376)
16/10/12 08:15:22 INFO DAGScheduler: Parents of final stage: List()
16/10/12 08:15:22 INFO DAGScheduler: Missing parents: List()
16/10/12 08:15:22 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at processCmd at CliDriver.java:376), which has no missing parents
16/10/12 08:15:22 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.2 KB, free 366.3 MB)
16/10/12 08:15:23 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1964.0 B, free 366.3 MB)
16/10/12 08:15:23 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.6.151:44882 (size: 1964.0 B, free: 366.3 MB)
16/10/12 08:15:23 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
16/10/12 08:15:23 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at processCmd at CliDriver.java:376)
16/10/12 08:15:23 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
16/10/12 08:15:23 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0, PROCESS_LOCAL, 6128 bytes)
16/10/12 08:15:23 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/10/12 08:15:23 INFO Executor: Fetching spark://192.168.6.151:60027/jars/com.twitter_jsr166e-1.1.0.jar with timestamp 1476274503003
16/10/12 08:15:23 INFO TransportClientFactory: Successfully created connection to /192.168.6.151:60027 after 62 ms (0 ms spent in bootstraps)
16/10/12 08:15:23 INFO Utils: Fetching spark://192.168.6.151:60027/jars/com.twitter_jsr166e-1.1.0.jar to /tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/fetchFileTemp1249905734998620423.tmp
16/10/12 08:15:23 INFO Executor: Adding file:/tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/com.twitter_jsr166e-1.1.0.jar to class loader
16/10/12 08:15:23 INFO Executor: Fetching spark://192.168.6.151:60027/jars/datastax_spark-cassandra-connector-2.0.0-M2-s_2.11.jar with timestamp 1476274503002
16/10/12 08:15:23 INFO Utils: Fetching spark://192.168.6.151:60027/jars/datastax_spark-cassandra-connector-2.0.0-M2-s_2.11.jar to /tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/fetchFileTemp1156901688704158624.tmp
16/10/12 08:15:24 INFO Executor: Adding file:/tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/datastax_spark-cassandra-connector-2.0.0-M2-s_2.11.jar to class loader
16/10/12 08:15:24 INFO Executor: Fetching spark://192.168.6.151:60027/jars/org.joda_joda-convert-1.2.jar with timestamp 1476274503003
16/10/12 08:15:24 INFO Utils: Fetching spark://192.168.6.151:60027/jars/org.joda_joda-convert-1.2.jar to /tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/fetchFileTemp3155581321752809377.tmp
16/10/12 08:15:24 INFO Executor: Adding file:/tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/org.joda_joda-convert-1.2.jar to class loader
16/10/12 08:15:24 INFO Executor: Fetching spark://192.168.6.151:60027/jars/commons-beanutils_commons-beanutils-1.8.0.jar with timestamp 1476274503003
16/10/12 08:15:24 INFO Utils: Fetching spark://192.168.6.151:60027/jars/commons-beanutils_commons-beanutils-1.8.0.jar to /tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/fetchFileTemp3073542072392987481.tmp
16/10/12 08:15:24 INFO Executor: Adding file:/tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/commons-beanutils_commons-beanutils-1.8.0.jar to class loader
16/10/12 08:15:24 INFO Executor: Fetching spark://192.168.6.151:60027/jars/org.scala-lang_scala-reflect-2.11.8.jar with timestamp 1476274503003
16/10/12 08:15:24 INFO Utils: Fetching spark://192.168.6.151:60027/jars/org.scala-lang_scala-reflect-2.11.8.jar to /tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/fetchFileTemp8208827726672004865.tmp
16/10/12 08:15:24 INFO Executor: Adding file:/tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/org.scala-lang_scala-reflect-2.11.8.jar to class loader
16/10/12 08:15:24 INFO Executor: Fetching spark://192.168.6.151:60027/jars/joda-time_joda-time-2.3.jar with timestamp 1476274503003
16/10/12 08:15:24 INFO Utils: Fetching spark://192.168.6.151:60027/jars/joda-time_joda-time-2.3.jar to /tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/fetchFileTemp3706194028917481017.tmp
16/10/12 08:15:24 INFO Executor: Adding file:/tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/joda-time_joda-time-2.3.jar to class loader
16/10/12 08:15:24 INFO Executor: Fetching spark://192.168.6.151:60027/jars/io.netty_netty-all-4.0.33.Final.jar with timestamp 1476274503003
16/10/12 08:15:24 INFO Utils: Fetching spark://192.168.6.151:60027/jars/io.netty_netty-all-4.0.33.Final.jar to /tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/fetchFileTemp5973767129005243936.tmp
16/10/12 08:15:24 INFO Executor: Adding file:/tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139/userFiles-b34a93b4-cffc-4ec3-bcdb-1ce2823ac122/io.netty_netty-all-4.0.33.Final.jar to class loader
16/10/12 08:15:24 INFO CodeGenerator: Code generated in 270.608582 ms
16/10/12 08:15:24 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1064 bytes result sent to driver
16/10/12 08:15:24 INFO DAGScheduler: ResultStage 0 (processCmd at CliDriver.java:376) finished in 1.701 s
16/10/12 08:15:24 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1664 ms on localhost (1/1)
16/10/12 08:15:24 INFO DAGScheduler: Job 0 finished: processCmd at CliDriver.java:376, took 2.311670 s
16/10/12 08:15:24 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
Time taken: 5.907 seconds
16/10/12 08:15:24 INFO CliDriver: Time taken: 5.907 seconds
16/10/12 08:15:29 INFO CassandraConnector: Disconnected from Cassandra cluster: Test Cluster
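As the WARN line above points out, `CREATE TEMPORARY TABLE ... USING` is deprecated in Spark 2.0. The same Cassandra mapping can be registered with the non-deprecated syntax the warning suggests (a sketch of the equivalent statement, not run in this session):

```sql
-- Non-deprecated form suggested by the deprecation warning (Spark 2.0+):
CREATE TEMPORARY VIEW kv
USING org.apache.spark.sql.cassandra
OPTIONS (table "kv", keyspace "mykeyspace", cluster "Test Cluster", pushdown "true");
```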

spark-sql> select * from kv;
16/10/12 08:15:43 INFO SparkSqlParser: Parsing command: select * from kv
16/10/12 08:15:43 INFO CassandraSourceRelation: Input Predicates: []
16/10/12 08:15:43 INFO CassandraSourceRelation: Input Predicates: []
16/10/12 08:15:43 INFO CodeGenerator: Code generated in 57.196131 ms
16/10/12 08:15:43 INFO Cluster: New Cassandra host /127.0.0.1:9042 added
16/10/12 08:15:43 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
16/10/12 08:15:44 INFO SparkContext: Starting job: processCmd at CliDriver.java:376
16/10/12 08:15:44 INFO DAGScheduler: Got job 1 (processCmd at CliDriver.java:376) with 6 output partitions
16/10/12 08:15:44 INFO DAGScheduler: Final stage: ResultStage 1 (processCmd at CliDriver.java:376)
16/10/12 08:15:44 INFO DAGScheduler: Parents of final stage: List()
16/10/12 08:15:44 INFO DAGScheduler: Missing parents: List()
16/10/12 08:15:44 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[7] at processCmd at CliDriver.java:376), which has no missing parents
16/10/12 08:15:44 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 11.6 KB, free 366.3 MB)
16/10/12 08:15:44 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 6.0 KB, free 366.3 MB)
16/10/12 08:15:44 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.6.151:44882 (size: 6.0 KB, free: 366.3 MB)
16/10/12 08:15:44 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1012
16/10/12 08:15:44 INFO DAGScheduler: Submitting 6 missing tasks from ResultStage 1 (MapPartitionsRDD[7] at processCmd at CliDriver.java:376)
16/10/12 08:15:44 INFO TaskSchedulerImpl: Adding task set 1.0 with 6 tasks
16/10/12 08:15:44 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, partition 0, NODE_LOCAL, 13410 bytes)
16/10/12 08:15:44 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, localhost, partition 1, NODE_LOCAL, 14240 bytes)
16/10/12 08:15:44 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
16/10/12 08:15:44 INFO Executor: Running task 1.0 in stage 1.0 (TID 2)
16/10/12 08:15:45 INFO Executor: Finished task 1.0 in stage 1.0 (TID 2). 1071 bytes result sent to driver
16/10/12 08:15:45 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 3, localhost, partition 2, NODE_LOCAL, 13887 bytes)
16/10/12 08:15:45 INFO Executor: Running task 2.0 in stage 1.0 (TID 3)
16/10/12 08:15:45 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 2) in 610 ms on localhost (1/6)
16/10/12 08:15:45 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1071 bytes result sent to driver
16/10/12 08:15:45 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 4, localhost, partition 3, NODE_LOCAL, 13055 bytes)
16/10/12 08:15:45 INFO Executor: Running task 3.0 in stage 1.0 (TID 4)
16/10/12 08:15:45 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 1373 ms on localhost (2/6)
16/10/12 08:15:45 INFO Executor: Finished task 2.0 in stage 1.0 (TID 3). 1333 bytes result sent to driver
16/10/12 08:15:46 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.6.151:44882 in memory (size: 1964.0 B, free: 366.3 MB)
16/10/12 08:15:46 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 5, localhost, partition 4, NODE_LOCAL, 13170 bytes)
16/10/12 08:15:46 INFO Executor: Running task 4.0 in stage 1.0 (TID 5)
16/10/12 08:15:46 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 3) in 846 ms on localhost (3/6)
16/10/12 08:15:46 INFO Executor: Finished task 3.0 in stage 1.0 (TID 4). 1362 bytes result sent to driver
16/10/12 08:15:46 INFO TaskSetManager: Starting task 5.0 in stage 1.0 (TID 6, localhost, partition 5, NODE_LOCAL, 8066 bytes)
16/10/12 08:15:46 INFO Executor: Running task 5.0 in stage 1.0 (TID 6)
16/10/12 08:15:46 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 4) in 658 ms on localhost (4/6)
16/10/12 08:15:46 INFO Executor: Finished task 4.0 in stage 1.0 (TID 5). 1071 bytes result sent to driver
16/10/12 08:15:46 INFO TaskSetManager: Finished task 4.0 in stage 1.0 (TID 5) in 476 ms on localhost (5/6)
16/10/12 08:15:46 INFO Executor: Finished task 5.0 in stage 1.0 (TID 6). 1071 bytes result sent to driver
16/10/12 08:15:46 INFO TaskSetManager: Finished task 5.0 in stage 1.0 (TID 6) in 174 ms on localhost (6/6)
16/10/12 08:15:46 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
16/10/12 08:15:46 INFO DAGScheduler: ResultStage 1 (processCmd at CliDriver.java:376) finished in 1.949 s
16/10/12 08:15:46 INFO DAGScheduler: Job 1 finished: processCmd at CliDriver.java:376, took 1.994766 s
key1    1
key4    4
key3    3
key2    2

Time taken: 3.044 seconds, Fetched 4 row(s)
16/10/12 08:15:46 INFO CliDriver: Time taken: 3.044 seconds, Fetched 4 row(s)
16/10/12 08:15:53 INFO CassandraConnector: Disconnected from Cassandra cluster: Test Cluster

spark-sql> desc kv;
16/10/12 08:16:24 INFO SparkSqlParser: Parsing command: desc kv

16/10/12 08:16:24 INFO SparkContext: Starting job: processCmd at CliDriver.java:376
16/10/12 08:16:24 INFO DAGScheduler: Got job 2 (processCmd at CliDriver.java:376) with 1 output partitions
16/10/12 08:16:24 INFO DAGScheduler: Final stage: ResultStage 2 (processCmd at CliDriver.java:376)
16/10/12 08:16:24 INFO DAGScheduler: Parents of final stage: List()
16/10/12 08:16:24 INFO DAGScheduler: Missing parents: List()
16/10/12 08:16:24 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[10] at processCmd at CliDriver.java:376), which has no missing parents
16/10/12 08:16:24 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 4.2 KB, free 366.3 MB)
16/10/12 08:16:24 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.5 KB, free 366.3 MB)
16/10/12 08:16:24 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.6.151:44882 (size: 2.5 KB, free: 366.3 MB)
16/10/12 08:16:24 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1012
16/10/12 08:16:24 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[10] at processCmd at CliDriver.java:376)
16/10/12 08:16:24 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
16/10/12 08:16:25 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 7, localhost, partition 0, PROCESS_LOCAL, 6167 bytes)
16/10/12 08:16:25 INFO Executor: Running task 0.0 in stage 2.0 (TID 7)
16/10/12 08:16:25 INFO CodeGenerator: Code generated in 16.186786 ms
16/10/12 08:16:25 INFO Executor: Finished task 0.0 in stage 2.0 (TID 7). 1051 bytes result sent to driver
16/10/12 08:16:25 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 7) in 69 ms on localhost (1/1)
16/10/12 08:16:25 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
16/10/12 08:16:25 INFO DAGScheduler: ResultStage 2 (processCmd at CliDriver.java:376) finished in 0.070 s
16/10/12 08:16:25 INFO DAGScheduler: Job 2 finished: processCmd at CliDriver.java:376, took 0.083123 s
key    string    NULL
value    int    NULL
Time taken: 0.128 seconds, Fetched 2 row(s)
16/10/12 08:16:25 INFO CliDriver: Time taken: 0.128 seconds, Fetched 2 row(s)
spark-sql> select * from kv where value > 2;
16/10/12 08:16:34 INFO SparkSqlParser: Parsing command: select * from kv where value > 2
16/10/12 08:16:34 INFO CassandraSourceRelation: Input Predicates: [IsNotNull(value), GreaterThan(value,2)]
16/10/12 08:16:34 INFO CassandraSourceRelation: Input Predicates: [IsNotNull(value), GreaterThan(value,2)]
16/10/12 08:16:34 INFO CodeGenerator: Code generated in 14.815303 ms
16/10/12 08:16:34 INFO Cluster: New Cassandra host localhost/127.0.0.1:9042 added
16/10/12 08:16:34 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
16/10/12 08:16:35 INFO SparkContext: Starting job: processCmd at CliDriver.java:376
16/10/12 08:16:35 INFO DAGScheduler: Got job 3 (processCmd at CliDriver.java:376) with 6 output partitions
16/10/12 08:16:35 INFO DAGScheduler: Final stage: ResultStage 3 (processCmd at CliDriver.java:376)
16/10/12 08:16:35 INFO DAGScheduler: Parents of final stage: List()
16/10/12 08:16:35 INFO DAGScheduler: Missing parents: List()
16/10/12 08:16:35 INFO DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[14] at processCmd at CliDriver.java:376), which has no missing parents
16/10/12 08:16:35 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 12.4 KB, free 366.3 MB)
16/10/12 08:16:35 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 6.2 KB, free 366.3 MB)
16/10/12 08:16:35 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.6.151:44882 (size: 6.2 KB, free: 366.3 MB)
16/10/12 08:16:35 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1012
16/10/12 08:16:35 INFO DAGScheduler: Submitting 6 missing tasks from ResultStage 3 (MapPartitionsRDD[14] at processCmd at CliDriver.java:376)
16/10/12 08:16:35 INFO TaskSchedulerImpl: Adding task set 3.0 with 6 tasks
16/10/12 08:16:35 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 8, localhost, partition 0, NODE_LOCAL, 13426 bytes)
16/10/12 08:16:35 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 9, localhost, partition 1, NODE_LOCAL, 14256 bytes)
16/10/12 08:16:35 INFO Executor: Running task 0.0 in stage 3.0 (TID 8)
16/10/12 08:16:35 INFO Executor: Running task 1.0 in stage 3.0 (TID 9)
16/10/12 08:16:35 INFO Executor: Finished task 0.0 in stage 3.0 (TID 8). 1133 bytes result sent to driver
16/10/12 08:16:35 INFO TaskSetManager: Starting task 2.0 in stage 3.0 (TID 10, localhost, partition 2, NODE_LOCAL, 13903 bytes)
16/10/12 08:16:35 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 8) in 481 ms on localhost (1/6)
16/10/12 08:16:35 INFO Executor: Running task 2.0 in stage 3.0 (TID 10)
16/10/12 08:16:35 INFO Executor: Finished task 1.0 in stage 3.0 (TID 9). 1133 bytes result sent to driver
16/10/12 08:16:35 INFO TaskSetManager: Starting task 3.0 in stage 3.0 (TID 11, localhost, partition 3, NODE_LOCAL, 13071 bytes)
16/10/12 08:16:35 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 9) in 545 ms on localhost (2/6)
16/10/12 08:16:35 INFO Executor: Running task 3.0 in stage 3.0 (TID 11)
16/10/12 08:16:36 INFO Executor: Finished task 3.0 in stage 3.0 (TID 11). 1336 bytes result sent to driver
16/10/12 08:16:36 INFO TaskSetManager: Starting task 4.0 in stage 3.0 (TID 12, localhost, partition 4, NODE_LOCAL, 13186 bytes)
16/10/12 08:16:36 INFO TaskSetManager: Finished task 3.0 in stage 3.0 (TID 11) in 535 ms on localhost (3/6)
16/10/12 08:16:36 INFO Executor: Running task 4.0 in stage 3.0 (TID 12)
16/10/12 08:16:36 INFO Executor: Finished task 2.0 in stage 3.0 (TID 10). 1293 bytes result sent to driver
16/10/12 08:16:36 INFO TaskSetManager: Starting task 5.0 in stage 3.0 (TID 13, localhost, partition 5, NODE_LOCAL, 8082 bytes)
16/10/12 08:16:36 INFO Executor: Running task 5.0 in stage 3.0 (TID 13)
16/10/12 08:16:36 INFO TaskSetManager: Finished task 2.0 in stage 3.0 (TID 10) in 626 ms on localhost (4/6)
16/10/12 08:16:36 INFO Executor: Finished task 5.0 in stage 3.0 (TID 13). 1133 bytes result sent to driver
16/10/12 08:16:36 INFO TaskSetManager: Finished task 5.0 in stage 3.0 (TID 13) in 103 ms on localhost (5/6)
16/10/12 08:16:36 INFO Executor: Finished task 4.0 in stage 3.0 (TID 12). 1133 bytes result sent to driver
16/10/12 08:16:36 INFO TaskSetManager: Finished task 4.0 in stage 3.0 (TID 12) in 327 ms on localhost (6/6)
16/10/12 08:16:36 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
16/10/12 08:16:36 INFO DAGScheduler: ResultStage 3 (processCmd at CliDriver.java:376) finished in 1.406 s
16/10/12 08:16:36 INFO DAGScheduler: Job 3 finished: processCmd at CliDriver.java:376, took 1.449255 s
key4    4
key3    3

Time taken: 1.958 seconds, Fetched 2 row(s)
16/10/12 08:16:36 INFO CliDriver: Time taken: 1.958 seconds, Fetched 2 row(s)
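Note the two `Input Predicates: [IsNotNull(value), GreaterThan(value,2)]` lines above: with `pushdown "true"`, the connector pushes the `value > 2` comparison down to Cassandra instead of filtering all rows in Spark. Where a predicate is evaluated can also be checked from the CLI with `EXPLAIN` (a sketch; the exact plan text varies by Spark and connector version):

```sql
-- Pushed-down filters appear inside the Cassandra scan node of the physical plan.
EXPLAIN SELECT * FROM kv WHERE value > 2;
```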
spark-sql> exit;
16/10/12 08:16:41 INFO SparkUI: Stopped Spark web UI at http://192.168.6.151:4040
16/10/12 08:16:41 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/10/12 08:16:41 INFO MemoryStore: MemoryStore cleared
16/10/12 08:16:41 INFO BlockManager: BlockManager stopped
16/10/12 08:16:41 INFO BlockManagerMaster: BlockManagerMaster stopped
16/10/12 08:16:41 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/10/12 08:16:41 INFO SparkContext: Successfully stopped SparkContext
16/10/12 08:16:41 INFO ShutdownHookManager: Shutdown hook called
16/10/12 08:16:41 INFO ShutdownHookManager: Deleting directory /tmp/spark-39508304-8579-4e51-ae16-e1cf8a22c139
16/10/12 08:16:41 INFO ShutdownHookManager: Deleting directory /tmp/spark-be3c499a-b678-43d1-8a0d-cdb7236428cd
16/10/12 08:16:43 INFO CassandraConnector: Disconnected from Cassandra cluster: Test Cluster
16/10/12 08:16:43 INFO SerialShutdownHooks: Successfully executed shutdown hook: Clearing session cache for C* connector



================== Turn off verbose output to make the output cleaner ==================

[donghua@vmxdb01 ~]$ diff spark-2.0.1-bin-hadoop2.7/conf/log4j.properties.template spark-2.0.1-bin-hadoop2.7/conf/log4j.properties
19c19
< log4j.rootCategory=INFO, console
---
> log4j.rootCategory=WARN, console


spark-sql> select * from kv where value > 2;
Error in query: Table or view not found: kv; line 1 pos 14
spark-sql> create TEMPORARY TABLE kv USING org.apache.spark.sql.cassandra OPTIONS (table "kv",keyspace "mykeyspace", cluster "Test Cluster",pushdown "true");
16/10/12 08:28:09 WARN SparkStrategies$DDLStrategy: CREATE TEMPORARY TABLE kv USING... is deprecated, please use CREATE TEMPORARY VIEW viewName USING... instead
Time taken: 4.008 seconds
spark-sql> select * from kv;
key1    1                                                                      
key4    4
key3    3
key2    2
Time taken: 2.253 seconds, Fetched 4 row(s)
spark-sql> select substring(key,1,3) from kv;
key                                                                            
key
key
key
Time taken: 1.328 seconds, Fetched 4 row(s)
spark-sql> select substring(key,1,3),count(*) from kv group by substring(key,1,3);
key     4                                                                      
Time taken: 3.518 seconds, Fetched 1 row(s)
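Since all four keys share the prefix `key`, the query above collapses to a single group. Expressions such as `substring()` are computed by Spark itself; only simple comparisons on columns are candidates for pushdown to Cassandra, so the two can be combined freely (a sketch, not run in this session):

```sql
-- substring() and the aggregation run in Spark; value > 2 can still be pushed down.
SELECT substring(key, 1, 3) AS prefix, count(*) AS cnt
FROM kv
WHERE value > 2
GROUP BY substring(key, 1, 3);
```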
spark-sql>