Thursday, October 13, 2016

Intergrate Apache Spark with after enabling user authorization in Apache Casasndra

[donghua@localhost conf]$  $SPARK_HOME/bin/spark-sql --hiveconf spark.cassandra.connection.host=127.0.0.1 --hiveconf spark.cassandra.auth.username=donghua --hiveconf spark.cassandra.auth.password=new_pasword
16/10/13 21:44:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/10/13 21:44:23 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/10/13 21:44:23 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/10/13 21:44:25 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 10.0.2.15 instead (on interface enp0s3)
16/10/13 21:44:25 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/10/13 21:44:27 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
spark-sql> create TEMPORARY TABLE t1 USING org.apache.spark.sql.cassandra OPTIONS (table "t1",keyspace "mykeyspace", cluster "Test Cluster",pushdown "true");
16/10/13 21:44:40 WARN SparkStrategies$DDLStrategy: CREATE TEMPORARY TABLE t1 USING... is deprecated, please use CREATE TEMPORARY VIEW viewName USING... instead
16/10/13 21:44:42 ERROR SparkSQLDriver: Failed in [create TEMPORARY TABLE t1 USING org.apache.spark.sql.cassandra OPTIONS (table "t1",keyspace "mykeyspace", cluster "Test Cluster",pushdown "true")]
com.datastax.driver.core.exceptions.UnauthorizedException: User donghua has no SELECT permission on or any of its parents

        at com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59)
        at com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25)
        at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
        at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
        at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:64)
        at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:47)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.datastax.spark.connector.cql.SessionProxy.invoke(SessionProxy.scala:33)
        at com.sun.proxy.$Proxy29.execute(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.datastax.spark.connector.cql.SessionProxy.invoke(SessionProxy.scala:33)
        at com.sun.proxy.$Proxy29.execute(Unknown Source)
        at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates$$anonfun$tokenRanges$1.apply(DataSizeEstimates.scala:40)
        at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates$$anonfun$tokenRanges$1.apply(DataSizeEstimates.scala:38)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:111)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:110)
        at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:140)
        at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:110)
        at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates.tokenRanges$lzycompute(DataSizeEstimates.scala:38)
        at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates.tokenRanges(DataSizeEstimates.scala:37)
        at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates.totalDataSizeInBytes$lzycompute(DataSizeEstimates.scala:88)
        at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates.totalDataSizeInBytes(DataSizeEstimates.scala:87)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:260)
        at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:55)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
        at org.apache.spark.sql.execution.datasources.CreateTempViewUsing.run(ddl.scala:87)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
        at org.apache.spark.sql.Dataset.(Dataset.scala:186)
        at org.apache.spark.sql.Dataset.(Dataset.scala:167)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:682)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:62)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:331)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:247)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: com.datastax.driver.core.exceptions.UnauthorizedException: User donghua has no SELECT permission on or any of its parents

        at com.datastax.driver.core.Responses$Error.asException(Responses.java:134)
        at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:179)
        at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:173)
        at com.datastax.driver.core.RequestHandler.access$2500(RequestHandler.java:43)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:788)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:607)
        at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1012)
        at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:935)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:276)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:263)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:823)
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:339)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:255)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
        at java.lang.Thread.run(Thread.java:745)
com.datastax.driver.core.exceptions.UnauthorizedException: User donghua has no SELECT permission on or any of its parents

        at com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:59)
        at com.datastax.driver.core.exceptions.UnauthorizedException.copy(UnauthorizedException.java:25)
        at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
        at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
        at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:64)
        at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:47)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.datastax.spark.connector.cql.SessionProxy.invoke(SessionProxy.scala:33)
        at com.sun.proxy.$Proxy29.execute(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.datastax.spark.connector.cql.SessionProxy.invoke(SessionProxy.scala:33)
        at com.sun.proxy.$Proxy29.execute(Unknown Source)
        at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates$$anonfun$tokenRanges$1.apply(DataSizeEstimates.scala:40)
        at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates$$anonfun$tokenRanges$1.apply(DataSizeEstimates.scala:38)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:111)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:110)
        at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:140)
        at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:110)
        at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates.tokenRanges$lzycompute(DataSizeEstimates.scala:38)
        at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates.tokenRanges(DataSizeEstimates.scala:37)
        at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates.totalDataSizeInBytes$lzycompute(DataSizeEstimates.scala:88)
        at com.datastax.spark.connector.rdd.partitioner.DataSizeEstimates.totalDataSizeInBytes(DataSizeEstimates.scala:87)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:260)
        at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:55)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
        at org.apache.spark.sql.execution.datasources.CreateTempViewUsing.run(ddl.scala:87)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
        at org.apache.spark.sql.Dataset.(Dataset.scala:186)
        at org.apache.spark.sql.Dataset.(Dataset.scala:167)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:682)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:62)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:331)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:247)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: com.datastax.driver.core.exceptions.UnauthorizedException: User donghua has no SELECT permission on or any of its parents

        at com.datastax.driver.core.Responses$Error.asException(Responses.java:134)
        at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:179)
        at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:173)
        at com.datastax.driver.core.RequestHandler.access$2500(RequestHandler.java:43)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:788)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:607)
        at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1012)
        at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:935)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:276)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:263)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:823)
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:339)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:255)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
        at java.lang.Thread.run(Thread.java:745)


Steps to fix:

cassandra@cqlsh> grant select on system.size_estimates to donghua;
cassandra@cqlsh> list all permissions of donghua;

 role             | username         | resource                      | permission
------------------+------------------+-------------------------------+------------
          donghua |          donghua |         |      ALTER

          donghua |          donghua |         |       DROP

          donghua |          donghua |         |     SELECT

          donghua |          donghua |         |     MODIFY

          donghua |          donghua |         |  AUTHORIZE

          donghua |          donghua | |     SELECT

 mykeyspace_owner | mykeyspace_owner |         |     CREATE
 mykeyspace_owner | mykeyspace_owner |         |      ALTER
 mykeyspace_owner | mykeyspace_owner |         |       DROP
 mykeyspace_owner | mykeyspace_owner |         |     SELECT
 mykeyspace_owner | mykeyspace_owner |         |     MODIFY
 mykeyspace_owner | mykeyspace_owner |         |  AUTHORIZE

(12 rows)



spark-sql> create TEMPORARY TABLE t1 USING org.apache.spark.sql.cassandra OPTIONS (table "t1",keyspace "mykeyspace", cluster "Test Cluster",pushdown "true");
16/10/13 21:46:46 WARN SparkStrategies$DDLStrategy: CREATE TEMPORARY TABLE t1 USING... is deprecated, please use CREATE TEMPORARY VIEW viewName USING... instead
Time taken: 2.462 seconds
spark-sql> select count(*) from t1;
445674
Time taken: 12.389 seconds, Fetched 1 row(s)
spark-sql>