Prepare the offset (messages below it will be removed):
[donghua@hdp ~]$ /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --describe --topic kafka_hive_topic --zookeeper hdp:2181
Topic:kafka_hive_topic PartitionCount:1 ReplicationFactor:1 Configs:
Topic: kafka_hive_topic Partition: 0 Leader: 1001 Replicas: 1001 Isr: 1001
Parameter file:
[donghua@hdp ~]$ cat /tmp/delete_offset1.json
{"partitions":
 [{"topic": "kafka_hive_topic", "partition": 0,
   "offset": 12}],
 "version":1
}
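The same parameter file can also be generated programmatically; a minimal sketch in Python, with the path and values taken from the example above:

```python
import json

# Records with offsets below 12 in partition 0 of kafka_hive_topic will be deleted
payload = {
    "partitions": [
        {"topic": "kafka_hive_topic", "partition": 0, "offset": 12}
    ],
    "version": 1,
}

with open("/tmp/delete_offset1.json", "w") as f:
    json.dump(payload, f, indent=2)
```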
Execute kafka-delete-records.sh
[donghua@hdp ~]$ /usr/hdp/current/kafka-broker/bin/kafka-delete-records.sh --bootstrap-server hdp:6667 --offset-json-file /tmp/delete_offset1.json
Executing records delete operation
Records delete operation completed:
partition: kafka_hive_topic-0 low_watermark: 12
Saturday, March 9, 2019
Query Kafka topic directly using hive-kafka-storagehandler
1. Create kafka topic
[donghua@hdp ~]$ /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --zookeeper hdp:2181 --replication-factor 1 --partitions 1 --topic kafka_hive_topic
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic "kafka_hive_topic".
2. Create hive table
[donghua@hdp ~]$ beeline -u "jdbc:hive2://hdp.dbaglobe.com:2181/demodb;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n donghua -p x
Connecting to jdbc:hive2://hdp.dbaglobe.com:2181/demodb;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
19/03/09 16:44:13 [main]: INFO jdbc.HiveConnection: Connected to hdp:10000
Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
0: jdbc:hive2://hdp.dbaglobe.com:2181/demodb>
0: jdbc:hive2://hdp.dbaglobe.com:2181/demodb> CREATE EXTERNAL TABLE kafka_hive_table
. . . . . . . . . . . . . . . . . . . . . . > (`Country Name` string , `Language` string, `_id` struct<`$oid`:string>)
. . . . . . . . . . . . . . . . . . . . . . > STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
. . . . . . . . . . . . . . . . . . . . . . > TBLPROPERTIES
. . . . . . . . . . . . . . . . . . . . . . > ("kafka.topic" = "kafka_hive_topic", "kafka.bootstrap.servers"="hdp:6667");
No rows affected (4.747 seconds)
0: jdbc:hive2://hdp.dbaglobe.com:2181/demodb> desc kafka_hive_table;
+---------------+----------------------+--------------------+
| col_name | data_type | comment |
+---------------+----------------------+--------------------+
| country name | string | from deserializer |
| language | string | from deserializer |
| _id | struct<$oid:string> | from deserializer |
| __key | binary | from deserializer |
| __partition | int | from deserializer |
| __offset | bigint | from deserializer |
| __timestamp | bigint | from deserializer |
+---------------+----------------------+--------------------+
7 rows selected (0.359 seconds)
0: jdbc:hive2://hdp.dbaglobe.com:2181/demodb> !outputformat tsv2
0: jdbc:hive2://hdp.dbaglobe.com:2181/demodb> !brief
verbose: off
createtab_stmt
CREATE EXTERNAL TABLE `kafka_hive_table`(
`country name` string COMMENT 'from deserializer',
`language` string COMMENT 'from deserializer',
`_id` struct<$oid:string> COMMENT 'from deserializer',
`__key` binary COMMENT 'from deserializer',
`__partition` int COMMENT 'from deserializer',
`__offset` bigint COMMENT 'from deserializer',
`__timestamp` bigint COMMENT 'from deserializer')
ROW FORMAT SERDE
'org.apache.hadoop.hive.kafka.KafkaSerDe'
STORED BY
'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
WITH SERDEPROPERTIES (
'serialization.format'='1')
LOCATION
'hdfs://hdp.dbaglobe.com:8020/warehouse/tablespace/external/hive/demodb.db/kafka_hive_table'
TBLPROPERTIES (
'bucketing_version'='2',
'hive.kafka.max.retries'='6',
'hive.kafka.metadata.poll.timeout.ms'='30000',
'hive.kafka.optimistic.commit'='false',
'hive.kafka.poll.timeout.ms'='5000',
'kafka.bootstrap.servers'='hdp:6667',
'kafka.serde.class'='org.apache.hadoop.hive.serde2.JsonSerDe',
'kafka.topic'='kafka_hive_topic',
'kafka.write.semantic'='AT_LEAST_ONCE',
'transient_lastDdlTime'='1552121132')
27 rows selected (0.109 seconds)
0: jdbc:hive2://hdp.dbaglobe.com:2181/demodb>
3. Ingest some data into Kafka topic
[donghua@hdp ~]$ /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list hdp:6667 --topic kafka_hive_topic
>{"Country Name":"Afrika","Language":"af","_id":{"$oid":"55a0f1d420a4d760b5fbdbd6"},"ISO":0}
>{"Country Name":"Oseanië","Language":"af","_id":{"$oid":"55a0f1d420a4d760b5fbdbd7"},"ISO":0}
>^C
[donghua@hdp ~]$ /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --topic kafka_hive_topic --bootstrap-server hdp:6667 --from-beginning
{"Country Name":"Afrika","Language":"af","_id":{"$oid":"55a0f1d420a4d760b5fbdbd6"},"ISO":0}
{"Country Name":"Oseanië","Language":"af","_id":{"$oid":"55a0f1d420a4d760b5fbdbd7"},"ISO":0}
^C
Processed a total of 2 messages
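Each message must be a single-line JSON document whose keys match the Hive column names, so the Kafka SerDe can map it to columns. A small sketch of building the same payloads in Python (records copied from the console-producer session above):

```python
import json

records = [
    {"Country Name": "Afrika", "Language": "af",
     "_id": {"$oid": "55a0f1d420a4d760b5fbdbd6"}, "ISO": 0},
    {"Country Name": "Oseanië", "Language": "af",
     "_id": {"$oid": "55a0f1d420a4d760b5fbdbd7"}, "ISO": 0},
]

# One JSON document per Kafka message, keys matching the Hive column names
messages = [json.dumps(r, ensure_ascii=False) for r in records]
for m in messages:
    print(m)
```

These strings could then be sent with any Kafka producer client instead of kafka-console-producer.sh.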
4. Query hive table
0: jdbc:hive2://hdp.dbaglobe.com:2181/demodb> select t.`Country Name` as Name, t.`Language` as lang, t.`__offset` from kafka_hive_table t;
INFO : Compiling command(queryId=hive_20190309175638_f3a01692-28be-4683-bc66-32854615782c): select t.`Country Name` as Name, t.`Language` as lang, t.`__offset` from kafka_hive_table t
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:name, type:string, comment:null), FieldSchema(name:lang, type:string, comment:null), FieldSchema(name:t.__offset, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20190309175638_f3a01692-28be-4683-bc66-32854615782c); Time taken: 0.289 seconds
INFO : Executing command(queryId=hive_20190309175638_f3a01692-28be-4683-bc66-32854615782c): select t.`Country Name` as Name, t.`Language` as lang, t.`__offset` from kafka_hive_table t
INFO : Completed executing command(queryId=hive_20190309175638_f3a01692-28be-4683-bc66-32854615782c); Time taken: 0.006 seconds
INFO : OK
+----------+-------+-------------+
| name | lang | t.__offset |
+----------+-------+-------------+
| Afrika | af | 12 |
| Oseanië | af | 13 |
+----------+-------+-------------+
2 rows selected (0.379 seconds)
Additional Finding
If there are empty messages inside the Kafka topic, the Hive result could be wrong, as below:
Offsets 3 and 4 are empty strings.
Rows 3 and 4 are incorrect; they repeat the data from row 2.
Thursday, February 28, 2019
How to import Hive metadata to Atlas
How to import existing Hive tables that were created before Apache Atlas was added?
[hive@hdp ~]$ /usr/hdp/current/atlas-server/hook-bin/import-hive.sh
Using Hive configuration directory [/etc/hive/conf]
Log file for import is /usr/hdp/current/atlas-server/logs/import-hive.log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2019-02-28T18:20:05,890 INFO [main] org.apache.atlas.ApplicationProperties - Looking for atlas-application.properties in classpath
2019-02-28T18:20:05,897 INFO [main] org.apache.atlas.ApplicationProperties - Loading atlas-application.properties from file:/etc/hive/3.1.0.0-78/0/atlas-application.properties
2019-02-28T18:20:06,006 INFO [main] org.apache.atlas.ApplicationProperties - No graphdb backend specified. Will use 'janus'
2019-02-28T18:20:06,006 INFO [main] org.apache.atlas.ApplicationProperties - Using storage backend 'hbase2'
2019-02-28T18:20:06,006 INFO [main] org.apache.atlas.ApplicationProperties - Using index backend 'solr'
2019-02-28T18:20:06,007 INFO [main] org.apache.atlas.ApplicationProperties - Setting solr-wait-searcher property 'true'
2019-02-28T18:20:06,007 INFO [main] org.apache.atlas.ApplicationProperties - Setting index.search.map-name property 'false'
2019-02-28T18:20:06,011 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.db-cache = true
2019-02-28T18:20:06,011 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.db-cache-clean-wait = 20
2019-02-28T18:20:06,012 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.db-cache-size = 0.5
2019-02-28T18:20:06,012 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.tx-cache-size = 15000
2019-02-28T18:20:06,012 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.tx-dirty-size = 120
Enter username for atlas :- admin
Enter password for atlas :-
2019-02-28T18:20:10,984 INFO [main] org.apache.atlas.AtlasBaseClient - Client has only one service URL, will use that for all actions: http://hdp.dbaglobe.com:21000
2019-02-28T18:20:11,028 INFO [main] org.apache.hadoop.hive.conf.HiveConf - Found configuration file file:/etc/hive/3.1.0.0-78/0/hive-site.xml
2019-02-28T18:20:12,131 WARN [main] org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.fetch.partition.stats does not exist
2019-02-28T18:20:12,131 WARN [main] org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.heapsize does not exist
2019-02-28T18:20:13,617 WARN [main] org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-02-28T18:20:14,051 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStoreClient - Trying to connect to metastore with URI thrift://hdp:9083
2019-02-28T18:20:14,223 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStoreClient - Opened a connection to metastore, current connections: 1
2019-02-28T18:20:14,474 INFO [main] org.apache.hadoop.hive.metastore.HiveMetaStoreClient - Connected to metastore.
2019-02-28T18:20:14,474 INFO [main] org.apache.hadoop.hive.metastore.RetryingMetaStoreClient - RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=hive (auth:SIMPLE) retries=24 delay=5 lifetime=0
2019-02-28T18:20:15,314 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Importing Hive metadata
2019-02-28T18:20:15,356 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Found 4 databases
2019-02-28T18:20:15,717 INFO [main] org.apache.atlas.AtlasBaseClient - method=GET path=api/atlas/v2/entity/uniqueAttribute/type/ contentType=application/json; charset=UTF-8 accept=application/json status=200
2019-02-28T18:20:16,494 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Database default is already registered - id=620ebb57-a216-469f-8184-3def1b22da16. Updating it.
2019-02-28T18:20:16,711 INFO [main] org.apache.atlas.AtlasBaseClient - method=POST path=api/atlas/v2/entity/ contentType=application/json; charset=UTF-8 accept=application/json status=200
2019-02-28T18:20:16,815 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Updated hive_db entity: name=default@lake, guid=620ebb57-a216-469f-8184-3def1b22da16
2019-02-28T18:20:16,848 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - No tables to import in database default
2019-02-28T18:20:16,894 INFO [main] org.apache.atlas.AtlasBaseClient - method=GET path=api/atlas/v2/entity/uniqueAttribute/type/ contentType=application/json; charset=UTF-8 accept=application/json status=200
2019-02-28T18:20:16,896 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Database demodb is already registered - id=2ebf0dce-e258-4121-ac3f-16e38facce2e. Updating it.
2019-02-28T18:20:16,923 INFO [main] org.apache.atlas.AtlasBaseClient - method=POST path=api/atlas/v2/entity/ contentType=application/json; charset=UTF-8 accept=application/json status=200
2019-02-28T18:20:16,923 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Updated hive_db entity: name=demodb@lake, guid=2ebf0dce-e258-4121-ac3f-16e38facce2e
2019-02-28T18:20:16,928 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Found 5 tables to import in database demodb
2019-02-28T18:20:17,184 INFO [main] org.apache.atlas.AtlasBaseClient - method=GET path=api/atlas/v2/entity/uniqueAttribute/type/ contentType=application/json; charset=UTF-8 accept=application/json status=404
2019-02-28T18:20:17,234 WARN [main] org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.fetch.partition.stats does not exist
2019-02-28T18:20:17,234 WARN [main] org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.heapsize does not exist
2019-02-28T18:20:23,463 INFO [main] org.apache.atlas.AtlasBaseClient - method=POST path=api/atlas/v2/entity/ contentType=application/json; charset=UTF-8 accept=application/json status=200
2019-02-28T18:20:23,604 INFO [main] org.apache.atlas.AtlasBaseClient - method=GET path=api/atlas/v2/entity/guid/ contentType=application/json; charset=UTF-8 accept=application/json status=200
2019-02-28T18:20:23,607 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Created hive_table entity: name=demodb.person@lake, guid=af210ef7-9c1c-4faf-b07e-363dc51683aa
2019-02-28T18:20:23,637 INFO [main] org.apache.atlas.AtlasBaseClient - method=GET path=api/atlas/v2/entity/uniqueAttribute/type/ contentType=application/json; charset=UTF-8 accept=application/json status=404
2019-02-28T18:20:25,496 INFO [main] org.apache.atlas.AtlasBaseClient - method=POST path=api/atlas/v2/entity/ contentType=application/json; charset=UTF-8 accept=application/json status=200
2019-02-28T18:20:25,573 INFO [main] org.apache.atlas.AtlasBaseClient - method=GET path=api/atlas/v2/entity/guid/ contentType=application/json; charset=UTF-8 accept=application/json status=200
2019-02-28T18:20:25,575 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Created hive_table entity: name=demodb.zip@lake, guid=8aad59c6-ed05-43c9-8d89-e74ac712ee2b
2019-02-28T18:20:25,670 INFO [main] org.apache.atlas.AtlasBaseClient - method=GET path=api/atlas/v2/entity/uniqueAttribute/type/ contentType=application/json; charset=UTF-8 accept=application/json status=200
2019-02-28T18:20:25,673 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Table demodb.position is already registered with id 5476f1f0-8175-4b50-873b-e6589d0313bf. Updating entity.
2019-02-28T18:20:26,268 INFO [main] org.apache.atlas.AtlasBaseClient - method=POST path=api/atlas/v2/entity/ contentType=application/json; charset=UTF-8 accept=application/json status=200
...
2019-02-28T18:21:45,219 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Updated hive_table entity: name=sys.wm_mappings@lake, guid=505dd84a-0221-433d-bc9e-5a406062a48c
2019-02-28T18:21:45,232 INFO [main] org.apache.atlas.AtlasBaseClient - method=GET path=api/atlas/v2/entity/uniqueAttribute/type/ contentType=application/json; charset=UTF-8 accept=application/json status=200
2019-02-28T18:21:45,232 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Process sys.wm_mappings@lake:1549277088000 is already registered
2019-02-28T18:21:45,232 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Successfully imported 45 tables from database sys
Hive Meta Data imported successfully!!!
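The GET calls in the log above hit the Atlas v2 REST endpoint that looks up an entity by a unique attribute. A sketch of building such a request URL by hand; the host and entity names come from the log, but the exact query-string form (`attr:qualifiedName`) is an assumption:

```python
from urllib.parse import quote

base = "http://hdp.dbaglobe.com:21000"
type_name = "hive_table"
qualified_name = "demodb.person@lake"  # entity seen in the import log above

# v2 lookup-by-unique-attribute endpoint, as seen in the AtlasBaseClient log lines
url = (f"{base}/api/atlas/v2/entity/uniqueAttribute/type/{type_name}"
       f"?attr:qualifiedName={quote(qualified_name)}")
print(url)
```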
Tuesday, February 12, 2019
Error accessing DB: (2059, "Authentication plugin 'caching_sha2_password' cannot be loaded: /usr/lib64/mysql/plugin/caching_sha2_password.so: cannot open shared object file: No such file or directory")
Hue error with MySQL 8:
Error accessing DB: (2059, "Authentication plugin 'caching_sha2_password' cannot be loaded: /usr/lib64/mysql/plugin/caching_sha2_password.so: cannot open shared object file: No such file or directory")
To fix, run in MySQL:
ALTER USER hue IDENTIFIED WITH mysql_native_password BY 'password';
Saturday, February 2, 2019
Display IP address of the machine on the banner without login
Add following lines to /etc/rc.local:
# initial setup: cp /etc/issue /etc/issue.orig
cp /etc/issue.orig /etc/issue
ip addr | grep inet | grep -i enp | awk '{print $NF, $2}' | awk -F'/' '{print $1}' >> /etc/issue
echo "" >> /etc/issue
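The awk pipeline above can be mimicked in Python if preferred; a sketch that extracts interface/address pairs from `ip addr` output for enp* interfaces (the sample text is fabricated for illustration):

```python
import re

# Sample `ip addr` output (fabricated for illustration)
sample = """\
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 192.168.56.10/24 brd 192.168.56.255 scope global enp0s3
    inet6 fe80::1/64 scope link
"""

def banner_lines(text):
    """Return 'iface address' pairs for inet lines on enp* interfaces."""
    out = []
    for line in text.splitlines():
        m = re.search(r"inet (\d+\.\d+\.\d+\.\d+)/\d+.*\b(enp\S+)", line)
        if m:
            out.append(f"{m.group(2)} {m.group(1)}")
    return out

for line in banner_lines(sample):
    print(line)
```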
Monday, December 31, 2018
Demo code to show case Solr ingestion
# Written for Solr 7 (shipped with CDH6) and Python3
import tika
import json
import urllib3
import traceback
import os

tika.initVM()
from tika import parser

url = 'http://node02.dbaglobe.com:8983/solr/cms/update/json/docs?commit=true'

filelist = ['D:\\Temp\\Building Positive Relationships young children.pdf',
            'D:\\Temp\\Building Positive Relationships spouse n in laws.pdf']

http = urllib3.PoolManager()

for file in filelist:
    try:
        parsed = parser.from_file(file)

        # Add content to "combined" dict object
        combined = {}
        combined['id'] = os.path.basename(file)  # use file name as Doc ID
        combined.update(parsed["metadata"])
        combined['content'] = parsed["content"]

        combined_json = json.loads(json.dumps(combined))
        print(combined_json)

        # To clean up, execute a Solr delete query with *:*
        # To avoid the error "This ConfigSet is immutable.", use the URL below to create
        # the template (with configSetProp.immutable=false) before creating the collection:
        # http://node02:8983/solr/admin/configs?action=CREATE&name=myConfigSet&baseConfigSet=schemalessTemplate&configSetProp.immutable=false&wt=xml
        # To search: content:"Psychologist"
        response = http.request('POST', url,
                                body=json.dumps(combined_json),
                                headers={'Content-Type': 'application/json'})
        print(response.data)
    except:
        print(traceback.format_exc())
Sunday, December 16, 2018
Sentry and Hive permission explained
When using Sentry, the impersonation feature of HiveServer2 is disabled and each query runs in the cluster as the configured Hive principal. Thus, each HDFS location associated with a Hive table should be readable and writable by the Hive user or group.
If you are using the HDFS ACL synchronization feature, the required HDFS permissions (r-x for SELECT, -wx for INSERT, and rwx for ALL) on files are enforced automatically and maintained dynamically in response to changes in privilege grants on databases and tables. In our example, the alice user would be given r-x permission to files in tables in the sales database. Note that a grant on a URI object does not result in corresponding permissions on the location in HDFS.
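The privilege-to-permission mapping described above can be captured in a small lookup table; a sketch in Python (the mapping itself comes from the text, the helper function is hypothetical):

```python
# HDFS permission bits that Sentry's HDFS ACL synchronization applies
# for each privilege, per the text above
PRIV_TO_HDFS = {
    "SELECT": "r-x",
    "INSERT": "-wx",
    "ALL": "rwx",
}

def hdfs_perm(privilege: str) -> str:
    """Hypothetical helper: HDFS file permission for a Sentry privilege."""
    return PRIV_TO_HDFS[privilege.upper()]

# e.g. the alice user with SELECT on the sales database gets r-x on its files
print(hdfs_perm("select"))
```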