Saturday, December 23, 2017

Hive DDL and statistics update

[donghua@cdh-vm ~]$  beeline -u jdbc:hive2://cdh-vm.dbaglobe.com:10000/default -n donghua
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
scan complete in 2ms
Connecting to jdbc:hive2://cdh-vm.dbaglobe.com:10000/default
Connected to: Apache Hive (version 1.1.0-cdh5.13.1)
Driver: Hive JDBC (version 1.1.0-cdh5.13.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.13.0 by Apache Hive
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> create table employees (employee_id int, first_name string, last_name string, salary double);
INFO  : Compiling command(queryId=hive_20171223151818_3114edcb-3cc6-47fc-9506-bdf23dd3591b): create table employees (employee_id int, first_name string, last_name string, salary double)
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20171223151818_3114edcb-3cc6-47fc-9506-bdf23dd3591b); Time taken: 0.056 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20171223151818_3114edcb-3cc6-47fc-9506-bdf23dd3591b): create table employees (employee_id int, first_name string, last_name string, salary double)
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20171223151818_3114edcb-3cc6-47fc-9506-bdf23dd3591b); Time taken: 0.075 seconds
INFO  : OK
No rows affected (0.211 seconds)
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> insert into employees values(1,'Donghua','Luo',1000);
INFO  : Compiling command(queryId=hive_20171223151818_443d5c69-feb4-4179-9472-8f0c656bf325): insert into employees values(1,'Donghua','Luo',1000)
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, type:int, comment:null), FieldSchema(name:_col1, type:string, comment:null), FieldSchema(name:_col2, type:string, comment:null), FieldSchema(name:_col3, type:double, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20171223151818_443d5c69-feb4-4179-9472-8f0c656bf325); Time taken: 0.112 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20171223151818_443d5c69-feb4-4179-9472-8f0c656bf325): insert into employees values(1,'Donghua','Luo',1000)
INFO  : Query ID = hive_20171223151818_443d5c69-feb4-4179-9472-8f0c656bf325
INFO  : Total jobs = 3
INFO  : Launching Job 1 out of 3
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1513984921012_0013
INFO  : The url to track the job:
http://cdh-vm.dbaglobe.com:8088/proxy/application_1513984921012_0013/
INFO  : Starting Job = job_1513984921012_0013, Tracking URL = http://cdh-vm.dbaglobe.com:8088/proxy/application_1513984921012_0013/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hadoop/bin/hadoop job  -kill job_1513984921012_0013
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO  : 2017-12-23 15:19:03,237 Stage-1 map = 0%,  reduce = 0%
INFO  : 2017-12-23 15:19:09,639 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.54 sec
INFO  : MapReduce Total cumulative CPU time: 1 seconds 540 msec
INFO  : Ended Job = job_1513984921012_0013
INFO  : Starting task [Stage-7:CONDITIONAL] in serial mode
INFO  : Stage-4 is selected by condition resolver.
INFO  : Stage-3 is filtered out by condition resolver.
INFO  : Stage-5 is filtered out by condition resolver.
INFO  : Starting task [Stage-4:MOVE] in serial mode
INFO  : Moving data to: hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/employees/.hive-staging_hive_2017-12-23_15-18-55_608_8119375440349524248-6/-ext-10000 from hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/employees/.hive-staging_hive_2017-12-23_15-18-55_608_8119375440349524248-6/-ext-10002
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Loading data to table default.employees from hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/employees/.hive-staging_hive_2017-12-23_15-18-55_608_8119375440349524248-6/-ext-10000
INFO  : Starting task [Stage-2:STATS] in serial mode
INFO  : Table default.employees stats: [numFiles=1, numRows=1, totalSize=21, rawDataSize=20]
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-1: Map: 1   Cumulative CPU: 1.54 sec   HDFS Read: 4438 HDFS Write: 94 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 1 seconds 540 msec
INFO  : Completed executing command(queryId=hive_20171223151818_443d5c69-feb4-4179-9472-8f0c656bf325); Time taken: 15.312 seconds
INFO  : OK
No rows affected (15.435 seconds)
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> insert into employees values(2,'Larry','Elison',8000),(3,'Tom','Kyte',5000.1),(4,'Tiger','Hood',3443.4);
INFO  : Compiling command(queryId=hive_20171223152020_e219ddd3-4ea0-488d-a918-64d4951c140e): insert into employees values(2,'Larry','Elison',8000),(3,'Tom','Kyte',5000.1),(4,'Tiger','Hood',3443.4)
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, type:int, comment:null), FieldSchema(name:_col1, type:string, comment:null), FieldSchema(name:_col2, type:string, comment:null), FieldSchema(name:_col3, type:double, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20171223152020_e219ddd3-4ea0-488d-a918-64d4951c140e); Time taken: 0.202 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20171223152020_e219ddd3-4ea0-488d-a918-64d4951c140e): insert into employees values(2,'Larry','Elison',8000),(3,'Tom','Kyte',5000.1),(4,'Tiger','Hood',3443.4)
INFO  : Query ID = hive_20171223152020_e219ddd3-4ea0-488d-a918-64d4951c140e
INFO  : Total jobs = 3
INFO  : Launching Job 1 out of 3
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1513984921012_0014
INFO  : The url to track the job:
http://cdh-vm.dbaglobe.com:8088/proxy/application_1513984921012_0014/
INFO  : Starting Job = job_1513984921012_0014, Tracking URL = http://cdh-vm.dbaglobe.com:8088/proxy/application_1513984921012_0014/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hadoop/bin/hadoop job  -kill job_1513984921012_0014
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO  : 2017-12-23 15:20:27,877 Stage-1 map = 0%,  reduce = 0%
INFO  : 2017-12-23 15:20:34,291 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.81 sec
INFO  : MapReduce Total cumulative CPU time: 1 seconds 810 msec
INFO  : Ended Job = job_1513984921012_0014
INFO  : Starting task [Stage-7:CONDITIONAL] in serial mode
INFO  : Stage-4 is selected by condition resolver.
INFO  : Stage-3 is filtered out by condition resolver.
INFO  : Stage-5 is filtered out by condition resolver.
INFO  : Starting task [Stage-4:MOVE] in serial mode
INFO  : Moving data to: hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/employees/.hive-staging_hive_2017-12-23_15-20-19_354_7141715692978521501-6/-ext-10000 from hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/employees/.hive-staging_hive_2017-12-23_15-20-19_354_7141715692978521501-6/-ext-10002
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Loading data to table default.employees from hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/employees/.hive-staging_hive_2017-12-23_15-20-19_354_7141715692978521501-6/-ext-10000
INFO  : Starting task [Stage-2:STATS] in serial mode
INFO  : Table default.employees stats: [numFiles=2, numRows=4, totalSize=81, rawDataSize=77]
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-1: Map: 1   Cumulative CPU: 1.81 sec   HDFS Read: 4561 HDFS Write: 133 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 1 seconds 810 msec
INFO  : Completed executing command(queryId=hive_20171223152020_e219ddd3-4ea0-488d-a918-64d4951c140e); Time taken: 16.345 seconds
INFO  : OK
No rows affected (16.572 seconds)
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> select * from employees;
INFO  : Compiling command(queryId=hive_20171223152020_492fc281-4e23-4590-9e7e-776efceb3b57): select * from employees
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:employees.employee_id, type:int, comment:null), FieldSchema(name:employees.first_name, type:string, comment:null), FieldSchema(name:employees.last_name, type:string, comment:null), FieldSchema(name:employees.salary, type:double, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20171223152020_492fc281-4e23-4590-9e7e-776efceb3b57); Time taken: 0.055 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20171223152020_492fc281-4e23-4590-9e7e-776efceb3b57): select * from employees
INFO  : Completed executing command(queryId=hive_20171223152020_492fc281-4e23-4590-9e7e-776efceb3b57); Time taken: 0.001 seconds
INFO  : OK
+------------------------+-----------------------+----------------------+-------------------+--+
| employees.employee_id  | employees.first_name  | employees.last_name  | employees.salary  |
+------------------------+-----------------------+----------------------+-------------------+--+
| 1                      | Donghua               | Luo                  | 1000.0            |
| 2                      | Larry                 | Elison               | 8000.0            |
| 3                      | Tom                   | Kyte                 | 5000.1            |
| 4                      | Tiger                 | Hood                 | 3443.4            |
+------------------------+-----------------------+----------------------+-------------------+--+
4 rows selected (0.142 seconds)
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> desc employees;
INFO  : Compiling command(queryId=hive_20171223152020_470cfae6-e40f-4e9d-90ad-f9c4da76129d): desc employees
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string, comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20171223152020_470cfae6-e40f-4e9d-90ad-f9c4da76129d); Time taken: 0.045 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20171223152020_470cfae6-e40f-4e9d-90ad-f9c4da76129d): desc employees
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20171223152020_470cfae6-e40f-4e9d-90ad-f9c4da76129d); Time taken: 0.021 seconds
INFO  : OK
+--------------+------------+----------+--+
|   col_name   | data_type  | comment  |
+--------------+------------+----------+--+
| employee_id  | int        |          |
| first_name   | string     |          |
| last_name    | string     |          |
| salary       | double     |          |
+--------------+------------+----------+--+

4 rows selected (0.083 seconds)
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> desc extended employees;
INFO  : Compiling command(queryId=hive_20171223152121_c43024b4-3d7c-4347-b4f6-7d7b496b9813): desc extended employees
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string, comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20171223152121_c43024b4-3d7c-4347-b4f6-7d7b496b9813); Time taken: 0.031 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20171223152121_c43024b4-3d7c-4347-b4f6-7d7b496b9813): desc extended employees
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20171223152121_c43024b4-3d7c-4347-b4f6-7d7b496b9813); Time taken: 0.011 seconds
INFO  : OK
+-----------------------------+----------------------------------------------------+----------+--+
|          col_name           |                     data_type                      | comment  |
+-----------------------------+----------------------------------------------------+----------+--+
| employee_id                 | int                                                |          |
| first_name                  | string                                             |          |
| last_name                   | string                                             |          |
| salary                      | double                                             |          |
|                             | NULL                                               | NULL     |
| Detailed Table Information  | Table(tableName:employees, dbName:default, owner:donghua, createTime:1514013512, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:employee_id, type:int, comment:null), FieldSchema(name:first_name, type:string, comment:null), FieldSchema(name:last_name, type:string, comment:null), FieldSchema(name:salary, type:double, comment:null)], location:hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/employees, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=81, numRows=4, rawDataSize=77, COLUMN_STATS_ACCURATE=true, numFiles=2, transient_lastDdlTime=1514013635}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE) |          |
+-----------------------------+----------------------------------------------------+----------+--+
6 rows selected (0.066 seconds)
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> update employees set salary=5000 where employee_id=1;
Error: Error while compiling statement: FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations. (state=42000,code=10294)
0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> analyze table employees compute statistics;
INFO  : Compiling command(queryId=hive_20171223152222_20671839-fcae-4ce2-b96b-b5c6b22e62d8): analyze table employees compute statistics
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:employees.employee_id, type:int, comment:null), FieldSchema(name:employees.first_name, type:string, comment:null), FieldSchema(name:employees.last_name, type:string, comment:null), FieldSchema(name:employees.salary, type:double, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20171223152222_20671839-fcae-4ce2-b96b-b5c6b22e62d8); Time taken: 0.087 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20171223152222_20671839-fcae-4ce2-b96b-b5c6b22e62d8): analyze table employees compute statistics
INFO  : Query ID = hive_20171223152222_20671839-fcae-4ce2-b96b-b5c6b22e62d8
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-0:MAPRED] in serial mode
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1513984921012_0015
INFO  : The url to track the job:
http://cdh-vm.dbaglobe.com:8088/proxy/application_1513984921012_0015/
INFO  : Starting Job = job_1513984921012_0015, Tracking URL = http://cdh-vm.dbaglobe.com:8088/proxy/application_1513984921012_0015/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hadoop/bin/hadoop job  -kill job_1513984921012_0015
INFO  : Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
INFO  : 2017-12-23 15:22:57,900 Stage-0 map = 0%,  reduce = 0%
INFO  : 2017-12-23 15:23:03,275 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 1.09 sec
INFO  : MapReduce Total cumulative CPU time: 1 seconds 90 msec
INFO  : Ended Job = job_1513984921012_0015
INFO  : Starting task [Stage-1:STATS] in serial mode
INFO  : Table default.employees stats: [numFiles=2, numRows=4, totalSize=81, rawDataSize=77]
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-0: Map: 1   Cumulative CPU: 1.09 sec   HDFS Read: 3054 HDFS Write: 73 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 1 seconds 90 msec
INFO  : Completed executing command(queryId=hive_20171223152222_20671839-fcae-4ce2-b96b-b5c6b22e62d8); Time taken: 14.073 seconds
INFO  : OK
No rows affected (14.176 seconds)

Constraint NOT supported (added in Hive 2.1.0 https://issues.apache.org/jira/browse/HIVE-13290)


0: jdbc:hive2://cdh-vm.dbaglobe.com:10000/def> create table employees2 (employee_id int, first_name string, last_name string, salary double, primary key(employee_id) disable novalidate);
Error: Error while compiling statement: FAILED: ParseException line 1:102 cannot recognize input near 'key' '(' 'employee_id' in column type (state=42000,code=40000)