Wednesday, April 17, 2019

Differences between Parquet and ORC formats

ORC format

On an ORC table, replace columns fails when the new column list drops a column:

0: jdbc:hive2://hdp.dbaglobe.com:10000/demodb> create external table managed_t1 (id int, name string, remark string) stored as orc location '/data/managed_t1' tblproperties('transactional'='false');

0: jdbc:hive2://hdp.dbaglobe.com:10000/demodb> alter table managed_t1 replace columns (id int, name string);
INFO  : Compiling command(queryId=hive_20190417210521_b1634dd1-280f-41c2-96a1-d0e921c64041): alter table managed_t1 replace columns (id int, name string)
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20190417210521_b1634dd1-280f-41c2-96a1-d0e921c64041); Time taken: 0.047 seconds
INFO  : Executing command(queryId=hive_20190417210521_b1634dd1-280f-41c2-96a1-d0e921c64041): alter table managed_t1 replace columns (id int, name string)
INFO  : Starting task [Stage-0:DDL] in serial mode
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Replacing columns cannot drop columns for table demodb.managed_t1. SerDe may be incompatible
INFO  : Completed executing command(queryId=hive_20190417210521_b1634dd1-280f-41c2-96a1-d0e921c64041); Time taken: 0.021 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Replacing columns cannot drop columns for table demodb.managed_t1. SerDe may be incompatible (state=42000,code=1)
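
Since the ORC table rejects the column drop, one possible workaround is to copy the wanted columns into a new table. This is only a sketch: the table name managed_t1_slim and the location '/data/managed_t1_slim' are hypothetical and do not appear in the session above.

```sql
-- Hypothetical target table and location (assumptions, not from the session above).
create external table managed_t1_slim (id int, name string)
  stored as orc
  location '/data/managed_t1_slim'
  tblproperties('transactional'='false');

-- Copy only the columns to keep; the remark column is left behind.
insert overwrite table managed_t1_slim
select id, name from managed_t1;
```

Unlike replace columns, this rewrites the data, so it may be slow on a large table.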

Parquet format

On a Parquet table, the same replace columns statement succeeds:

0: jdbc:hive2://hdp.dbaglobe.com:10000/demodb> create external table managed_t1 (id int, name string, remark string) stored as parquet location '/data/managed_t1' tblproperties('transactional'='false');

0: jdbc:hive2://hdp.dbaglobe.com:10000/demodb> alter table managed_t1 replace columns (id int, name string);
INFO  : Compiling command(queryId=hive_20190417210231_90e8f8bd-bb03-4aa3-b138-e693ca59dc1b): alter table managed_t1 replace columns (id int, name string)
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20190417210231_90e8f8bd-bb03-4aa3-b138-e693ca59dc1b); Time taken: 0.088 seconds
INFO  : Executing command(queryId=hive_20190417210231_90e8f8bd-bb03-4aa3-b138-e693ca59dc1b): alter table managed_t1 replace columns (id int, name string)
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20190417210231_90e8f8bd-bb03-4aa3-b138-e693ca59dc1b); Time taken: 0.148 seconds
INFO  : OK
No rows affected (0.278 seconds)
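
To confirm the change, the table schema can be inspected afterwards. A minimal check, assuming the same session and database as above:

```sql
-- Lists the columns currently recorded in the metastore for the table.
describe managed_t1;
```

Note that replace columns changes only the table metadata. The Parquet files under '/data/managed_t1' are not rewritten, so any existing data files still contain the remark column.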

Sunday, April 7, 2019