Saturday, January 27, 2018

Use Sqoop to import into Hive tables (source is a view)

In the MySQL employees sample database, current_dept_emp is a view rather than a base table, so it has no primary key for Sqoop to split on. The walkthrough below imports it into HDFS and then into Hive, first with a single mapper and then with an explicit --split-by column. To begin, put a MySQL JDBC driver on Sqoop's classpath; the CDH parcel's sqoop/lib directory does not ship one:

[donghua@cdh-vm test_db-master]$ ls -l /opt/cloudera/parcels/CDH/lib/sqoop/lib/
total 0
lrwxrwxrwx 1 root root 35 Nov  9 13:49 ant-contrib-1.0b3.jar -> ../../../jars/ant-contrib-1.0b3.jar
lrwxrwxrwx 1 root root 40 Nov  9 13:49 ant-eclipse-1.0-jvm1.2.jar -> ../../../jars/ant-eclipse-1.0-jvm1.2.jar
lrwxrwxrwx 1 root root 41 Nov  9 13:42 avro-mapred-hadoop2.jar -> ../../../lib/avro/avro-mapred-hadoop2.jar
lrwxrwxrwx 1 root root 26 Nov  9 13:42 avro.jar -> ../../../lib/avro/avro.jar
lrwxrwxrwx 1 root root 35 Nov  9 13:49 commons-codec-1.4.jar -> ../../../jars/commons-codec-1.4.jar
lrwxrwxrwx 1 root root 40 Nov  9 13:49 commons-compress-1.4.1.jar -> ../../../jars/commons-compress-1.4.1.jar
lrwxrwxrwx 1 root root 32 Nov  9 13:49 commons-io-1.4.jar -> ../../../jars/commons-io-1.4.jar
lrwxrwxrwx 1 root root 36 Nov  9 13:49 commons-jexl-2.1.1.jar -> ../../../jars/commons-jexl-2.1.1.jar
lrwxrwxrwx 1 root root 35 Nov  9 13:49 commons-lang3-3.4.jar -> ../../../jars/commons-lang3-3.4.jar
lrwxrwxrwx 1 root root 39 Nov  9 13:49 commons-logging-1.1.3.jar -> ../../../jars/commons-logging-1.1.3.jar
lrwxrwxrwx 1 root root 30 Nov  9 13:49 fastutil-6.3.jar -> ../../../jars/fastutil-6.3.jar
lrwxrwxrwx 1 root root 33 Nov  9 13:49 hsqldb-1.8.0.10.jar -> ../../../jars/hsqldb-1.8.0.10.jar
lrwxrwxrwx 1 root root 43 Nov  9 13:49 jackson-annotations-2.3.1.jar -> ../../../jars/jackson-annotations-2.3.1.jar
lrwxrwxrwx 1 root root 36 Nov  9 13:49 jackson-core-2.3.1.jar -> ../../../jars/jackson-core-2.3.1.jar
lrwxrwxrwx 1 root root 40 Nov  9 13:49 jackson-core-asl-1.8.8.jar -> ../../../jars/jackson-core-asl-1.8.8.jar
lrwxrwxrwx 1 root root 40 Nov  9 13:49 jackson-databind-2.3.1.jar -> ../../../jars/jackson-databind-2.3.1.jar
lrwxrwxrwx 1 root root 42 Nov  9 13:49 jackson-mapper-asl-1.8.8.jar -> ../../../jars/jackson-mapper-asl-1.8.8.jar
lrwxrwxrwx 1 root root 36 Nov  9 13:42 kite-data-core.jar -> ../../../lib/kite/kite-data-core.jar
lrwxrwxrwx 1 root root 36 Nov  9 13:42 kite-data-hive.jar -> ../../../lib/kite/kite-data-hive.jar
lrwxrwxrwx 1 root root 41 Nov  9 13:42 kite-data-mapreduce.jar -> ../../../lib/kite/kite-data-mapreduce.jar
lrwxrwxrwx 1 root root 47 Nov  9 13:42 kite-hadoop-compatibility.jar -> ../../../lib/kite/kite-hadoop-compatibility.jar
lrwxrwxrwx 1 root root 29 Nov  9 13:49 opencsv-2.3.jar -> ../../../jars/opencsv-2.3.jar
lrwxrwxrwx 1 root root 31 Nov  9 13:49 paranamer-2.3.jar -> ../../../jars/paranamer-2.3.jar
lrwxrwxrwx 1 root root 37 Nov  9 13:42 parquet-avro.jar -> ../../../lib/parquet/parquet-avro.jar
lrwxrwxrwx 1 root root 39 Nov  9 13:42 parquet-column.jar -> ../../../lib/parquet/parquet-column.jar
lrwxrwxrwx 1 root root 39 Nov  9 13:42 parquet-common.jar -> ../../../lib/parquet/parquet-common.jar
lrwxrwxrwx 1 root root 41 Nov  9 13:42 parquet-encoding.jar -> ../../../lib/parquet/parquet-encoding.jar
lrwxrwxrwx 1 root root 39 Nov  9 13:42 parquet-format.jar -> ../../../lib/parquet/parquet-format.jar
lrwxrwxrwx 1 root root 39 Nov  9 13:42 parquet-hadoop.jar -> ../../../lib/parquet/parquet-hadoop.jar
lrwxrwxrwx 1 root root 40 Nov  9 13:42 parquet-jackson.jar -> ../../../lib/parquet/parquet-jackson.jar
lrwxrwxrwx 1 root root 33 Nov  9 13:49 slf4j-api-1.7.5.jar -> ../../../jars/slf4j-api-1.7.5.jar
lrwxrwxrwx 1 root root 37 Nov  9 13:49 snappy-java-1.0.4.1.jar -> ../../../jars/snappy-java-1.0.4.1.jar
lrwxrwxrwx 1 root root 24 Nov  9 13:49 xz-1.0.jar -> ../../../jars/xz-1.0.jar
[donghua@cdh-vm test_db-master]$ 
[donghua@cdh-vm test_db-master]$ sudo ln -s /usr/share/java/mysql-connector-java.jar /opt/cloudera/parcels/CDH/lib/sqoop/lib/
[sudo] password for donghua: 
[donghua@cdh-vm test_db-master]$ readlink /opt/cloudera/parcels/CDH/lib/sqoop/lib/mysql-connector-java.jar
/usr/share/java/mysql-connector-java.jar
[donghua@cdh-vm test_db-master]$ 

Next, create a dedicated MySQL user for Sqoop with privileges on the employees schema:

MariaDB [(none)]> create user employee_user identified by 'password';
Query OK, 0 rows affected (0.07 sec)

MariaDB [(none)]> grant all on employees.* to employee_user;
Query OK, 0 rows affected (0.04 sec)

MariaDB [(none)]> show grants for employee_user;
+--------------------------------------------------------------------------------------------------------------+
| Grants for employee_user@%                                                                                   |
+--------------------------------------------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'employee_user'@'%' IDENTIFIED BY PASSWORD '*2470C0C06DEE42FD1618BB99005ADCA2EC9D1E19' |
| GRANT ALL PRIVILEGES ON `employees`.* TO 'employee_user'@'%'                                                 |
+--------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
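
Note that current_dept_emp is not a base table; it is one of the views shipped with the employees sample schema, which is exactly why the import below needs special handling. A quick way to confirm this (output not captured here) is to list the views and look at the definition:

MariaDB [(none)]> use employees;
MariaDB [employees]> show full tables where Table_type = 'VIEW';
MariaDB [employees]> show create view current_dept_emp\G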

[donghua@cdh-vm test_db-master]$ sqoop list-databases --connect jdbc:mysql://cdh-vm.dbaglobe.com --username employee_user --password password 
Warning: /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/01/26 23:32:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.1
18/01/26 23:32:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/26 23:32:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
employees

[donghua@cdh-vm test_db-master]$ sqoop list-tables --connect jdbc:mysql://cdh-vm.dbaglobe.com/employees --username employee_user --password password 
Warning: /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/01/26 23:33:17 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.1
18/01/26 23:33:17 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/26 23:33:17 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
current_dept_emp
departments
dept_emp
dept_emp_latest_date
dept_manager
employees
salaries
titles

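Sqoop warns on every run that --password on the command line is insecure. As an alternative sketch (not used in this walkthrough), the password can be kept in a protected HDFS file and passed with --password-file, or prompted for interactively with -P:

echo -n password > sqoop.pwd
hdfs dfs -put sqoop.pwd /user/donghua/sqoop.pwd
hdfs dfs -chmod 400 /user/donghua/sqoop.pwd
sqoop list-tables --connect jdbc:mysql://cdh-vm.dbaglobe.com/employees \
    --username employee_user --password-file /user/donghua/sqoop.pwd
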
[donghua@cdh-vm test_db-master]$ sqoop import --connect jdbc:mysql://cdh-vm.dbaglobe.com/employees --username employee_user --password password --table current_dept_emp
Warning: /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/01/26 23:37:48 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.1
18/01/26 23:37:48 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/26 23:37:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/01/26 23:37:49 INFO tool.CodeGenTool: Beginning code generation
18/01/26 23:37:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `current_dept_emp` AS t LIMIT 1
18/01/26 23:37:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `current_dept_emp` AS t LIMIT 1
18/01/26 23:37:49 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-donghua/compile/f0cac41ee0eb9df573aa4341b36a671d/current_dept_emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/01/26 23:37:51 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-donghua/compile/f0cac41ee0eb9df573aa4341b36a671d/current_dept_emp.jar
18/01/26 23:37:51 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/01/26 23:37:51 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/01/26 23:37:51 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/01/26 23:37:51 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/01/26 23:37:51 ERROR tool.ImportTool: Import failed: No primary key could be found for table current_dept_emp. Please specify one with --split-by or perform a sequential import with '-m 1'.

Because current_dept_emp is a view, Sqoop cannot find a primary key to split the import on. Either force a sequential import with -m 1 or name a split column with --split-by. First, the single-mapper import:

[donghua@cdh-vm test_db-master]$ sqoop import --connect jdbc:mysql://cdh-vm.dbaglobe.com/employees --username employee_user --password password --table current_dept_emp -m 1
Warning: /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/01/26 23:38:08 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.1
18/01/26 23:38:08 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/26 23:38:08 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/01/26 23:38:08 INFO tool.CodeGenTool: Beginning code generation
18/01/26 23:38:09 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `current_dept_emp` AS t LIMIT 1
18/01/26 23:38:09 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `current_dept_emp` AS t LIMIT 1
18/01/26 23:38:09 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-donghua/compile/3cb418ffe5487ad8ed8b36689ec598f4/current_dept_emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/01/26 23:38:10 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-donghua/compile/3cb418ffe5487ad8ed8b36689ec598f4/current_dept_emp.jar
18/01/26 23:38:11 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/01/26 23:38:11 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/01/26 23:38:11 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/01/26 23:38:11 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/01/26 23:38:11 INFO mapreduce.ImportJobBase: Beginning import of current_dept_emp
18/01/26 23:38:11 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/01/26 23:38:12 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/01/26 23:38:12 INFO client.RMProxy: Connecting to ResourceManager at cdh-vm.dbaglobe.com/192.168.56.10:8032
18/01/26 23:38:17 INFO db.DBInputFormat: Using read commited transaction isolation
18/01/26 23:38:18 INFO mapreduce.JobSubmitter: number of splits:1
18/01/26 23:38:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1517023991003_0001
18/01/26 23:38:19 INFO impl.YarnClientImpl: Submitted application application_1517023991003_0001
18/01/26 23:38:19 INFO mapreduce.Job: The url to track the job: http://cdh-vm.dbaglobe.com:8088/proxy/application_1517023991003_0001/
18/01/26 23:38:19 INFO mapreduce.Job: Running job: job_1517023991003_0001
18/01/26 23:38:30 INFO mapreduce.Job: Job job_1517023991003_0001 running in uber mode : false
18/01/26 23:38:30 INFO mapreduce.Job:  map 0% reduce 0%
18/01/26 23:38:42 INFO mapreduce.Job:  map 100% reduce 0%
18/01/26 23:38:43 INFO mapreduce.Job: Job job_1517023991003_0001 completed successfully
18/01/26 23:38:43 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=173876
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=10110817
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters 
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=8922
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=8922
Total vcore-milliseconds taken by all map tasks=8922
Total megabyte-milliseconds taken by all map tasks=13704192
Map-Reduce Framework
Map input records=300024
Map output records=300024
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=109
CPU time spent (ms)=3330
Physical memory (bytes) snapshot=281448448
Virtual memory (bytes) snapshot=2788491264
Total committed heap usage (bytes)=246939648
File Input Format Counters 
Bytes Read=0
File Output Format Counters 
Bytes Written=10110817
18/01/26 23:38:43 INFO mapreduce.ImportJobBase: Transferred 9.6424 MB in 31.2284 seconds (316.1811 KB/sec)
18/01/26 23:38:43 INFO mapreduce.ImportJobBase: Retrieved 300024 records.
[donghua@cdh-vm test_db-master]$ hdfs dfs -ls 
Found 3 items
drwx------   - donghua supergroup          0 2018-01-26 23:38 .staging
drwxr-xr-x   - donghua supergroup          0 2018-01-26 23:38 current_dept_emp
-rw-r--r--   1 donghua supergroup         15 2018-01-20 04:41 test.csv
[donghua@cdh-vm test_db-master]$ 
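
With -m 1 the import runs as a single map task and lands under the user's HDFS home directory, typically as one part-m-00000 file. A quick look at the raw output (not captured here):

hdfs dfs -ls current_dept_emp
hdfs dfs -cat current_dept_emp/part-m-00000 | head -3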

Create the target Hive database before running the Hive import:

0: jdbc:hive2://localhost:10000/default> create database employees;
INFO  : Compiling command(queryId=hive_20180126234646_4c4d2716-9d75-4786-8c31-1ee517688165): create database employees
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20180126234646_4c4d2716-9d75-4786-8c31-1ee517688165); Time taken: 0.043 seconds
INFO  : Executing command(queryId=hive_20180126234646_4c4d2716-9d75-4786-8c31-1ee517688165): create database employees
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20180126234646_4c4d2716-9d75-4786-8c31-1ee517688165); Time taken: 0.182 seconds
INFO  : OK
No rows affected (0.351 seconds)

0: jdbc:hive2://localhost:10000/default> !sh hdfs dfs -ls /user/hive/warehouse/
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Found 1 items
drwxrwxrwt   - anonymous hive          0 2018-01-26 23:46 /user/hive/warehouse/employees.db

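For the Hive import itself there are two related tools. sqoop create-hive-table only generates the Hive table definition from the source metadata, without moving any data; a sketch (not run here):

sqoop create-hive-table --connect jdbc:mysql://cdh-vm.dbaglobe.com/employees \
    --username employee_user -P \
    --table current_dept_emp --hive-table employees.current_dept_emp

The run below instead uses sqoop import with --hive-import and --create-hive-table, so the table is created and loaded in one pass; --split-by=emp_no replaces -m 1 now that a split column is named explicitly.
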
[donghua@cdh-vm test_db-master]$ sqoop import --connect jdbc:mysql://cdh-vm.dbaglobe.com/employees --username employee_user --password password --table current_dept_emp --split-by=emp_no --hive-import --create-hive-table --hive-table=employees.current_dept_emp --warehouse-dir=/user/hive/warehouse
Warning: /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/01/26 23:56:32 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.1
18/01/26 23:56:32 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/26 23:56:32 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
18/01/26 23:56:32 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
18/01/26 23:56:32 WARN tool.BaseSqoopTool: It seems that you're doing hive import directly into default
18/01/26 23:56:32 WARN tool.BaseSqoopTool: hive warehouse directory which is not supported. Sqoop is
18/01/26 23:56:32 WARN tool.BaseSqoopTool: firstly importing data into separate directory and then
18/01/26 23:56:32 WARN tool.BaseSqoopTool: inserting data into hive. Please consider removing
18/01/26 23:56:32 WARN tool.BaseSqoopTool: --target-dir or --warehouse-dir into /user/hive/warehouse in
18/01/26 23:56:32 WARN tool.BaseSqoopTool: case that you will detect any issues.
18/01/26 23:56:32 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/01/26 23:56:32 INFO tool.CodeGenTool: Beginning code generation
18/01/26 23:56:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `current_dept_emp` AS t LIMIT 1
18/01/26 23:56:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `current_dept_emp` AS t LIMIT 1
18/01/26 23:56:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-donghua/compile/35ced35e8590fbbd798fa058e0584fed/current_dept_emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/01/26 23:56:35 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-donghua/compile/35ced35e8590fbbd798fa058e0584fed/current_dept_emp.jar
18/01/26 23:56:35 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/01/26 23:56:35 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/01/26 23:56:35 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/01/26 23:56:35 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/01/26 23:56:35 INFO mapreduce.ImportJobBase: Beginning import of current_dept_emp
18/01/26 23:56:35 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/01/26 23:56:36 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/01/26 23:56:36 INFO client.RMProxy: Connecting to ResourceManager at cdh-vm.dbaglobe.com/192.168.56.10:8032
18/01/26 23:56:41 INFO db.DBInputFormat: Using read commited transaction isolation
18/01/26 23:56:41 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`emp_no`), MAX(`emp_no`) FROM `current_dept_emp`
18/01/26 23:56:41 INFO db.IntegerSplitter: Split size: 122499; Num splits: 4 from: 10001 to: 499999
18/01/26 23:56:42 INFO mapreduce.JobSubmitter: number of splits:4
18/01/26 23:56:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1517023991003_0003
18/01/26 23:56:42 INFO impl.YarnClientImpl: Submitted application application_1517023991003_0003
18/01/26 23:56:42 INFO mapreduce.Job: The url to track the job: http://cdh-vm.dbaglobe.com:8088/proxy/application_1517023991003_0003/
18/01/26 23:56:42 INFO mapreduce.Job: Running job: job_1517023991003_0003
18/01/26 23:56:50 INFO mapreduce.Job: Job job_1517023991003_0003 running in uber mode : false
18/01/26 23:56:50 INFO mapreduce.Job:  map 0% reduce 0%
18/01/26 23:56:58 INFO mapreduce.Job:  map 25% reduce 0%
18/01/26 23:57:03 INFO mapreduce.Job:  map 50% reduce 0%
18/01/26 23:57:08 INFO mapreduce.Job:  map 75% reduce 0%
18/01/26 23:57:13 INFO mapreduce.Job:  map 100% reduce 0%
18/01/26 23:57:14 INFO mapreduce.Job: Job job_1517023991003_0003 completed successfully
18/01/26 23:57:14 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=698232
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=464
HDFS: Number of bytes written=10110817
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters 
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=17721
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=17721
Total vcore-milliseconds taken by all map tasks=17721
Total megabyte-milliseconds taken by all map tasks=27219456
Map-Reduce Framework
Map input records=300024
Map output records=300024
Input split bytes=464
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=313
CPU time spent (ms)=8810
Physical memory (bytes) snapshot=927260672
Virtual memory (bytes) snapshot=11156475904
Total committed heap usage (bytes)=836239360
File Input Format Counters 
Bytes Read=0
File Output Format Counters 
Bytes Written=10110817
18/01/26 23:57:14 INFO mapreduce.ImportJobBase: Transferred 9.6424 MB in 38.4431 seconds (256.8429 KB/sec)
18/01/26 23:57:14 INFO mapreduce.ImportJobBase: Retrieved 300024 records.
18/01/26 23:57:14 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `current_dept_emp` AS t LIMIT 1
18/01/26 23:57:15 WARN hive.TableDefWriter: Column from_date had to be cast to a less precise type in Hive
18/01/26 23:57:15 WARN hive.TableDefWriter: Column to_date had to be cast to a less precise type in Hive
18/01/26 23:57:15 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/jars/hive-common-1.1.0-cdh5.13.1.jar!/hive-log4j.properties
OK
Time taken: 3.967 seconds
Loading data to table employees.current_dept_emp
Table employees.current_dept_emp stats: [numFiles=4, totalSize=10110817]
OK
Time taken: 0.85 seconds
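
Once Hive finishes the load, the files staged under --warehouse-dir are moved into the table's location, so the table directory should now contain the four part files reported in the stats (numFiles=4); a check that was not captured:

hdfs dfs -ls /user/hive/warehouse/employees.db/current_dept_emp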

The same import also works with an explicit --target-dir instead of --warehouse-dir: Sqoop stages the data there and Hive's LOAD DATA then moves it into the table's warehouse location. This variant also avoids the warning above about importing straight into the default Hive warehouse directory:

[donghua@cdh-vm test_db-master]$ sqoop import --connect jdbc:mysql://cdh-vm.dbaglobe.com/employees --username employee_user --password password --table current_dept_emp --split-by=emp_no --hive-import --create-hive-table --hive-table=employees.current_dept_emp2 --target-dir=/user/donghua/current_dept_emp2
Warning: /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/01/27 00:00:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.1
18/01/27 00:00:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/27 00:00:21 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
18/01/27 00:00:21 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
18/01/27 00:00:21 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/01/27 00:00:21 INFO tool.CodeGenTool: Beginning code generation
18/01/27 00:00:21 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `current_dept_emp` AS t LIMIT 1
18/01/27 00:00:21 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `current_dept_emp` AS t LIMIT 1
18/01/27 00:00:22 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-donghua/compile/24b51955f91604b3504e2d409fe1d631/current_dept_emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/01/27 00:00:23 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-donghua/compile/24b51955f91604b3504e2d409fe1d631/current_dept_emp.jar
18/01/27 00:00:23 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/01/27 00:00:23 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/01/27 00:00:23 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/01/27 00:00:23 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/01/27 00:00:23 INFO mapreduce.ImportJobBase: Beginning import of current_dept_emp
18/01/27 00:00:23 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/01/27 00:00:24 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/01/27 00:00:24 INFO client.RMProxy: Connecting to ResourceManager at cdh-vm.dbaglobe.com/192.168.56.10:8032
18/01/27 00:00:30 INFO db.DBInputFormat: Using read commited transaction isolation
18/01/27 00:00:30 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`emp_no`), MAX(`emp_no`) FROM `current_dept_emp`
18/01/27 00:00:31 INFO db.IntegerSplitter: Split size: 122499; Num splits: 4 from: 10001 to: 499999
18/01/27 00:00:31 INFO mapreduce.JobSubmitter: number of splits:4
18/01/27 00:00:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1517023991003_0004
18/01/27 00:00:32 INFO impl.YarnClientImpl: Submitted application application_1517023991003_0004
18/01/27 00:00:32 INFO mapreduce.Job: The url to track the job: http://cdh-vm.dbaglobe.com:8088/proxy/application_1517023991003_0004/
18/01/27 00:00:32 INFO mapreduce.Job: Running job: job_1517023991003_0004
18/01/27 00:00:39 INFO mapreduce.Job: Job job_1517023991003_0004 running in uber mode : false
18/01/27 00:00:39 INFO mapreduce.Job:  map 0% reduce 0%
18/01/27 00:00:48 INFO mapreduce.Job:  map 25% reduce 0%
18/01/27 00:00:53 INFO mapreduce.Job:  map 50% reduce 0%
18/01/27 00:00:58 INFO mapreduce.Job:  map 75% reduce 0%
18/01/27 00:01:05 INFO mapreduce.Job:  map 100% reduce 0%
18/01/27 00:01:05 INFO mapreduce.Job: Job job_1517023991003_0004 completed successfully
18/01/27 00:01:05 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=698244
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=464
HDFS: Number of bytes written=10110817
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters 
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=17494
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=17494
Total vcore-milliseconds taken by all map tasks=17494
Total megabyte-milliseconds taken by all map tasks=26870784
Map-Reduce Framework
Map input records=300024
Map output records=300024
Input split bytes=464
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=296
CPU time spent (ms)=8230
Physical memory (bytes) snapshot=935788544
Virtual memory (bytes) snapshot=11149619200
Total committed heap usage (bytes)=926416896
File Input Format Counters 
Bytes Read=0
File Output Format Counters 
Bytes Written=10110817
18/01/27 00:01:05 INFO mapreduce.ImportJobBase: Transferred 9.6424 MB in 40.7122 seconds (242.5281 KB/sec)
18/01/27 00:01:05 INFO mapreduce.ImportJobBase: Retrieved 300024 records.
18/01/27 00:01:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `current_dept_emp` AS t LIMIT 1
18/01/27 00:01:05 WARN hive.TableDefWriter: Column from_date had to be cast to a less precise type in Hive
18/01/27 00:01:05 WARN hive.TableDefWriter: Column to_date had to be cast to a less precise type in Hive
18/01/27 00:01:05 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/jars/hive-common-1.1.0-cdh5.13.1.jar!/hive-log4j.properties
OK
Time taken: 2.277 seconds
Loading data to table employees.current_dept_emp2
Table employees.current_dept_emp2 stats: [numFiles=4, totalSize=10110817]
OK
Time taken: 0.638 seconds

Both tables are now visible from beeline:

0: jdbc:hive2://localhost:10000/default> use employees;
INFO  : Compiling command(queryId=hive_20180127000909_679b9dfa-5161-467c-9620-8081c6686c8e): use employees
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20180127000909_679b9dfa-5161-467c-9620-8081c6686c8e); Time taken: 0.007 seconds
INFO  : Executing command(queryId=hive_20180127000909_679b9dfa-5161-467c-9620-8081c6686c8e): use employees
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20180127000909_679b9dfa-5161-467c-9620-8081c6686c8e); Time taken: 0.021 seconds
INFO  : OK
No rows affected (0.048 seconds)

0: jdbc:hive2://localhost:10000/default> !tables
+------------+--------------+--------------------+-------------+-------------------------------------------+--+
| TABLE_CAT  | TABLE_SCHEM  |     TABLE_NAME     | TABLE_TYPE  |                  REMARKS                  |
+------------+--------------+--------------------+-------------+-------------------------------------------+--+
|            | employees    | current_dept_emp   | TABLE       | Imported by sqoop on 2018/01/26 23:57:15  |
|            | employees    | current_dept_emp2  | TABLE       | Imported by sqoop on 2018/01/27 00:01:05  |
+------------+--------------+--------------------+-------------+-------------------------------------------+--+

0: jdbc:hive2://localhost:10000/default> !set maxcolumnwidth 200

0: jdbc:hive2://localhost:10000/default> show create table employees.current_dept_emp;
INFO  : Compiling command(queryId=hive_20180127005252_fe156650-eacf-492d-8860-17af7d4fc590): show create table employees.current_dept_emp
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:createtab_stmt, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20180127005252_fe156650-eacf-492d-8860-17af7d4fc590); Time taken: 0.03 seconds
INFO  : Executing command(queryId=hive_20180127005252_fe156650-eacf-492d-8860-17af7d4fc590): show create table employees.current_dept_emp
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20180127005252_fe156650-eacf-492d-8860-17af7d4fc590); Time taken: 0.009 seconds
INFO  : OK
+----------------------------------------------------------------------------------------+--+
|                                     createtab_stmt                                     |
+----------------------------------------------------------------------------------------+--+
| CREATE TABLE `employees.current_dept_emp`(                                             |
|   `emp_no` int,                                                                        |
|   `dept_no` string,                                                                    |
|   `from_date` string,                                                                  |
|   `to_date` string)                                                                    |
| COMMENT 'Imported by sqoop on 2018/01/26 23:57:15'                                     |
| ROW FORMAT SERDE                                                                       |
|   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'                                 |
| WITH SERDEPROPERTIES (                                                                 |
|   'field.delim'='\u0001',                                                              |
|   'line.delim'='\n',                                                                   |
|   'serialization.format'='\u0001')                                                     |
| STORED AS INPUTFORMAT                                                                  |
|   'org.apache.hadoop.mapred.TextInputFormat'                                           |
| OUTPUTFORMAT                                                                           |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'                         |
| LOCATION                                                                               |
|   'hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/employees.db/current_dept_emp'  |
| TBLPROPERTIES (                                                                        |
|   'COLUMN_STATS_ACCURATE'='true',                                                      |
|   'numFiles'='4',                                                                      |
|   'totalSize'='10110817',                                                              |
|   'transient_lastDdlTime'='1517029041')                                                |
+----------------------------------------------------------------------------------------+--+
23 rows selected (0.087 seconds)

0: jdbc:hive2://localhost:10000/default> show create table employees.current_dept_emp2;
INFO  : Compiling command(queryId=hive_20180127005252_e90f722a-ffd4-400d-ae8b-aa76c382dc78): show create table employees.current_dept_emp2
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:createtab_stmt, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20180127005252_e90f722a-ffd4-400d-ae8b-aa76c382dc78); Time taken: 0.027 seconds
INFO  : Executing command(queryId=hive_20180127005252_e90f722a-ffd4-400d-ae8b-aa76c382dc78): show create table employees.current_dept_emp2
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20180127005252_e90f722a-ffd4-400d-ae8b-aa76c382dc78); Time taken: 0.013 seconds
INFO  : OK
+-----------------------------------------------------------------------------------------+--+
|                                     createtab_stmt                                      |
+-----------------------------------------------------------------------------------------+--+
| CREATE TABLE `employees.current_dept_emp2`(                                             |
|   `emp_no` int,                                                                         |
|   `dept_no` string,                                                                     |
|   `from_date` string,                                                                   |
|   `to_date` string)                                                                     |
| COMMENT 'Imported by sqoop on 2018/01/27 00:01:05'                                      |
| ROW FORMAT SERDE                                                                        |
|   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'                                  |
| WITH SERDEPROPERTIES (                                                                  |
|   'field.delim'='\u0001',                                                               |
|   'line.delim'='\n',                                                                    |
|   'serialization.format'='\u0001')                                                      |
| STORED AS INPUTFORMAT                                                                   |
|   'org.apache.hadoop.mapred.TextInputFormat'                                            |
| OUTPUTFORMAT                                                                            |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'                          |
| LOCATION                                                                                |
|   'hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/employees.db/current_dept_emp2'  |
| TBLPROPERTIES (                                                                         |
|   'COLUMN_STATS_ACCURATE'='true',                                                       |
|   'numFiles'='4',                                                                       |
|   'totalSize'='10110817',                                                               |
|   'transient_lastDdlTime'='1517029269')                                                 |
+-----------------------------------------------------------------------------------------+--+
23 rows selected (0.079 seconds)
0: jdbc:hive2://localhost:10000/default> 
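
Both DDLs show that the MySQL DATE columns from_date and to_date came across as Hive string, which is what the "had to be cast to a less precise type" warnings were about. If real date semantics are needed, Sqoop's --map-column-hive can override the mapping; a sketch only (not verified here), loading into a hypothetical third table:

sqoop import --connect jdbc:mysql://cdh-vm.dbaglobe.com/employees \
    --username employee_user -P \
    --table current_dept_emp --split-by=emp_no \
    --hive-import --create-hive-table \
    --hive-table=employees.current_dept_emp3 \
    --map-column-hive from_date=date,to_date=date \
    --target-dir=/user/donghua/current_dept_emp3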


[root@cdh-vm ~]# hdfs dfs -ls /user//hive/warehouse
Found 2 items
drwxrwxrwt   - donghua   hive          0 2018-01-27 00:01 /user/hive/warehouse/employees.db
drwxrwxrwt   - donghua   hive          0 2018-01-27 00:38 /user/hive/warehouse/test.db
[hdfs@cdh-vm ~]$ hdfs dfs -ls /user//hive/warehouse/employees.db
Found 2 items
drwxrwxrwt   - donghua hive          0 2018-01-26 23:57 /user/hive/warehouse/employees.db/current_dept_emp
drwxrwxrwt   - donghua hive          0 2018-01-27 00:01 /user/hive/warehouse/employees.db/current_dept_emp2

Finally, count the rows in the imported table from Hive:

0: jdbc:hive2://localhost:10000/default> select count(*) from employees.current_dept_emp;
INFO  : Compiling command(queryId=hive_20180127003737_611e22dd-873d-48f6-9888-a414f9b5cf0a): select count(*) from employees.current_dept_emp
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20180127003737_611e22dd-873d-48f6-9888-a414f9b5cf0a); Time taken: 0.065 seconds
INFO  : Executing command(queryId=hive_20180127003737_611e22dd-873d-48f6-9888-a414f9b5cf0a): select count(*) from employees.current_dept_emp
INFO  : Query ID = hive_20180127003737_611e22dd-873d-48f6-9888-a414f9b5cf0a
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1517023991003_0007
INFO  : The url to track the job: http://cdh-vm.dbaglobe.com:8088/proxy/application_1517023991003_0007/
INFO  : Starting Job = job_1517023991003_0007, Tracking URL = http://cdh-vm.dbaglobe.com:8088/proxy/application_1517023991003_0007/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hadoop/bin/hadoop job  -kill job_1517023991003_0007
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2018-01-27 00:37:50,690 Stage-1 map = 0%,  reduce = 0%
INFO  : 2018-01-27 00:37:58,188 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.83 sec
INFO  : 2018-01-27 00:38:05,606 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.79 sec
INFO  : MapReduce Total cumulative CPU time: 3 seconds 790 msec
INFO  : Ended Job = job_1517023991003_0007
INFO  : MapReduce Jobs Launched: 
INFO  : Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.79 sec   HDFS Read: 10118840 HDFS Write: 7 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 3 seconds 790 msec
INFO  : Completed executing command(queryId=hive_20180127003737_611e22dd-873d-48f6-9888-a414f9b5cf0a); Time taken: 23.26 seconds
INFO  : OK
+---------+--+
|   _c0   |
+---------+--+
| 300024  |
+---------+--+
1 row selected (23.371 seconds)
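
As a final cross-check, the same count can be run against the source view in MariaDB; it should match the 300024 records Sqoop reported (output not captured here):

MariaDB [employees]> select count(*) from current_dept_emp;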

