Saturday, January 27, 2018

Use sqoop export to move data from HDFS into MySQL


MariaDB [employees]> create table current_dept_emp2 as select * from current_dept_emp where 1=2;
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

[donghua@cdh-vm ~]$ sqoop export --connect jdbc:mysql://cdh-vm.dbaglobe.com/employees --username employee_user --password password --table current_dept_emp2  --export-dir /user/donghua/current_dept_emp
Warning: /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/01/27 05:43:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.1
18/01/27 05:43:54 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/27 05:43:55 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/01/27 05:43:55 INFO tool.CodeGenTool: Beginning code generation
18/01/27 05:43:55 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `current_dept_emp2` AS t LIMIT 1
18/01/27 05:43:55 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `current_dept_emp2` AS t LIMIT 1
18/01/27 05:43:55 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-donghua/compile/4eb832477301808137f8d255765ba2ca/current_dept_emp2.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/01/27 05:43:56 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-donghua/compile/4eb832477301808137f8d255765ba2ca/current_dept_emp2.jar
18/01/27 05:43:56 INFO mapreduce.ExportJobBase: Beginning export of current_dept_emp2
18/01/27 05:43:57 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/01/27 05:43:58 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
18/01/27 05:43:58 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
18/01/27 05:43:58 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/01/27 05:43:58 INFO client.RMProxy: Connecting to ResourceManager at cdh-vm.dbaglobe.com/192.168.56.10:8032
18/01/27 05:44:00 INFO input.FileInputFormat: Total input paths to process : 1
18/01/27 05:44:00 INFO input.FileInputFormat: Total input paths to process : 1
18/01/27 05:44:00 INFO mapreduce.JobSubmitter: number of splits:4
18/01/27 05:44:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1517023991003_0018
18/01/27 05:44:01 INFO impl.YarnClientImpl: Submitted application application_1517023991003_0018
18/01/27 05:44:01 INFO mapreduce.Job: The url to track the job: http://cdh-vm.dbaglobe.com:8088/proxy/application_1517023991003_0018/
18/01/27 05:44:01 INFO mapreduce.Job: Running job: job_1517023991003_0018
18/01/27 05:44:08 INFO mapreduce.Job: Job job_1517023991003_0018 running in uber mode : false
18/01/27 05:44:08 INFO mapreduce.Job:  map 0% reduce 0%
18/01/27 05:44:16 INFO mapreduce.Job:  map 25% reduce 0%
18/01/27 05:44:22 INFO mapreduce.Job:  map 50% reduce 0%
18/01/27 05:44:28 INFO mapreduce.Job:  map 75% reduce 0%
18/01/27 05:44:34 INFO mapreduce.Job:  map 100% reduce 0%
18/01/27 05:44:35 INFO mapreduce.Job: Job job_1517023991003_0018 completed successfully
18/01/27 05:44:35 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=695328
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=10241715
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=19
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters
        Launched map tasks=4
        Data-local map tasks=4
        Total time spent by all maps in occupied slots (ms)=20479
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=20479
        Total vcore-milliseconds taken by all map tasks=20479
        Total megabyte-milliseconds taken by all map tasks=31455744
    Map-Reduce Framework
        Map input records=300024
        Map output records=300024
        Input split bytes=711
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=332
        CPU time spent (ms)=15020
        Physical memory (bytes) snapshot=1057984512
        Virtual memory (bytes) snapshot=11192446976
        Total committed heap usage (bytes)=862453760
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
18/01/27 05:44:35 INFO mapreduce.ExportJobBase: Transferred 9.7673 MB in 37.4601 seconds (266.9952 KB/sec)
18/01/27 05:44:35 INFO mapreduce.ExportJobBase: Exported 300024 records.
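
The WARN line in the log points out that passing --password on the command line is insecure. A minimal sketch of the same export using --password-file instead, which suits scripted runs better than the interactive -P prompt. The password file path and the `run` dry-run helper are hypothetical and were not part of the original session; the file is assumed to have been created beforehand without a trailing newline, since Sqoop uses the whole file content as the password.

```shell
# Hypothetical password file, assumed to be prepared once like this:
#   echo -n "password" > .mysql.password
#   hdfs dfs -put .mysql.password /user/donghua/.mysql.password
#   hdfs dfs -chmod 400 /user/donghua/.mysql.password
PASSWORD_FILE="/user/donghua/.mysql.password"

# Dry-run helper (illustrative): print the command, and execute it
# only when RUN_EXPORT=1 is set in the environment.
run() {
  echo "$@"
  if [ "${RUN_EXPORT:-0}" = "1" ]; then
    "$@"
  fi
}

# Same export as in the transcript, with --password replaced by --password-file.
run sqoop export \
  --connect jdbc:mysql://cdh-vm.dbaglobe.com/employees \
  --username employee_user \
  --password-file "$PASSWORD_FILE" \
  --table current_dept_emp2 \
  --export-dir /user/donghua/current_dept_emp
```

Running the script as-is only prints the command for review; setting RUN_EXPORT=1 submits the job.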

MariaDB [employees]> select count(*) from current_dept_emp2;
+----------+
| count(*) |
+----------+
|   300024 |
+----------+
1 row in set (0.09 sec)
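
The row count in MySQL matches the "Exported 300024 records" line in the Sqoop log. A rough sketch of how that comparison could be scripted, assuming the export directory holds newline-delimited text files named part-* (the log only shows that there is one input path, so the file name pattern is an assumption). The `check_counts` function is a hypothetical helper.

```shell
# Compare the source line count with the target table row count.
# Prints a short summary and returns non-zero on a mismatch.
check_counts() {
  src=$1
  dst=$2
  if [ "$src" -eq "$dst" ]; then
    echo "OK: $dst rows"
  else
    echo "MISMATCH: source=$src target=$dst"
    return 1
  fi
}

# Possible usage on the cluster (commented out because it needs HDFS and MySQL access):
#   src=$(hdfs dfs -cat '/user/donghua/current_dept_emp/part-*' | wc -l)
#   dst=$(mysql -h cdh-vm.dbaglobe.com -u employee_user -p -N -B \
#           -e 'select count(*) from current_dept_emp2' employees)
#   check_counts "$src" "$dst"

# With the numbers from this session:
check_counts 300024 300024
```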