Saturday, January 27, 2018

Use Sqoop to perform incremental imports into Hive (--check-column, --incremental=append, --last-value)


[donghua@cdh-vm ~]$ mysql -u employee_user -ppassword -D employees

MariaDB [employees]> create table t1 (id int primary key, c1 varchar(10));

MariaDB [employees]> insert into t1 values(1,'a'),(2,'b');

MariaDB [employees]> select * from t1;
+----+------+
| id | c1   |
+----+------+
|  1 | a    |
|  2 | b    |
+----+------+
2 rows in set (0.00 sec)

[donghua@cdh-vm ~]$ beeline -u jdbc:hive2://localhost:10000/default -n donghua --silent=true
0: jdbc:hive2://localhost:10000/default> create table employees.t1(id int, c1 varchar(10));
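
Pre-creating the Hive table in beeline is optional: with --hive-import, Sqoop creates the Hive table from the source schema if it does not already exist, and adding --create-hive-table makes the job fail if the table is already there. A minimal sketch (not run as part of this demo):

# let Sqoop create employees.t1 itself during the import (sketch only)
sqoop import --connect jdbc:mysql://cdh-vm.dbaglobe.com/employees \
 --username employee_user -P --table t1 \
 --split-by=id --hive-import --create-hive-table --hive-table=employees.t1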


[donghua@cdh-vm ~]$ sqoop import --connect jdbc:mysql://cdh-vm.dbaglobe.com/employees \
>  --username employee_user --password password --table t1 \
>  --split-by=id --hive-import --hive-table=employees.t1 \
>  --warehouse-dir=/user/hive/warehouse \
>  --check-column=id --incremental=append --last-value=0
Warning: /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/01/27 03:54:38 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.1
18/01/27 03:54:38 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/27 03:54:38 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
18/01/27 03:54:38 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
18/01/27 03:54:38 WARN tool.BaseSqoopTool: It seems that you're doing hive import directly into default
18/01/27 03:54:38 WARN tool.BaseSqoopTool: hive warehouse directory which is not supported. Sqoop is
18/01/27 03:54:38 WARN tool.BaseSqoopTool: firstly importing data into separate directory and then
18/01/27 03:54:38 WARN tool.BaseSqoopTool: inserting data into hive. Please consider removing
18/01/27 03:54:38 WARN tool.BaseSqoopTool: --target-dir or --warehouse-dir into /user/hive/warehouse in
18/01/27 03:54:38 WARN tool.BaseSqoopTool: case that you will detect any issues.
18/01/27 03:54:38 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/01/27 03:54:38 INFO tool.CodeGenTool: Beginning code generation
18/01/27 03:54:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t1` AS t LIMIT 1
18/01/27 03:54:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t1` AS t LIMIT 1
18/01/27 03:54:38 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-donghua/compile/1941b9efeafd888916e872561fa71b1d/t1.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/01/27 03:54:40 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-donghua/compile/1941b9efeafd888916e872561fa71b1d/t1.jar
18/01/27 03:54:41 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(`id`) FROM `t1`
18/01/27 03:54:41 INFO tool.ImportTool: Incremental import based on column `id`
18/01/27 03:54:41 INFO tool.ImportTool: Lower bound value: 0
18/01/27 03:54:41 INFO tool.ImportTool: Upper bound value: 2
18/01/27 03:54:41 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/01/27 03:54:41 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/01/27 03:54:41 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/01/27 03:54:41 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/01/27 03:54:41 INFO mapreduce.ImportJobBase: Beginning import of t1
18/01/27 03:54:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/01/27 03:54:41 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/01/27 03:54:41 INFO client.RMProxy: Connecting to ResourceManager at cdh-vm.dbaglobe.com/192.168.56.10:8032
18/01/27 03:54:46 INFO db.DBInputFormat: Using read commited transaction isolation
18/01/27 03:54:46 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `t1` WHERE ( `id` > 0 AND `id` <= 2 )
18/01/27 03:54:46 INFO db.IntegerSplitter: Split size: 0; Num splits: 4 from: 1 to: 2
18/01/27 03:54:46 INFO mapreduce.JobSubmitter: number of splits:2
18/01/27 03:54:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1517023991003_0009
18/01/27 03:54:47 INFO impl.YarnClientImpl: Submitted application application_1517023991003_0009
18/01/27 03:54:47 INFO mapreduce.Job: The url to track the job: http://cdh-vm.dbaglobe.com:8088/proxy/application_1517023991003_0009/
18/01/27 03:54:47 INFO mapreduce.Job: Running job: job_1517023991003_0009
18/01/27 03:54:54 INFO mapreduce.Job: Job job_1517023991003_0009 running in uber mode : false
18/01/27 03:54:54 INFO mapreduce.Job:  map 0% reduce 0%
18/01/27 03:55:02 INFO mapreduce.Job:  map 50% reduce 0%
18/01/27 03:55:06 INFO mapreduce.Job:  map 100% reduce 0%
18/01/27 03:55:07 INFO mapreduce.Job: Job job_1517023991003_0009 completed successfully
18/01/27 03:55:07 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=350308
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=197
HDFS: Number of bytes written=8
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters 
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=7843
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=7843
Total vcore-milliseconds taken by all map tasks=7843
Total megabyte-milliseconds taken by all map tasks=12046848
Map-Reduce Framework
Map input records=2
Map output records=2
Input split bytes=197
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=110
CPU time spent (ms)=1970
Physical memory (bytes) snapshot=413765632
Virtual memory (bytes) snapshot=5572857856
Total committed heap usage (bytes)=402653184
File Input Format Counters 
Bytes Read=0
File Output Format Counters 
Bytes Written=8
18/01/27 03:55:07 INFO mapreduce.ImportJobBase: Transferred 8 bytes in 26.2002 seconds (0.3053 bytes/sec)
18/01/27 03:55:07 INFO mapreduce.ImportJobBase: Retrieved 2 records.
18/01/27 03:55:07 INFO util.AppendUtils: Creating missing output directory - t1
18/01/27 03:55:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t1` AS t LIMIT 1
18/01/27 03:55:07 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/jars/hive-common-1.1.0-cdh5.13.1.jar!/hive-log4j.properties
OK
Time taken: 2.037 seconds
Loading data to table employees.t1
Table employees.t1 stats: [numFiles=2, totalSize=8]
OK
Time taken: 0.646 seconds
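
The WARN about setting the password on the command line is worth heeding outside a test VM: -P prompts for it interactively, and --password-file reads it from a permissions-restricted file. The same first import with an interactive prompt would look like this (sketch, not re-run here):

# identical to the import above, but prompting for the password instead of passing it in clear text
sqoop import --connect jdbc:mysql://cdh-vm.dbaglobe.com/employees \
 --username employee_user -P --table t1 \
 --split-by=id --hive-import --hive-table=employees.t1 \
 --warehouse-dir=/user/hive/warehouse \
 --check-column=id --incremental=append --last-value=0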

0: jdbc:hive2://localhost:10000/default> select * from employees.t1;
+--------+--------+--+
| t1.id  | t1.c1  |
+--------+--------+--+
| 1      | a      |
| 2      | b      |
+--------+--------+--+
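
Both rows are now in Hive, and each incremental run appends new data files to the table rather than rewriting existing ones. The appended files can be inspected directly in HDFS (the path below assumes the default CDH Hive warehouse layout for the employees database):

# list the data files backing employees.t1 (default warehouse path assumed)
hdfs dfs -ls /user/hive/warehouse/employees.db/t1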


MariaDB [employees]> insert into t1 values(3,'a'),(4,'b');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0


[donghua@cdh-vm ~]$ sqoop import --connect jdbc:mysql://cdh-vm.dbaglobe.com/employees \
>  --username employee_user --password password --table t1 \
>  --split-by=id --hive-import --hive-table=employees.t1 \
>  --warehouse-dir=/user/hive/warehouse \
>  --check-column=id --incremental=append --last-value=2
Warning: /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/01/27 04:11:31 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.1
18/01/27 04:11:31 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/27 04:11:31 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
18/01/27 04:11:31 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
18/01/27 04:11:31 WARN tool.BaseSqoopTool: It seems that you're doing hive import directly into default
18/01/27 04:11:31 WARN tool.BaseSqoopTool: hive warehouse directory which is not supported. Sqoop is
18/01/27 04:11:31 WARN tool.BaseSqoopTool: firstly importing data into separate directory and then
18/01/27 04:11:31 WARN tool.BaseSqoopTool: inserting data into hive. Please consider removing
18/01/27 04:11:31 WARN tool.BaseSqoopTool: --target-dir or --warehouse-dir into /user/hive/warehouse in
18/01/27 04:11:31 WARN tool.BaseSqoopTool: case that you will detect any issues.
18/01/27 04:11:31 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/01/27 04:11:31 INFO tool.CodeGenTool: Beginning code generation
18/01/27 04:11:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t1` AS t LIMIT 1
18/01/27 04:11:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t1` AS t LIMIT 1
18/01/27 04:11:31 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-donghua/compile/80c2f1f6c1f1b6c4b9fca928aa6353a8/t1.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/01/27 04:11:33 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-donghua/compile/80c2f1f6c1f1b6c4b9fca928aa6353a8/t1.jar
18/01/27 04:11:34 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(`id`) FROM `t1`
18/01/27 04:11:34 INFO tool.ImportTool: Incremental import based on column `id`
18/01/27 04:11:34 INFO tool.ImportTool: Lower bound value: 2
18/01/27 04:11:34 INFO tool.ImportTool: Upper bound value: 4
18/01/27 04:11:34 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/01/27 04:11:34 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/01/27 04:11:34 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/01/27 04:11:34 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/01/27 04:11:34 INFO mapreduce.ImportJobBase: Beginning import of t1
18/01/27 04:11:34 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/01/27 04:11:34 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/01/27 04:11:34 INFO client.RMProxy: Connecting to ResourceManager at cdh-vm.dbaglobe.com/192.168.56.10:8032
18/01/27 04:11:38 INFO db.DBInputFormat: Using read commited transaction isolation
18/01/27 04:11:38 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `t1` WHERE ( `id` > 2 AND `id` <= 4 )
18/01/27 04:11:38 INFO db.IntegerSplitter: Split size: 0; Num splits: 4 from: 3 to: 4
18/01/27 04:11:38 INFO mapreduce.JobSubmitter: number of splits:2
18/01/27 04:11:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1517023991003_0012
18/01/27 04:11:38 INFO impl.YarnClientImpl: Submitted application application_1517023991003_0012
18/01/27 04:11:38 INFO mapreduce.Job: The url to track the job: http://cdh-vm.dbaglobe.com:8088/proxy/application_1517023991003_0012/
18/01/27 04:11:38 INFO mapreduce.Job: Running job: job_1517023991003_0012
18/01/27 04:11:45 INFO mapreduce.Job: Job job_1517023991003_0012 running in uber mode : false
18/01/27 04:11:45 INFO mapreduce.Job:  map 0% reduce 0%
18/01/27 04:11:51 INFO mapreduce.Job:  map 50% reduce 0%
18/01/27 04:11:57 INFO mapreduce.Job:  map 100% reduce 0%
18/01/27 04:11:57 INFO mapreduce.Job: Job job_1517023991003_0012 completed successfully
18/01/27 04:11:57 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=350308
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=197
HDFS: Number of bytes written=8
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters 
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=7531
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=7531
Total vcore-milliseconds taken by all map tasks=7531
Total megabyte-milliseconds taken by all map tasks=11567616
Map-Reduce Framework
Map input records=2
Map output records=2
Input split bytes=197
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=114
CPU time spent (ms)=1800
Physical memory (bytes) snapshot=403120128
Virtual memory (bytes) snapshot=5573816320
Total committed heap usage (bytes)=359137280
File Input Format Counters 
Bytes Read=0
File Output Format Counters 
Bytes Written=8
18/01/27 04:11:57 INFO mapreduce.ImportJobBase: Transferred 8 bytes in 23.359 seconds (0.3425 bytes/sec)
18/01/27 04:11:57 INFO mapreduce.ImportJobBase: Retrieved 2 records.
18/01/27 04:11:57 INFO util.AppendUtils: Creating missing output directory - t1
18/01/27 04:11:57 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t1` AS t LIMIT 1
18/01/27 04:11:57 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/jars/hive-common-1.1.0-cdh5.13.1.jar!/hive-log4j.properties
OK
Time taken: 1.853 seconds
Loading data to table employees.t1
Table employees.t1 stats: [numFiles=4, numRows=0, totalSize=16, rawDataSize=0]
OK
Time taken: 0.603 seconds
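
The repeated WARN about importing straight into the default Hive warehouse directory can be silenced by following its own advice: drop --warehouse-dir (or --target-dir) and let Sqoop stage the data in a temporary directory before loading it into Hive. A sketch of this second run without --warehouse-dir:

# same incremental import, letting Sqoop pick its own staging directory (sketch only)
sqoop import --connect jdbc:mysql://cdh-vm.dbaglobe.com/employees \
 --username employee_user -P --table t1 \
 --split-by=id --hive-import --hive-table=employees.t1 \
 --check-column=id --incremental=append --last-value=2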

0: jdbc:hive2://localhost:10000/default> select * from employees.t1;
+--------+--------+--+
| t1.id  | t1.c1  |
+--------+--------+--+
| 1      | a      |
| 3      | a      |
| 2      | b      |
| 4      | b      |
+--------+--------+--+
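
The rows come back unordered because each run appended its own data files and a plain SELECT returns them in whatever order the files are read. Add an ORDER BY when a sorted result matters:

select * from employees.t1 order by id;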

[donghua@cdh-vm ~]$ sqoop job --create emp_t1_incr -- import \
>  --connect jdbc:mysql://cdh-vm.dbaglobe.com/employees \
>  --username employee_user --password password --table t1 \
>  --split-by=id --hive-import --hive-table=employees.t1 \
>  --warehouse-dir=/user/hive/warehouse \
>  --check-column=id --incremental=append --last-value=4
Warning: /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/01/27 04:21:32 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.1
18/01/27 04:21:32 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/27 04:21:32 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
18/01/27 04:21:32 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
18/01/27 04:21:32 WARN tool.BaseSqoopTool: It seems that you're doing hive import directly into default
18/01/27 04:21:32 WARN tool.BaseSqoopTool: hive warehouse directory which is not supported. Sqoop is
18/01/27 04:21:32 WARN tool.BaseSqoopTool: firstly importing data into separate directory and then
18/01/27 04:21:32 WARN tool.BaseSqoopTool: inserting data into hive. Please consider removing
18/01/27 04:21:32 WARN tool.BaseSqoopTool: --target-dir or --warehouse-dir into /user/hive/warehouse in
18/01/27 04:21:32 WARN tool.BaseSqoopTool: case that you will detect any issues.

[donghua@cdh-vm ~]$ sqoop job --list
Warning: /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/01/27 04:30:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.1
Available jobs:
  emp_t1_incr
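
Saved jobs are kept in the local Sqoop metastore (under ~/.sqoop by default), and a job that is no longer needed can be dropped with --delete:

# remove the saved job definition (the imported Hive data is untouched)
sqoop job --delete emp_t1_incr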

[donghua@cdh-vm ~]$ sqoop job --show emp_t1_incr
Warning: /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/01/27 04:35:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.1
Enter password: 
Job: emp_t1_incr
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
incremental.last.value = 4
db.connect.string = jdbc:mysql://cdh-vm.dbaglobe.com/employees
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
mainframe.input.dataset.type = p
split.limit = null
hbase.create.table = false
db.require.password = true
hdfs.append.dir = true
db.table = t1
codegen.input.delimiters.escape = 0
accumulo.create.table = false
import.fetch.size = null
codegen.input.delimiters.enclose.required = false
db.username = employee_user
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
sqoop.throwOnError = false
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
incremental.col = id
codegen.input.delimiters.record = 0
hdfs.warehouse.dir = /user/hive/warehouse
enable.compression = false
hive.overwrite.table = false
hive.import = true
codegen.input.delimiters.enclose = 0
hive.table.name = employees.t1
accumulo.batch.size = 10240000
hive.drop.delims = false
customtool.options.jsonmap = {}
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 4
accumulo.max.latency = 5000
import.direct.split.size = 0
sqlconnection.metadata.transaction.isolation.level = 2
codegen.output.delimiters.field = 1
export.new.update = UpdateOnly
incremental.mode = AppendRows
hdfs.file.format = TextFile
sqoop.oracle.escaping.disabled = true
codegen.compile.dir = /tmp/sqoop-donghua/compile/e7212eb92686a1486fa1cd44a6c9afc7
direct.import = false
temporary.dirRoot = _sqoop
db.split.column = id
hive.fail.table.exists = false
db.batch = false
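
Both --show and --exec stop at an "Enter password:" prompt because the saved job only records that a password is required (db.require.password = true), not the password itself. For unattended runs, one common approach is to create the job with --password-file pointing at a permissions-restricted file; the path below is hypothetical, and the job would have to be re-created (or deleted first) to pick it up:

# store the password in HDFS, readable only by the job owner (hypothetical path)
echo -n 'password' > .employee_user.pwd
hdfs dfs -put .employee_user.pwd /user/donghua/.employee_user.pwd
hdfs dfs -chmod 400 /user/donghua/.employee_user.pwd

sqoop job --create emp_t1_incr -- import \
 --connect jdbc:mysql://cdh-vm.dbaglobe.com/employees \
 --username employee_user --password-file /user/donghua/.employee_user.pwd \
 --table t1 --split-by=id --hive-import --hive-table=employees.t1 \
 --warehouse-dir=/user/hive/warehouse \
 --check-column=id --incremental=append --last-value=4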


Before executing the saved job, two more rows (id 5 and 6) were inserted into the MariaDB source table; that insert statement was not captured in the transcript.

[donghua@cdh-vm ~]$ sqoop job --exec emp_t1_incr
Warning: /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/01/27 04:38:01 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.1
Enter password: 
18/01/27 04:38:06 WARN tool.BaseSqoopTool: It seems that you're doing hive import directly into default
18/01/27 04:38:06 WARN tool.BaseSqoopTool: hive warehouse directory which is not supported. Sqoop is
18/01/27 04:38:06 WARN tool.BaseSqoopTool: firstly importing data into separate directory and then
18/01/27 04:38:06 WARN tool.BaseSqoopTool: inserting data into hive. Please consider removing
18/01/27 04:38:06 WARN tool.BaseSqoopTool: --target-dir or --warehouse-dir into /user/hive/warehouse in
18/01/27 04:38:06 WARN tool.BaseSqoopTool: case that you will detect any issues.
18/01/27 04:38:06 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/01/27 04:38:06 INFO tool.CodeGenTool: Beginning code generation
18/01/27 04:38:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t1` AS t LIMIT 1
18/01/27 04:38:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t1` AS t LIMIT 1
18/01/27 04:38:07 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-donghua/compile/e3c397f1a5469f870ba19e95b80a66a2/t1.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/01/27 04:38:08 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-donghua/compile/e3c397f1a5469f870ba19e95b80a66a2/t1.jar
18/01/27 04:38:09 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(`id`) FROM `t1`
18/01/27 04:38:09 INFO tool.ImportTool: Incremental import based on column `id`
18/01/27 04:38:09 INFO tool.ImportTool: Lower bound value: 4
18/01/27 04:38:09 INFO tool.ImportTool: Upper bound value: 6
18/01/27 04:38:09 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/01/27 04:38:09 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/01/27 04:38:09 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/01/27 04:38:09 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/01/27 04:38:09 INFO mapreduce.ImportJobBase: Beginning import of t1
18/01/27 04:38:09 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/01/27 04:38:09 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/01/27 04:38:09 INFO client.RMProxy: Connecting to ResourceManager at cdh-vm.dbaglobe.com/192.168.56.10:8032
18/01/27 04:38:12 INFO db.DBInputFormat: Using read commited transaction isolation
18/01/27 04:38:12 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `t1` WHERE ( `id` > 4 AND `id` <= 6 )
18/01/27 04:38:12 INFO db.IntegerSplitter: Split size: 0; Num splits: 4 from: 5 to: 6
18/01/27 04:38:12 INFO mapreduce.JobSubmitter: number of splits:2
18/01/27 04:38:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1517023991003_0013
18/01/27 04:38:12 INFO impl.YarnClientImpl: Submitted application application_1517023991003_0013
18/01/27 04:38:12 INFO mapreduce.Job: The url to track the job: http://cdh-vm.dbaglobe.com:8088/proxy/application_1517023991003_0013/
18/01/27 04:38:12 INFO mapreduce.Job: Running job: job_1517023991003_0013
18/01/27 04:38:20 INFO mapreduce.Job: Job job_1517023991003_0013 running in uber mode : false
18/01/27 04:38:20 INFO mapreduce.Job:  map 0% reduce 0%
18/01/27 04:38:26 INFO mapreduce.Job:  map 50% reduce 0%
18/01/27 04:38:31 INFO mapreduce.Job:  map 100% reduce 0%
18/01/27 04:38:32 INFO mapreduce.Job: Job job_1517023991003_0013 completed successfully
18/01/27 04:38:33 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=351166
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=197
HDFS: Number of bytes written=8
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters 
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=8071
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=8071
Total vcore-milliseconds taken by all map tasks=8071
Total megabyte-milliseconds taken by all map tasks=12397056
Map-Reduce Framework
Map input records=2
Map output records=2
Input split bytes=197
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=136
CPU time spent (ms)=1820
Physical memory (bytes) snapshot=404738048
Virtual memory (bytes) snapshot=5573140480
Total committed heap usage (bytes)=354942976
File Input Format Counters 
Bytes Read=0
File Output Format Counters 
Bytes Written=8
18/01/27 04:38:33 INFO mapreduce.ImportJobBase: Transferred 8 bytes in 23.4364 seconds (0.3413 bytes/sec)
18/01/27 04:38:33 INFO mapreduce.ImportJobBase: Retrieved 2 records.
18/01/27 04:38:33 INFO util.AppendUtils: Creating missing output directory - t1
18/01/27 04:38:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t1` AS t LIMIT 1
18/01/27 04:38:33 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/jars/hive-common-1.1.0-cdh5.13.1.jar!/hive-log4j.properties
OK
Time taken: 1.95 seconds
Loading data to table employees.t1
Table employees.t1 stats: [numFiles=6, numRows=0, totalSize=24, rawDataSize=0]
OK
Time taken: 0.664 seconds

[donghua@cdh-vm ~]$ sqoop job --show emp_t1_incr
Warning: /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/01/27 04:38:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.1
Enter password: 
Job: emp_t1_incr
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
incremental.last.value = 6
db.connect.string = jdbc:mysql://cdh-vm.dbaglobe.com/employees
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
mainframe.input.dataset.type = p
split.limit = null
hbase.create.table = false
db.require.password = true
hdfs.append.dir = true
db.table = t1
codegen.input.delimiters.escape = 0
accumulo.create.table = false
import.fetch.size = null
codegen.input.delimiters.enclose.required = false
db.username = employee_user
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
sqoop.throwOnError = false
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
incremental.col = id
codegen.input.delimiters.record = 0
hdfs.warehouse.dir = /user/hive/warehouse
enable.compression = false
hive.overwrite.table = false
hive.import = true
codegen.input.delimiters.enclose = 0
hive.table.name = employees.t1
accumulo.batch.size = 10240000
hive.drop.delims = false
customtool.options.jsonmap = {}
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 4
accumulo.max.latency = 5000
import.direct.split.size = 0
sqlconnection.metadata.transaction.isolation.level = 2
codegen.output.delimiters.field = 1
export.new.update = UpdateOnly
incremental.mode = AppendRows
hdfs.file.format = TextFile
sqoop.oracle.escaping.disabled = true
codegen.compile.dir = /tmp/sqoop-donghua/compile/39496d079794ae53a008a2da9cd2ac4a
direct.import = false
temporary.dirRoot = _sqoop
db.split.column = id
hive.fail.table.exists = false
db.batch = false
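
The metastore now records incremental.last.value = 6, so the next --exec automatically imports only rows with id greater than 6; there is no need to adjust --last-value by hand. That makes the saved job easy to schedule, for example from cron (the entry below is illustrative and assumes the password-file variant so that no interactive prompt is needed):

# crontab entry: run the incremental import hourly and append output to a log (illustrative)
0 * * * * sqoop job --exec emp_t1_incr >> /tmp/emp_t1_incr.log 2>&1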
