Thursday, June 30, 2016

Do different key orders make documents different in MongoDB?

[mongod@localhost ~]$ mongo test
MongoDB shell version: 3.2.6
connecting to: test

> db.mycol.drop()
false


> db.mycol.insert({'_id':1,x:1,y:2,z:3})
WriteResult({ "nInserted" : 1 })
> db.mycol.insert({'_id':3,y:2,z:3,x:1})
WriteResult({ "nInserted" : 1 })


> db.mycol.find().pretty()
{ "_id" : 1, "x" : 1, "y" : 2, "z" : 3 }
{ "_id" : 3, "y" : 2, "z" : 3, "x" : 1 }


> db.mycol.createIndex({x:1,y:1,z:1},{unique:true})
{
        "ok" : 0,
        "errmsg" : "E11000 duplicate key error collection: test.mycol index: x_1_y_1_z_1 dup key: { : 1.0, : 2.0, : 3.0 }",
        "code" : 11000
}
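
The unique index build fails because index keys are built from the values of the indexed fields only; the order of the fields inside the document is irrelevant. Both documents therefore map to the same key { 1.0, 2.0, 3.0 } and violate the unique constraint.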

> db.mycol.find({x:1,y:2,z:3})
{ "_id" : 1, "x" : 1, "y" : 2, "z" : 3 }
{ "_id" : 3, "y" : 2, "z" : 3, "x" : 1 }

> db.mycol.find({x:1,y:2,z:3},{x:1,y:1,z:1})
{ "_id" : 1, "x" : 1, "y" : 2, "z" : 3 }
{ "_id" : 3, "y" : 2, "z" : 3, "x" : 1 }


// normalize the output field order by renaming the fields through intermediate names and back;
// the re-created fields come out in the order given in the $project spec
> db.mycol.aggregate([{$match:{x:1,y:2,z:3}},{$project:{xx:"$x",yy:"$y",zz:"$z"}},{$project:{x:"$xx",y:"$yy",z:"$zz"}}])
{ "_id" : 1, "x" : 1, "y" : 2, "z" : 3 }
{ "_id" : 3, "x" : 1, "y" : 2, "z" : 3 }




Thursday, June 23, 2016

Import a large JSON file into MongoDB

[mongod@localhost week5]$ ls -lh posts.json
-rw-r--r--. 1 mongod mongod 34M Jun 21 12:16 posts.json

[mongod@localhost week5]$ mongoimport -d blog -c posts --drop posts.json
2016-06-23T12:46:05.738+0800    connected to: localhost
2016-06-23T12:46:05.739+0800    dropping: blog.posts
2016-06-23T12:46:07.758+0800    Failed: lost connection to server
2016-06-23T12:46:07.758+0800    imported 0 documents

The import failed because mongoimport batched the documents into a single insert command of 33,581,101 bytes (nearly the whole 34 MB file), which exceeds the 16,793,600-byte (16 MB) BSON limit. The mongod log shows the assertion:


2016-06-23T12:46:07.321+0800 D COMMAND  [conn2] run command admin.$cmd { getnonce: 1 }
2016-06-23T12:46:07.321+0800 I COMMAND  [conn2] command admin.$cmd command: getnonce { getnonce: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:65 locks:{} protocol:op_query 0ms
2016-06-23T12:46:07.724+0800 I -        [conn2] Assertion: 10334:BSONObj size: 33581101 (0x200682D) is invalid. Size must be between 0 and 16793600(16MB) First element: insert: "posts"
2016-06-23T12:46:07.757+0800 I CONTROL  [conn2]
 0x1315ab2 0x12b38c8 0x12a01a8 0x12a025c 0x9da679 0xbd59f8 0xa29b3d 0xcd2172 0xcd4aa6 0x9b79fc 0x12c2fad 0x7ff585e7adc5 0x7ff585baac4d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F15AB2","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"EB38C8","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"EA01A8","s":"_ N5mongo11msgassertedEiPKc"},{"b":"400000","o":"EA025C"},{"b":"400000","o":"5DA679","s":"_ZNK5mongo7BSONObj14_assertInvalidEv"},{"b":"400000","o":"7D59F8","s":"_ZN5mongo9DbM ssage9nextJsObjEv"},{"b":"400000","o":"629B3D","s":"_ZN5mongo12QueryMessageC2ERNS_9DbMessageE"},{"b":"400000","o":"8D2172"},{"b":"400000","o":"8D4AA6","s":"_ZN5mongo16assem leResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"400000","o":"5B79FC","s":"_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_2 AbstractMessagingPortE"},{"b":"400000","o":"EC2FAD","s":"_ZN5mongo17PortMessageServer17handleIncomingMsgEPv"},{"b":"7FF585E73000","o":"7DC5"},{"b":"7FF585AB4000","o":"F6C4D ,"s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.6", "gitVersion" : "05552b562c7a0b3143a729aaa0838e558dc49b25", "compiledModules" : [], "uname" : { "sysname" : "Linux , "release" : "3.8.13-118.6.2.el7uek.x86_64", "version" : "#2 SMP Thu May 19 13:15:51 PDT 2016", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "build d" : "398421CA3C84F69951FDE89431516DCB02070495" }, { "b" : "7FFE03EBA000", "elfType" : 3, "buildId" : "DB1547CFFDE01D055FAC6A7F583CCC31E1BEBA83" }, { "b" : "7FF586D9B000",  path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "686A25D0A83D002183C835FA5694A8110C78D3BC" }, { "b" : "7FF5869B3000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "D5775E07029F79874AEEC477778DB1C40E91B84D" }, { "b" : "7FF5867AB000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "CCC119FE8F4D8D262AFC67DDC36 2F266F30586F" }, { "b" : "7FF5865A7000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "DECA756A631284EEE020E3AE1CADCE9BDFB9914D" }, { "b" : "7FF5862A5000", "pat " : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "F129A8EBE4CD7D042019BFACD9A58677C18F2AE3" }, { "b" : "7FF58608F000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "bu ldId" : "97D5E2F5739B715C3A0EC9F95F7336E232346CA8" }, { "b" : "7FF585E73000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "0D3AFB2828490FDE7B2BE3BEEBACBDB D94A999F" }, { "b" : "7FF585AB4000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "725A7BE34325FB744D49930F508FBF831E573C3E" }, { "b" : "7FF587008000", "path" :  /lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "09E1BB4D034C7263810A41100647068858A7ECB6" }, { "b" : "7FF585868000", "path" : "/lib64/libgssapi_krb5.so.2", "elfTy e" : 3, "buildId" : "E993A70A266D64269D29914AD59BE6FF54B0F531" }, { "b" : "7FF585583000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "EA78218A5AAEAB292259A0 1D59A9DC9EC95AF60" }, { "b" : "7FF58537F000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "5F93ABF89BFAD8DF5AB77D741156BDBBB77FC692" }, { "b" : "7FF58514D 00", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "6D3C012960EBE87D267A66BE09DBFAA9E7DD6801" }, { "b" : "7FF584F37000", "path" : "/lib64/libz.so.1", "elfT pe" : 3, "buildId" : "FC37913FB197B822BCDBF3697D061E248698CEC1" }, { "b" : "7FF584D28000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "8C509F8EACC7FC 178B8D76587252291AC98E663" }, { "b" : "7FF584B24000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "3B0403FDFDE98D24FA2B5EA33202259FADF9E9A1" }, { "b" : " FF58490A000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : 
"DDBBFDDB9C3C33C6349711D1ED27218E5271757B" }, { "b" : "7FF5846E6000", "path" : "/lib64/libselinux so.1", "elfType" : 3, "buildId" : "DF5DEAA29286E350B0106E13E70CEB0E7D0B6562" }, { "b" : "7FF584485000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "1DEC80B8 143A7960489C7B7AA8DDF182D6E2BE6" }, { "b" : "7FF584260000", "path" : "/lib64/liblzma.so.5", "elfType" : 3, "buildId" : "61D7D46225E85F144221E1424B87FBF3CB2B9D3F" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1315ab2]
 mongod(_ZN5mongo10logContextEPKc+0x138) [0x12b38c8]
 mongod(_ZN5mongo11msgassertedEiPKc+0x88) [0x12a01a8]
 mongod(+0xEA025C) [0x12a025c]
 mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x3B9) [0x9da679]
 mongod(_ZN5mongo9DbMessage9nextJsObjEv+0x118) [0xbd59f8]
 mongod(_ZN5mongo12QueryMessageC2ERNS_9DbMessageE+0x5D) [0xa29b3d]
 mongod(+0x8D2172) [0xcd2172]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x696) [0xcd4aa6]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE+0xEC) [0x9b79fc]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x26D) [0x12c2fad]
 libpthread.so.0(+0x7DC5) [0x7ff585e7adc5]
 libc.so.6(clone+0x6D) [0x7ff585baac4d]
-----  END BACKTRACE  -----
2016-06-23T12:46:07.757+0800 I NETWORK  [conn2] AssertionException handling request, closing client connection: 10334 BSONObj size: 33581101 (0x200682D) is invalid. Size must be between 0 and 16793600(16MB) First element: insert: "posts"
2016-06-23T12:46:07.772+0800 D NETWORK  [conn1] SocketException: remote: 127.0.0.1:62169 error: 9001 socket exception [CLOSED] server [127.0.0.1:62169]
2016-06-23T12:46:07.772+0800 I NETWORK  [conn1] end connection 127.0.0.1:62169 (0 connections now open)
2016-06-23T12:46:33.529+0800 D -        [PeriodicTaskRunner] cleaning up unused lock buckets of the global lock manager

Workaround 1: use multiple insertion workers. The example below uses -j 4, so the import is split across 4 workers instead of a single worker submitting the whole 34 MB file as one batch.

[mongod@localhost week5]$ mongoimport -d blog -c posts --drop -j 4 posts.json
2016-06-23T12:52:01.339+0800    connected to: localhost
2016-06-23T12:52:01.340+0800    dropping: blog.posts
2016-06-23T12:52:03.346+0800    imported 1000 documents

Workaround 2: use the hidden "--batchSize" option, which caps the number of documents sent per insert batch.

[mongod@localhost week5]$ mongoimport -d blog -c posts --drop --batchSize 1 posts.json
2016-06-23T12:53:19.703+0800    connected to: localhost
2016-06-23T12:53:19.705+0800    dropping: blog.posts
2016-06-23T12:53:22.698+0800    [######################..] blog.posts   32.4 MB/33.9 MB (95.7%)
2016-06-23T12:53:22.862+0800    [########################] blog.posts   33.9 MB/33.9 MB (100.0%)
2016-06-23T12:53:22.862+0800    imported 1000 documents

[mongod@localhost week5]$ mongoimport -d blog -c posts --drop --batchSize 100 posts.json
2016-06-23T12:55:15.239+0800    connected to: localhost
2016-06-23T12:55:15.241+0800    dropping: blog.posts
2016-06-23T12:55:17.085+0800    imported 1000 documents
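
To sanity-check either workaround, a quick count from the shell should match the 1000 documents mongoimport reports (a trivial check, not part of the captured session):

[mongod@localhost week5]$ mongo blog --quiet --eval 'db.posts.count()'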

Thursday, June 2, 2016

Point-in-time recovery - MongoDB

Start the mongod server


mongod -f /home/m202/week2/mongod.conf
donghua@vmxdb01:~/week2$ ls -l backupDB/
BlogColl.bson           BlogColl.metadata.json  system.indexes.bson


Restore the backup


donghua@vmxdb01:~/week2$ mongorestore --collection BlogColl --db backupDB backupDB/BlogColl.bson --port 30001
2016-06-02T13:38:15.341+0100    checking for collection data in backupDB/BlogColl.bson
2016-06-02T13:38:15.345+0100    reading metadata file from backupDB/BlogColl.metadata.json
2016-06-02T13:38:15.346+0100    restoring backupDB.BlogColl from file backupDB/BlogColl.bson
2016-06-02T13:38:18.343+0100    [#########...............]  backupDB.BlogColl  20.8 MB/52.5 MB  (39.7%)
2016-06-02T13:38:21.344+0100    [############............]  backupDB.BlogColl  28.0 MB/52.5 MB  (53.4%)
2016-06-02T13:38:24.356+0100    [################........]  backupDB.BlogColl  36.4 MB/52.5 MB  (69.4%)
2016-06-02T13:38:27.352+0100    [#####################...]  backupDB.BlogColl  46.9 MB/52.5 MB  (89.3%)
2016-06-02T13:38:29.801+0100    restoring indexes for collection backupDB.BlogColl from metadata
2016-06-02T13:38:29.802+0100    finished restoring backupDB.BlogColl (604800 documents)
2016-06-02T13:38:29.802+0100    done
BackupTest:PRIMARY> use local
switched to db local
BackupTest:PRIMARY> show collections
me
oplog.rs
startup_log
system.indexes
system.replset
BackupTest:PRIMARY> use backupDB
switched to db backupDB
BackupTest:PRIMARY> show collections
BlogColl
system.indexes


Create a mongodump of your oplog collection with this command (dumping it now preserves the entries before the capped oplog rolls over):

donghua@vmxdb01:~/week2$ mongodump -d local -c oplog.rs -o oplogD --port 30001
2016-06-02T13:48:57.159+0100    writing local.oplog.rs to oplogD/local/oplog.rs.bson
2016-06-02T13:49:00.161+0100    [###############.........]  local.oplog.rs  736424/1111069  (66.3%)
2016-06-02T13:49:01.845+0100    writing local.oplog.rs metadata to oplogD/local/oplog.rs.metadata.json
2016-06-02T13:49:01.847+0100    done dumping local.oplog.rs (1111069 documents)


Find the logical corruption point in time. The dump is renamed to oplog.bson because mongorestore --oplogReplay expects a file with that name at the top level of the restore directory.

donghua@vmxdb01:~/week2$ mkdir oplogR
donghua@vmxdb01:~/week2$ mv oplogD/local/oplog.rs.bson oplogR/oplog.bson
donghua@vmxdb01:~/week2$ bsondump  oplogR/oplog.bson|grep drop
{"h":{"$numberLong":"-4262957146204779874"},"ns":"backupDB.$cmd","o":{"drop":"BlogColl"},"op":"c","ts":{"$timestamp":{"t":1398778745,"i":1}},"v":2}
2016-06-02T13:55:11.407+0100    1111069 objects found
// Alternatively you can query against the original oplog
donghua@vmxdb01:~/week2$ mongo localhost:30001/local
MongoDB shell version: 3.0.5
connecting to: localhost:30001/local
BackupTest:PRIMARY> db.oplog.rs.find({'o.drop':"BlogColl"})
{ "ts" : Timestamp(1398778745, 1), "h" : NumberLong("-4262957146204779874"), "v" : 2, "op" : "c", "ns" : "backupDB.$cmd", "o" : { "drop" : "BlogColl" } }


Replay this oplog on the restored standalone server, but stop just before the offending drop operation. --oplogLimit takes the timestamp as <seconds>:<increment>, so 1398778745:1 corresponds to the Timestamp(1398778745, 1) found above.


donghua@vmxdb01:~/week2$ mongorestore -h localhost --port 30001 --oplogReplay --oplogLimit 1398778745:1 oplogR


2016-06-02T13:58:00.996+0100    building a list of dbs and collections to restore from oplogR dir
2016-06-02T13:58:00.998+0100    replaying oplog
2016-06-02T13:58:04.016+0100    [##......................] oplog        16.0 MB/174.0 MB (9.2%)
2016-06-02T13:58:07.039+0100    [####....................] oplog        32.0 MB/174.0 MB (18.4%)
2016-06-02T13:58:10.012+0100    [#####...................] oplog        40.0 MB/174.0 MB (23.0%)
2016-06-02T13:58:13.059+0100    [#######.................] oplog        56.0 MB/174.0 MB (32.2%)
2016-06-02T13:58:16.071+0100    [#########...............] oplog        65.7 MB/174.0 MB (37.8%)
2016-06-02T13:58:19.013+0100    [#########...............] oplog        72.0 MB/174.0 MB (41.4%)
2016-06-02T13:58:21.586+0100    done

donghua@vmxdb01:~/week2$ mongo localhost:30001/backupDB
MongoDB shell version: 3.0.5
connecting to: localhost:30001/backupDB
BackupTest:PRIMARY> db.BlogColl.find().count()
614800

Resync the corrupted secondary: restart the mongod with an empty data directory and let MongoDB's initial sync restore the data

MongoDB provides two options for performing an initial sync:

1. Restart the mongod with an empty data directory and let MongoDB's normal initial syncing feature restore the data. This is the simpler option but may take longer to replace the data. See Procedures.
2. Restart the machine with a copy of a recent data directory from another member in the replica set. This procedure can replace the data more quickly but requires more manual steps. See Sync by Copying Data Files from Another Member.

Option 1 is what the rest of this post demonstrates; the outline below sketches the steps.
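
A minimal outline of option 1, assuming the secondary's dbpath and replica-set options match the ones used later in this post (shut the mongod down cleanly before touching its files):

donghua@vmxdb01:~$ mongod --shutdown --dbpath /mongo/wk2/mongod-sec     # stop the broken secondary
donghua@vmxdb01:~$ rm -rf /mongo/wk2/mongod-sec/*                       # empty the data directory
donghua@vmxdb01:~$ mongod --port 30002 --dbpath /mongo/wk2/mongod-sec \
                          --replSet CorruptionTest --smallfiles --oplogSize 128 --nojournal
# on restart the member picks up the replica-set config from the primary and runs an initial sync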


donghua@vmxdb01:~/week2$ mongo localhost:30001
MongoDB shell version: 3.0.5
connecting to: localhost:30001/test
Server has startup warnings:
CorruptionTest:PRIMARY> use testDB
switched to db testDB
CorruptionTest:PRIMARY> db.BlogColl.find().explain("executionStats")
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "testDB.BlogColl",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "$and" : [ ]
                },
                "winningPlan" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                                "$and" : [ ]
                        },
                        "direction" : "forward"
                },
                "rejectedPlans" : [ ]
        },
        "executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 100000,
                "executionTimeMillis" : 552,
                "totalKeysExamined" : 0,
                "totalDocsExamined" : 100000,
                "executionStages" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                                "$and" : [ ]
                        },
                        "nReturned" : 100000,
                        "executionTimeMillisEstimate" : 70,
                        "works" : 100195,
                        "advanced" : 100000,
                        "needTime" : 1,
                        "needFetch" : 193,
                        "saveState" : 969,
                        "restoreState" : 969,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "direction" : "forward",
                        "docsExamined" : 100000
                }
        },
        "serverInfo" : {
                "host" : "vmxdb01",
                "port" : 30001,
                "version" : "3.0.5",
                "gitVersion" : "8bc4ae20708dbb493cb09338d9e7be6698e4a3a3"
        },
        "ok" : 1
}
CorruptionTest:PRIMARY>
bye
donghua@vmxdb01:~/week2$ mongo localhost:30002
MongoDB shell version: 3.0.5
connecting to: localhost:30002/test
Server has startup warnings:
CorruptionTest:SECONDARY> rs.slaveOk();
CorruptionTest:SECONDARY> use testDB;
switched to db testDB
CorruptionTest:SECONDARY> db.BlogColl.find().explain("executionStats")
2016-06-02T12:40:03.041+0100 E QUERY    Error: explain failed: {
        "errmsg" : "exception: BSONObj size: 1685417573 (0x64756E65) is invalid. Size must be between 0 and 16793600(16MB) First element: //drrdu/dvad\u0002string: ?type=111",
        "code" : 10334,
        "ok" : 0
}
    at Error ()
    at Function.throwOrReturn (src/mongo/shell/explainable.js:34:19)
    at constructor.finish (src/mongo/shell/explain_query.js:188:36)
    at DBQuery.explain (src/mongo/shell/query.js:434:25)
    at (shell):1:20 at src/mongo/shell/explainable.js:34

               
donghua@vmxdb01:~/week2$ cd mongod-sec/
donghua@vmxdb01:~/week2/mongod-sec$ ls
journal  local.0  local.1  local.ns  mongod.lock  storage.bson  testDB.0  testDB.1  testDB.2  testDB.ns
donghua@vmxdb01:~/week2/mongod-sec$ ls -ltr
total 426012
-rw------- 1 m202 m202  16777216 May  7  2014 testDB.ns
-rw------- 1 m202 m202  67108864 May  7  2014 testDB.2
-rw------- 1 m202 m202 268435456 May  7  2014 local.1
-rw------- 1 m202 m202  33554432 May  8  2014 testDB.1
drwxr-xr-x 2 m202 m202      4096 May  8  2014 journal
-rw-rw-r-- 1 m202 m202        69 Jun  2 12:22 storage.bson
-rw------- 1 m202 m202  16777216 Jun  2 12:22 testDB.0
-rw------- 1 m202 m202  16777216 Jun  2 12:39 local.ns
-rw------- 1 m202 m202  16777216 Jun  2 12:39 local.0
-rwxr-xr-x 1 m202 m202         0 Jun  2 12:40 mongod.lock
donghua@vmxdb01:~/week2/mongod-sec$ rm -rf *


donghua@vmxdb01:~/week2/mongod-sec$ mongod --port 30002 --dbpath /mongo/wk2/mongod-sec --replSet CorruptionTest --smallfiles --oplogSize 128 --nojournal
2016-06-02T12:41:36.913+0100 I CONTROL  [initandlisten] MongoDB starting : pid=4057 port=30002 dbpath=/mongo/wk2/mongod-sec 64-bit host=vmxdb01
2016-06-02T12:41:36.913+0100 I CONTROL  [initandlisten]
2016-06-02T12:41:36.913+0100 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2016-06-02T12:41:36.913+0100 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2016-06-02T12:41:36.913+0100 I CONTROL  [initandlisten]
2016-06-02T12:41:36.913+0100 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2016-06-02T12:41:36.913+0100 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2016-06-02T12:41:36.913+0100 I CONTROL  [initandlisten]
2016-06-02T12:41:36.913+0100 I CONTROL  [initandlisten] db version v3.0.5
2016-06-02T12:41:36.913+0100 I CONTROL  [initandlisten] git version: 8bc4ae20708dbb493cb09338d9e7be6698e4a3a3
2016-06-02T12:41:36.913+0100 I CONTROL  [initandlisten] build info: Linux ip-10-183-35-50 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2016-06-02T12:41:36.913+0100 I CONTROL  [initandlisten] allocator: tcmalloc
2016-06-02T12:41:36.913+0100 I CONTROL  [initandlisten] options: { net: { port: 30002 }, replication: { oplogSizeMB: 128, replSet: "CorruptionTest" }, storage: { dbPath: "/mongo/wk2/mongod-sec", journal: { enabled: false }, mmapv1: { smallFiles: true } } }
2016-06-02T12:41:36.913+0100 I INDEX    [initandlisten] allocating new ns file /mongo/wk2/mongod-sec/local.ns, filling with zeroes...
2016-06-02T12:41:37.102+0100 I STORAGE  [FileAllocator] allocating new datafile /mongo/wk2/mongod-sec/local.0, filling with zeroes...
2016-06-02T12:41:37.102+0100 I STORAGE  [FileAllocator] creating directory /mongo/wk2/mongod-sec/_tmp
2016-06-02T12:41:37.117+0100 I STORAGE  [FileAllocator] done allocating datafile /mongo/wk2/mongod-sec/local.0, size: 16MB,  took 0.012 secs
2016-06-02T12:41:37.125+0100 I REPL     [initandlisten] Did not find local replica set configuration document at startup;  NoMatchingDocument Did not find replica set configuration document in local.system.replset
2016-06-02T12:41:37.127+0100 I NETWORK  [initandlisten] waiting for connections on port 30002
2016-06-02T12:41:37.991+0100 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:43789 #1 (1 connection now open)
2016-06-02T12:41:37.992+0100 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:43790 #2 (2 connections now open)
2016-06-02T12:41:38.010+0100 I REPL     [WriteReplSetConfig] Starting replication applier threads
2016-06-02T12:41:38.011+0100 I REPL     [ReplicationExecutor] New replica set config in use: { _id: "CorruptionTest", version: 3, members: [ { _id: 0, host: "127.0.0.1:30001", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "127.0.0.1:30002", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "127.0.0.1:30003", arbiterOnly: true, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatTimeoutSecs: 10, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 } } }
2016-06-02T12:41:38.012+0100 I REPL     [ReplicationExecutor] This node is 127.0.0.1:30002 in the config
2016-06-02T12:41:38.013+0100 I REPL     [ReplicationExecutor] transition to STARTUP2
2016-06-02T12:41:38.015+0100 I REPL     [ReplicationExecutor] Member 127.0.0.1:30003 is now in state ARBITER
2016-06-02T12:41:38.016+0100 I REPL     [ReplicationExecutor] Member 127.0.0.1:30001 is now in state PRIMARY
2016-06-02T12:41:38.017+0100 I REPL     [rsSync] ******
2016-06-02T12:41:38.018+0100 I REPL     [rsSync] creating replication oplog of size: 128MB...
2016-06-02T12:41:38.018+0100 I STORAGE  [FileAllocator] allocating new datafile /mongo/wk2/mongod-sec/local.1, filling with zeroes...
2016-06-02T12:41:38.053+0100 I STORAGE  [FileAllocator] done allocating datafile /mongo/wk2/mongod-sec/local.1, size: 256MB,  took 0.033 secs
2016-06-02T12:41:38.079+0100 I REPL     [rsSync] ******
2016-06-02T12:41:38.079+0100 I REPL     [rsSync] initial sync pending
2016-06-02T12:41:38.080+0100 I REPL     [ReplicationExecutor] syncing from: 127.0.0.1:30001
2016-06-02T12:41:38.082+0100 I REPL     [rsSync] initial sync drop all databases
2016-06-02T12:41:38.082+0100 I STORAGE  [rsSync] dropAllDatabasesExceptLocal 1
2016-06-02T12:41:38.083+0100 I REPL     [rsSync] initial sync clone all databases
2016-06-02T12:41:38.083+0100 I REPL     [rsSync] initial sync cloning db: testDB
2016-06-02T12:41:38.086+0100 I INDEX    [rsSync] allocating new ns file /mongo/wk2/mongod-sec/testDB.ns, filling with zeroes...
2016-06-02T12:41:38.166+0100 I STORAGE  [FileAllocator] allocating new datafile /mongo/wk2/mongod-sec/testDB.0, filling with zeroes...
2016-06-02T12:41:38.170+0100 I STORAGE  [FileAllocator] done allocating datafile /mongo/wk2/mongod-sec/testDB.0, size: 16MB,  took 0.003 secs
2016-06-02T12:41:38.311+0100 I STORAGE  [FileAllocator] allocating new datafile /mongo/wk2/mongod-sec/testDB.1, filling with zeroes...
2016-06-02T12:41:38.315+0100 I STORAGE  [FileAllocator] done allocating datafile /mongo/wk2/mongod-sec/testDB.1, size: 32MB,  took 0.003 secs
2016-06-02T12:41:38.331+0100 I INDEX    [rsSync] build index on: testDB.BlogColl properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "testDB.BlogColl" }
2016-06-02T12:41:38.332+0100 I INDEX    [rsSync]         building index using bulk method
2016-06-02T12:41:38.546+0100 I INDEX    [rsSync] build index done.  scanned 100000 total records. 0 secs
2016-06-02T12:41:38.548+0100 I REPL     [rsSync] initial sync data copy, starting syncup
2016-06-02T12:41:38.549+0100 I REPL     [rsSync] oplog sync 1 of 3
2016-06-02T12:41:38.549+0100 I REPL     [rsSync] oplog sync 2 of 3
2016-06-02T12:41:38.550+0100 I REPL     [rsSync] initial sync building indexes
2016-06-02T12:41:38.550+0100 I REPL     [rsSync] initial sync cloning indexes for : testDB
2016-06-02T12:41:38.551+0100 I STORAGE  [rsSync] copying indexes for: { name: "BlogColl", options: {} }
2016-06-02T12:41:38.552+0100 I REPL     [rsSync] oplog sync 3 of 3
2016-06-02T12:41:38.554+0100 I REPL     [rsSync] initial sync finishing up
2016-06-02T12:41:38.554+0100 I REPL     [rsSync] replSet set minValid=536aac95:1ba5
2016-06-02T12:41:38.554+0100 I REPL     [rsSync] initial sync done
2016-06-02T12:41:38.558+0100 I REPL     [ReplicationExecutor] transition to RECOVERING
2016-06-02T12:41:38.558+0100 I REPL     [ReplicationExecutor] transition to SECONDARY
2016-06-02T12:41:39.018+0100 I REPL     [ReplicationExecutor] could not find member to sync from


CorruptionTest:SECONDARY> rs.slaveOk();
CorruptionTest:SECONDARY> use testDB;
switched to db testDB
CorruptionTest:SECONDARY> db.BlogColl.find().explain("executionStats")
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "testDB.BlogColl",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "$and" : [ ]
                },
                "winningPlan" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                                "$and" : [ ]
                        },
                        "direction" : "forward"
                },
                "rejectedPlans" : [ ]
        },
        "executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 100000,
                "executionTimeMillis" : 27,
                "totalKeysExamined" : 0,
                "totalDocsExamined" : 100000,
                "executionStages" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                                "$and" : [ ]
                        },
                        "nReturned" : 100000,
                        "executionTimeMillisEstimate" : 10,
                        "works" : 100002,
                        "advanced" : 100000,
                        "needTime" : 1,
                        "needFetch" : 0,
                        "saveState" : 781,
                        "restoreState" : 781,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "direction" : "forward",
                        "docsExamined" : 100000
                }
        },
        "serverInfo" : {
                "host" : "vmxdb01",
                "port" : 30002,
                "version" : "3.0.5",
                "gitVersion" : "8bc4ae20708dbb493cb09338d9e7be6698e4a3a3"
        },
        "ok" : 1
}
CorruptionTest:SECONDARY>

Wednesday, June 1, 2016

Sqoop examples

[cloudera@quickstart ~]$ sqoop import-all-tables \
    -m 1 \
    --connect jdbc:mysql://quickstart:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --compression-codec=snappy \
    --as-parquetfile \
    --warehouse-dir=/user/hive/warehouse \
    --hive-import
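
Once the import completes, each table should have Parquet files under the warehouse directory and a matching Hive table. A quick check (not part of the captured run below):

[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse
[cloudera@quickstart ~]$ hive -e 'SHOW TABLES;'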


=========================================================

[cloudera@quickstart ~]$ ps -ef|grep mysql
root      1950     1  0 06:28 ?        00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql     2070  1950  0 06:28 ?        00:00:01 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
cloudera  7650  7486  0 07:03 pts/0    00:00:00 grep mysql
[cloudera@quickstart ~]$
[cloudera@quickstart ~]$
[cloudera@quickstart ~]$ sqoop import-all-tables \
>     -m 1 \
>     --connect jdbc:mysql://quickstart:3306/retail_db \
>     --username=retail_dba \
>     --password=cloudera \
>     --compression-codec=snappy \
>     --as-parquetfile \
>     --warehouse-dir=/user/hive/warehouse \
>     --hive-import
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/06/01 07:03:30 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
16/06/01 07:03:30 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/06/01 07:03:30 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
16/06/01 07:03:30 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
16/06/01 07:03:30 WARN tool.BaseSqoopTool: It seems that you're doing hive import directly into default
16/06/01 07:03:30 WARN tool.BaseSqoopTool: hive warehouse directory which is not supported. Sqoop is
16/06/01 07:03:30 WARN tool.BaseSqoopTool: firstly importing data into separate directory and then
16/06/01 07:03:30 WARN tool.BaseSqoopTool: inserting data into hive. Please consider removing
16/06/01 07:03:30 WARN tool.BaseSqoopTool: --target-dir or --warehouse-dir into /user/hive/warehouse in
16/06/01 07:03:30 WARN tool.BaseSqoopTool: case that you will detect any issues.
16/06/01 07:03:31 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/06/01 07:03:31 INFO tool.CodeGenTool: Beginning code generation
16/06/01 07:03:31 INFO tool.CodeGenTool: Will generate java class as codegen_categories
16/06/01 07:03:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
16/06/01 07:03:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
16/06/01 07:03:31 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/9c5d81b8a4fc931d02a97efaa7fd3d8a/codegen_categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/06/01 07:03:33 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/9c5d81b8a4fc931d02a97efaa7fd3d8a/codegen_categories.jar
16/06/01 07:03:33 WARN manager.MySQLManager: It looks like you are importing from mysql.
16/06/01 07:03:33 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
16/06/01 07:03:33 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
16/06/01 07:03:33 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
16/06/01 07:03:33 INFO mapreduce.ImportJobBase: Beginning import of categories
16/06/01 07:03:33 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
16/06/01 07:03:33 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/06/01 07:03:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
16/06/01 07:03:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
16/06/01 07:03:35 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:03:35 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:03:35 INFO hive.metastore: Connected to metastore.
16/06/01 07:03:35 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:03:35 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:03:35 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:03:35 INFO hive.metastore: Connected to metastore.
16/06/01 07:03:36 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:03:36 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:03:36 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:03:36 INFO hive.metastore: Connected to metastore.
16/06/01 07:03:36 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:03:36 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:03:36 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:03:36 INFO hive.metastore: Connected to metastore.
16/06/01 07:03:36 INFO hive.HiveManagedMetadataProvider: Creating a managed Hive table named: categories
16/06/01 07:03:37 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:03:37 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:03:37 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:03:37 INFO hive.metastore: Connected to metastore.
16/06/01 07:03:49 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/06/01 07:03:49 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/06/01 07:03:59 INFO db.DBInputFormat: Using read commited transaction isolation
16/06/01 07:03:59 INFO mapreduce.JobSubmitter: number of splits:1
16/06/01 07:03:59 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464787780964_0001
16/06/01 07:04:00 INFO impl.YarnClientImpl: Submitted application application_1464787780964_0001
16/06/01 07:04:00 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1464787780964_0001/
16/06/01 07:04:00 INFO mapreduce.Job: Running job: job_1464787780964_0001
16/06/01 07:04:27 INFO mapreduce.Job: Job job_1464787780964_0001 running in uber mode : false
16/06/01 07:04:27 INFO mapreduce.Job:  map 0% reduce 0%
16/06/01 07:04:43 INFO mapreduce.Job:  map 100% reduce 0%
16/06/01 07:04:44 INFO mapreduce.Job: Job job_1464787780964_0001 completed successfully
16/06/01 07:04:44 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=211514
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=5685
        HDFS: Number of bytes written=3446
        HDFS: Number of read operations=48
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=10
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=12540
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=12540
        Total vcore-seconds taken by all map tasks=12540
        Total megabyte-seconds taken by all map tasks=12840960
    Map-Reduce Framework
        Map input records=58
        Map output records=58
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=141
        CPU time spent (ms)=2640
        Physical memory (bytes) snapshot=336764928
        Virtual memory (bytes) snapshot=1592012800
        Total committed heap usage (bytes)=326107136
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
16/06/01 07:04:44 INFO mapreduce.ImportJobBase: Transferred 3.3652 KB in 55.2648 seconds (62.3543 bytes/sec)
16/06/01 07:04:44 INFO mapreduce.ImportJobBase: Retrieved 58 records.
16/06/01 07:04:44 INFO tool.CodeGenTool: Beginning code generation
16/06/01 07:04:44 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
16/06/01 07:04:44 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/9c5d81b8a4fc931d02a97efaa7fd3d8a/codegen_categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/06/01 07:04:49 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/9c5d81b8a4fc931d02a97efaa7fd3d8a/codegen_categories.jar
16/06/01 07:04:49 INFO mapreduce.ImportJobBase: Beginning import of customers
16/06/01 07:04:49 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
16/06/01 07:04:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
16/06/01 07:04:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
16/06/01 07:04:49 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:04:49 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:04:49 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:04:49 INFO hive.metastore: Connected to metastore.
16/06/01 07:04:50 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:04:50 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:04:50 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:04:50 INFO hive.metastore: Connected to metastore.
16/06/01 07:04:50 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:04:50 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:04:50 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:04:50 INFO hive.metastore: Connected to metastore.
16/06/01 07:04:50 INFO hive.HiveManagedMetadataProvider: Creating a managed Hive table named: customers
16/06/01 07:04:50 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:04:50 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:04:50 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:04:50 INFO hive.metastore: Connected to metastore.
16/06/01 07:04:50 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/06/01 07:05:00 INFO db.DBInputFormat: Using read commited transaction isolation
16/06/01 07:05:00 INFO mapreduce.JobSubmitter: number of splits:1
16/06/01 07:05:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464787780964_0002
16/06/01 07:05:00 INFO impl.YarnClientImpl: Submitted application application_1464787780964_0002
16/06/01 07:05:00 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1464787780964_0002/
16/06/01 07:05:00 INFO mapreduce.Job: Running job: job_1464787780964_0002
16/06/01 07:05:27 INFO mapreduce.Job: Job job_1464787780964_0002 running in uber mode : false
16/06/01 07:05:27 INFO mapreduce.Job:  map 0% reduce 0%
16/06/01 07:05:44 INFO mapreduce.Job:  map 100% reduce 0%
16/06/01 07:05:49 INFO mapreduce.Job: Job job_1464787780964_0002 completed successfully
16/06/01 07:05:49 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=212281
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=13005
        HDFS: Number of bytes written=257967
        HDFS: Number of read operations=48
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=10
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=13696
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=13696
        Total vcore-seconds taken by all map tasks=13696
        Total megabyte-seconds taken by all map tasks=14024704
    Map-Reduce Framework
        Map input records=12435
        Map output records=12435
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=148
        CPU time spent (ms)=5080
        Physical memory (bytes) snapshot=341532672
        Virtual memory (bytes) snapshot=1585790976
        Total committed heap usage (bytes)=268435456
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
16/06/01 07:05:49 INFO mapreduce.ImportJobBase: Transferred 251.9209 KB in 59.1842 seconds (4.2566 KB/sec)
16/06/01 07:05:49 INFO mapreduce.ImportJobBase: Retrieved 12435 records.
16/06/01 07:05:49 INFO tool.CodeGenTool: Beginning code generation
16/06/01 07:05:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
16/06/01 07:05:50 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/9c5d81b8a4fc931d02a97efaa7fd3d8a/codegen_categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/06/01 07:05:55 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/9c5d81b8a4fc931d02a97efaa7fd3d8a/codegen_categories.jar
16/06/01 07:05:55 INFO mapreduce.ImportJobBase: Beginning import of departments
16/06/01 07:05:55 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
16/06/01 07:05:55 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
16/06/01 07:05:55 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
16/06/01 07:05:55 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:05:55 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:05:55 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:05:55 INFO hive.metastore: Connected to metastore.
16/06/01 07:05:55 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:05:55 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:05:56 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:05:56 INFO hive.metastore: Connected to metastore.
16/06/01 07:05:56 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:05:56 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:05:56 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:05:56 INFO hive.metastore: Connected to metastore.
16/06/01 07:05:56 INFO hive.HiveManagedMetadataProvider: Creating a managed Hive table named: departments
16/06/01 07:05:56 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:05:56 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:05:56 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:05:56 INFO hive.metastore: Connected to metastore.
16/06/01 07:05:56 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/06/01 07:06:06 INFO db.DBInputFormat: Using read commited transaction isolation
16/06/01 07:06:06 INFO mapreduce.JobSubmitter: number of splits:1
16/06/01 07:06:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464787780964_0003
16/06/01 07:06:06 INFO impl.YarnClientImpl: Submitted application application_1464787780964_0003
16/06/01 07:06:06 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1464787780964_0003/
16/06/01 07:06:06 INFO mapreduce.Job: Running job: job_1464787780964_0003
16/06/01 07:06:35 INFO mapreduce.Job: Job job_1464787780964_0003 running in uber mode : false
16/06/01 07:06:35 INFO mapreduce.Job:  map 0% reduce 0%
16/06/01 07:08:20 INFO mapreduce.Job:  map 100% reduce 0%
16/06/01 07:08:33 INFO mapreduce.Job: Job job_1464787780964_0003 completed successfully
16/06/01 07:08:34 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=211387
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=4453
        HDFS: Number of bytes written=1982
        HDFS: Number of read operations=48
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=10
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=105084
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=105084
        Total vcore-seconds taken by all map tasks=105084
        Total megabyte-seconds taken by all map tasks=107606016
    Map-Reduce Framework
        Map input records=6
        Map output records=6
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=1788
        CPU time spent (ms)=3220
        Physical memory (bytes) snapshot=253460480
        Virtual memory (bytes) snapshot=1580556288
        Total committed heap usage (bytes)=267911168
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
16/06/01 07:08:34 INFO mapreduce.ImportJobBase: Transferred 1.9355 KB in 157.3586 seconds (12.5954 bytes/sec)
16/06/01 07:08:34 INFO mapreduce.ImportJobBase: Retrieved 6 records.
16/06/01 07:08:34 INFO tool.CodeGenTool: Beginning code generation
16/06/01 07:08:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_items` AS t LIMIT 1
16/06/01 07:08:34 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/9c5d81b8a4fc931d02a97efaa7fd3d8a/codegen_categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/06/01 07:08:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/9c5d81b8a4fc931d02a97efaa7fd3d8a/codegen_categories.jar
16/06/01 07:08:39 INFO mapreduce.ImportJobBase: Beginning import of order_items
16/06/01 07:08:39 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
16/06/01 07:08:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_items` AS t LIMIT 1
16/06/01 07:08:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_items` AS t LIMIT 1
16/06/01 07:08:39 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:08:39 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:08:39 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:08:39 INFO hive.metastore: Connected to metastore.
16/06/01 07:08:40 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:08:40 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:08:40 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:08:40 INFO hive.metastore: Connected to metastore.
16/06/01 07:08:40 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:08:40 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:08:40 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:08:40 INFO hive.metastore: Connected to metastore.
16/06/01 07:08:40 INFO hive.HiveManagedMetadataProvider: Creating a managed Hive table named: order_items
16/06/01 07:08:40 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:08:40 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:08:40 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:08:40 INFO hive.metastore: Connected to metastore.
16/06/01 07:08:40 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/06/01 07:08:49 INFO db.DBInputFormat: Using read commited transaction isolation
16/06/01 07:08:49 INFO mapreduce.JobSubmitter: number of splits:1
16/06/01 07:08:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464787780964_0004
16/06/01 07:08:49 INFO impl.YarnClientImpl: Submitted application application_1464787780964_0004
16/06/01 07:08:49 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1464787780964_0004/
16/06/01 07:08:49 INFO mapreduce.Job: Running job: job_1464787780964_0004
16/06/01 07:09:23 INFO mapreduce.Job: Job job_1464787780964_0004 running in uber mode : false
16/06/01 07:09:23 INFO mapreduce.Job:  map 0% reduce 0%
16/06/01 07:09:56 INFO mapreduce.Job:  map 100% reduce 0%
16/06/01 07:10:20 INFO mapreduce.Job: Job job_1464787780964_0004 completed successfully
16/06/01 07:10:21 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=211981
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=9725
        HDFS: Number of bytes written=1649632
        HDFS: Number of read operations=48
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=10
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=46491
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=46491
        Total vcore-seconds taken by all map tasks=46491
        Total megabyte-seconds taken by all map tasks=47606784
    Map-Reduce Framework
        Map input records=172198
        Map output records=172198
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=741
        CPU time spent (ms)=6910
        Physical memory (bytes) snapshot=412913664
        Virtual memory (bytes) snapshot=1575092224
        Total committed heap usage (bytes)=342360064
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
16/06/01 07:10:21 INFO mapreduce.ImportJobBase: Transferred 1.5732 MB in 100.2474 seconds (16.0699 KB/sec)
16/06/01 07:10:21 INFO mapreduce.ImportJobBase: Retrieved 172198 records.
16/06/01 07:10:21 INFO tool.CodeGenTool: Beginning code generation
16/06/01 07:10:21 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `orders` AS t LIMIT 1
16/06/01 07:10:21 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/9c5d81b8a4fc931d02a97efaa7fd3d8a/codegen_categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/06/01 07:10:27 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/9c5d81b8a4fc931d02a97efaa7fd3d8a/codegen_categories.jar
16/06/01 07:10:27 INFO mapreduce.ImportJobBase: Beginning import of orders
16/06/01 07:10:27 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
16/06/01 07:10:27 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `orders` AS t LIMIT 1
16/06/01 07:10:27 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `orders` AS t LIMIT 1
16/06/01 07:10:27 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:10:27 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:10:27 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:10:27 INFO hive.metastore: Connected to metastore.
16/06/01 07:10:27 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:10:27 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:10:27 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:10:27 INFO hive.metastore: Connected to metastore.
16/06/01 07:10:28 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:10:28 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:10:28 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:10:28 INFO hive.metastore: Connected to metastore.
16/06/01 07:10:28 INFO hive.HiveManagedMetadataProvider: Creating a managed Hive table named: orders
16/06/01 07:10:28 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:10:28 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:10:28 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:10:28 INFO hive.metastore: Connected to metastore.
16/06/01 07:10:28 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/06/01 07:10:37 INFO db.DBInputFormat: Using read commited transaction isolation
16/06/01 07:10:37 INFO mapreduce.JobSubmitter: number of splits:1
16/06/01 07:10:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464787780964_0005
16/06/01 07:10:38 INFO impl.YarnClientImpl: Submitted application application_1464787780964_0005
16/06/01 07:10:38 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1464787780964_0005/
16/06/01 07:10:38 INFO mapreduce.Job: Running job: job_1464787780964_0005
16/06/01 07:11:10 INFO mapreduce.Job: Job job_1464787780964_0005 running in uber mode : false
16/06/01 07:11:10 INFO mapreduce.Job:  map 0% reduce 0%
16/06/01 07:11:33 INFO mapreduce.Job:  map 100% reduce 0%
16/06/01 07:11:45 INFO mapreduce.Job: Job job_1464787780964_0005 completed successfully
16/06/01 07:11:46 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=211574
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=6589
        HDFS: Number of bytes written=489972
        HDFS: Number of read operations=48
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=10
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=23779
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=23779
        Total vcore-seconds taken by all map tasks=23779
        Total megabyte-seconds taken by all map tasks=24349696
    Map-Reduce Framework
        Map input records=68883
        Map output records=68883
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=780
        CPU time spent (ms)=6320
        Physical memory (bytes) snapshot=398024704
        Virtual memory (bytes) snapshot=1588559872
        Total committed heap usage (bytes)=342360064
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
16/06/01 07:11:46 INFO mapreduce.ImportJobBase: Transferred 478.4883 KB in 77.3825 seconds (6.1834 KB/sec)
16/06/01 07:11:46 INFO mapreduce.ImportJobBase: Retrieved 68883 records.
16/06/01 07:11:46 INFO tool.CodeGenTool: Beginning code generation
16/06/01 07:11:46 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `products` AS t LIMIT 1
16/06/01 07:11:46 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/9c5d81b8a4fc931d02a97efaa7fd3d8a/codegen_categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/06/01 07:11:53 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/9c5d81b8a4fc931d02a97efaa7fd3d8a/codegen_categories.jar
16/06/01 07:11:53 INFO mapreduce.ImportJobBase: Beginning import of products
16/06/01 07:11:53 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
16/06/01 07:11:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `products` AS t LIMIT 1
16/06/01 07:11:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `products` AS t LIMIT 1
16/06/01 07:11:54 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:11:54 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:11:54 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:11:54 INFO hive.metastore: Connected to metastore.
16/06/01 07:11:54 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:11:54 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:11:54 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:11:54 INFO hive.metastore: Connected to metastore.
16/06/01 07:11:54 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:11:54 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:11:54 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:11:54 INFO hive.metastore: Connected to metastore.
16/06/01 07:11:54 INFO hive.HiveManagedMetadataProvider: Creating a managed Hive table named: products
16/06/01 07:11:54 INFO hive.metastore: Closed a connection to metastore, current connections: 0
16/06/01 07:11:54 INFO hive.metastore: Trying to connect to metastore with URI thrift://127.0.0.1:9083
16/06/01 07:11:54 INFO hive.metastore: Opened a connection to metastore, current connections: 1
16/06/01 07:11:54 INFO hive.metastore: Connected to metastore.
16/06/01 07:11:55 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/06/01 07:12:04 INFO db.DBInputFormat: Using read commited transaction isolation
16/06/01 07:12:04 INFO mapreduce.JobSubmitter: number of splits:1
16/06/01 07:12:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464787780964_0006
16/06/01 07:12:04 INFO impl.YarnClientImpl: Submitted application application_1464787780964_0006
16/06/01 07:12:04 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1464787780964_0006/
16/06/01 07:12:04 INFO mapreduce.Job: Running job: job_1464787780964_0006
16/06/01 07:12:39 INFO mapreduce.Job: Job job_1464787780964_0006 running in uber mode : false
16/06/01 07:12:39 INFO mapreduce.Job:  map 0% reduce 0%
16/06/01 07:13:14 INFO mapreduce.Job:  map 100% reduce 0%
16/06/01 07:13:15 INFO mapreduce.Job: Job job_1464787780964_0006 completed successfully
16/06/01 07:13:15 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=211879
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=9261
        HDFS: Number of bytes written=47239
        HDFS: Number of read operations=48
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=10
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=22970
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=22970
        Total vcore-seconds taken by all map tasks=22970
        Total megabyte-seconds taken by all map tasks=23521280
    Map-Reduce Framework
        Map input records=1345
        Map output records=1345
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=119
        CPU time spent (ms)=3040
        Physical memory (bytes) snapshot=298315776
        Virtual memory (bytes) snapshot=1590517760
        Total committed heap usage (bytes)=185597952
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
16/06/01 07:13:15 INFO mapreduce.ImportJobBase: Transferred 46.1318 KB in 80.5216 seconds (586.6626 bytes/sec)
16/06/01 07:13:15 INFO mapreduce.ImportJobBase: Retrieved 1345 records.