Thursday, June 23, 2016

Import a large JSON file into MongoDB

[mongod@localhost week5]$ ls -lh posts.json
-rw-r--r--. 1 mongod mongod 34M Jun 21 12:16 posts.json

[mongod@localhost week5]$ mongoimport -d blog -c posts --drop posts.json
2016-06-23T12:46:05.738+0800    connected to: localhost
2016-06-23T12:46:05.739+0800    dropping: blog.posts
2016-06-23T12:46:07.758+0800    Failed: lost connection to server
2016-06-23T12:46:07.758+0800    imported 0 documents

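The import failed. The mongod log shows why: the insert command carrying the documents was 33,581,101 bytes, about twice the 16MB maximum BSON object size, so the server raised an assertion and closed the client connection (hence "lost connection to server" in the mongoimport output):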


2016-06-23T12:46:07.321+0800 D COMMAND  [conn2] run command admin.$cmd { getnonce: 1 }
2016-06-23T12:46:07.321+0800 I COMMAND  [conn2] command admin.$cmd command: getnonce { getnonce: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:65 locks:{} protocol:op_query 0ms
2016-06-23T12:46:07.724+0800 I -        [conn2] Assertion: 10334:BSONObj size: 33581101 (0x200682D) is invalid. Size must be between 0 and 16793600(16MB) First element: insert: "posts"
2016-06-23T12:46:07.757+0800 I CONTROL  [conn2]
 0x1315ab2 0x12b38c8 0x12a01a8 0x12a025c 0x9da679 0xbd59f8 0xa29b3d 0xcd2172 0xcd4aa6 0x9b79fc 0x12c2fad 0x7ff585e7adc5 0x7ff585baac4d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F15AB2","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"EB38C8","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"EA01A8","s":"_ZN5mongo11msgassertedEiPKc"},{"b":"400000","o":"EA025C"},{"b":"400000","o":"5DA679","s":"_ZNK5mongo7BSONObj14_assertInvalidEv"},{"b":"400000","o":"7D59F8","s":"_ZN5mongo9DbMessage9nextJsObjEv"},{"b":"400000","o":"629B3D","s":"_ZN5mongo12QueryMessageC2ERNS_9DbMessageE"},{"b":"400000","o":"8D2172"},{"b":"400000","o":"8D4AA6","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"400000","o":"5B79FC","s":"_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE"},{"b":"400000","o":"EC2FAD","s":"_ZN5mongo17PortMessageServer17handleIncomingMsgEPv"},{"b":"7FF585E73000","o":"7DC5"},{"b":"7FF585AB4000","o":"F6C4D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.6", "gitVersion" : "05552b562c7a0b3143a729aaa0838e558dc49b25", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.8.13-118.6.2.el7uek.x86_64", "version" : "#2 SMP Thu May 19 13:15:51 PDT 2016", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "398421CA3C84F69951FDE89431516DCB02070495" }, { "b" : "7FFE03EBA000", "elfType" : 3, "buildId" : "DB1547CFFDE01D055FAC6A7F583CCC31E1BEBA83" }, { "b" : "7FF586D9B000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "686A25D0A83D002183C835FA5694A8110C78D3BC" }, { "b" : "7FF5869B3000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "D5775E07029F79874AEEC477778DB1C40E91B84D" }, { "b" : "7FF5867AB000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "CCC119FE8F4D8D262AFC67DDC36 2F266F30586F" }, { "b" : "7FF5865A7000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "DECA756A631284EEE020E3AE1CADCE9BDFB9914D" }, { "b" : "7FF5862A5000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "F129A8EBE4CD7D042019BFACD9A58677C18F2AE3" }, 
{ "b" : "7FF58608F000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "97D5E2F5739B715C3A0EC9F95F7336E232346CA8" }, { "b" : "7FF585E73000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "0D3AFB2828490FDE7B2BE3BEEBACBDB D94A999F" }, { "b" : "7FF585AB4000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "725A7BE34325FB744D49930F508FBF831E573C3E" }, { "b" : "7FF587008000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "09E1BB4D034C7263810A41100647068858A7ECB6" }, { "b" : "7FF585868000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "E993A70A266D64269D29914AD59BE6FF54B0F531" }, { "b" : "7FF585583000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "EA78218A5AAEAB292259A0 1D59A9DC9EC95AF60" }, { "b" : "7FF58537F000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "5F93ABF89BFAD8DF5AB77D741156BDBBB77FC692" }, { "b" : "7FF58514D000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "6D3C012960EBE87D267A66BE09DBFAA9E7DD6801" }, { "b" : "7FF584F37000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "FC37913FB197B822BCDBF3697D061E248698CEC1" }, { "b" : "7FF584D28000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "8C509F8EACC7FC 178B8D76587252291AC98E663" }, { "b" : "7FF584B24000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "3B0403FDFDE98D24FA2B5EA33202259FADF9E9A1" }, { "b" : "7FF58490A000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "DDBBFDDB9C3C33C6349711D1ED27218E5271757B" }, { "b" : "7FF5846E6000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "DF5DEAA29286E350B0106E13E70CEB0E7D0B6562" }, { "b" : "7FF584485000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "1DEC80B8 143A7960489C7B7AA8DDF182D6E2BE6" }, { "b" : "7FF584260000", "path" : "/lib64/liblzma.so.5", "elfType" : 3, "buildId" : "61D7D46225E85F144221E1424B87FBF3CB2B9D3F" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1315ab2]
 mongod(_ZN5mongo10logContextEPKc+0x138) [0x12b38c8]
 mongod(_ZN5mongo11msgassertedEiPKc+0x88) [0x12a01a8]
 mongod(+0xEA025C) [0x12a025c]
 mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x3B9) [0x9da679]
 mongod(_ZN5mongo9DbMessage9nextJsObjEv+0x118) [0xbd59f8]
 mongod(_ZN5mongo12QueryMessageC2ERNS_9DbMessageE+0x5D) [0xa29b3d]
 mongod(+0x8D2172) [0xcd2172]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x696) [0xcd4aa6]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE+0xEC) [0x9b79fc]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x26D) [0x12c2fad]
 libpthread.so.0(+0x7DC5) [0x7ff585e7adc5]
 libc.so.6(clone+0x6D) [0x7ff585baac4d]
-----  END BACKTRACE  -----
2016-06-23T12:46:07.757+0800 I NETWORK  [conn2] AssertionException handling request, closing client connection: 10334 BSONObj size: 33581101 (0x200682D) is invalid. Size must be between 0 and 16793600(16MB) First element: insert: "posts"
2016-06-23T12:46:07.772+0800 D NETWORK  [conn1] SocketException: remote: 127.0.0.1:62169 error: 9001 socket exception [CLOSED] server [127.0.0.1:62169]
2016-06-23T12:46:07.772+0800 I NETWORK  [conn1] end connection 127.0.0.1:62169 (0 connections now open)
2016-06-23T12:46:33.529+0800 D -        [PeriodicTaskRunner] cleaning up unused lock buckets of the global lock manager
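
The sizes in the assertion explain the failure. The Python sketch below only does arithmetic on the numbers from the log. The per-document average assumes the 1000 documents (the count reported by the successful imports) are of similar size and ignores the small per-command overhead, so treat the result as a rough upper bound rather than an exact figure.

```python
# Values taken from the mongod assertion above.
INSERT_SIZE = 33_581_101  # size of the rejected insert command, in bytes
MAX_SIZE = 16_793_600     # limit reported by mongod
DOC_COUNT = 1000          # documents in posts.json, per the successful imports

# The reported limit is 16 MiB plus 16 KiB of headroom.
assert MAX_SIZE == 16 * 1024 * 1024 + 16 * 1024

ratio = INSERT_SIZE / MAX_SIZE

# Assumption: documents are of similar size, so the average is representative.
avg_doc = INSERT_SIZE / DOC_COUNT
max_docs_per_batch = int(MAX_SIZE // avg_doc)

print(f"insert was {ratio:.2f}x the limit")
print(f"average document ~{avg_doc:,.0f} bytes")
print(f"at most ~{max_docs_per_batch} documents per batch")
```

In other words, a batch of a few hundred of these documents or fewer should fit under the limit, while all 1000 in one insert cannot.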

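Workaround 1: use multiple insertion workers (-j, short for --numInsertionWorkers). The example below uses 4 workers, so the 1000 documents in the 34MB file are spread across 4 workers sending smaller inserts, instead of a single worker sending them all in one ~32MB insert command.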

[mongod@localhost week5]$ mongoimport -d blog -c posts --drop -j 4 posts.json
2016-06-23T12:52:01.339+0800    connected to: localhost
2016-06-23T12:52:01.340+0800    dropping: blog.posts
2016-06-23T12:52:03.346+0800    imported 1000 documents
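
A rough Python estimate of why -j 4 helps. It assumes the documents are spread evenly across the workers and that each worker's share goes out as one insert; that is an assumption about how mongoimport distributes work, not something the output above shows. It also reuses the average document size derived from the log and ignores command overhead. The function name is hypothetical.

```python
import math

DOC_COUNT = 1000
AVG_DOC = 33_581_101 / DOC_COUNT  # average document size from the log, in bytes
MAX_SIZE = 16_793_600             # limit reported by mongod


def per_worker_batch_bytes(workers: int) -> float:
    """Hypothetical estimate of one worker's insert size, assuming an even split."""
    share = math.ceil(DOC_COUNT / workers)
    return share * AVG_DOC


for workers in (1, 4):
    size = per_worker_batch_bytes(workers)
    status = "OK" if size <= MAX_SIZE else "too large"
    print(f"-j {workers}: ~{size / 1024 / 1024:.1f} MiB per insert ({status})")
```

Under these assumptions one worker needs ~32 MiB per insert, while four workers need ~8 MiB each, comfortably below the limit.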

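Workaround 2: use the hidden option --batchSize to cap the number of documents sent in each insert batch. Both --batchSize 1 and --batchSize 100 import all 1000 documents: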

[mongod@localhost week5]$ mongoimport -d blog -c posts --drop --batchSize 1 posts.json
2016-06-23T12:53:19.703+0800    connected to: localhost
2016-06-23T12:53:19.705+0800    dropping: blog.posts
2016-06-23T12:53:22.698+0800    [######################..] blog.posts   32.4 MB/33.9 MB (95.7%)
2016-06-23T12:53:22.862+0800    [########################] blog.posts   33.9 MB/33.9 MB (100.0%)
2016-06-23T12:53:22.862+0800    imported 1000 documents

[mongod@localhost week5]$ mongoimport -d blog -c posts --drop --batchSize 100 posts.json
2016-06-23T12:55:15.239+0800    connected to: localhost
2016-06-23T12:55:15.241+0800    dropping: blog.posts
2016-06-23T12:55:17.085+0800    imported 1000 documents
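
To illustrate what capping the document count per batch does to the insert size, here is a small Python sketch. The helper `batches` is hypothetical and is not mongoimport's code; it is fed 1000 documents of the average size seen in the log (real documents vary in size, and per-command overhead is ignored).

```python
from typing import Iterable, Iterator, List

MAX_SIZE = 16_793_600  # limit reported by mongod


def batches(doc_sizes: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Hypothetical helper: group document sizes into batches of at most batch_size documents."""
    batch: List[int] = []
    for size in doc_sizes:
        batch.append(size)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


# Assumption: 1000 documents, each of the ~33,581-byte average from the log.
docs = [33_581] * 1000

for n in (1, 100, 1000):
    sizes = [sum(b) for b in batches(docs, n)]
    status = "OK" if max(sizes) <= MAX_SIZE else "too large"
    print(f"--batchSize {n}: {len(sizes)} batches, largest {max(sizes):,} bytes ({status})")
```

With 1 or 100 documents per batch every insert stays far below the limit; putting all 1000 documents in one batch reproduces the oversized insert from the failed run. Smaller batches mean more round trips, which may explain why --batchSize 1 took longer than --batchSize 100 in the runs above.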