Syncing DB2 Data to GBase with DataX

DataX is Alibaba's open-source tool for offline synchronization between heterogeneous data sources. It aims to provide stable, efficient data sync across relational databases (MySQL, Oracle, etc.), HDFS, Hive, ODPS, HBase, FTP, and more. For background, see Alibaba Cloud's introduction to DataX 3.0, its open-source offline sync tool.

Back in 2015 a project of ours used DataX 1.0 to build a data-extraction feature. It was very convenient, and we also wrote a few plugins of our own, such as DB2Writer, DB2Reader, GBaseWriter and GBaseReader. DataX's plugin mechanism really is powerful and flexible.
Later, in 2018, we needed scheduled cross-database synchronization. The volumes at the time were small, a few tens of thousands of rows, at most a hundred thousand or so, so we chose Kettle. It is quick to pick up thanks to its GUI: we designed the ktr and kjb files in the designer, then ran the kjb from the command line on a schedule to handle the ETL processing and sync.
Kettle, however, can no longer keep up: we now have to move hundreds of millions of metric rows from DB2 into GBase. The cleaner approach would be to export files from DB2 and load them into GBase, but we currently lack the permissions for that, and with only a single host available there is no way to set it up. So it is time to bring out our trump card, DataX.

Building DataX from Source

Tools:

  • Maven
  • Git
  • IDEA
  • JDK 1.8

You can also skip Git and IDEA entirely: just download the source and run the Maven command. To import DataX into IDEA:


After clicking Clone, the source is downloaded, and once that finishes the dependency packages are fetched.

Wait for the dependencies to finish installing, then run the packaging command:
    mvn -U clean package assembly:assembly -Dmaven.test.skip=true
    [INFO] Reading assembly descriptor: package.xml
    [INFO] datax/lib\commons-io-2.4.jar already added, skipping
    [INFO] datax/lib\commons-lang3-3.3.2.jar already added, skipping
    [INFO] datax/lib\commons-math3-3.1.1.jar already added, skipping
    [INFO] datax/lib\datax-common-0.0.1-SNAPSHOT.jar already added, skipping
    [INFO] datax/lib\datax-transformer-0.0.1-SNAPSHOT.jar already added, skipping
    [INFO] datax/lib\fastjson-1.1.46.sec01.jar already added, skipping
    [INFO] datax/lib\hamcrest-core-1.3.jar already added, skipping
    [INFO] datax/lib\logback-classic-1.0.13.jar already added, skipping
    [INFO] datax/lib\logback-core-1.0.13.jar already added, skipping
    [INFO] datax/lib\slf4j-api-1.7.10.jar already added, skipping
    [INFO] Building tar : F:\WorkAbout\work\DataX\target\datax.tar.gz
    [INFO] datax/lib\commons-io-2.4.jar already added, skipping
    [INFO] datax/lib\commons-lang3-3.3.2.jar already added, skipping
    [INFO] datax/lib\commons-math3-3.1.1.jar already added, skipping
    [INFO] datax/lib\datax-common-0.0.1-SNAPSHOT.jar already added, skipping
    [INFO] datax/lib\datax-transformer-0.0.1-SNAPSHOT.jar already added, skipping
    [INFO] datax/lib\fastjson-1.1.46.sec01.jar already added, skipping
    [INFO] datax/lib\hamcrest-core-1.3.jar already added, skipping
    [INFO] datax/lib\logback-classic-1.0.13.jar already added, skipping
    [INFO] datax/lib\logback-core-1.0.13.jar already added, skipping
    [INFO] datax/lib\slf4j-api-1.7.10.jar already added, skipping
    [INFO] datax/lib\commons-io-2.4.jar already added, skipping
    [INFO] datax/lib\commons-lang3-3.3.2.jar already added, skipping
    [INFO] datax/lib\commons-math3-3.1.1.jar already added, skipping
    [INFO] datax/lib\datax-common-0.0.1-SNAPSHOT.jar already added, skipping
    [INFO] datax/lib\datax-transformer-0.0.1-SNAPSHOT.jar already added, skipping
    [INFO] datax/lib\fastjson-1.1.46.sec01.jar already added, skipping
    [INFO] datax/lib\hamcrest-core-1.3.jar already added, skipping
    [INFO] datax/lib\logback-classic-1.0.13.jar already added, skipping
    [INFO] datax/lib\logback-core-1.0.13.jar already added, skipping
    [INFO] datax/lib\slf4j-api-1.7.10.jar already added, skipping
    [INFO] Copying files to F:\WorkAbout\work\DataX\target\datax
    [INFO] datax/lib\commons-io-2.4.jar already added, skipping
    [INFO] datax/lib\commons-lang3-3.3.2.jar already added, skipping
    [INFO] datax/lib\commons-math3-3.1.1.jar already added, skipping
    [INFO] datax/lib\datax-common-0.0.1-SNAPSHOT.jar already added, skipping
    [INFO] datax/lib\datax-transformer-0.0.1-SNAPSHOT.jar already added, skipping
    [INFO] datax/lib\fastjson-1.1.46.sec01.jar already added, skipping
    [INFO] datax/lib\hamcrest-core-1.3.jar already added, skipping
    [INFO] datax/lib\logback-classic-1.0.13.jar already added, skipping
    [INFO] datax/lib\logback-core-1.0.13.jar already added, skipping
    [INFO] datax/lib\slf4j-api-1.7.10.jar already added, skipping
    [WARNING] Assembly file: F:\WorkAbout\work\DataX\target\datax is not a regular file (it may be a directory). It cannot be attached to the project build for installation or deployment.
    [INFO] ------------------------------------------------------------------------
    [INFO] Reactor Summary:
    [INFO]
    [INFO] datax-all 0.0.1-SNAPSHOT ........................... SUCCESS [14:23 min]
    [INFO] datax-common ....................................... SUCCESS [ 3.743 s]
    [INFO] datax-transformer .................................. SUCCESS [ 4.746 s]
    [INFO] datax-core ......................................... SUCCESS [ 35.608 s]
    [INFO] plugin-rdbms-util .................................. SUCCESS [ 2.233 s]
    [INFO] mysqlreader ........................................ SUCCESS [ 3.712 s]
    [INFO] drdsreader ......................................... SUCCESS [ 2.849 s]
    [INFO] sqlserverreader .................................... SUCCESS [ 2.668 s]
    [INFO] postgresqlreader ................................... SUCCESS [ 3.018 s]
    [INFO] oraclereader ....................................... SUCCESS [ 2.720 s]
    [INFO] odpsreader ......................................... SUCCESS [ 4.960 s]
    [INFO] otsreader .......................................... SUCCESS [ 5.210 s]
    [INFO] otsstreamreader .................................... SUCCESS [ 4.729 s]
    [INFO] plugin-unstructured-storage-util ................... SUCCESS [ 1.542 s]
    [INFO] txtfilereader ...................................... SUCCESS [ 11.342 s]
    [INFO] hdfsreader ......................................... SUCCESS [ 36.736 s]
    [INFO] streamreader ....................................... SUCCESS [ 2.925 s]
    [INFO] ossreader .......................................... SUCCESS [ 12.321 s]
    [INFO] ftpreader .......................................... SUCCESS [ 12.288 s]
    [INFO] mongodbreader ...................................... SUCCESS [ 11.568 s]
    [INFO] rdbmsreader ........................................ SUCCESS [ 3.784 s]
    [INFO] hbase11xreader ..................................... SUCCESS [ 15.342 s]
    [INFO] hbase094xreader .................................... SUCCESS [ 13.149 s]
    [INFO] tsdbreader ......................................... SUCCESS [ 3.722 s]
    [INFO] opentsdbreader ..................................... SUCCESS [ 6.592 s]
    [INFO] cassandrareader .................................... SUCCESS [ 11.834 s]
    [INFO] mysqlwriter ........................................ SUCCESS [ 3.282 s]
    [INFO] drdswriter ......................................... SUCCESS [ 9.150 s]
    [INFO] odpswriter ......................................... SUCCESS [ 13.498 s]
    [INFO] txtfilewriter ...................................... SUCCESS [ 21.791 s]
    [INFO] ftpwriter .......................................... SUCCESS [ 19.378 s]
    [INFO] hdfswriter ......................................... SUCCESS [ 32.956 s]
    [INFO] streamwriter ....................................... SUCCESS [ 5.611 s]
    [INFO] otswriter .......................................... SUCCESS [ 5.545 s]
    [INFO] oraclewriter ....................................... SUCCESS [ 4.152 s]
    [INFO] sqlserverwriter .................................... SUCCESS [ 2.689 s]
    [INFO] postgresqlwriter ................................... SUCCESS [ 2.502 s]
    [INFO] osswriter .......................................... SUCCESS [ 12.467 s]
    [INFO] mongodbwriter ...................................... SUCCESS [ 12.070 s]
    [INFO] adswriter .......................................... SUCCESS [ 9.578 s]
    [INFO] ocswriter .......................................... SUCCESS [ 5.485 s]
    [INFO] rdbmswriter ........................................ SUCCESS [ 2.927 s]
    [INFO] hbase11xwriter ..................................... SUCCESS [ 15.926 s]
    [INFO] hbase094xwriter .................................... SUCCESS [ 15.623 s]
    [INFO] hbase11xsqlwriter .................................. SUCCESS [ 27.912 s]
    [INFO] hbase11xsqlreader .................................. SUCCESS [ 35.410 s]
    [INFO] elasticsearchwriter ................................ SUCCESS [ 7.341 s]
    [INFO] tsdbwriter ......................................... SUCCESS [ 3.051 s]
    [INFO] adbpgwriter ........................................ SUCCESS [ 5.857 s]
    [INFO] gdbwriter .......................................... SUCCESS [ 11.935 s]
    [INFO] cassandrawriter .................................... SUCCESS [ 6.054 s]
    [INFO] hbase20xsqlreader .................................. SUCCESS [ 3.264 s]
    [INFO] hbase20xsqlwriter 0.0.1-SNAPSHOT ................... SUCCESS [ 3.577 s]
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 23:03 min
    [INFO] Finished at: 2019-12-18T11:59:17+08:00
    [INFO] ------------------------------------------------------------------------
Once the build succeeds, a target directory is generated. My packaged output came to about 1.12 GB, because it bundles every plugin along with each plugin's dependencies; I kept only the mysql and rdbms plugins.

Because I have Python 3.6.5 installed, running the command from the official docs as-is fails:

F:\WorkAbout\work\DataX\target\datax\datax\bin>python datax.py -r mysqlreader -w mysqlwriter
File "datax.py", line 114
print readerRef
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(readerRef)?

So the script needs converting first. Python 3.x ships a 2to3 tool under Tools\scripts in the installation directory; let's see how well it works:

D:\Programs\Python\Python36-32\Tools\scripts>2to3.py --output-dir=D:/datax -W -n F:\WorkAbout\work\DataX\target\datax\datax\bin\datax.py
WARNING: --write-unchanged-files/-W implies -w.
lib2to3.main: Output in 'D:/datax' will mirror the input directory 'F:\\WorkAbout\\work\\DataX\\target\\datax\\datax\\bin' layout.
RefactoringTool: Skipping optional fixer: buffer
RefactoringTool: Skipping optional fixer: idioms
RefactoringTool: Skipping optional fixer: set_literal
RefactoringTool: Skipping optional fixer: ws_comma
RefactoringTool: Refactored F:\WorkAbout\work\DataX\target\datax\datax\bin\datax.py
--- F:\WorkAbout\work\DataX\target\datax\datax\bin\datax.py (original)
+++ F:\WorkAbout\work\DataX\target\datax\datax\bin\datax.py (refactored)
@@ -52,13 +52,13 @@

def suicide(signum, e):
global child_process
- print >> sys.stderr, "[Error] DataX receive unexpected signal %d, starts to suicide." % (signum)
+ print("[Error] DataX receive unexpected signal %d, starts to suicide." % (signum), file=sys.stderr)

if child_process:
child_process.send_signal(signal.SIGQUIT)
time.sleep(1)
child_process.kill()
- print >> sys.stderr, "DataX Process was killed ! you did ?"
+ print("DataX Process was killed ! you did ?", file=sys.stderr)
sys.exit(RET_STATE["KILL"])


@@ -111,10 +111,10 @@
def generateJobConfigTemplate(reader, writer):
readerRef = "Please refer to the %s document:\n https://github.com/alibaba/DataX/blob/master/%s/doc/%s.md \n" % (reader,reader,reader)
writerRef = "Please refer to the %s document:\n https://github.com/alibaba/DataX/blob/master/%s/doc/%s.md \n " % (writer,writer,writer)
- print readerRef
- print writerRef
+ print(readerRef)
+ print(writerRef)
jobGuid = 'Please save the following configuration as a json file and use\n python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json \nto run the job.\n'
- print jobGuid
+ print(jobGuid)
jobTemplate={
"job": {
"setting": {
@@ -134,15 +134,15 @@
writerTemplatePath = "%s/plugin/writer/%s/plugin_job_template.json" % (DATAX_HOME,writer)
try:
readerPar = readPluginTemplate(readerTemplatePath);
- except Exception, e:
- print "Read reader[%s] template error: can\'t find file %s" % (reader,readerTemplatePath)
+ except Exception as e:
+ print("Read reader[%s] template error: can\'t find file %s" % (reader,readerTemplatePath))
try:
writerPar = readPluginTemplate(writerTemplatePath);
- except Exception, e:
- print "Read writer[%s] template error: : can\'t find file %s" % (writer,writerTemplatePath)
+ except Exception as e:
+ print("Read writer[%s] template error: : can\'t find file %s" % (writer,writerTemplatePath))
jobTemplate['job']['content'][0]['reader'] = readerPar;
jobTemplate['job']['content'][0]['writer'] = writerPar;
- print json.dumps(jobTemplate, indent=4, sort_keys=True)
+ print(json.dumps(jobTemplate, indent=4, sort_keys=True))

def readPluginTemplate(plugin):
with open(plugin, 'r') as f:
@@ -168,7 +168,7 @@

if options.remoteDebug:
tempJVMCommand = tempJVMCommand + " " + REMOTE_DEBUG_CONFIG
- print 'local ip: ', getLocalIp()
+ print('local ip: ', getLocalIp())

if options.loglevel:
tempJVMCommand = tempJVMCommand + " " + ("-Dloglevel=%s" % (options.loglevel))
@@ -198,11 +198,11 @@


def printCopyright():
- print '''
+ print('''
DataX (%s), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

-''' % DATAX_VERSION
+''' % DATAX_VERSION)
sys.stdout.flush()


RefactoringTool: Writing converted F:\WorkAbout\work\DataX\target\datax\datax\bin\datax.py to D:/datax\datax.py.
RefactoringTool: Files that were modified:
RefactoringTool: F:\WorkAbout\work\DataX\target\datax\datax\bin\datax.py

I renamed the converted file to datax3.py, copied it back into the bin directory, and ran the command again:

F:\WorkAbout\work\DataX\target\datax\datax\bin>python datax3.py -r mysqlreader -w mysqlwriter

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


Please refer to the mysqlreader document:
https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md

Please refer to the mysqlwriter document:
https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md

Please save the following configuration as a json file and use
python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [],
                        "connection": [
                            {
                                "jdbcUrl": [],
                                "table": []
                            }
                        ],
                        "password": "",
                        "username": "",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": [],
                        "connection": [
                            {
                                "jdbcUrl": "",
                                "table": []
                            }
                        ],
                        "password": "",
                        "preSql": [],
                        "session": [],
                        "username": "",
                        "writeMode": ""
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}

We can redirect the command's output straight into a file:

python datax3.py -r mysqlreader -w mysqlwriter > mysql2gbase.json

This generates mysql2gbase.json in the current directory. Fill in the configuration as needed:

{
    "job": {
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "column": ["SCENE_ID", "OP_TIME", "OP_JOB_NO", "REMARK"],
                    "connection": [{
                        "jdbcUrl": ["jdbc:mysql://10.101.42.91:1299/zhcj_test"],
                        "table": ["op_log"]
                    }],
                    "password": "bingosoft!",
                    "username": "root",
                    "where": ""
                }
            },
            "writer": {
                "name": "mysqlwriter",
                "parameter": {
                    "column": ["SCENE_ID", "OP_TIME", "OP_JOB_NO", "REMARK"],
                    "connection": [{
                        "jdbcUrl": "jdbc:mysql://10.101.42.91:1299/zhcs_test",
                        "table": ["datax_op_log"]
                    }],
                    "password": "bingosoft!",
                    "preSql": [],
                    "session": [],
                    "username": "root",
                    "writeMode": "insert"
                }
            }
        }],
        "setting": {
            "speed": {
                "channel": "4"
            }
        }
    }
}

Run the sync job with python datax3.py mysql2gbase.json:


DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2019-12-18 18:38:21.572 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2019-12-18 18:38:21.578 [main] INFO Engine - the machine info =>

osInfo: Oracle Corporation 1.8 25.231-b11
jvmInfo: Windows 10 amd64 10.0
cpu num: 8

totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1

GC Names [PS MarkSweep, PS Scavenge]

MEMORY_NAME | allocation_size | init_size
PS Eden Space | 256.00MB | 256.00MB
Code Cache | 240.00MB | 2.44MB
Compressed Class Space | 1,024.00MB | 0.00MB
PS Survivor Space | 42.50MB | 42.50MB
PS Old Gen | 683.00MB | 683.00MB
Metaspace | -0.00MB | 0.00MB


2019-12-18 18:38:21.594 [main] INFO Engine -
{
"content":[
{
"reader":{
"name":"mysqlreader",
"parameter":{
"column":[
"SCENE_ID",
"OP_TIME",
"OP_JOB_NO",
"REMARK"
],
"connection":[
{
"jdbcUrl":[
"jdbc:mysql://10.101.42.91:1299/zhcj_test"
],
"table":[
"op_log"
]
}
],
"password":"**********",
"username":"root",
"where":""
}
},
"writer":{
"name":"mysqlwriter",
"parameter":{
"column":[
"SCENE_ID",
"OP_TIME",
"OP_JOB_NO",
"REMARK"
],
"connection":[
{
"jdbcUrl":"jdbc:mysql://10.101.42.91:1299/zhcs_test",
"table":[
"datax_op_log"
]
}
],
"password":"**********",
"preSql":[],
"session":[],
"username":"root",
"writeMode":"insert"
}
}
}
],
"setting":{
"speed":{
"channel":"4"
}
}
}

2019-12-18 18:38:21.606 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null
2019-12-18 18:38:21.608 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2019-12-18 18:38:21.608 [main] INFO JobContainer - DataX jobContainer starts job.
2019-12-18 18:38:21.609 [main] INFO JobContainer - Set jobId = 0
2019-12-18 18:38:21.906 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://10.101.42.91:1299/zhcj_test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2019-12-18 18:38:21.962 [job-0] INFO OriginalConfPretreatmentUtil - table:[op_log] has columns:[SCENE_ID,OP_TIME,OP_JOB_NO,REMARK].
2019-12-18 18:38:22.184 [job-0] INFO OriginalConfPretreatmentUtil - table:[datax_op_log] all columns:[
SCENE_ID,OP_TIME,OP_JOB_NO,REMARK
].
2019-12-18 18:38:22.242 [job-0] INFO OriginalConfPretreatmentUtil - Write data [
insert INTO %s (SCENE_ID,OP_TIME,OP_JOB_NO,REMARK) VALUES(?,?,?,?)
], which jdbcUrl like:[jdbc:mysql://10.101.42.91:1299/zhcs_test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2019-12-18 18:38:22.243 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2019-12-18 18:38:22.243 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do prepare work .
2019-12-18 18:38:22.243 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2019-12-18 18:38:22.244 [job-0] INFO JobContainer - jobContainer starts to do split ...
2019-12-18 18:38:22.244 [job-0] INFO JobContainer - Job set Channel-Number to 4 channels.
2019-12-18 18:38:22.247 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
2019-12-18 18:38:22.248 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2019-12-18 18:38:22.260 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2019-12-18 18:38:22.262 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2019-12-18 18:38:22.264 [job-0] INFO JobContainer - Running by standalone Mode.
2019-12-18 18:38:22.272 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2019-12-18 18:38:22.275 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2019-12-18 18:38:22.275 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2019-12-18 18:38:22.282 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2019-12-18 18:38:22.284 [0-0-0-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select SCENE_ID,OP_TIME,OP_JOB_NO,REMARK from op_log
] jdbcUrl:[jdbc:mysql://10.101.42.91:1299/zhcj_test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2019-12-18 18:38:23.295 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select SCENE_ID,OP_TIME,OP_JOB_NO,REMARK from op_log
] jdbcUrl:[jdbc:mysql://10.101.42.91:1299/zhcj_test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2019-12-18 18:38:23.592 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[1311]ms
2019-12-18 18:38:23.592 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2019-12-18 18:38:32.281 [job-0] INFO StandAloneJobContainerCommunicator - Total 15167 records, 849566 bytes | Speed 82.96KB/s, 1516 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.832s | All Task WaitReaderTime 0.058s | Percentage 100.00%
2019-12-18 18:38:32.281 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2019-12-18 18:38:32.281 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2019-12-18 18:38:32.281 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do post work.
2019-12-18 18:38:32.281 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2019-12-18 18:38:32.282 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: F:\WorkAbout\work\DataX\target\datax\datax\hook
2019-12-18 18:38:32.283 [job-0] INFO JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%


[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
PS Scavenge | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s

2019-12-18 18:38:32.283 [job-0] INFO JobContainer - PerfTrace not enable!
2019-12-18 18:38:32.284 [job-0] INFO StandAloneJobContainerCommunicator - Total 15167 records, 849566 bytes | Speed 82.96KB/s, 1516 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.832s | All Task WaitReaderTime 0.058s | Percentage 100.00%
2019-12-18 18:38:32.285 [job-0] INFO JobContainer -
任务启动时刻 : 2019-12-18 18:38:21
任务结束时刻 : 2019-12-18 18:38:32
任务总计耗时 : 10s
任务平均流量 : 82.96KB/s
记录写入速度 : 1516rec/s
读出记录总数 : 15167
读写失败总数 : 0

That was only a simple example; now we move on to the real exercise. The first step is to develop a GbaseWriter. Since nothing here needs special handling, one can be written straight from the DataX plugin development guide (插件开发宝典); a rough sketch of the plugin descriptors is shown below. Package just that plugin with mvn -U clean package assembly:assembly -pl gbasewriter -am -Dmaven.test.skip=true, then run python datax.py -r mysqlreader -w gbasewriter, fill in the generated template for your actual environment, and launch the job. The run output follows the sketch:
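By way of orientation, here is a minimal sketch of the two descriptor files the plugin development guide has you add to a gbasewriter module. This is not the actual implementation: the class name, description and developer fields are placeholders of my own. plugin.json tells the engine how to load the plugin, and plugin_job_template.json is what datax.py -r ... -w gbasewriter echoes back as the writer half of the job template:

plugin.json:

{
    "name": "gbasewriter",
    "class": "com.alibaba.datax.plugin.writer.gbasewriter.GbaseWriter",
    "description": "write data into GBase via JDBC",
    "developer": "your-team"
}

plugin_job_template.json:

{
    "name": "gbasewriter",
    "parameter": {
        "column": [],
        "connection": [
            {
                "jdbcUrl": "",
                "table": []
            }
        ],
        "password": "",
        "preSql": [],
        "session": [],
        "username": "",
        "writeMode": ""
    }
}

The Java side can largely delegate to the framework's CommonRdbmsWriter in plugin-rdbms-util, which is how the bundled mysqlwriter and oraclewriter are built, so the module ends up being little more than these descriptors plus a thin wrapper class.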

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2019-12-20 15:07:22.954 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2019-12-20 15:07:22.960 [main] INFO Engine - the machine info =>

osInfo: Oracle Corporation 1.8 25.231-b11
jvmInfo: Windows 10 amd64 10.0
cpu num: 8

totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1

GC Names [PS MarkSweep, PS Scavenge]

MEMORY_NAME | allocation_size | init_size
PS Eden Space | 256.00MB | 256.00MB
Code Cache | 240.00MB | 2.44MB
Compressed Class Space | 1,024.00MB | 0.00MB
PS Survivor Space | 42.50MB | 42.50MB
PS Old Gen | 683.00MB | 683.00MB
Metaspace | -0.00MB | 0.00MB


2019-12-20 15:07:22.996 [main] INFO Engine -
{
"content":[
{
"reader":{
"name":"mysqlreader",
"parameter":{
"column":[
"SCENE_ID",
"OP_TIME",
"OP_JOB_NO",
"REMARK"
],
"connection":[
{
"jdbcUrl":[
"jdbc:mysql://10.101.42.91:1299/zhcj_test"
],
"table":[
"op_log"
]
}
],
"password":"**********",
"username":"root",
"where":""
}
},
"writer":{
"name":"gbasewriter",
"parameter":{
"column":[
"SCENE_ID",
"OP_TIME",
"OP_JOB_NO",
"REMARK"
],
"connection":[
{
"jdbcUrl":"jdbc:gbase://10.101.42.91:1238/gdb_cd",
"table":[
"datax_op_log"
]
}
],
"password":"********",
"preSql":[],
"session":[],
"username":"cd_system",
"writeMode":"insert"
}
}
}
],
"setting":{
"speed":{
"channel":"2"
}
}
}

2019-12-20 15:07:23.009 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null
2019-12-20 15:07:23.011 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2019-12-20 15:07:23.011 [main] INFO JobContainer - DataX jobContainer starts job.
2019-12-20 15:07:23.013 [main] INFO JobContainer - Set jobId = 0
2019-12-20 15:07:23.344 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://10.101.42.91:1299/zhcj_test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2019-12-20 15:07:23.404 [job-0] INFO OriginalConfPretreatmentUtil - table:[op_log] has columns:[SCENE_ID,OP_TIME,OP_JOB_NO,REMARK].
2019-12-20 15:07:23.761 [job-0] INFO OriginalConfPretreatmentUtil - table:[datax_op_log] all columns:[
SCENE_ID,OP_TIME,OP_JOB_NO,REMARK
].
2019-12-20 15:07:23.962 [job-0] INFO OriginalConfPretreatmentUtil - Write data [
insert INTO %s (SCENE_ID,OP_TIME,OP_JOB_NO,REMARK) VALUES(?,?,?,?)
], which jdbcUrl like:[jdbc:gbase://10.101.42.91:1238/gdb_cd]
2019-12-20 15:07:23.962 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2019-12-20 15:07:23.962 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do prepare work .
2019-12-20 15:07:23.963 [job-0] INFO JobContainer - DataX Writer.Job [gbasewriter] do prepare work .
2019-12-20 15:07:23.963 [job-0] INFO JobContainer - jobContainer starts to do split ...
2019-12-20 15:07:23.963 [job-0] INFO JobContainer - Job set Channel-Number to 2 channels.
2019-12-20 15:07:24.060 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
2019-12-20 15:07:24.060 [job-0] INFO JobContainer - DataX Writer.Job [gbasewriter] splits to [1] tasks.
2019-12-20 15:07:24.173 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2019-12-20 15:07:24.199 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2019-12-20 15:07:24.201 [job-0] INFO JobContainer - Running by standalone Mode.
2019-12-20 15:07:24.245 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2019-12-20 15:07:24.313 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2019-12-20 15:07:24.313 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2019-12-20 15:07:24.434 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2019-12-20 15:07:24.441 [0-0-0-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select SCENE_ID,OP_TIME,OP_JOB_NO,REMARK from op_log
] jdbcUrl:[jdbc:mysql://10.101.42.91:1299/zhcj_test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2019-12-20 15:07:34.475 [job-0] INFO StandAloneJobContainerCommunicator - Total 0 records, 0 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 0.00%
2019-12-20 15:07:44.502 [job-0] INFO StandAloneJobContainerCommunicator - Total 2560 records, 149496 bytes | Speed 14.60KB/s, 256 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.142s | All Task WaitReaderTime 0.035s | Percentage 0.00%
2019-12-20 15:07:54.502 [job-0] INFO StandAloneJobContainerCommunicator - Total 2560 records, 149496 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.142s | All Task WaitReaderTime 0.035s | Percentage 0.00%
2019-12-20 15:08:04.503 [job-0] INFO StandAloneJobContainerCommunicator - Total 4608 records, 269164 bytes | Speed 11.69KB/s, 204 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 24.193s | All Task WaitReaderTime 0.044s | Percentage 0.00%
2019-12-20 15:08:14.504 [job-0] INFO StandAloneJobContainerCommunicator - Total 4608 records, 269164 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 24.193s | All Task WaitReaderTime 0.044s | Percentage 0.00%
2019-12-20 15:08:24.505 [job-0] INFO StandAloneJobContainerCommunicator - Total 6656 records, 390129 bytes | Speed 11.81KB/s, 204 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 43.026s | All Task WaitReaderTime 0.050s | Percentage 0.00%
2019-12-20 15:08:44.505 [job-0] INFO StandAloneJobContainerCommunicator - Total 8704 records, 510488 bytes | Speed 5.88KB/s, 102 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 60.643s | All Task WaitReaderTime 0.053s | Percentage 0.00%
2019-12-20 15:08:54.506 [job-0] INFO StandAloneJobContainerCommunicator - Total 10752 records, 612379 bytes | Speed 9.95KB/s, 204 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 78.139s | All Task WaitReaderTime 0.055s | Percentage 0.00%
2019-12-20 15:09:14.506 [job-0] INFO StandAloneJobContainerCommunicator - Total 12800 records, 731913 bytes | Speed 5.84KB/s, 102 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 94.918s | All Task WaitReaderTime 0.058s | Percentage 0.00%
2019-12-20 15:09:24.507 [job-0] INFO StandAloneJobContainerCommunicator - Total 12800 records, 731913 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 94.918s | All Task WaitReaderTime 0.058s | Percentage 0.00%
2019-12-20 15:09:34.467 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select SCENE_ID,OP_TIME,OP_JOB_NO,REMARK from op_log
] jdbcUrl:[jdbc:mysql://10.101.42.91:1299/zhcj_test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2019-12-20 15:09:34.508 [job-0] INFO StandAloneJobContainerCommunicator - Total 14848 records, 840903 bytes | Speed 10.64KB/s, 204 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 112.371s | All Task WaitReaderTime 0.060s | Percentage 0.00%
2019-12-20 15:09:42.112 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[137708]ms
2019-12-20 15:09:42.112 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2019-12-20 15:09:44.508 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2019-12-20 15:09:44.508 [job-0] INFO JobContainer - DataX Writer.Job [gbasewriter] do post work.
2019-12-20 15:09:44.509 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do post work.
2019-12-20 15:09:44.509 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2019-12-20 15:09:44.510 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: F:\WorkAbout\datax\datax\hook
2019-12-20 15:09:44.511 [job-0] INFO JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%


[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
PS Scavenge | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s

2019-12-20 15:09:44.511 [job-0] INFO JobContainer - PerfTrace not enable!
2019-12-20 15:09:44.511 [job-0] INFO StandAloneJobContainerCommunicator - Total 15167 records, 849566 bytes | Speed 5.93KB/s, 108 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 129.781s | All Task WaitReaderTime 0.061s | Percentage 100.00%
2019-12-20 15:09:44.512 [job-0] INFO JobContainer -
任务启动时刻 : 2019-12-20 15:07:23
任务结束时刻 : 2019-12-20 15:09:44
任务总计耗时 : 141s
任务平均流量 : 5.93KB/s
记录写入速度 : 108rec/s
读出记录总数 : 15167
读写失败总数 : 0


Nothing to be done about that: the machine is simply too underpowered, and this is not what DataX can really do. At this point DataX can be moved onto the server to sync the data for real.
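Since the end goal is DB2 to GBase, the job file ends up looking much like the examples above. The following is only a sketch, not a tested configuration: it assumes the generic rdbmsreader is used on the DB2 side (the home-grown DB2Reader mentioned at the start would be wired up the same way), that the DB2 JDBC driver is available to that plugin, and that the hosts, ports, database names, credentials and columns are placeholders to be replaced with real values:

{
    "job": {
        "content": [{
            "reader": {
                "name": "rdbmsreader",
                "parameter": {
                    "column": ["SCENE_ID", "OP_TIME", "OP_JOB_NO", "REMARK"],
                    "connection": [{
                        "jdbcUrl": ["jdbc:db2://db2-host:50000/testdb"],
                        "table": ["op_log"]
                    }],
                    "password": "******",
                    "username": "db2inst1",
                    "where": ""
                }
            },
            "writer": {
                "name": "gbasewriter",
                "parameter": {
                    "column": ["SCENE_ID", "OP_TIME", "OP_JOB_NO", "REMARK"],
                    "connection": [{
                        "jdbcUrl": "jdbc:gbase://gbase-host:5258/gdb_cd",
                        "table": ["datax_op_log"]
                    }],
                    "password": "******",
                    "preSql": [],
                    "session": [],
                    "username": "cd_system",
                    "writeMode": "insert"
                }
            }
        }],
        "setting": {
            "speed": {
                "channel": "4"
            }
        }
    }
}

Save it as something like db22gbase.json and run it exactly as before: python datax3.py db22gbase.json.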
