
Sunday, November 3, 2019

Sqoop export with a staging table


[cloudera@quickstart Downloads]$ hdfs dfs -rm -r -f /data/card_transactions1.csv

CREATE TABLE card_transactions(
    transaction_id INT(10),
    card_id BIGINT,
    member_id BIGINT,
    amount INT(10),
    postcode INT(10),
    pos_id BIGINT,
    transaction_dt varchar(255),
    status varchar(255),
    PRIMARY KEY (transaction_id)
);

CREATE TABLE card_transactions_stage(
    transaction_id INT(10),
    card_id BIGINT,
    member_id BIGINT,
    amount INT(10),
    postcode INT(10),
    pos_id BIGINT,
    transaction_dt varchar(255),
    status varchar(255),
    PRIMARY KEY (transaction_id)
);
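Sqoop expects the staging table to be structurally identical to the target table, which is why the two definitions above match column for column. As a minimal sketch, assuming MySQL and the table names used in this post, the staging DDL can be derived from the target table with CREATE TABLE ... LIKE instead of repeating every column. The variable names are only illustrative.

```shell
#!/bin/sh
# Sketch: build the staging-table DDL from the target table name.
# target_table, stage_table and stage_ddl are illustrative variable names.
target_table="card_transactions"
stage_table="${target_table}_stage"

# In MySQL, CREATE TABLE ... LIKE copies the column definitions and indexes
# (including the primary key) of the source table, but none of its rows.
stage_ddl="CREATE TABLE IF NOT EXISTS ${stage_table} LIKE ${target_table};"

# Print the statement. To execute it, pipe it into the MySQL client, e.g.
#   echo "$stage_ddl" | mysql -u root -p banking
echo "$stage_ddl"
```

Deriving the staging table this way keeps it in step with the target table if columns are added later.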


sqoop export \
--connect jdbc:mysql://quickstart.cloudera:3306/banking \
--username root \
--password cloudera \
--table card_transactions \
--staging-table card_transactions_stage \
--export-dir /data/card_transactions1.csv \
--fields-terminated-by ','
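Two things are worth noting about the command above. Sqoop itself warns that passing the password on the command line is insecure and suggests -P, and the staging table has to be empty before the export starts unless --clear-staging-table is given. The following is a sketch of the same export with both options applied. The run_export function and the DRY_RUN variable are hypothetical helpers added here for illustration, not part of Sqoop.

```shell
#!/bin/sh
# Sketch: the same export, with the password prompted for (-P) and any
# leftover rows in the staging table deleted first (--clear-staging-table).
# run_export and DRY_RUN are hypothetical helper names, not Sqoop features.
run_export() {
  # When DRY_RUN is set, the command is only printed, not executed.
  ${DRY_RUN:+echo} sqoop export \
    --connect jdbc:mysql://quickstart.cloudera:3306/banking \
    --username root \
    -P \
    --table card_transactions \
    --staging-table card_transactions_stage \
    --clear-staging-table \
    --export-dir /data/card_transactions1.csv \
    --fields-terminated-by ','
}

# Preview the command line. On the cluster, call run_export without DRY_RUN.
DRY_RUN=1 run_export
```

With -P, Sqoop asks for the password interactively, so this form suits manual runs. For scheduled jobs, --password-file is the usual alternative.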

The Sqoop log shows the rows being migrated from the staging table to the target table:
19/11/03 02:32:21 INFO manager.SqlManager: Migrated 53292 records from `card_transactions_stage` to `card_transactions`


[cloudera@quickstart Downloads]$ sqoop export --connect jdbc:mysql://quickstart.cloudera:3306/banking --username root --password cloudera --table card_transactions --staging-table card_transactions_stage --export-dir /data/card_transactions1.csv --fields-terminated-by ','
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/11/03 02:31:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
19/11/03 02:31:40 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/11/03 02:31:41 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/11/03 02:31:41 INFO tool.CodeGenTool: Beginning code generation
19/11/03 02:31:41 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `card_transactions` AS t LIMIT 1
19/11/03 02:31:41 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `card_transactions` AS t LIMIT 1
19/11/03 02:31:41 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/516ca6121f0a4c0754e38ab72bd1f805/card_transactions.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/11/03 02:31:43 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/516ca6121f0a4c0754e38ab72bd1f805/card_transactions.jar
19/11/03 02:31:43 INFO mapreduce.ExportJobBase: Data will be staged in the table: card_transactions_stage
19/11/03 02:31:43 INFO mapreduce.ExportJobBase: Beginning export of card_transactions
19/11/03 02:31:43 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
19/11/03 02:31:44 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/11/03 02:31:45 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
19/11/03 02:31:45 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
19/11/03 02:31:45 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/11/03 02:31:46 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/11/03 02:31:47 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
19/11/03 02:31:49 INFO input.FileInputFormat: Total input paths to process : 1
19/11/03 02:31:49 INFO input.FileInputFormat: Total input paths to process : 1
19/11/03 02:31:49 INFO mapreduce.JobSubmitter: number of splits:4
19/11/03 02:31:49 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
19/11/03 02:31:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1572771724749_0005
19/11/03 02:31:50 INFO impl.YarnClientImpl: Submitted application application_1572771724749_0005
19/11/03 02:31:50 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1572771724749_0005/
19/11/03 02:31:50 INFO mapreduce.Job: Running job: job_1572771724749_0005
19/11/03 02:31:59 INFO mapreduce.Job: Job job_1572771724749_0005 running in uber mode : false
19/11/03 02:31:59 INFO mapreduce.Job:  map 0% reduce 0%
19/11/03 02:32:17 INFO mapreduce.Job:  map 25% reduce 0%
19/11/03 02:32:19 INFO mapreduce.Job:  map 50% reduce 0%
19/11/03 02:32:20 INFO mapreduce.Job:  map 100% reduce 0%
19/11/03 02:32:20 INFO mapreduce.Job: Job job_1572771724749_0005 completed successfully
19/11/03 02:32:21 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=684948
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=5197024
HDFS: Number of bytes written=0
HDFS: Number of read operations=19
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=4
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=65214
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=65214
Total vcore-milliseconds taken by all map tasks=65214
Total megabyte-milliseconds taken by all map tasks=66779136
Map-Reduce Framework
Map input records=53292
Map output records=53292
Input split bytes=636
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=505
CPU time spent (ms)=14390
Physical memory (bytes) snapshot=1014005760
Virtual memory (bytes) snapshot=6245519360
Total committed heap usage (bytes)=939524096
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
19/11/03 02:32:21 INFO mapreduce.ExportJobBase: Transferred 4.9563 MB in 35.2544 seconds (143.96 KB/sec)
19/11/03 02:32:21 INFO mapreduce.ExportJobBase: Exported 53292 records.
19/11/03 02:32:21 INFO mapreduce.ExportJobBase: Starting to migrate data from staging table to destination.
19/11/03 02:32:21 INFO manager.SqlManager: Migrated 53292 records from `card_transactions_stage` to `card_transactions`
[cloudera@quickstart Downloads]$ 
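A quick sanity check on a run like this is to confirm that the "Exported" count and the "Migrated" count in the log agree. The sketch below assumes the Sqoop output was saved to a file. To stay self-contained it writes the two relevant lines from the log above into a temporary file. The variable names are illustrative.

```shell
#!/bin/sh
# Sketch: compare the "Exported" and "Migrated" record counts in a Sqoop log.
# The temporary file stands in for a saved log; it holds the two relevant
# lines copied from the run shown above.
log_file=$(mktemp)
cat > "$log_file" <<'EOF'
19/11/03 02:32:21 INFO mapreduce.ExportJobBase: Exported 53292 records.
19/11/03 02:32:21 INFO manager.SqlManager: Migrated 53292 records from `card_transactions_stage` to `card_transactions`
EOF

# Extract the number that follows each keyword.
exported=$(sed -n 's/.*Exported \([0-9][0-9]*\) records.*/\1/p' "$log_file")
migrated=$(sed -n 's/.*Migrated \([0-9][0-9]*\) records.*/\1/p' "$log_file")
rm -f "$log_file"

if [ "$exported" = "$migrated" ]; then
  echo "OK: $exported records exported and migrated"
else
  echo "MISMATCH: exported=$exported migrated=$migrated" >&2
fi
```

A SELECT COUNT(*) on card_transactions in MySQL would confirm the same number from the database side.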
