Powered By Blogger

Sunday, November 3, 2019

sqoop export

sqoop help
sqoop version
sqoop help eval
sqoop help import

1>query.out 2>query.err

1-refres to  stoutput
2- referes to error

sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/retail_db --username root --password cloudera --table order_items --warehouse-dir /user/training/sqoop_import/retail_db --delete-target-dir --split-by order_item_order_id \
1>query.out 2>query.err
Above command will move the errors to different file
sqoop help eval 1>query1.out 2>query1.err


____

Crate mysql table
mysql -u root -p
cloudera

create database banking;
use banking;

CREATE TABLE card_transactions(
    card_id BIGINT,
    member_id BIGINT,
    amount INT(10),
    postcode INT(10),
    pos_id BIGINT,
    transaction_dt varchar(255),
    status varchar(255),
    PRIMARY KEY (card_id,transaction_dt)
);



hadoop fs -mkdir /data
hadoop fs -put
hadoop fs -put /home/cloudera/Downloads/card_transactions.csv /data

//Get the content from hdfs put in mysql

sqoop export \
--connect jdbc:mysql://quickstart.cloudera:3306/banking \
--username root \
--password cloudera \
--table card_transactions \
--export-dir /data/card_transactions.csv \
--fields-terminated-by ','



[cloudera@quickstart Downloads]$ hadoop fs -put /home/cloudera/Downloads/card_transactions.csv /data
[cloudera@quickstart Downloads]$ hdfs dfs -ls /data
Found 1 items
-rw-r--r--   1 cloudera supergroup    4829457 2019-11-03 01:38 /data/card_transactions.csv
[cloudera@quickstart Downloads]$ sqoop export \
> --connect jdbc:mysql://quickstart.cloudera:3306/banking \
> --username root \
> --password cloudera \
> --table card_transactions \
> --export-dir /data/card_transactions.csv \
> --fields-terminated-by ','
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/11/03 01:43:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
19/11/03 01:43:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/11/03 01:43:26 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/11/03 01:43:26 INFO tool.CodeGenTool: Beginning code generation
19/11/03 01:43:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `card_transactions` AS t LIMIT 1
19/11/03 01:43:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `card_transactions` AS t LIMIT 1
19/11/03 01:43:26 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/d64dd67768a60c506a534b82dfc5d5c9/card_transactions.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/11/03 01:43:28 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/d64dd67768a60c506a534b82dfc5d5c9/card_transactions.jar
19/11/03 01:43:28 INFO mapreduce.ExportJobBase: Beginning export of card_transactions
19/11/03 01:43:28 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
19/11/03 01:43:28 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/11/03 01:43:29 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
19/11/03 01:43:29 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
19/11/03 01:43:29 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/11/03 01:43:29 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/11/03 01:43:32 INFO input.FileInputFormat: Total input paths to process : 1
19/11/03 01:43:32 INFO input.FileInputFormat: Total input paths to process : 1
19/11/03 01:43:32 INFO mapreduce.JobSubmitter: number of splits:4
19/11/03 01:43:32 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
19/11/03 01:43:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1572771724749_0002
19/11/03 01:43:32 INFO impl.YarnClientImpl: Submitted application application_1572771724749_0002
19/11/03 01:43:32 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1572771724749_0002/
19/11/03 01:43:32 INFO mapreduce.Job: Running job: job_1572771724749_0002
19/11/03 01:43:38 INFO mapreduce.Job: Job job_1572771724749_0002 running in uber mode : false
19/11/03 01:43:38 INFO mapreduce.Job:  map 0% reduce 0%
19/11/03 01:43:46 INFO mapreduce.Job:  map 100% reduce 0%
19/11/03 01:43:46 INFO mapreduce.Job: Job job_1572771724749_0002 failed with state FAILED due to: Task failed task_1572771724749_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

19/11/03 01:43:46 INFO mapreduce.Job: Counters: 12
Job Counters
Failed map tasks=1
Killed map tasks=3
Launched map tasks=4
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=13744
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=13744
Total vcore-milliseconds taken by all map tasks=13744
Total megabyte-milliseconds taken by all map tasks=14073856
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
19/11/03 01:43:46 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
19/11/03 01:43:46 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 16.8784 seconds (0 bytes/sec)
19/11/03 01:43:46 INFO mapreduce.ExportJobBase: Exported 0 records.
19/11/03 01:43:46 ERROR tool.ExportTool: Error during export:
Export job failed!
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:439)
at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:80)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
[cloudera@quickstart Downloads]$ sqoop export \
> --connect jdbc:mysql://quickstart.cloudera:3306/banking \
> --username root \
> --password cloudera \
> --table card_transactions \
> --import-dir /data/card_transactions.csv \
> --fields-terminated-by ','
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/11/03 01:44:33 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
19/11/03 01:44:33 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/11/03 01:44:33 ERROR tool.BaseSqoopTool: Error parsing arguments for export:
19/11/03 01:44:33 ERROR tool.BaseSqoopTool: Unrecognized argument: --import-dir
19/11/03 01:44:33 ERROR tool.BaseSqoopTool: Unrecognized argument: /data/card_transactions.csv
19/11/03 01:44:33 ERROR tool.BaseSqoopTool: Unrecognized argument: --fields-terminated-by
19/11/03 01:44:33 ERROR tool.BaseSqoopTool: Unrecognized argument: ,

Try --help for usage instructions.
[cloudera@quickstart Downloads]$ mysql
ERROR 1045 (28000): Access denied for user 'cloudera'@'localhost' (using password: NO)
[cloudera@quickstart Downloads]$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 25
Server version: 5.1.73 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use banking
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select count(*) from card_transactions;
+----------+
| count(*) |
+----------+
|      100 |
+----------+
1 row in set (0.00 sec)

Here if we see partial records are imported, that is because we have
duplicate records for the primary key constraint.



Avoid partial errors : By creating staging table

Create the staging table
CREATE TABLE card_transactions_stage(
    card_id BIGINT,
    member_id BIGINT,
    amount INT(10),
    postcode INT(10),
    pos_id BIGINT,
    transaction_dt varchar(255),
    status varchar(255),
    PRIMARY KEY (card_id,transaction_dt)
);


truncate table card_transactions;
sqoop export \
--connect jdbc:mysql://quickstart.cloudera:3306/banking \
--username root \
--password cloudera \
--table card_transactions \
--staging-table card_transactions_stage \
--export-dir /data/card_transactions.csv \
--fields-terminated-by ','



[cloudera@quickstart Downloads]$ sqoop export \
> --connect jdbc:mysql://quickstart.cloudera:3306/banking \
> --username root \
> --password cloudera \
> --table card_transactions \
> --staging-table card_transactions_stage \
> --export-dir /data/card_transactions.csv \
> --fields-terminated-by ','
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/11/03 02:08:37 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
19/11/03 02:08:37 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/11/03 02:08:37 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/11/03 02:08:37 INFO tool.CodeGenTool: Beginning code generation
19/11/03 02:08:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `card_transactions` AS t LIMIT 1
19/11/03 02:08:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `card_transactions` AS t LIMIT 1
19/11/03 02:08:38 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/a176c268cd0ba49b1ee8d0c0d6dd9e52/card_transactions.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/11/03 02:08:40 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/a176c268cd0ba49b1ee8d0c0d6dd9e52/card_transactions.jar
19/11/03 02:08:40 INFO mapreduce.ExportJobBase: Data will be staged in the table: card_transactions_stage
19/11/03 02:08:40 INFO mapreduce.ExportJobBase: Beginning export of card_transactions
19/11/03 02:08:40 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
19/11/03 02:08:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/11/03 02:08:42 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
19/11/03 02:08:42 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
19/11/03 02:08:42 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/11/03 02:08:42 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/11/03 02:08:46 INFO input.FileInputFormat: Total input paths to process : 1
19/11/03 02:08:46 INFO input.FileInputFormat: Total input paths to process : 1
19/11/03 02:08:46 INFO mapreduce.JobSubmitter: number of splits:4
19/11/03 02:08:46 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
19/11/03 02:08:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1572771724749_0003
19/11/03 02:08:47 INFO impl.YarnClientImpl: Submitted application application_1572771724749_0003
19/11/03 02:08:47 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1572771724749_0003/
19/11/03 02:08:47 INFO mapreduce.Job: Running job: job_1572771724749_0003
19/11/03 02:08:57 INFO mapreduce.Job: Job job_1572771724749_0003 running in uber mode : false
19/11/03 02:08:57 INFO mapreduce.Job:  map 0% reduce 0%
19/11/03 02:09:12 INFO mapreduce.Job:  map 100% reduce 0%
19/11/03 02:09:12 INFO mapreduce.Job: Job job_1572771724749_0003 failed with state FAILED due to: Task failed task_1572771724749_0003_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

19/11/03 02:09:13 INFO mapreduce.Job: Counters: 12
Job Counters
Failed map tasks=1
Killed map tasks=3
Launched map tasks=4
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=42896
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=42896
Total vcore-milliseconds taken by all map tasks=42896
Total megabyte-milliseconds taken by all map tasks=43925504
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
19/11/03 02:09:13 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
19/11/03 02:09:13 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 30.4616 seconds (0 bytes/sec)
19/11/03 02:09:13 INFO mapreduce.ExportJobBase: Exported 0 records.
19/11/03 02:09:13 ERROR tool.ExportTool: Error during export:
Export job failed!
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:439)
at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:80)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
[cloudera@quickstart Downloads]$

Observe mysql table, only staging table is having the entries
and the actual table is not affected.

mysql> use banking
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select count(*) from card_transactions;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from card_transactions_stage;
+----------+
| count(*) |
+----------+
|      100 |
+----------+
1 row in set (0.00 sec)

mysql> 

No comments:

Post a Comment