
Saturday, November 2, 2019

Sqoop: how an import works and how to increase the number of mappers

SELECT t.* FROM `order_items` AS t LIMIT 1

Sqoop first runs this query to fetch one record and read the data type of each column.
From that metadata it generates a plain POJO class with getters and setters.
It then compiles the class and packages it into a jar.

Let's assume the employee table has a primary key with the values
1, 2, 3, ..., 100000

Sqoop runs the boundary values query:
it finds the min and max of the primary key and computes the split size as (max - min) / 4

1-25000 first mapper
25001-50000 second mapper
50001-75000 third mapper
75001-100000 fourth mapper
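
The ranges above can be reproduced with a few lines of shell arithmetic. This is only a simplified sketch, not Sqoop's actual splitter: the function name `split_ranges` is made up for illustration, it divides the number of ids (max - min + 1) evenly so the ranges are inclusive, and it assumes there are at least as many ids as mappers.

```shell
#!/usr/bin/env bash
# Simplified sketch of range splitting; not Sqoop's actual IntegerSplitter.
# split_ranges MIN MAX NUM_MAPPERS prints one inclusive "lo-hi" range per mapper.
split_ranges() {
  local min=$1 max=$2 n=$3
  local size=$(( (max - min + 1) / n ))   # ids per mapper, rounded down
  local i lo hi
  for (( i = 0; i < n; i++ )); do
    lo=$(( min + i * size ))
    if (( i == n - 1 )); then
      hi=$max                             # last mapper absorbs the remainder
    else
      hi=$(( lo + size - 1 ))
    fi
    echo "${lo}-${hi}"
  done
}

# The employee example: ids 1..100000 over the default 4 mappers.
split_ranges 1 100000 4
```

Calling `split_ranges 1 172198 8` with the values from the import log further down gives a per-mapper size of 21524, the same figure as the "Split size: 21524" line, although Sqoop's own splitter may place the exact boundaries slightly differently.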

By default Sqoop uses 4 mappers, and we can increase
the number if needed.


We might have to tune the boundary values query based on the actual key values,
by specifying our own query with --boundary-query, so that the data
is not skewed.

There is no guarantee of equal load across the mappers.
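
As a rough sketch of what a custom boundary query could look like, the script below reuses the connection details from the import in this post. `--split-by` and `--boundary-query` are Sqoop import options; the `WHERE order_item_id > 0` filter is a hypothetical condition chosen only for illustration, and `-P` makes Sqoop prompt for the password. The arguments are collected in an array so the command is printed instead of run when `sqoop` is not on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: override the boundary query so the splits cover only the id range we care about.
# The WHERE clause is hypothetical; adjust it to how the keys are actually distributed.
boundary_query='SELECT MIN(order_item_id), MAX(order_item_id) FROM order_items WHERE order_item_id > 0'

import_args=(
  import
  --connect jdbc:mysql://quickstart.cloudera:3306/retail_db
  --username root
  -P
  --table order_items
  --split-by order_item_id
  --boundary-query "$boundary_query"
  --warehouse-dir /user/training/sqoop_import/retail_db
  --delete-target-dir
)

if command -v sqoop >/dev/null 2>&1; then
  sqoop "${import_args[@]}"      # run the import on a machine with Sqoop installed
else
  printf 'sqoop not found; would run: sqoop %s\n' "${import_args[*]}"
fi
```

Even with a tuned boundary query the splits are ranges of key values, not row counts, so gaps in the key can still leave some mappers with less data than others.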

Changing the number of mappers
Use --num-mappers (short form -m) to set the number of mappers.

We cannot increase the number of mappers to an arbitrarily large value,
because each mapper opens its own connection and we are bound by the
number of connections the database allows.


[cloudera@quickstart ~]$ sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/retail_db --username root --password cloudera --table order_items --warehouse-dir /user/training/sqoop_import/retail_db --delete-target-dir --num-mappers 8
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/11/02 02:44:52 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
19/11/02 02:44:52 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/11/02 02:44:52 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/11/02 02:44:52 INFO tool.CodeGenTool: Beginning code generation
19/11/02 02:44:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_items` AS t LIMIT 1
19/11/02 02:44:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_items` AS t LIMIT 1
19/11/02 02:44:53 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/a8f57d7254a81d47b1ecde7adf2f360b/order_items.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/11/02 02:44:56 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/a8f57d7254a81d47b1ecde7adf2f360b/order_items.jar
19/11/02 02:44:58 INFO tool.ImportTool: Destination directory /user/training/sqoop_import/retail_db/order_items deleted.
19/11/02 02:44:58 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/11/02 02:44:58 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/11/02 02:44:58 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/11/02 02:44:58 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/11/02 02:44:58 INFO mapreduce.ImportJobBase: Beginning import of order_items
19/11/02 02:44:58 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
19/11/02 02:44:58 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/11/02 02:44:58 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/11/02 02:44:58 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/11/02 02:45:02 INFO db.DBInputFormat: Using read commited transaction isolation
19/11/02 02:45:02 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`order_item_id`), MAX(`order_item_id`) FROM `order_items`
19/11/02 02:45:02 INFO db.IntegerSplitter: Split size: 21524; Num splits: 8 from: 1 to: 172198
19/11/02 02:45:02 INFO mapreduce.JobSubmitter: number of splits:8
19/11/02 02:45:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1572629054486_0005
19/11/02 02:45:03 INFO impl.YarnClientImpl: Submitted application application_1572629054486_0005
19/11/02 02:45:03 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1572629054486_0005/
19/11/02 02:45:03 INFO mapreduce.Job: Running job: job_1572629054486_0005
19/11/02 02:45:14 INFO mapreduce.Job: Job job_1572629054486_0005 running in uber mode : false
19/11/02 02:45:14 INFO mapreduce.Job:  map 0% reduce 0%
19/11/02 02:46:05 INFO mapreduce.Job:  map 25% reduce 0%
19/11/02 02:46:09 INFO mapreduce.Job:  map 38% reduce 0%
19/11/02 02:46:18 INFO mapreduce.Job:  map 75% reduce 0%
19/11/02 02:46:31 INFO mapreduce.Job:  map 100% reduce 0%
19/11/02 02:46:32 INFO mapreduce.Job: Job job_1572629054486_0005 completed successfully
19/11/02 02:46:33 INFO mapreduce.Job: Counters: 31
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=1372048
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1028
HDFS: Number of bytes written=5408880
HDFS: Number of read operations=32
HDFS: Number of large read operations=0
HDFS: Number of write operations=16
Job Counters
Killed map tasks=1
Launched map tasks=8
Other local map tasks=8
Total time spent by all maps in occupied slots (ms)=368237
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=368237
Total vcore-milliseconds taken by all map tasks=368237
Total megabyte-milliseconds taken by all map tasks=377074688
Map-Reduce Framework
Map input records=172198
Map output records=172198
Input split bytes=1028
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=4827
CPU time spent (ms)=49080
Physical memory (bytes) snapshot=1767706624
Virtual memory (bytes) snapshot=12508205056
Total committed heap usage (bytes)=1294467072
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=5408880
19/11/02 02:46:33 INFO mapreduce.ImportJobBase: Transferred 5.1583 MB in 94.5463 seconds (55.868 KB/sec)
19/11/02 02:46:33 INFO mapreduce.ImportJobBase: Retrieved 172198 records.
[cloudera@quickstart ~]$ hdfs dfs -ls /user/training/sqoop_import/retail_db/order_items
Found 9 items
-rw-r--r--   1 cloudera supergroup          0 2019-11-02 02:46 /user/training/sqoop_import/retail_db/order_items/_SUCCESS
-rw-r--r--   1 cloudera supergroup     635884 2019-11-02 02:46 /user/training/sqoop_import/retail_db/order_items/part-m-00000
-rw-r--r--   1 cloudera supergroup     667934 2019-11-02 02:46 /user/training/sqoop_import/retail_db/order_items/part-m-00001
-rw-r--r--   1 cloudera supergroup     671614 2019-11-02 02:46 /user/training/sqoop_import/retail_db/order_items/part-m-00002
-rw-r--r--   1 cloudera supergroup     671641 2019-11-02 02:46 /user/training/sqoop_import/retail_db/order_items/part-m-00003
-rw-r--r--   1 cloudera supergroup     678960 2019-11-02 02:46 /user/training/sqoop_import/retail_db/order_items/part-m-00004
-rw-r--r--   1 cloudera supergroup     692957 2019-11-02 02:46 /user/training/sqoop_import/retail_db/order_items/part-m-00005
-rw-r--r--   1 cloudera supergroup     694053 2019-11-02 02:46 /user/training/sqoop_import/retail_db/order_items/part-m-00006
-rw-r--r--   1 cloudera supergroup     695837 2019-11-02 02:46 /user/training/sqoop_import/retail_db/order_items/part-m-00007
[cloudera@quickstart ~]$


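Observe that 8 part files (part-m-00000 to part-m-00007) were created, one per mapper, because we configured 8 mappers. The ninth item in the listing is the _SUCCESS marker file.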
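
The file count can also be checked without reading the listing by eye. The helper below is a small sketch (the name `count_part_files` is made up for illustration): it reads an `hdfs dfs -ls` listing on stdin and counts the lines that end in a `part-m-NNNNN` file name. The sample input is three lines copied from the listing above.

```shell
#!/usr/bin/env bash
# count_part_files reads an `hdfs dfs -ls` listing on stdin and prints
# how many map output files (part-m-NNNNN) it contains.
count_part_files() {
  # grep -c exits non-zero when nothing matches; "|| true" keeps the exit status clean.
  grep -c '/part-m-[0-9]\{5\}$' || true
}

# Sample input: three lines taken from the listing in this post.
count_part_files <<'EOF'
-rw-r--r--   1 cloudera supergroup          0 2019-11-02 02:46 /user/training/sqoop_import/retail_db/order_items/_SUCCESS
-rw-r--r--   1 cloudera supergroup     635884 2019-11-02 02:46 /user/training/sqoop_import/retail_db/order_items/part-m-00000
-rw-r--r--   1 cloudera supergroup     667934 2019-11-02 02:46 /user/training/sqoop_import/retail_db/order_items/part-m-00001
EOF
```

On the cluster itself, piping the real listing in (`hdfs dfs -ls /user/training/sqoop_import/retail_db/order_items | count_part_files`) should report 8 for the run shown above.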





