Alfresco: Sqoop basics

Sunday, October 20, 2019

Sqoop basics

Day2- sqoop

Only mappers will run. There will not be any reducers.

HDFS to RDBMS can be done - sqoop export
RDBMS to HDFS can be done which is called sqoop import

By default 4 mappers will be assigned for the sqoop import execution

To look table from hadoop environment

To list databases
sqoop-list-databases \
--connect "jdbc:mysql://quickstart.cloudera:3306" \
--username retail_dba \
--password cloudera

To check list of tables in a particular db
sqoop-list-tables \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera

To check records in the table
sqoop-eval \
--connect "jdbc:mysql://quickstart.cloudera:3306" \
--username retail_dba \
--password cloudera \
--query "select * from retail_db.customers limit 10"

create a table in sql and insert some data
CREATE TABLE people (PersonID int,LastName varchar(255),FirstName varchar(255),Address varchar(255),City varchar(255));

to just sqoop one table run below command
========================================
sqoop import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username root \
--password cloudera \
--table orders \
--m 1 \
--target-dir /queryresult

m implies number of mappers
target-dir implies where to copy the content of it.
target-dir should not be existing, so that system will create

Number of mappers if you increase its going to hit DB performance. Bound by number of connections.

For importing all the tables from the database
sqoop-import-all-tables \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--as-sequencefile \
-m 4 \
--warehouse-dir /user/cloudera/sqoopdir

--as-sequencefile supports fileformat.
It will bring table by table.

Directory will have subfolders /user/cloudera/sqoopdir same as tablename.
SO extensively warehouse-dir will be used

Alfresco

Sunday, October 20, 2019

Sqoop basics

No comments:

Post a Comment