Day2- sqoop
Only mappers will run. There will not be any reducers.
HDFS to RDBMS can be done - sqoop export
RDBMS to HDFS can be done which is called sqoop import
By default 4 mappers will be assigned for the sqoop import execution
To look table from hadoop environment
To list databases
sqoop-list-databases \
--connect "jdbc:mysql://quickstart.cloudera:3306" \
--username retail_dba \
--password cloudera
To check list of tables in a particular db
sqoop-list-tables \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera
To check records in the table
sqoop-eval \
--connect "jdbc:mysql://quickstart.cloudera:3306" \
--username retail_dba \
--password cloudera \
--query "select * from retail_db.customers limit 10"
create a table in sql and insert some data
CREATE TABLE people (PersonID int,LastName varchar(255),FirstName varchar(255),Address varchar(255),City varchar(255));
to just sqoop one table run below command
========================================
sqoop import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username root \
--password cloudera \
--table orders \
--m 1 \
--target-dir /queryresult
m implies number of mappers
target-dir implies where to copy the content of it.
target-dir should not be existing, so that system will create
Number of mappers if you increase its going to hit DB performance. Bound by number of connections.
For importing all the tables from the database
sqoop-import-all-tables \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--as-sequencefile \
-m 4 \
--warehouse-dir /user/cloudera/sqoopdir
--as-sequencefile supports fileformat.
It will bring table by table.
Directory will have subfolders /user/cloudera/sqoopdir same as tablename.
SO extensively warehouse-dir will be used
Only mappers will run. There will not be any reducers.
HDFS to RDBMS can be done - sqoop export
RDBMS to HDFS can be done which is called sqoop import
By default 4 mappers will be assigned for the sqoop import execution
To look table from hadoop environment
To list databases
sqoop-list-databases \
--connect "jdbc:mysql://quickstart.cloudera:3306" \
--username retail_dba \
--password cloudera
To check list of tables in a particular db
sqoop-list-tables \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera
To check records in the table
sqoop-eval \
--connect "jdbc:mysql://quickstart.cloudera:3306" \
--username retail_dba \
--password cloudera \
--query "select * from retail_db.customers limit 10"
create a table in sql and insert some data
CREATE TABLE people (PersonID int,LastName varchar(255),FirstName varchar(255),Address varchar(255),City varchar(255));
to just sqoop one table run below command
========================================
sqoop import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username root \
--password cloudera \
--table orders \
--m 1 \
--target-dir /queryresult
m implies number of mappers
target-dir implies where to copy the content of it.
target-dir should not be existing, so that system will create
Number of mappers if you increase its going to hit DB performance. Bound by number of connections.
For importing all the tables from the database
sqoop-import-all-tables \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--as-sequencefile \
-m 4 \
--warehouse-dir /user/cloudera/sqoopdir
--as-sequencefile supports fileformat.
It will bring table by table.
Directory will have subfolders /user/cloudera/sqoopdir same as tablename.
SO extensively warehouse-dir will be used
No comments:
Post a Comment