Friday, November 15, 2019

Hadoop commands and MR execution


hadoop fs -ls /
Lists the contents of the root directory of HDFS.

[cloudera@quickstart mapreduce_output]$ hadoop fs -ls /
Found 8 items
drwxrwxrwx   - hdfs     supergroup          0 2017-10-23 09:15 /benchmarks
drwxr-xr-x   - cloudera supergroup          0 2019-11-03 03:58 /data
drwxr-xr-x   - hbase    supergroup          0 2019-11-03 01:02 /hbase
drwxr-xr-x   - solr     solr                0 2017-10-23 09:18 /solr
drwxr-xr-x   - cloudera supergroup          0 2019-11-09 02:01 /spark_data
drwxrwxrwt   - hdfs     supergroup          0 2019-11-01 08:59 /tmp
drwxr-xr-x   - hdfs     supergroup          0 2019-11-03 01:14 /user
drwxr-xr-x   - hdfs     supergroup          0 2017-10-23 09:17 /var
[cloudera@quickstart mapreduce_output]$


Creating a directory in HDFS

hadoop fs -mkdir /mapreduce_input
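
As a side note, -mkdir also accepts a -p flag to create parent directories as needed (the nested path below is purely illustrative):

hadoop fs -mkdir -p /mapreduce_input/archive/2019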



Copying a file from the local filesystem to an HDFS directory

[cloudera@quickstart mapreduce_input]$ pwd
/home/cloudera/Desktop/mapreduce_input
[cloudera@quickstart mapreduce_input]$ ls
file1.txt

hadoop fs -put /home/cloudera/Desktop/mapreduce_input/file1.txt /mapreduce_input

[cloudera@quickstart mapreduce_input]$ hadoop fs -put /home/cloudera/Desktop/mapreduce_input/file1.txt /mapreduce_input
[cloudera@quickstart mapreduce_input]$
[cloudera@quickstart mapreduce_input]$ hadoop fs -ls /mapreduce_input
Found 1 items
-rw-r--r--   1 cloudera supergroup         54 2019-11-15 23:18 /mapreduce_input/file1.txt
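
To sanity-check the upload, the file can be printed straight from HDFS (the output, of course, depends on what file1.txt contains):

hadoop fs -cat /mapreduce_input/file1.txt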


The put command is used to copy a file from the local filesystem into HDFS.
The get command is used to pull a file from HDFS back to the local filesystem.
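
As a quick sketch, pulling the same file back down might look like this (the local destination path here is just an example):

hadoop fs -get /mapreduce_input/file1.txt /home/cloudera/Desktop/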


Instead of put, we can also use copyFromLocal.
Instead of get, we can also use copyToLocal.

copyFromLocal

hadoop fs -copyFromLocal /home/cloudera/Desktop/mapreduce_input/file1.txt /mapreduce_output/
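
The reverse direction works the same way; a sketch of copyToLocal, assuming the file now exists under /mapreduce_output:

hadoop fs -copyToLocal /mapreduce_output/file1.txt /home/cloudera/Desktop/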


Running the MapReduce jar

hadoop jar /home/cloudera/Desktop/wordCountNew.jar /mapreduce_input/file1.txt /mapreduce_output
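
Note that a MapReduce job refuses to start if the output directory already exists, so /mapreduce_output must be removed first (hadoop fs -rm -r /mapreduce_output) if it was created earlier. Once the job completes, the results can be inspected in the output directory; part-r-00000 is the conventional name of the first reducer's output file, though the exact names depend on the job:

hadoop fs -ls /mapreduce_output
hadoop fs -cat /mapreduce_output/part-r-00000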


While running the job, the jar is kept on the edge node (a local path).
The jar is then shipped across to the data nodes.

The request goes to YARN, which talks to the NameNode. The NameNode returns the blocks and the machines where the
data is located. The jar is then sent to all of these machines via YARN.
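
To see this block-to-machine mapping for yourself, hdfs fsck can report the blocks of a file and the datanodes holding them (output details vary by cluster):

hdfs fsck /mapreduce_input/file1.txt -files -blocks -locations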

Hadoop works on the principle of data locality: the code goes to the data, rather than the data going to the code.
