
Saturday, October 19, 2019

AWS Setup of HDFS

EC2 setup. Note: AWS Free Tier credits cannot be used for an EMR cluster.
Log in to the AWS Management Console.

In Services, search for EMR.
Click Create cluster.
Provide a cluster name, e.g. basancluster.
Select release emr-5.27.0, which includes Hadoop, Ganglia and Hive.

An m4.large instance can be used, at about $0.10/hr per instance.


Search for "EC2 pricing" to check the hourly price of each instance type.
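As a rough check, the hourly bill is the number of instances times the price per instance-hour. Below is a minimal bash sketch that assumes the roughly $0.10/hr m4.large figure quoted in this post. Real prices vary by region, and EMR adds its own per-instance fee on top of the EC2 price, so treat the result as a lower bound.

```shell
#!/usr/bin/env bash
# Estimate on-demand EC2 cost for a cluster (EMR fee NOT included).
# Arguments: instance count, price per instance-hour in USD, hours running.
# The price is an input you look up yourself; nothing here queries AWS.
estimate_cost() {
  local count=$1 price=$2 hours=$3
  # awk does the floating-point math that bash arithmetic cannot.
  awk -v n="$count" -v p="$price" -v h="$hours" \
    'BEGIN { printf "%.2f\n", n * p * h }'
}
```

For example, `estimate_cost 3 0.10 24` prints `7.20`. That is roughly $7 of EC2 time for a 3-node cluster left running for a day, before the EMR fee.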

Set the number of instances to 3.
Create a key pair and configure the cluster to use it, so the cluster can be accessed with the PEM file.
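The same settings can also be applied from the AWS CLI instead of the console. Below is a minimal sketch, assuming the AWS CLI is installed and configured (`aws configure`) and that the default EMR roles already exist (they can be created once with `aws emr create-default-roles`). The key pair name passed in is your own choice; `basan-key` in the usage line is hypothetical.

```shell
#!/usr/bin/env bash
# Create an EC2 key pair and save its private key as <name>.pem.
create_key_pair() {
  local key_name=$1
  aws ec2 create-key-pair --key-name "$key_name" \
    --query 'KeyMaterial' --output text > "${key_name}.pem"
  # ssh refuses private keys that other users can read.
  chmod 400 "${key_name}.pem"
}

# Create a 3-node emr-5.27.0 cluster (Hadoop, Hive, Ganglia) on m4.large
# and print the new cluster ID (j-...).
create_emr_cluster() {
  local key_name=$1
  aws emr create-cluster \
    --name basancluster \
    --release-label emr-5.27.0 \
    --applications Name=Hadoop Name=Hive Name=Ganglia \
    --instance-type m4.large \
    --instance-count 3 \
    --ec2-attributes KeyName="$key_name" \
    --use-default-roles \
    --query 'ClusterId' --output text
}
```

Usage: `create_key_pair basan-key && create_emr_cluster basan-key`.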

S3 is not entirely free, but storing small datasets there costs very little.
After the request is submitted, it takes some time for the cluster to be created.


Enable SSH access to the master node so it can be reached with the ssh command:
    click on Create security group

Inbound rules control traffic coming into the instance, so SSH access needs an inbound rule.

Edit inbound rules:
Type: SSH, Port: 22, Source: My IP
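The "SSH, port 22, My IP" rule can also be added from the CLI. Below is a minimal sketch, assuming the AWS CLI is configured and that you pass in the ID of the security group to open. This could be, for example, the ElasticMapReduce-master group that EMR creates for the master node. The public IP is looked up through checkip.amazonaws.com.

```shell
#!/usr/bin/env bash
# Allow inbound SSH (TCP 22) to a security group from your current public IP only.
# Argument: the security group ID (sg-...).
allow_ssh_from_my_ip() {
  local sg_id=$1
  local my_ip
  # checkip.amazonaws.com returns the caller's public IPv4 address.
  my_ip=$(curl -s https://checkip.amazonaws.com)
  # /32 restricts the rule to exactly that one address.
  aws ec2 authorize-security-group-ingress \
    --group-id "$sg_id" \
    --protocol tcp --port 22 \
    --cidr "${my_ip}/32"
}
```

Restricting the source to a single /32 address instead of 0.0.0.0/0 keeps the master node's SSH port closed to the rest of the internet.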

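Once this is done, wait for the cluster to finish being created.

Connect to the master node with the command below, replacing filecreated.pem with your key file and ip.compute.amazonaws.com with the master node's public DNS name:
ssh -i filecreated.pem hadoop@ip.compute.amazonaws.com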
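Instead of copying the master's DNS name from the console, you can have the CLI wait for the cluster and look the name up. Below is a minimal sketch, assuming the AWS CLI is configured and that you have the cluster ID (j-..., shown on the EMR console) and the PEM file.

```shell
#!/usr/bin/env bash
# Wait for the cluster, find the master's public DNS name, then ssh in.
# Arguments: cluster ID (j-...) and path to the PEM file.
connect_to_master() {
  local cluster_id=$1 pem=$2
  local master_dns
  # Returns once the cluster is RUNNING or WAITING; fails if it terminates.
  aws emr wait cluster-running --cluster-id "$cluster_id"
  master_dns=$(aws emr describe-cluster --cluster-id "$cluster_id" \
    --query 'Cluster.MasterPublicDnsName' --output text)
  # EMR's login user on the master node is "hadoop".
  ssh -i "$pem" "hadoop@${master_dns}"
}
```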


Hive commands (run on the master node)
hive
create database trendytech;
use trendytech;
show tables;

Upload the file to S3 by creating a bucket: in Services, search for S3, create a bucket, and create folders and upload files under it.
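The bucket and upload steps can also be scripted with the AWS CLI's `s3` commands. Below is a minimal sketch, assuming the CLI is configured. The bucket name basantech-basan is the one used in this post, and bucket names must be globally unique, so yours will differ. An S3 "folder" is just a key prefix, so the optional third argument places the file under one.

```shell
#!/usr/bin/env bash
# Create a bucket and upload a local file into it, optionally under a folder prefix.
# Arguments: bucket name, local file path, optional folder prefix.
upload_dataset() {
  local bucket=$1 file=$2 prefix=${3:-}
  aws s3 mb "s3://${bucket}"
  # ${prefix:+$prefix/} expands to "folder/" only when a prefix was given.
  aws s3 cp "$file" "s3://${bucket}/${prefix:+$prefix/}$(basename "$file")"
  # List the bucket contents to confirm the upload.
  aws s3 ls --recursive "s3://${bucket}/"
}
```

Usage: `upload_dataset basantech-basan dataset.csv` uploads to `s3://basantech-basan/dataset.csv`, the path used in the load statement below.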

Loading data from S3 into the table:
load data inpath 's3://basantech-basan/dataset.csv' into table country_input;

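Remove the word local: LOAD DATA LOCAL INPATH reads from the local file system of the machine running Hive, while the file here is in S3.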
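The post loads into country_input without showing how that table is created. Below is a minimal sketch that creates the table and loads the S3 file in one non-interactive `hive -e` call, run on the master node. The column list (col1, col2) is HYPOTHETICAL and must be replaced with the real columns of dataset.csv. It also assumes a comma-delimited text file.

```shell
#!/usr/bin/env bash
# Create country_input (HYPOTHETICAL columns) and load a CSV from S3 into it.
# Argument: full S3 path of the file, e.g. s3://basantech-basan/dataset.csv
load_country_input() {
  local s3_path=$1
  # No LOCAL keyword: the file is read from S3, not from the master's disk.
  hive -e "
    USE trendytech;
    CREATE TABLE IF NOT EXISTS country_input (
      col1 STRING,
      col2 STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;
    LOAD DATA INPATH '${s3_path}' INTO TABLE country_input;
  "
}
```

Without LOCAL, Hive moves (rather than copies) the file from its source path into the table, so it will no longer be at the original S3 location afterwards. If the CSV has a header row, the table property `skip.header.line.count` can be set to 1 to skip it.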

A folder can also be created under the bucket, and the path in the load statement changed to point to that folder.

Amazon - AWS - EMR (Elastic MapReduce)
The AWS Free Tier does not cover EMR.
Google - GCP - Google Cloud Dataproc
Microsoft - Azure - HDInsight

AWS is the most widely used of these cloud platforms.

There is no need to use Cloudera Manager; EMR installs and configures the Hadoop components itself.

On EMR, hadoop fs -ls /user/hive/warehouse lists the Hive warehouse directory as well.
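Inside the warehouse, Hive keeps each database in a `<name>.db` directory and each managed table in a subdirectory of that. Below is a minimal sketch for listing one table's files, assuming the default warehouse location /user/hive/warehouse used above, run on the master node.

```shell
#!/usr/bin/env bash
# List the files of a managed Hive table on the cluster's HDFS.
# Assumes the default warehouse path; arguments: database name, table name.
list_hive_table_files() {
  local db=$1 table=$2
  hadoop fs -ls "/user/hive/warehouse/${db}.db/${table}"
}
```

Usage: `list_hive_table_files trendytech country_input`.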

When you are done, go to Services, search for EMR, and Terminate the cluster to stop it and avoid being charged on your credit card.
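Terminating can also be done from the CLI, which makes it easy to check that nothing is left running. Below is a minimal sketch, assuming the AWS CLI is configured. Data in the cluster's HDFS is lost on termination, while files in S3 remain.

```shell
#!/usr/bin/env bash
# Print ID, name and state of every cluster that is still active (and billing).
list_active_clusters() {
  aws emr list-clusters --active \
    --query 'Clusters[].[Id,Name,Status.State]' --output text
}

# Terminate one cluster by ID and wait until termination has finished.
terminate_cluster() {
  local cluster_id=$1
  aws emr terminate-clusters --cluster-ids "$cluster_id"
  aws emr wait cluster-terminated --cluster-id "$cluster_id"
}
```

Running `list_active_clusters` again afterwards should print nothing if every cluster has been shut down.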
