In HDP 1.0 we had only HDFS and MapReduce (MR).
From HDP 2.0 onwards we have HDFS, MR and YARN.
With Spark, YARN, Mesos or Kubernetes can be used as the resource negotiator.
HDFS : Addresses distributed storage issues
Pig : It is a scripting language that can be used for ETL-style jobs. In a data pipeline we clean the data using Pig scripts and then load it into a Hive table.
HBase : It is a NoSQL database that provides ACID guarantees at the row level. With plain (non-transactional) Hive tables we cannot update or delete a record. With HBase we can update or delete a record.
There is a way to make a table accessible from both Hive and HBase (the Hive HBase storage handler).
Oozie : It is used for scheduling jobs.
Usually, scenarios that involve processing whole tables go to Hive, while use cases that look up records by key fall to HBase.
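Pig scripts that only filter or transform records can run as map-only jobs; operations such as GROUP, JOIN and ORDER BY add a reduce phase, as in the other MapReduce-based components, which use both Mapper and Reducer.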
The balanced approach of 2 racks and 3 copies in the rack awareness mechanism is adopted to -
Minimize write bandwidth and maximize redundancy.
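The 2-racks and 3-copies idea can be sketched as below. This is only a rough illustration, assuming the usual HDFS default of one replica on the writer's node and the other two on different nodes of one remote rack. The function name place_replicas and the rack/node names are made up for the example, and real HDFS picks nodes randomly rather than taking the first ones.

```python
def place_replicas(writer, topology):
    """Return three (rack, node) placements for one block.

    writer   -- (rack, node) of the client writing the block
    topology -- dict mapping rack name to a list of node names (hypothetical)
    """
    local_rack, local_node = writer
    # Replica 1: on the writer's own node, so no network hop is needed.
    first = (local_rack, local_node)
    # Replica 2: on a node in a different rack, to survive a rack failure.
    remote_rack = next(r for r in sorted(topology) if r != local_rack)
    second = (remote_rack, topology[remote_rack][0])
    # Replica 3: on another node of that same remote rack, so the third
    # copy does not have to cross racks again.
    third = (remote_rack, topology[remote_rack][1])
    return [first, second, third]


topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
replicas = place_replicas(("rack1", "n1"), topology)
racks_used = {rack for rack, _ in replicas}
nodes_used = {node for _, node in replicas}
print(replicas, len(racks_used), len(nodes_used))
```

Three copies on three nodes give the redundancy, while using only two racks means the block crosses the rack boundary once during the write.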
Name node federation concept is meant for -
Load sharing.
In which of the following scenarios can the introduction of a combiner lead to wrong results -
Calculating the average.
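A small sketch of why the average breaks with a combiner: the average of per-split averages is not the overall average when the splits have different sizes. The split values and function names here are made up for illustration. The usual workaround is to let the combiner emit (sum, count) pairs and divide only in the reducer.

```python
def mean(values):
    return sum(values) / len(values)


# Values for one key, spread over two hypothetical map tasks.
split_a = [10, 20, 30]
split_b = [50]

# What the job should return: the average over all values.
true_avg = mean(split_a + split_b)

# Wrong: the combiner emits one average per split and the reducer
# averages those averages, ignoring how many values each split had.
naive_avg = mean([mean(split_a), mean(split_b)])


def combine(values):
    # Safe combiner output: a partial (sum, count) pair.
    return (sum(values), len(values))


def reduce_avg(partials):
    # The reducer merges the partial pairs and divides once at the end.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


safe_avg = reduce_avg([combine(split_a), combine(split_b)])
print(true_avg, naive_avg, safe_avg)
```

Sum and count are safe to combine because they are associative and commutative; the average itself is not.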
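Consider the following table structure: students (name STRING, id INT, subjects ARRAY, feeDetails MAP, phoneNumber STRUCT). To list the subjects taken by each student, we can use the following query, which executes successfully: select name, explode(subjects) from students;
False. explode() is a table-generating function (UDTF), and Hive does not allow other expressions in the SELECT list next to a UDTF. The query needs LATERAL VIEW, for example: select name, subject from students lateral view explode(subjects) s as subject;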
Which of the following workflows is valid in MR -
map->partition->shuffle->sort->reduce.
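The order map->partition->shuffle->sort->reduce can be walked through with a toy word count in plain Python. This is only a single-process sketch, not Hadoop code: the function names, the checksum-style partitioner and the input lines are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(line):
    # map: emit a (word, 1) pair for every word in the line.
    return [(word.lower(), 1) for word in line.split()]


def partition(key, num_reducers):
    # partition: decide which reducer gets the key. A simple checksum is
    # used here so the result is the same on every run.
    return sum(ord(c) for c in key) % num_reducers


def run_job(lines, num_reducers=2):
    # map + partition: every pair is routed to one reducer's bucket.
    buckets = [[] for _ in range(num_reducers)]
    for line in lines:
        for key, value in map_phase(line):
            buckets[partition(key, num_reducers)].append((key, value))

    results = {}
    for bucket in buckets:
        # shuffle: the bucket is what one reducer receives from all mappers.
        # sort: pairs are ordered by key so equal keys sit together.
        bucket.sort(key=lambda kv: kv[0])
        grouped = defaultdict(list)
        for key, value in bucket:
            grouped[key].append(value)
        # reduce: sum the values of each key.
        for key, values in grouped.items():
            results[key] = sum(values)
    return results


counts = run_job(["big data hadoop", "hadoop yarn", "big big data"])
print(counts)
```

Because the partitioner depends only on the key, all pairs for one word end up at the same reducer, which is what makes the final sums correct.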
NameNode federation : the namespace metadata can be divided among multiple NameNodes. It is for load sharing.
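One way to picture the metadata split is a mount table that sends each part of the namespace to a different NameNode, similar in spirit to a client-side mount table. The sketch below is only an illustration; the table contents, the NameNode names and the function resolve_namenode are all hypothetical.

```python
# Hypothetical mount table: path prefix -> NameNode that owns that subtree.
MOUNT_TABLE = {
    "/user": "namenode-a",
    "/data": "namenode-b",
    "/tmp": "namenode-a",
}


def resolve_namenode(path, mount_table):
    """Return the NameNode responsible for the given path."""
    # A prefix matches if the path is the prefix itself or lies below it.
    matches = [p for p in mount_table if path == p or path.startswith(p + "/")]
    if not matches:
        raise KeyError("no NameNode is mounted for " + path)
    # The longest matching prefix wins.
    return mount_table[max(matches, key=len)]


print(resolve_namenode("/data/logs/2020", MOUNT_TABLE))
print(resolve_namenode("/user/alice", MOUNT_TABLE))
```

Each NameNode then keeps only the metadata for its own subtrees, so the metadata load is shared between them.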
Secondary NameNode : for checkpointing, which helps fault tolerance. It is not a hot standby for the NameNode.