If we want more than one reducer, the partitioner comes into the picture.
The partitioner decides which map output records go to which reducer.
Shuffle and sort are handled by the framework.
For example, with 2 reducers, each mapper's output is split into 2 partitions, one per reducer.
Partitioning happens before shuffling.
Mapper ==> partition + shuffle + sort ==> reducer
With one reducer we get one output file; with 2 reducers we get 2 output files.
With 0 reducers (a map-only job), the number of output files equals the number of mappers.
The partitioner works on the map output key: the key determines which reducer a record goes to. It only matters when there are
multiple reducers.
By default the framework partitions the data with a hash function on the key; if needed, this can be overridden with a custom partitioner.
The hash function must be consistent: the same key must always be assigned to the same partition.
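
A minimal sketch of hash partitioning, written in plain Python rather than against the Hadoop API. The function names (partition_for, partition_map_output) are made up for illustration, and zlib.crc32 is only a stand-in for a deterministic hash; it is not what Hadoop itself uses. The point is that the partition is computed from the key alone, so the same key always lands on the same reducer.

```python
import zlib


def partition_for(key: str, num_reducers: int) -> int:
    """Return the reducer index (0 .. num_reducers - 1) for a key.

    crc32 is an illustrative, deterministic hash. Python's built-in
    hash() for strings is randomized per process, so it would not give
    the same partition on every machine and every run.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition_map_output(pairs, num_reducers):
    """Split one mapper's (key, value) pairs into one list per reducer."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition_for(key, num_reducers)].append((key, value))
    return partitions


if __name__ == "__main__":
    mapper_output = [("apple", 1), ("banana", 1), ("apple", 1), ("cherry", 1)]
    for index, part in enumerate(partition_map_output(mapper_output, 2)):
        print("reducer", index, "receives", part)
```

Both ("apple", 1) records end up in the same partition, which is what lets a single reducer see every value for that key.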
To optimise, we can add a combiner, which performs local aggregation on the map side.
The combiner runs on the mapper's machine, on that mapper's output.
The reducer code can often be reused as the combiner code, provided the operation is commutative and associative (for example a sum or a max).
We can also write a custom combiner; it need not be the same as the reducer code.
The combiner moves part of the aggregation to the map side, where it runs in parallel, and reduces the data transferred to the reducers.
Mapper ==> Combiner ==> partition + shuffle + sort ==> reducer
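
A small word-count sketch of what the combiner buys us, again in plain Python and not the Hadoop API. The names map_words, combine and reduce_counts are hypothetical. It assumes the aggregation is a sum, which is why the same logic can serve as both combiner and reducer.

```python
from collections import Counter


def map_words(line):
    """Mapper: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum the counts per key within ONE mapper's output."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def reduce_counts(pairs):
    """Reducer: sum the counts per key across all mappers."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    mapper_a = map_words("to be or not to be")
    mapper_b = map_words("to do or not to do")

    without_combiner = mapper_a + mapper_b
    with_combiner = combine(mapper_a) + combine(mapper_b)

    # Same final answer, but fewer records cross the network.
    print(len(without_combiner), "records shuffled without combiner")
    print(len(with_combiner), "records shuffled with combiner")
    print(reduce_counts(with_combiner) == reduce_counts(without_combiner))
```

The final counts are identical either way; only the number of records sent to the reducers changes. This would not hold for an operation such as an average, where a custom combiner is needed.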