
Saturday, November 9, 2019

Spark Summary


What is Spark
RDD
Immutability
In-memory processing
DAG or execution plan
Transformations and actions

Transformations are lazy
Predicate pushdown
map transformation
filter transformation
flatMap transformation
reduceByKey transformation
reduce is an action
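
The laziness of transformations can be illustrated without a cluster. The following is a minimal plain-Python sketch, not Spark code: Python's built-in map and filter happen to return lazy iterators, so they stand in for Spark transformations, while functools.reduce stands in for an action. The calls list, the double function and the sample data are hypothetical names and values, used only to show when work actually happens.

```python
from functools import reduce
from itertools import chain

calls = []  # records each element that double() actually processes

def double(x):
    calls.append(x)
    return x * 2

data = [1, 2, 3, 4]

# "Transformations": these only describe the pipeline; no element is processed yet.
doubled = map(double, data)                # analogous to a map transformation
large = filter(lambda x: x > 2, doubled)   # analogous to a filter transformation
calls_before_action = len(calls)

# "Action": reduce pulls data through the whole pipeline and returns one value.
total = reduce(lambda a, b: a + b, large)
calls_after_action = len(calls)

# flatMap analogue: each input line yields several output elements.
words = list(chain.from_iterable(line.split() for line in ["hello spark", "rdd"]))

print(calls_before_action, calls_after_action, total, words)
```

Before the reduce call, double has not run at all; it runs once per element only when the action asks for a result. Spark behaves in a similar spirit: transformations extend the execution plan, and an action triggers the computation.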


Difference between reduceByKey and reduce
reduceByKey is a transformation; reduce is an action.
reduceByKey works on key-value pairs.
reduce works on individual elements.

1
2
3
4
To calculate the sum of all the elements above, we can use reduce. reduce combines the individual elements two at a time until a single value remains (here, 10).
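
As a rough illustration of the sum above, here is a plain-Python sketch (not Spark code) that reduces the elements 1, 2, 3, 4 in one place and then again over an assumed split into two partitions. The partition layout and the add helper are assumptions made for the example. Spark's reduce expects a function that is commutative and associative, which is what makes the partition-wise version give the same answer.

```python
from functools import reduce

def add(a, b):
    return a + b

elements = [1, 2, 3, 4]

# Reducing everything in one place: ((1 + 2) + 3) + 4
total = reduce(add, elements)

# Assumed layout for illustration: the same data split into two partitions.
partitions = [[1, 2], [3, 4]]
partial_sums = [reduce(add, p) for p in partitions]  # one value per partition
total_from_partitions = reduce(add, partial_sums)    # combine the partial values

print(total, partial_sums, total_from_partitions)  # 10 [3, 7] 10
```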

reduceByKey can produce multiple results, one per distinct key.
reduce gives a single result.

reduceByKey returns an RDD, because it may hold many results.
reduce returns a single value as an ordinary Scala variable on the driver, which is why it is an action.
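
The contrast can be sketched in plain Python. This is a toy model of the semantics only, not the Spark implementation: reduce_by_key is a hypothetical helper written for this example, and the sample pairs are made up. In PySpark the corresponding calls would be rdd.reduceByKey(f) and rdd.reduce(f).

```python
from functools import reduce

def reduce_by_key(pairs, func):
    """Toy model of reduceByKey: merge the values of each key with func."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())  # still a collection: one (key, value) per key

pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]

# reduceByKey analogue: several results, one per distinct key.
per_key = reduce_by_key(pairs, lambda x, y: x + y)

# reduce analogue: every element is folded into one plain value.
overall = reduce(lambda x, y: x + y, [value for _, value in pairs])

print(per_key, overall)  # [('a', 4), ('b', 6)] 10
```

The first result is still a collection that further transformations could be applied to, while the second is a single number, which mirrors the RDD versus plain variable distinction above.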


Understand the mindset of Spark developers.
