Pi estimation: a compute-heavy process
parallelize: distributes the work across the machines of the cluster
val rdd1 = sc.parallelize(1 to 100000)
// the input element is ignored; each evaluation just draws a fresh random point
val rdd2 = rdd1.filter { _ =>
  val x = math.random
  val y = math.random
  x*x + y*y < 1
}
val count = rdd2.count()
println(s"Pi is roughly ${4.0*count/100000}")
The unit circle's equation is x*x + y*y = 1. The points (x, y) are drawn uniformly from the unit square, so the fraction of points falling inside the quarter-circle approximates the area ratio Pi/4; multiplying that fraction by 4 gives the estimate of Pi.
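The same estimate can be checked without a cluster; below is a minimal plain-Scala sketch of the identical Monte Carlo logic (the object name `PiLocal`, the `estimatePi` helper, and the fixed seed are illustrative choices, not part of Spark's API):

object PiLocal {
  // Draw `samples` random points in the unit square and count how many
  // land inside the quarter-circle x*x + y*y < 1; that fraction ~ Pi/4.
  def estimatePi(samples: Int): Double = {
    val rng = new scala.util.Random(42) // fixed seed so runs are repeatable
    var inside = 0
    var i = 0
    while (i < samples) {
      val x = rng.nextDouble()
      val y = rng.nextDouble()
      if (x * x + y * y < 1) inside += 1
      i += 1
    }
    4.0 * inside / samples
  }

  def main(args: Array[String]): Unit =
    println(s"Pi is roughly ${estimatePi(100000)}")
}

Spark's version parallelizes exactly this loop: `filter` plays the role of the `if`, and `count` plays the role of the `inside` counter.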
scala> val rdd1 = sc.parallelize(1 to 100000)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[10] at parallelize at <console>:27
scala> //independent of input number
scala> val rdd2 = rdd1.filter { _ =>
| val x = math.random
| val y = math.random
| x*x+y*y < 1
|
| }
rdd2: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[11] at filter at <console>:29
scala> val count = rdd2.count()
count: Long = 78668
scala> println(s"Pi is roughly ${4.0*count/100000}")
Pi is roughly 3.14672
scala>
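Note that the estimate above (3.14672) is only accurate to two digits even with 100,000 samples: Monte Carlo error shrinks roughly as 1/sqrt(n), so each tenfold increase in samples buys about one more digit. A small plain-Scala sketch of this convergence (the object name `PiAccuracy` and the `estimate` helper are illustrative, not Spark API):

object PiAccuracy {
  // Estimate Pi from `n` random points in the unit square.
  def estimate(n: Int, seed: Long): Double = {
    val rng = new scala.util.Random(seed)
    val inside = (1 to n).count { _ =>
      val x = rng.nextDouble()
      val y = rng.nextDouble()
      x * x + y * y < 1
    }
    4.0 * inside / n
  }

  def main(args: Array[String]): Unit =
    for (n <- Seq(1000, 100000, 1000000)) {
      val est = estimate(n, seed = 7)
      // error shrinks roughly as 1/sqrt(n)
      println(f"n=$n%8d  estimate=$est%.5f  error=${math.abs(est - math.Pi)}%.5f")
    }
}

This slow convergence is exactly why distributing the sampling across machines with `parallelize` pays off.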