Apache Spark Accumulators, Simplified !
Adi Polak

Adi Polak @adipolak

About: 1 out of 25 influential women in Software Development according to Apiumhub. I am a software developer who would like to learn more!

Location:
Global
Joined:
Dec 11, 2018

Apache Spark Accumulators, Simplified !

Publish Date: Jan 13 '20
14 1

This is part of the series Apache Spark bitesize.
Every post will take you less than 2 minutes to read, so you can learn something new while waiting for your coffee to brew.

Apache Spark Accumulators:

We can think about accumulators like counters, distributed counters.
Any simple value can be accumulated.
Basically, there are types that can be merged.
The result type being accumulated is the same as the types of elements being merged.

These are variables that are "added" to through an associative operation and can, therefore, be efficiently supported in parallel and distributed.
For the supported accumulator of numeric value types, Spark creates an optimized MapReduce.
However, programmers can add support for new types as well by implementing the abstract class AccumulatorV2.

Let's look at a simple example:


val accum = sc.accumulator(0) // create accumulator
val dataFrame = sc.parallelize(Array(1, 2, 3, 4)) //create DataFrame
dataFrame.foreach(x => accum += x) // distributed operation
accum.value // can be called only from driver
Enter fullscreen mode Exit fullscreen mode

What happened here? We created a new accumulator variable named accum, and a DataFrame named dataFrame.
We call foreach on the DataFrame and create the logic of - foreach x in DataFrame add its value to accum.
The foreach function acts in a distributed manner.
Later we can extract the value from the accumulator by calling .value Which can be called ( and return final answer ) only from the driver.

💡 We want your Ideas and Comments 💡

If some of the terms seem off, or you are not sure what they mean, checkout Apache Spark Basics.

Didn't find what you were looking for? write me.

❤️ Thank's for reading ❤️

Comments 1 total

  • Felix Terkhorn
    Felix TerkhornApr 7, 2020

    Very cool. I love that I am refreshing my Spark basics with these articles, despite my lack of sleep. Bites is good!

Add comment