
Apache Spark Components CheatSheet

Troubled by confusing concepts such as Executor, Node, RDD, and Task in Spark? Invest just 2 minutes of your time to bring some order to this mess!


I'll clean up these Apache Spark concepts for you!

Spark building blocks: executor, tasks, cache, SparkContext, cluster manager


Executor => Multiple Tasks: An executor is a JVM process running on the worker nodes. Executors receive serialized tasks (your code, shipped from the driver), deserialize them, and run them.

Executors cache data in memory so that tasks reusing it can run faster.
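To make the cache idea concrete, here is a toy model in plain Python. It is not the Spark API: `ToyExecutor`, `run_task` and `load_partition` are made-up names for illustration only. The sketch just shows that once a partition has been loaded, a second task on the same partition can skip the expensive load.

```python
from typing import Callable, Dict, List


def load_partition(partition_id: int) -> List[int]:
    """Stand-in for an expensive read (hypothetical helper, made-up data)."""
    return [partition_id * 10 + i for i in range(3)]


class ToyExecutor:
    """Hypothetical toy executor that keeps loaded partitions in memory."""

    def __init__(self) -> None:
        self._cache: Dict[int, List[int]] = {}
        self.loads = 0  # how many times we actually had to load data

    def run_task(self, partition_id: int, fn: Callable[[List[int]], int]) -> int:
        # Load the partition only if it is not already cached.
        if partition_id not in self._cache:
            self._cache[partition_id] = load_partition(partition_id)
            self.loads += 1
        return fn(self._cache[partition_id])


executor = ToyExecutor()
total = executor.run_task(0, sum)    # first task: loads partition 0
largest = executor.run_task(0, max)  # second task: served from the cache
print(total, largest, executor.loads)
```

Two tasks ran, but the data was loaded only once. That is the whole point of the cache.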

Node => Multiple Executors: Each worker node can host multiple executors.

RDD => Big Data Structure: Its main strength is that it can represent data too large to fit on a single machine, so the data is distributed: partitioned and split across computers.

Input => RDD: Every RDD is born out of some input, such as a text file, Hadoop files, etc.

Output => RDD: Functions in Spark can produce an RDD as their output. So it's one function after another: each receives an input RDD and returns an output RDD. It's functional.
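The "one function after another" idea can be sketched in plain Python. This is a toy with ordinary lists, not the Spark API, and the function names and input lines are made up. Each step takes a dataset and returns a new one without touching its input, which is roughly the role `flatMap`, `filter` and `map` play on a real RDD.

```python
from typing import List


def to_words(lines: List[str]) -> List[str]:
    """Split every line into words (similar in spirit to flatMap)."""
    return [word for line in lines for word in line.split()]


def keep_long(words: List[str], min_len: int = 4) -> List[str]:
    """Keep only words with at least min_len characters (similar to filter)."""
    return [w for w in words if len(w) >= min_len]


def lengths(words: List[str]) -> List[int]:
    """Replace each word by its length (similar to map)."""
    return [len(w) for w in words]


lines = ["spark makes big data simple", "rdd in, rdd out"]  # made-up input
result = lengths(keep_long(to_words(lines)))  # input -> dataset -> dataset -> ...
print(result)
```

Every function here receives a dataset and hands a new one to the next function, and the original `lines` list is never modified.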

RDD[Type]: RDDs are typed; each one holds elements of a certain type, for example RDD[String], or RDD[(String, Int)] for key-value pairs.

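RDD => 1,2,3: RDDs are ordered: elements keep their order within each partition, though operations that shuffle data do not necessarily preserve it.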

RDD => Zzzz: RDDs are lazily evaluated. We said functional, didn't we? So you can chain multiple transformations on your data, and only when you hit an action is the actual data computed.
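Laziness is easier to see than to describe, so here is a toy model in plain Python. It is not the Spark API: `LazyDataset` is a hypothetical class that only borrows the names `map`, `filter` and `collect`. Transformations are merely recorded, and the work happens when the action (`collect`) is called.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

Step = Tuple[str, Callable[[Any], Any]]


class LazyDataset:
    """Hypothetical toy class: records transformations, runs them on collect()."""

    def __init__(self, source: Iterable[Any], steps: Optional[List[Step]] = None):
        self._source = source
        self._steps: List[Step] = steps or []

    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        # Transformation: only remember what to do, do not do it yet.
        return LazyDataset(self._source, self._steps + [("map", fn)])

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + [("filter", pred)])

    def collect(self) -> List[Any]:
        # Action: now apply every recorded step, in order.
        data = iter(self._source)
        for kind, fn in self._steps:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return list(data)


calls: List[int] = []


def double(x: int) -> int:
    calls.append(x)  # record that real work happened
    return x * 2


pipeline = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # nothing has run yet
result = pipeline.collect()       # the action triggers the work
print(calls_before_action, result, len(calls))
```

Building the pipeline costs nothing. `double` is only called once `collect()` asks for the actual data.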

RDD => Partitioned: RDDs are partitioned across servers. We said it's big data, so we need to partition it.
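Here is a toy sketch of partitioning in plain Python. It is not the Spark API, and `split_into_partitions` is a made-up helper; Spark's own partitioners decide the split differently. The point is only that the data is cut into chunks, each chunk can be processed on its own (think: on a different server), and the partial results are combined at the end.

```python
from typing import List


def split_into_partitions(items: List[int], num_partitions: int) -> List[List[int]]:
    """Cut a list into contiguous chunks (hypothetical helper, toy strategy)."""
    size = max(1, -(-len(items) // num_partitions))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


data = list(range(1, 11))                      # made-up data: 1..10
partitions = split_into_partitions(data, 3)    # pretend these live on 3 servers
partial_sums = [sum(p) for p in partitions]    # each partition is handled on its own
total = sum(partial_sums)                      # combine the partial results
print(partitions, partial_sums, total)
```

No single step needs to see all the data at once, which is why this scales to data that does not fit on one machine.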

RDD => Array(thing1, thing2, thing3): You can think of an RDD as a bunch of things.

Guys, if you have any other mess you want me to cheatsheet for you, just comment below. I would also highly appreciate any comments about this post, so please give me feedback!

