Hadoop CheatSheet

Introduction
We have decided to collect the most important things to know about Hadoop in a single, concise post. Let us know if you have any comments!

Hadoop

##########
## HDFS ##
##########

NameNode # => Manages the filesystem namespace. If you lose it, you have no pointers to your data, so the data is practically lost.

DataNode # => Stores the actual data blocks; one runs on each worker node.

Block # => Each file is split into blocks B1, B2, ..., each 128 MB in size by default. Replication happens per block. The NameNode knows that file X is split into B1, B2 and on which DataNodes each block lives.
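To make the block idea concrete, here is a minimal sketch of the bookkeeping described above. It is a toy model, not HDFS code: the function names, the DataNode names (`dn1` to `dn4`), the replication factor of 3 and the round-robin placement are all assumptions made for illustration (real HDFS placement is rack-aware).

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size quoted above
REPLICATION = 3                 # assumed replication factor for this sketch


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the blocks a file is split into."""
    if file_size <= 0:
        return []
    full, rest = divmod(file_size, block_size)
    # All blocks are full-sized except possibly the last one.
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Hypothetical round-robin placement: block name -> list of DataNodes."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    block_map = {}
    for i in range(num_blocks):
        block_map[f"B{i + 1}"] = [
            datanodes[(i + r) % len(datanodes)] for r in range(replication)
        ]
    return block_map


# A 300 MB file becomes two full blocks plus one 44 MB block.
blocks = split_into_blocks(300 * 1024 * 1024)
# This mapping is the kind of metadata the NameNode keeps.
block_map = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
```

The `block_map` dictionary plays the role of the NameNode's pointers: lose it and the blocks are still on the DataNodes, but nothing says which blocks make up which file.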

##########
## YARN ##
##########

ResourceManager # => Like `NameNode` but for computing: tracks the NodeManagers and how much capacity each has available for work.

NodeManager # => Like `DataNode` but for computing: offers computational resources and runs application tasks in containers.

ApplicationMaster # => Each application has an `ApplicationMaster` process that negotiates resources with the `ResourceManager`. The `ResourceManager` hands a `container` descriptor back to the `ApplicationMaster`, which then asks a `NodeManager` to launch the `container`.
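The negotiation above can be sketched as a tiny simulation. This is a toy model under loud assumptions, not the YARN API: the classes, their methods, the memory-only resource model and the "first node that fits" policy are all hypothetical and only mirror the roles described in this section.

```python
class NodeManager:
    """Toy worker: offers memory and launches containers."""

    def __init__(self, name, memory_mb):
        self.name = name
        self.free_mb = memory_mb
        self.running = []

    def launch(self, container):
        self.running.append(container)
        return f"{container['id']} running on {self.name}"


class ResourceManager:
    """Toy scheduler: tracks NodeManagers and grants container descriptors."""

    def __init__(self, node_managers):
        self.node_managers = list(node_managers)
        self.next_id = 1

    def allocate(self, memory_mb):
        # Assumed policy: the first NodeManager with enough free memory wins.
        for nm in self.node_managers:
            if nm.free_mb >= memory_mb:
                nm.free_mb -= memory_mb
                container = {
                    "id": f"container_{self.next_id}",
                    "node": nm.name,
                    "memory_mb": memory_mb,
                }
                self.next_id += 1
                return container
        return None  # no capacity available right now


class ApplicationMaster:
    """Toy per-application process: negotiates, then asks a NodeManager to launch."""

    def __init__(self, resource_manager, nodes_by_name):
        self.rm = resource_manager
        self.nodes = nodes_by_name

    def run_task(self, memory_mb):
        container = self.rm.allocate(memory_mb)  # step 1: negotiate with the RM
        if container is None:
            return None
        return self.nodes[container["node"]].launch(container)  # step 2: launch


nms = [NodeManager("nm1", 1024), NodeManager("nm2", 4096)]
rm = ResourceManager(nms)
am = ApplicationMaster(rm, {nm.name: nm for nm in nms})

first = am.run_task(2048)   # nm1 is too small, so nm2 gets the container
second = am.run_task(1024)  # fits on nm1
third = am.run_task(4096)   # nothing has 4096 MB free any more
```

Note that the `ResourceManager` only hands out the descriptor; it is the `ApplicationMaster` that contacts the `NodeManager` to start the container.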
 
################
## Map Reduce ##
################

Map(k1, v1) --> list(k2, v2) # => Map takes a key-value pair and produces zero or more intermediate key-value pairs.

Reduce(k2, list(v2)) --> list(k3, v3) # => Reduce takes a single key and the list of its values and produces zero or more key-value pairs, usually an aggregation.
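The two signatures above are easiest to see on a word count. The sketch below runs everything in one Python process, so it only illustrates the data flow (map, group by key, reduce) and not Hadoop's actual Java API or its distributed shuffle; `map_fn`, `reduce_fn` and `run_job` are names made up for this example.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map(k1, v1) -> list(k2, v2): emit (word, 1) for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(word, counts):
    """Reduce(k2, list(v2)) -> list(k3, v3): sum the counts for one word."""
    return [(word, sum(counts))]


def run_job(records):
    # Map phase: every input record yields zero or more intermediate pairs.
    intermediate = []
    for key, value in records:
        intermediate.extend(map_fn(key, value))

    # Shuffle: group all intermediate values by their key.
    grouped = defaultdict(list)
    for k2, v2 in intermediate:
        grouped[k2].append(v2)

    # Reduce phase: one call per key, keys visited in sorted order.
    output = []
    for k2 in sorted(grouped):
        output.extend(reduce_fn(k2, grouped[k2]))
    return output


# Input records are (line number, line text) pairs.
result = run_job([(0, "to be or not"), (1, "to be")])
print(result)  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

Here the reduce step is the aggregation: each word's list of ones collapses into a single count.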

Summary

We kept this post short so you could rest :). We went through what NameNode, DataNode, Block, ResourceManager, NodeManager, ApplicationMaster, Map and Reduce are in a very short and concise way, isn’t that just great :) If you liked it, please share it and leave a comment! :)
