Introduction

We have decided to aggregate in a single post the most important things to know about Hadoop, in a concise way. Let us know if you have any comments!

Hadoop

##########
## HDFS ##
##########

NameNode # => Manages the filesystem namespace. If you lose it, you have no pointers to your data, so you have practically lost your data.
DataNode # => Holds the actual data; installed on each worker.
Block # => Each file is split into blocks B1, B2, ..., each 128MB in size. Replication is done per block. The NameNode knows that file X is split into B1, B2 and where each block is stored.

##########
## YARN ##
##########

ResourceManager # => Like `NameNode` for computing; tracks NodeManagers and how available they are for work.
NodeManager # => Like `DataNode` for computing; offers computational resources and runs application tasks in containers.
ApplicationMaster # => Each application has an `ApplicationMaster` process which negotiates resources with the `ResourceManager`, which delivers a `containe
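
To make the HDFS block entry concrete, here is a minimal Python sketch of the arithmetic only. It does not talk to a cluster. The 128MB block size comes from the cheat sheet; the replication factor of 3 is an assumption (the usual HDFS default, not stated in this post), and the function names `split_into_blocks` and `raw_storage_mb` are hypothetical helpers made up for illustration.

```python
from typing import List

BLOCK_SIZE_MB = 128      # block size given in the cheat sheet
REPLICATION_FACTOR = 3   # assumption: usual HDFS default, not stated in the post


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> List[int]:
    """Return the size in MB of each block (B1, B2, ...) of a file."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block only holds whatever data is left over.
        sizes.append(remainder)
    return sizes


def raw_storage_mb(file_size_mb: int, replication: int = REPLICATION_FACTOR) -> int:
    """Total space used across DataNodes once every block is replicated."""
    return file_size_mb * replication


if __name__ == "__main__":
    print(split_into_blocks(300))  # [128, 128, 44]
    print(raw_storage_mb(300))     # 900
```

Under these assumptions, a 300MB file becomes three blocks, and the cluster stores 900MB in total because each block is kept three times.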