
Posts

Showing posts from June, 2017

Apache Spark Components CheatSheet

Troubled by confusing concepts such as Executors, Nodes, RDDs and Tasks in Spark? Invest just 2 minutes of your time to make some order in this mess! I'll clean up these Apache Spark concepts for you. Spark building blocks: executor, tasks, cache, SparkContext, cluster manager.

Executor => Multiple Tasks: an executor is a JVM process sitting on each node. Executors receive tasks (jars with your code), deserialize them, and run them. Executors use a cache so that tasks can run faster.

Node => Multiple Executors: each node can host multiple executors.

RDD => Big data structure: its main strength is that it represents data which cannot be stored on a single machine, so the data is distributed, partitioned, split across computers.

Input => RDD: every RDD is born out of some input, like a text file, Hadoop files, etc.

Output => RDD: the output of functions in Spark can produce an RDD, so it's like one function after another, each receiving an input RDD and producing an output RDD.
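To make the Input => RDD => Output chain concrete, here is a minimal Scala sketch of such a pipeline; the file name input.txt and the word-length filter are made-up placeholders for illustration, not something from the post itself:

```scala
// A minimal sketch, assuming Spark is on the classpath; input.txt is a made-up file name.
import org.apache.spark.{SparkConf, SparkContext}

object RddPipelineSketch {
  def main(args: Array[String]): Unit = {
    // The SparkContext talks to the cluster manager, which allocates executors on the nodes.
    val conf = new SparkConf().setAppName("rdd-pipeline-sketch").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // Input => RDD: the text file is split into partitions spread across executors.
    val lines = sc.textFile("input.txt")

    // Each transformation returns a new RDD; the work is shipped to executors as tasks.
    val words     = lines.flatMap(_.split("\\s+"))
    val longWords = words.filter(_.length > 3)

    // cache() keeps the computed partitions in executor memory so repeated actions run faster.
    longWords.cache()

    println(s"long words: ${longWords.count()}")
    sc.stop()
  }
}
```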

Best practices for solving programming Interview questions

You see, it's much easier than you think: there is a limited set of rules you can apply to most programming interview questions that involve algorithms and data structures. I have prepared a summary of them for you, just read below and get your tips for today.

When you have no clue / under a panic attack => Brute force! If you don't have a clue, brute force the fu**** question! In most cases the question you are presented with has a brute-force solution. Mention clearly that you are brute forcing it and say that the time complexity is O(n^2) or whatever it is. Then think about where you waste time in your brute-force solution and try to improve that part; in many cases this will get you closer to the actual answer. Brute forcing also helps you become familiar with the problem. A common theme for brute forcing is a for loop inside a for loop, something like the sketch below, so it's great to become familiar with common brute-force patterns.
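As a small illustration of that nested-loop shape, here is a hedged Scala sketch; the pair-sum problem and the name hasPairWithSum are made-up examples, not from the original post:

```scala
// A minimal sketch of the classic O(n^2) nested-loop brute force: try every pair.
object BruteForceSketch {
  def hasPairWithSum(nums: Array[Int], target: Int): Boolean = {
    var found = false
    for (i <- nums.indices) {       // outer loop over every element
      for (j <- i + 1 until nums.length) { // inner loop over every later element
        if (nums(i) + nums(j) == target) found = true
      }
    }
    found
  }

  def main(args: Array[String]): Unit = {
    println(hasPairWithSum(Array(2, 7, 11, 15), 9)) // true
    println(hasPairWithSum(Array(1, 2, 3), 100))    // false
  }
}
```

Once the brute force works, the usual next step is to ask where the inner loop wastes time (here, re-scanning for a complement that a hash set could remember) and improve just that part.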

CS Interview - CS Topics To Study

Below is a list of topics to study for a CS interview. If you have any comments please let us know. The topics include data structures, sorting, search, graph search, math, compression, security, web, recursion, general programming, data science: Kafka, Hadoop, Storm, UML, Java, scalability, multithreading. For each topic we have a status column; use it to track the progress of your study of that topic. In addition, we have a tutorial column where we point to the best video or tutorial for studying that topic. This doc is a work in progress, please let us know of any suggestions. Now, by far the best book (although I think I could have created a better version) for studying for programming interviews is: " Cracking The Coding Interview "

Scalability CheatSheet - Paxos

Scalability CheatSheet — Part 3 — PAXOS

We like journaling, seriously; it helps us avoid data corruption. You could update data and fail halfway through, I mean there could be an electricity shutdown, whatever; this is why we like journaling. It's append only, so nothing can really be corrupted except for what you append, and if that is corrupted you simply don't consider it as appended. Reading is now difficult because you need to read your whole journal, so from time to time you create a snapshot of the state; now you have a snapshot and you augment it with your append-only journal.

Just Read — When you read you just read, you don't lock, you don't care about the world, it's like you are high, you just read; reading does not disturb any writes, it's immutable, all is cool dude, we are reading.

Collision — What happens if two distributed machines try to write at the very same time, same timestamp and different content, on the same key? OMG we jus
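Here is a minimal single-machine sketch of the append-only journal plus snapshot idea described above (this is not the Paxos protocol itself); the Journal and Entry names are hypothetical, made up for illustration:

```scala
// Append-only journal with periodic snapshots: writes only append, reads replay the tail.
import scala.collection.mutable

final case class Entry(key: String, value: String, timestamp: Long)

class Journal {
  private val entries        = mutable.ArrayBuffer.empty[Entry]
  private var snapshot       = Map.empty[String, String]
  private var snapshotLength = 0

  // Writes are append-only: nothing already written is ever modified.
  def append(key: String, value: String): Unit =
    entries += Entry(key, value, System.currentTimeMillis())

  // From time to time, fold the new entries into a snapshot so reads stay cheap.
  def takeSnapshot(): Unit = {
    snapshot = entries.drop(snapshotLength).foldLeft(snapshot)((acc, e) => acc + (e.key -> e.value))
    snapshotLength = entries.length
  }

  // Reads replay only the entries appended since the last snapshot, on top of the snapshot.
  def read(key: String): Option[String] = {
    val sinceSnapshot = entries.drop(snapshotLength).reverse.find(_.key == key).map(_.value)
    sinceSnapshot.orElse(snapshot.get(key))
  }
}
```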

Scalability And Performance: Split Your Data and Simplify

Introduction: here are a few guidelines for supporting scalability and performance in your systems.

1. Simplify: Simplify your code and design; you will gain a system that is easier to understand and to scale, and your life will be scalable too. The more complex it is, the harder it is to scale out, and the more complex your life becomes. Of course, if it's not possible to simplify, don't; we are sane people. But many times we only think it's not possible to simplify when it actually is, so do yourself a favour and put some effort into this.

2. X Axis, Duplicate Data: Create multiple read-only DBs or clones of your data and thus scale your reads. You can then route read queries across multiple copies of your data, putting less strain on your servers (see the sketch after this list).

3. Y Axis, Split by Business: Split your data by business domain, like microservices, at the DB level and not only at the service level: different roles, different DBs. Do you sell both underwear and have another line of business for atomic energy manufacturing? what do
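As a minimal sketch of the X-axis idea, here is how read traffic can fan out across read-only clones while writes stay on one primary; Db, ReplicatedDb and the SQL strings are hypothetical names, not a specific library's API:

```scala
// X axis (duplicate data): one primary for writes, many read-only replicas for reads.
trait Db {
  def read(sql: String): String
  def write(sql: String): Unit
}

class ReplicatedDb(primary: Db, replicas: Vector[Db]) {
  private val rnd = new scala.util.Random

  // Reads can hit any clone of the data, spreading the load across servers.
  def read(sql: String): String = replicas(rnd.nextInt(replicas.size)).read(sql)

  // Writes still go to the single primary, which replicates to the clones.
  def write(sql: String): Unit = primary.write(sql)
}
```

The Y-axis split is the complementary move: instead of cloning the same data, each business domain gets its own schema or database, so the two lines of business never share a bottleneck.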