HyperLogLog The Easy Way

Think of your recipe box, filled with cards for all your favorite dishes. You want to know how many different recipes you have, but counting them all can be tedious.

HyperLogLog helps you get a close answer without the hassle. Here's the trick:

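  1. Secret Code Machine: Imagine a machine that stamps a code on each recipe card, like a fingerprint. This is a hash function. The code doesn't reveal the recipe itself, the same recipe always gets the same code (so duplicate cards don't count twice), and two different recipes almost never end up with the same code.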

  2. Counting Zeroes: Now, for each code, you count the zeroes at the very start. A code like "000123" has three leading zeroes, while "789456" has none.

  3. Zeroes Hint at Variety: The key thing is: the more unique recipes you have (more cards), the more likely you are to find a code with a lot of leading zeroes. If the codes look random, only about 1 in 10 starts with a zero, 1 in 100 starts with two, and 1 in 1,000 starts with three. So if you spot a code with three leading zeroes, you have probably seen around a thousand different codes.

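  4. Smart Approximation: By remembering only the longest run of leading zeroes it has seen, HyperLogLog can estimate how many unique recipe cards (unique items) are in the box. A single lucky code could throw the guess off, so the real algorithm splits the cards into many groups, tracks the longest run in each group, and averages the results. With about 2,000 groups, the typical error is roughly 2%.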
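The first three steps can be sketched in a few lines of Python. This is only an illustration under some assumptions: SHA-256 stands in for the "secret code machine", only the first 32 bits of the code are used, and zeroes are counted in binary rather than in decimal digits. The function names are made up for this example.

```python
import hashlib

HASH_BITS = 32  # assumption: keep only the first 32 bits of each code


def code_for(item: str) -> int:
    """Step 1: turn an item into a fixed-size code (here via SHA-256)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def leading_zeros(code: int, bits: int = HASH_BITS) -> int:
    """Step 2: count the zero bits at the start of the code."""
    return bits - code.bit_length()


def longest_zero_run(items) -> int:
    """Step 3: remember only the longest run of leading zeroes seen."""
    return max((leading_zeros(code_for(i)) for i in items), default=0)


few = [f"recipe-{i}" for i in range(10)]
many = [f"recipe-{i}" for i in range(10_000)]
print(longest_zero_run(few), longest_zero_run(many))
```

Because the same recipe always produces the same code, adding a duplicate card never changes the longest run, which is why duplicates do not inflate the count.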

So, even though HyperLogLog doesn't physically count each card, it uses the information from those leading zeroes to give you a close estimate of the total number of unique recipes in your collection.
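For the curious, here is a minimal sketch of the whole idea with the groups ("registers") included. It follows the standard HyperLogLog formula with the usual small-count correction, but it is a teaching sketch, not a production implementation. The class name, the choice of SHA-256 truncated to 64 bits, and the default of 1,024 registers are assumptions made for this example.

```python
import hashlib
import math


class HyperLogLogSketch:
    """Toy HyperLogLog: 2**p registers, 64-bit codes taken from SHA-256."""

    def __init__(self, p: int = 10):
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16 in this sketch")
        self.p = p
        self.m = 1 << p              # number of registers (groups)
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")      # 64-bit code
        rest_bits = 64 - self.p
        bucket = x >> rest_bits                    # first p bits pick the group
        rest = x & ((1 << rest_bits) - 1)          # remaining bits
        # rank = leading zeroes in the remaining bits, plus one
        rank = rest_bits - rest.bit_length() + 1
        if rank > self.registers[bucket]:
            self.registers[bucket] = rank

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)           # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        empty = self.registers.count(0)
        if raw <= 2.5 * m and empty:
            # few items seen: counting empty registers is more accurate
            return m * math.log(m / empty)
        return raw


hll = HyperLogLogSketch()
for i in range(50_000):
    hll.add(f"recipe-{i}")
    hll.add(f"recipe-{i}")   # duplicates do not change the estimate
print(round(hll.estimate()))
```

The sketch stores just 1,024 small numbers no matter how many cards go through it, which is the whole appeal: memory stays fixed while the estimate stays within a few percent of the true count.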
