SQL Window functions (OVER, PARTITION BY, ...)

Introduction


When you run an SQL query you select rows, but what if you want a summary across multiple rows? For example, you want the top basketball player for each country. In that case we don't only group by country; we also want to pick the top player within each country. This means we want to group by country and then select the first player. In standard SQL we can do this by joining the table with itself (a self-join, sketched below), but we can also use PARTITION BY and window functions.
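As a minimal sketch of the self-join approach, assuming a hypothetical players table with country, name and points columns (these names are illustrative, not from the original post):

SELECT p.country, p.name, p.points
FROM players p
LEFT JOIN players better
  ON better.country = p.country AND better.points > p.points
WHERE better.points IS NULL; -- keep only players with no higher-scoring player in the same country

The rest of this post shows how PARTITION BY and window functions express the same idea more directly.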
For each row the window function is computed across the rows that fall into the same partition as the current row. 
Window functions are permitted only in the SELECT list and the ORDER BY clause of the query. They are forbidden elsewhere, such as in the GROUP BY, HAVING and WHERE clauses, because they logically execute after those clauses have been processed.

Over, Partition By

So in order to build a window we need this input:

- How do we want to group the data, i.e. which windows do we want to have?

so, in pseudocode:

def createWindowOnRows(ByWhat): DataWithWindows


In order to specify the ByWhat part you use OVER together with PARTITION BY, simply specifying the column you want to partition by:

OVER (PARTITION BY depname)
Now that we know we can partition the returned rows by department name, all we need to do is combine it with a simple standard SQL query.

The SQL query that we combine with the windowing function is a standard query that contains aggregate functions but, in our simple case, does not contain a GROUP BY, because the grouping is already done by the PARTITION BY.
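For contrast, a plain GROUP BY collapses each department into a single row. A quick sketch, using the empsalary table from the rank() example later in this post:

-- GROUP BY collapses the rows: one output row per department
SELECT depname, avg(salary)
FROM empsalary
GROUP BY depname;

The windowed version keeps every row and simply attaches the aggregate to each of them, as the next example shows.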

So, the basic structure of a simple query that uses OVER and PARTITION BY in order to window the data and get aggregated output is:
SELECT x, y, count(z) OVER (PARTITION BY x) FROM MyTable
Note the following:

  1. We have a standard SQL SELECT statement.
  2. One of the fields in the SELECT is windowed: we wrote OVER (PARTITION BY x).
  3. One of the fields has an aggregation: we have count(z).
  4. One of the fields (we have three in total) has neither an aggregation nor a window, so we still get an actual row for each of its values. In the department example that field is the employee number: each employee keeps its own row, nothing is aggregated away, and next to each employee number we get the average salary of that employee's department (see the concrete query after this list).
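As a concrete sketch of that last point, using the empsalary table that also appears in the rank() example below (columns depname, empno, salary as in that example):

SELECT depname, empno, salary,
       avg(salary) OVER (PARTITION BY depname) -- department average attached to every employee row
FROM empsalary;

Every employee row is returned, and each one carries the average salary of its own department.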
So what we see is that the window function performed a calculation across a set of rows that are somehow related to the current row. The rows retain their separate identity.

It's as if the current row looked around at the other rows in its window and pulled the window's result back into itself.

You can treat OVER as a function just like any other function such as COUNT, only in this case it is a window function: the input you give it is the way you want to partition your data, i.e. your windowing.
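A small sketch of that idea, again assuming the empsalary table: the same aggregate gives different results depending on the windowing you pass to OVER.

-- empty OVER (): the window is the entire result set
SELECT empno, salary, avg(salary) OVER () AS company_avg FROM empsalary;

-- OVER (PARTITION BY depname): one window per department
SELECT empno, salary, avg(salary) OVER (PARTITION BY depname) AS dept_avg FROM empsalary;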

Rank()


A window function will always contain an OVER clause; that is the main piece, the thing we actually call. The PARTITION BY is how we ask SQL to divide the rows, i.e. to window them.

With rank() you control the order of the rows that fall into each window and add a new column named rank, computed according to the ORDER BY:
SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) FROM empsalary;
Note that it produces a new value: the SELECT now has a new column named rank, and this column follows the order we specify in the ORDER BY salary DESC that resides inside the OVER clause.

You can see the same values repeat in the rank column: the same rank appears in different departments, because the rank is computed separately within each window. To actually pick only the top row of each window, like the top player per country from the introduction, we can filter on this rank, as sketched below.
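A minimal sketch of that filtering, assuming the same empsalary table (standard SQL does not allow a window function directly in WHERE, so the rank is computed in a subquery):

SELECT depname, empno, salary
FROM (
  SELECT depname, empno, salary,
         rank() OVER (PARTITION BY depname ORDER BY salary DESC) AS salary_rank
  FROM empsalary
) ranked
WHERE salary_rank = 1; -- top earner(s) per department; ties share rank 1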
