Window functions in SQL
What are window functions anyway?
Window function allow to you to do aggregates and calculations on a partition of rows. This can be useful if, for example you want to add a column that shows a cumulative sum over the category that the row belongs to. They can also be used to efficiently remove duplicate rows.
I have an automated setup for populating a PostgreSQL with mock data that can be tested with the queries below. The data is a collection of streamed music songs from a fictional music streaming service containing only the songs from the video game GTA Vice City. Songs don’t belong to albums but instead to a radio channel.
Deduplicate and keeping the latest ingested value
Partition by event_id which is unique, then select only the first row in each partition.
SELECT COUNT(*)
FROM
(SELECT *,
ROW_NUMBER() OVER (PARTITION BY (event_id) ORDER BY ingestion_time DESC) AS rn
FROM streams) dedup
WHERE rn=1;
Running count for number of streams
Compute a running count up until the current row.
SELECT
event_id,
song_name,
stream_started.
COUNT(*) OVER (PARTITION BY song_id ORDER BY stream_started ROWS BETWEEN UNBOUNDED
PRECEDING and CURRENT ROW) AS running_total_streams
FROM streams
ORDER BY stream_started;
Show previous and next stream time for song 0
We can look at leading and lagging rows for an individual row.
SELECT
event_id,
song_name,
stream_started,
LAG(stream_started, 1) OVER (ORDER BY stream_started) as previous_stream,
LEAD(stream_started, 1) OVER (ORDER BY stream_started) as previous_stream
FROM streams
WHERE song_id = 0;
Show total number of streams for the song and the total number of streams in the channel
Count the number of streams for a song and the total number of streams for all songs in the channel it belongs to.
SELECT
event_id,
song_name,
stream_started,
COUNT(song_id) OVER (PARTITION BY song_id) as total_streams_of_this_song,
COUNT(song_id) OVER (PARTITION BY channel) as total_streams_in_channel
FROM streams;
Top 3 ranked songs from each channel
WITH song_count AS (
SELECT song_name, COUNT(song_id) AS total_streams_on_channel, channel
FROM streams
GROUP BY song_id, song_name, channel
),
song_rank AS (
SELECT
song_name,
channel,
total_streams_on_channel,
DENSE_RANK() OVER (PARTITION BY channel ORDER BY total_streams_on_channel DESC) AS rank_on_channel
FROM song_count
)
SELECT *
FROM song_rank
WHERE rank_on_channel <= 3;
Find the most and least streamed song
WITH song_count AS (
SELECT song_name,
COUNT(song_id) AS total_streams
FROM streams
GROUP BY song_id, song_name
)
SELECT song_name, total_streams
FROM (
SELECT song_name,
MIN(total_streams) OVER() min_streams,
MAX(total_streams) OVER() max_streams,
total_streams
FROM song_count
) tmp
WHERE total_streams IN (min_streams, max_streams);