PARTITION BY explained
The PARTITION BY clause is a feature of the SQL language that allows you to partition or group the rows of a result set based on one or more columns. The clause is often used in conjunction with the OVER clause to perform complex analytical calculations.
Here's an example of how to use the PARTITION BY clause in SQL:
Let's say we have a table called "sales_data" with the following columns:
sale_id | customer_id | sale_date | sale_amount |
1 | 101 | 2022-01-01 | 100 |
2 | 102 | 2022-01-01 | 150 |
3 | 101 | 2022-01-02 | 200 |
4 | 103 | 2022-01-02 | 75 |
5 | 102 | 2022-01-03 | 50 |
6 | 101 | 2022-01-03 | 125 |
We can use the PARTITION BY clause to group the sales data by customer and sale date. Here's the SQL code to do that:
SELECT customer_id, sale_date, SUM(sale_amount) OVER (PARTITION BY customer_id, sale_date) AS total_sales
FROM sales_data
In this example, we're selecting the customer ID, sale date, and the total sales amount for each customer on each sale date. The PARTITION BY clause is used to group the sales data by customer ID and sale date. The SUM function is then used to calculate the total sales for each group.
The result set would look like this:
customer_id | sale_date | total_sales |
101 | 2022-01-01 | 100 |
101 | 2022-01-02 | 200 |
101 | 2022-01-03 | 125 |
102 | 2022-01-01 | 150 |
102 | 2022-01-03 | 50 |
103 | 2022-01-02 | 75 |
As you can see, the result set is grouped by customer ID and sale date, and the total sales for each group are calculated using the SUM function.
Here are a few more examples of using the PARTITION BY clause:
- Calculate the rank of each product by category based on the number of units sold:
SELECT product_name, category, units_sold,
RANK() OVER (PARTITION BY category ORDER BY units_sold DESC) AS rank
FROM sales_data
This code would calculate the rank of each product by category based on the number of units sold. The result set would include the product name, category, units sold, and rank for each product within its category.
- Calculate the running total of sales for each month and year:
SELECT sale_date, sale_amount,
SUM(sale_amount) OVER (PARTITION BY EXTRACT(YEAR FROM sale_date), EXTRACT(MONTH FROM sale_date)
ORDER BY sale_date) AS running_total
FROM sales_data
This code would calculate the running total of sales for each month and year. The result set would include the sale date, sale amount, and the running total for each month and year.