Posted by Andrew, Aug. 31, 2022, 12:58 a.m.
Question #59 Why does LIMIT work, and not a WHERE condition?
Hi, working through this question, and basically have everything right except for the LIMIT portion of the solution. My query is:
WITH film_revenue AS(
SELECT
f.film_id,
c.name,
SUM(p.amount) as total_revenue
FROM film AS f
LEFT JOIN inventory AS inv
ON f.film_id = inv.film_id
LEFT JOIN rental AS r
ON inv.inventory_id = r.inventory_id
LEFT JOIN payment AS p
ON r.rental_id = p.rental_id
INNER JOIN film_category AS fc
ON inv.film_id = fc.film_id
INNER JOIN category AS c
ON fc.category_id = c.category_id
GROUP BY f.film_id, c.name
)
SELECT
fr.film_id,
fr.name,
fr.total_revenue * 100.0 / SUM(fr.total_revenue) OVER(PARTITION BY fr.name) AS revenue_percentage
FROM film_revenue AS fr
WHERE fr.film_id <= 10
ORDER BY fr.film_id;
Results in:
| film_id | name | revenue_percentage |
|---|---|---|
| 1 | Documentary | 49.2565304755525787 |
| 2 | Horror | 16.5716969317470257 |
| 3 | Documentary | 50.7434695244474213 |
| 4 | Horror | 28.7319974953036944 |
| 5 | Family | 100.0000000000000000 |
| 6 | Foreign | 100.0000000000000000 |
| 7 | Comedy | 100.0000000000000000 |
| 8 | Horror | 32.1916092673763306 |
| 9 | Horror | 22.5046963055729493 |
| 10 | Sports | 100.0000000000000000 |
So, the percentage returned is the film revenue's percentage, partitioned by the category, but only for the 10 rows. So if there is only 1 category for that film id, it is 100% of the category's revenue.
You change the WHERE clause into a LIMIT 10, and you get the correct result. But why is that? Are there certain rules in the PARTITION BY clause that that don't apply to the regular window function without the partition? In #58, the WHERE condition worked fine without the PARTITION BY clause so I assume that's where something is going wrong here.
Side note: I used LEFT JOINS from the film table, in the chance that one of the film ids from 1-10 did not have any revenue, so LIMIT 10 would still work (I think - please let me know if I'm wrong here!), but I'm more interested in why the WHERE condition does not work here.
EDIT: Same issue comes up in problem #60 and in #61. In #61, sub query is used specifically to bypass this issue, so I'm guessing that the WHERE condition is being run before the window function. In a situation where you cannot use LIMIT and you need to use a WHERE condition, you'd have to run a sub query for the columns you want to return that contains the partition, then use the WHERE clause outside of the sub query. Is this the optimal way to go about this problem?