Instructions:
- An actor’s productivity is defined as the number of movies he/she has played.
- Write a query to return the
category_id, actor_id and number of movies
by the most productive actor in that category. - For example: John Doe filmed the most action movies, your query will return John as the result for action movie category.
- Do this for every movie category.
Table 1: actor
col_name | col_type -------------+-------------------------- actor_id | integer first_name | text last_name | text
Table 2: film_actor
Films and their casts
col_name | col_type -------------+-------------------------- actor_id | smallint film_id | smallint
Table 3: film_category
A film can only belong to one category
col_name | col_type -------------+-------------------------- film_id | smallint category_id | smallint
Sample results
category_id | actor_id | num_movies -------------+----------+------------ 1 | 50 | 6 2 | 150 | 6 3 | 17 | 7 4 | 86 | 6 5 | 196 | 6 6 | 48 | 6 7 | 7 | 7
Expected results
Solution postgres
WITH actor_movies AS (
SELECT
FC.category_id,
FA.actor_id,
COUNT(DISTINCT F.film_id) num_movies
FROM film_actor FA
INNER JOIN film F
ON F.film_id = FA.film_id
INNER JOIN film_category FC
ON FC.film_id = F.film_id
GROUP BY FC.category_id, FA.actor_id
)
SELECT category_id, actor_id, num_movies
FROM (
SELECT
category_id,
actor_id,
num_movies,
ROW_NUMBER()OVER(PARTITION BY category_id ORDER BY num_movies DESC) AS productivity_idx
FROM actor_movies
) X
WHERE productivity_idx = 1;
Explanation
This query retrieves the most productive actors for each film category based on the number of movies they have appeared in. It does this by first creating a CTE (Common Table Expression) called `actor_movies` that combines data from three tables: `film_actor`, `film`, and `film_category`. It selects the category ID, actor ID, and the count of distinct film IDs for each actor and category combination, and groups the results by category and actor.
Then, the main query selects the category ID, actor ID, and number of movies from the `actor_movies` CTE, and adds a calculated column called `productivity_idx`. This column is generated using the `ROW_NUMBER()` function, which assigns a sequential number to each row within a category based on the number of movies the actor has appeared in, in descending order. So, the most productive actor for each category will have a `productivity_idx` of 1.
Finally, the outer query filters the results to only show the rows where `productivity_idx` equals 1, i.e., the most productive actor for each category.