Instructions:
- An actor’s productivity is defined as the number of movies he/she has played.
- Write a query to return the
category_id, actor_id and number of movies
by the most productive actor in that category. - For example: John Doe filmed the most action movies, your query will return John as the result for action movie category.
- Do this for every movie category.
Table 1: actor
col_name | col_type -------------+-------------------------- actor_id | integer first_name | text last_name | text
Table 2: film_actor
Films and their casts
col_name | col_type -------------+-------------------------- actor_id | smallint film_id | smallint
Table 3: film_category
A film can only belong to one category
col_name | col_type -------------+-------------------------- film_id | smallint category_id | smallint
Sample results
category_id | actor_id | num_movies -------------+----------+------------ 1 | 50 | 6 2 | 150 | 6 3 | 17 | 7 4 | 86 | 6 5 | 196 | 6 6 | 48 | 6 7 | 7 | 7
Expected results
Solution postgres
WITH actor_movies AS (
SELECT
FC.category_id,
FA.actor_id,
COUNT(DISTINCT F.film_id) num_movies
FROM film_actor FA
INNER JOIN film F
ON F.film_id = FA.film_id
INNER JOIN film_category FC
ON FC.film_id = F.film_id
GROUP BY FC.category_id, FA.actor_id
)
SELECT category_id, actor_id, num_movies
FROM (
SELECT
category_id,
actor_id,
num_movies,
ROW_NUMBER()OVER(PARTITION BY category_id ORDER BY num_movies DESC) AS productivity_idx
FROM actor_movies
) X
WHERE productivity_idx = 1;
Explanation
This query retrieves the most productive actors for each film category based on the number of movies they have appeared in. It does this by first creating a CTE (Common Table Expression) called actor_movies
that combines data from three tables: film_actor
, film
, and film_category
. It selects the category ID, actor ID, and the count of distinct film IDs for each actor and category combination, and groups the results by category and actor.
Then, the main query selects the category ID, actor ID, and number of movies from the actor_movies
CTE, and adds a calculated column called productivity_idx
. This column is generated using the ROW_NUMBER()
function, which assigns a sequential number to each row within a category based on the number of movies the actor has appeared in, in descending order. So, the most productive actor for each category will have a productivity_idx
of 1.
Finally, the outer query filters the results to only show the rows where productivity_idx
equals 1, i.e., the most productive actor for each category.