74. The most productive actors by category

hard

Instructions:

  • An actor’s productivity is defined as the number of movies he/she has played.
  • Write a query to return the category_id, actor_id and number of moviesby the most productive actor in that category.
  • For example: John Doe filmed the most action movies, your query will return John as the result for action movie category.
  • Do this for every movie category.

Table 1: actor

  col_name   | col_type
-------------+--------------------------
 actor_id    | integer
 first_name  | text
 last_name   | text

Table 2: film_actor

Films and their casts

  col_name   | col_type
-------------+--------------------------
 actor_id    | smallint
 film_id     | smallint

Table 3: film_category

A film can only belong to one category

  col_name   | col_type
-------------+--------------------------
 film_id     | smallint
 category_id | smallint

Sample results

 category_id | actor_id | num_movies
-------------+----------+------------
           1 |       50 |          6
           2 |      150 |          6
           3 |       17 |          7
           4 |       86 |          6
           5 |      196 |          6
           6 |       48 |          6
           7 |        7 |          7

Expected results

Solution postgres

WITH actor_movies AS (
  SELECT 
    FC.category_id,
    FA.actor_id, 
    COUNT(DISTINCT F.film_id) num_movies
  FROM film_actor FA
  INNER JOIN film F
  ON F.film_id = FA.film_id
  INNER JOIN film_category FC
  ON FC.film_id = F.film_id
  GROUP BY FC.category_id, FA.actor_id
)
SELECT category_id, actor_id, num_movies
FROM (
	SELECT 
		category_id, 
		actor_id, 
		num_movies,
		ROW_NUMBER()OVER(PARTITION BY category_id ORDER BY num_movies DESC) AS productivity_idx
	FROM actor_movies
) X
WHERE productivity_idx = 1;
    

Explanation

This query retrieves the most productive actors for each film category based on the number of movies they have appeared in. It does this by first creating a CTE (Common Table Expression) called actor_movies that combines data from three tables: film_actor, film, and film_category. It selects the category ID, actor ID, and the count of distinct film IDs for each actor and category combination, and groups the results by category and actor.

Then, the main query selects the category ID, actor ID, and number of movies from the actor_movies CTE, and adds a calculated column called productivity_idx. This column is generated using the ROW_NUMBER() function, which assigns a sequential number to each row within a category based on the number of movies the actor has appeared in, in descending order. So, the most productive actor for each category will have a productivity_idx of 1.

Finally, the outer query filters the results to only show the rows where productivity_idx equals 1, i.e., the most productive actor for each category.



More ROW_NUMBER, RANK, DENSE_RANK questions

ID Title Level FTPR
208 Top 3 urls by testing groups apple hard
100%
203 Top product by country by month apple easy
100%
201 Top 3 students for each subject snap hard
100%
188 Top 1 popular question by department google hard
14%
166 Top 10 customers based on spend growth walmart medium
11%
165 Session stitching walmart hard
17%
158 First trip completion rate lyft medium
22%
157 Number of trips before a driver got banned lyft medium
10%
148 Most popular video category social medium
13%
145 Returning customers after first buy afterpay hard
16%
144 Third order afterpay medium
9%
142 First order date afterpay easy
40%
139 Poor first delivery experience doordash medium
17%
137 Extremely late first orders doordash medium
9%
135 Unlucky employees robinhood easy
16%
127 Average rating after 10th trip uber hard
11%
123 Top listing in the United States, United Kingdom and Canada airbnb medium
10%
122 Top country by wow growth airbnb hard
13%
120 First ever booking airbnb hard
10%
119 Top 2 countries by bookings airbnb hard
10%
116 Top answers day by device amazon easy
14%
110 Most popular product by category amazon medium
12%
109 Top 3 most popular product categories amazon easy
11%
105 Number of days gap between last two actions mobile hard
8%
90 Top artist report spotify medium
21%
88 Top song report spotify medium
22%
83 Top search_query in US and UK on new year's day search engine medium
6%
76 Districts with the most and least customers easy
12%
75 Top customer by movie category medium
26%
73 Number of days to become a happy customer hard
11%
72 Days when they became happy customers medium
10%
64 Top 2 films by category hard
14%
63 Top 5 customers by store hard
8%
62 Shortest film by category medium
8%