183. Top 5 movies by number of watchers

easy netflix

  • ******************************* #182
  • *****************************************************************************************
  • ********************************************************************************************************************************************************




Table 1: netflix_account

 col_name    | col_type
 account_id  | bigint
 country     | character varying(2)
 created_dt  | date

Table 2: netflix_content

 col_name          | col_type
 content_id        | bigint
 content_type      | character varying(10) -- either TV or Movie
 original_language | character varying(2) -- 'EN',  'ES', 'CN', 'KO'
 release_date      | date

Table 3: netflix_daily_streaming

Daily aggregated watch time by account by content.

  col_name    | col_type
 date         | date
 account_id   | bigint
 content_id   | bigint
 duration     | int -- in seconds

Sample results

 content_id | uu_cnt
    1000638 |     21
    1000577 |     20
    1000409 |     19
    1000515 |     18
    1000789 |     18

Solution postgres

WITH movies AS (
  SELECT content_id, release_date
  FROM netflix_content
  WHERE content_type = 'Movie'
user_watch_time AS (
    SELECT M.content_id, S.account_id, SUM(duration) watch_time_in_seconds
    FROM movies M
    INNER JOIN netflix_daily_streaming S
    ON S.content_id = M.content_id
    WHERE S.date <= M.release_date + 28
    GROUP BY 1,2
SELECT content_id, COUNT(CASE WHEN watch_time_in_seconds > 600 THEN account_id ELSE NULL END) uu_cnt
FROM user_watch_time
GROUP BY content_id


This query is selecting data from two tables: netflix_content and netflix_daily_streaming. It is using a common table expression (CTE) to create a temporary table named "movies" that includes the content_id and release_date from the netflix_content table, but only for rows where the content_type is 'Movie'.

The query then creates another temporary table named "user_watch_time" that joins the "movies" table with the "netflix_daily_streaming" table on content_id. It selects the content_id, account_id, and the sum of the duration (in seconds) for each viewing of that content by each account. However, it only includes rows where the date of the viewing is within 28 days of the release date of the movie.

Finally, the query selects the content_id and the count of accounts that watched the movie for more than 10 minutes (600 seconds). This is done using a CASE statement within a COUNT function. The results are grouped by content_id and ordered in descending order by the count of accounts. The LIMIT clause is used to limit the results to the top 5 movies with the highest count of accounts watching for more than 10 minutes.

Expected results

More Netflix questions

ID Title Level FTPR
181 Users who watched less than one hour of Netflix netflix easy
182 Top 5 content by number of watchers netflix medium
184 Top 5 movies by watch time netflix medium
185 Top 5 non-English contents by watch time netflix medium