182. Top 5 content by number of watchers

medium netflix

  • ********************************5***************************************************************************************
  • ****************************************************************************************************************************************

*********

************************************************************************************************************************************************************************************************************************************************************************************************************************

**************************************************************************************

 

Table 1: netflix_account

 col_name    | col_type
-------------+---------------------
 account_id  | bigint
 country     | character varying(2)
 created_dt  | date

Table 2: netflix_content

 col_name          | col_type
-------------------+---------------------
 content_id        | bigint
 content_type      | character varying(10) -- either TV or Movie
 original_language | character varying(2) -- 'EN',  'ES', 'CN', 'KO'
 release_date      | date

Table 3: netflix_daily_streaming

Daily aggregated watch time by account by content.

  col_name    | col_type
--------------+---------------------
 date         | date
 account_id   | bigint
 content_id   | bigint
 duration     | int -- in seconds

Sample results

 content_id | uu_cnt
------------+--------
    1000638 |     21
    1000491 |     21
    1000641 |     21
    1000577 |     20
    1000224 |     20

Solution postgres

WITH user_watch_time AS (
    SELECT C.content_id, S.account_id, SUM(duration) watch_time_in_seconds
    FROM netflix_content C
    INNER JOIN netflix_daily_streaming S
    ON S.content_id = C.content_id
    WHERE S.date <= C.release_date + 28
    GROUP BY 1,2
)
SELECT content_id, COUNT(CASE WHEN watch_time_in_seconds > 600 THEN account_id ELSE NULL END) uu_cnt
FROM user_watch_time
GROUP BY content_id
ORDER BY 2 DESC
LIMIT 5;
    

Explanation

This query is trying to find out which 5 Netflix contents have the most number of users who watched them for more than 10 minutes (600 seconds) within the first 28 days of their release.

The query starts with creating a temporary table called user_watch_time, which contains the content_id, account_id, and the total watch time in seconds for each user who watched a particular content within the first 28 days of its release. This is achieved by joining two tables - netflix_content and netflix_daily_streaming - on the content_id column, filtering the records where the streaming date is within 28 days of the release date, and then grouping the records by content and account.

The second part of the query then selects the content_id and the count of account_ids (UU count) where the watch time was more than 10 minutes (600 seconds). The COUNT function is used in combination with a CASE statement to only count the records where the watch time is more than 10 minutes, and return NULL for all other records. The results are then grouped by content_id, in descending order of the UU count, and only the top 5 records are returned using the LIMIT keyword.

Overall, this query is useful for identifying the most popular contents among Netflix

Expected results


More Netflix questions

ID Title Level FTPR
181 Users who watched less than one hour of Netflix netflix easy
19%
183 Top 5 movies by number of watchers netflix easy
46%
184 Top 5 movies by watch time netflix medium
16%
185 Top 5 non-English contents by watch time netflix medium
49%