- ********************************5***************************************************************************************
- ****************************************************************************************************************************************
*********
************************************************************************************************************************************************************************************************************************************************************************************************************************
**************************************************************************************
Table 1: netflix_account
col_name | col_type -------------+--------------------- account_id | bigint country | character varying(2) created_dt | date
Table 2: netflix_content
col_name | col_type -------------------+--------------------- content_id | bigint content_type | character varying(10) -- either TV or Movie original_language | character varying(2) -- 'EN', 'ES', 'CN', 'KO' release_date | date
Table 3: netflix_daily_streaming
Daily aggregated watch time by account by content.
col_name | col_type --------------+--------------------- date | date account_id | bigint content_id | bigint duration | int -- in seconds
Sample results
content_id | uu_cnt ------------+-------- 1000638 | 21 1000491 | 21 1000641 | 21 1000577 | 20 1000224 | 20
Solution postgres
WITH user_watch_time AS (
SELECT C.content_id, S.account_id, SUM(duration) watch_time_in_seconds
FROM netflix_content C
INNER JOIN netflix_daily_streaming S
ON S.content_id = C.content_id
WHERE S.date <= C.release_date + 28
GROUP BY 1,2
)
SELECT content_id, COUNT(CASE WHEN watch_time_in_seconds > 600 THEN account_id ELSE NULL END) uu_cnt
FROM user_watch_time
GROUP BY content_id
ORDER BY 2 DESC
LIMIT 5;
Explanation
This query is trying to find out which 5 Netflix contents have the most number of users who watched them for more than 10 minutes (600 seconds) within the first 28 days of their release.
The query starts with creating a temporary table called user_watch_time, which contains the content_id, account_id, and the total watch time in seconds for each user who watched a particular content within the first 28 days of its release. This is achieved by joining two tables - netflix_content and netflix_daily_streaming - on the content_id column, filtering the records where the streaming date is within 28 days of the release date, and then grouping the records by content and account.
The second part of the query then selects the content_id and the count of account_ids (UU count) where the watch time was more than 10 minutes (600 seconds). The COUNT function is used in combination with a CASE statement to only count the records where the watch time is more than 10 minutes, and return NULL for all other records. The results are then grouped by content_id, in descending order of the UU count, and only the top 5 records are returned using the LIMIT keyword.
Overall, this query is useful for identifying the most popular contents among Netflix
Expected results