- ******************************* #182
- *****************************************************************************************
- ********************************************************************************************************************************************************
*********
***********************************************************************************************************************************************************************************************************************************************************************************************************************
**************************************************************************************
Table 1: netflix_account
col_name | col_type -------------+--------------------- account_id | bigint country | character varying(2) created_dt | date
Table 2: netflix_content
col_name | col_type -------------------+--------------------- content_id | bigint content_type | character varying(10) -- either TV or Movie original_language | character varying(2) -- 'EN', 'ES', 'CN', 'KO' release_date | date
Table 3: netflix_daily_streaming
Daily aggregated watch time by account by content.
col_name | col_type --------------+--------------------- date | date account_id | bigint content_id | bigint duration | int -- in seconds
Sample results
content_id | uu_cnt ------------+-------- 1000638 | 21 1000577 | 20 1000409 | 19 1000515 | 18 1000789 | 18
Solution postgres
WITH movies AS (
SELECT content_id, release_date
FROM netflix_content
WHERE content_type = 'Movie'
),
user_watch_time AS (
SELECT M.content_id, S.account_id, SUM(duration) watch_time_in_seconds
FROM movies M
INNER JOIN netflix_daily_streaming S
ON S.content_id = M.content_id
WHERE S.date <= M.release_date + 28
GROUP BY 1,2
)
SELECT content_id, COUNT(CASE WHEN watch_time_in_seconds > 600 THEN account_id ELSE NULL END) uu_cnt
FROM user_watch_time
GROUP BY content_id
ORDER BY 2 DESC
LIMIT 5;
Explanation
This query is selecting data from two tables: netflix_content and netflix_daily_streaming. It is using a common table expression (CTE) to create a temporary table named "movies" that includes the content_id and release_date from the netflix_content table, but only for rows where the content_type is 'Movie'.
The query then creates another temporary table named "user_watch_time" that joins the "movies" table with the "netflix_daily_streaming" table on content_id. It selects the content_id, account_id, and the sum of the duration (in seconds) for each viewing of that content by each account. However, it only includes rows where the date of the viewing is within 28 days of the release date of the movie.
Finally, the query selects the content_id and the count of accounts that watched the movie for more than 10 minutes (600 seconds). This is done using a CASE statement within a COUNT function. The results are grouped by content_id and ordered in descending order by the count of accounts. The LIMIT clause is used to limit the results to the top 5 movies with the highest count of accounts watching for more than 10 minutes.
Expected results