-
*****************************************************************************************************************************************
Table 1: netflix_account
col_name | col_type -------------+--------------------- account_id | bigint country | character varying(2) created_dt | date
Table 2: netflix_daily_streaming
Daily aggregated watch time by account by content.
col_name | col_type --------------+--------------------- date | date account_id | bigint content_id | bigint duration | int -- in seconds
Sample results
account_id | watch_time_in_seconds ------------+----------------------- 8000022 | 1584 8000194 | 1767 8000065 | 2141 8000053 | 1019
Solution postgres
SELECT A.account_id, SUM(COALESCE(S.duration, 0)) watch_time_in_seconds
FROM netflix_account A
LEFT JOIN netflix_daily_streaming S
ON S.account_id = A.account_id
AND S.date - A.created_dt <= 7
GROUP BY A.account_id
HAVING SUM(COALESCE(S.duration, 0)) < 3600;
Explanation
This query is trying to find the total amount of time that each Netflix user has spent watching content on the platform.
It does this by joining the netflix_account
and netflix_daily_streaming
tables on the account_id
field, which links each user to their viewing history. The LEFT JOIN
keyword is used to include all accounts, even if they haven't watched anything yet.
The COALESCE
function is used to handle cases where a user hasn't watched anything on a particular day, in which case the duration
field will be null. This function replaces null values with 0, so that the sum of all durations will always be a valid number.
The GROUP BY
clause is used to group the results by account_id
, which means that the total watch time will be calculated for each individual user.
Finally, the HAVING
clause is used to filter out any users whose total watch time is less than one hour (3600 seconds). This means that only users who have watched at least one hour of content in the past week will be included in the results.
Expected results