Posted by Sylvia, April 26, 2022, 1:58 p.m.
Question 102
I have two questions given the assumption below:
- A user can have multiple sessions with different duration and can be counted multiple times.
- Does it mean we only count the average OR the maximum duration for any session? It does not seem to make a difference whether I use "AVG(duration)" or "MAX(duration)" because it is on a user_id and session_id level, but I could be wrong.
- If a user has multiple sessions, should I use count(distinct session_id) OR count(distinct user_id)? I think it should be session_id based on the fact that a user can have multiple sessions.
Thanks in advance for your detailed explanations! I found the business sense aspect of the SQL questions to be more challenging than the technical aspect. :)
My Answer:
WITH session_user_duration AS
(
SELECT a.user_id, a.session_id,
AVG(b.duration) as duration
FROM session_web a
INNER JOIN session_web_duration b
ON a.session_id = b.session_id
GROUP BY a.user_id, a.session_id
),
session_user_label AS (
SELECT user_id,
session_id,
CASE WHEN duration < 10 THEN '< 10'
WHEN duration >= 10 AND duration < 20 THEN '10, 20'
WHEN duration >= 20 AND duration < 45 THEN '20, 45'
WHEN duration >= 45 AND duration < 60 THEN '45, 60'
WHEN duration >= 60 AND duration < 200 THEN '60, 200'
WHEN duration >= 200 AND duration < 600 THEN '200, 600'
WHEN duration >= 600 THEN '>= 600'
ELSE NULL
END AS duration_category
FROM session_user_duration
)
SELECT duration_category, count(distinct session_id) as count
FROM session_user_label
GROUP by duration_category
Solution:
WITH session_user_duration AS (
SELECT
W.session_id,
W.user_id,
MAX(D.duration) AS duration
FROM session_web W
INNER JOIN session_web_duration D
ON D.session_id = W.session_id
GROUP BY W.session_id, W.user_id
),
session_user_label AS (
SELECT session_id,
CASE WHEN duration < 10 THEN '< 10'
WHEN duration < 20 THEN '10, 20'
WHEN duration < 45 THEN '20, 45'
WHEN duration < 60 THEN '45, 60'
WHEN duration < 200 THEN '60, 200'
WHEN duration < 600 THEN '200, 600'
WHEN duration >= 600 THEN '>= 600'
ELSE NULL END AS duration_category,
user_id
FROM session_user_duration
)
SELECT duration_category, COUNT(DISTINCT user_id)
FROM session_user_label
GROUP BY duration_category
ORDER BY duration_category;