191. Number of replies by group

medium twitter

***********

  • ******************************************************************************************2001.
  • *************************************************************************************
  • ***************************************************************

Table 1: tweets

   col_name     | col_type
----------------+--------------------------
 user_id        | bigint
 tweet_id       | bigint
 reply_tweet_id | bigint
 created_at     | timestamp
 content        | varchar(280) -- max of 280 chars

Table 2: twitter_experiment_users

Users that have been assigned into a treatment group or not for each ab testing. A user can only belong to either treatment or control group by a single experiment.

   col_name   | col_type
--------------+--------------------------
 exp_id       | bigint
 user_id      | bigint
 is_treatment | boolean

Sample results

   group   |      avg_replies
-----------+------------------------
 control   | 0.40000000000000000000
 treatment | 0.62500000000000000000

Solution postgres

WITH user_tweet_replies AS (
    SELECT user_id, COUNT(DISTINCT tweet_id) AS num_tweets, SUM(num_replies) AS total_replies
    FROM (
        SELECT T1.user_id, T1.tweet_id, COUNT(T2.tweet_id) AS num_replies
        FROM tweets T1
        LEFT JOIN tweets T2
        ON T1.tweet_id = T2.reply_tweet_id
        GROUP BY 1,2
    ) X
    GROUP BY user_id
),
exp_2001 AS (
    SELECT user_id, is_treatment
    FROM twitter_experiment_users
    WHERE exp_id = 2001
)
SELECT
    CASE WHEN E.is_treatment = true then 'treatment' ELSE 'control' END AS group,
    SUM(total_replies) * 1.0 / SUM(num_tweets) AS avg_replies
FROM user_tweet_replies R
INNER JOIN exp_2001 E
ON E.user_id = R.user_id
GROUP BY 1;
    

Explanation

This query is analyzing data related to a Twitter experiment (with ID 2001) that involves a treatment and control group. The goal is to compare the average number of replies per tweet for users in the treatment vs. control group.

The query starts by creating a temporary table called "user_tweet_replies" using a subquery. This subquery joins the "tweets" table with itself (using a left join) to count the number of replies each tweet has received. It then aggregates this data to count the total number of tweets and replies for each user.

The second temporary table, "exp_2001", is created to filter the users in the experiment by their group assignment (treatment vs. control).

Finally, the main query joins these two temporary tables and calculates the average number of replies per tweet for each group. It uses a CASE statement to label the groups as "treatment" or "control" and calculates the ratio of total replies to total tweets for each group. The results are grouped by group label and returned as the final output.

Expected results


More Twitter questions

ID Title Level FTPR
189 Campaign activation date twitter medium
19%
190 Twitter campaign spend report twitter hard
22%
192 Number of replies to the original tweet by group twitter medium
25%
193 Difference of replies by testing groups twitter medium
33%