193. Difference of replies by testing groups

medium twitter

********************************#192

  • ***********************************************************************************************************************************2001
  • ******************************************************************************************************************************************************************************************************************
  • ****************************************************************************************************

Table 1: tweets

   col_name     | col_type
----------------+--------------------------
 user_id        | bigint
 tweet_id       | bigint
 reply_tweet_id | bigint
 created_at     | timestamp
 content        | varchar(280) -- max of 280 chars

Table 2: twitter_experiment_users

Users that have been assigned into a treatment group or not for each ab testing. A user can only belong to either treatment or control group by a single experiment.

   col_name   | col_type
--------------+--------------------------
 exp_id       | bigint
 user_id      | bigint
 is_treatment | boolean

Sample results

          diff
------------------------
 0.33333333333333333333

Solution postgres

WITH user_tweet_replies AS (
    SELECT user_id, COUNT(DISTINCT tweet_id) AS num_tweets, SUM(num_replies) AS total_replies
    FROM (
        SELECT T1.user_id, T1.tweet_id, COUNT(T2.tweet_id) AS num_replies
        FROM (
            SELECT * FROM tweets
            WHERE reply_tweet_id IS NULL
        )  T1
        LEFT JOIN tweets T2
        ON T1.tweet_id = T2.reply_tweet_id
        GROUP BY 1,2
    ) X
    GROUP BY user_id
),
exp_2001 AS (
    SELECT user_id, is_treatment
    FROM twitter_experiment_users
    WHERE exp_id = 2001
)
SELECT
    SUM(CASE WHEN E.is_treatment is true then total_replies ELSE 0 END) *1.0 / SUM(CASE WHEN E.is_treatment is true then num_tweets ELSE 0 END) -
    SUM(CASE WHEN E.is_treatment is false then total_replies ELSE 0 END) *1.0 / SUM(CASE WHEN E.is_treatment is false then num_tweets ELSE 0 END) AS diff
FROM user_tweet_replies R
INNER JOIN exp_2001 E
ON E.user_id = R.user_id
;
    

Explanation

This query is calculating the difference in average number of replies per tweet between a treatment group and a control group in a Twitter experiment.

The first part of the query creates a temporary table called "user_tweet_replies" that calculates the total number of tweets and replies for each user in the dataset. It does this by joining the "tweets" table with itself on tweet_id and reply_tweet_id, and grouping the result by user_id and tweet_id.

The second part of the query creates another temporary table called "exp_2001" that selects the user_id and is_treatment columns from a separate table called "twitter_experiment_users", where the experiment ID is 2001.

Finally, the query joins the two temporary tables and calculates the difference in average number of replies per tweet between the treatment and control groups using a CASE statement. It does this by summing the total number of replies and tweets for each group and dividing the result. The final output is a single value representing the difference in averages between the two groups.

Expected results


More Twitter questions

ID Title Level FTPR
189 Campaign activation date twitter medium
19%
190 Twitter campaign spend report twitter hard
22%
191 Number of replies by group twitter medium
18%
192 Number of replies to the original tweet by group twitter medium
25%