Forum

Posted by Surbhi, March 13, 2022, 4:10 p.m.

100. Why is my solution incorrect?

In the recommended solution, we use window functions but I think we should be able to solve this question using just one full outer join. Could you please help me understand why this approach is incorrect? select advertiser_id, sum(coalesce(ad_info.spend,0)) / sum(advertiser.cost) as roi from advertiser left outer join ad_info on advertiser.ad_id = ad_info.ad_id group by 1

Answers

First comment, the solution isn't showing window functions, but rather CTEs. Basically anywhere you see a CTE, you can swap this to some form of sub query, but it can get difficult to read. As for the reason the join won't work, this gets tricky, but some of the costs are being duplicated when you do a join on the ad_id. It's best to look at the data a bit and consider what happens in a join like this where multiple users spend money based on a given ad. The spend adds up correctly, but with the join, the costs are repeated even though the company only pays for that ad once.

Thanks for the response Mike. Couple of follow up questions 1. Do you have recommendations on how to select between JOINS vs CTEs like in this case? 2. Could you provide an example of the join that would should incorrect results?

Surbhi, March 14, 2022, 7:55 a.m.

Formatting is a little weird here but: > Thanks for the response Mike. Couple of follow up questions > 1. Do you have recommendations on how to select between JOINS vs CTEs like in this case? Typically CTEs can be used to provide clarity as a shorthand for subqueries. It just makes things easier to read, and in some cases, the CTE can be cached / reused. It's also very similar to a temp table depending on the database server in question. Most interviewers are likely fine with either, as long as you can explain what you're doing. Typically I use CTEs unless it's a stupidly short query that doesn't really make sense to pull out of the main query. > 2. Could you provide an example of the join that would should incorrect results? It's a little tricky to show in here, but if you look at a simple: select * from ad_info left outer join advertiser on ad_info.ad_id = advertiser.ad_id This will give you several results where you can start to see the problem in the cost column. You'll notice that the ad_id and the cost are the same in multiple columns where the user_id is different. This fits that the total customer_spend for the ad is higher, but the costs are duplicated in the rows (which is shouldn't be.) The example solution just shows all costs for a given ad mapped to the advertiser id, and then all the user spend mapped to an ad_id without having to worry about the join duplication issue.

Mike, March 14, 2022, 1:45 p.m.

Hey Mike,

I felt like I have fallen for this trap too. MY code is something like this:

WITH advertiser_info AS
(SELECT advertiser_id,a.ad_id, SUM(spend) as spend, SUM(cost) as cost
FROM advertiser a 
LEFT JOIN ad_info ai
ON a.ad_id = ai.ad_id
GROUP BY 1,2
)

SELECT advertiser_id, COALESCE(SUM(spend),0)*1.0/SUM(cost) as roi
FROM advertiser_info
GROUP BY 1

I think the CTE part is where things goes wrong but I am not so sure why

Luming, Oct. 11, 2022, 12:25 a.m.
SQLPad user avatar

Mike (228)

March 13, 2022, 11:54 p.m.