@Leon, would you downgrade someone who does a cross-join of date+restaurants to get all date/restaurant combinations to fill in the blank revenue dates? Yes, more intensive on the database but allows more configuration beyond just restaurant 100011 in the future.
WITH
all_dates_and_restaurants AS (
SELECT
DISTINCT
restaurant_id,
date
FROM dates
CROSS JOIN food_order
)
SELECT
date,
adr.restaurant_id,
SUM(COALESCE(total_amt, 0)) as daily_revenue
FROM all_dates_and_restaurants adr
LEFT JOIN food_order f
ON adr.restaurant_id = f.restaurant_id
AND adr.date = DATE(f.created_at)
WHERE EXTRACT(MONTH FROM date) = 8
AND EXTRACT(YEAR FROM date) = 2021
AND adr.restaurant_id = 100011
GROUP BY date, adr.restaurant_id
ORDER BY date ASC
Colin,
Dec. 20, 2023, 8:44 p.m.
Hey @colin,
Great question! Generally, I don't recommend using Cartesian or cross joins. The main issue is that you'll likely be unable to utilize indexes from the Cartesian join results, rendering your query quite inefficient.
However, if you create a physical table from the cross-joined results and then add indexes, your subsequent queries might perform well.
Another point to consider, from a non-technical perspective, is that in the business world, metrics like sales, revenue, or traffic often follow a long-tail distribution. For instance, the top 10 restaurants might account for 90% of total orders. Consequently, a cross join might generate a lot of 'sparseness' in your results.
But given that this is a simulated scenario, and I think that a cross join does make the code more readable for others, it could be a good approach in this context.
Cheers,
-Leon
Leon,
Dec. 20, 2023, 9:38 p.m.
Hey @colin,
Great question! Generally, I don't recommend using Cartesian or cross joins. The main issue is that you'll likely be unable to utilize indexes from the Cartesian join results, rendering your query quite inefficient.
However, if you create a physical table from the cross-joined results and then add indexes, your subsequent queries might perform well.
Another point to consider, from a non-technical perspective, is that in the business world, metrics like sales, revenue, or traffic often follow a long-tail distribution. For instance, the top 10 restaurants might account for 90% of total orders. Consequently, a cross join might generate a lot of 'sparseness' in your results.
But given that this is a simulated scenario, and I think that a cross join does make the code more readable for others, it could be a good approach in this context.
Cheers,
-Leon
Leon,
Dec. 20, 2023, 9:55 p.m.