Forum

Posted by Yogita, Nov. 26, 2021, 2:06 p.m.

#78 : Why COUNT(rental_id) is giving wrong/different result from COUNT(*) used in the solution?

Am getting wrong quartile for customer_id 9. Am getting a quartile 2 whereas it should be 1. When I change my COUNT(rental_id) to COUNT(*) OR COUNT(DISTINCT rental_id) I get the right answer. Am not sure why!
If there are duplicates in the rental table, shouldn't COUNT(*) be giving the same wrong answer?
My solution is :
WITH rental_per_cust AS
(SELECT c.customer_id,
max(c.store_id) AS store_id,
COUNT(rental_id) AS cnt
FROM customer c
JOIN rental r ON c.customer_id = r.customer_id
GROUP BY c.customer_id
)
SELECT *
FROM
(SELECT customer_id,
store_id,
NTILE(4) OVER (PARTITION BY store_id ORDER BY cnt) AS quartile
FROM rental_per_cust) x
WHERE customer_id IN (1,2,3,4,5,6,7,8,9,10);

Answers

Hi Yogita,
This is an interesting question, the count(*) and count(rental_id) shouldn't matter, as they generate the same counts.
It's up to how the Postgres engine handles ties. Because customer_id = 9 can belong to either quartile 1 or quartile 2, it randomly assigns the bucket and there is no guarantee which.

I would consider your solution the equally correct one.
-- Query to check rental counts
WITH cnt_1 AS (
SELECT C.customer_id,
MAX(C.store_id) AS store_id, -- one customer can only belong to one store
COUNT(*) AS num_rentals FROM
rental R
INNER JOIN customer C
ON C.customer_id = R.customer_id
GROUP BY C.customer_id
),
cnt_2 AS (
SELECT C.customer_id,
MAX(C.store_id) AS store_id, -- one customer can only belong to one store
COUNT(rental_id) AS num_rentals FROM
rental R
INNER JOIN customer C
ON C.customer_id = R.customer_id
GROUP BY C.customer_id
)
SELECT COUNT(*) FROM cnt_1 C1
INNER JOIN cnt_2 C2
ON C1.customer_id = C2.customer_id
WHERE C1.num_rentals <> C2.num_rentals;
-- 0.
customer_id store_id num_rentals quartile
493 1 23 1
519 2 23 1
547 1 23 1
583 1 23 1
9 2 23 2
85 2 23 2
183 2 23 2
A lot of customers share the same number of rentals (all with 23 rentals), and their quartile buckets are assigned randomly by Postgres.
Thanks,
Leon

Thanks Leon!! That makes sense.

Yogita, Nov. 28, 2021, 1:59 p.m.
SQLPad user avatar

Leon (949)

Nov. 28, 2021, 1:41 p.m.