Quick summary
Summarize this blog with AI
Introduction
CASE WHEN looks harmless because the syntax is easy to read. In SQL interviews and analytics work, it is also one of the fastest ways to get a believable but wrong metric.
The common mistake is treating CASE like a filter. It is not a filter. A CASE expression returns a value for each row. Then COUNT, SUM, AVG, or another aggregate decides what to do with those returned values.
That one mental model explains most conditional aggregation bugs: why COUNT(CASE ... ELSE 0 END) counts every row, why missing values do not fall into <> branches, why WHERE and CASE can produce the same numerator but a different denominator, and why interviewers keep asking these questions even though the syntax seems basic.
The Sample Data
Use this small table throughout the article:
-- expenses
-- id | department | amount | status
-- 1 | HR | 50 | approved
-- 2 | Sales | 120 | approved
-- 3 | Sales | 200 | rejected
-- 4 | NULL | 80 | approved
-- 5 | Sales | NULL | approved
-- 6 | HR | 30 | NULL
The table intentionally includes a missing department, a missing amount, and a missing status. Those are the rows that expose weak SQL.
CASE Returns a Value, Not a Row
This query labels departments:
SELECT
id,
department,
CASE
WHEN department = 'Sales' THEN 'sales'
WHEN department <> 'Sales' THEN 'not sales'
END AS department_bucket
FROM expenses
ORDER BY id;
The result is:
| id | department | department_bucket |
|---|---|---|
| 1 | HR | not sales |
| 2 | Sales | sales |
| 3 | Sales | sales |
| 4 | NULL | NULL |
| 5 | Sales | sales |
| 6 | HR | not sales |
The row with department = NULL does not become not sales. In SQL, NULL <> 'Sales' is unknown, not true. If the missing department needs its own bucket, say so:
CASE
WHEN department = 'Sales' THEN 'sales'
WHEN department IS NULL THEN 'missing department'
ELSE 'not sales'
END
This matters in interviews because many people assume an ELSE branch exists when it does not. Without an explicit ELSE, unmatched rows return NULL.
The COUNT Trap
Suppose you want to count approved rows. This version is clear and portable:
SELECT
SUM(CASE WHEN status = 'approved' THEN 1 ELSE 0 END) AS approved_rows
FROM expenses;
Expected result:
| approved_rows |
|---|
| 4 |
This version also works because non-matching rows return NULL, and COUNT(expression) ignores NULL:
SELECT
COUNT(CASE WHEN status = 'approved' THEN 1 END) AS approved_rows
FROM expenses;
The dangerous version is this:
SELECT
COUNT(CASE WHEN status = 'approved' THEN 1 ELSE 0 END) AS wrong_count
FROM expenses;
Expected result:
| wrong_count |
|---|
| 6 |
That query counts every row because both 1 and 0 are non-null. This is the shortest way to explain the trap in an interview: COUNT counts non-null expressions, not truthy expressions.
Conditional SUM
For a conditional amount, return the amount for rows that match:
SELECT
SUM(CASE WHEN status = 'approved' THEN amount ELSE 0 END) AS approved_amount
FROM expenses;
Expected result:
| approved_amount |
|---|
| 250 |
Rows 1, 2, 4, and 5 are approved. Their amounts are 50, 120, 80, and NULL. The sum is 250 because SUM ignores the NULL amount.
This version may also return 250:
SELECT
SUM(CASE WHEN status = 'approved' THEN amount END) AS approved_amount
FROM expenses;
The difference appears when there are no approved rows. Without ELSE 0, the result is NULL. With ELSE 0, the result is zero. Neither is universally correct. Use the one that matches the metric definition.
CASE vs WHERE
These two queries can produce the same approved amount:
SELECT
SUM(CASE WHEN status = 'approved' THEN amount ELSE 0 END) AS approved_amount
FROM expenses;
SELECT
SUM(amount) AS approved_amount
FROM expenses
WHERE status = 'approved';
They do not ask the same question. The first keeps all rows in the base population and computes a conditional metric. The second removes non-approved rows before aggregation.
The difference becomes obvious when you include a denominator:
SELECT
COUNT(*) AS all_rows,
SUM(CASE WHEN status = 'approved' THEN 1 ELSE 0 END) AS approved_rows
FROM expenses;
| all_rows | approved_rows |
|---|---|
| 6 | 4 |
SELECT
COUNT(*) AS filtered_rows,
COUNT(*) AS approved_rows
FROM expenses
WHERE status = 'approved';
| filtered_rows | approved_rows |
|---|---|
| 4 | 4 |
Use WHERE when every metric should be calculated from the filtered population. Use CASE when you need multiple metrics from the same population.
Approval Rate: The Denominator Is the Real Question
This approval rate uses all rows in the denominator:
SELECT
1.0 * SUM(CASE WHEN status = 'approved' THEN 1 ELSE 0 END)
/ NULLIF(COUNT(*), 0) AS approval_rate
FROM expenses;
Expected result:
| approval_rate |
|---|
| 0.6667 |
That asks: what share of all rows were approved?
If rows with missing status should be excluded, define the denominator separately:
SELECT
1.0 * SUM(CASE WHEN status = 'approved' THEN 1 ELSE 0 END)
/ NULLIF(SUM(CASE WHEN status IS NOT NULL THEN 1 ELSE 0 END), 0) AS approval_rate_known_status
FROM expenses;
Expected result:
| approval_rate_known_status |
|---|
| 0.8000 |
Both answers can be defensible. A strong answer names the denominator before writing the final metric.
Multiple Conditional Metrics in One Query
Conditional aggregation is strongest when you need several metrics side by side:
SELECT
department,
COUNT(*) AS total_rows,
SUM(CASE WHEN status = 'approved' THEN 1 ELSE 0 END) AS approved_rows,
SUM(CASE WHEN status = 'rejected' THEN 1 ELSE 0 END) AS rejected_rows,
SUM(CASE WHEN status = 'approved' THEN amount ELSE 0 END) AS approved_amount
FROM expenses
GROUP BY department
ORDER BY department;
Expected result:
| department | total_rows | approved_rows | rejected_rows | approved_amount |
|---|---|---|---|---|
| HR | 2 | 1 | 0 | 50 |
| Sales | 3 | 2 | 1 | 120 |
| NULL | 1 | 1 | 0 | 80 |
If this type of grouping feels shaky, read SQL GROUP BY Errors: WHERE vs HAVING and the Bare Column Mistake. If your counts come after a join, also read SQL COUNT with LEFT JOIN: Zero Rows, DISTINCT, and Many-to-Many Traps.
PostgreSQL FILTER Syntax
PostgreSQL supports a cleaner syntax for conditional aggregates:
SELECT
department,
COUNT(*) AS total_rows,
COUNT(*) FILTER (WHERE status = 'approved') AS approved_rows,
COUNT(*) FILTER (WHERE status = 'rejected') AS rejected_rows,
SUM(amount) FILTER (WHERE status = 'approved') AS approved_amount
FROM expenses
GROUP BY department;
This is easier to read when you have many conditional metrics. The tradeoff is portability. FILTER is supported in PostgreSQL and several analytical engines, but SUM(CASE WHEN ... THEN ... ELSE ... END) is the safer pattern when you need code that moves across PostgreSQL, MySQL, SQL Server, and SQLite.
Dialect Notes
| Dialect | Useful note |
|---|---|
| PostgreSQL | Supports aggregate FILTER. Use NULLIF(denominator, 0) to avoid division-by-zero errors. |
| MySQL | SUM(condition) can work because boolean expressions evaluate to 1 or 0, but SUM(CASE ...) is clearer and more portable. |
| SQL Server | Use CASE for conditional aggregation. Watch integer division; cast or multiply by 1.0 for rates. |
| BigQuery | COUNTIF(condition) is available for conditional counts. SUM(CASE ...) remains portable. |
| Snowflake | Supports COUNT_IF. Use CASE when you want explicit cross-dialect logic. |
Safer Pattern for Complex Business Logic
When the conditions are complex, classify rows first and aggregate second:
WITH labeled_orders AS (
SELECT
customer_id,
order_date,
amount,
CASE
WHEN status = 'paid' AND refunded_at IS NULL THEN 'net_paid'
WHEN status = 'paid' AND refunded_at IS NOT NULL THEN 'refunded'
WHEN status = 'cancelled' THEN 'cancelled'
ELSE 'other'
END AS order_bucket
FROM orders
WHERE order_date >= DATE '2026-01-01'
)
SELECT
customer_id,
COUNT(*) AS total_orders,
SUM(CASE WHEN order_bucket = 'net_paid' THEN 1 ELSE 0 END) AS net_paid_orders,
SUM(CASE WHEN order_bucket = 'net_paid' THEN amount ELSE 0 END) AS net_paid_amount
FROM labeled_orders
GROUP BY customer_id;
This is easier to test because you can run the CTE by itself and inspect whether each row was labeled correctly before trusting the final aggregate.
Before You Submit the Query
- Did you use
WHEREfor rules that define the row population? - Did you use
CASEfor metrics inside that population? - Did you avoid
COUNT(CASE WHEN ... ELSE 0 END)? - Did you define how
NULLvalues should behave? - Did you protect rate calculations with
NULLIF? - Did you name the denominator out loud?
If the query involves missing values or anti-joins, the companion guide SQL NULL Logic for Interviews covers the same three-valued logic in more depth. If the query involves join-created row multiplication, read Why SQL Joins Create Duplicate Rows.
FAQ
Why does COUNT(CASE WHEN condition THEN 1 ELSE 0 END) count every row? Because COUNT(expression) counts every non-null expression. Both 1 and 0 are non-null.
Should I use COUNT or SUM for conditional counts? Prefer SUM(CASE WHEN condition THEN 1 ELSE 0 END) for a portable, explicit conditional count. COUNT(CASE WHEN condition THEN 1 END) is valid, but easier to break accidentally.
Is CASE faster than WHERE? Usually no. WHERE can reduce the row set before aggregation. CASE is for producing conditional values within the row set.
What is the safest interview explanation? Say: I use WHERE to define the population, CASE to define conditional metrics, and SUM of one-or-zero flags for conditional counts.