SQL CASE WHEN for Interviews: Conditional Aggregation

Quick summary

Summarize this blog with AI

Introduction

CASE WHEN looks harmless because the syntax is easy to read. In SQL interviews and analytics work, it is also one of the fastest ways to get a believable but wrong metric.

The common mistake is treating CASE like a filter. It is not a filter. A CASE expression returns a value for each row. Then COUNT, SUM, AVG, or another aggregate decides what to do with those returned values.

That one mental model explains most conditional aggregation bugs: why COUNT(CASE ... ELSE 0 END) counts every row, why missing values do not fall into <> branches, why WHERE and CASE can produce the same numerator but a different denominator, and why interviewers keep asking these questions even though the syntax seems basic.

The Sample Data

Use this small table throughout the article:

-- expenses
-- id | department | amount | status
--  1 | HR         |     50 | approved
--  2 | Sales      |    120 | approved
--  3 | Sales      |    200 | rejected
--  4 | NULL       |     80 | approved
--  5 | Sales      |   NULL | approved
--  6 | HR         |     30 | NULL

The table intentionally includes a missing department, a missing amount, and a missing status. Those are the rows that expose weak SQL.

CASE Returns a Value, Not a Row

This query labels departments:

SELECT
  id,
  department,
  CASE
    WHEN department = 'Sales' THEN 'sales'
    WHEN department <> 'Sales' THEN 'not sales'
  END AS department_bucket
FROM expenses
ORDER BY id;

The result is:

id	department	department_bucket
1	HR	not sales
2	Sales	sales
3	Sales	sales
4	NULL	NULL
5	Sales	sales
6	HR	not sales

The row with department = NULL does not become not sales. In SQL, NULL <> 'Sales' is unknown, not true. If the missing department needs its own bucket, say so:

CASE
  WHEN department = 'Sales' THEN 'sales'
  WHEN department IS NULL THEN 'missing department'
  ELSE 'not sales'
END

This matters in interviews because many people assume an ELSE branch exists when it does not. Without an explicit ELSE, unmatched rows return NULL.

The COUNT Trap

Suppose you want to count approved rows. This version is clear and portable:

SELECT
  SUM(CASE WHEN status = 'approved' THEN 1 ELSE 0 END) AS approved_rows
FROM expenses;

Expected result:

approved_rows
4

This version also works because non-matching rows return NULL, and COUNT(expression) ignores NULL:

SELECT
  COUNT(CASE WHEN status = 'approved' THEN 1 END) AS approved_rows
FROM expenses;

The dangerous version is this:

SELECT
  COUNT(CASE WHEN status = 'approved' THEN 1 ELSE 0 END) AS wrong_count
FROM expenses;

Expected result:

wrong_count
6

That query counts every row because both 1 and 0 are non-null. This is the shortest way to explain the trap in an interview: COUNT counts non-null expressions, not truthy expressions.

Conditional SUM

For a conditional amount, return the amount for rows that match:

SELECT
  SUM(CASE WHEN status = 'approved' THEN amount ELSE 0 END) AS approved_amount
FROM expenses;

Expected result:

approved_amount
250

Rows 1, 2, 4, and 5 are approved. Their amounts are 50, 120, 80, and NULL. The sum is 250 because SUM ignores the NULL amount.

This version may also return 250:

SELECT
  SUM(CASE WHEN status = 'approved' THEN amount END) AS approved_amount
FROM expenses;

The difference appears when there are no approved rows. Without ELSE 0, the result is NULL. With ELSE 0, the result is zero. Neither is universally correct. Use the one that matches the metric definition.

CASE vs WHERE

These two queries can produce the same approved amount:

SELECT
  SUM(CASE WHEN status = 'approved' THEN amount ELSE 0 END) AS approved_amount
FROM expenses;

SELECT
  SUM(amount) AS approved_amount
FROM expenses
WHERE status = 'approved';

They do not ask the same question. The first keeps all rows in the base population and computes a conditional metric. The second removes non-approved rows before aggregation.

The difference becomes obvious when you include a denominator:

SELECT
  COUNT(*) AS all_rows,
  SUM(CASE WHEN status = 'approved' THEN 1 ELSE 0 END) AS approved_rows
FROM expenses;

all_rows	approved_rows
6	4

SELECT
  COUNT(*) AS filtered_rows,
  COUNT(*) AS approved_rows
FROM expenses
WHERE status = 'approved';

filtered_rows	approved_rows
4	4

Use WHERE when every metric should be calculated from the filtered population. Use CASE when you need multiple metrics from the same population.

Approval Rate: The Denominator Is the Real Question

This approval rate uses all rows in the denominator:

SELECT
  1.0 * SUM(CASE WHEN status = 'approved' THEN 1 ELSE 0 END)
      / NULLIF(COUNT(*), 0) AS approval_rate
FROM expenses;

Expected result:

approval_rate
0.6667

That asks: what share of all rows were approved?

If rows with missing status should be excluded, define the denominator separately:

SELECT
  1.0 * SUM(CASE WHEN status = 'approved' THEN 1 ELSE 0 END)
      / NULLIF(SUM(CASE WHEN status IS NOT NULL THEN 1 ELSE 0 END), 0) AS approval_rate_known_status
FROM expenses;

Expected result:

approval_rate_known_status
0.8000

Both answers can be defensible. A strong answer names the denominator before writing the final metric.

Multiple Conditional Metrics in One Query

Conditional aggregation is strongest when you need several metrics side by side:

SELECT
  department,
  COUNT(*) AS total_rows,
  SUM(CASE WHEN status = 'approved' THEN 1 ELSE 0 END) AS approved_rows,
  SUM(CASE WHEN status = 'rejected' THEN 1 ELSE 0 END) AS rejected_rows,
  SUM(CASE WHEN status = 'approved' THEN amount ELSE 0 END) AS approved_amount
FROM expenses
GROUP BY department
ORDER BY department;

Expected result:

department	total_rows	approved_rows	rejected_rows	approved_amount
HR	2	1	0	50
Sales	3	2	1	120
NULL	1	1	0	80

If this type of grouping feels shaky, read SQL GROUP BY Errors: WHERE vs HAVING and the Bare Column Mistake. If your counts come after a join, also read SQL COUNT with LEFT JOIN: Zero Rows, DISTINCT, and Many-to-Many Traps.

PostgreSQL FILTER Syntax

PostgreSQL supports a cleaner syntax for conditional aggregates:

SELECT
  department,
  COUNT(*) AS total_rows,
  COUNT(*) FILTER (WHERE status = 'approved') AS approved_rows,
  COUNT(*) FILTER (WHERE status = 'rejected') AS rejected_rows,
  SUM(amount) FILTER (WHERE status = 'approved') AS approved_amount
FROM expenses
GROUP BY department;

This is easier to read when you have many conditional metrics. The tradeoff is portability. FILTER is supported in PostgreSQL and several analytical engines, but SUM(CASE WHEN ... THEN ... ELSE ... END) is the safer pattern when you need code that moves across PostgreSQL, MySQL, SQL Server, and SQLite.

Dialect Notes

Dialect	Useful note
PostgreSQL	Supports aggregate `FILTER`. Use `NULLIF(denominator, 0)` to avoid division-by-zero errors.
MySQL	`SUM(condition)` can work because boolean expressions evaluate to 1 or 0, but `SUM(CASE ...)` is clearer and more portable.
SQL Server	Use `CASE` for conditional aggregation. Watch integer division; cast or multiply by `1.0` for rates.
BigQuery	`COUNTIF(condition)` is available for conditional counts. `SUM(CASE ...)` remains portable.
Snowflake	Supports `COUNT_IF`. Use `CASE` when you want explicit cross-dialect logic.

Safer Pattern for Complex Business Logic

When the conditions are complex, classify rows first and aggregate second:

WITH labeled_orders AS (
  SELECT
    customer_id,
    order_date,
    amount,
    CASE
      WHEN status = 'paid' AND refunded_at IS NULL THEN 'net_paid'
      WHEN status = 'paid' AND refunded_at IS NOT NULL THEN 'refunded'
      WHEN status = 'cancelled' THEN 'cancelled'
      ELSE 'other'
    END AS order_bucket
  FROM orders
  WHERE order_date >= DATE '2026-01-01'
)
SELECT
  customer_id,
  COUNT(*) AS total_orders,
  SUM(CASE WHEN order_bucket = 'net_paid' THEN 1 ELSE 0 END) AS net_paid_orders,
  SUM(CASE WHEN order_bucket = 'net_paid' THEN amount ELSE 0 END) AS net_paid_amount
FROM labeled_orders
GROUP BY customer_id;

This is easier to test because you can run the CTE by itself and inspect whether each row was labeled correctly before trusting the final aggregate.

Before You Submit the Query

Did you use WHERE for rules that define the row population?
Did you use CASE for metrics inside that population?
Did you avoid COUNT(CASE WHEN ... ELSE 0 END)?
Did you define how NULL values should behave?
Did you protect rate calculations with NULLIF?
Did you name the denominator out loud?

If the query involves missing values or anti-joins, the companion guide SQL NULL Logic for Interviews covers the same three-valued logic in more depth. If the query involves join-created row multiplication, read Why SQL Joins Create Duplicate Rows.

FAQ

Why does COUNT(CASE WHEN condition THEN 1 ELSE 0 END) count every row? Because COUNT(expression) counts every non-null expression. Both 1 and 0 are non-null.

Should I use COUNT or SUM for conditional counts? Prefer SUM(CASE WHEN condition THEN 1 ELSE 0 END) for a portable, explicit conditional count. COUNT(CASE WHEN condition THEN 1 END) is valid, but easier to break accidentally.

Is CASE faster than WHERE? Usually no. WHERE can reduce the row set before aggregation. CASE is for producing conditional values within the row set.

What is the safest interview explanation? Say: I use WHERE to define the population, CASE to define conditional metrics, and SUM of one-or-zero flags for conditional counts.