How to Fill Missing Values in SQL Without Hiding Data Quality Problems

SQL Updated 六月 19, 2026 6 mins read Leon Leon
How to Fill Missing Values in SQL Without Hiding Data Quality Problems cover image

Quick summary

Summarize this blog with AI

Filling missing values sounds simple: use COALESCE(), replace null with zero, and move on. In real analysis, that can quietly change the answer. A missing value might mean zero, unknown, not applicable, not collected yet, or broken upstream logic. Those meanings should not all become the same number.

This guide shows how to fill missing values in SQL while preserving enough evidence to trust the result. The examples use portable SQL patterns, with notes where database dialects differ.

First decide what the missing value means

Before writing a replacement rule, classify the null. Most analyst mistakes come from skipping this step.

  • True zero: no discount was applied, no items were returned, no support tickets were opened.
  • Unknown: the source system did not provide the value, or the value failed validation.
  • Not applicable: a field is blank because the record type does not use it.
  • Pending: the event has not happened yet, such as a shipment date before shipping.
  • Join miss: the value disappeared because a lookup table did not match.

Only the first case should automatically become zero. The others need flags, audit queries, or separate categories.

Use COALESCE for presentation, not as a first cleaning step

COALESCE() returns the first non-null value in a list.

SELECT
    order_id,
    COALESCE(discount_amount, 0) AS discount_amount
FROM orders;

This is fine for a report if the business definition says missing discount equals zero. It is risky if null means unknown. A safer pattern keeps the raw value and adds an analysis-ready field:

SELECT
    order_id,
    discount_amount,
    CASE WHEN discount_amount IS NULL THEN 1 ELSE 0 END AS discount_amount_missing,
    COALESCE(discount_amount, 0) AS discount_amount_for_reporting
FROM orders;

Now a dashboard can sum the reporting field while an analyst can still monitor how much of the result depends on imputation.

Convert fake missing values before analysis

Many imports do not use real SQL nulls. They use empty strings, the text N/A, a dash, or zero as a placeholder. Clean those before aggregation.

SELECT
    customer_id,
    NULLIF(TRIM(phone_number), '') AS phone_number_clean,
    NULLIF(TRIM(postal_code), '') AS postal_code_clean
FROM raw_customers;

For numeric fields imported as text, handle placeholder values before casting:

SELECT
    order_id,
    CAST(
        NULLIF(NULLIF(TRIM(discount_text), ''), 'N/A')
        AS decimal(10, 2)
    ) AS discount_amount
FROM raw_orders;

If your source uses zero to mean missing, do not globally replace all zeros. Scope the rule to fields where zero is impossible or explicitly documented as a placeholder.

Fill from a lookup table when the value is deterministic

Sometimes the missing value can be filled from a trusted reference table. For example, a product category may be missing in an order export but available in the product master.

SELECT
    o.order_id,
    o.product_id,
    o.category AS category_from_order,
    p.category AS category_from_product,
    COALESCE(o.category, p.category) AS category_clean,
    CASE
        WHEN o.category IS NULL AND p.category IS NOT NULL THEN 'filled_from_product_master'
        WHEN o.category IS NULL AND p.category IS NULL THEN 'still_missing'
        ELSE 'source_value'
    END AS category_fill_method
FROM orders o
LEFT JOIN product_master p
    ON o.product_id = p.product_id;

The fill method matters. If a later audit asks why a category changed, you can explain whether it came from the source file or a reference table.

Fill within a group using window functions

Some datasets store attributes sparsely. For example, a subscription table may have multiple rows per account, and only one row includes the account segment. If the segment is stable within account, you can fill it from another row in the same group.

SELECT
    account_id,
    event_date,
    segment,
    MAX(segment) OVER (PARTITION BY account_id) AS segment_filled
FROM account_events;

This works only when there is at most one true segment per account. Validate that assumption first:

SELECT
    account_id,
    COUNT(DISTINCT segment) AS distinct_segments
FROM account_events
WHERE segment IS NOT NULL
GROUP BY account_id
HAVING COUNT(DISTINCT segment) > 1;

If this query returns rows, a simple fill could create false data. You need business rules for which segment wins.

Forward-fill time series carefully

Forward-fill means carrying the last known value forward. It is common for inventory levels, account status, subscription plan, and sensor readings. SQL support varies by database. Some systems support IGNORE NULLS; others need a two-step pattern.

A portable approach is to create groups that increment each time a non-null value appears, then take the max value within each group.

WITH marked AS (
    SELECT
        account_id,
        event_date,
        plan_name,
        SUM(CASE WHEN plan_name IS NOT NULL THEN 1 ELSE 0 END) OVER (
            PARTITION BY account_id
            ORDER BY event_date
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS fill_group
    FROM account_plan_events
), filled AS (
    SELECT
        account_id,
        event_date,
        plan_name,
        MAX(plan_name) OVER (
            PARTITION BY account_id, fill_group
        ) AS plan_name_filled
    FROM marked
)
SELECT *
FROM filled;

Forward-fill is appropriate only when the value stays true until a later event changes it. Do not forward-fill measurements like daily revenue unless the business definition explicitly calls for it.

Use missingness metrics in the final output

Every imputed dataset should include a quick check on how much filling happened.

SELECT
    COUNT(*) AS rows_total,
    SUM(CASE WHEN category_fill_method = 'source_value' THEN 1 ELSE 0 END) AS source_rows,
    SUM(CASE WHEN category_fill_method = 'filled_from_product_master' THEN 1 ELSE 0 END) AS filled_rows,
    SUM(CASE WHEN category_fill_method = 'still_missing' THEN 1 ELSE 0 END) AS still_missing_rows
FROM cleaned_orders;

This makes the quality of the result visible. A metric based on 1% filled values is different from a metric based on 45% filled values.

Common SQL missing-value mistakes

  • Replacing null with zero before averaging: this can pull an average down when null means unknown rather than zero.
  • Filling after a broken join: if a dimension table failed to match, fix the key or lookup first.
  • Using one default for every field: missing category, missing date, missing amount, and missing boolean values need different rules.
  • Dropping rows too early: filtering nulls can remove exactly the cases the business needs to investigate.
  • Failing to label imputed values: downstream users cannot judge trust without knowing what was filled.

A practical checklist

  1. Identify all placeholder values: empty string, N/A, dash, impossible zero, or invalid date.
  2. Classify the meaning of missingness for each field.
  3. Preserve the raw value in a staging table.
  4. Create a clean value and a missing/fill flag.
  5. Validate assumptions before filling within groups.
  6. Measure how many rows were filled and how many remain missing.
  7. Document which filled fields are safe for reporting and which require caution.

FAQ

Should I use COALESCE or ISNULL?

COALESCE() is standard SQL and accepts multiple expressions. ISNULL() is common in SQL Server and accepts two arguments. For portable analyst SQL, prefer COALESCE() unless your team standardizes on a dialect-specific function.

Is replacing null with zero ever correct?

Yes, when the business definition says missing means zero. Examples include no discount, no returns, or no tickets after a proper left join. It is not correct when missing means unknown, pending, not applicable, or failed import.

Should I delete rows with missing values?

Only after checking why they are missing and whether the analysis requires complete cases. For operational analytics, missing rows are often the signal: unmapped statuses, broken joins, bad imports, or incomplete processes.

Interview Prep

Begin Your SQL, Python, and R Journey

Master 230 interview-style coding questions and build the data skills needed for analyst, scientist, and engineering roles.

Related Articles

All Articles