Finding Fraud in $1 Trillion of Medicaid Data with DuckDB

CMS recently published provider-level Medicaid spending data from T-MSIS — every fee-for-service, managed care, and CHIP claim from 2018 through 2024, aggregated by billing provider, procedure code, and month. 227 million rows. $1.09 trillion in payments. I wanted to see what falls out when you run some basic fraud heuristics against it.

The dataset is available at opendata.hhs.gov/datasets/medicaid-provider-spending/.

The dataset

The download is a single 2.94 GB parquet file with seven columns:

Column	Type	Description
`BILLING_PROVIDER_NPI_NUM`	string	NPI of the billing provider
`SERVICING_PROVIDER_NPI_NUM`	string	NPI of the servicing provider
`HCPCS_CODE`	string	Procedure code
`CLAIM_FROM_MONTH`	string	Month (YYYY-MM)
`TOTAL_UNIQUE_BENEFICIARIES`	int64	Unique patients
`TOTAL_CLAIMS`	int64	Claim count
`TOTAL_PAID`	double	Dollars paid by Medicaid

Here’s what the first few rows look like — the data is sorted by TOTAL_PAID descending, so the biggest line items are at the top:

┌──────────────────────────┬────────────────────────────┬────────────┬──────────────────┬────────────────────────────┬──────────────┬──────────────┐
│ BILLING_PROVIDER_NPI_NUM │ SERVICING_PROVIDER_NPI_NUM │ HCPCS_CODE │ CLAIM_FROM_MONTH │ TOTAL_UNIQUE_BENEFICIARIES │ TOTAL_CLAIMS │  TOTAL_PAID  │
├──────────────────────────┼────────────────────────────┼────────────┼──────────────────┼────────────────────────────┼──────────────┼──────────────┤
│ 1376609297               │ 1376609297                 │ T1019      │ 2024-07          │                      39765 │      1205701 │ 118887675.31 │
│ 1376609297               │ 1376609297                 │ T1019      │ 2024-08          │                      39677 │      1152534 │ 115561066.11 │
│ 1376609297               │ 1376609297                 │ T1019      │ 2024-05          │                      39678 │      1157235 │  112823255.3 │
│ 1376609297               │ 1376609297                 │ T1019      │ 2024-06          │                      39834 │      1164582 │ 111449173.13 │
│ 1376609297               │ 1376609297                 │ T1019      │ 2024-09          │                      39527 │      1099808 │ 111199832.57 │
└──────────────────────────┴────────────────────────────┴────────────┴──────────────────┴────────────────────────────┴──────────────┴──────────────┘

Right away you notice NPI 1376609297 billing $118M in a single month for T1019 (personal care services) with 1.2 million claims. That’s going to come up again.

A quick summary of the full dataset:

con.sql(f"""
    SELECT
        COUNT(*) as total_rows,
        COUNT(DISTINCT BILLING_PROVIDER_NPI_NUM) as unique_billing_npis,
        COUNT(DISTINCT SERVICING_PROVIDER_NPI_NUM) as unique_servicing_npis,
        COUNT(DISTINCT HCPCS_CODE) as unique_procedures,
        MIN(CLAIM_FROM_MONTH) as earliest_month,
        MAX(CLAIM_FROM_MONTH) as latest_month,
        SUM(TOTAL_PAID) as total_paid
    FROM '{PATH}'
""")

227M rows, 617,503 billing providers, 1.6M servicing providers, 10,881 procedure codes, spanning 84 months (Jan 2018 – Dec 2024), totaling $1.09 trillion.

Why parquet, not CSV

HHS offers this dataset as a ZIP download. Inside is parquet, not CSV. This matters a lot for a file this size.

Compression. The parquet file is 2.94 GB. The same data in CSV would be roughly 15-20 GB. Parquet uses column-oriented compression (dictionary encoding for the repeated NPI strings, run-length encoding for sorted columns, and Snappy/ZSTD compression on top). The three NPI/code string columns have high cardinality but lots of repetition within row groups — exactly the pattern parquet compresses best.

Column pruning. If your query only touches BILLING_PROVIDER_NPI_NUM and TOTAL_PAID, parquet readers skip the other five columns entirely. With CSV, you always read every byte of every row. For a 7-column file, that’s up to 5/7ths of I/O you never have to do.

Predicate pushdown. DuckDB can push WHERE clauses down into the parquet scan, skipping entire row groups whose min/max statistics prove no rows match. When I filter WHERE TOTAL_CLAIMS > 1000, DuckDB doesn’t even read row groups where the max claims value is under 1,000.

Type preservation. CSV makes you guess types and parse strings. Parquet knows TOTAL_PAID is a double and TOTAL_CLAIMS is an int64 — no parsing overhead, no type ambiguity.

For a dataset with 227M rows, these differences compound. The parquet file isn’t just smaller — it’s fundamentally faster to query.

Don’t load 227M rows into pandas

My first attempt was naive:

import pandas as pd
df = pd.read_parquet('/data/medicaid-provider-spending.parquet')

The Jupyter kernel OOM-killed immediately on a 16 GB pod.

The math makes it obvious in hindsight. Parquet decompresses to a much larger in-memory representation, and pandas stores each string as a Python object with ~56 bytes of overhead regardless of string length. Three string columns across 227M rows:

227M rows × 3 string columns × ~56 bytes/object ≈ 38 GB
+ 2 int64 columns × 8 bytes × 227M              ≈  3.6 GB
+ 1 float64 column × 8 bytes × 227M              ≈  1.8 GB
                                                  ≈ 43 GB total

Even a 64 GB pod would be tight once you start doing groupbys that create intermediate DataFrames.

The fix: DuckDB. It queries parquet files directly on disk using memory-mapped I/O, streaming through row groups without materializing the full dataset. Peak memory usage stays in the low single-digit GBs. Every query in this analysis ran in under 3 minutes on that same 16 GB pod.

import duckdb

PATH = '/data/medicaid-provider-spending.parquet'
con = duckdb.connect()

con.sql(f"SELECT COUNT(*) FROM '{PATH}'")
# → (227083361,)

No read_parquet(), no DataFrames, no OOM. DuckDB treats the parquet file as a table and pushes the full query plan — filters, aggregations, window functions — down to the scan.

The trick is to do all the heavy lifting in DuckDB SQL, then call .df() only on the final result sets (which are thousands of rows, not millions):

# Heavy aggregation stays in DuckDB — result is ~68K rows, perfectly fine for pandas
outliers = con.sql(f"""
    SELECT ... FROM '{PATH}'
    GROUP BY ... HAVING z_score > 3
""").df()

Five fraud signals

Healthcare fraud detection comes down to finding providers whose billing patterns deviate sharply from their peers. I built five independent signals and combined them into a composite risk score. Here’s every query.

1. Cost-per-beneficiary outliers

For each provider and procedure code, I sum the total dollars paid and divide by unique beneficiaries across all months. Then z-score within each procedure code (restricting to procedures with 20+ providers for stable statistics). Anything above z=3 is flagged.

outliers_cost = con.sql(f"""
    WITH prov_proc AS (
        SELECT
            BILLING_PROVIDER_NPI_NUM,
            HCPCS_CODE,
            SUM(TOTAL_PAID) as total_paid,
            SUM(TOTAL_UNIQUE_BENEFICIARIES) as total_benes,
            SUM(TOTAL_CLAIMS) as total_claims,
            COUNT(DISTINCT CLAIM_FROM_MONTH) as months_active,
            SUM(TOTAL_PAID) / GREATEST(SUM(TOTAL_UNIQUE_BENEFICIARIES), 1)
                as paid_per_bene
        FROM '{PATH}'
        GROUP BY BILLING_PROVIDER_NPI_NUM, HCPCS_CODE
    ),
    proc_stats AS (
        SELECT
            HCPCS_CODE,
            COUNT(*) as num_providers,
            AVG(paid_per_bene) as mean_ppb,
            STDDEV_POP(paid_per_bene) as std_ppb
        FROM prov_proc
        GROUP BY HCPCS_CODE
        HAVING COUNT(*) >= 20
    ),
    scored AS (
        SELECT
            p.*,
            (p.paid_per_bene - s.mean_ppb) / NULLIF(s.std_ppb, 0)
                as z_paid_per_bene
        FROM prov_proc p
        JOIN proc_stats s USING (HCPCS_CODE)
    )
    SELECT * FROM scored
    WHERE z_paid_per_bene > 3
    ORDER BY z_paid_per_bene DESC
""").df()

This scans the full 2.94 GB parquet file, builds the two-level aggregation entirely in DuckDB, and returns only the outliers. It ran in 192 seconds.

Result: 68,037 provider/procedure combinations flagged across 27,348 unique billing NPIs.

The top hit: NPI 1467653303 billing $10,416 per beneficiary for CPT 99213 — a standard 15-minute office visit that typically pays around $57. A z-score of 183. They billed $541K for 52 beneficiaries in a single month. Either those were the most expensive office visits in the history of medicine, or something’s wrong.

Other standouts:

NPI 1437412525: $22,064/beneficiary for 92507 (speech therapy), z=94
NPI 1073608998: $65,778/beneficiary for J3490 (unclassified drugs), z=68
NPI 1427138726: $8,817/beneficiary for 99232 (subsequent hospital care), z=62 — and they did it across 16 months, billing $6M total

2. Billing mill detection

In Medicaid, the billing provider is the entity that submits the claim, and the servicing provider is the one who actually performed the service. Most providers bill for their own services — BILLING_PROVIDER_NPI_NUM = SERVICING_PROVIDER_NPI_NUM. A billing mill funnels claims through many servicing providers under one billing entity, taking a cut.

billing_mill = con.sql(f"""
    SELECT
        BILLING_PROVIDER_NPI_NUM,
        COUNT(DISTINCT SERVICING_PROVIDER_NPI_NUM) as num_servicing_npis,
        COUNT(DISTINCT HCPCS_CODE) as num_procedures,
        SUM(TOTAL_PAID) as total_paid,
        SUM(TOTAL_CLAIMS) as total_claims,
        SUM(TOTAL_UNIQUE_BENEFICIARIES) as total_benes,
        COUNT(DISTINCT CLAIM_FROM_MONTH) as months_active
    FROM '{PATH}'
    GROUP BY BILLING_PROVIDER_NPI_NUM
    ORDER BY num_servicing_npis DESC
""").df()

The distribution tells the story:

count    617,503
mean           4.6
std           33.0
25%            1.0
50%            1.0    ← median is 1
75%            1.0    ← 75th percentile is still 1
max        5,746

The median and 75th percentile are both 1. Most providers bill for themselves. Then there’s NPI 1679525919 with 5,746 servicing NPIs, billing $863 million across 84 months with 1,579 distinct procedure codes. That’s either a massive health system or something worth investigating.

I flag the top 1% by servicing NPI count as potential billing mills.

3. Volume impossibilities

Each row in this dataset is already aggregated to one billing-provider/servicing-provider/procedure/month combination. So when TOTAL_CLAIMS exceeds 1,000 for a single row, that means one provider billed over 1,000 claims for one procedure code in one month — averaging 50+ per working day with zero days off.

high_volume = con.sql(f"""
    SELECT
        BILLING_PROVIDER_NPI_NUM,
        SERVICING_PROVIDER_NPI_NUM,
        HCPCS_CODE,
        CLAIM_FROM_MONTH,
        TOTAL_CLAIMS,
        TOTAL_UNIQUE_BENEFICIARIES,
        TOTAL_PAID
    FROM '{PATH}'
    WHERE TOTAL_CLAIMS > 1000
    ORDER BY TOTAL_CLAIMS DESC
""").df()

Result: 1,783,926 rows exceeded 1,000 claims/month across 30,143 billing NPIs, totaling $391 billion.

The single most extreme: NPI 1225163876 billed 1,607,071 claims in February 2022 for code 1286Z — but only to 1,906 beneficiaries, and was paid just $230K. That’s 842 claims per beneficiary in a single month. The low payment amount suggests these may be bulk capitation or encounter records rather than fee-for-service fraud, but the ratio is still bizarre.

The real volume monster is NPI 1376609297 (the same provider from the first row of the dataset) — they consistently bill 1.1-1.2 million claims per month for T1019, each month paying out $100M+. They occupy 19 of the top 20 highest-volume rows.

4. Spending spike detection

Fraudulent providers often show a “ramp and run” pattern — billing spikes dramatically before the entity disappears or gets caught. I used LAG() to compute month-over-month spending growth per provider and flagged anything with >5x growth where the spike month exceeded $50K:

spikes = con.sql(f"""
    WITH monthly AS (
        SELECT
            BILLING_PROVIDER_NPI_NUM,
            CLAIM_FROM_MONTH,
            SUM(TOTAL_PAID) as monthly_paid,
            SUM(TOTAL_CLAIMS) as monthly_claims,
            SUM(TOTAL_UNIQUE_BENEFICIARIES) as monthly_benes
        FROM '{PATH}'
        GROUP BY BILLING_PROVIDER_NPI_NUM, CLAIM_FROM_MONTH
    ),
    with_prev AS (
        SELECT *,
            LAG(monthly_paid) OVER (
                PARTITION BY BILLING_PROVIDER_NPI_NUM
                ORDER BY CLAIM_FROM_MONTH
            ) as prev_paid
        FROM monthly
    )
    SELECT
        BILLING_PROVIDER_NPI_NUM,
        CLAIM_FROM_MONTH,
        prev_paid,
        monthly_paid,
        monthly_paid / GREATEST(prev_paid, 1) as growth_ratio,
        monthly_claims,
        monthly_benes
    FROM with_prev
    WHERE monthly_paid / GREATEST(prev_paid, 1) > 5
      AND monthly_paid > 50000
      AND prev_paid IS NOT NULL
    ORDER BY growth_ratio DESC
""").df()

This query aggregates 227M rows to provider-month level, computes lag-based growth ratios via a window function, and filters — all pushed down into DuckDB’s execution engine. It ran in 91 seconds.

Result: 18,916 spike events across 13,527 providers.

The top spikes all share a pattern: prev_paid of $0 (or near-zero) followed by millions in the next month. NPI 1770700221 went from $0 to $4.76M in July 2023. NPI 1336117670 went from $0 to $4.74M in February 2018. These are providers that appear out of nowhere with massive billing.

NPI 1144347824 is interesting — they appear four times in the top 20 with spikes in Dec 2020, Feb 2021, Apr 2021, and Jun 2021. Repeated $0-to-$550K+ cycles suggest an on-again-off-again billing pattern.

5. Procedure concentration

Most providers bill across multiple procedure codes, even within a narrow specialty. A provider deriving >95% of all Medicaid revenue from a single HCPCS code at high dollar volumes can indicate upcoding (always picking the most expensive code) or phantom billing (billing for services never rendered using a single code).

concentrated = con.sql(f"""
    WITH prov_proc AS (
        SELECT
            BILLING_PROVIDER_NPI_NUM,
            HCPCS_CODE,
            SUM(TOTAL_PAID) as total_paid,
            SUM(TOTAL_CLAIMS) as total_claims,
            SUM(TOTAL_UNIQUE_BENEFICIARIES) as total_benes
        FROM '{PATH}'
        GROUP BY BILLING_PROVIDER_NPI_NUM, HCPCS_CODE
    ),
    prov_total AS (
        SELECT BILLING_PROVIDER_NPI_NUM,
               SUM(total_paid) as provider_total
        FROM prov_proc
        GROUP BY BILLING_PROVIDER_NPI_NUM
    ),
    ranked AS (
        SELECT
            p.*,
            t.provider_total,
            p.total_paid / t.provider_total as pct_of_total,
            ROW_NUMBER() OVER (
                PARTITION BY p.BILLING_PROVIDER_NPI_NUM
                ORDER BY p.total_paid DESC
            ) as rn
        FROM prov_proc p
        JOIN prov_total t USING (BILLING_PROVIDER_NPI_NUM)
        WHERE t.provider_total > 500000
    )
    SELECT * FROM ranked
    WHERE rn = 1 AND pct_of_total > 0.95
    ORDER BY provider_total DESC
""").df()

Result: 25,397 providers with >$500K total and >95% from one code, representing $193 billion.

The dominant code across the top 20 is T1019 (personal care services). NPI 1376609297 tops the list at $5.5 billion, 98% from T1019. The next five are also T1019 providers at $1-3B each. These are likely large home and community-based services (HCBS) agencies — the concentration on one code may be legitimate program design, but the scale demands scrutiny.

NPI 1932341898 stands out: $997M, 99.99% from H0044 (supported employment), with only two procedure codes total. That’s extreme even for this list.

Composite scoring

Each provider gets a binary flag for each of the five signals. The composite fraud score is just the sum.

import pandas as pd

all_providers = con.sql(f"""
    SELECT BILLING_PROVIDER_NPI_NUM,
           SUM(TOTAL_PAID) as total_spending
    FROM '{PATH}'
    GROUP BY BILLING_PROVIDER_NPI_NUM
""").df()

risk = all_providers.copy()

flag1_npis = set(outliers_cost['BILLING_PROVIDER_NPI_NUM'])
risk['flag_cost_outlier'] = risk['BILLING_PROVIDER_NPI_NUM'] \
    .isin(flag1_npis).astype(int)

threshold_mill = billing_mill['num_servicing_npis'].quantile(0.99)
flag2_npis = set(billing_mill[
    billing_mill['num_servicing_npis'] >= threshold_mill
]['BILLING_PROVIDER_NPI_NUM'])
risk['flag_billing_mill'] = risk['BILLING_PROVIDER_NPI_NUM'] \
    .isin(flag2_npis).astype(int)

flag3_npis = set(high_volume['BILLING_PROVIDER_NPI_NUM'])
risk['flag_high_volume'] = risk['BILLING_PROVIDER_NPI_NUM'] \
    .isin(flag3_npis).astype(int)

flag4_npis = set(spikes['BILLING_PROVIDER_NPI_NUM'])
risk['flag_spike'] = risk['BILLING_PROVIDER_NPI_NUM'] \
    .isin(flag4_npis).astype(int)

flag5_npis = set(concentrated['BILLING_PROVIDER_NPI_NUM'])
risk['flag_concentrated'] = risk['BILLING_PROVIDER_NPI_NUM'] \
    .isin(flag5_npis).astype(int)

flag_cols = [
    'flag_cost_outlier', 'flag_billing_mill',
    'flag_high_volume', 'flag_spike', 'flag_concentrated'
]
risk['fraud_score'] = risk[flag_cols].sum(axis=1)

This is the one step where I use pandas — the DuckDB queries already reduced 227M rows down to manageable result sets (tens of thousands of NPIs), so the set-membership lookups and joins are trivial.

Distribution:

Score	Providers
0	539,161
1	57,424
2	17,690
3	3,034
4	194
2+	20,918

The 20,918 providers with 2+ flags account for $535 billion — nearly half of all Medicaid spending in the dataset.

The top 5

For the highest-scoring providers, I pulled their full procedure breakdowns and monthly spending ranges:

for npi in top5:
    proc_breakdown = con.sql(f"""
        SELECT HCPCS_CODE,
               SUM(TOTAL_PAID) as paid,
               SUM(TOTAL_CLAIMS) as claims,
               SUM(TOTAL_UNIQUE_BENEFICIARIES) as benes
        FROM '{PATH}'
        WHERE BILLING_PROVIDER_NPI_NUM = '{npi}'
        GROUP BY HCPCS_CODE ORDER BY paid DESC LIMIT 5
    """).df()

NPI 1700090834 — Score 4/5 ($1.13B)

Flags: cost outlier, billing mill, high volume, spike

HCPCS_CODE         paid        claims    benes     pct
     H0019  553,957,700   2,884,496   155,197    53.2%
     H0004  188,211,147   1,572,012   505,021    18.1%
     H0005  108,477,327   1,623,733   203,179    10.4%
     H0020  103,053,473   7,020,805   276,149     9.9%
     H0006   88,021,893   1,166,081   255,743     8.4%

Monthly spending range: $37,576 — $25,877,770

All H-codes (behavioral health). H0019 alone is $554M for day treatment services across 155K beneficiaries. The monthly range swinging from $38K to $25.9M means there were massive ramp-up periods. This has the profile of a large behavioral health managed care entity — but one that also trips cost outlier and billing mill flags.

NPI 1932341898 — Score 4/5 ($997M)

Flags: cost outlier, high volume, spike, concentrated

HCPCS_CODE         paid        claims    benes     pct
     H0044  996,688,991     304,663   283,937   100.0%
     T2028      112,131       1,468        36     0.0%

Monthly spending range: $512,298 — $21,818,712

$997M from a single code. H0044 is supported employment — helping people with disabilities find and maintain jobs. 283,937 beneficiaries at an average of $3,511 each. Only two procedure codes ever billed. The 43x spread between minimum and maximum monthly spending ($512K to $21.8M) is hard to explain with normal program growth.

NPI 1114931391 — Score 4/5 ($382M)

Flags: cost outlier, high volume, spike, concentrated

HCPCS_CODE         paid        claims    benes     pct
     S3620  370,523,694   1,324,711   974,114    97.1%
     83655    8,882,287     685,614   667,070     2.3%
     85018    1,601,721     623,389   607,346     0.4%
     80061      691,330      45,838    44,922     0.2%
     82947       49,396      11,154    10,951     0.0%

Monthly spending range: $47 — $15,497,422

97% from S3620 (newborn screening panel). The monthly range from $47 to $15.5M is the most extreme spread in the top 5. This likely represents a state-contracted newborn screening lab — the code and beneficiary counts support that. But a $47 month to $15.5M month means either the contract changed drastically or there are data quality issues. The secondary codes (83655, 85018) are basic lab tests, consistent with a lab operation.

NPI 1629283197 — Score 4/5 ($306M)

Flags: billing mill, high volume, spike, concentrated

HCPCS_CODE         paid        claims    benes     pct
     T1015  298,897,046     718,309   655,601    99.1%
     99393      722,772      52,037    39,635     0.2%
     99392      688,366      47,287    38,251     0.2%
     99394      608,918      31,656    24,001     0.2%
     92014      591,683      27,987    16,169     0.2%

Monthly spending range: $594,623 — $5,028,749

99% from T1015 (clinic-based behavioral health). The secondary codes (99392-99394) are preventive care E&M visits, and 92014 is an eye exam — suggesting this billing entity has a clinic component too. But the billing mill flag means claims are flowing through many servicing NPIs under this one billing number.

NPI 1619341716 — Score 4/5 ($303M)

Flags: cost outlier, billing mill, high volume, spike

HCPCS_CODE         paid       claims    benes     pct
     99211   45,853,871    488,964   412,175    34.6%
     90832   37,594,923    253,130   125,580    28.4%
     99213   29,286,349    255,935   204,037    22.1%
     99214    9,967,573     88,118    73,463     7.5%
     99212    9,740,312     66,604    55,192     7.4%

Monthly spending range: $66,177 — $7,160,676

The most diversified billing pattern of the top 5. A mix of E&M visits (99211-99214) and psychotherapy (90832). The cost outlier flag means they’re charging significantly more per beneficiary than peers for these common codes. The billing mill flag means many servicing NPIs are operating under this one billing entity. The 108x spread between min and max monthly billing ($66K to $7.2M) indicates major scaling events.

Caveats

These are flags, not findings. Several patterns have legitimate explanations:

Large health systems and MCOs naturally have many servicing NPIs — that’s not fraud, it’s organizational structure. The billing mill signal needs to be interpreted in context.
State-contracted labs (like the newborn screening provider at NPI 1114931391) can have legitimate volume and spending spikes tied to contract awards and state program changes.
T1019/T1015/H-codes are commonly used in home and community-based services (HCBS) and behavioral health, where high volumes per billing entity may reflect program design rather than fraud. States often contract with large managed care entities for these services.
This data is aggregated — we can’t see individual claim details, diagnosis codes, patient demographics, or service locations. Many fraud patterns only become visible at the claim level.

The real value is in combining signals. A provider that’s both a cost outlier and has billing mill patterns and shows spending spikes is far more suspicious than one that only trips a single wire. The 194 providers at score 4/5 are the ones I’d start investigating.

Technical notes

The full analysis runs as a Jupyter notebook backed by DuckDB on a 16 GB Kubernetes pod. Some implementation details worth noting:

DuckDB’s parquet performance is remarkable. The z-score query (Signal 1) does a full scan → two-level aggregation → window function → filter, all on 227M rows of parquet. It completes in 192 seconds. The spending spike query uses LAG() window functions over the aggregated monthly data — 91 seconds. These are the kinds of queries that would require Spark or a data warehouse for most teams.

The pandas handoff is intentional. DuckDB returns result sets via .df() only after reducing 227M rows to thousands. The composite scoring step — five set-membership lookups and a sum — is trivial in pandas. There’s no reason to push that into SQL.

GREATEST(x, 1) prevents division by zero. Several providers have zero beneficiaries or zero prior-month spending. Rather than filtering them out (and potentially missing interesting cases), I clamp the denominator to 1. This means zero-bene providers get a per-bene cost equal to their total paid — which is correct behavior for flagging purposes.