
The $1 Trillion Fraud Problem: A Data-Driven Analysis

Fraud costs the United States over $1 trillion annually across healthcare, tax evasion, government programs, consumer fraud, and wage theft. 50+ investigations, 330 datasets, 20+ federal sources.

Tags: fraud, data analysis, government, machine learning
By Josh Elberg

Add up every category of fraud in the United States and the total exceeds $1 trillion per year. That is not a guess. It is the sum of documented estimates from the IRS, HHS, DOJ, FTC, DOL, FBI, and other federal agencies.

We spent months building The State of Fraud in America, a 50+ page investigation covering every major fraud category. 330 datasets. 20+ federal sources. Machine learning models applied to public data. Here is what we found.

The Categories

Fraud in America is not one problem. It is dozens of distinct problems that share common characteristics: information asymmetry, enforcement gaps, and economic incentives that make fraud profitable relative to the probability of getting caught.

Tax gap: $696 billion. The single largest category. The IRS estimates that this amount goes unpaid each year. Audit rates have fallen 68% since 2010. Full analysis.

Healthcare fraud: $60+ billion. We analyzed 1.38 million Medicare providers and flagged 73,245 for upcoding patterns. 380 providers on the OIG exclusion list still appeared in billing data. $786 million in estimated excess payments from upcoding alone. Full analysis.

PPP and pandemic fraud: $200+ billion. Our Isolation Forest model flagged $32 billion in anomalous PPP loans from the SBA dataset. 968,522 loans analyzed. $4.67 trillion in total pandemic relief with minimal cross-program checks. Full analysis.

Identity theft: $11 billion. 1.1 million FTC reports in 2024. Credit card fraud, synthetic identities, and government benefits fraud. Seniors lost $4.9 billion to cybercrime alone. Full analysis.

Wage theft: $50+ billion (estimated). DOL recovered $4.5 billion across 363,000+ enforcement actions. The estimated total, including unreported cases, exceeds the losses from all property crime combined. Full analysis.

Nursing home fraud and abuse. 14,703 facilities analyzed. $471 million in fines. 1-star homes bill Medicare 7.2% more than 5-star homes. Chains owned by private equity (PE) average 2.26 stars versus 3.03 for non-PE chains. Full analysis.

These six categories alone exceed $1 trillion. And they do not include insurance fraud, securities fraud, government contracting fraud, environmental violations, or dozens of other categories that our investigation also covers.
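The arithmetic behind that total can be checked directly from the figures quoted above. The short sketch below sums the five categories that carry a dollar estimate; nursing home fraud is left out because the list gives fines and ratings rather than a category-wide total. Figures marked "+" in the text are treated as lower bounds, and the variable names are illustrative only.

```python
# Figures quoted in the category list above, in billions of USD.
# Entries written as "$60+", "$200+", and "$50+" are treated as lower bounds.
# Nursing home fraud is omitted: the post gives no category-wide dollar total.
category_floor_billions = {
    "tax gap": 696,
    "healthcare fraud": 60,
    "PPP and pandemic fraud": 200,
    "identity theft": 11,
    "wage theft": 50,
}

total_billions = sum(category_floor_billions.values())
print(f"Sum of stated figures: ${total_billions:,} billion")

# Share of the total contributed by each category.
for name, amount in category_floor_billions.items():
    print(f"  {name}: {amount / total_billions:.1%}")
```

On these numbers the stated figures sum to $1,017 billion, with the tax gap accounting for roughly two thirds of it.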

What Connects Them

Across all categories, three patterns repeat:

Enforcement has not kept pace with the problem. IRS audit rates are down 68%. The DOL investigates a fraction of workplaces. CMS nursing home inspections are infrequent. Agencies do not have the resources to act on the signals their own data produces.

Cross-referencing catches what siloed analysis misses. When we connected datasets across agencies, new patterns emerged: PPP borrowers who also appeared in Medicare anomaly flags, nursing home chains with both high fines and high Medicare billing, and entities flagged independently by multiple statistical models. Our cross-dataset analysis found 7 novel correlations.
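To make the cross-referencing idea concrete, here is a minimal sketch of matching entity names across two flag lists. It is not the investigation's actual pipeline: the normalization rules, the function names, and the sample records are all hypothetical, chosen only to show why the same entity can look different in two agency datasets.

```python
import re
from collections import defaultdict

# Common corporate suffixes to ignore when comparing names (illustrative list).
SUFFIXES = {"inc", "llc", "corp", "co", "ltd", "company", "incorporated", "corporation"}

def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop corporate suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)

def flagged_in_multiple(sources: dict[str, list[str]], min_sources: int = 2) -> dict[str, set[str]]:
    """Return normalized names that appear in at least `min_sources` flag lists."""
    seen: dict[str, set[str]] = defaultdict(set)
    for source, names in sources.items():
        for name in names:
            seen[normalize_name(name)].add(source)
    return {key: srcs for key, srcs in seen.items() if len(srcs) >= min_sources}

# Hypothetical records, not real entities or real flags.
flags = {
    "ppp_anomalies": ["Acme Home Health, LLC", "Riverside Staffing Inc."],
    "medicare_anomalies": ["ACME HOME HEALTH LLC", "Lakeview Clinic"],
}

overlap = flagged_in_multiple(flags)
print(overlap)
```

Name-only matching like this will produce false positives for common business names. Where datasets share a stable identifier, or at least an address, joining on that is the safer choice.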

The data exists. The analysis is lagging. Every dataset in our investigation is public. CMS, SBA, IRS, FTC, DOL, FBI, SEC, and other agencies publish the underlying data. The anomalies are detectable with standard statistical methods. Our proven-patterns test showed that 9 out of 10 confirmed fraud cases could have been flagged using public data before enforcement action.

The Methods

We applied different analytical methods depending on the dataset:

  • Isolation Forest for PPP loan anomaly detection
  • Beneish M-Score for corporate financial statement analysis
  • Z-score outlier detection for Medicare billing patterns
  • Geographic clustering for opioid prescribing networks
  • Complaint velocity analysis for CFPB consumer data
  • Cross-entity matching across federal databases
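As an illustration of the simplest method in the list above, the following sketch applies z-score outlier detection to a made-up billing metric. The provider IDs, the metric (share of visits billed at the highest-paying code), and the threshold of 2.0 are assumptions for the example, not values taken from the investigation.

```python
from statistics import mean, stdev

def zscore_outliers(values: dict[str, float], threshold: float = 2.0) -> dict[str, float]:
    """Return the z-score of every entry whose |z| exceeds `threshold`."""
    mu = mean(values.values())
    sigma = stdev(values.values())  # sample standard deviation
    if sigma == 0:
        return {}  # all values identical: nothing stands out
    scores = {key: (value - mu) / sigma for key, value in values.items()}
    return {key: z for key, z in scores.items() if abs(z) > threshold}

# Hypothetical data: share of each provider's visits billed at the top code.
top_code_share = {
    "P01": 0.10, "P02": 0.12, "P03": 0.11, "P04": 0.13, "P05": 0.12,
    "P06": 0.10, "P07": 0.14, "P08": 0.11, "P09": 0.12, "P10": 0.60,
}

flagged = zscore_outliers(top_code_share)
print(flagged)  # only P10 exceeds the threshold in this toy data
```

Two caveats apply to any use of this technique. A z-score is only meaningful within a comparable peer group, so providers would normally be compared within the same specialty and region. And with small samples the largest possible z-score is capped, so a threshold of 3.0 can never fire on ten data points.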

Every method is documented. Every data source is cited. The analysis is designed to be reproducible.
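For readers unfamiliar with the Beneish M-Score named in the list above, the sketch below computes the standard eight-variable version from pre-computed index ratios. The coefficients are the published ones. The cutoff of -1.78 is one commonly cited threshold and is an assumption here, since the post does not say which cutoff the investigation used. The sample inputs are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BeneishInputs:
    dsri: float  # days sales in receivables index
    gmi: float   # gross margin index
    aqi: float   # asset quality index
    sgi: float   # sales growth index
    depi: float  # depreciation index
    sgai: float  # SG&A expense index
    tata: float  # total accruals to total assets
    lvgi: float  # leverage index

def m_score(x: BeneishInputs) -> float:
    """Eight-variable Beneish M-Score; higher values suggest more manipulation risk."""
    return (
        -4.84
        + 0.920 * x.dsri
        + 0.528 * x.gmi
        + 0.404 * x.aqi
        + 0.892 * x.sgi
        + 0.115 * x.depi
        - 0.172 * x.sgai
        + 4.679 * x.tata
        - 0.327 * x.lvgi
    )

CUTOFF = -1.78  # assumed threshold; scores above it are treated as flags

# A firm whose ratios did not change year over year (all indices 1, no accruals).
steady = BeneishInputs(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0)
# Hypothetical firm with fast-growing receivables, sales, and accruals.
aggressive = BeneishInputs(1.5, 1.0, 1.0, 1.6, 1.0, 1.0, 0.10, 1.0)

for label, firm in [("steady", steady), ("aggressive", aggressive)]:
    score = m_score(firm)
    print(f"{label}: M = {score:.2f}, flagged = {score > CUTOFF}")
```

A score above the cutoff is a screening signal, not evidence of fraud. It marks a filing for closer review.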

Why This Report Exists

Most fraud coverage focuses on individual cases: a doctor who billed $50 million in fake claims, a PPP borrower who bought a Lamborghini. Those stories are real, but they obscure the systemic picture.

The systemic picture is that fraud is a structural feature of how government programs are designed, funded, and enforced. The tax gap exists because audit rates are near zero for most taxpayers. PPP fraud happened because verification was removed by design. Wage theft persists because enforcement is fragmented and underfunded.

Understanding the systemic picture requires looking at the data across all categories simultaneously. That is what this investigation does.

Explore the Full Report

The complete investigation includes 50+ interactive pages with charts, data tables, and methodology documentation. Every data point is traceable to a federal source.

View The State of Fraud in America

About the Author

Founder & Principal Consultant

Josh helps SMBs implement AI and analytics that drive measurable outcomes. With experience building data products and scaling analytics infrastructure, he focuses on practical, cost-effective solutions that deliver ROI within months, not years.


