Why LLMs Won't Replace Risk Models

Every few months, someone asks me if LLMs will replace traditional credit risk models. The short answer is no. The longer answer is more interesting.

// The Appeal

I understand the attraction. LLMs can process unstructured data—bank statements, employment letters, customer service transcripts—that traditional models ignore. They can find patterns humans miss. They're flexible, powerful, and increasingly cheap to run.

But credit risk isn't just a pattern recognition problem. It's a decision-making system with regulatory, ethical, and business constraints that LLMs aren't designed to handle.

// The Explainability Problem

When you deny someone credit, you need to tell them why. Not "the model said so" but an actual reason. In the US, the Equal Credit Opportunity Act, implemented through Regulation B, requires adverse action notices that state the specific reasons for the decision.

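Traditional models give you this almost for free. In a logistic regression, each coefficient multiplied by the applicant's feature value is that feature's contribution to the score, so the largest adverse contributions map directly to reason codes. "Your debt-to-income ratio exceeded our threshold" is a sentence you can write.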
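
Here is a minimal sketch of that mapping, assuming a three-feature logistic scorecard. Every coefficient, baseline value, and reason text below is made up for illustration, and ranking contributions against a reference applicant is one common convention, not the only one.

```python
import math

# Hypothetical fitted scorecard. All numbers are illustrative, not real.
INTERCEPT = -2.0
COEFFICIENTS = {  # change in log-odds of default per unit of the feature
    "debt_to_income": 3.0,
    "recent_delinquencies": 0.8,
    "years_at_employer": -0.15,
}
BASELINE = {  # assumed reference applicant, e.g. portfolio averages
    "debt_to_income": 0.30,
    "recent_delinquencies": 0.0,
    "years_at_employer": 5.0,
}
REASON_TEXT = {  # hypothetical adverse action wording
    "debt_to_income": "Debt-to-income ratio too high",
    "recent_delinquencies": "Recent delinquency on one or more accounts",
    "years_at_employer": "Length of employment too short",
}


def default_probability(applicant):
    """Standard logistic model: sigmoid of intercept plus weighted features."""
    log_odds = INTERCEPT + sum(
        COEFFICIENTS[name] * applicant[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


def reason_codes(applicant, top_n=2):
    """Rank features by how far they push log-odds above the baseline."""
    contributions = {
        name: COEFFICIENTS[name] * (applicant[name] - BASELINE[name])
        for name in COEFFICIENTS
    }
    # Only features that increase predicted risk can be adverse reasons.
    adverse = [name for name in contributions if contributions[name] > 0]
    adverse.sort(key=lambda name: contributions[name], reverse=True)
    return [REASON_TEXT[name] for name in adverse[:top_n]]


applicant = {
    "debt_to_income": 0.55,
    "recent_delinquencies": 2.0,
    "years_at_employer": 1.0,
}
print(round(default_probability(applicant), 3))
print(reason_codes(applicant))
```

Each reason traces back to one coefficient and one input value, which is what makes it defensible in an adverse action notice.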

LLMs don't work this way. Their decisions emerge from billions of parameters interacting in ways we can't trace. You can bolt on explainability tools (SHAP values, attention weights), but these are post hoc approximations of what the model did, not explanations of why it did it.

// The Stability Problem

Credit models need to be stable. When you approve a loan, you're making a prediction about the next 3-5 years. Your model shouldn't change its mind because you retrained it on last month's data.

LLMs are sensitive to prompt phrasing, sampling settings, training data order, and random seeds. The same application processed twice might get different scores. In research that's a nuisance. In production it's a catastrophe.
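
One way to make this concrete is a repeatability check: score the same application many times and look at the spread. In the sketch below, `scorecard` and `sampled_scorer` are hypothetical. The second is only a stand-in that adds random noise to imitate a model whose output depends on sampling. It does not call a real LLM, and the noise level is an arbitrary assumption.

```python
import random


def score_spread(score_fn, application, runs=20):
    """Score the same application repeatedly; return max minus min."""
    scores = [score_fn(application) for _ in range(runs)]
    return max(scores) - min(scores)


def scorecard(application):
    """Hypothetical deterministic scorecard with made-up weights."""
    return (
        600
        - 200 * application["debt_to_income"]
        + 5 * application["years_at_employer"]
    )


_rng = random.Random(0)  # seeded so the demo is reproducible run to run


def sampled_scorer(application):
    """Stand-in for a sampled model: same input, noisy output (assumption)."""
    return scorecard(application) + _rng.gauss(0, 15)


application = {"debt_to_income": 0.4, "years_at_employer": 3}
print(score_spread(scorecard, application))       # identical on every call
print(score_spread(sampled_scorer, application))  # varies between calls
```

A spread of zero is the baseline a production scorecard has to meet. Anything above it needs an explanation before the model goes anywhere near a decision.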

// Where LLMs Actually Help

None of this means LLMs are useless in credit risk. They're transforming the edges:

Document processing: Extracting income from bank statements, parsing employment letters
Customer service: Explaining decisions, answering policy questions
Feature engineering: Generating candidate variables from unstructured data for traditional models to evaluate
Fraud detection: Flagging anomalies for human review

The pattern: LLMs as tools that feed into or surround the core decision, not as the decision-maker itself.
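
A rough sketch of that arrangement, under loud assumptions: `extract_fields` is a placeholder for an LLM extraction step (a regex stands in so the example runs offline), and `MAX_DTI` is a hypothetical policy threshold. The point is the shape. The extractor only produces a structured field, the field is validated, and a simple, inspectable rule makes the decision.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedFields:
    monthly_income: Optional[float]


def extract_fields(document_text: str) -> ExtractedFields:
    """Placeholder for an LLM extraction call; a regex stands in here."""
    match = re.search(
        r"monthly income[:\s]+\$?(\d[\d,]*(?:\.\d+)?)",
        document_text,
        re.IGNORECASE,
    )
    income = float(match.group(1).replace(",", "")) if match else None
    return ExtractedFields(monthly_income=income)


MAX_DTI = 0.43  # hypothetical policy threshold, for illustration only


def decide(document_text: str, monthly_debt: float) -> str:
    """Extraction feeds the decision; it never makes the decision."""
    fields = extract_fields(document_text)
    # Validation gate: anything the extractor could not read goes to a human.
    if fields.monthly_income is None or fields.monthly_income <= 0:
        return "manual_review"
    dti = monthly_debt / fields.monthly_income
    if dti <= MAX_DTI:
        return "approve"
    return "decline: debt-to-income ratio above threshold"


print(decide("Monthly income: $5,000", 1500))
print(decide("Monthly income: $5,000", 3000))
print(decide("Statement with no income figure", 1000))
```

If the extractor gets something wrong, the failure is a bad input that can be audited and corrected, not an unexplainable decision.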

// The Actual Future

Credit risk will evolve. Models will get better, data will get richer, decisions will get faster. But the core constraint remains: you're making high-stakes decisions about people's financial lives, and those decisions need to be explainable, stable, and fair.

LLMs are powerful. But power isn't the bottleneck. Trust is.