Why Machine Learning Models Fail in Fundraising

The Inescapable Buzz

I first became aware of predictive modeling in fundraising in the early 2010s, when people like Josh Birkholz¹ talked about it at Apra conferences, and large universities were building their own models. Models were the “new new” we all wanted but mostly couldn’t afford. Since then, many organizations have bought or built their own predictive models to guide fundraising strategies. Recently I was surprised to see we are still having the same debate in prospect development circles: True believers have seen them work and see them as vital tools, while skeptics have seen them completely fail to gain traction on a team, and see them as an expensive distraction. Why have we not progressed in 15 years?

First, a quick explanation of terms. Predictive modeling is the process of using historical data to make predictions about future outcomes. Think the Netflix recommendation engine or Amazon’s product recommendations. In fundraising, this often means predicting which donors are most likely to give, how much they might give, or when they might give. The technique used is typically machine learning (ML), which means using algorithms to identify patterns in data and make predictions based on those patterns. ML has been around since the late 1950s, and the common algorithms used for fundraising (logistic regression and decision trees) have been well-established for decades.

For your context in the age of artificial intelligence: ML is a subset of AI, distinct from generative AI, which focuses on creating new (“new”) content rather than making predictions based on existing data. The magic and explosive growth of gen AI is not ML; ML is comparatively boring, and that’s ok. We still like the advice on what to watch and shop for.

Deployment Failures

How does an ML model fail in fundraising? Assuming it is built and tested correctly, it should perform about as well as its measured accuracy says it will. The ML development process includes a “test set” of data withheld from the data used to train the model; the model’s performance in predicting on the test set tells us how well it should work in the real world.
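To make the test set idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the donor data is synthetic, the single feature (years of giving) is made up, and the "model" is a one-split decision rule rather than anything a vendor would ship. The point is only the workflow: withhold some rows, fit on the rest, then report accuracy on the rows the model never saw.

```python
import random

def make_synthetic_donors(n, seed=0):
    """Hypothetical data: (years_of_giving, gave_major_gift) pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        years = rng.randint(0, 20)
        # Assumption for illustration only: longer giving histories
        # make a major gift more likely.
        p = 0.05 + 0.04 * years
        gave = 1 if rng.random() < p else 0
        rows.append((years, gave))
    return rows

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then withhold a fraction as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(rows, threshold):
    """Share of rows where the rule 'years >= threshold' matches the outcome."""
    hits = sum(1 for years, gave in rows
               if (1 if years >= threshold else 0) == gave)
    return hits / len(rows)

def fit_stump(train_rows):
    """Pick the threshold with the best accuracy on the training rows only."""
    return max(range(0, 22), key=lambda t: accuracy(train_rows, t))

donors = make_synthetic_donors(400)
train, test = train_test_split(donors)
threshold = fit_stump(train)

print("train accuracy:", round(accuracy(train, threshold), 3))
print("test accuracy: ", round(accuracy(test, threshold), 3))
```

The test accuracy is the number to quote to stakeholders, because the threshold was chosen without ever looking at those rows.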

Ignore.md

One way an ML model can fail is if people ignore it. Many fundraisers start with some doubt and uncertainty about ML models, and with good reason: they are complex and hard to explain, and the results can be counterintuitive. Unlike wealth screening, ML models feel like a black box. I think there is a tendency for data scientists to play up the power and mystery of the models (and the highly paid consultants who built them), which feeds into the distrust.

So, the models are ignored, and we stick with capacity, RFM, engagement, affinity, etc. etc. It gets tiresome to keep pushing a model on someone who doesn’t want it, and eventually even the executive sponsor decides it was an expensive mistake and moves on. This is partly a stakeholder engagement issue, which is a topic on its own.

Wrong Tool for the Job

In other cases, fundraisers DO use the model but quickly lose trust when they see that it has scored their warmest, most engaged prospects lower than expected. Was their intuition wrong in cultivating this donor? Have they misread cues that the donor is ready to commit a major gift in the next 6 months? Likely not! In these cases, the model is not wiser than the human who has been having lunch with the prospect. It was not intended to tell fundraisers which of their active prospects are actually not good, or to make them stop buying lunch.

These models are typically intended to surface new prospects who have not yet been through the human qualification process and been found to be good prospects.

While you should definitely look at the scores for active prospects when validating the model, you may not want to add those scores to their records. That will just highlight the weakness of the model every time a fundraiser sees it. If the model is intended to prioritize outreach to uncontacted major gift prospects, only publish scores for those.
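One way to read this advice: score everyone behind the scenes for validation, but write scores back only to the records the model was built for. A rough sketch of that filter follows. The `Prospect` fields, the stage names, and the `scores_to_publish` helper are all hypothetical, not any particular CRM's schema.

```python
from dataclasses import dataclass

@dataclass
class Prospect:
    prospect_id: str
    stage: str          # hypothetical stages: "uncontacted", "cultivation", ...
    model_score: float  # output of whatever model was built

def scores_to_publish(prospects, publish_stages=("uncontacted",)):
    """Return {prospect_id: score} only for stages the model is meant to prioritize."""
    return {
        p.prospect_id: p.model_score
        for p in prospects
        if p.stage in publish_stages
    }

prospects = [
    Prospect("A100", "uncontacted", 0.82),
    Prospect("A101", "cultivation", 0.31),  # active prospect: validate, don't publish
    Prospect("A102", "uncontacted", 0.47),
]

published = scores_to_publish(prospects)
print(published)  # {'A100': 0.82, 'A102': 0.47}
```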

Trust and Understanding

Often it’s not the model itself, but rather the end users’ lack of trust and understanding. If fundraisers don’t understand how the model works, why it makes certain recommendations, and how they are expected to use the model, they won’t use it effectively and the story becomes “the model sucks.”

Model Failures

Dumb Models

Why does a model fail? It is quite possible that the model does suck. Many vendors will make a big deal about how they will “use 100+ factors in the model” (power and mystery will cost you $30K+ per model), when in reality many algorithms will sort through those and pick the handful that are truly predictive, so it doesn’t take days and days to compute. And some vendors will sell you a “model” that is extremely generic and little more than RFM by another name.

Sparse Data

Especially in higher education, we can have a great deal of data on our prospects: employment history, addresses, family details, and how they engage with the organization. Heck, we might know what clubs they were in and who their roommates were. But the real stuff is buried in the very human interactions and relationships that are hard to quantify. Did a faculty member pay their last term’s tuition, allowing them to graduate? Did they fall in love with their future spouse in the library, sharing silent looks across carrels? Are they about to sell a large, low-profile data security firm?

If very thorough notes are stored in the CRM about conversations had with constituents, they can be analyzed with natural language processing to develop some insights. That’s a different type of ML work. But it’s not easy to do and you probably don’t want to spend an extra pile of money for a vendor to do that.

In cause or social service organizations, we tend to have much sparser data, and constituents may not have decades-long relationships with our organization. In these cases, a predictive model might not be feasible if all you’ve got is giving histories. In any case, be sure you have enough cases of the thing you are trying to predict. If you’ve only got a handful of examples, the model won’t be able to learn effectively.
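A quick sanity check before commissioning a model is to count how many times the outcome has actually happened. The sketch below uses a common rule of thumb of roughly ten positive cases per candidate predictor. That threshold is my assumption for illustration, not a hard rule, and the function name and numbers are made up.

```python
def enough_positive_cases(outcomes, n_predictors, min_events_per_predictor=10):
    """outcomes: list of 0/1 flags, where 1 means the thing being predicted happened."""
    events = sum(outcomes)
    needed = n_predictors * min_events_per_predictor
    return {
        "events": events,
        "needed": needed,
        "base_rate": events / len(outcomes),
        "ok": events >= needed,
    }

# Hypothetical file: 5,000 constituents, only 12 of whom ever made a major gift.
outcomes = [1] * 12 + [0] * 4988
report = enough_positive_cases(outcomes, n_predictors=8)
print(report)
# {'events': 12, 'needed': 80, 'base_rate': 0.0024, 'ok': False}
```

With 12 major gifts and 8 candidate predictors, this check says to keep collecting data, or to pick an outcome that occurs more often.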

In the end, it’s the constituents we haven’t yet spoken with that we need an ML model to help us sort through. And the data we have on them may not be all that predictive of how someone will behave. Most organizations I’ve worked with need to focus their attention on current engagement data:

Put email opens and clicks in your CRM or a data warehouse.
Ditto for social media interactions, if you can afford a listening tool.
Track event attendance, volunteerism, and other interactions in your CRM.
Use this data to build engagement scores and develop insights.

This will be incredibly useful data to have on hand next time you think about a predictive model.
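As a starting point, an engagement score can be as simple as a capped, weighted sum of the interactions listed above. The weights, caps, and field names below are placeholders I made up; each organization would need to set its own.

```python
# Hypothetical weights: points per interaction in the last 12 months.
WEIGHTS = {
    "email_opens": 1,
    "email_clicks": 3,
    "social_interactions": 2,
    "events_attended": 10,
    "volunteer_shifts": 15,
}

# Hypothetical caps, so one very active channel cannot dominate the score.
CAPS = {
    "email_opens": 20,
    "email_clicks": 10,
    "social_interactions": 10,
    "events_attended": 5,
    "volunteer_shifts": 4,
}

def engagement_score(activity):
    """activity: dict of interaction counts; missing keys count as zero."""
    score = 0
    for key, weight in WEIGHTS.items():
        count = min(activity.get(key, 0), CAPS[key])
        score += weight * count
    return score

constituent = {"email_opens": 30, "email_clicks": 4, "events_attended": 2}
print(engagement_score(constituent))  # 20 + 12 + 20 = 52
```

A transparent score like this is easy to explain to fundraisers, and the same counts can later serve as inputs to a predictive model.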

Closing Thoughts

We are still having the same debate about predictive modeling in 2026 because we haven’t fixed the fundamental issues with data quality, user trust, and understanding. I suspect the key here is that the skeptics refer to “the model.” The model for what?! If you deployed a predictive model and people aren’t clear on what it’s predicting, then of course they will think it’s junk and not use it.

Be sure you have sufficient data for modeling, and ask about the diagnostics.
Be clear what the model is for and how to use it.
Don’t put the scores on records where they aren’t needed.
Ensure users understand the limitations and assumptions of the model.

Building predictive models with ML is still a worthy investment. And its cousin gen AI is actually making it easier to do with tools like Cursor and Claude Code (but please do consult a professional). It’s the human work of storytelling and user engagement that will make for a successful deployment.

¹ YEARS later I realize Josh was describing feature engineering in his metaphor of searching for the optimal viewing experience of a Star Wars re-release. Ahead of his time, he was.