When I was a little girl, I used to gaze at the traffic out the car window and study the numbers on license plates. I would reduce each one to its basic elements — the prime numbers that made it up. 45 = 3 x 3 x 5. That’s called factoring, and it was my favorite investigative pastime. As a budding math nerd, I was especially intrigued by the primes.
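For the curious, the whole game fits in a few lines of code. Here is a minimal sketch of trial division, the same brute-force method a kid with a pencil uses:

```python
def prime_factors(n):
    """Reduce n to its basic elements: the primes that make it up."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:   # divide out d as many times as it fits
            factors.append(d)
            n //= d
        d += 1
    if n > 1:               # whatever remains is itself prime
        factors.append(n)
    return factors

print(prime_factors(45))    # [3, 3, 5]
```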
My love for math eventually became a passion. I went to math camp when I was 14 and came home clutching a Rubik’s Cube to my chest. Math provided a neat refuge from the messiness of the real world. It marched forward, its field of knowledge expanding relentlessly, proof by proof. And I could add to it. I majored in math in college and went on to get my Ph.D. My thesis was on algebraic number theory, a field with roots in all that factoring I did as a child. Eventually, I became a tenure-track professor at Barnard, which had a combined math department with Columbia University.
And then I made a big change. I quit my job and went to work as a quant [quantitative analyst] for D.E. Shaw, a leading hedge fund. In leaving academia for finance, I carried mathematics from abstract theory into practice. The operations we performed on numbers translated into trillions of dollars sloshing from one account to another. At first I was excited and amazed by working in this new laboratory, the global economy. But in the autumn of 2008, after I’d been there for a bit more than a year, it came crashing down. The crash made it all too clear that mathematics, once my refuge, was not only deeply entangled in the world’s problems but also fueling many of them. The housing crisis, the collapse of major financial institutions, the rise of unemployment — all had been aided and abetted by mathematicians wielding magic formulas. What’s more, thanks to the extraordinary powers that I loved so much, math was able to combine with technology to multiply the chaos and misfortune, adding efficiency and scale to systems that I now recognized as flawed.
If we had been clear-headed, we all would have taken a step back at this point to figure out how math had been misused and how we could prevent a similar catastrophe in the future. But instead, in the wake of the crisis, new mathematical techniques were hotter than ever and expanding into still more domains. They churned 24/7 through petabytes of information, much of it scraped from social media or e-commerce websites. And increasingly they focused not on the movements of global financial markets but on human beings, on us. Mathematicians and statisticians were studying our desires, movements, and spending power. They were predicting our trustworthiness and calculating our potential as students, workers, lovers, criminals.
This was the Big Data economy, and it promised spectacular gains. A computer program could speed through thousands of résumés or loan applications in a second or two and sort them into neat lists, with the most promising candidates on top. This not only saved time but also was marketed as fair and objective. After all, it didn’t involve prejudiced humans digging through reams of paper, just machines processing cold numbers. By 2010 or so, mathematics was asserting itself as never before in human affairs, and the public largely welcomed it.
Yet I saw trouble. The math-powered applications driving the data economy were based on choices made by fallible human beings. Some of these choices were no doubt made with the best intentions. Nevertheless, many of these models encoded human prejudice, misunderstanding, and bias into the software systems that increasingly managed our lives. Like gods, these mathematical models were opaque, their workings invisible to all but the highest priests in their domain: mathematicians and computer scientists. Their verdicts, even when wrong or harmful, were beyond dispute or appeal. And they tended to punish the poor and the oppressed in our society while making the rich richer.
I came up with a name for these harmful kinds of models: Weapons of Math Destruction, or WMDs for short. I’ll walk you through an example, pointing out its destructive characteristics along the way.
As often happens, this case started with a laudable goal. In 2007, Washington, D.C.’s new mayor, Adrian Fenty, was determined to turn around the city’s underperforming schools. He had his work cut out for him: At the time, barely one out of every two students who entered ninth grade persisted to graduation, and only 8 percent of eighth graders were performing at grade level in math. Fenty hired an education reformer named Michelle Rhee to fill a powerful new post, chancellor of Washington’s schools.
The going theory was that the students weren’t learning enough because their teachers weren’t doing a good job. So in 2009, Rhee implemented a plan to weed out the low-performing teachers.
This is the trend in troubled school districts around the country, and from a systems engineering perspective the thinking makes perfect sense: Evaluate the teachers. Get rid of the worst ones, and place the best ones where they can do the most good. In the language of data scientists, this “optimizes” the school system, presumably ensuring better results for the kids. Except for “bad” teachers, who could argue with that? Rhee developed a teacher assessment tool called IMPACT, and at the end of the 2009–10 school year, the district fired all the teachers whose scores put them in the bottom 2 percent. At the end of the following year, another 5 percent, or 206 teachers, were booted out.
Sarah Wysocki, a fifth-grade teacher, didn’t seem to have any reason to worry. She had been at MacFarland Middle School for only two years but was already getting excellent reviews from her principal and her students’ parents. One evaluation praised her attentiveness to the children; another called her “one of the best teachers I’ve ever come into contact with.”
Yet at the end of the 2010–11 school year, Wysocki received a miserable score on her IMPACT evaluation. Her problem was a new scoring system known as value-added modeling, which purported to measure her effectiveness in teaching math and language skills. That score, generated by an algorithm, represented half of her overall evaluation, and it outweighed the positive reviews from school administrators and the community. This left the district with no choice but to fire her, along with 205 other teachers who had IMPACT scores below the minimal threshold.
There was a logic to the school district’s approach. Administrators, after all, could be friends with terrible teachers. They could admire their style or their apparent dedication. Bad teachers can seem good. So Washington, like many other school systems, would minimize this human bias and pay more attention to scores based on hard results: achievement scores in math and reading. The numbers would speak clearly, district officials promised. They would be more fair.
Wysocki, of course, felt the numbers were horribly unfair, and she wanted to know where they came from. “I don’t think anyone understood them,” she later told me. How could a good teacher get such dismal scores?
Well, she learned, it was complicated. The district had hired a consultancy, Princeton-based Mathematica Policy Research, to come up with the evaluation system. Mathematica’s challenge was to measure the educational progress of the students in the district and then to calculate how much of their advance or decline could be attributed to their teachers. This wasn’t easy, of course.
The researchers knew that many variables, from students’ socioeconomic backgrounds to the effects of learning disabilities, could affect student outcomes. The algorithms had to make allowances for such differences, which was one reason they were so complex.
Attempting to reduce human behavior, performance, and potential to algorithms is no easy job. “There are so many factors that go into learning and teaching that it would be very difficult to measure them all,” Wysocki says. What’s more, attempting to score a teacher’s effectiveness by analyzing the test results of only 25 or 30 students is statistically unsound, even laughable.
The numbers are far too small given all the things that could go wrong. Indeed, if we were to analyze teachers with the statistical rigor of a search engine, we’d have to test them on thousands or even millions of randomly selected students. Statisticians count on large numbers to balance out exceptions and anomalies. (And WMDs often punish individuals who happen to be the exception.)
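To make that concrete, here is a toy simulation, my own illustration and nothing like Mathematica’s actual model. Give a hundred classrooms of 25 students to teachers of identical effectiveness, let nothing vary but random student-level noise, and the measured scores still spread widely; somebody lands in the bottom 2 percent by pure chance.

```python
import random

random.seed(1)

def classroom_score(n_students=25, true_effect=0.0, noise_sd=1.0):
    """Average measured 'growth' for one class. Every teacher here
    has the same true effect, so any spread is pure sampling noise."""
    return sum(true_effect + random.gauss(0, noise_sd)
               for _ in range(n_students)) / n_students

scores = sorted(classroom_score() for _ in range(100))
print(f"lowest: {scores[0]:+.2f}  highest: {scores[-1]:+.2f}")
# Identical teachers, wildly different scores. Averaging over
# thousands of students would shrink this spread toward zero.
```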
Equally important, statistical systems require feedback — something to tell them when they’re off track. Statisticians use errors to train their models and make them smarter. If Amazon.com, through a faulty correlation, started recommending lawn care books to teenage girls, the clicks would plummet, and the algorithm would be tweaked until it got it right. Without feedback, however, a statistical engine can continue spinning out faulty and damaging analysis while never learning from its mistakes.
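In code, the difference is a single line. Here is a deliberately crude sketch (invented numbers, nothing like Amazon’s real system) of a model that unlearns a faulty correlation because click feedback flows back in:

```python
# The model has wrongly learned that teenage girls want lawn care books.
weight = 0.90          # faulty association: predicted click rate
ACTUAL_CLICKS = 0.02   # teens almost never click these recommendations
LEARNING_RATE = 0.3

for day in range(10):
    error = ACTUAL_CLICKS - weight   # feedback: prediction vs. reality
    weight += LEARNING_RATE * error  # remove this line and the model
                                     # repeats its mistake forever

print(f"after feedback, the faulty association decays to {weight:.3f}")
```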
Many WMDs behave like that. They define their own reality and use it to justify their results. This type of model is self-perpetuating, highly destructive — and very common.
When Mathematica’s scoring system tags Sarah Wysocki and 205 other teachers as failures, the district fires them. But how does it ever learn if it was right? It doesn’t. The system itself has determined that they were failures, and that is how they are viewed. Two hundred and six “bad” teachers are gone. That fact alone appears to demonstrate how effective the value-added model is: it is cleansing the district of underperforming teachers. Instead of searching for the truth, the score comes to embody it.
This is just one example of a WMD feedback loop. Another involves employers who increasingly use credit scores to evaluate potential hires. Those who pay their bills promptly, the thinking goes, are more likely to show up to work on time and follow the rules. In fact, there are plenty of responsible people and good workers who suffer misfortune and see their credit scores fall.
But the belief that bad credit correlates with bad job performance leaves those with low scores less likely to find work. Joblessness pushes them toward poverty, which further worsens their scores, making it even harder for them to land a job. It’s a downward spiral. And employers never learn how many good employees they’ve missed out on by focusing on credit scores. In WMDs, many poisonous assumptions are camouflaged by math and go largely untested and unquestioned.
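Here is that spiral as a stylized sketch. The threshold and numbers are invented for illustration; real scoring models are far more elaborate, but the loop is the same:

```python
score = 620    # applicant's starting credit score
CUTOFF = 650   # hypothetical employer screening threshold

for month in range(6):
    hired = score >= CUTOFF
    score += 5 if hired else -15   # steady work steadies finances;
                                   # joblessness drags the score down
    print(f"month {month}: score={score}, hired={hired}")

# Once below the cutoff, the applicant never climbs back above it.
# The model's assumption produces the very data that confirms it.
```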
For years, Washington teachers complained about the arbitrary scores and clamored for details on what went into them. It’s an algorithm, they were told. It’s very complex. That’s the nature of WMDs. The analysis is outsourced to coders and statisticians. And as a rule, they let the machines do the talking.
You cannot appeal to a WMD. That’s part of their fearsome power. They do not listen. Nor do they bend. They’re deaf not only to charm, threats, and cajoling but also to logic — even when there is good reason to question the data that feed their conclusions. Yes, if it becomes clear that automated systems are screwing up on an embarrassing and systematic basis, programmers will go back in and tweak the algorithms. But for the most part, the programs deliver unflinching verdicts, and the human beings employing them can only shrug, as if to say, “Hey, what can you do?” The human victims of WMDs are held to a far higher standard of evidence than the algorithms themselves.
After the shock of her firing, Sarah Wysocki was out of a job for only a few days. She had plenty of people, including her principal, to vouch for her as a teacher, and she promptly landed a position at a school in an affluent district in northern Virginia. So thanks to a highly questionable model, a poor school lost a good teacher, and a rich school, which didn’t fire people on the basis of their students’ scores, gained one.
Ill-conceived mathematical models now micromanage the economy, from advertising to prisons. These WMDs have many of the same characteristics as the model that derailed Sarah Wysocki’s career in Washington’s public schools. They’re opaque, unquestioned, and unaccountable, and they operate at a scale to sort, target, or “optimize” millions of people. By confusing their findings with on-the-ground reality, most of them create pernicious WMD feedback loops.
But there’s one important distinction between a school district’s model and, say, a WMD that scouts out prospects for extortionate payday loans. They have different payoffs. For the school district, the payoff is a kind of political currency, a sense that problems are being fixed. But for businesses, it’s just the standard currency: money. For many of the businesses running these rogue algorithms, the money pouring in seems to prove that their models are working. Look at it through their eyes and it makes sense. When they’re building statistical systems to find customers or manipulate desperate borrowers, growing revenue appears to show that they’re on the right track. The software is doing its job. The trouble is that profits end up serving as a stand-in or proxy for truth. This dangerous confusion crops up again and again.
This happens because data scientists all too often lose sight of the folks on the receiving end of the transaction. They certainly understand that a data-crunching program is bound to misinterpret people a certain percentage of the time, putting them in the wrong groups and denying them a job or a chance at their dream house. But as a rule, the people running the WMDs don’t dwell on those errors. Their feedback is money, which is also their incentive. Their systems are engineered to gobble up more data and fine-tune their analytics so that more money will pour in. Investors, of course, feast on these returns and shower WMD companies with more money.
But the poor are hardly the only victims of WMDs. Far from it. Malevolent models can blacklist qualified job applicants and dock the pay of workers who don’t fit a corporation’s picture of ideal health. These WMDs hit the middle class as hard as anyone. Even the rich find themselves micro-targeted by political models. And they scurry about as frantically as the rest of us to satisfy the remorseless WMD that rules college admissions and pollutes higher education.
It’s also important to note that these are the early days. Naturally, payday lenders and their ilk start off by targeting the poor and the immigrants. Those are the easiest targets, the low-hanging fruit. They have less access to information, and more of them are desperate. But WMDs generating fabulous profit margins are not likely to remain cloistered for long in the lower ranks. That’s not the way markets work. They’ll evolve and spread, looking for new opportunities. WMDs are targeting us all. And they’ll continue to multiply, sowing injustice, until we take steps to stop them.
How do we start to regulate the mathematical models that run more and more of our lives? I would suggest that the process begin with the modelers themselves. Like doctors, data scientists should pledge a “First do no harm” Hippocratic Oath, one that focuses on the possible misuses and misinterpretations of their models. Following the market crash of 2008, two financial engineers, Emanuel Derman and Paul Wilmott, drew up such an oath. It reads in part:
• I will remember that I didn’t make the world, and it doesn’t satisfy my equations.
• Though I will use models boldly to estimate value, I will not be overly impressed by mathematics.
That’s a good philosophical grounding. But solid values and self-regulation rein in only the scrupulous. To eliminate WMDs, our laws need to change, too.
Data is not going away. Nor are computers — much less mathematics. Predictive models are, increasingly, the tools we will be relying on to run our institutions, deploy our resources, and manage our lives. But these models are constructed not just from data but from the choices we make about which data to pay attention to — and which to leave out. Those choices are not just about logistics, profits, and efficiency. They are fundamentally moral.
We must come together to police these WMDs, to tame and disarm them. My hope is that they’ll be remembered, like the deadly coal mines of a century ago, as relics of the early days of this new revolution, before we learned how to bring fairness and accountability to the age of data. Math deserves much better than WMDs, and democracy does too.
Cathy O’Neil is a data scientist and author of the blog mathbabe.org. She earned a Ph.D. in mathematics from Harvard and taught at Barnard College before working for the hedge fund D.E. Shaw. O’Neil is co-author of Doing Data Science: Straight Talk from the Frontline and appears weekly on the Slate Money podcast.
This article is featured in the March/April 2017 issue of The Saturday Evening Post.