Jul 10, 2020 7:00 AM

Meet the Secret Algorithm That's Keeping Students Out of College

The International Baccalaureate program canceled its high-stakes exam because of Covid-19. The formula it used to "predict" scores puzzles students and teachers.

Eighteen-year-old Anahita Nagpal fears her plans to start training this fall to be a doctor have been derailed by a statistical model.

Nagpal, who lives in Göttingen, Germany, had been offered a premed place and scholarship at NYU. Her acceptance was dependent on her results in the International Baccalaureate diploma, a two-year high school program recognized by colleges and taken by more than 170,000 students this year, most in the US. But she scored more poorly than expected.

Teen regrets about grades aren’t unusual, but the way the foundation behind the IB Diploma Programme calculated this year’s grades was. The results, released Monday, were determined by a formula that the foundation, known as IB, hastily deployed after canceling its usual springtime exams due to Covid-19. The system used signals including a student’s grades on assignments and the results of past graduates at their school to predict what they would have scored had the pandemic not prevented in-person tests.

Nagpal and many other students, parents, and teachers say those predictions misfired. Many students received suspiciously low scores, they say, shattering their plans for the fall and beyond. Nagpal's backup plan if she missed out on NYU was to study medicine in Germany, but she doesn't think her lower-than-expected grades will qualify her for a place. "Like so many, I was extremely shocked," she says. Nagpal later received an email from NYU saying it had not yet made a decision on her admission. NYU said it does not comment on individual cases.

More than 15,000 parents, students, and teachers have signed an online petition asking IB to “take a different approach with their grading algorithm and to make it fairer.” The foundation declined to answer questions about its system but said it had been checked against five years of past results and that disappointed students could use its existing appeals process, which comes with a fee. The foundation released summary statistics showing that this year’s average score was slightly higher than last year’s, and it says the distribution of grades was similar.

One math teacher at a school in the Middle East says IB should disclose the full workings of its model for outside scrutiny. He and a colleague with a math PhD have been puzzling over its design since several students lost scholarships to top universities after receiving results much lower than their teachers expected. Some of the affected students are now unsure how they’ll pay for college. “My only guess is a flawed model,” he says.

Concerns about flawed math models are growing as more companies and governments apply computers to traditionally human problems such as making bail decisions, identifying criminal suspects, and deciding what counts as hate speech. Rooting out bias and inaccuracy in such systems is a growing field of activism and academia.

People questioning IB’s algorithm-derived grades are now raising some of the same issues. They’re wondering how the system was designed and tested, why its workings weren’t fully disclosed, and whether it makes sense to use a formula to determine the grades that can shape a person’s opportunities in life.

When Covid-19 seized hold of the world in March, many teens in their final year of high school were left in a precarious position. Shelter-in-place orders made it challenging or impossible to complete the final assignments or tests that could determine their college and life choices.

Test providers scrambled to devise new ways to assess students. In the US, Educational Testing Service, which provides the GRE, and the College Board, which runs AP Exams, moved their tests online. That brought quirks and glitches, such as requiring students to take their tests simultaneously regardless of time zone and forcing retakes after technical errors, but it maintained a semblance of the normal process.

IB, headquartered in Geneva, opted to use a statistical formula instead—adding to the growing list of tech fixes proposed to automate away fallout from the pandemic. The workings of the IB diploma—and the timing of the results—proved particularly harmful for IB students applying to US colleges. Unlike AP tests, which are typically separate from high school grades, the IB results are intended to reflect a student’s work for the year. IB students are often granted college admission based on predicted grades, and they submit their final results when they become available over the summer. Some colleges, including NYU and Northeastern, warn on their admissions pages that students whose IB results don’t get close enough to those predictions may lose their place.

In normal times, IB diploma students select six subjects, from options such as physics and philosophy, and receive final grades determined in part by assignments but mostly by written tests administered in the spring. The program is offered by nearly 900 public schools in the US and is common in international schools around the world. In March, IB canceled all tests and said it would calculate each student’s final grades using a method developed by an unnamed educational organization that specializes in data analysis.

The idea was to use prior patterns to infer what a student would have scored in a 2020 not dominated by a deadly pandemic. IB did not disclose details of the methodology but said grades would be calculated based on a student’s assignment scores, predicted grades, and historical IB results from their school. The foundation said grade boundaries were set to reflect the challenges of remote learning during a pandemic. For schools where historical data was lacking, predictions would build on data pooled from other schools instead.
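IB has not published its formula, so the following is only an illustrative sketch of how the three inputs it named (assignment scores, teacher-predicted grades, and a school's historical results) could be combined, with a fallback to pooled data for schools with little history. The function names, the equal weighting, the five-record threshold, and all of the numbers are assumptions made for illustration. They are not IB's method or data.

```python
from statistics import mean

MIN_HISTORY = 5  # hypothetical threshold for "enough" school history


def school_offset(history, pooled_history, min_history=MIN_HISTORY):
    """Average past gap between final grade and teacher-predicted grade.

    history / pooled_history: lists of (teacher_predicted, final_grade) pairs.
    Falls back to the pooled records when the school has too few of its own.
    """
    records = history if len(history) >= min_history else pooled_history
    return mean(final - predicted for predicted, final in records)


def estimate_grade(coursework, teacher_predicted, offset, weight=0.5):
    """Blend coursework and teacher prediction, shift by the school offset,
    then clamp to the 1-7 scale used for IB subject grades.

    Assumes (for illustration) that coursework is already on the 1-7 scale.
    """
    raw = weight * coursework + (1 - weight) * teacher_predicted + offset
    return max(1, min(7, round(raw)))


# Invented records: past students at school A usually finished below prediction.
school_a = [(6, 5), (6, 6), (5, 4), (7, 6), (6, 5)]
# School B has only two records, so it borrows the pooled history.
school_b = [(6, 6), (5, 5)]
pooled = school_a

offset_a = school_offset(school_a, pooled)
offset_b = school_offset(school_b, pooled)
grade = estimate_grade(coursework=6, teacher_predicted=7, offset=offset_a)
```

In this toy setup a student predicted a 7 by her teacher is pulled toward 6 because of how earlier classes at her school performed, and school B's students are adjusted using records from other schools entirely.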

In a video IB posted about the process, Antony Furlong, the foundation’s manager for assessment research and design, said the system essentially created “a bespoke equation” for every school.

One visual arts teacher at a US school says what she and coworkers have seen suggests it wasn’t well tailored. “When I saw the marks, I was floored,” she says. “I am always conservative in my predicted grades, but every single student except one were downgraded.” Of 15 students she works with, four have to rethink their plans for this fall, because they missed out on college places, something she didn’t expect for any of them.

Determining whether IB’s system had flaws is challenging without knowing its formula or the inputs and outputs. Just because some humans don’t like the outputs of a data analysis doesn’t mean that it’s incorrect. But Suresh Venkatasubramanian, a professor at the University of Utah who studies the social consequences of automated decisionmaking, says it appears IB could have deployed its system more responsibly. “All this points to what happens when you try to install some sort of automated process without transparency,” he says. “The burden of proof should be on the system to justify its existence.”

Data analysis is more powerful than ever but remains far from being able to predict complex future human actions. Models that extrapolate from past statistical trends can end up treating people unfairly because their circumstances are different, even if results match past patterns on average.
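A toy calculation makes the point concrete. The grades below are invented for illustration and are not IB data; the helper name `summarize` is likewise hypothetical. The sketch shows predictions whose average matches the true average exactly while individual students are still off by a grade or more.

```python
from statistics import mean


def summarize(actual, predicted):
    """Compare class-level averages with per-student prediction errors."""
    errors = [p - a for a, p in zip(actual, predicted)]
    return {
        "mean_actual": mean(actual),
        "mean_predicted": mean(predicted),
        "mean_abs_error": mean(abs(e) for e in errors),
        "downgraded": sum(e < 0 for e in errors),  # students predicted too low
    }


# Invented example: a model that predicts the class average for everyone.
actual = [7, 6, 5, 4, 3]
predicted = [5, 5, 5, 5, 5]
report = summarize(actual, predicted)
```

Here the two averages agree, yet the typical student's grade is wrong by more than a full point, and the strongest students are the ones marked down.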

Venkatasubramanian says that basing a student’s grades on past trends at their school, potentially unrelated to the student’s own school career, could be unfair. Using data from other schools—as IB did for schools with little track record—is a “red flag,” he says, because it would mean some students’ grades were calculated differently than others.

Constance Lavergne, whose son in the UK received lower-than-expected IB grades and missed out on his preferred college, is one of many parents struggling to understand what happened. She says her experience working closely with data analysts in the tech industry makes her suspicious of IB’s methodology. It would naturally generate noisier results for smaller classes, like her son’s, because they offer fewer past data points, she suggests. “There’s something wrong with the algorithm,” Lavergne says.
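Lavergne's statistical intuition can be checked with a small simulation. The sketch below assumes, purely for illustration, that past grades in a class scatter around a mean of 5 with a standard deviation of 1; it says nothing about IB's actual model. It measures how much a class average built from past results wobbles when the class is small versus large.

```python
import random
from statistics import mean, stdev


def spread_of_school_average(class_size, trials=2000, seed=0):
    """Standard deviation of the class average across many simulated classes.

    Each simulated grade is drawn from a normal distribution (mean 5, sd 1),
    an assumption made only for this illustration.
    """
    rng = random.Random(seed)
    averages = [
        mean(rng.gauss(5.0, 1.0) for _ in range(class_size))
        for _ in range(trials)
    ]
    return stdev(averages)


small_class = spread_of_school_average(4)
large_class = spread_of_school_average(40)
```

Under these assumptions the spread shrinks roughly with the square root of the class size, so an average based on four past students is about three times noisier than one based on 40.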

The math teacher in the Middle East said he believed his school had suffered because of how IB announced and calibrated its model. Students at the school submitted their assignments before IB said those assignments would help steer the grading model. Some IB students at other schools had not yet submitted those assignments, allowing them to put in extra effort, aided by knowing they didn’t have to prepare for exams. This weekend, he plans to work with his math PhD colleague and a software package to probe where the IB formula may have gone wrong.

Many students who received disappointing results are now looking to November, when IB typically offers a second round of in-person tests and they can take the written test that was canceled. Nagpal, the would-be medical student, intends to take part, at a cost of about €700 ($791). If Covid-19 disrupts those tests too, she hopes IB will move them online rather than try any more experiments in data-led grading.

Updated, 7-13-20, 7:30pm ET: An earlier version of this article incorrectly said NYU had rescinded Anahita Nagpal's acceptance.