AI cheating tools are succeeding. The focus isn't on detection.
Videos promoting AI for homework assistance are ubiquitous, and they consistently advertise that the output is safe from detection. A New York Times investigation reveals that TikTok and YouTube are now filled with guides offering students two types of tools. Humanizers rewrite AI-generated content to avoid chatbot-like language, while autotypers insert words into a document over an extended period, simulating typos, deletions, and revisions to create the illusion of authentic writing. Both tools are designed to bypass the software educators use to identify AI-generated work.
Here’s the troubling aspect: some companies providing detection solutions also market the tools that bypass them. Grammarly, currently owned by Superhuman, offers educators an “authorship” checker to analyze a document’s history for signs of AI involvement. Simultaneously, it can generate original text, enhance it, and rephrase parts that may trigger detection. GPTZero, initially developed as a Princeton thesis, is capable of producing complete papers, complete with citations, in mere seconds. The NYT discovered a marketer had created a fictitious teaching assistant persona on TikTok to promote these tools to students.
Jenny Maxwell, who oversees education at Superhuman, was candid about what this means, describing the ongoing competition between detection and evasion as a “dead end.” Her conclusion: “Bigger cat, bigger mouse.”
Moreover, the effectiveness of these detectors is questionable. Research from the University of Florida assessed the five leading AI text detectors and found false-negative rates soaring as high as 99.6 percent, with minor vocabulary changes easily rendering many of them ineffective, according to Digital Trends. The tools also produce false positives, especially flagging non-native English speakers. Schools relying on these detectors for disciplinary actions are therefore on shaky ground, as the technology they depend on is, by its creators' admission, faltering.
In response to these challenges, educational institutions are adapting, with measures ranging from sensible to extreme. On the sensible end, Harvard professors are increasingly favoring oral and handwritten exams, which cannot be taken by AI. On the extreme end is coercive action. India blocked Telegram for several days to prevent cheating during the national medical school entrance exam, after the test was annulled and postponed due to a suspected leak. This exam attracts over two million candidates vying for approximately 100,000 spots. Digital rights organizations criticized the shutdown as excessive, reflecting a broader trend of governments implementing harsh measures against AI misuse.
Looking at the bigger picture, the anxiety surrounding cheating appears to stem from an enduring issue: the education system has reduced learning to a single numerical value, the grade. Philosopher C. Thi Nguyen refers to this as “value capture,” where an external metric takes precedence over what it was designed to measure. In his book "The Score," recently reviewed by MIT Technology Review, he uses GPA as a prime example, explaining how students shift their focus from understanding to merely attaining grades. This exemplifies Goodhart's Law: when a metric becomes a goal, it loses its validity as a measure.
AI serves as the most effective tool yet devised for optimizing this target. If the objective of an essay is purely the score rather than the thought process behind it, delegating the thinking becomes a rational choice, even as research highlights that this cognitive offloading undermines genuine skill development.
Even those creating this technology express unease. Jack Clark, co-founder of Anthropic, remarked to the BBC that the industry has a “gas pedal,” but lacks a “brake pedal,” noting that his company’s model is now responsible for writing much of its code. Anthropic has called for a coordinated halt on advanced AI developments. Meanwhile, Maxwell argues that denying students access to AI constitutes “educational malpractice,” as they are bound to utilize it in the workforce regardless.
Both perspectives hold merit. The competition in detection cannot be won, and detection was never the core issue. The more challenging question, which schools have sidestepped for decades, concerns the true purpose of grades. AI did not create this dilemma; it merely brought it to the forefront, making it impossible to ignore. Until a resolution is found, the bigger cat will keep chasing the bigger mouse, and the mouse will keep getting away.
