From the Editor’s Desk: On The Student’s “AI Problem”

From the Editor’s Desk: Editor-in-Chief Anna Wang ’28 considers Max Froomkin ’28’s recent investigation of AI use in The Student, highlighting several issues with AI detectors while exploring ways to tackle the AI problem.

Dear readers of The Student,

My name is Anna Wang and I am one of the editors-in-chief of The Student. If you’ve written for us, you may have seen my name appear among the many suggestions and comments in your article. If you’ve been reading The Student, you may know me as the purple-haired sophomore who brings you big news and occasional satires.

Today, I write to discuss an issue that was recently and repeatedly brought to my attention: an “artificial intelligence (AI) problem” in our Opinion section. In an April 15 letter, Max Froomkin ’28 pointed out that 33 of the 109 Opinion articles written in the 2025-26 academic year were marked by the AI detection tool Pangram as containing AI, with 13 marked as “entirely AI written.” Since then, I have had many discussions, both within the Opinion section and with our editorial board as a whole, about the implications of these findings for our paper.

I would like to begin by extending a sincere thank you to Froomkin, who painstakingly ran almost 15 years of Opinion articles through Pangram and presented the data in a compelling way. Froomkin dedicated his time to showing us an issue that had been flying under our radar, and I am grateful for his decision to call for improvement through a thoughtful letter instead of jumping to individual accusations, which would undeniably have been the easier option. I write today to reflect on the AI problem and to discuss potential plans moving forward. I also write to encourage you to share your thoughts candidly, as Froomkin has graciously done.

Froomkin’s research prompted me to take a closer look at AI use in our Opinion section, and it provided a good foundation for doing so. What Froomkin’s findings did not show — and what I wanted to test — was whether the Opinion pieces identified as AI-generated were spread more or less evenly among many Opinion writers or concentrated among a few.

I ran all Opinion articles from the past three issues through both Pangram and GPTZero, which Froomkin also used for a second opinion. Out of 23 articles, Pangram listed one as “entirely AI-written.” Five were labeled “mixed,” with percentages of AI use ranging from 12% to 60%. Meanwhile, GPTZero listed two articles as “AI-generated”: one with moderate confidence, the other with high confidence. Among the articles classified as “human-written,” GPTZero noted only moderate confidence for three, suggesting 4% to 13% possible AI use. It was “uncertain” about one article, which had 34% possible AI use.

I then selected several additional Opinion articles by writers whose recent pieces were flagged as “AI-generated” or “mixed” (provided they had previously written for us) and ran further trials. During this process, I began to see a pattern: These writers’ earlier articles were far more likely to be labeled “AI-generated” or “mixed” than the rest of the Opinion section’s output.

What should I make of these numbers? An easy conclusion would be that I have pinpointed the writers who perhaps relied too heavily on AI. But I would invite you to consider another possibility — one that requires us to go the extra mile and consider the fundamental flaws in AI detectors’ mechanisms.

AI detectors rely on pattern recognition. Pangram, for example, uses machine learning to tokenize text input and transform each token into a 0 or 1 output — 0 being human, 1 being AI — by “finding the subtle stylistic choices that AI uses consistently.” GPTZero detects AI use based on “perplexity” and “burstiness”: perplexity is the grammatical oddness and unpredictability of a sentence, and burstiness is how sentence-level perplexity varies over the entire piece. This means that these tools have an inevitable tendency to flag certain writing styles as more likely to be AI-generated than others.
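For readers who want a concrete sense of this mechanism, below is a rough sketch in Python of how a perplexity-and-burstiness detector works in principle. It is an illustration only, not GPTZero’s actual code: the choice of GPT-2 as the reference language model and the scoring itself are my assumptions.

```python
# Illustrative sketch of perplexity/burstiness scoring, the broad idea
# behind detectors like GPTZero. Not any vendor's real code; GPT-2 is
# an assumed stand-in for the reference language model.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_perplexity(sentence: str) -> float:
    # Perplexity: how "surprised" the language model is by the sentence.
    # Low perplexity means very predictable text, which detectors of this
    # kind tend to read as AI-like.
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return math.exp(loss.item())

def burstiness(sentences: list[str]) -> float:
    # Burstiness: how much perplexity swings from sentence to sentence.
    # Human writing tends to alternate between plain and surprising
    # sentences; uniformly predictable text scores low here.
    scores = [sentence_perplexity(s) for s in sentences]
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / len(scores)
    return variance ** 0.5

sentences = [
    "The cat sat on the mat.",
    "Quantum weirdness still rattles the tidiest of physicists.",
]
print([round(sentence_perplexity(s), 1) for s in sentences])
print(round(burstiness(sentences), 1))
```

The point of the sketch is the logic, not the numbers: text that the reference model finds uniformly predictable scores low on both measures, and that is exactly the profile these detectors associate with AI — regardless of who actually wrote it.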

Many studies have shown that AI detectors are biased against non-native English speakers because they use vocabulary richness and grammatical complexity as indicators of original work. As a result, predictable word choices and simple phrases often return high AI-use percentages. Universities across the country, such as Vanderbilt University and the University of Pittsburgh, have disabled Turnitin’s AI detector for submitted work since 2023 due to this concern. Neurodivergent writers are also subject to higher false-positive rates for similar reasons, including a tendency toward repetition in their writing. In addition, the developers of AI detectors have admitted these tools’ flaws: OpenAI, the developer of ChatGPT, disabled its AI classifier in June 2023, stating that “sometimes human-written text will be incorrectly but confidently labeled as AI-written.”

The question becomes more complicated when AI detectors disagree with each other. During my trials, an Opinion article labeled 100% AI by Pangram was listed as only 46% AI by GPTZero. Another article labeled 100% human-written by Pangram showed up as 80% AI on GPTZero. I encountered enough of these inconsistencies to suggest that they are a common theme among AI detectors. Whether an article is labeled “human” can sometimes come down to which AI detector you use, no matter how accurate these tools individually claim to be.

I am not, however, dismissing these detectors’ self-proclaimed reliability entirely as a marketing gimmick. Nor am I denying the existence of an AI problem. If you look back at Froomkin’s data, you’ll see that suspected AI use emerged a year or two after the introduction of ChatGPT, and the numbers seem to have grown with the subsequent proliferation of other generative AI tools. Unless Opinion writers happened to adopt, right around the birth of large language models, the very writing styles these AI detectors take issue with, the AI problem cannot be dismissed entirely as a fault of the detectors.

That leaves us with the theory that, despite possible false positives resulting from inherent flaws in these tools, at least some of these numbers accurately reflect AI usage in the Opinion section. Both Froomkin’s findings and my own analysis, admittedly much more limited in scope, appear to support this theory. The harder question, and the one I want to take up here, is what kind of AI involvement constitutes a problem.

Whether and to what extent The Student should permit the use of AI in Opinion articles is a much broader debate, and not one I am prepared to take on today. AI policies across major journalistic outlets remain rather vague: The New York Times, for example, outlines acceptable AI use in analyzing data and writing headlines, but when it comes to how AI should not be used, it states only that “we don’t use [AI] to write articles, and journalists are ultimately responsible for everything that we publish.” But having run my own analysis, the lingering question in my mind is this: Is there a difference between an article built entirely on a human idea, written by a human, and edited by AI, and an article generated entirely by AI from a human’s vague prompts and then edited by a human?

I am inclined to say that yes, there is a difference, and that the difference is a fundamental one. The first example is human originality at work. Editing, so long as it is done with restraint, does not change authorship. We edit each and every article before publication to strengthen its argument and improve clarity — sometimes with a heavy hand, but none of us is entitled to claim authorship of the piece, no matter how much effort we’ve put in. The second example, however, is more problematic. Prompting ChatGPT to generate an article on a certain topic involves neither human ideas nor human writing, whether or not the generated text is then edited by humans. Editing does not confer authorship, and human editing of AI-generated text is no exception.

My understanding of AI detectors is that they mainly work at the level of words and sentences, not at the level of ideas. GPTZero has several categories that suggest AI use (“Mechanical Transitions,” “Formulaic Flow,” “Predictable Syntax,” and “Robotic Formality,” to name a few), all of which focus on words and sentences. Though Pangram does not provide exact reasoning for labeling sentences as AI-generated, its “evidence” tab lists the use of em-dashes and triplets as patterns that “help [users] see why AI text often reads differently” (some of these — in fact — I use quite liberally in my own drafting, revising, and editing). Each tool comes with a function that flags words and sentences suspected of being AI-generated, but neither appears able to detect whether an idea comes from a human or from AI.

If this is the case, then both examples I mentioned above — AI-edited human writing and human-edited AI writing — will be flagged by detectors as AI-generated. But is it the right call to reject both articles solely on the basis of AI involvement in the writing and editing process? AI detectors are likely not well-suited to distinguish between the two, but we, as humans and not algorithms, should treat them differently. One is a genuine human idea that enriches the campus community. The other results in our readers unknowingly digesting text regurgitated by machines.

I take a lot of pride in the editorial board’s human editing process for every article, but this process, as I said above, does not confer or alter authorship. Authorship therefore has to be addressed as a policy issue at the submission stage. But here I find myself running into the exact same problem that the college’s academic departments are struggling to tackle: How do we ensure that the work we receive is authentic? We have already seen departments shift to blue-book exams over concerns about AI use. Should The Student implement similar measures, requiring op-eds to be written in person, or even by hand, to prevent AI usage?

I don’t think that’s a good use of our time and effort. The authorship question can turn into an endless witch hunt, because there is often no truth to be found: no method can prove the absolute presence or absence of AI in any article. The arms race between large language models and AI detectors is endless: as AI models become better trained, AI detectors adjust in response, “humanizers” arise to help AI writing avoid detection, and AI writing becomes more elusive. At some point, the authorship question becomes an unproductive pursuit that only takes away from the intellectual depth and diversity The Student aims to offer. I would especially advise against taking these discussions to the anonymous platform Fizz, which is itself built on an algorithm that incentivizes personal attacks and incendiary comments.

Let me be clear: The Student discourages AI use. We do not believe in AI-generated writing. Nor do we believe in AI-generated verdicts. As editor-in-chief, I will not take retroactive public action against any piece that a machine-learning algorithm has identified as likely AI-generated. Instead, what I am considering is an emphasis on the requirement that no one may claim authorship of an article they did not write. This requirement might seem self-explanatory, but with the prevalence of AI, I see a need for an explicit reminder. I am also planning to hold more discussions with the editorial board to determine parameters for acceptable AI use in research, translation, and proofreading, just as The New York Times has done. We may not reach a unified AI policy soon, and we doubt the college’s administration will either, but I hope we can continue to evolve with new technologies and learn alongside each other.

I, along with the entire editorial board, approach readers’ concerns with careful consideration and deep introspection. I welcome your thoughts, reflections, and challenges to this article. Your constructive criticism as readers is one of the many ways we continue to grow as a paper and as student journalists.

Finally, for those who are curious, both Pangram and GPTZero marked this article as 100% human with high confidence.