The Duck of Minerva

Half-Baked Thoughts on ChatGPT and the College Essay

By: Dan Nexon — June 2nd 2023 at 15:19

The Chronicle of Higher Education recently ran a piece by Owen Kichizo Terry, an undergraduate at Columbia University, on how college students are successfully using ChatGPT to produce their essays.

The more effective, and increasingly popular, strategy is to have the AI walk you through the writing process step by step. You tell the algorithm what your topic is and ask for a central claim, then have it give you an outline to argue this claim. Depending on the topic, you might even be able to have it write each paragraph the outline calls for, one by one, then rewrite them yourself to make them flow better.

As an example, I told ChatGPT, “I have to write a 6-page close reading of the Iliad. Give me some options for very specific thesis statements.” (Just about every first-year student at my university has to write a paper resembling this one.) Here is one of its suggestions: “The gods in the Iliad are not just capricious beings who interfere in human affairs for their own amusement but also mirror the moral dilemmas and conflicts that the mortals face.” It also listed nine other ideas, any one of which I would have felt comfortable arguing. Already, a major chunk of the thinking had been done for me. As any former student knows, one of the main challenges of writing an essay is just thinking through the subject matter and coming up with a strong, debatable claim. With one snap of the fingers and almost zero brain activity, I suddenly had one.

My job was now reduced to defending this claim. But ChatGPT can help here too! I asked it to outline the paper for me, and it did so in detail, providing a five-paragraph structure and instructions on how to write each one. For instance, for “Body Paragraph 1: The Gods as Moral Arbiters,” the program wrote: “Introduce the concept of the gods as moral arbiters in the Iliad. Provide examples of how the gods act as judges of human behavior, punishing or rewarding individuals based on their actions. Analyze how the gods’ judgments reflect the moral codes and values of ancient Greek society. Use specific passages from the text to support your analysis.” All that was left now was for me to follow these instructions, and perhaps modify the structure a bit where I deemed the computer’s reasoning flawed or lackluster.

The kid, who just completed their first year at Columbia, confirms that this approach is already widespread on their campus.

I spent a few hours yesterday replicating the process for two classes in my rotation: one on the politics of science fiction, the other on global power politics. Here are my takeaways about the current “state of play.”

First, professors who teach courses centered on “classic” literary and political texts need to adapt yesterday. We don’t expect students to make original arguments about Jane Austen or Plato; we expect them to wrestle with “enduring” issues (it’s not even clear to me what an “original” argument about Plato would look like). ChatGPT has—as does any other internet-based LLM—access to a massive database of critical commentary on such venerable texts. These conditions make the method very effective.

Second, this is also true for films, television, popular novels, and genre fiction. I ran this experiment on a few of the books that cycle on and off my “science-fiction” syllabus—including The Fifth Head of Cerberus, The Dispossessed, The Forever War, and Dawn—and the outcomes were pretty similar to what you’d expect from “literary” classics or political philosophy.

Third, ChatGPT does significantly less well with prompts that require putting texts into dialogue with one another. Or at least with texts that aren’t fixtures of 101 classes.

For example, I asked ChatGPT to help me create an essay that reads The Forever War through Carl Schmitt’s The Concept of the Political. The results were… problematic. I could’ve used them to write a great essay on how actors in The Forever War construct the Taurans as a threat in order to advance their own political interests. Which sounds great. Except that’s not actually Schmitt’s argument about the friend/enemy distinction.

ChatGPT did relatively better on “compare and contrast” essays. I used the same procedure to try to create an essay that compares The Dispossessed to The Player of Games. This is not a common juxtaposition in science-fiction scholarship or in online writing about science fiction, but it’s extremely easy to put the two works in conversation with one another. ChatGPT generated topics and outlines that picked up on that conversation, but in a very superficial way. It gave me what I consider “high-school starter essays,” with themes like ‘both works show how an individual can make a difference’ or ‘both works use fictional settings to criticize aspects of the real world.’

Now, maybe my standards are too high, but this is the level of analysis that leaves me asking “and?” Indeed, the same is true of the example used in the Chronicle essay: it’s very Cliff’s Notes. That said, it’s entirely possible to get “deeper” analysis via ChatGPT. You can drill down on one of the sections it offers in a sample outline; you can ask it more specific prompts. That kind of thing.

At some point, though, this starts to become a lot of work. It also requires you to actually know something about the material. 

Which leads me to my fourth reaction: I welcome some of what ChatGPT does. It consistently provides solid “five-paragraph essay” outlines. I lose track of how many times during any given semester I tell students that “I need to know what your argument is by the time I finish your introduction” and that “the topic of an essay is not its argument.” ChatGPT’s outlines not only do that, but they also remind students to do it.

In some respects, ChatGPT is just doing what I do when students meet with me about their essays: helping them take very crude ideas and mold them into arguments, suggesting relevant texts to rope in, and so forth. As things currently stand, I think I do a much better job on the conceptual level, but I suspect that a “conversation” with ChatGPT might be more effective at pushing them on matters of basic organization.

Fifth, ChatGPT still has a long way to go when it comes to the social sciences—or, at least, International Relations. For essays handling generic 101 prompts it did okay. I imagine students are already easily using it to get As on short essays about, say, the difference between “balance of power” and “balance of threat,” or on the relative stability of unipolar, bipolar, and multipolar systems.

Perhaps they’re doing so with a bit less effort than it would take to Google the same subjects and reformulate what they find in their own words? Maybe that means they’re learning less? I’m not so sure.

The “superficiality” problem became much more intense when I asked it to provide essays on recent developments in the theory and analysis of power politics. When I asked it for suggestions for references, at least half of them were either total hallucinations or pastiches of real ones. Only about a quarter were actually appropriate, and many of these were old. Asking for more recent citations was a bust. Sometimes it simply changed the years.

I began teaching in the late 1990s and started as a full-time faculty member at Georgetown in 2002. In the intervening years, it has become more and more difficult to know what to do about “outside sources” for analytical essays.

I want my students to find and use outside articles—which now means finding them through Google Scholar, JSTOR, and other databases. But I don’t want them to bypass class readings for (what they seem to think are) “easier” sources, especially as many of them are now much more comfortable looking at a webpage than with reading a PDF. I would also be very happy if I never saw another citation to “journals” with names like ProQuest and JSTOR.

I find that those students who do (implicitly or explicitly) bypass the readings often hand in essays with oddball interpretations of the relevant theories, material, or empirics. This makes it difficult to tell if I’m looking at the result of a foolish decision (‘hey, this website talks about this exact issue, I’ll build my essay around that’) or an effort to recycle someone else’s paper. 

The upshot is that I don’t think it’s obvious that LLMs are going to generate worse educational outcomes than we’re already seeing.

Which leads me to the sixth issue: where do we go from here? Needless to say, “it’s complicated.”

The overwhelming sentiment among my colleagues is that we’re seeing an implosion of student writing skills, and that this is a bad thing. But it’s hard to know how much that matters in a world in which LLM-based applications take over a lot of everyday writing. 

I strongly suspect that poor writing skills are still a big problem. It seems likely that analytic thinking is connected to clear analytic writing—and that the relationship between the two is often both bidirectional and iterative. But if we can harness LLMs to help students understand how to clearly express ideas, then maybe that’s a net good.

Much of the chatter that I hear leans toward abandoning—or at least deemphasizing—the use of take-home essays. That means, for the vast majority of students, doing their analytic writing in a bluebook under time pressure. It’s possible that this makes strong writing skills even more important, as it deprives students of the ability to get feedback on drafts and help with revisions. But I’m not sure it helps to teach those skills, and it will bear even less resemblance to any writing that students do after college or graduate school than a take-home paper does.

(If that’s the direction we head in, then I suppose more school districts will need to reintroduce (or at least increase their emphasis on) instruction in longhand writing. It also has significant implications for how schools handle student accommodations; it could lead students to more aggressively pursue them in the hope of evading rules on the use of ChatGPT, which could in turn reintroduce some of the Orwellian techniques used to police exams during the height of the pandemic).

For now, one of the biggest challenges to producing essays via ChatGPT remains the “citation problem.” But given various workarounds, professors who want to prevent the illicit use of ChatGPT probably already cannot pin their hopes on finding screwy references. They’ll need to base more of their grading not just on whether a student demonstrates the ability to make a decent argument about the prompt, but on whether they demonstrate a “deeper” understanding of the logic and content of the references that they use. Professors will probably also need to mandate, or at least issue strict directions about, what sources students can use.

(To be clear, that increases the amount of effort required to grade a paper. I’m acutely aware of this problem, as I already take forever to mark up assignments. I tend to provide a lot of feedback and… let’s just say that it’s not unheard of for me to send a paper back to a student many months after the end of the class.)

We also need to ask ourselves what, exactly, is the net reduction in student learning if they read both a (correct) ChatGPT explanation of an argument and the quotations that ChatGPT extracts to support it. None of this strikes me as substantively all that different from skimming an article, which we routinely tell students to do. At some level, isn’t this just another route to learning the material?

AI enthusiasts claim that it won’t be long before LLM hallucinations—especially those involving references—become a thing of the past. If that’s true, then we are also going to have to reckon with the extent to which the use of general-purpose LLMs creates feedback loops that favor some sources, theories, and studies over others. We are already struggling with how algorithms, including those generated through machine learning, shape our information environment on social-media platforms and in search engines. Google Scholar’s algorithm is already affecting the citations that show up in academic papers, although here at least academics mediate the process.

Regardless, how am I going to approach ChatGPT in the classroom? I am not exactly sure. I’ve rotated back into teaching one of our introductory lecture courses, which is bluebook-centered to begin with. The other class, though, is a writing-heavy seminar. 

In both of my classes I intend to at least talk about the promises and pitfalls of ChatGPT, complete with some demonstrations of how it can go wrong. In my seminar, I’m leaning toward integrating it into the process and requiring that students hand in the transcripts from their sessions.

What do you think?

The Duck of Minerva

LIVE recording of Whiskey & IR Theory at BISA 2023

By: Dan Nexon — May 12th 2023 at 15:32

Don’t miss the live recording of episodes 32 and 33 of Whiskey & IR Theory on June 21, 2023, starting at 3pm. We’ll be taping at the BISA annual conference. Rumors suggest that there may be whisky for tasting and schwag for… something.

Episode 32 will be in “classic format.” We’ll discuss Robert Cox’s classic 1981 article, “Social Forces, States and World Orders: Beyond International Relations Theory.”

Episode 33 will be a “whiskey optional” on status and international-relations theory.

BISA attendees should register in advance for one or both of the special sessions.

The Duck of Minerva

Deterrence can never fail, it can only be failed

By: Dan Nexon — April 8th 2023 at 16:01

The government of a country makes explicit or implicit threats to another: “if you cross this line, we will inflict harm upon you.” The threat fails; the other government crosses the designated line. Has deterrence failed?

Well, yes. Of course. By definition. It is, for example, unequivocally true that the United States did not deter Russia from invading Georgia in 2008, nor Ukraine in 2014, nor Ukraine (again) in 2022. Should you have any doubts about this, you can always go read a nearly four-thousand-word Foreign Policy article on the subject.

I agree with its authors, Liam Collins and Frank Sobchak, that U.S. policymakers made a number of mistakes in handling Russia. Trump’s rhetoric concerning NATO, Russia, and Ukraine did not exactly help make U.S. deterrence credible; then again, Trump wasn’t in office when Putin ordered the invasion. In retrospect, Obama’s decision to withhold lethal aid from Ukraine was probably a mistake, as not much seemed to happen when the Trump administration reversed course. But do we really think that providing more Javelins in 2015 or 2016 would have deterred Putin’s invasion?

Apparently, yes. For Collins and Sobchak, Washington’s failure to deter Russia means that U.S. policymakers should, ipso facto, have adopted a more hardline policy toward Russia. But much like the opposite claim—that Georgia and Ukraine “prove” that the U.S. should have adopted a more accommodating approach toward Russia, for example, by not expanding NATO—we’re looking at reasoning that is less “ipso facto” than “post hoc ergo propter hoc.”

That is, just because X preceded Y does not mean X caused Y. In the context of policy analysis we might add that just because Y is bad doesn’t mean Y’ would be better.

Sometimes, X isn’t even X. The fact that ‘deterrence failed’ doesn’t imply that any attempt to accommodate Russia was a capitulation to Moscow. Sometimes the opposite is true.

For instance, Collins and Sobchak argue that Ukraine shows the folly of Obama’s decision to cancel the “Third Site” anti-ballistic missile system, which involved placing a radar in the Czech Republic and interceptors in Poland.

But the Obama administration replaced the “Third Site” with the European Phased Adaptive Approach (EPAA), which (as the Russians soon figured out) was easier for the United States to upgrade into the kind of system Moscow worried about. EPAA also entailed eventual deployments in Romania; Obama committed to stationing Patriots on Polish territory and “left open the door to stationing new types of missile defense interceptors in Poland, an offer the Poles later agreed to accept.” Moreover, at the Wales NATO summit Obama convinced NATO to affirm that missile defense was part of its collective mission.

Given all of this, it seems bizarre to claim, as Richard Miniter did in 2014, that after “Obama delayed deployment of missile defenses in Eastern Europe, Putin knew he had a free hand to reassemble the old Soviet Union piece-by-piece. Invading his neighbors would now be cost free.”

Now, Collins and Sobchak don’t write anything quite so ridiculous. But they sometimes come within striking distance.

Consider the very opening of the article, which discusses the U.S. response to the Russia-Georgia war:

Recall the aftermath of the 2008 invasion of Georgia. The Bush administration airlifted Georgian soldiers serving in Iraq back to Georgia to fight, provided a humanitarian aid package, and offered tersely worded denouncements and demarches. But it categorically rejected providing Georgia with serious military assistance in the form of anti-tank missiles and air defense missiles and even refrained from implementing punishing economic sanctions against Russia. The United States’ lack of resolve to punish Russia for its gross violation of international law was underscored when U.S. National Security Advisor Stephen Hadley’s remark “Are we prepared to go to war with Russia over Georgia?”—made during a National Security Council meeting after the war started—was later released to the media.

Keep in mind that they’re talking about an effort to provide anti-tank missiles and air-defense systems during a war that lasted five days—one in which Russia systematically annihilated the shiny systems that the United States and its partners had previously provided. If the argument is that the United States should have given Georgia anti-tank weapons or air-defense missiles after the conflict, then (while that might have been a good idea) it’s not clear to me how that would’ve signaled U.S. resolve.

(Stephen Hadley’s remark first appeared, if I remember correctly, in Ron Asmus’ book about the Georgia war. So the passive voice is definitely doing some work here. At the time, Hadley refused to comment on the specific quotation but did confirm that the Bush administration decided that the risks of using force outweighed the benefits. This “revelation” shouldn’t have surprised anyone, including Moscow, since, you know, the United States did not, in fact, use force. What’s particularly strange about this example is that it’s backwards. What surprised people was the extent of support within the administration for a more aggressive response. The headline of the Politico article that I linked to above wasn’t “The United States didn’t risk war for Georgia.” It was “U.S. pondered military use in Georgia.”)

It is not obvious that the United States could have secured support for, say, more punishing sanctions. The Georgia War did not deter France from closing a deal to sell two Mistral-class helicopter carriers to Russia. Paris only cancelled that sale after the 2014 invasion of Ukraine, when Hollande (rather than Sarkozy) was president (interesting side note here).

But, as is typical for this genre, the article never seriously considers either the viability or the downside risks of alternative policies. This is… problematic… given that it is very difficult to assess what the world would look like after fifteen years of concatenating changes produced by different policy decisions.

None of this means that we shouldn’t evaluate past policies and work through counterfactuals. That’s a crucial element of policy analysis, social-scientific inquiry, and policymaking. But Collins and Sobchak, like too many others, don’t even do the bare minimum—in their case, despite writing a piece that runs as long as a short academic article in International Relations.

That failure is particularly pernicious when an obviously “bad outcome” makes it easy to gloss over. In fact, the last sentence of Collins and Sobchak’s article gives the game away:

The sad irony is that U.S. leaders, of both parties, chose to avoid deterrence for fear of escalating conflict—only to find themselves continually escalating their support once conflict started. Time after time, the United States chose the option that was perceived as the least provocative but that instead led to the Russians becoming convinced that they were safe to carry out the most provocative action of all: a full-scale invasion of Ukraine.

The United States ignored the eternal wisdom of the Latin phrase Si vis pacem, para bellum (“If you want peace, prepare for war”) and instead hoped that half-steps and compromise would suffice. While so far those decisions have prevented direct conflict between two nuclear-armed superpowers, they have caused Russia and the West to be locked in a continuing series of escalations with an increasing danger of a miscalculation that could lead to exactly that scenario.

The Duck of Minerva

The Best Propaganda is True

By: Dan Nexon — February 14th 2023 at 19:54

David Pierson at The New York Times:

While many in the world see the Chinese spy balloon as a sign of Beijing’s growing aggressiveness, China has sought to cast the controversy as a symptom of the United States’ irrevocable decline.

Why else would a great power be spooked by a flimsy inflatable craft, China has argued, if not for a raft of internal problems like an intensely divided society and intractable partisan strife driving President Biden to act tough on Beijing?

This gets at why “the balloon incident” genuinely scares me; it suggests that the United States is doomed to the ratchet effects of domestic outbidding over who can be “tougher” on China.

That’s a recipe for threat inflation, poor policy decisions, and (even more) toxic domestic politics.

The Duck of Minerva

Episode 27: Everything is Relational

By: Dan Nexon — February 1st 2023 at 17:42

It’s a nostalgia episode for our two hosts, Patrick and Dan. 

They tackle Mustafa Emirbayer’s 1997 article in the American Journal of Sociology, “Manifesto for a Relational Sociology.” According to Emirbayer, “Sociologists today are faced with a fundamental dilemma: whether to conceive of the social world as consisting primarily in substances or processes, in static ‘things’ or in dynamic, unfolding relations.” 

Was that also true of International Relations? PTJ and Dan certainly thought so back in 1999. 
Is it still true today? The two may or may not answer this question, but they do work through Emirbayer’s article in no little detail.

Additional works alluded to in this podcast include Bhaskar, A Realist Theory of Science (1975); Emirbayer and Goodwin, “Network Analysis, Culture, and the Problem of Agency” (1994); Emirbayer and Mische, “What Is Agency?” (1998); Mann, The Sources of Social Power, Volume II (1993); Pratt, “From Norms to Normative Configurations: A Pragmatist and Relational Approach to Theorizing Normativity in IR” (2020); Somers, “The Narrative Constitution of Identity: A Relational and Network Approach” (1994); Tilly, Durable Inequality (1998); and Wiener, Contestation and Constitution of Norms in Global International Relations (2018).

https://www.podomatic.com/podcasts/whiskeyindiaromeo/episodes/2023-01-29T14_48_01-08_00