Tom Pepinsky

ChatGPT-4 Knows Malaysia’s Ethnic Order

I am writing a book about Malaysia’s ethnic order*, using the Malay world as a way to think more generally about social categories like ethnicity and how to conceptualize them. This book project combines all of my interests: maritime Southeast Asia, political economy, colonial history (Portuguese, Dutch, English, and Japanese), language, analytic philosophy, social ontology, diaspora studies, criticizing things for not really making sense, trying to subvert paradigms, making custom maps, and statistics. The last of these is the subject of this post.

One of the tasks of my book is to demonstrate that my understanding of Malaysia’s ethnic order is not simply a projection of my own biases onto Malaysian history, society, and culture. To do this, I’ve been using survey data collected from Malaysians (and Indonesians… although that’s not really relevant for today’s post) to try to characterize the country’s ethnic order. An ethnic order, for me, is

the set of beliefs, practices, and ancillary institutions that organize and sustain ethnic categories within a given political community

Such an abstract concept is hard to characterize, and hard to survey people about without supplying exactly the answers that I hope that they will provide (thus invalidating the exercise). So what I have done is to survey Malaysians and ask them questions about what comes to mind when they think of various ethnic groups in the country. They provided open-ended responses, which I analyzed through a Structural Topic Model to estimate the relative salience of different types of responses for different types of ethnic groups.

The result is an analysis showing, for example, that when responses focus on things like language and religion, they are more likely to be about Malays than about Chinese. When they focus on economic matters, they are more likely to be about Chinese than about Indians. And so forth. When the book is done, it will explain why such results are consistent with my particular approach to understanding Malayness in Malaysia.**
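For readers who want a concrete sense of that workflow, here is a minimal Python sketch of the same logic. It is not the structural topic model I actually used (an STM handles document-level covariates like the target ethnic group directly); it substitutes scikit-learn's LDA and then compares average topic prevalence across target groups. The file name and the response and target_group columns are placeholders, not my real data.

# Rough stand-in for the topic-salience analysis: fit a topic model to the
# open-ended responses, then compare average topic prevalence by the ethnic
# group each response was about. Column names and settings are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

df = pd.read_csv("responses.csv")   # columns: response, target_group (placeholders)
vec = CountVectorizer(min_df=2)
X = vec.fit_transform(df["response"])

lda = LatentDirichletAllocation(n_components=5, random_state=0)
theta = lda.fit_transform(X)        # per-response topic proportions

# Mean prevalence of each topic among responses about each group
prevalence = pd.DataFrame(theta).groupby(df["target_group"]).mean()
print(prevalence.round(3))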

But with ChatGPT-4 and other large language models (LLMs) completely blowing up everything, it occurred to me this morning that one might wonder what the computer says about my data. I was inspired to do this by reading this paper from Petter Törnberg, which used ChatGPT-4 to predict the partisan affiliation of the authors of tweets. I’m reporting here what I found from some initial playing around, and it is remarkable, scary, and open to a lot of interpretation (more on this at the end).

Here is what I did: I logged into ChatGPT-4 and opened up my file containing the open-ended survey responses. I then typed the following at the prompt:

I asked a Malaysian survey respondent the following question:

Now I am going to ask you about several different ethnic groups. I would like you to tell me two (2) things that come to mind when you think about these groups. There is no right or wrong answer; you may think of particular words or phrases, or perhaps nothing at all.

I asked this about ethnic Malays, ethnic Chinese, and ethnic Indians in Malaysia. Imagine that the respondent gave the response “kurang rajin Islam.” Which of the three ethnic groups above do you think this response would be associated with?

The words kurang rajin Islam are the responses that one survey respondent actually gave. It is a combination of two responses, because I asked the respondent this question twice (one response was kurang rajin and the other was Islam). After hitting Enter, ChatGPT-4 spit out the following:

Based on the response “kurang rajin Islam,” it is likely associated with the ethnic Malay group in Malaysia.

“Kurang rajin” translates to “less diligent” in English, and “Islam” refers to the religion of Islam. The phrase suggests a perception or association related to the Malay community and their commitment or dedication to their religious practices. It’s important to note that this response reflects the perspective of the individual respondent and should not be taken as a generalization for the entire Malay ethnic group.

This is actually bananas. The computer correctly identified which group that response was associated with. And it kept doing it. I asked

Imagine that the respondent gave the response “pemegang ekonomi.” Which of the three ethnic groups above do you think this response would be associated with?

And the computer said

The response “pemegang ekonomi” translates to “economic holders” or “economic owners” in English. Given this response, it is likely associated with the ethnic Chinese group in Malaysia.

The phrase “pemegang ekonomi” suggests a perception or association related to the economic prowess or prominence of the ethnic Chinese community in Malaysia. It reflects the recognition of their significant contributions to the economic sector in the country. However, it is important to note that this response represents the viewpoint of the individual respondent and should not be taken as a generalization for the entire ethnic Chinese group.

It gets worse, but I’ll spare you. And although it doesn’t always work, these initial forays show that LLMs are remarkably accurate at predicting what group a short survey response was referring to in this specific example. You can think of a lot of other ways that one might study social categories using these AI tools.
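If one wanted to scale this beyond pasting prompts into the chat window, the classification step is easy to script. Here is a minimal sketch using OpenAI's Python client; the model name, the prompt wording, and the classify_response helper are my own illustration, not the exact setup I used above.

# Minimal sketch: ask an LLM which ethnic group a short open-ended survey
# response is most likely describing. Model name and prompt are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

QUESTION = (
    "I asked a Malaysian survey respondent what comes to mind when they think "
    "about ethnic Malays, ethnic Chinese, and ethnic Indians in Malaysia. "
    "Imagine the respondent answered: '{resp}'. Which of the three groups do "
    "you think this response is associated with? Answer with one group only."
)

def classify_response(resp: str) -> str:
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": QUESTION.format(resp=resp)}],
        temperature=0,
    )
    return completion.choices[0].message.content.strip()

print(classify_response("kurang rajin Islam"))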

I don’t know exactly what to do with these results, but I do know that they’re getting at something significant. For one, the AI developers are mostly trying to rein in LLMs to prevent them from saying racist, sexist, and otherwise biased things. But here is a case where we have the reverse: actual statements (and rather explicitly biased in both cases) by real humans that can be classified according to the best guess of who is being referred to. How can this be stopped?

But I think that there’s another level at which to think about this result. That is, knowing that ChatGPT-4 is not thinking, but rather just doing a super-duper prediction model of what words are most likely to be sensible responses to other words, what could one learn from this? Is this evidence of something, and if so, of what? What people think? What people have written down? What Malaysians have written down? What researchers or journalists have written down?

There are even more levels. What about the coarseness of my own prompt (I wrote Malaysia, when I really mean peninsular Malaysia)? Should I give feedback on these results, thus contributing to reinforcement learning? If I write up these results, would I then be further contributing to stereotypes by producing a text that will feed into the same LLMs to produce even more such results? What if my own ethical goals are not to reinforce, but rather to undermine the present social order?

I don’t have any great answer to these questions. I am a pessimist about what LLMs will do to human society who nevertheless marvels at what they can do sometimes.

NOTES

* You can think of an ethnic order as like a “racial order.”

** But not Indonesia. That’s important. And also, not really even Sabah/Sarawak, this is about peninsular Malaysia. That’s also important.


Fuck _____?

Two of the most interesting articles I have read in the past decade share a striking turn of phrase in their titles.

  1. Fuck Chineseness, by Allen Chun
  2. Fuck Nuance, by Kieran Healy

The titles were obviously chosen because they grab your attention. But they also introduce a stark critical argument about a concept (“Chineseness”, “nuance”) that is often taken to be beyond criticism. Each piece is worth a serious ponder.

What I want to know, however, is whether these two titles share a deeper connection. Is the title construction “Fuck ____” a reference to some other author or piece, from an earlier classic of social theory or something like it? Does a reader who is “in the know” catch this reference, and know to interpret the piece that follows with a particular understanding of what it’s meant to connote? I simply don’t know.

In digging around a little bit to figure out where these titles came from, I happened across a syllabus explorer that allows you to search for articles with “fuck” in their title. Click here to see all the results, in all their glory. Those pieces seem interesting all around, but none of them share this particular turn of phrase. I also searched JSTOR for articles with “fuck” in their title, and nothing predates Chun’s 1996 piece.

So the question remains: is this an interesting parallel, or a reference to something else?


Learning about Nguyen the Accomplished

One of the privileges of teaching Southeast Asian Politics is how it helps me to discover new facts, new anecdotes, and new perspectives on old subjects. Sometimes I learn these things from my students, sometimes I learn these things as a result of trying to answer a question from students.

Several years ago, for example, I learned about the Joyoboyo prophecy while trying to answer a student’s question about what Indonesians learn about the Japanese occupation during WWII. My recent lectures on modern Vietnamese politics have provided me with another good one.

Here is a picture taken from a Vietnamese school. Thanks to one solid year of Vietnamese language in graduate school (read: Google Translate), I can get a sense of the conversation.

It relates the story of an enterprising young man in colonial-era Annam (French Indochina) named Nguyễn Tất Thành (that name is what is obscured by the head on the left… written Ng Tất Th___). He is speaking with his friend Pear (bạn = a familiar term of address, lê = pear) about his desire to go overseas to see the world and use his experiences to help the Vietnamese people:

Tôi muốn đi ra nước ngoài, xem nước Pháp và các nước khác. Sau khi xem xét họ làm như thế nào, Tôi sẽ trở về giúp đồng bào chúng ta.

I want to go abroad, to see France as well as other countries. After seeing how they do it, I will return to help our compatriots.

But look at Mr. Nguyễn’s hands. Part of the story I have been told is that Mr. Nguyễn responded to the question “how will you earn money to do this?” by saying “with my hands—my hands are my money.” It is remarkable to see that those hands are stamped with dollar signs rather than the đồng symbol (₫) to signify this point.

The story also becomes more evocative when you realize that Nguyễn Tất Thành is a sobriquet that means “Nguyen the Accomplished.” And even more so once you realize that this same Nguyen was later known to the world as Ho Chi Minh.


Bailing Out Stablecoins

This morning we are learning the details of the U.S. government’s decision to bail out depositors at Silicon Valley Bank, a failed bank that specialized in dealing with tech startups and venture capital. SVB got into trouble by going long on treasuries in a low interest rate environment. We are no longer in a low interest rate environment.

Federal deposit insurance is, in my view, one of the true miracles of modern American capitalism. It is a remarkable tool for ensuring financial stability and it is one of the fundamental prerequisites for making fractional reserve banking possible. It protects small investors and savers at minimal cost to the American taxpayer.

The bailout that was announced is much more than deposit insurance. The details aren’t yet fully known but if the headlines are accurate, all depositors will be made whole, regardless of whether their deposits exceeded the $250,000 FDIC limit. As it turns out, the vast majority of SVB deposits were far in excess of that. Much of that was in medium-sized business accounts used for rent, payrolls, and so forth.

We can quibble about whether or not it is a good idea to bail out those institutions; I am skeptical that it is a good idea to do that as a blanket policy, especially given that many of those institutions have revenue streams that far exceed their cash on hand. But regardless of those cases, regulators should draw a line at bailing out one of SVB’s biggest individual depositors: the stablecoin known as USDC.

USDC is run by Circle Internet Financial Ltd. As a stablecoin, it promises to maintain a 1:1 exchange rate with the US dollar. It has delivered on that promise during a period of economic growth, but it was forced to drop that peg when SVB collapsed.

Why is that? Because Circle had $3.3 billion in reserves at SVB. That’s right: their holdings were 13,000 times the maximum amount insured by the FDIC. Bailing out Circle, and with it USDC, requires serious money.

What is the problem with bailing out this depositor? Why does it require special scrutiny? The answer is because the depositors who run stablecoins use their deposits differently than any other depositors do.

Stablecoins are cryptocurrency. They promise to deliver all of the benefits of fiat currency without government interference or regulation. The idea is that the computer can use blockchain technology and algorithmic trading strategies to produce an alternative to the dollar without relying on the mechanism of a central bank. Stablecoins are interesting because they promise, in the best case scenario, to reproduce something that I can use already: a US dollar, which is firmly and irrevocably pegged to its own value already.* The benefit of a stablecoin like USDC over the US dollar is that it enables transactions beyond the reach of the normal payments system, which can be used for activities like buying illegal things, pretending that you’re beyond the reach of the U.S. government, and speculating about our techno-future.

As it turns out, stablecoins are only stable if people believe that they are stable.** And people will only believe that they are stable if they are backed by something—that is, collateralized. That is what Circle’s SVB deposits were: a large pot of money that comprised some of its collateral to defend USDC’s value. When that money became uncertain, USDC had to drop its peg. It was no longer a stablecoin because its collateral was in question.
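To make the mechanism concrete, here is a toy calculation. The $3.3 billion figure is from above, but the total reserves and the recovery rate are hypothetical numbers for illustration, not Circle's actual balance sheet: once a slice of the collateral becomes uncertain, the expected backing per token falls below $1, and the market price follows.

# Toy illustration of why an impaired reserve breaks a 1:1 peg.
# Only the stuck_reserves figure comes from the post; the rest is hypothetical.
tokens_outstanding = 40_000_000_000   # stablecoins in circulation (hypothetical)
safe_reserves      = 36_700_000_000   # reserves held elsewhere, assumed good (hypothetical)
stuck_reserves     =  3_300_000_000   # reserves at the failed bank
recovery_rate      = 0.80             # assumed cents-on-the-dollar recovery (hypothetical)

expected_backing = safe_reserves + recovery_rate * stuck_reserves
implied_value_per_token = expected_backing / tokens_outstanding
print(f"Implied backing per token: ${implied_value_per_token:.3f}")  # below $1.00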

The problem becomes clear when we put all of the pieces together: SVB was acting as a central bank for USDC. Insuring Circle’s deposits means that the US government is now the central bank for USDC.

Now, the politics that will follow from this decision promise to be very interesting. Crypto defenders will not want to notice that crypto got a government bailout to ensure its value. Crypto opponents might not want to acknowledge that crypto just got a big signal that it is too big to fail.

The US government bails out financial institutions from time to time. And the Fed is our central bank, after all. But it does so in exchange for the ability to regulate those financial institutions, and to take ownership over their assets when their activities threaten systemic financial stability. That is what the US government should do now in response to USDC. Otherwise, the US government should allow USDC to try to use its algorithms to save itself, which is the whole point of stablecoins in the first place.

Either the US government is the central bank that guarantees the value of USDC, or it is not. There is no third option.

NOTES

* This is a tautology, but it is on purpose.

** In the words of an old friend, decentralized algorithmic stablecoins are impossible. See more here. And here.


When AI Does Statistics For Us, Will We Know Any Better?

There has been a lot of commentary about new and more powerful AI algorithms like ChatGPT, and how they will change the way we write college essays, job applications, and so forth. I don’t think that enough people are paying attention to the fact that these AI algorithms can also write code.

I am not a computer programmer or a coding expert. But I have noticed that ChatGPT can answer questions about statistical programming pretty effectively. I can ask it how to work with ggplot2, for example, and it helps me to avoid silly errors. I had the thought recently that it might also be able to help me write code in a programming language that I don’t actually know, like Python.

Here is an example that I can imagine happening already. I have asked ChatGPT to write code for me. All I know is that I want to do logistic regression in Python using NumPy. I don’t know how to actually do this, but I typed the following into the prompt: Write code in Python to perform a logistic regression with a dataset named “data” and a dependent variable called “outcome” using numpy

And here is what I got:

import numpy as np
from numPy import logistic

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(data, outcome, test_size=0.2, random_state=0)

# Fit the logistic regression model
log_reg = logistic.LogisticRegression(C=1e5)
log_reg.fit(X_train, y_train)

# Make predictions on the test set
y_pred = log_reg.predict(X_test)

# Compute accuracy
accuracy = log_reg.score(X_test, y_test)
print('Accuracy: {}'.format(accuracy))

A couple of remarks are in order. First, this code will work [EDIT: No it won’t. It needs more things to actually run. But so long as you can figure those out]: if you need to do logits in Python, for whatever reason, you can just do them now. Second, this code is commented—it explains what the code is doing. (Automatic commenting is a very interesting computer science problem, one to which AI is already being applied.)
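For what it's worth, here is a corrected version that actually runs: the bogus numPy import goes away, train_test_split and LogisticRegression come from scikit-learn (which is what the generated code was really reaching for), and data and outcome are replaced with toy arrays so the snippet is self-contained.

# Corrected, runnable version of the generated code. scikit-learn, not numpy,
# provides the estimator; "data" and "outcome" are toy stand-ins here.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))                                              # toy features
outcome = (data @ [1.0, -2.0, 0.5] + rng.normal(size=200) > 0).astype(int)    # toy binary outcome

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    data, outcome, test_size=0.2, random_state=0)

# Fit the logistic regression model (large C means essentially no regularization)
log_reg = LogisticRegression(C=1e5)
log_reg.fit(X_train, y_train)

# Predict on the test set and compute accuracy
y_pred = log_reg.predict(X_test)
accuracy = log_reg.score(X_test, y_test)
print('Accuracy: {}'.format(accuracy))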

Third and most importantly, though, this is not the only possible response to the prompt. It is a response that smuggles in a lot of implicit decisions, and even assumptions about the data that you have and the use that you imagine for them. By this I mean that not every logit model has prediction accuracy as the objective. This code, though, presumes that that is your objective.
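For contrast, here is an equally reasonable response to the same prompt that treats the logit as an inferential model rather than a classifier. It uses statsmodels (again, something other than NumPy is doing the real work), with the same toy data and outcome as in the corrected example above.

# An alternative reading of the same prompt: estimate the logit on the full
# sample and report coefficients and standard errors, rather than held-out
# prediction accuracy. Toy "data" and "outcome" as in the example above.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
outcome = (data @ [1.0, -2.0, 0.5] + rng.normal(size=200) > 0).astype(int)

X = sm.add_constant(data)           # add an intercept
model = sm.Logit(outcome, X).fit()  # maximum likelihood logit
print(model.summary())              # coefficients, SEs, z-statistics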

I came to this question as part of a conversation with some college friends about the future of the humanities, in response to the New Yorker essay that everyone is talking about. The idea is that people want marketable skills from college. But as this crowd of friends includes both professors like me and computer scientists with decades of professional coding experience, there was a deeper conversation about what sorts of marketable skills will still be marketable over a timespan of more than the next five years or so. What happens to coding-focused majors when computers can do lots of the coding themselves?*

I’ll conclude with a reflection. On my one serious visit to Silicon Valley, I spent the day mostly drinking free club sodas and flavored kombuchas at a FAAMG headquarters and just talking to people. That was a special weekend for a lot of reasons. But what I remember most from those conversations were the hints that “the singularity” was coming: for them, that was the coding invention that put coders out of business. They used this to explain why their children were getting violin lessons and tutoring in French, which I thought was precious at the time because it reflected a level of privilege and possibility that seemed entirely out of reach for anyone who wasn’t in that part of our new tech ecosystem. Maybe they were right, though, and maybe all of us will need to wrestle with these implications.

NOTE

* I would be remiss if I didn’t acknowledge that in some ways, this is just the latest “get off my lawn” complaint about how technology is replacing understanding by automating what used to be done manually. I probably would know more about statistics if I had to use punchcards and code up an optimizer rather than just typing logit y x into Stata.

I will happily concede this. But fast computers did put most people whose careers depended on punchcards out of business, so the analogy holds.

