ChatGPT: Post-ASU+GSV Reflections on Generative AI

The one question I heard over and over again in hallway conversations at ASU+GSV was “Do you think there will be a single presentation that doesn’t mention ChatGPT, Large Language Models (LLMs), and generative AI?”

Nobody I met said “yes.” AI seemed to be the only thing anybody talked about.

And yet the discourse sounded a little bit like GPT-2 trying to explain the uses, strengths, and limitations of GPT-5. It was filled with a lot of empty words, peppered in equal parts with occasional startling insights and ghastly hallucinations. 

That lack of clarity is not a reflection of the conference or its attendees. Rather, it underscores the magnitude of the change that is only beginning. Generative AI is at least as revolutionary as the graphical user interface, the personal computer, the touch screen, or even the internet. Of course we don’t understand the ramifications yet.

Still, lessons learned from GPT-2 enabled the creation of GPT-3 and so on. So today, I reflect on some of the lessons I am learning so far regarding generative AI, particularly in EdTech.

Generative AI will destroy so we can create

Most conversations on the topic of generative AI have the words “ChatGPT” and “obsolete” in the same sentence. “ChatGPT will make writing obsolete.” “ChatGPT will make programmers obsolete.” “ChatGPT will make education obsolete.” “ChatGPT will make thinking and humans obsolete.” While some of these predictions will be wrong, the common theme behind them is right. Generative AI is a commoditizing force. It is a tsunami of creative destruction.

Consider the textbook industry. As long-time e-Literate readers know, I’ve been thinking a lot about how its story will end. Because of its unusual economic moats, it is one of the last media product categories to be decimated or disrupted by the internet. But those moats have been drained one by one. Its army of sales reps physically knocking on campus doors? Gone. The value of those expensive print production and distribution capabilities? Gone. Brand reputation? Long gone. 

Just a few days ago, Cengage announced a $500 million cash infusion from its private equity owner:

“This investment is a strong affirmation of our performance and strategy by an investor who has deep knowledge of our industry and a track record of value creation,” said Michael E. Hansen, CEO, Cengage Group. “By replacing debt with equity capital from Apollo Funds, we are meaningfully reducing outstanding debt, giving us optionality to invest in our portfolio of growing businesses.”

Cengage Group Announces $500 Million Investment From Apollo Funds (prnewswire.com)

That’s PR-speak for “our private equity owners decided it would be better to give us yet another cash infusion than to let us go through yet another bankruptcy.”

What will happen to this tottering industry when professors, perhaps with the help of on-campus learning designers, can use an LLM to spit out their own textbooks tuned to the way they teach? What will happen when the big online universities decide they want to produce their own content that’s aligned with their competencies and is tied to assessments that they can track and tune themselves? 

Don’t be fooled by the LLM hallucination fear. The technology doesn’t need to (and shouldn’t) produce a perfect, finished draft with zero human supervision. It just needs to lower the work required from expert humans enough that producing a finished, student-safe curricular product will be worth the effort. 

How hard would it be for LLM-powered individual authors to replace the textbook industry? A recent contest challenged AI researchers to develop systems that match human judgment in scoring free text short-answer questions. “The winners were identified based on the accuracy of automated scores compared to human agreement and lack of bias observed in their predictions.” Six entrants met the challenge. All six were built on LLMs. 

This is a harder test than generating anything in a typical textbook or courseware product today. 

The textbook industry has received ongoing investment from private equity because of its slow rate of decay. Publishers threw off enough cash that the slum lords who owned them could milk their thirty-year-old platforms, twenty-year-old textbook franchises, and $75 PDFs for cash. As the Cengage announcement shows, that model is already starting to break down. 

How long will it take before generative AI causes what’s left of this industry to visibly and rapidly disintegrate? I predict 24 months at most. 

EdTech, like many industries, is filled with old product categories and business models that are like blighted city blocks of condemned buildings. They need to be torn down before something better can be built in their place. We will get a better sense of the new models that will rise as we see old models fall. Generative AI is a wrecking ball.

“Chat” is conversation

I pay $20/month for a subscription to ChatGPT Plus. I don’t just play with it. I use it as a tool every day. And I don’t treat it like a magic information answer machine. If you want a better version of a search engine, use Microsoft Bing Chat. To get real value out of ChatGPT, you have to treat it less like an all-knowing Oracle and more like a colleague. It knows some things that you don’t and vice versa. It’s smart but can be wrong. If you disagree with it or don’t understand its reasoning, you can challenge it or ask follow-up questions. Within limits, it is capable of “rethinking” its answer. And it can participate in a sustained conversation that leads somewhere. 

For example, I wanted to learn how to tune an LLM so that it can generate high-quality rubrics by training it on a set of human-created rubrics. The first thing I needed to learn was how LLMs are tuned. What kind of magic computer programming incantations do I need to get somebody to write for me?

As it turns out, the answer is none, at least generally speaking. LLMs are tuned using plain English. You give the model multiple pairs: input that a user might type into the text box and the desired output from the machine. For example, suppose you want to tune the LLM to provide cooking recipes. Your tuning “program” might look something like this:

  • Input: How do I make scrambled eggs?
  • Output: [Recipe]

Obviously, the recipe output example you give would have a number of structured components, like an ingredient list and steps for cooking. Given enough examples, the LLM begins to identify patterns. You teach it how to respond to a type of question or a request by showing it examples of good answers. 
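To make that concrete, here is a minimal sketch of what such tuning pairs might look like, assuming the JSONL prompt/completion format that GPT-3-era fine-tuning accepts. The recipe text is invented for illustration; real tuning data would need many more pairs.

  # A minimal sketch of tuning data as prompt/completion pairs, in the
  # JSONL format that GPT-3-era fine-tuning accepts. The recipe content
  # is an illustrative placeholder, not real training data.
  import json

  examples = [
      {
          "prompt": "How do I make scrambled eggs?",
          "completion": (
              "Ingredients: 2 eggs, 1 tbsp butter, salt, pepper. "
              "Steps: 1. Whisk the eggs. 2. Melt the butter over low heat. "
              "3. Pour in the eggs and stir gently until just set. "
              "4. Season and serve."
          ),
      },
      # ...many more pairs teaching the same response pattern...
  ]

  with open("tuning_data.jsonl", "w") as f:
      for example in examples:
          f.write(json.dumps(example) + "\n")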

I know this because ChatGPT explained it to me. It also explained that the GPT-4 model can’t be tuned this way yet but other LLMs, including earlier versions of GPT, can. With a little more conversation, I was able to learn how LLMs are tuned, which ones are tunable, and that I might even have the “programming” skills necessary to tune one of these beasts myself. 

It’s a thrilling discovery for me. For each rubric, I can write the input. I can describe the kind of evaluation I want, including the important details I want it to address. I, Michael Feldstein, am capable of writing half the “program” needed to tune the algorithm for one of the most advanced AI programs on the planet. 

But the output I want, a rubric, is usually expressed as a table. LLMs speak English. They can create tables but have to express their meaning in English and then translate that meaning into table format. Much like I do. This is a funny sort of conundrum. Normally, I can express what I want in English but don’t know how to get it into another format. This time I have to figure out how to express what the table means in English sentences.

I have a conversation with ChatGPT about how to do this. First I ask it what the finished product would look like. It explains how to express a table in plain English, using a rubric as an example.

OK! That makes sense. Once it gives me the example, I get it. Since I am a human and understand my goal while ChatGPT is just a language model—as it likes to remind me—I can see ways to fine-tune what it’s given me. But it taught me the basic concept.

Now how do I convert many rubric tables? I don’t want to manually write all those sentences to describe the table columns, rows, and cells. I happen to know that, if I can get the table in a spreadsheet (as opposed to a word-processing document), I can export it as a CSV. Maybe that would help. I ask ChatGPT, “Could a computer program create those sentences from a CSV export?” 

“Why yes! As long as the table has headings for each column, a program could generate these sentences from a CSV.” 

“Could you write a program for me that does this?” 

“Why, yes! If you give me the headings, I can write a Python program for you.” 

It warns me that a human computer programmer should check its work. It always says that. 

In this particular case, the program is simple enough that I’m not sure I would need that help. It also tells me, when I ask, that it can write a program that would import my examples into the GPT-3 model in bulk. And it again warns me that a human programmer should check its work. 
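For flavor, here is a hedged sketch of the kind of program ChatGPT offered to write, assuming the rubric has been exported as a CSV with column headings. The specific headings (“Criterion” plus one column per performance level) are my invention; a real rubric’s headings would vary.

  # A sketch of the CSV-to-sentences program ChatGPT described. It reads
  # a rubric exported as CSV and turns each cell into a plain-English
  # sentence an LLM can learn from. The column headings are hypothetical.
  import csv

  def rubric_to_sentences(path):
      sentences = []
      with open(path, newline="") as f:
          for row in csv.DictReader(f):
              criterion = row["Criterion"]
              for level, description in row.items():
                  if level != "Criterion":
                      sentences.append(
                          f'For the criterion "{criterion}", work rated '
                          f'"{level}" is described as: {description}'
                      )
      return sentences

  for sentence in rubric_to_sentences("rubric.csv"):
      print(sentence)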

ChatGPT taught me how I can tune an LLM to generate rubrics. By myself. Later, we discussed how to test and further improve the model, depending on how many rubrics I have as examples. How good would its results be? I don’t know yet. But I want to find out. 

Don’t you?

LLMs won’t replace the need for all knowledge and skills

Notice that I needed both knowledge and skills in order to get what I needed from ChatGPT. I needed to understand rubrics, what a good one looks like, and how to describe the purpose of one. I needed to think through the problem of the table format far enough that I could ask the right questions. And I had to clarify several aspects of the goal and the needs throughout the conversation in order to get the answers I wanted. ChatGPT’s usefulness is shaped and limited by my capabilities and limitations as its operator. 

This dynamic became more apparent when I explored with ChatGPT how to generate a courseware module. While this task may sound straightforward, it has several kinds of complexity to it. First, well-designed courseware modules have many interrelated parts from a learning design perspective. Learning objectives are related to assessments and specific content. Within even as simple an assessment as a multiple-choice question (MCQ), there are many interrelated parts. There’s the “stem,” or the question. There are “distractors,” which are wrong answers. Each answer may have feedback that is written in a certain way to support a pedagogical purpose. Each question may also have several successive hints, each of which is written in a particular way to support a particular pedagogical purpose. Getting these relationships—these semantic relationships—right will result in more effective teaching content. It will also contain structure that supports better learning analytics. 
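One way to see why these relationships matter is to write them down as an explicit data structure. The sketch below is mine, not drawn from any real courseware standard; the point is simply that each part has a defined role and a defined relationship to the others.

  # A sketch of the interrelated parts of a multiple-choice question.
  # Field names are invented for illustration, not taken from a real
  # courseware schema.
  from dataclasses import dataclass

  @dataclass
  class AnswerChoice:
      text: str
      is_correct: bool
      feedback: str  # pedagogically targeted response to choosing this answer

  @dataclass
  class MultipleChoiceQuestion:
      stem: str                    # the question itself
      choices: list[AnswerChoice]  # one correct answer plus distractors
      hints: list[str]             # successive hints, each more specific
      learning_objective: str      # ties the question back to the module

  question = MultipleChoiceQuestion(
      stem="Which gas do plants absorb during photosynthesis?",
      choices=[
          AnswerChoice("Carbon dioxide", True, "Right: CO2 is fixed into sugars."),
          AnswerChoice("Oxygen", False, "Oxygen is what photosynthesis releases."),
      ],
      hints=["Think about what photosynthesis consumes, not what it produces."],
      learning_objective="Describe the inputs and outputs of photosynthesis",
  )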

Importantly, many of these pedagogical concepts will be useful for generating a variety of different learning experiences. The relationships I’m trying to teach the LLM happen to come from courseware. But many of these learning design elements are necessary to design simulations and other types of learning experiences too. I’m not just teaching the LLM about courseware. I’m teaching it about teaching. 

Anyway, feeding whole modules into an LLM as output examples wouldn’t guarantee that the software would catch all of these subtleties and relationships. ChatGPT didn’t know about some of the complexities involved in the task I want to accomplish. I had to explain them to it. Once it “understood,” we were able to have a conversation about the problem. Together, we came up with three different ways to slice and dice content examples into input-output pairs. In order to train the system to catch as many of the relationships and subtleties as possible, it would be best to feed the same content to the LLM all three ways.

Most publicly available courseware modules are not consistently and explicitly designed in ways that would make this kind of slicing and dicing easy (or even possible). Luckily, I happen to know where I can get my hands on some high-quality modules that are marked up in XML. Since I know just a little bit about XML and how these modules use it, I was able to have a conversation with ChatGPT about which XML to strip out, the pros and cons of converting the rest into English versus leaving it as XML, how to use the XML Document Type Definition (DTD) to teach the software about some of the explicit and implicit relationships among the module parts, and how to write the software that would do the work of converting the modules into input-output pairs.
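As a rough illustration of that conversion step, here is a minimal sketch under invented assumptions: the element names (question, stem, feedback) are hypothetical stand-ins, since a real module’s DTD defines its own vocabulary, and the real conversion is considerably more involved than this.

  # A sketch of converting an XML-marked-up module into prompt/completion
  # pairs. The element names are hypothetical; a real module's DTD would
  # define its own structure.
  import json
  import xml.etree.ElementTree as ET

  def module_to_pairs(path):
      tree = ET.parse(path)
      pairs = []
      for q in tree.iter("question"):
          stem = (q.findtext("stem") or "").strip()
          feedback = (q.findtext("feedback") or "").strip()
          if stem and feedback:
              pairs.append({"prompt": stem, "completion": feedback})
      return pairs

  with open("pairs.jsonl", "w") as out:
      for pair in module_to_pairs("module.xml"):
          out.write(json.dumps(pair) + "\n")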

By the end of the exploratory chat, it was clear that the work I want to accomplish requires more software programming skill than I have, even with ChatGPT’s help. But now I can estimate how much time I need from a programmer. I also know the level of skill the programmer needs. So I can estimate the cost of getting the work done. 

To get this result, I had to draw on considerable prior knowledge. More importantly, I had to draw on significant language and critical thinking skills. 

Anyone who ever said that a philosophy degree like mine isn’t practical can eat my dust. Socrates was a prompt engineer. Most Western philosophers engage in some form of chain-of-thought prompting as a way of structuring their arguments. 

Skills and knowledge aren’t dead. Writing and thinking skills most certainly aren’t. Far from it. If you doubt me, ask ChatGPT, “How might teaching students about Socrates’ philosophy and method help them learn to become better prompt engineers?” See what it has to say. 

(For this question, I used the GPT-4 setting that’s available on ChatGPT Plus.)

Assessments aren’t dead either

Think about how either of the projects I described above could be scaffolded as a project-based learning assignment. Students could have access to the same tools I had: an LLM like ChatGPT and an LLM-enhanced search tool like Bing Chat. The catch is that they’d have to use the ones provided for them by the school. In other words, they’d have to show their work. If you add a discussion forum and a few relevant tutorials around it, you’d have a really interesting learning experience. 

This could work for writing too. My next personal project with ChatGPT is to turn an analysis paper I wrote for a client into a white paper (with their blessing, of course). I’ve already done the hard work. The analysis is mine. The argument structure and language style are mine. But I’ve been struggling with writer’s block. I’m going to try using ChatGPT to help me restructure it into the format I want and add some context for an external audience.

Remember my earlier point about generative AI being a commoditizing force? It will absolutely commoditize generic writing. I’m OK with that, just as I’m OK with students using calculators in math and physics once they understand the math that the calculator is performing for them. 

Students need to learn how to write generic prose for a simple reason. If they want to express themselves in extraordinary ways, whether through clever prompt engineering or beautiful art, they need to understand mechanics. The basics of generic writing are building blocks. The more subtle mechanics are part of the value that human writers can add to avoid being commoditized by generative AI. The differences between a comma, a semicolon, and an em-dash in expression are the kinds of fine-grained choices that expressive writers make. As are long sentences versus short ones, decisions about when and how often to use adjectives, choices between similar but not identical words, breaking paragraphs at the right place for clarity and emphasis, and so on. 

For example, while I would use an LLM to help me convert a piece I’ve already written into a white paper, I can’t see myself using it to write a new blog post. The value in e-Literate lies in my ability to communicate novel ideas with precision and clarity. While I have no doubt that an LLM could imitate my sentence structures, I can’t see a way that it could offer me a shortcut for the kind of expressive thought work at the core of my professional craft.

If we can harness LLMs to help students learn how to write…um…prosaic prose, then they can start using their LLM “calculators” in their communications “physics” classes. They can focus on their clarity of thought and truly excellent communication. We rarely get to teach this level of expressive excellence. Now maybe we can do it on a broader basis. 

In their current state of evolution, LLMs are like 3D printers for knowledge work. They shift the human labor from execution to design. From making to creating. From knowing more answers to asking better questions. 

We read countless stories about the threat of destruction to the labor force partly because our economy has needed the white-collar equivalent of early 20th-Century assembly line workers. People working full-time jobs writing tweets. Or updates of the same report. Or HR manuals. Therefore our education system is designed to train people for that work. 

We assume that masses of people will become useless, as will education, because we have trouble imagining an education system that teaches people—all people from all socio-economic strata—to become better thinkers rather than simply better knowers and doers. 

But I believe we can do it. The hard part is the imagining. We haven’t been trained at it. Maybe our kids will learn to be better at it than we are. If we teach them differently from how we were taught. 

Likely short-term evolution of the technology

Those of us who are not immersed in AI—including me—have been astonished at the rapid pace of change. I won’t pretend that I can see around corners. But certain short-term trends are already discernible to non-experts like me who are paying closer attention than we were two months ago.

First, generative AI models are already proliferating and showing hints of coming commoditization around the edges. We’ve been given the impression that these programs will always be so big and so expensive to run that only giant cloud companies will come to the table with new models. That the battle will be OpenAI/Microsoft versus Google. GPT-4 is rumored to have over a trillion parameters. A model that large takes a lot of horsepower to build, train, and run.

But researchers are already coming up with clever techniques to get impressive performance out of much smaller models. For example, Vicuña, a model developed by researchers at a few universities, is about 90% as good as GPT-4 by at least one test and has only 13 billion parameters. To put that in perspective, Vicuña can run on a decent laptop. The whole thing. It cost $300 to train (as opposed to the billions of dollars that have gone into ChatGPT and Google Bard). Vicuña is an early (though imperfect) example of the coming wave. Another LLM seems to pop up practically every week with new claims about being faster, smaller, smarter, cheaper, and more accurate.

A similar phenomenon is happening with image generation. Apple has quickly moved to provide software support for optimizing the open-source Stable Diffusion model on its hardware. You can now run an image generator program on your MacBook with decent performance. I’ve read speculation that the company will follow up with hardware acceleration on the next generation of its Apple Silicon microchips.

“Socrates typing on a laptop” as interpreted by Stable Diffusion

These models will not be equally good at all things. The corporate giants will continue to innovate and likely surprise us with new capabilities. Meanwhile, the smaller, cheaper, and open-source alternatives will be more than adequate for many tasks. Google has coined a lovely phrase: “model garden.” In the near term, there will be no one model to rule them all or even a duopoly of models. Instead, we will have many models, each of which is best suited for different purposes. 

The kinds of educational use cases I described earlier in this post are relatively simple. It’s possible that we’ll see improvements in the ability to generate those types of learning content over the next 12 to 24 months, after which we may hit a point of diminishing returns. We may be running our education LLMs locally on our laptops (or even our phones) without having to rely on a big cloud provider running an expensive (and carbon-intensive) model. 

One of the biggest obstacles to this growing diversity is not technological. It’s the training data. Questions regarding the use of copyrighted content to train these models are unresolved. Infringement lawsuits are popping up. It may turn out that the major short-term challenge to getting better LLMs in education may be access to reliable, well-structured training content that is unencumbered by copyright issues. 

So much to think about…

I find myself babbling a bit in this post. This trend has many, many angles to think about. For example, I’ve skipped over the plagiarism issue because so many articles have been written about it already. I’ve only touched lightly on the hallucination problem. To me, these are temporary obsessions that arise out of our struggle to understand what this technology is good for and how we will work and play and think and create in the future.

One of the fun parts about this moment is watching so many minds at work on the possibilities, including ideas that are bubbling up from classroom educators and aren’t getting a lot of attention. For a fun sampling of that creativity, check out The ABCs of ChatGPT for Learning by Devan Walton. 

Do yourself a favor. Explore. Immerse yourself in it. We’ve landed on a new planet. Yes, we face dangers, some of which are unknown. Still. A new planet. And we’re on it.

Strap on your helmet and go.


ChatGPT Wrote This Article and then Totally Stole My Job!

As I outlined recently in my “e-Literate’s Changing Themes for Changing Times” post, I am shifting my coverage somewhat. I’ll be developing and calling out tags I use for these themes so that you can go to an archive page on each one. This one will be listed under the “AI/ML,” “third-wave EdTech,” and “future of work” tags.

I’ve been fascinated by the rapid progression of ChatGPT article fads:

  1. Look at this weird thing that writes stuff!
  2. I asked ChatGPT a question—and here’s what it answered!!
  3. I asked ChatGPT to write this article—and it totally did!!!
  4. Students could use ChatGPT to write essays!!!! End of the world or totally awesome?????
  5. I asked ChatGPT for suggestions about preventing students from using ChatGPT to cheat—and it gave me five great suggestions (and five terrible ones)!!!!!!

Waitaminute. Let’s back up.

Students finding easy ways to cheat is not exactly a new thing. Remember, “to Chegg” is a verb now. Let’s back up to fad #3. Writers are asking ChatGPT to write their articles, publishing those articles, and then advertising that the articles published under their by-line were written in 30 seconds by a machine.

Do they want to get replaced by an algorithm?

It seems to me we’re thinking about the problem that these algorithms present in the wrong way.

At the moment, ChatGPT is a toy

Language-generating algorithms like ChatGPT and their image-generating cousins are toys in both good and bad ways. In a good way, they invite people to play. Anyone, whether or not they know anything about programming, can explore the possibilities and limitations of a certain kind of cutting-edge AI. It’s fun. And, like all good games, you learn by playing. You can learn what they can and can’t do well. How to write a good prompt. And so on. They make AI more accessible and, in a certain way, more understandable.

They’re toys in the bad sense because they weren’t designed for any particular purpose and tend to do weird things for any given task. For example, I tried to get DALL-E 2—ChatGPT’s image-generating sibling—to create a retro comic book cover depicting ChatGPT as a scary vampire. I tried half a dozen different prompts. The best result I got was with “An evil vampire on the cover of a retro comic book called ‘ChatGPT'”. Here’s the best image it came up with:

“An evil vampire on the cover of a retro comic book called ‘ChatGPT'” as interpreted by DALL-E 2

The algorithm does evil comic-book vampires pretty well. Spelling a word that was explicitly spelled for it in the prompt? Not so much.

Remember, large language model algorithms like the ones behind DALL-E 2 and ChatGPT don’t understand anything. They take a bag of words and put together pieces found on the internet to construct something that seems like an appropriate response.

ChatGPT, for example, is notoriously bad at math. Because it doesn’t calculate. That’s not what it is designed to do. Its developers have been working on improving this limitation. So I asked it how good at math it is:

Me: How good are you at math?

ChatGPT: I am able to perform basic arithmetic operations, such as addition, subtraction, multiplication and division. My training data includes mathematical concepts and I have been trained to recognize and respond to arithmetic questions. However, it is important to note that I may make mistakes in complex or unconventional mathematical computations.

Me: What is 2794 + 83948?

ChatGPT: 87,742

That answer is incorrect. (The correct sum is 86,742.)

So ChatGPT, the miracle of modern technology destined to destroy all education, is not as “smart” as a calculator. Note that you can ask Google or your Apple search prompt the same question and reliably get a correct answer. OpenAI (the makers of ChatGPT) is likely trying to combine much simpler but mathematically reliable algorithms with the GPT-3 algorithm that underlies ChatGPT and DALL-E 2. The fact that they haven’t solved this problem yet suggests it is non-trivial. These different types of algorithms can be challenging to mesh. Many of the more impressive applications of artificial intelligence and machine learning are made possible by an ensemble of different algorithms working together, some of which fit together more naturally than others.
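To make the ensemble idea concrete, here is a toy sketch of one way such a combination could work: route questions that look like plain arithmetic to a deterministic evaluator and hand everything else to the language model. This is my illustration, not how OpenAI actually wires its systems together.

  # A toy router: arithmetic goes to a deterministic evaluator,
  # everything else goes to the language model. Purely illustrative.
  import re

  ARITHMETIC = re.compile(r"^\s*what is\s+([\d\s+\-*/().]+)\??\s*$", re.IGNORECASE)

  def answer(question, llm=None):
      match = ARITHMETIC.match(question)
      if match:
          # Safe to eval because the regex only admits digits and operators.
          return str(eval(match.group(1)))
      return llm(question) if llm else "I would hand this to the language model."

  print(answer("What is 2794 + 83948?"))  # prints 86742, reliably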

Don’t get me wrong; the math problem will be solved. The OpenAI engineers will crack it just as they are now pretty good at preventing ChatGPT from making stock predictions or answering questions about current events when its data set hasn’t been updated since 2021. But it’s a game of whack-a-mole. Because you can ask ChatGPT anything, people do ask it anything. The creators are learning a lot about the questions people ask and what can go wrong with the answers. This new knowledge will help them design more specific solutions. But a general-purpose prompt tool like ChatGPT will be hard to make good at solving any one particular problem.

I’m not convinced that ChatGPT, as it exists today, represents a big leap forward in essay cheating. It has length limitations, has to be fact-checked, can’t produce references, and spits out highly variable quality of reasoning and argumentation. Students would learn more by trying to fix the problems with a ChatGPT-generated draft than they would by going to a traditional essay mill.

Short answer questions are a different matter. ChatGPT is already dangerous in this area. But again, students can already “Chegg” those.

Yes, but…

Could somebody write a better program specifically for writing school essays? Or magazine articles? Yes. That work is already underway.

So what do we do about the essay cheating problem? Let’s start with the two most common answers. First, we can develop algorithms that detect prose that was written by other algorithms. That too is already underway. So we’ll have yet another flavor of the cheating/anti-cheating arms race that benefits nobody except the arms dealers. The anti-cheating tools may be necessary as one element of a holistic strategy, but they are not the ultimate answer.

Second, we can develop essay-writing prompts and processes that are hard for the algorithms to respond to. This would be useful, partly because it would be good for educators to rethink their stale old assignments and teaching practices anyway. But it’s a lot of often uncompensated work for which the educators have not been trained. And it ends up being another arms race because the algorithms will keep changing.

We miss the point if we respond to language-generating AI as a static threat that might become more sophisticated over time but won’t fundamentally change. ChatGPT is just a friendly way for us to develop intuitions about how one family of these algorithms works at the moment. You’re wrong if you think it is a one-time shock to the system. We’re just at the beginning. The pace of AI progress is accelerating. It is not just going to get incrementally better. It is going to radically change in capabilities at a rapid pace. It will continue to have limitations, but they will be different limitations.

So what do we do?

How about talking to the students?

When adaptive learning hit peak hype, a glib response to teacher hysteria started making the rounds: “If you [teachers] can be replaced by a computer, then you probably should be.”

Doesn’t that apply…um…generally?

If all students learn is how to use ChatGPT to write their essays, why wouldn’t their hypothetical future employer use ChatGPT instead of hiring them? Why would students spend $30K, $40K, $50K, or more a year to practice demonstrating that a free-to-use piece of software does their best work for them? Students need to learn the work these tools can do so they can also understand the work the tools can’t do. Because that is the work the students could get paid for. Technology will make some jobs obsolete, leave others untouched, change some, and create new ones. These categories will continue to evolve for the foreseeable future.

At a time when students are more conscious than ever about the price-to-value of a college education, they ought to be open to the argument that they will only make a decent living at jobs they can do better than the machine. So they should learn those skills. Why learn to write better? So you can learn to think more creatively and communicate that creativity precisely. Those are skills where the primates still have the advantage.

Once we engage students openly and honestly on that point, we will start building a social contract that will discourage cheating and establish the foundational understanding we need for rethinking the curriculum—not just to keep from falling too far behind the tech but to help students get out in front of it. The current limitations of these AI toys demonstrate both the dangers and the potential. If you want to apply the technology to any particular domain, whether it’s math, writing advertising copy, or something else, you need to understand both how the software works and how the human expertise and social or business processes work. Whole echelons of new careers will be created to solve these problems. We will need thinkers who can communicate. Learning how to formulate one’s own thoughts in writing is an excellent way to learn both skills.

Fighting the tech won’t solve the problem or even prevent it from getting worse. Neither will ignoring it. We have to engage with it. And by “we,” I include the students. After all, it’s their futures at risk here.

(Disclaimer: This blog post was written by ChatGPT.)

(I’m kidding, of course.)

(I am able to perform basic humor operations, such as generating dirty limericks and “your momma is so ugly” jokes. My training data includes humorous concepts, and I have been trained to recognize and respond to knock-knock questions. However, it is important to note that I may make mistakes in complex or unconventional humor.)

