
The AI Revolution in Medicine: GPT-4 and Beyond


Review: The AI Revolution in Medicine offers a compelling analysis of how GPT-4 could reshape the healthcare industry. Lee discusses the numerous ways GPT-4 can revolutionise healthcare, from aiding in clinical problem-solving to simplifying time-consuming paperwork processes.

The book goes beyond the potential benefits, delving into the ethical and safety considerations of AI integration in healthcare. Lee emphasises the crucial need to verify GPT-4’s output and calls for regulatory approaches to ensure patient safety. As we move towards more comprehensive AI integration, it’s important to address potential risks and make sure these technologies ultimately serve to improve human lives.

The book only scratches the surface of AI's applications in healthcare, but it is still a good read for anyone intrigued by the future of healthcare and the challenges that lie ahead as we navigate the ever-evolving world of AI.

Excerpts:

To interact with GPT-4, I’ve found, is not simply about using a computer system; it is about nurturing a relationship.

Chapter 1: First Contact
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

But throughout our investigations of healthcare applications of this system, we encountered real-world situations in which a doctor is struggling, not with a puzzling diagnostic case or a difficult treatment decision, nor the crushing burden of clinical paperwork – though we will see that GPT-4 can really help with those things. But perhaps most important of all, GPT-4 somehow finds a way to help doctors with what we might think of as the most human task a doctor faces: how to talk with a patient. GPT-4 often does so with startling clarity and compassion.

Chapter 1: First Contact
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

Beyond being a conversationalist, beyond being able to reason and solve problems, and beyond possessing medical knowledge, we will see time and again throughout this book that GPT-4 seems able to amplify something about the human experience – our cultures, our emotions, and the importance of social graces. At times, no matter how hard we resist anthropomorphizing an AI system, GPT-4 actually appears to show empathy, becoming a true partner in addressing our healthcare goals.

Chapter 1: First Contact
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

Unlike previous AI systems that were narrowly targeted at specific tasks such as reading radiological images or coding medical notes, a general-purpose AI technology such as GPT-4 will be brought into situations that may require educated guesses or informed judgments. We will see that, in effect, the “triad” of doctor – patient – AI assistant may end up being augmented to be doctor – patient – AI assistant – AI verifier, with the AI verifier being tasked with checking the conclusions and the work not only of the AI assistant, but of the doctor and patient themselves. For example, if the human doctor had written the note, they might have miscalculated the BMI or neglected to make a note of it, so the value of having GPT-4 play a verification role is high even when AI is not used to write the medical note.

Chapter 2: Medicina ex Machina
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.
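The BMI example in that excerpt is easy to make concrete: part of what an "AI verifier" would do is recompute deterministic values recorded in a clinical note. A minimal sketch of such a check, in plain Python (the note fields and tolerance here are illustrative, not from the book):

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

def verify_note_bmi(note: dict, tolerance: float = 0.1) -> bool:
    """Check that the BMI recorded in a note matches the recomputed value."""
    expected = bmi(note["weight_kg"], note["height_m"])
    return abs(note["bmi"] - expected) <= tolerance

# A note with a miscalculated BMI: 70 / 1.75**2 is about 22.9, not 26.0.
note = {"weight_kg": 70.0, "height_m": 1.75, "bmi": 26.0}
print(verify_note_bmi(note))  # False: the recorded value fails the check
```

A real verifier would of course check far more than arithmetic, but the pattern is the same: independently rederive what can be rederived, whether the note was written by the doctor or by the AI assistant.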

How can we reap its benefits — speed, scale, and scope of analysis — while keeping it subordinate to the judgment, experience, and empathy of human doctors?

Chapter 2: Medicina ex Machina
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

Both Google Translate and Microsoft’s comparable system, Translator, do their translations in a vacuum, devoid of any conversational or cultural context. As a result, they both produce the same translation, which is overly literal and thus incomprehensible. In contrast, GPT-4’s translation connects with the context of the ongoing conversation and a relevant aspect of French culture. This ability to connect goes deep. It encompasses cultural, historical, and social content.

Chapter 3: The Big Question: Does It “Understand?”
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

The Big Question: Does GPT-4 really understand what it is saying? Does GPT-4 come up with its words and ideas intentionally, or are its outputs just the result of a mindless pattern-matching process, just stitching words together without any true understanding? In effect, does GPT-4 understand what it reads and writes?

Chapter 3: The Big Question: Does It “Understand?”
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

Again, GPT-4 seems to show a “mind of its own” by refusing to comply with my request for a yes-or-no answer! It would be possible to probe GPT-4 further in this conversation, asking the system to name the specific ethical frameworks it claims to be using. But it would take several rounds of insistent discussion to coax the system to comply with the request for a yes/no answer.

Having looked at academic research on common-sense reasoning from a cognitive science perspective and moral judgments from a computer science perspective, we now turn to psychology and the concept of belief attribution in “theory of mind” tasks. A new research paper by Ullman in the field of intuitive psychology provides many vignettes of real-world situations designed to show the failure of large language models when simple alterations are made.

Chapter 3: The Big Question: Does It “Understand?”
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

In my months of investigation, I have concluded that tests from the latest scientific research fail to prove that GPT-4 lacks understanding. And in fact, it is quite possible that something truly profound is going on that we do not yet grasp. GPT-4 may possess some type of “understanding” and “thought” that we have not yet identified. The one thing we can say for sure is that GPT-4 is something we have not seen before, and it would be a mistake to dismiss it as “just a large language model.”

Chapter 3: The Big Question: Does It “Understand?”
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

That could be one of GPT-4’s greatest boons for medicine, but its potential risks are also so significant that I’d like to state my conclusion up front: For the foreseeable future, GPT-4 cannot be used in medical settings without direct human supervision.

Chapter 4: Trust but Verify
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

You can see the dilemma developing: In healthcare settings, keeping a “human in the loop” looks like the solution, at least for now, to GPT-4’s less-than-100 percent accuracy. But years of bitter experience with “Dr. Google” and the COVID “misinfodemic” show that it matters which humans are in the loop, and that leaving patients to their own electronic devices can be rife with pitfalls. Yet because GPT-4 appears to be such an extraordinary tool for mining humanity’s store of medical information, there’s no question members of the public will want to use it that way — a lot.

Chapter 5: The AI-Augmented Patient
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

More broadly, Moore sees AI medicine as headed toward a healthcare system where eventually, the only tasks left for physicians like him will be “complex decision-making and relationship management” — plus tasks that require physical contact, of course.

Chapter 5: The AI-Augmented Patient
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

Rodriguez envisions multiple potential uses for GPT-4 and its kind that could help foster more health equity. The new AI could be particularly helpful for producing “literacy-level-appropriate and potentially culturally and linguistically tailored” patient information and important health messages — such as how to manage diabetes at home — at scale, and with interactivity, he said.

Chapter 5: The AI-Augmented Patient
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

Risks aside, what Rodriguez most emphasized was priorities: If GPT-4 is as game-changing as it seems, he said, the first question for how to use it should be: “Who needs the most help in healthcare?” Ideally, he said, technology developers would say, “This time, we’re going to make sure that marginalized communities are put first.”

Chapter 5: The AI-Augmented Patient
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

While nurses don’t always get deep training on drug interactions, their role in administering medications means they are the last line of defense against errors and unforeseen interactions. Having interactions [with GPT] like this one gives the sense of a “copilot” for nurses.

Chapter 6: So Much More: Math, Coding and Logic
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

This all seems like simple common sense – and it is. But as I explained in Chapter 3, computer scientists and AI experts do not fully understand how or why GPT-4 can do this kind of reasoning – at least I certainly don’t. Nor do we understand its abilities in math and computer programming. There is, in fact, a considerable body of scientific research that would say an AI system such as GPT-4 should not be capable of these things. And yet, here we are, seeing responses by GPT-4 that are at once astonishing and mystifying.

This raises a very big problem: Because we don’t understand where GPT-4’s capabilities in math, programming, and reasoning come from, we don’t have a good way of understanding when, why, and how it makes mistakes or fails, and this can be a very dangerous situation when contemplating the use of GPT-4 in any medical situation. So, one question to ask is whether there are some things we can do to understand when GPT-4 might fail to provide reliable results, and get it to avoid failures in the first place.

Chapter 6: So Much More: Math, Coding and Logic
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

What exactly is GPT-4, anyway?

At its core, GPT-4 is what computer scientists call a machine learning system. The term “machine learning” is a bit of a misnomer, because unlike human beings who learn by interacting with each other and the world, GPT-4 must be taken offline to be given new knowledge and capabilities. Essentially, it needs to be “turned off.” This offline process is called training, and it involves collecting lots and lots of text, images, video, and other bits of data, and then using a special set of algorithms to distill all that data into a special structure called a model. Once constructed, another special algorithm, called an inference engine, puts the model into action, for example to generate the responses of a chatbot.

There are many ways to create and structure a model. You may have heard of one type of model, called the large language model, or LLM for short. Today, LLMs are based on a neural network architecture called a neural transformer, which has a design that is vaguely inspired by the brain’s structure.

Chapter 6: So Much More: Math, Coding and Logic
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.
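The training/inference split Lee describes can be illustrated with a toy language model: an offline "training" step distills text into a frozen model (here, simple bigram follower lists), and a separate "inference" step uses that model to generate text without learning anything new. This is a deliberately tiny sketch of the workflow only, not of GPT-4's transformer architecture:

```python
import random
from collections import defaultdict

def train(corpus: str) -> dict:
    """Offline step: distill text into a model (here, bigram follower lists)."""
    model = defaultdict(list)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        model[prev].append(nxt)
    return dict(model)

def infer(model: dict, prompt: str, length: int = 5) -> str:
    """Online step: the frozen model generates text; it learns nothing new."""
    out = [prompt]
    for _ in range(length):
        followers = model.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

model = train("the patient reports the pain improved and the patient rests")
print(infer(model, "the"))
```

Retraining is the only way this toy model ever changes, which is the point of the excerpt: between training runs, the model is fixed.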

The basic building block of a neural network is extremely simple; the essence of each network node is simply a number and a few connections to other nodes. Its complexity comes about as a result of sheer scale. In other words, in terms of the number of nodes, GPT-4 is big. And I mean really big. The exact size of GPT-4’s neural network has not been publicly disclosed, but it is so large that only a handful of organizations worldwide have enough computing power to train it. It is likely the largest artificial neural network ever built and deployed to the public.

Now, here’s the most important point about GPT-4’s architecture: For the most part, its capabilities result from the scale of its neural network. GPT-4’s abilities to do math, engage in conversation, write computer programs, tell jokes, and more, were not programmed by humans. Instead, they emerged into existence – sometimes unexpectedly – as its neural network grew.

While some technologists — in particular, the ones at OpenAI — have long suspected that extreme scale might be a path to achieving human-level reasoning, it is still incredible to witness this come to life. And the fact that so much of this has just “popped into existence” once enough scale was achieved partly explains why its abilities — and its failure modes — are so mysterious. In analogy to our current inability to understand how the human brain accomplishes “thinking,” so, too, is our inability to understand much of how GPT-4 does what it does.

Chapter 6: So Much More: Math, Coding and Logic
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.
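The "a number and a few connections" description of a network node corresponds to a weighted sum of inputs passed through a simple nonlinearity. One node, sketched in plain Python (the weight and bias values are arbitrary, for illustration only):

```python
def node(inputs, weights, bias):
    """One neural-network node: a weighted sum over its connections,
    passed through a ReLU nonlinearity (negative sums become 0)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, total)

# Two inputs, two connection weights, one bias value.
print(node([1.0, 2.0], [0.5, -0.25], 0.1))  # 0.5 - 0.5 + 0.1 = 0.1
```

Each node really is this simple; the complexity Lee points to comes entirely from wiring up billions of them and tuning the weights during training.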

But the question remains: How do we assess the usefulness of GPT-4 in medical situations, especially in applications that involve math, statistics, and logical reasoning? Compounding the difficulty of assessing GPT-4 in math and logic is that some problems can have answers in a gray area between right and wrong, sort of like the subjective idea of “partial credit” in math classes. And in the very near future, it seems likely that people will be tempted to give GPT-4 problems that are beyond the user’s ability to solve or verify (and, in fact, might have no known solution at all!), thus making it all but impossible to know what to do with the answers that come back.

Our best advice today is to verify the outputs of GPT-4 (and use GPT-4 itself to aid in doing this). And if you can’t verify, then it is probably wise not to trust the results.

Chapter 6: So Much More: Math, Coding and Logic
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.
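The "verify, and if you can't verify, don't trust" advice can be expressed as a simple policy: accept a model's answer only when an independent check confirms it, and otherwise route it to a human. This is a hedged sketch of that policy, not anything from the book; `ask_model` stands in for any LLM call and is purely hypothetical, and the arithmetic "model" below is a toy:

```python
from typing import Callable, Optional

def trusted_answer(question: str,
                   ask_model: Callable[[str], str],
                   check: Optional[Callable[[str, str], bool]]) -> str:
    """Return a model answer only if an independent check passes.
    With no check available, refuse rather than trust unverified output."""
    answer = ask_model(question)
    if check is None:
        return "UNVERIFIED: route to a human reviewer"
    if check(question, answer):
        return answer
    return "FAILED CHECK: route to a human reviewer"

# Toy example: the "model" answers arithmetic and the check recomputes it.
fake_model = lambda q: "5"                    # pretend LLM answer to "2 + 2"
recompute = lambda q, a: str(eval(q)) == a    # independent check (toy only)
print(trusted_answer("2 + 2", fake_model, recompute))
```

The interesting design question, which the chapter leaves open, is what plays the role of `check` for answers that have no mechanical verifier.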

GPT-4’s ability to converse with the patient is more natural and easier than filling out a paper form.

Chapter 7: The Ultimate Paperwork Shredder
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

Much of healthcare technology development has been focused on efficiency, which amounts to increasing the number of patients that can be seen in a day. But is that the right way to think about improvements to the healthcare system? Are we going for quantity or quality?

As we have seen here, GPT-4 can indeed make things like writing notes less time-consuming. But then the question is, where can the freed-up time best be used? By engaging with GPT-4, we see the possibility that more direct engagement between doctor and patient might be possible, and that time might open up for continuous self-improvement and a better “personal touch.”

Chapter 7: The Ultimate Paperwork Shredder
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

Arcane processes like medication reimbursements and prior authorizations may seem like just part of the complicated drudgery of the healthcare business, but they are actually high-stakes issues for millions of people in the United States today. It’s not just the question of who gets to decide whether a prescription for Toprol or a hypertension treatment is justified and should be reimbursed, but how fairly and transparently those decisions get made. Is it up to the doctor, the insurance company, the government, or an AI like GPT-4? And, if mistakes are made, who is accountable?

These aren’t theoretical questions. Every day, decisions that have a big impact on people’s lives are made, and increasingly they are made in a data-driven manner using AI-powered predictive algorithms. Unfortunately, there is growing evidence that such AI-based decisions can lead to a dramatic increase in the number of health insurance claims that are denied. As a recent investigative report by STAT about Medicare Advantage denials found, “insurers are using unregulated predictive algorithms, under the guise of scientific rigor, to pinpoint the precise moment when they can plausibly cut off payment for an older patient’s treatment.” The impact of such decisions can be devastating to the lives of people and their families, and often there is no viable recourse because appeals can take many months and, well, it’s hard to argue with a machine.

AI systems are often criticized for reflecting the biases that are present in their training data. And since GPT-4 was trained on data from the Internet, it certainly must have integrated biases into its neural net. This is such an important problem that the developers at OpenAI and Microsoft have worked tirelessly to understand these biases and mitigate them, to the extent possible.

Chapter 7: The Ultimate Paperwork Shredder
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

Here we see that GPT-4 reflects (probably accurately) the biases from its training data, but importantly, it apparently understands that these are biases and suggests that they perpetuate harmful stereotypes. Furthermore, it attempts to provide transparency by (a) reflecting in the three examples the biases that are likely present in its training data, and (b) explaining that these are harmful stereotypes. In this and countless other tests, GPT-4 represents a major step forward in fairness and transparency.

But the question still remains: Can GPT-4 or any AI system ever be trusted to make compassionate and fair decisions on insurance claims? Will it be fair to seniors, to women, and to all minorities? And can it make decisions in ways that are transparent enough to support explanation and recourse in case of disputes?

Chapter 7: The Ultimate Paperwork Shredder
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

In a real trial, a researcher would have to read dozens of clinical notes to find even one eligible patient. If they missed a detail in those clinical notes that made the patient ineligible, that’s a wasted and expensive in-person clinic visit. If they overlook eligible patients, that results in fewer patients recruited for the study, which could also delay the trial at great cost.

In all, preparing for a trial may involve humans reading tens of thousands of clinic notes. Conservative estimates place the cost of reading all the relevant notes for a single patient at between $150 and $1,000. What if we could just have a large language model go through the entire electronic health record to look for eligible patients and exclude those that do not meet criteria? Having that capability could cut months to years from the timeline. It’s been estimated that a delay of one month can cost a pharmaceutical company between $600,000 and $8 million. And finding subjects is just one aspect of running a trial. The examples below also illustrate other aspects that, when taken together, add up to the prospect that large language models could mean a qualitative change in how we run trials. The cumulative impact could be not only measured in the millions of dollars saved in increased efficiency but also the shortening of the interval to bring a treatment to the ultimate yes/no regulatory decision that will directly influence patients’ lives.

Chapter 8: Smarter Science
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.
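The cost figures quoted in that excerpt multiply out quickly. A small sketch using the book's own estimates ($150 to $1,000 per patient's notes reviewed, $600,000 to $8 million per month of delay) shows the scale of the potential savings; the pool size and months saved below are illustrative assumptions:

```python
def screening_cost(patients: int, low: int = 150, high: int = 1_000) -> tuple:
    """Cost range for manually reading one candidate pool's notes,
    using the per-patient estimates quoted in the chapter."""
    return patients * low, patients * high

def delay_cost(months: int, low: int = 600_000, high: int = 8_000_000) -> tuple:
    """Cost range of delaying a trial, per the chapter's estimates."""
    return months * low, months * high

# Screening 10,000 candidates and shaving three months off the timeline:
print(screening_cost(10_000))  # (1500000, 10000000)
print(delay_cost(3))           # (1800000, 24000000)
```

Even at the low end of both ranges, automating eligibility screening pays for a great deal of verification effort, which is presumably why the authors single out trial recruitment as an early use case.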

With large enough data sets over diverse populations and styles of practice, the LLM’s responses to prompts will represent the diversity of practices and populations. Without that breadth and diversity, the performance of the model will be biased by the nature of the hospitals it had data access to. Practically, only some of the hospital systems that have made de-identified data available to train various machine learning algorithms are publicly known.

I believe obtaining diverse patient data is essential but obtaining it through deals with hospital systems is a mistake. Going to patients directly will allow for sampling across geography and socioeconomic strata while respecting patient autonomy. It is a trend that is growing steadily. In the UK Biobank, more than 500,000 participants have consented to share their clinical (provider notes, laboratory studies) and research data (genomic sequence, research-grade MRIs) for research purposes. I believe this is one of the central societal discussions needed for using large language models in medicine: how do we guard against the bias that could arise from demographically skewed patient data while also ensuring that patients who contribute their data to training the model really want to? Until these decisions are made, the nature of the clinical data informing the models will be influenced by which hospitals in which countries happen to decide for altruistic or financial reasons to share their data.

Chapter 8: Smarter Science
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

When it comes to medical uses for GPT-4 and its kind, we’re only at the very beginning of that lag period. So, this is the moment for broad, thoughtful consideration of how to ensure maximal safety and also maximal access.

Like any medical tool, AI needs those guardrails to keep patients as safe as possible. But it’s a tricky balance: those safety measures must not mean that the great advantages that we document in this book end up unavailable to many who could benefit from them. One of the most exciting aspects of this moment is that the new AI could accelerate healthcare in a direction that is better for patients, all patients, and for providers as well — if they have access.

The good news for medical regulators who consider how to handle something like GPT-4 is that they are by no means starting from zero. With previous, more narrow AI systems, they could look to well-trodden paths for regulating devices and drugs. In the United States, the FDA has approved hundreds of AI-augmented tools, and developed a framework for approving SaMD (Software as a Medical Device). As we noted in Chapter 4, regulators around the world — including Europe, China, and Australia — have developed similar guidelines, generally regulating medical AI tools as they would medical devices.

The bad news is that all those approved AI systems perform very narrow functions, such as identifying brain hemorrhages or cancer on scans; the broad medical competence of GPT-4 makes it a very different animal. It is the difference between the Trial and the Trainee mode of competence and evaluation we described in Chapter 4. And the models have advanced so quickly that regulatory bodies tend to have only partial answers, if any, to the larger questions that have suddenly become urgent.

Chapter 9: Safety First
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

Federal agencies and others agree, he said, that “this is not going to be some top-down government agencies telling you exactly what to do. More likely it’s going to be, ‘Here are the guidelines and guardrails set at a federal level,’ but then private industry goes and builds the assurance labs, the registries that tell us what products are good for what,” and more.

Some “guidelines and guardrails” are certain to be put in place. Even Elon Musk has been calling for “some kind of, like, regulatory authority or something overseeing AI development,” Reuters reported, to “make sure it’s operating in the public interest.” But how, exactly, regulators will balance innovation with patient interests remains to be seen.

Chapter 9: Safety First
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

The impending AI revolution in medicine can and must be regulated. But how? Peter argues the following:

➊ The current FDA framework around Software as a Medical Device (SaMD) probably is not applicable. This is especially true for LLMs like GPT-4 that have been neither trained nor offered specifically for clinical use. And so while we believe this new breed of AI does require some form of regulation, we would urge regulators not to default automatically to regulating GPT-4 and other LLMs as SaMDs, because that would act as an instant, massive brake on their development for use in healthcare.

➋ If we want to use an existing framework for regulating GPT-4, the one that exists today is the certification and licensure that human beings go through. A question, then, is whether some kind of human-like certification process is workable in this case. However, as argued in Chapter 4, this Trainee model of certification does not seem particularly applicable to large language models. At least not at present.

➌ And finally, we urge the medical community to get up to speed as quickly as possible, do the necessary research, and be the driving force behind the research and development of regulatory approaches for this new future of generally intelligent machines in medicine.

Chapter 9: Safety First
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

So you can very well imagine the potential for these technologies where if you had access to a sort of medical advice-giver, you could say: ‘Oh, I just tested positive for COVID. What should I do?’ And ‘Should I take Paxlovid? What are the risks? Where do I go get it? My doctor won’t prescribe it — what do I do?’

I think everybody having access to this sort of second opinion that you may be able to get from a tool like this could be extraordinary, just in terms of health outcomes. And moreover, I think that as you see these demographic shifts happening, you sort of have to have it. It’s not even a choice.

Chapter 10: The Big Black Bag
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

So, as we think about the future – the benefits and risks, the capabilities and limits, and most of all, the appropriate and inappropriate uses – we must come to grips with the fact that GPT-4 represents a technological phase change. Previously, general intelligence was frozen inside human brains, and now it has melted into water and can flow everywhere.

Epilogue
The AI Revolution in Medicine: GPT-4 and Beyond by Peter Lee et al.

