Using ChatGPT API to read and extract answers from a Healthcare document

Sriram Parthasarathy
10 min read · Mar 17, 2023

A considerable 80% of healthcare data remains unstructured and unused. When applications rely solely on structured data, they utilize only 20% of the available information. Consequently, these solutions may overlook 80% of the data that could potentially hold crucial insights or answers to pressing issues.

The focus of this article is to demonstrate the process of extracting insights from a clinical document. To illustrate this process, a simple Adverse Drug Reaction report has been chosen as an example, and we will showcase how ChatGPT can be instructed to read this document and extract insights from it.

Read more about how to use the ChatGPT API to ask contextual questions here. It is a good introduction to the API, since I won’t go through the basics in this article.

What is an ADR and how is it reported?

An Adverse Drug Reaction (ADR) refers to any negative or unexpected reaction that occurs in response to a medication or drug. These reactions can range from mild side effects, such as nausea or headache, to severe or life-threatening events, such as anaphylaxis or organ failure. It is essential to monitor and report ADRs during clinical trials and after a drug is approved for use to ensure patient safety. In clinical trials, ADRs are typically reported through the use of a standardized form called a Serious Adverse Event (SAE) report. When an ADR occurs, the investigator in charge of the trial must assess the severity of the event and determine whether it meets the criteria for an SAE. If an event is considered an SAE, it must be reported to the sponsor of the trial within a specified timeframe, usually within 24–72 hours. The sponsor is then responsible for reporting the event to regulatory authorities, such as the FDA, if required.

What information is present in the SAE report?

When an Adverse Drug Reaction (ADR) occurs in a clinical trial, a Serious Adverse Event (SAE) report is required to be submitted. An SAE report includes various important pieces of information that help investigators and regulatory authorities understand the nature and severity of the event.

Key information in an SAE report includes patient information, details about the adverse event, the outcome of the event, causality and severity assessments, management information, and follow-up information. This information is critical for understanding the safety profile of a drug being tested in a clinical trial and for making informed decisions about the risks and benefits of the drug. Proper extraction and documentation of this information is crucial for ensuring the safety of trial participants and for regulatory approval of new drugs.

What are the common questions to ask?

Here are five simple questions that can be asked to extract important information from a Serious Adverse Event (SAE) report:

1. What was the adverse event that occurred in the patient?

2. When did the adverse event occur?

3. Was the adverse event caused by the drug being tested or by other factors?

4. How severe was the adverse event?

5. What was the outcome of the adverse event, and were any medical interventions required?

These questions can help extract important information about the nature, timing, cause, severity, and outcome of the adverse event from the SAE report, which is essential for evaluating the safety profile of the drug being tested in the clinical trial.

Sample Serious Adverse Event (SAE) report

This is the report I will use to answer the questions discussed earlier. I saved it in a file called ADR1.txt, which the code opens and passes to the ChatGPT API. This is an anonymous report.

Clinical Trial Adverse Event Report
- - - - - - - - - - - - - - - - - -
Study Name: XYZ-123 Drug Trial for Condition ABC
Study Site: ABC Medical Center
Principal Investigator: Dr. Jane Doe
Patient ID: 001
Date of Report: 2023-03-18
Adverse Event Details
- - - - - - - - - - -
Date of Adverse Event Onset: 2023-03-15
Date of Adverse Event Resolution: 2023-03-17 (if applicable)
Description of Adverse Event: Patient experienced moderate headache and dizziness
Severity of Adverse Event:
[ ] Mild
[X] Moderate
[ ] Severe
[ ] Life-threatening
Action Taken:
[ ] No action taken
[X] Dose adjusted
[ ] Drug temporarily discontinued
[ ] Drug permanently discontinued
[ ] Other (specify): ______________________________
Outcome of Adverse Event:
[ ] Recovered without sequelae
[X] Recovered with sequelae
[ ] Ongoing
[ ] Unknown
Relationship to Study Drug:
[ ] Unrelated
[X] Possibly related
[ ] Probably related
[ ] Definitely related
[ ] Unknown
Additional Information (if any): Patient had a history of migraines, but the intensity and timing of the headache were unusual for the patient.
Follow-up Actions:
- - - - - - - - -
Date of Follow-up: 2023-03-20
Follow-up Results: Patient's headache and dizziness resolved after adjusting the study drug's dosage.
Signature of Principal Investigator: Dr. Jane Doe
Date: 2023-03-18
- - - - - - - - - - - - - - - - - -
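Because the report uses a fixed checkbox layout, the ticked options can also be read with a few lines of plain Python, which gives a cheap cross-check for the answers ChatGPT returns. This is only a sketch and not part of the workflow described in this article: the helper name `checked_options` is hypothetical, and it assumes each section header ends with a colon and is followed by `[ ]` / `[X]` lines, as in the sample report.

```python
import re

# A few sections copied from the sample report above
SAMPLE = """Severity of Adverse Event:
[ ] Mild
[X] Moderate
[ ] Severe
[ ] Life-threatening
Action Taken:
[ ] No action taken
[X] Dose adjusted
Outcome of Adverse Event:
[ ] Recovered without sequelae
[X] Recovered with sequelae
"""

def checked_options(report_text):
    """Map each section header to the option marked [X] under it."""
    selected = {}
    current_header = None
    for line in report_text.splitlines():
        line = line.strip()
        box = re.match(r"\[( |X|x)\]\s*(.+)", line)
        if box:
            # A checkbox line: keep it only if it is ticked
            if box.group(1).lower() == "x" and current_header:
                selected[current_header] = box.group(2).strip()
        elif line.endswith(":"):
            # A line ending with a colon starts a new section
            current_header = line[:-1]
    return selected

print(checked_options(SAMPLE))
```

Free-text fields such as "Additional Information" are not covered by this kind of parsing, which is where asking ChatGPT is more useful.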

Before you proceed

If you want to try out the code below, you should first have done the following:

  • Installed VS Code (or another code editor)
  • Installed Python
  • Installed the relevant extensions in your code editor
  • Signed up with OpenAI
  • Created an API key
  • Installed the openai library

All these steps are documented at length here, with detailed instructions and examples.

Steps to get insights from the clinical document

The steps we are going to follow:

  • Read the ADR file (ADR1.txt)
  • Pass this information to ChatGPT so it can understand this document (since this is an anonymous file it's OK, but in a real scenario, appropriate security permissions need to be obtained)
  • Instruct ChatGPT to use this file to answer the questions we plan to ask
  • Iteratively ask the questions

Initial setup

Let's import the OpenAI library and set the API key you got from the OpenAI web site. We will be using the GPT-3.5 model.

import openai

# Get the key from the OpenAI web site
API_KEY = " ..... "
model_id = "gpt-3.5-turbo"
openai.api_key = API_KEY
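Pasting the key into the script makes it easy to leak when the file is shared. As an alternative, here is a minimal sketch that reads the key from an environment variable. The helper name `load_api_key` is hypothetical, and the variable name `OPENAI_API_KEY` is an assumption; use whatever name you set in your shell.

```python
import os

def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in an environment variable, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key

# Usage (assumes the variable has been set in your shell):
# openai.api_key = load_api_key()
```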

Function to call ChatGPT API

The next step is to write a simple function that calls the ChatGPT API and gets your questions answered. How to do this step by step is explained in more detail in this article.

To facilitate a conversation with ChatGPT, it’s necessary to provide it with the context of past exchanges in addition to the current question. ChatGPT doesn’t retain previous conversation details by default, and thus, sending this information is crucial for ChatGPT to accurately recall and respond to the ongoing dialogue.
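To make this concrete, here is a small sketch of what "past exchanges plus the current question" looks like as data. The `role` and `content` keys are the ones the chat API expects; the placeholder text `<report text>` and the assistant reply are made up for illustration.

```python
import json

# Earlier turns of the conversation (contents are illustrative placeholders)
past_conversation = [
    {"role": "user", "content": "Use this text to answer my questions. <report text>"},
    {"role": "assistant", "content": "Understood."},
]

# The question we want answered now
new_question = {"role": "user", "content": "What is the Study Name?"}

# Everything is sent together on every request
messages_to_send = past_conversation + [new_question]
print(json.dumps(messages_to_send, indent=2))
```

The list grows by one entry per turn, so the amount of text sent increases with every question.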

GetMessageMemory takes two parameters: the new question you would like to ask and the past conversation. The function packages both, sends them to ChatGPT, and returns the response it receives.

# Call this function with the new question and the conversation so far.
# NewQuestion is the brand new question we want answered.
# lastmessage is the past conversation, passed along for context.
# The API does not remember the past conversation, so we resend it on every call.
# Note: openai.ChatCompletion is the interface of openai library versions before 1.0.
def GetMessageMemory(NewQuestion, lastmessage):
    # Append the new question to the past conversation
    lastmessage.append({"role": "user", "content": NewQuestion})
    # Make a call to the ChatGPT API
    msgcompletion = openai.ChatCompletion.create(
        model=model_id,
        messages=lastmessage
    )
    # Get the response from the ChatGPT API
    msgresponse = msgcompletion.choices[0].message.content
    # You can print it if you like.
    # print(msgresponse)

    # Print the question
    print("Question : " + NewQuestion)
    # Return the new answer to the calling code
    return msgresponse

We will be using this function to communicate with ChatGPT.
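One thing to be aware of: the function stores each question in the history but leaves it to the caller to store the answer. A possible variant, sketched here under assumptions, records both turns itself. The names `ask_with_memory`, `complete`, and `fake_complete` are hypothetical; `complete` stands for any callable that takes the message list and returns the reply text, so the sketch can run without calling the API.

```python
def ask_with_memory(new_question, history, complete):
    """Send the question plus history, then record both turns in history."""
    history.append({"role": "user", "content": new_question})
    answer = complete(history)
    history.append({"role": "assistant", "content": answer})
    return answer

def fake_complete(messages):
    """Offline stand-in for the API call; echoes the last question."""
    return f"(reply to: {messages[-1]['content']})"

history = []
print(ask_with_memory("What is the Study Name?", history, fake_complete))
print(len(history))  # 2: one user turn and one assistant turn
```

With the real API, `complete` would wrap the same `openai.ChatCompletion.create(...)` call used in GetMessageMemory and return `choices[0].message.content`.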

Read the file to pass it to ChatGPT API

Read the file and pass its contents to ChatGPT so that it uses this information to answer the questions.

file_path = 'ADR1.txt'

# Open the file in read mode ('r') and read its contents
with open(file_path, 'r') as file:
    content = file.read()

# Instruct ChatGPT to use this file to answer the questions
question = "Use this text to answer my questions. " + content
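A possible refinement, shown only as a sketch: wrap the report in delimiters so the instruction and the document are clearly separated, and ask the model to say so when the report lacks an answer. The function name `build_document_prompt` and the exact wording of the instruction are my assumptions, not something the API requires.

```python
def build_document_prompt(document_text):
    """Combine a fixed instruction with the report, separated by triple quotes."""
    instruction = (
        "Use only the report between the triple quotes to answer my questions. "
        "If the report does not contain the answer, say so."
    )
    return f'{instruction}\n"""\n{document_text.strip()}\n"""'

print(build_document_prompt("Study Name: XYZ-123 Drug Trial for Condition ABC\n"))
```

The returned string can be used in place of `question` in the code above.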

ADR related questions to ask

These are the questions I would like ChatGPT to answer based on the SAE report I shared:

  • What is the study name?
  • What is the site name?
  • What is the Adverse Event Date?
  • What is the Severity?
  • What action was taken?
  • What is the Outcome?
  • Did the patient have a migraine?
  • Who signed and the date?

Call ChatGPT to use the file to answer the questions

Let's use the function created earlier to ask ChatGPT to use the file we just read for answering the questions.

messages = []
# Set the question to answer
cresponse = GetMessageMemory(question, messages)
messages.append({"role": "assistant", "content": cresponse})

At this point, ChatGPT has read your document and is ready to answer the questions.

Let's ask the first question: What is the study name?

cresponse = GetMessageMemory("What is the Study Name?", messages)
print(cresponse)

The response we got is

The study name is "XYZ-123 Drug Trial for Condition ABC."

Next question: What is the Adverse Event Date?

cresponse = GetMessageMemory("What is the Adverse Event Date?", messages)
print(cresponse)

The response we got is

The onset date of the adverse event was March 15, 2023.

Here is a full list of the questions asked.

cresponse = GetMessageMemory("What is the Study Name?", messages)
print("Answer : " + cresponse)

cresponse = GetMessageMemory("What is the Site Name?", messages)
print("Answer : " + cresponse)

cresponse = GetMessageMemory("What is the Adverse Event Date?", messages)
print("Answer : " + cresponse)

cresponse = GetMessageMemory("What is the Severity?", messages)
print("Answer : " + cresponse)

cresponse = GetMessageMemory("What action was taken?", messages)
print("Answer : " + cresponse)

cresponse = GetMessageMemory("What is the Outcome?", messages)
print("Answer : " + cresponse)

cresponse = GetMessageMemory("Did the patient have a migraine?", messages)
print("Answer : " + cresponse)

cresponse = GetMessageMemory("Who signed and the date?", messages)
print("Answer : " + cresponse)

Note that you could also have read these questions from text input instead of hard-coding them. I was lazy, so I did it this way.
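For the less lazy version, the repeated calls can be replaced by a loop over a list of questions. The sketch below is an assumption-laden illustration: `ask_all` and `echo_ask` are hypothetical names, and `echo_ask` is an offline stand-in so the loop runs without the API.

```python
QUESTIONS = [
    "What is the Study Name?",
    "What is the Site Name?",
    "What is the Adverse Event Date?",
    "What is the Severity?",
    "What action was taken?",
    "What is the Outcome?",
    "Did the patient have a migraine?",
    "Who signed and the date?",
]

def ask_all(questions, ask):
    """Call ask(question) for each question and return (question, answer) pairs."""
    return [(question, ask(question)) for question in questions]

def echo_ask(question):
    """Offline stand-in for the real call to ChatGPT."""
    return f"<answer to '{question}'>"

for question, answer in ask_all(QUESTIONS, echo_ask):
    print("Question : " + question)
    print("Answer : " + answer)
```

With the real API, `ask` could be `lambda q: GetMessageMemory(q, messages)`; since that function already prints the question, the first `print` in the loop would then be redundant.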

Here is the full list of the answers.

Question : What is the Study Name?
Answer : The study name is "XYZ-123 Drug Trial for Condition ABC."

Question : What is the Site Name?
Answer : The study site is ABC Medical Center.

Question : What is the Adverse Event Date?
Answer : The adverse event date is March 15, 2023.

Question : What is the Severity?
Answer : The severity of the adverse event was moderate.

Question : What action was taken?
Answer : The dose was adjusted in response to the adverse event.

Question : What is the Outcome?
Answer : The patient recovered with sequelae.

Question : Did the patient have a migraine?
Answer : Yes, the additional information states that the patient had a history of migraines, but the intensity and timing of the headache were unusual for the patient.

Question : Who signed and the date?
Answer : The report was signed by Dr. Jane Doe on March 18, 2023.

Once the document has been passed to ChatGPT (one time), you can use the ChatGPT API to ask as many free-form questions as you like, as shown above.

As demonstrated, the ChatGPT API can be utilized to extract insights from clinical documents, allowing for the answering of specific questions related to a given adverse drug reaction. Once data is extracted from all events, it can be consolidated and aggregated to create valuable analytics. However, it’s important to note that the example provided in this article was performed using anonymous data, and in real-life scenarios, appropriate security permissions must be obtained before accessing confidential patient information. Nonetheless, this example serves as a simple illustration of how ChatGPT can be used to extract insights from a clinical document.
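As a rough illustration of the consolidation step, extracted answers from several reports could be collected into records and counted. The sketch below uses `collections.Counter`; the first record matches the sample report in this article, while the records for patients 002 and 003 are invented placeholders, not real data.

```python
from collections import Counter

# One record per processed report (002 and 003 are hypothetical examples)
extracted_reports = [
    {"patient_id": "001", "severity": "Moderate", "relationship": "Possibly related"},
    {"patient_id": "002", "severity": "Mild", "relationship": "Unrelated"},
    {"patient_id": "003", "severity": "Moderate", "relationship": "Probably related"},
]

# Count how often each severity level appears across reports
severity_counts = Counter(report["severity"] for report in extracted_reports)
print(severity_counts.most_common())  # [('Moderate', 2), ('Mild', 1)]
```

In practice the free-text answers from ChatGPT would first need to be normalized to fixed labels such as "Mild" or "Moderate" before counting.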

It’s worth noting that the file used in this demonstration was relatively small, because the 3.5 model used here can only accept a limited amount of text per request. At present, the ChatGPT 4.0 API, which is capable of processing up to 25k words, has yet to be released. However, users can share multiple pages of text with ChatGPT iteratively to work around this limitation. Additionally, it’s important to note that, under OpenAI's current policy, data submitted through the API is not used to train its models unless you opt in.
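One way to share a longer document piece by piece is sketched below. Keep in mind that GetMessageMemory resends the whole history on every call, so chunks added to a single conversation still accumulate toward the model's limit; this sketch therefore builds one standalone prompt per chunk. The function names and the 1500-word default are arbitrary assumptions, and word counts are only a rough proxy for the token limits the API actually enforces.

```python
def split_into_chunks(text, max_words=1500):
    """Split text into consecutive pieces of at most max_words words each."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]

def prompts_for_chunks(text, question, max_words=1500):
    """Build one standalone prompt per chunk, each carrying the same question."""
    chunks = split_into_chunks(text, max_words)
    return [
        f"Part {number} of {len(chunks)} of a report:\n{chunk}\n\nQuestion: {question}"
        for number, chunk in enumerate(chunks, start=1)
    ]

demo = prompts_for_chunks("one two three four five", "What is the Severity?", max_words=2)
print(len(demo))  # 3
print(demo[0])
```

Each prompt would be sent as its own request, and the per-chunk answers combined afterwards.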

Once ChatGPT API 4.0 is released, we can look into expanding this example to have ChatGPT answer questions over a collection of documents.

Conclusion

In conclusion, the ChatGPT API offers an efficient and effective solution for training and extracting answers from healthcare documents. This technology can be used to improve the accuracy and speed of information retrieval, making it an invaluable tool for medical professionals, researchers, and patients alike. With the ability to extract key information from a document and provide relevant answers to specific questions, the ChatGPT API has the potential to revolutionize the way healthcare professionals access and utilize patient data. As the use of artificial intelligence continues to expand in the healthcare industry, the ChatGPT API is poised to play a critical role in enhancing patient care and advancing medical research.
