Introduction

Large Language Models (LLMs) represent a significant breakthrough in Natural Language Processing (NLP) and Artificial Intelligence (AI)1. Prior to 2017, NLP models could perform several language processing tasks but were not easily accessible to non-domain experts. The introduction of the Transformer architecture in 2017 revolutionized the field, enabling NLP models to efficiently synthesize and analyze datasets using simple prompts. This allowed large-scale use by people worldwide, significantly broadening access to advanced language processing tools2. Transformer technology led to the development of two game changers, Bidirectional Encoder Representations from Transformers (BERT) and the Generative Pretrained Transformer (GPT), which used a semi-supervised approach and acquired exceptional generalization capabilities, including the ability to interpret and generate human-like text3. The launch of ChatGPT in 2022 brought LLMs to public attention in almost every field of life owing to its accessibility and user-friendly interface. LLMs offer AI-driven support particularly in literature review, summarizing articles, screening abstracts, extracting data and drafting manuscripts. Owing to the workload reduction and ease they offer, there has been increasing interest in incorporating LLMs such as ChatGPT, Perplexity, Llama by Meta (formerly Facebook), Google Bard and Claude into academic research, as indicated by a rapid increase in the number of articles after ChatGPT’s release3,4.

Although there are numerous efficiency gains in utilizing LLMs in research, they cannot, however, replace humans, particularly in contexts where meticulous understanding, original thought and accountability are crucial5,6. As understanding of LLMs has deepened, it has become clear that they are also capable of generating fake citations, rapidly producing large volumes of questionable information, and amplifying biases3,7. This has led to negative ethical implications, such as threats to authorship integrity and a surge in predatory practices, and, as a consequence, an “AI-driven infodemic” has emerged5. There is also a risk to public health from ghost-written scientific articles, fake news and misinforming content3. In addressing these issues, a pertinent first step is to understand researchers’ attitudes towards LLMs by assessing their awareness of, and practices around, the use of LLMs in research.

Our study provides a unique analysis of a targeted group of medical and paramedical researchers enrolled in a one-year certification course, the Global Clinical Scholars Research Training (GCSRT) Program at Harvard Medical School (HMS). We aim to provide insights into current trends in AI usage in research and publication, along with a view of the future scope and impact of LLMs. We strongly believe that the results of our study can help journals formulate future policies regarding the use of AI tools in the publication process, thus ensuring credibility and maintaining the integrity of medical publications.

Methods

Study design and population

This global survey used a cross-sectional design. It was conducted between April and June 2024 amongst a diverse group of medical and paramedical researchers who received training in the GCSRT program at Harvard. The program draws researchers from over 50 countries and 6 continents, spanning various specialties, career stages, age groups and genders. In the program, all participants receive advanced training in every stage of research, including statistical analysis, publishing and grant writing8. They are therefore an ideal group in which to assess the use of AI tools in research.

Study objectives

We had three primary objectives for this study: first, to assess the level of awareness of LLMs amongst global researchers; second, to identify how LLMs are currently used in academic research and publishing amongst our survey respondents; and third, to analyze the potential future impact and ethical implications of AI tools in medical research and publishing.

Eligibility criteria

  (a) Inclusion criteria: Medical and paramedical researchers who participated in the GCSRT program at HMS in any cohort between 2020 and 2024, irrespective of their country of origin, research interests, active years in research, age or gender. Specifically, researchers who were members of the unofficial class WhatsApp groups and were proficient in reading and writing English were included.

  (b) Exclusion criteria: Researchers from cohorts outside the specified years, those who were not accessible through the class WhatsApp groups, and those who were not proficient in reading and writing English were excluded from the study. Medical and paramedical researchers who had not undergone training in this program, as well as non-medical researchers, were not invited to this study.

Questionnaire development and survey dissemination strategy

The survey was drafted in English using Google Forms. It consisted of four sections covering our primary objectives: (1) Background, (2) Awareness of LLMs, (3) Impact of LLMs and (4) Future Policy. Each question was carefully reviewed for relevance, validity and absence of bias. Data collectors for the study were volunteers chosen from amongst the participants of the GCSRT Program. The data collectors from each targeted cohort were made primarily responsible for reaching out to our target population in their cohort via personal messaging on WhatsApp and LinkedIn. The contact information of prospective respondents was obtained from the unofficial class WhatsApp groups and the personal networks of the data collectors. A total of three personal messages, including two reminders spaced seven days apart, were sent to each prospective participant. Informed consent was obtained, and Google survey forms were completed by a total of 226 researchers from over 59 countries.

Sample size and statistical methods

The link to the Google survey form was distributed to 5 cohorts of the GCSRT program comprising a total of 550 medical and paramedical researchers. A total sample size of 220 was calculated by considering a margin of error of 5%, a confidence level of 95% and a power of 0.8. Descriptive statistics of the survey respondents were presented as mean ± standard deviation for normally distributed continuous data, median (interquartile range) for non-normally distributed continuous data, and frequencies and percentages for categorical data. Continuous data were tested for normality using the Shapiro–Wilk test. Normally distributed data were analyzed using one-way ANOVA, while non-normally distributed data were analyzed using the Kruskal-Wallis test. Categorical data were analyzed with the Chi-squared test or Fisher’s exact test. Qualitative data from open-ended questions were analyzed via thematic analysis. All statistical analyses were performed in Stata MP version 17.0 (StataCorp, College Station, TX, USA). All tests were two-tailed and considered significant at P < 0.05.
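For illustration, the sample-size calculation and the normality-driven choice between one-way ANOVA and the Kruskal-Wallis test described above can be sketched as follows. This is a minimal sketch in Python rather than the Stata used for the actual analysis, assuming Cochran's formula with a finite population correction for the sample size (the published figure of 220 also factors in power); the grouped data at the end are hypothetical placeholders, not study data.

import math
from scipy import stats

def sample_size_finite(N, margin=0.05, confidence=0.95, p=0.5):
    # Cochran's formula with finite population correction (assumed approach).
    z = stats.norm.ppf(1 - (1 - confidence) / 2)   # 1.96 for 95% confidence
    n0 = (z ** 2) * p * (1 - p) / margin ** 2      # infinite-population estimate
    return math.ceil(n0 / (1 + (n0 - 1) / N))      # correct for finite pool of size N

print(sample_size_finite(N=550))                   # roughly 220-230 for N = 550

def compare_groups(groups, alpha=0.05):
    # Choose one-way ANOVA or Kruskal-Wallis based on Shapiro-Wilk normality,
    # mirroring the decision rule described in the Methods.
    normal = all(stats.shapiro(g).pvalue > alpha for g in groups)
    if normal:
        stat, p = stats.f_oneway(*groups)          # normally distributed data
        return "one-way ANOVA", stat, p
    stat, p = stats.kruskal(*groups)               # non-normally distributed data
    return "Kruskal-Wallis", stat, p

# Hypothetical example: publication counts across three experience strata
groups = [[2, 5, 7, 9, 12], [3, 8, 14, 18, 25], [1, 4, 6, 30, 41]]
print(compare_groups(groups))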

Ethical consideration

In accordance with the Declaration of Helsinki8, this study was approved by the ethical review board at Allama Iqbal Medical College/Jinnah Hospital, Lahore, Pakistan (Reference no: ERB 163/9/30-04-2024/S1 ERB). The study was not supported or endorsed by HMS; however, timely notification about the study was provided to the administration of the GCSRT program. Consent to participate was collected from every respondent as the first, mandatory response to the questionnaire. All personal information, such as email ID, nationality and age, was carefully de-identified and handled confidentially. Respondents were informed of the voluntary nature of the study and were provided the contact information of the principal investigator.

Results

We analyzed the responses of 226 global researchers from over 59 countries, practicing across 65 different medical and paramedical specialties. Across the various countries of origin (Supplementary Table S1), the two most common regions of origin were the Americas (23.5%) and South-East Asia (23.5%).

Table 1 presents the academic and demographic characteristics of our survey respondents and compares respondents who were aware of LLMs with those who were not. The median number of PubMed-indexed publications among survey respondents was 7 (interquartile range: 2–18). A total of 198 (87.6%) survey respondents were previously aware of LLMs. None of the characteristics were significantly associated with awareness of LLMs except the number of PubMed-indexed publications: those who were aware of LLMs had a higher number of publications than those who were not (p < 0.001).

Table 1 Academic and demographic characteristics of survey respondents.

Table 2 presents aware respondents’ (n = 198) knowledge, attitudes and practices with respect to LLMs. Most were somewhat or moderately familiar with LLMs (33.3% and 30.8%, respectively). Of these aware respondents, those who had personally used LLMs before (18.7%) used them mostly for grammatical error correction and formatting (64.9%), followed by writing (45.9%) and revision and editing (45.9%). When stratified by the number of active years in medical research, none of these variables showed significant associations.

Table 2 Knowledge, attitudes and practices of Aware respondents.

Figure 1 displays the level of perceived future impact of LLMs on various stages of publication amongst aware respondents. The majority believed that LLMs will have a major overall impact (52.0%). The areas expected to be most impacted were grammatical error correction and formatting (66.3%), revision and editing (57.2%), and writing (57.2%). The areas expected to be unimpacted or only moderately impacted were methodology (74.3%), journal selection (73.3%), and study ideas (71.1%).

Fig. 1 Future of LLMs in various stages of the publication process.

Table 3 presents aware respondents’ perceptions of the future scope of LLMs. The majority perceived that LLMs will bring a positive impact (50.8%), yet a sizeable proportion were unsure (32.6%). While most respondents believed that journals should allow the use of AI tools in publishing (58.1%), the majority (78.3%) also believed that some regulations (i.e., modified journal policies, AI review boards, and tools to detect LLM usage) should be put in place to make the use of AI tools in publishing ethical. When stratified by the number of active years in medical research, none of these variables showed significant associations.

Table 3 Insights into the future scope of LLMs from aware respondents.
Fig. 2 Overall opinion on future scope and challenges (thematic categories).

In our survey, 79% (n = 179) of the respondents were willing to share their overall opinion on the future scope and challenges of LLMs. Their views fell into one or more of the categories shown in Fig. 2. Twenty-eight percent (n = 64) of respondents expressed that LLMs are helpful tools in the publication process, particularly in organizing and writing large topics in a systematic way. Additionally, around a quarter of respondents (n = 55) stated that with the use of LLMs, researchers are able to spend less time on different sections of their research projects, such as literature review, data analysis and manuscript preparation. However, the survey respondents also revealed several concerns and challenges associated with using LLMs in academic research. Fourteen percent (n = 33) of respondents expressed uncertainty about, or a lack of experience with, LLMs. Ethical apprehensions about the use of LLMs in academic research and publication, including potential biases, privacy issues and plagiarism, were noted by 8% of participants (n = 18).

Discussion

AI has generated seismic waves around the world, and the field of research is no exception. Our study assessed the awareness, usage trends and future scope of LLMs to better analyze this impact in academia. It captured the perceptions of researchers from all walks of medical and paramedical research, representing 59 countries and spanning 65 specialties. Our respondents mainly belonged to medical subspecialties (64.6%) rather than surgical or paramedical subspecialties, similar to the respondent characteristics (68%) seen in a study by Abdelhafiz et al.9 Our respondents mostly worked in academic settings (57.1%), followed by public and private healthcare settings, similar to the study by Abdelhafiz et al., in which 75% of participants were from universities or research centers9. Respondents with 10+, 6–10 and 0–5 years of research experience constituted 21.7%, 31.4% and 46.9% of the sample, respectively, suggesting that our target population well represented academicians at varying stages of their careers.

A significant majority of our respondents (87.6%) were aware of LLMs, which is similar to a survey conducted among medical students in Jordan (85%)10 and higher than a study done in Pakistan, where only 21.3% of respondents were familiar with AI11. A plausible explanation for the high level of awareness amongst GCSRT participants is that they had already completed advanced training in research and might have come across applications of LLMs in contemporary research and publication during this training period. Their keen interest in research might also have led them to explore the latest advancements in the field, amongst which the use of LLMs and AI tools probably tops the list12,13. Interestingly, the participants who were aware of LLMs had a higher number of publications than those who were not (p < 0.001). This finding coincides with previous studies reporting that greater familiarity with, and access to, LLMs is associated with a higher pre-print and publication turnout among academic authors, probably due to the fast-paced nature of LLM research and the use of LLMs for writing assistance14. None of the other variables, such as age, country of respondent or field of practice, was significantly associated with awareness of LLMs. An overwhelmingly large proportion of the respondents who were aware of LLMs (86.4%) reported that they were not aware of AI tools prior to 2022. This corresponds with the steep trajectory of publications pertaining to LLMs in medical research from May 2021 to July 202315.

A large majority (81.3%) of our aware respondents had never previously used LLMs in their research projects or publications. This contrasts with Eppler et al.’s earlier study16, in which nearly half of the respondents reported having used LLMs in their academic practice. Amongst those who had previously used LLMs in their publications, most rated their usage as moderate to frequent for tasks such as grammatical error correction, editing and manuscript writing. These results are in concordance with the study by Eppler et al.16, which showed that the most common uses of LLMs in scientific publishing were writing (36.6%) followed by checking grammar (30.6%). With the help of LLMs based on NLP, it is possible to conveniently rectify grammatical errors using categorization models and algorithm-based sentence construction17,18. Despite the frequent use of LLMs for various components of academic writing, a considerable proportion of these respondents (~40%) did not acknowledge their usage in their publications. There are multiple reasons why a researcher may not disclose the inclusion of AI tools in their research papers. The first is a lack of information or understanding on the part of researchers regarding the technologies they use, which leaves them oblivious to the degree to which AI has been integrated into their research19,20. The second is skepticism or the negative perceptions associated with the use of AI, such as the notion that a machine was deployed to generate the proposals or scientific discussion of their study21. Thus, the question of whether or not to acknowledge the use of AI in research studies remains an ethical imbroglio. Publishers may ask authors to submit or include a declaration of whether they have used AI systems in their writing22,23,24.

Figures 1 and 2 and Table 3 provide insights into the future scope and challenges of LLMs among global researchers. Figure 1 reveals a substantial belief in the transformative potential of LLMs, with slightly more than half of respondents anticipating a major overall impact. The specific areas identified as likely to be most significantly influenced by LLMs in the future were grammatical error correction and formatting, revision and editing, writing, and literature review. These results align with current literature suggesting that LLMs can greatly enhance the efficiency and accuracy of these tasks, thus facilitating quicker and higher-quality academic outputs25,26,27. Conversely, the areas perceived to be less impacted, such as methodology, journal selection and study ideas, reflect apprehension about AI’s ability to critically assess research design and journal suitability.

As shown in Table 3, slightly more than half of the participants view the impact of LLMs positively, yet around one-third remain uncertain. This uncertainty underscores significant concern regarding the ethical implications and potential misuse of AI technologies. Ethical concerns are well documented in existing studies, which highlight issues such as data privacy, misinformation and unintended biases that can arise from AI-generated content28,29. In addition, our study reveals that while the majority of respondents support the use of AI tools in publishing, there is a strong consensus on the necessity of regulatory measures, such as modified journal policies, AI review boards and tools to detect LLM usage. This finding is consistent with broader ethical guidelines proposed in the literature, which advocate robust oversight and ethical frameworks to mitigate the risks associated with AI deployment in sensitive fields such as medical research27,30. Interestingly, the perception of AI’s ethical use varied with experience level. In our study, participants with more than 10 years of research experience were more likely to view AI tools positively and to support their use under regulated conditions compared with those with fewer years of research experience; however, this result was not statistically significant.

Conclusions

The discipline of academic writing has seen a noticeable transformation following the advent of LLMs, with an increasing number of researchers incorporating these tools at varying stages of their research publications. However, as the applications of LLMs grow, there is a corresponding rise in concerns regarding their validity, accountability, potential for exploitation and ethical implications.

While there is a broad recognition of the beneficial impact of LLMs on certain aspects of academic research and publishing, addressing associated ethical risks and apprehensions is of paramount importance. Our study emphasizes the need for developing comprehensive guidelines and ethical frameworks to govern the use of AI in medical and paramedical research. The growing utility of LLMs necessitates the implementation of such regulatory policies promptly to ensure their safe, responsible and effective usage.

Limitations

Our study has certain methodological limitations which need to be acknowledged. First, since this is a cross-sectional study, causal inferences cannot be drawn from the findings, and the temporal relevance of our findings may change over time. Second, despite our extensive attempts to maintain the anonymity of survey responses, the study findings are prone to social desirability bias. Third, since our study population was limited exclusively to participants of the GCSRT program, who have extensive knowledge of academic research, selection bias may have been introduced, which limits the generalizability of the study findings. Fourth, our study did not collect several respondent characteristics that may be associated with awareness of LLMs, such as sex, level of education and level of income. Finally, our study is susceptible to sampling bias due to the use of WhatsApp and LinkedIn for data collection. Participants using LinkedIn might concurrently use other platforms for their research, while participants using WhatsApp might be younger and more well versed in technology and AI. Hence, concerns about the overrepresentation of certain demographics could affect the external validity of the findings.