Introduction

Generative artificial intelligence (GenAI) refers to deep-learning models that can generate high-quality content based on the data on which they were trained1. Large language models are a category of foundation models trained on immense amounts of data. Although not inherently able to understand text and data, these models can generate natural, human-like language2 that users perceive as “conversational.” GenAI differs from extractive AI technologies, which excel in accessing, collating, prioritizing, adapting, and using information under narrow circumstances3.

GenAI, including large language models (LLMs), provides learners and educators in medicine and health sciences with previously-unimaginable opportunities for teaching and learning. The scope of potential applications, and the efficiency with which they can be realized, is rapidly increasing. The capacity of GenAI to support multi-modal approaches has been demonstrated in cardiac electrophysiology education4 and digital pathology5,6.

AI in general is a complex social, cultural, and material artifact whose meaning and place continue to be constructed by different stakeholders. There remains a paucity of information regarding the development, deployment, and commercialization7 of these models and of the applications and services based upon them8. Unsurprisingly, there has been growing consternation among educators, professional bodies, and governments regarding the potential need for regulation, in its various forms, and control of the influence of this technology. Attempts to provide regulatory frameworks and legislation often appear reactionary and ineffectual in the face of rapid global progress unbounded by any specific institutional or sovereign authority. Several guidelines, ethical considerations9, statement papers, and recommendations have been published in recent years10, including primers for AI3, recommendations for workforce implications11, and considerations, especially ethical12, regarding the integration of AI in medical curricula13.

Enthusiasm for, and trepidation about, the future role of GenAI in medical education must be considered in the context of our evolving understanding of pedagogical principles and best practices. The impact of GenAI cannot be ignored, as it risks multi-level harms ranging from a lack of structures to ensure scholarly integrity to the stagnation and irrelevance of learning approaches14. There is likely a need for sustainable and adaptable responses to GenAI in learning, teaching, and assessment15. This review seeks to situate our current understanding of the impact of GenAI in undergraduate medical education within a pedagogical framework, to inform regulatory concerns. A narrow focus recognizes the differences between undergraduate, postgraduate, and continuing professional education requirements. We outline the concerns of GenAI and LLMs in medical education, pedagogical considerations, and emerging roles, and present a discussion regarding the regulation and preservation of academic integrity. A summary of the key considerations and concerns regarding GenAI and LLMs is provided in Fig. 1.

Fig. 1: Summary of the key concerns and considerations for GenAI and LLMs from medical education, pedagogy, and regulatory perspectives. The overlap in concerns regarding GenAI and LLMs between each perspective is demonstrated in the Venn diagram. Unknown biases and trustworthiness are concerns shared across all three perspectives.

Pedagogical considerations for generative AI in medical education

GenAI and learning

The learning process is perhaps the biggest consideration when situating GenAI within a pedagogical approach. Educability, that is, the ability of learners to utilize any and all previously-learned information in meaningful ways, distinguishes their learning capacity from that of machines16. GenAI is likely to be a useful adjunctive tool in medical education but is unlikely to replace all the experiences and social interactions that are important for the development of empathetic and contextually aware learners in constructivist and experiential frameworks. It is important to include information on GenAI in education for all students and educators; without the opportunity to learn about the ethical use of GenAI, learners are more susceptible to engaging in inappropriate use of it17. A recent scoping review identified the need for further research in three key areas to improve our understanding of the role of GenAI in medical education: (i) developing learners’ skills to evaluate GenAI critically, (ii) rethinking assessment methodology, and (iii) studying human–AI interactions18.

What learning processes, in the medical student “journey,” are likely to be impacted (adversely or positively) by LLMs and GenAI? Students learn through a combination of different means, and the theoretical approaches to classifying these have revolved around cognitive psychology, humanistic psychology, and social anthropology. Social constructivism, as an epistemological framework, describes learning as the construction of knowledge through interactions with others, linking new information to that previously learned and incorporating new experiences into a knowledge base; it is not simply the transmission of knowledge from the external world to the learner19. A complementary learning theory, experiential learning, defines learning as a process whereby knowledge is created through the transformation of experience. Different learners pass through phases of reflective observation, abstract conceptualization, and active experimentation in their own preferred order20. Therefore, the context in which learning is experienced and knowledge is acquired is critical. Whilst offering “efficiency,” how GenAI is situated with respect to the “context” of learning and the transition from “novice” to “expert” is not yet fully understood. Cognitive psychology offers many theories and explanations of how expertise develops and is fostered via learning21. As our understanding and experience of GenAI grows, we may find that it fits within existing frameworks or demands a novel approach to understanding its role in this transition. The extant literature identifies instances where GenAI can accelerate learning in novices22 but may also accelerate skill decay and hinder skill acquisition23. A theoretical counterargument posits that GenAI itself could play the role of “the more knowledgeable other” in the social constructivist framework24. Operating in a metaphorical contextual vacuum, however, GenAI is unlikely to bypass the process of exposure and experience in learning. Its inability to teach the integration of contextual and external information, comprehend sensory and nonverbal cues, cultivate rapport and interpersonal interaction, and align with overarching medical education and patient care goals25 remains a key limitation for the current generation of LLMs.

Clinical reasoning and GenAI

The key difference between extractive AI and GenAI is that the latter leverages machine learning, such as neural networks, to generate new content. This method is based on the relationships and patterns found in existing datasets, which can be broad and varied. GenAI can autonomously and rapidly produce large volumes of content and has the capacity to be “imaginative” and “disruptive” in its innovation. Via a “black box” mechanism involving multiple layers of neural networks (which remains opaque even to developers and difficult for computer scientists to explain), GenAI can create substantially larger output than the input provided10,26. The concerns are slightly different for extractive AI compared to GenAI: the former, with more utility for diagnostic processes, carries a stronger expectation of being correct, while the latter must be plausible and useful. The strengths and weaknesses of GenAI and of extractive and algorithmic forms of AI are distinct.
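To make this generative mechanism concrete, the short Python sketch below illustrates next-token sampling using the open-source Hugging Face transformers library and the small, general-purpose gpt2 model; both are illustrative assumptions rather than tools discussed in this review. Because each continuation is sampled from a learned probability distribution rather than retrieved from a source, repeated runs on the same prompt yield different, plausible, but unverified outputs.

```python
# A minimal sketch of generative next-token sampling, assuming the
# Hugging Face "transformers" library and the general-purpose gpt2 model
# (illustrative choices; the review does not prescribe any specific tools).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "A key difference between extractive and generative AI is"

# do_sample=True draws each next token from the model's learned probability
# distribution, so repeated runs produce different continuations: the content
# is generated from statistical patterns, not retrieved from a source.
for run in range(2):
    result = generator(prompt, max_new_tokens=40, do_sample=True, temperature=0.8)
    print(f"Run {run + 1}:", result[0]["generated_text"])
```

This sampling behavior is precisely why GenAI output must be judged on plausibility and usefulness rather than on an expectation of retrieval-style correctness.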

This “black box” component of GenAI may be problematic when trying to teach clinical reasoning and decision-making skills in medical education, which necessarily involves opaque, partial, and ambiguous situations27. When prompted, GenAI might offer a plausible explanation of its decision-making process, but this explanation is not necessarily an accurate, or comprehensible, representation of how GenAI actually made its decisions. It could also be argued that the ways in which humans recognize and solve problems and engage in clinical reasoning are unclear. This opacity is problematic when considering that the formative process of learning clinical decision-making, including understanding when and why errors occur, has implications for legal liability and accountability in medical practice14. LLMs can impact critical evaluation and analytical thinking and, when used inappropriately, could negatively impact students’ ability to discriminate valuable information from inaccurate and irrelevant inputs28. Just as the Internet has led to the externalization of factual knowledge, there are concerns that LLMs could externalize medical reasoning. The response to the ready availability of a vast amount of information was to place greater emphasis on debate, discussion, and knowledge “management,” rather than memorization28. With an improving capacity to assist with clinical reasoning28, LLMs are likely to promote further changes to educational methods and the need to reconceptualize assessment.

In being unable to account for patient context in its formulation, there is a risk that GenAI may reduce complex patient experiences to linear problem-solving interventions, promising “solutionism” and risking the objectification of patients, based on potentially-biased learning from patient populations that are “most studied” or “most prevalent” in the literature. Recalculating patient illness experiences into solution-based computational terms risks ignoring the benefits of dialog and the complex and often unpredictable patient experience29. Algorithmic and extractive AI technologies may excel at diagnostic components of consultations that are akin to data reduction and categorization tasks. By contrast, developing a treatment plan is context-dependent, imbued with uncertainty, and more nuanced, a scenario more suited to GenAI capabilities. Uncertainty in its many guises cannot be avoided in medical education and clinical practice. In synthesizing large bodies of knowledge, GenAI may obfuscate or overstate uncertainty, and learners will need to develop skills to understand not only their own reactions but also those of the technology30. Although GenAI and LLMs, as adaptive educational systems, may improve the efficiency and interactivity of the learning experience, their unknown impact on learners’ attention and other cognitive and metacognitive abilities needs to be considered31.

Here we highlight the need to acknowledge both the strengths and limitations of GenAI in medical education. Educators and learners are well-advised to consider that the output of GenAI will be un-scaffolded and not verified in advance. This presents a unique challenge to the constructivist and experiential approaches that underpin much medical education pedagogy.

Assessment and GenAI

Despite the benefits of LLMs being able to synthesize and personalize information for learners3, there is much consternation with respect to the use of LLMs to subvert current assessment processes32. Using GenAI in this manner, when explicitly disallowed in the task description, constitutes academic dishonesty. A more vexing concern is whether learners will become overly or completely reliant on these technologies, and what may be gained or sacrificed when GenAI is used as an educator. There is a potential risk of denying learners the formative experiences and important skills, such as critical thinking, necessary in the journey from novice to “expert.”

GenAI has progressed to the point where LLMs can pass licensure examinations in many undergraduate33 and postgraduate specialty training programs34,35. However, ongoing challenges to medical education from LLMs include ensuring the accuracy and contemporaneousness of information, reducing bias36, ensuring accountability28, minimizing learner over-reliance, preventing patient privacy exposure, safeguarding data security, enhancing the cultivation of empathy, and maintaining academic integrity37. With existing and potential changes for learners and educators, it remains important to consider what place GenAI has within broader educational aims and pedagogy in the context of its potential, limitations, and boundaries. We need to consider what makes educators and learners unique and whether GenAI can support or supplant this in working towards the goal of creating competent and empathetic doctors. Empirical studies evaluating the use of GenAI and LLMs in medical education and their efficacy in developing competencies in health professional training are scarce. These studies have focused on the use of GenAI for learning support38 or automated assessments of clinical skills, but there has been limited use of theory or conceptual frameworks39.

Curriculum and assessment redesign encompassing future-focused competencies recognizes that new skills will be required for novel models of care. Learners should be proficient in understanding the origins and development of technologies that they will be using in their clinical work, in research, and in continuing learning and professional development. New areas of technical competence will be essential for learners to work in AI-integrated healthcare environments to deliver patient care, communicate with other health professionals, and effectively manage large amounts of population-wide data that will become increasingly available40. It is impossible to address the potential impacts of GenAI use in medical education without acknowledging the intersection with clinician training and clinical care.

Emerging roles of GenAI in medical education

For learners

Artificial intelligence is likely to impact medical education methods by producing intelligent, personalized systems that identify and respond to gaps in students’ knowledge, acting as adaptable virtual facilitators in constructivist learning approaches, mining data, and providing intelligent feedback to learners3,41. GenAI not only delivers content but also enables adaptive learning, provides information and feedback, creates individualized learning pathways, supports competency-based assessment, and potentially provides and manages programmatic assessment data42. For learners, this individualized learning can be customized in the depth, tone, and style of output, making GenAI an ideal personalized teaching assistant28.

Students can benefit from improved practical skills43, robust selection processes and research assistance44. Recent research on the medical student perspective suggests that GenAI is good at facilitating differential diagnosis brainstorming, providing interactive practice cases, and aiding in multiple-choice question review25. LLMs can be used to create interactive and engaging simulations. For example, students may use LLMs to have conversations with simulated patients, allowing them to practice taking patient histories or assessing diagnoses and discussing treatment plans28.
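As a hedged illustration of the simulated-patient use case above, the sketch below shows how such an exercise might be wired together. It assumes the OpenAI Python client and the “gpt-4o-mini” model, and the system prompt is invented for illustration; the review does not prescribe any particular vendor, model, or prompt.

```python
# A hypothetical simulated-patient chat loop for history-taking practice.
# Assumes the OpenAI Python client and an OPENAI_API_KEY in the environment;
# the model name and prompt are illustrative, not prescribed by the review.
from openai import OpenAI

client = OpenAI()

# The system prompt keeps the LLM in role as the patient.
messages = [{
    "role": "system",
    "content": (
        "You are a 62-year-old patient with two days of worsening "
        "breathlessness. Answer only as the patient would, and reveal "
        "details only when the student asks a relevant question."
    ),
}]

print("Interview the patient (type 'quit' to finish).")
while True:
    question = input("Student: ")
    if question.strip().lower() == "quit":
        break
    messages.append({"role": "user", "content": question})
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})  # keep context
    print("Patient:", answer)
```

Keeping the full message history in the loop is what lets the simulated patient respond consistently across the interview, mirroring how a history-taking encounter unfolds over multiple turns.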

For educators

From an educator’s perspective, LLMs may help shape medical curriculum development and engender changes in teaching methodologies41. It has been demonstrated that GenAI with expert human guidance can produce assessment items for medical examinations45. Human-developed questions still retain higher discriminatory power46, potentially because human assessors are more adept at generating items with higher construct validity that align more closely with a priori knowledge such as lecture material. GenAI may help reduce the administrative burden on educators42, for example with assessment and attendance tracking3. Analogous to “precision medicine,” educators can foster “precision education” by leveraging data to individualize training and assessment. Data can inform the strategic deployment of educational resources and strengthen the link between practice and education, and educators can advocate for the development of appropriate tools42.

Traditional assessment methodologies are increasingly at risk of obsolescence12, necessitating a paradigm shift towards assessment modalities that are more resistant to unapproved GenAI assistance, such as continuous in-person assessment of practical or clinical skills, and oral examinations47. A contrasting and perhaps more realistic view accepts that students are likely to use GenAI and will be working in healthcare environments that have been transformed by its integration. Assessment may need to be better at evaluating whether students can use GenAI with a complete understanding of its strengths and limitations, demonstrating its effective and safe use in their own learning and in patient care. The reliance on traditional written tasks, which are susceptible to completion by GenAI without genuine student engagement or learning, underscores the urgency for educators to redesign assessments41. Educators may need to rethink and re-define “authenticity” and “originality” in assessment that incorporates the use of GenAI48,49. Competency frameworks need to be reconsidered and updated to reflect 21st-century realities. The abilities that students and future clinicians require to adequately meet patients’ healthcare needs will be impacted by AI-enabled systems50. There remains a need to improve the digital literacy of future physicians while incorporating patients’ views with the increasing use of GenAI technologies50. This highlights the need for new assessment strategies that preserve the authenticity of the learner’s voice, discourage over-reliance on GenAI for task completion, and prepare the future workforce for workplaces in which they will need to navigate GenAI and technology competently.

Impacts on the educator–learner relationship

Learners have expressed a lack of confidence in being able to inform others about the features and risks of GenAI applications due to a lack of formal instruction about the use of such programs51. Unsurprisingly, there is a demand for structured GenAI training, particularly in terms of reducing medical errors and ethical issues51. One apparent deficiency of GenAI and LLMs identified by medical students was the reduction in the humanistic aspect of medicine. The nature of learning is likely to evolve with the introduction of GenAI in education, as well as the roles of educators, what is demanded of them, and the relationships they have with students. It may well be that there is greater emphasis on reinforcing human skills, communication, empathy, professionalism, and contextualizing and individualizing treatment strategies for patients. The opportunities, challenges, and considerations for GenAI are summarized in Table 1.

Table 1 Key opportunities, challenges, and pedagogical considerations of GenAI and large language models

Trustworthiness and the intersection of GenAI use in medical education and clinical practice

Underlying some concerns about GenAI is perhaps the belief that seeking autonomous input from GenAI will necessarily result in nefarious outcomes52. In healthcare settings, one factor delaying the translation of GenAI and its potential benefits to patient care and education is whether learners, educators, clinicians, and patients would trust it. The patient’s voice regarding their needs and expectations is not always fully considered in the application of GenAI in healthcare53. Studies of patient perceptions about the use of GenAI in healthcare have concluded that most are comfortable with its involvement but would prefer the final plans and management to be approved and delivered by humans. Trusting the decision-making capacity of the clinician is based on the pre-existing trust that patients have with their physicians54. The mistrust of GenAI is perhaps secondary to its inability to explain its rationale55 and to its decisions not being fully transparent42. This is complicated by the understanding that GenAI technology has inherent biases that join human biases in shaping the diagnostic process, in a potentially non-neutral manner56. Where GenAI and LLMs perform with some degree of autonomy, from an ethical perspective, this gives them a moral agency that needs to be accounted for57. Blind acceptance of AI decisions is another potential source of mistrust, where GenAI output replaces, rather than augments, human decision-making. A greater understanding of the “permissible” ways in which GenAI could augment human processes may help learners, users, and patients to ensure that the use of GenAI remains responsible.

When decisions are subjective or the variables change, human judgment is trusted more because of people’s capacity for empathy. Even when GenAI systems outperform human doctors, trust in GenAI does not increase58. This mistrust is even greater where factors affecting diagnoses may be behavioral and case-specific, such as in mental health59. Another driver of consumer resistance to medical AI is the phenomenon of “uniqueness neglect”: the perception that GenAI systems are less able than human providers to account for consumers’ unique characteristics and circumstances60. The human ability to combine contextual awareness with knowledge leads to the perception of superiority in planning, managing, and achieving favorable results59. To this end, GenAI and LLMs should retain an assistive role in clinical encounters, and medical education needs to adapt to ensure that future doctors are prepared for a GenAI-assisted work environment in order to preserve doctor-patient relationships61. GenAI and LLMs use quantifiable datasets, and there is a risk that patients themselves are reduced to data points, neglecting their experiences and individual context. Patients and healthcare professionals should consider this and encourage patient empowerment by expressing their individual circumstances62. Medical training should prepare doctors to operate dynamically, adapting to a range of environments that may or may not be technologically enabled. There will be patients who lack digital literacy, with whom the nature of the interaction will differ from that with patients who operate in more AI-enabled environments and who may have consulted independently with GenAI technologies to understand their medical conditions. The digitally-literate physician will be able to navigate and address the diverse needs of all patient groups.

A further consequence of using GenAI as an adjunct to a healthcare provider’s work is that administrative tasks may be made less onerous for clinicians, reducing burnout and allowing greater time and connection with the humanistic side of medicine. The counterargument is that there exists a potentially increased burden from a higher throughput of patient consultations and the increased cognitive load of monitoring and correcting the output of GenAI processes. Democratization of patient care could also be achieved by providing patients with access to their information in a timely and comprehensible fashion63. The doctor-patient-AI “triad” relationship represents a paradigm shift and calls for further research to better understand how GenAI influences the doctor-patient relationship and the respective autonomies, to ensure that ethical practice remains present64. Technological mediation may inhibit the development of trust in a doctor-patient relationship, and this too would benefit from further research and understanding. As a mediator placed between the doctor and patient, GenAI systems can inhibit tacit understanding of the patient’s health and well-being and encourage both clinicians and patients to discuss health solely in measurable quantities or machine-interpretable terms65.

Often, models are developed without input from the people who will ultimately use them, namely students, practitioners, and patients. GenAI models also have no intrinsic ability to use context or meaning to inform output and decisions, which is problematic because context critically determines the quality of outcomes for patients52. This contextual awareness is likely to improve with newer generations of GenAI, but it will be critical for any underlying bias within the material that GenAI has “learned” from to be eliminated or mitigated, as this will inform the technology’s capacity to make inferences about the patient context. A codesigned approach, led by learners and students, that considers which tasks are made more efficient with GenAI elements, is likely to be more productive66.

The U.S. Department of Health and Human Services has identified six principles of trustworthy AI67: LLMs should be robust and reliable, fair and impartial, transparent and explainable, responsible and accountable, safe and secure, and should ensure privacy and consent. However, it is unclear what would make GenAI trustworthy in clinical practice, and without a clear understanding, the development of effective implementation strategies in the healthcare setting will be impaired68. GenAI and LLMs are likely to continue to develop in ways that benefit particular groups (especially commercial ones), but without a high level of trustworthiness, they are unlikely to be acceptable to all aspects of the health professions. Evaluating and ensuring the presence of these underlying “foundations” of the trustworthiness of GenAI technologies by health professionals, possibly as part of the responsibility for self-regulation, may be required to shape GenAI development in equitable and acceptable ways. These are considerations that those involved in medical education, particularly learners and educators, need to heed and develop personal approaches to. Further research, including all stakeholders, especially patients, learners, and educators, into the foundations of trustworthiness and how these feature in future AI-enabled workspaces will be critical.

Is regulation necessary?

Given the speed and unpredictability of innovation, the quantum of investment, and the lack of technical information, it is almost impossible to forecast the opportunities and risks of GenAI accurately. LLMs raise questions about the opportunities and risks of widespread adoption; the scope and adequacy of national strategic planning and policies; the fitness of legal and regulatory approaches; and the implications of increasing geopolitical competition and geo-specific regulations8. Regulation needs to be defined within the context of this review.

(i) With regard to the medical device functionality of GenAI in clinical work, a legal definition of regulation is appropriate, where it represents rules or directives designed to control and govern conduct. Oversight would be the domain of government departments responsible for the implementation and use of therapeutic goods and devices.

(ii) Regarding medical education, regulation refers to accreditation and validation, that is, formal processes to ensure that standards for quality and competency are met. Oversight would be local and context-dependent.

There are several challenges associated with attempts to regulate technology. The perceived risks of harm are tempered by social norms, market pressure, and the coding architecture (the design, structure, and organization of the codebase). Adapting formal regulation may be one element of ensuring safe and ethical GenAI use. A stepped approach to GenAI regulation recognizes that a new technology does not necessarily imply the need for new rules. Where there are risks from the use of GenAI that warrant some form of regulation, identifying which component or process requires regulation will be important, and the codesign of any framework with all stakeholders will be critical69. Existing legal frameworks may address and mitigate some risks of patient-facing GenAI use69; however, specific contexts for GenAI and LLM use may require specific regulatory attention.

Regulation and GenAI use in medical education

Preserving academic integrity

Detecting the misuse of LLMs for plagiarism, where there is no augmentation of learner abilities44, remains challenging given the lack of transparency of both GenAI programs and the algorithms used by detection tools. The ability of GenAI to pass high-stakes examinations70 highlights an issue with reliance on “single-shot” examinations, which are limited assessments of knowledge with inherent difficulties in generalizability71. Programmatic assessment72, which draws on a wide variety of assessment tasks, including workplace-based assessments, is potentially more resistant to the unauthorized use of GenAI. GenAI can help organize the wealth of performance evidence that accompanies programmatic assessment, visualizing and interpreting it in a manner that informs future learning and identifying signals in performance evidence that would steer additional diagnostic assessments or learning experiences42. There will be an onus placed on educators to rethink how the utility of GenAI can be maximized73 while mitigating concerns about its potential misuse.

GenAI, students, and clinicians are likely to have an interdependent relationship. Bearman and Ajjawi provide a framework for working with, rather than fearing, “black boxes”27. Orienting students to quality standards and providing meaningful interactions with GenAI systems would (i) permit an understanding of the social regulating boundaries around GenAI, (ii) promote learner interactions with GenAI while building evaluative judgment in weighing GenAI’s contribution, and (iii) encourage understanding of the evaluative, ethical, and practical necessities of working with “black boxes”27. Just as learners will use GenAI to an increasing degree, it will continue to rely on high-quality input from users74, including students and clinicians. Initial training with GenAI is unlikely to be sufficient as a standalone endeavor, and additional training is likely to be required as the field evolves. Establishing frameworks for adoption and education early makes this process more feasible in the future.

As with any source material, understanding the veracity and applicability of information to learners’ own learning, and eventually to patient care, needs to be emphasized. Learners should continuously critique and question GenAI-generated outputs on biomedical knowledge and the pathophysiology of disease13,75. This would prevent GenAI-generated information from acting as an automated crutch for clinical decision-making76, which could hamper the development of clinical reasoning abilities77.

Specific concerns regarding GenAI use in medical education include algorithmic bias, over-reliance, plagiarism, misinformation, inequity, privacy, and copyright concerns9,41. Many practical guidelines regarding the regulation of GenAI agree on factors that require regulatory oversight, including transparency, bias, content validity, data protection, excessive (and non-consensual) data collection, data ownership, informed consent, ensuring that users remain empowered, and establishing accountability9,78. The sources from which GenAI programs draw their information are also important with respect to intellectual property and copyright protection.

Regulation and GenAI use in clinical practice

If GenAI and LLMs are used in clinical settings, there is ambiguity regarding the responsibility for medical diagnoses: does it rest with the GenAI or with the healthcare professional? Calls have been made for users of GenAI to be guided by ethical principles, which practically and legally may involve reforming the categories of medical malpractice, vicarious liability, and product liability, as well as the ancillary duties of healthcare providers62. Many of these recommendations fall under the rubric of “soft law,” presenting self-regulating obligations and codes of conduct that are not legally enforceable but are considered “good practice.” With the introduction of GenAI systems, there is potentially an argument for some aspects, such as the duty to warn of limitations and obtain informed consent, to be reallocated to “hard law,” becoming legal obligations related to the disclosure of information.

Although regulations regarding therapeutic goods and devices focus mostly on patient safety, they do not necessarily guarantee it79. AI tools with regulatory authorization are not necessarily clinically validated80, and if GenAI is implemented poorly, it may add to doctors’ burden of responsibility and potentially expose them to the risks of poor decision-making. Alternatively, GenAI implemented with a responsible design, informed by cognitive science, would allow doctors to offload many of their cognitive tasks to GenAI when appropriate and focus their attention on patients52. Responsible GenAI requires the development of legal frameworks to protect patients and consumers from potential harm arising from poorly developed GenAI and inappropriate deployment in socio-technical systems. Most importantly, patients and consumers have the right to be informed about the limitations of GenAI, to allow them to decide which aspects of their lives could benefit from it52, and the choice to opt out of systems employing GenAI.

As previously discussed, the use of GenAI during medical training may result in inadequate development of critical thinking and clinical reasoning skills, which may threaten patient safety several years later as the learner starts to take on greater responsibility for patient care. On the other hand, training the future generation without adequate recognition of the role of GenAI in their future practice, and of the new competencies required, is likely to produce graduates who are underprepared for clinical roles in settings that may ultimately adopt such technologies. Clinicians will likely also need to understand and keep pace with patients’ use of GenAI.

GenAI currently operates in a regulatory framework that is patchwork at best. One call for legislation is based on human rights, with concerns about emerging harms from GenAI centered on privacy, algorithmic discrimination, automation bias, misinformation, and disinformation81. Legislation does exist to regulate GenAI usage in specific settings or circumstances; however, many gaps remain.

Regulation, applied with the intent of supporting safe innovation, may to some degree incur human and economic opportunity costs by restricting progress. This reinforces the overall message that regulation, in whichever form it is present, needs to be purposeful and should assure educators, learners, medical professionals, and patients that LLMs can be used without causing harm or compromising data or privacy82.

Levels of regulation

Consideration of the different types of regulation, and the levels at which they may apply, will inform how individuals, institutions, accrediting bodies, national governments, and global organizations establish the acceptable use of GenAI in medical education to ensure safe and ethical practice. The ecological framework allows the consideration of regulatory principles at the micro, meso, and macro levels and has been used to synthesize and unify existing learning theories to model the roles of artificial intelligence in promoting learning processes83. The ecological framework not only identifies increasingly broader levels of influence but also considers the relationships across different levels. Any regulatory effort is unlikely to succeed without all levels interacting to some degree. At this nascent stage, however, the most readily-applicable action is likely to be at the micro level, that of the individual learner and educator. This framework is summarized in Fig. 2.

Fig. 2: A depiction of regulatory levels for GenAI and LLMs within the ecological framework. The micro-level consists of learners and educators, the meso-level of institutions, industry bodies, and accrediting organizations, and the macro-level of national and international organizations. There is intercalation of the regulatory concerns, strategies, and frameworks across these different levels.

Micro: individual learners and educators

Regulation at the micro level, including individual learners and educators, would predominantly involve degrees of self-regulation. Regulatory responsibility for the use of GenAI in medical education will likely need to focus on developing robust strategies to counter or address opacity and inexplicability, data privacy and security, fairness and bias, reliability84, protection of intellectual property, assurance of quality control and standardization, informed consent, data ownership, over-reliance on GenAI models, and continuous monitoring and validation82. Educators and learners should be encouraged to develop personal and morally-informed strategies, akin to a personal code of conduct, to manage these issues and be ready to state how these have been addressed, or not, when using GenAI. Ideally, learners will be empowered to increase their knowledge and skills to use a range of emerging digital health systems, analyze the data emanating from them, and evaluate information for trustworthiness and relevance47. This would not only ensure that students are adept at leveraging GenAI in their future careers but also emphasize the importance of critical thinking and maintaining integrity and professional standards in their work47, despite the convenience of readily-generated information14.

Meso: institutions and accrediting bodies

At this level, there is the intersection of regulatory processes governing therapeutic goods and devices as well as educational accreditation. Professional health education curricula will need to evolve to include comprehensive teaching on the ethical and appropriate use of GenAI9 and on the critical appraisal of information created with it. Institution-level approaches to GenAI may be retrofitted to existing national guidelines85, and institutional policies may similarly be developed from guidelines already published. It would be the responsibility of tertiary educational institutions and the professional colleges overseeing pre-vocational and postgraduate vocational training to develop frameworks appropriate to their accreditation and validation processes. Individual professional organizations, such as the Royal Australian College of General Practitioners, have also developed evolving position statements to guide clinicians86. The latter position statement outlines various concerns and issues and makes legally non-binding recommendations, calling on general practitioners to be cognizant of technological advances and their ethical and clinical implications and to take individual responsibility for GenAI use. These reminders should be reinforced at the medical student and learner levels to encourage forward thinking about the ethical challenges that GenAI systems will pose. Enabling this self-reflection would place emphasis on faculty development and the upskilling of educators to engage safely and productively with GenAI87.

The Australian Health Practitioner Regulation Agency, responsible for clinician accreditation in Australia, reminded practitioners to consider their professional obligations when using GenAI in practice, particularly with respect to accountability, understanding, transparency, and informed consent88. Some recommendations have called for self-regulation at an industry level, with a codesign process between stakeholders, developers and users encouraging transparency and potentially increasing public trust89,90.

Macro: national and international organizations

Most LLMs have been released globally. Ideally, a global approach from regulators is required; however, proactive regulation is impossible with the proverbial cat already out of the bag. Broader regulation at the national and international macro level is challenging and likely lags significantly behind GenAI research and development. The first international convention was recently signed by the Council of Europe91. The Bletchley Declaration, signed by 28 countries and the European Union at an AI Safety Summit, establishes a shared understanding of the opportunities and risks posed by frontier artificial intelligence. The aim of this declaration was to promote increased transparency by private actors developing frontier AI capabilities, appropriate evaluation metrics, tools for safety testing, and the development of relevant public sector capability and scientific research, while acknowledging that approaches would “differ” with respect to applicable legal frameworks92. A similar declaration from the United Nations93 has cited concerns about human rights infringements and inequity with GenAI technology but, apart from the development of an independent international scientific panel and intergovernmental, multi-stakeholder policy dialog, tacitly acknowledges the difficulties of enforcement. Some regulatory approaches include risk-based approaches, where compliance obligations are proportionate to the level of risk, medical or otherwise, posed by the use of GenAI technology; sector-agnostic and sector-specific rules and regulations, depending on a particular sector’s use of GenAI; and policy alignment, incorporating GenAI-related rule-making within existing frameworks for cybersecurity, data privacy, and intellectual property protection.

National governments have recognized that there is low public trust in GenAI systems, which can in turn slow adoption and public acceptance. The risk-based approach seeks, through greater testing, transparency, and oversight, to pre-emptively mitigate potential negative impacts from GenAI and LLMs that could be difficult or impossible to reverse94. GenAI systems are being developed and deployed at a speed and scale that will outpace the capacity of legislative frameworks. A map of potential GenAI risks may need to be developed and addressed by future GenAI regulations to ensure that they can account for and handle new risks, potential and actual alike95. Government-level organizations are calling on those developing and deploying GenAI in high-risk contexts to take their own proactive steps to ensure user and consumer safety94.

Other national approaches have included mooting the legal protection of human rights in the USA96, national AI strategies in the UK97 and Hong Kong98, white papers in Japan99, and voluntary standards in Australia100.

Legal regulatory approaches for therapeutic devices need to account for the unique differences in the development and distribution of LLMs compared with existing medical technologies. To safeguard patient care, Mesko and Topol suggested that a regulatory body only has to design regulations for LLMs if the developers of LLMs claim that their LLM can be used for medical purposes, or if LLMs are developed for, adapted, modified, or directed toward specific medical purposes82. Such adaptation or use of LLMs for medical purposes may not always be explicitly stated, or even intended, by developers. Even if the currently widespread LLMs do not fall into either category, further iterations of medical alternatives, specifically trained on medical data and databases, will probably emerge. A participatory approach to AI governance, informed by the micro and end-user levels, will be more effective than overarching top-down regulation.

There is little by way of international oversight governing the use of GenAI in medical education. An initial advancement of the meso-level approaches would see institutions and accrediting bodies collaborating and adopting shared strategies at a national level. It remains to be seen if global governance would be necessary, or even feasible, in medical education.

Underpinning regulatory concerns is an understandable focus on patient safety, privacy, transparency, and ongoing trust in the healthcare profession. However, this safety cannot be guaranteed if learners, the future workforce, are deficient in clinical reasoning and critical thinking skills because of, or when operating within, GenAI-integrated environments. Accountability lies with the end-user of any GenAI technology, as it would with any therapeutic good or device, and navigating the challenges that GenAI represents is an important learning skill. A broader view of “regulation,” with participation from all stakeholders, will help ensure that accrediting bodies, education providers, and students understand and consider how GenAI and LLMs affect learning, the development of knowledge and skills, and the attainment of competency in practice.

Conclusion

The intersection of the role of GenAI in medical education and clinical use hinges on issues of governance and regulation. Currently, GenAI is unlikely to become fully autonomous and unreservedly accepted by the wider medical community and by patients, despite promised gains in efficiency and personalization of outcomes, due to issues of trustworthiness and the as-yet unknown impacts on the doctor-patient relationship. Its use and inputs need to be constantly moderated and updated by humans to ensure the veracity and utility of its output, retain its generative capacity, and prevent model collapse. The implications for medical education should be considered in the context of the learning process, authentic assessment, and the preservation of academic integrity. Regulation, in its different guises, applied thoughtfully at different levels, will guide users towards safe, appropriate, and equitable use of these technologies. In place of blind acceptance, a balanced and considered collaboration between humans, GenAI, and governance will permit advancements in learning possibilities and efficiencies without over-regulation stifling innovation and progress.