
GPT stands for Generative Pre-trained Transformer, a type of language model that uses deep learning to generate human-like, conversational text. GPT-4 is the newest version of OpenAI’s language model systems. Its previous version, GPT-3.5, powered the company’s wildly popular ChatGPT chatbot when it launched in November of 2022.

Last month we reported that GPT-4, OpenAI’s new multimodal deep learning model, passed the Uniform Bar Exam. According to a new paper co-authored by Daniel Martin Katz, professor of law at Illinois Institute of Technology’s Chicago-Kent College of Law, this demonstrates an enormous leap for machine learning and shows that an artificial intelligence program can perform complex legal tasks on par with, or better than, humans.

And now Google has announced the latest version of Med-PaLM, a large language model (LLM) designed to provide high-quality answers to medical questions.

Med-PaLM harnesses the power of Google’s large language models, which have been aligned to the medical domain with a set of carefully curated medical expert demonstrations.

Developing AI that can answer medical questions accurately has been a long-standing challenge, with several research advances over the past few decades. While the topic is broad, answering questions in the style of the United States Medical Licensing Examination (USMLE) has recently emerged as a popular benchmark for evaluating medical question-answering performance.

Google’s first version of Med-PaLM, described in a preprint in late 2022, was the first AI system to surpass the pass mark on USMLE-style questions. Med-PaLM also generates accurate, helpful long-form answers to consumer health questions, as judged by panels of physicians and users.

Google introduced its latest model, Med-PaLM 2, at its annual health event The Check Up. Med-PaLM 2 achieves an accuracy of 85.4% on USMLE-style questions. This performance is on par with “expert” test takers, and represents an improvement of more than 18 percentage points over Med-PaLM’s own state-of-the-art result.

Google assessed Med-PaLM and Med-PaLM 2 against a benchmark it called ‘MultiMedQA’, which combines seven question-answering datasets spanning professional medical exams, medical research, and consumer queries.

Med-PaLM was the first AI system to obtain a passing score on USMLE-style questions from the MedQA dataset, with an accuracy of 67.4%. Med-PaLM 2 improves on this further with state-of-the-art performance of 85.4%, matching expert test-takers.
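The accuracy figures above come from scoring multiple-choice questions, where a model's chosen option is compared against an answer key. As a rough illustration (the questions and answers below are invented, not drawn from the actual MedQA dataset), the scoring reduces to a simple exact-match calculation:

```python
# Minimal sketch of exact-match accuracy scoring for USMLE-style
# multiple-choice questions. Data here is purely illustrative.

def accuracy(predictions, answer_key):
    """Fraction of questions where the predicted option matches the key."""
    if len(predictions) != len(answer_key):
        raise ValueError("predictions and answer_key must be the same length")
    correct = sum(p == a for p, a in zip(predictions, answer_key))
    return correct / len(answer_key)

# Hypothetical model outputs vs. gold answers for five questions.
model_answers = ["B", "D", "A", "C", "B"]
gold_answers  = ["B", "D", "C", "C", "B"]

print(f"accuracy: {accuracy(model_answers, gold_answers):.1%}")  # 4 of 5 correct
```

On a real benchmark like MedQA the same ratio is simply computed over thousands of questions; a score of 85.4% means the model picked the keyed option on 85.4% of them.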

Google will soon release a preprint for Med-PaLM 2. In the coming months, Med-PaLM 2 will also be made available to a select group of Google Cloud customers for limited testing, to explore use cases and share feedback, as the company investigates safe, responsible, and meaningful ways to use this technology.