Large language models (LLMs) are showing promise in helping researchers assess risk of bias in clinical trials. A recent study published in the Journal of Medical Internet Research examines how accurately and efficiently these AI tools apply the revised Risk-of-Bias tool (RoB2).
Enhancing Accuracy in Bias Evaluation
The study systematically analyzed 86 Cochrane reviews encompassing 1,399 randomized controlled trials (RCTs). From these, the researchers randomly selected 46 RCTs and compared the assessments made by LLMs against those conducted by experienced human reviewers. LLMs achieved accuracy rates between 57.5% and 70% when measured against the Cochrane reviews, and up to 74.2% when measured against the human reviewers. Notably, when LLMs were guided by structured prompts and methodological reasoning, their accuracy in specific domains improved significantly.
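To make the accuracy figures above concrete, here is a minimal sketch of one way such a rate could be computed: the share of judgments in which the model's rating matches a reference rating. The article does not spell out the study's exact metric, so this definition is an assumption, and the function name and the ratings below are hypothetical illustrations, not study data.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Return the share of judgments where the model matches the reference.

    Assumption: accuracy is simple agreement with a reference rating
    (for example, the rating recorded in a Cochrane review).
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("at least one judgment is required")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(predicted)


# Hypothetical ratings for five trials (illustrative only, not study data).
# RoB2 uses three levels: low risk, some concerns, high risk.
llm_ratings = ["low", "some concerns", "high", "low", "some concerns"]
reference_ratings = ["low", "high", "high", "low", "low"]

print(f"{accuracy(llm_ratings, reference_ratings):.1%}")  # prints 60.0%
```

The same function can be applied per RoB2 domain or to the overall judgment, depending on which ratings are passed in.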
Time Efficiency and Consistency
One of the standout benefits of using LLMs is the drastic reduction in time required for assessments. While human reviewers took an average of 31.5 minutes per trial, LLMs completed their evaluations in just 1.9 minutes. Additionally, LLMs demonstrated high consistency across multiple iterations, maintaining an average accuracy of 85.2%. These efficiencies suggest that AI can play a crucial role in streamlining the review process, allowing researchers to focus on more complex aspects of their studies.
• LLMs show promise in cutting the time required for RoB2 assessments by over 90%.
• Accuracy improves significantly when LLMs utilize structured prompts and detailed reasoning.
• Domains related to randomization and blinding present more challenges for both LLMs and human reviewers.
• Consistency in LLM performance suggests reliability in repeated assessments.
• The heterogeneous risk distribution in assignment-focused RCTs indicates areas for further refinement in AI models.
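The "over 90%" figure can be checked against the reported per-trial times of 31.5 minutes for human reviewers and 1.9 minutes for LLMs. The short sketch below works through that arithmetic; the function names are illustrative, and only the two timing values come from the article.

```python
def time_reduction(before: float, after: float) -> float:
    """Fraction of the original time that is saved."""
    return (before - after) / before


def speedup(before: float, after: float) -> float:
    """How many times faster the new process is."""
    return before / after


human_minutes = 31.5  # average human review time per trial, as reported
llm_minutes = 1.9     # average LLM assessment time per trial, as reported

reduction = time_reduction(human_minutes, llm_minutes)
factor = speedup(human_minutes, llm_minutes)

print(f"Time reduction: {reduction:.1%}")   # prints Time reduction: 94.0%
print(f"Speed-up factor: {factor:.1f}x")    # prints Speed-up factor: 16.6x
```

Note that a 94% reduction in time corresponds to roughly a 16.6-fold speed-up, which is why the two ways of stating the result should not be used interchangeably.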
The incorporation of LLMs into the RoB2 assessment process not only enhances efficiency but also maintains a commendable level of accuracy. By handling the more routine aspects of bias evaluation, AI can supplement human expertise, ensuring thorough and timely reviews of clinical trials.
As the medical research community continues to adopt advanced technologies, the integration of AI tools like LLMs could become standard practice in systematic reviews. Future studies with larger datasets and improved prompting techniques are expected to further elevate the performance of these models, making them invaluable assets in the pursuit of unbiased and reliable clinical research outcomes.

This article has been prepared with the assistance of AI and reviewed by an editor.