
Crowd-sourcing and automation facilitated the identification and classification of randomized controlled trials in a living review

Bibliographic Details
Published in: Journal of Clinical Epidemiology 2023-12, Vol. 164, p. 1-8
Main Authors: Kamso, Mohammed Mujaab, Pardo, Jordi Pardo, Whittle, Samuel L., Buchbinder, Rachelle, Wells, George, Glennon, Vanessa, Tugwell, Peter, Deardon, Rob, Sajobi, Tolulope, Tomlinson, George, Elliott, Jesse, Kelly, Shannon E., Hazlewood, Glen S.
Format: Article
Language: English
Description
Summary: To evaluate an approach using automation and crowdsourcing to identify and classify randomized controlled trials (RCTs) for rheumatoid arthritis (RA) in a living systematic review (LSR). Records from a database search for RCTs in RA were screened first by machine learning and Cochrane Crowd to exclude non-RCTs, then by trainee reviewers using a Population, Intervention, Comparison and Outcome (PICO) annotator platform to assess eligibility and classify each trial to the appropriate review. Disagreements were resolved by experts using a custom online tool. We evaluated the efficiency gains, sensitivity, accuracy and interrater agreement (kappa scores) between reviewers. From 42,452 records, machine learning and Cochrane Crowd excluded 28,777 (68%), trainee reviewers excluded 4,529 (11%), and experts excluded 7,200 (17%). The 1,946 records eligible for our LSR represented 220 RCTs and included 148/149 (99.3%) of known eligible trials from prior reviews. A further 6,420 records, although excluded from our LSRs, were classified as other RCTs in RA to inform future reviews. False negative rates amongst trainees were highest for the RCT domain (12%), although only 1.1% of these were for the primary record. Kappa scores between the two reviewers ranged from moderate to substantial agreement (0.40 to 0.69). A screening approach combining machine learning, crowdsourcing, and trainee participation substantially reduced the screening burden for expert reviewers and was highly sensitive.
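The abstract's two headline metrics can be reproduced from first principles. Below is a minimal Python sketch, not taken from the paper, of how Cohen's kappa and the reduction in expert screening workload might be computed; the reviewer decisions are hypothetical placeholders, while the record counts come from the summary above.

from collections import Counter

def cohens_kappa(labels_a, labels_b):
    # Observed agreement minus chance agreement, normalized by the
    # maximum possible agreement beyond chance.
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical include (1) / exclude (0) decisions for 10 records.
reviewer_1 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
reviewer_2 = [1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
print(f"kappa = {cohens_kappa(reviewer_1, reviewer_2):.2f}")  # 0.60

# Efficiency gain from the abstract's counts: records excluded by
# machine learning/Cochrane Crowd and by trainees never reach experts.
total = 42_452
removed_before_experts = 28_777 + 4_529
print(f"expert workload reduced by {removed_before_experts / total:.0%}")  # 78%

The example kappa of 0.60 falls in the "moderate to substantial" band (0.40 to 0.69) reported for the trainee reviewers, and the roughly 78% workload reduction matches the combined machine-learning/Crowd (68%) and trainee (11%) exclusions reported in the summary.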
ISSN: 0895-4356
1878-5921
DOI: 10.1016/j.jclinepi.2023.10.007