
The receiver operating characteristic curve accurately assesses imbalanced datasets

Bibliographic Details
Published in: Patterns (New York, N.Y.), 2024-06, Vol. 5 (6), p. 100994, Article 100994
Main Authors: Richardson, Eve, Trevizani, Raphael, Greenbaum, Jason A., Carter, Hannah, Nielsen, Morten, Peters, Bjoern
Format: Article
Language: English
Description
Summary: Many problems in biology require looking for a "needle in a haystack," corresponding to a binary classification where there are a few positives within a much larger set of negatives, which is referred to as a class imbalance. The receiver operating characteristic (ROC) curve and the associated area under the curve (AUC) have been reported as ill-suited to evaluating prediction performance on imbalanced problems, where there is more interest in performance on the positive minority class, with the precision-recall (PR) curve held to be preferable. We show via simulation and a real case study that this is a misinterpretation of the difference between the ROC and PR spaces: the ROC curve is robust to class imbalance, while the PR curve is highly sensitive to it. Furthermore, we show that class imbalance cannot be easily disentangled from classifier performance measured via the PR-AUC.

Highlights:
• For imbalanced datasets, recent papers report that the ROC-AUC is inflated
• Simulated and real-world data show that the ROC-AUC is invariant to class imbalance
• The effect of class imbalance on the PR-AUC cannot be trivially removed
• Partial ROC-AUCs allow performance evaluation over the upper score ranges

The bigger picture: There is conflicting information about whether a popular metric, the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, should be used when a dataset has many more negatives than positives and one is interested only in performance on the positive instances. Many practitioners prefer a metric called the precision-recall AUC (PR-AUC) and are taught that the ROC-AUC will give an "overly optimistic" estimate of model performance. We show that the ROC-AUC is inflated by an imbalance only in simulations where changing the imbalance changes the score distributions. By contrast, the PR-AUC changes drastically with class imbalance; furthermore, one cannot subtract or normalize the PR-AUC by class imbalance to obtain a corrected performance estimate. Our work encourages the adoption of the ROC-AUC in such cases, allowing fairer comparisons of models across datasets with different imbalances and furthering the understanding of the relationship between the ROC and PR spaces. In datasets where there are many more negative than positive instances, it has become common wisdom that the ROC-AUC is inflated and the PR-AUC should be used. The authors show that this is a misunderstanding: the ROC-AUC is invariant to class imbalance when the per-class score distributions are held fixed.
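The contrast described in the summary can be demonstrated in a few lines. This is a minimal sketch, not the authors' code: it assumes Gaussian score distributions for the two classes (illustrative values, not taken from the paper) and uses scikit-learn's metrics. Holding the per-class score distributions fixed while adding negatives leaves the ROC-AUC essentially unchanged but drives the PR-AUC (average precision) down sharply.

```python
# Sketch: ROC-AUC is invariant to class imbalance when per-class score
# distributions are fixed; PR-AUC is not. Assumes Gaussian scores for
# illustration only.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
n_pos = 1_000
pos_scores = rng.normal(loc=1.0, scale=1.0, size=n_pos)  # positive-class scores

rocs, prs = [], []
for n_neg in (1_000, 10_000, 100_000):  # 1:1, 10:1, 100:1 imbalance
    neg_scores = rng.normal(loc=0.0, scale=1.0, size=n_neg)
    y_true = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])
    y_score = np.concatenate([pos_scores, neg_scores])
    rocs.append(roc_auc_score(y_true, y_score))
    prs.append(average_precision_score(y_true, y_score))
    print(f"{n_neg // n_pos:>3}:1  ROC-AUC={rocs[-1]:.3f}  PR-AUC={prs[-1]:.3f}")
```

For two unit-variance Gaussians with means one apart, the theoretical ROC-AUC is Φ(1/√2) ≈ 0.76 at every imbalance ratio, whereas the PR-AUC at a given recall depends directly on the prevalence of positives.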
ISSN: 2666-3899
DOI: 10.1016/j.patter.2024.100994