Comparing accuracy assessments to infer superiority of image classification methods
Published in: International Journal of Remote Sensing, 2006-01, Vol. 27 (1), p. 223-232
Main Authors: , , , , ,
Format: Article
Language: English
Summary: The z-test based on the Kappa statistic is commonly used to infer superiority of one map production method over another. Typically the same reference data set is used to calculate, and then compare, the Kappas of the two maps. This data structure easily leads to dependence between the two error matrices, which may result in overly large variance estimates and overly conservative inference about the difference in accuracy between the two methods. Tests that account for the dependency between the error matrices would be more sensitive in such cases. In this article we compare the performance of two such tests, a randomization test and McNemar's test, with the traditional z-test. We compared 16 alternative methods to classify salt marsh vegetation in The Netherlands. The error matrices were positively associated in all 120 possible comparisons of pairs of classification methods, suggesting that dependency between pairs of error matrices used in classifier comparison is a common phenomenon. Both the randomization test and McNemar's test gave lower p-values and rejected the null hypothesis of equal performance more frequently than the z-test. We therefore recommend considering their use.
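The paired structure that McNemar's test exploits can be sketched as follows. For two classifiers evaluated on the same reference points, only the discordant counts matter: b (first correct, second wrong) and c (second correct, first wrong). A minimal sketch using the normal approximation without continuity correction; the counts below are illustrative, not data from the study:

```python
import math

def mcnemar_z(b: int, c: int) -> tuple[float, float]:
    """McNemar's test via the normal approximation (no continuity correction).

    b: reference points the first classifier got right and the second got wrong.
    c: reference points the second classifier got right and the first got wrong.
    Returns (z statistic, two-sided p-value).
    """
    z = (b - c) / math.sqrt(b + c)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of N(0, 1)
    return z, p

# Hypothetical discordant counts from a shared reference set:
z, p = mcnemar_z(15, 5)
print(round(z, 3), round(p, 3))  # z ≈ 2.236, p ≈ 0.025
```

Because b and c come from the same reference points, the shared sampling variation cancels out of the discordant pairs, which is why this test is more sensitive than a z-test that treats the two error matrices as independent.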
ISSN: 0143-1161, 1366-5901
DOI: 10.1080/01431160500275762