Published 2022 | Version v2
Journal article

A Quantitative Analysis of Labeling Issues in the CelebA Dataset

Description

Facial attribute prediction is a facial analysis task that describes images using natural language features. While many works have attempted to optimize prediction accuracy on CelebA, the largest and most widely used facial attribute dataset, few works have analyzed the accuracy of the dataset's attribute labels. In this paper, we seek to do just that. Despite the popularity of CelebA, we find through quantitative analysis that there are widespread inconsistencies and inaccuracies in its attribute labeling. We estimate that at least one third of all images have one or more incorrect labels, and reliable predictions are impossible for several attributes due to inconsistent labeling. Our results demonstrate that classifiers struggle with many CelebA attributes not because they are difficult to predict, but because they are poorly labeled. This indicates that the CelebA dataset is flawed as a facial analysis tool and may not be suitable as a generic evaluation benchmark for imbalanced classification.

Details

Title A Quantitative Analysis of Labeling Issues in the CelebA Dataset
Authors
  • Lingenfelter, B.
  • Davis, S. R.
  • Hand, E. M.
Publisher Springer, Cham
Year of publication 2022