Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization

K. Cao, Y. Chen, J. Lu, N. Arechiga, A. Gaidon, T. Ma

Published in ICLR 2021 - January 2021

Links: arxiv, openreview, bibtex

Abstract

Real-world large-scale datasets are heteroskedastic and imbalanced – labels have varying levels of uncertainty and label distributions are long-tailed. Heteroskedasticity and imbalance challenge deep learning algorithms due to the difficulty of distinguishing among mislabeled, ambiguous, and rare examples. Addressing heteroskedasticity and imbalance simultaneously is under-explored. We propose a data-dependent regularization technique for heteroskedastic datasets that regularizes different regions of the input space differently. Inspired by the theoretical derivation of the optimal regularization strength in a one-dimensional nonparametric classification setting, our approach adaptively regularizes the data points in higher-uncertainty, lower-density regions more heavily. We test our method on several benchmark tasks, including a real-world heteroskedastic and imbalanced dataset, WebVision. Our experiments corroborate our theory and demonstrate a significant improvement over other methods in noise-robust deep learning.
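The central idea in the abstract, regularizing examples in higher-uncertainty, lower-density regions more heavily, can be illustrated with a toy sketch. This is not the paper's algorithm: the function names `adaptive_reg_strengths` and `regularized_loss`, the `uncertainty / density` ratio, and the mean normalization are all assumptions chosen for illustration, and the per-example uncertainty and density estimates are taken as given.

```python
import numpy as np


def adaptive_reg_strengths(uncertainty, density, base_strength=1.0, eps=1e-8):
    """Hypothetical helper: one regularization strength per example.

    Assumption (not from the paper): the strength grows with the ratio
    uncertainty / density and is rescaled so its mean is about base_strength.
    """
    uncertainty = np.asarray(uncertainty, dtype=float)
    density = np.asarray(density, dtype=float)
    raw = uncertainty / (density + eps)  # larger when noisy and rare
    return base_strength * raw / (raw.mean() + eps)


def regularized_loss(losses, penalties, strengths):
    """Mean over examples of loss_i + strength_i * penalty_i."""
    losses = np.asarray(losses, dtype=float)
    penalties = np.asarray(penalties, dtype=float)
    return float(np.mean(losses + strengths * penalties))


# Three toy examples: clean and common, clean and rare, noisy and rare.
uncertainty = np.array([0.1, 0.1, 0.9])
density = np.array([1.0, 0.1, 0.1])

strengths = adaptive_reg_strengths(uncertainty, density)
total = regularized_loss(np.ones(3), np.ones(3), strengths)
```

In this toy setup the noisy, rare example receives the largest strength and the clean, common example the smallest, while the average strength stays near `base_strength`, so the overall amount of regularization is comparable to a uniform penalty.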

Bibtex

@inproceedings{cao2021heteroskedastic,
    title={Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization},
    author={Kaidi Cao and Yining Chen and Junwei Lu and Nikos Arechiga
        and Adrien Gaidon and Tengyu Ma},
    booktitle={International Conference on Learning Representations (ICLR)},
    year={2021},
}