
Fast Nonparametric Estimation of Class Proportions in the Positive-Unlabeled Classification Setting.

AAAI Conference on Artificial Intelligence (2020)

Northeastern University

Cited 35 | Views 18
Abstract
Estimating class proportions has emerged as an important direction in positive-unlabeled learning. Well-estimated class priors are key to accurate approximation of posterior distributions and are necessary for the recovery of true classification performance. While significant progress has been made in the past decade, there remains a need for accurate strategies that scale to big data. Motivated by this need, we propose an intuitive and fast nonparametric algorithm to estimate class proportions. Unlike any of the previous methods, our algorithm uses a sampling strategy to repeatedly (1) draw an example from the set of positives, (2) record the minimum distance to any of the unlabeled examples, and (3) remove the nearest unlabeled example. We show that the point of sharp increase in the recorded distances corresponds to the desired proportion of positives in the unlabeled set and train a deep neural network to identify that point. Our distance-based algorithm is evaluated on forty datasets and compared to all currently available methods. We provide evidence that this new approach results in the most accurate performance and can be readily used on large datasets.
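The sampling loop described in the abstract can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the authors' implementation: the paper trains a deep neural network to identify the point of sharp increase, whereas the largest-single-jump heuristic below is only a simplified stand-in, and all function names are ours.

```python
import numpy as np

def record_removal_distances(positives, unlabeled, seed=None):
    """Repeat the sampling loop from the abstract: (1) draw a random
    positive example, (2) record the minimum distance to the remaining
    unlabeled examples, (3) remove that nearest unlabeled example."""
    rng = np.random.default_rng(seed)
    remaining = list(range(len(unlabeled)))
    distances = []
    while remaining:
        p = positives[rng.integers(len(positives))]
        d = np.linalg.norm(unlabeled[remaining] - p, axis=1)
        j = int(np.argmin(d))
        distances.append(float(d[j]))
        remaining.pop(j)
    return np.array(distances)

def estimate_alpha(distances):
    """Locate the point of sharp increase in the recorded distances.
    The paper trains a deep network for this step; taking the largest
    single jump, as done here, is only a crude stand-in."""
    cut = int(np.argmax(np.diff(distances))) + 1
    # Fraction of unlabeled examples removed before the jump,
    # i.e. the estimated proportion of positives in the unlabeled set.
    return cut / len(distances)
```

On a toy two-cluster dataset where 30% of the unlabeled points share the positive distribution, the recorded distances stay small until roughly the first 30% of unlabeled points have been removed, then jump sharply once only negatives remain.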
Key words
Robust Estimation, Principal Component Analysis, Robust Statistics, Relative Importance
Related Papers

Partial Optimal Transport with Applications on Positive-Unlabeled Learning

Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

Cited 36

AUL is a Better Optimization Metric in PU Learning

Shangchuan Huang, Songtao Wang, Dan Li, Liwei Jiang
ICLR 2021

Cited 0

Data Disclaimer
The page data are drawn from open Internet sources, cooperating publishers, and automatic AI-based analysis. We make no commitments or guarantees regarding the validity, accuracy, correctness, reliability, completeness, or timeliness of the page data. If you have any questions, please contact us by email: report@aminer.cn

Key point: This paper proposes a fast, distance-based nonparametric algorithm for estimating the proportion of positives in unlabeled data, outperforming existing methods in both accuracy and scalability.

Method: Using a sampling strategy, the algorithm repeatedly draws an example from the positive set, records the minimum distance to any unlabeled example, and removes the nearest unlabeled example; a deep neural network then identifies the point of sharp increase in the recorded distances, which determines the positive-class proportion.

Experiments: The distance-based algorithm is evaluated on forty datasets and compared against existing methods; the results show that the new approach achieves the best accuracy and is applicable to large-scale datasets.