Majority Voting and Pairing with Multiple Noisy Labeling


ABSTRACT:

This paper proposes strategies for utilizing multiple noisy labels, obtained via repeated labeling, in supervised learning, based on two basic ideas: majority voting and pairing. Our experiments yield several interesting results. (i) The strategies based on the majority voting idea work well when the certainty level of the labels is high. (ii) In contrast, the pairing strategies are preferable when the certainty level is low. (iii) Among the majority voting strategies, soft majority voting reduces bias and roughness, and performs better than (hard) majority voting. (iv) Pairing completely avoids the bias by considering both sides (the potentially correct and the incorrect/noisy information); Beta estimation is applied to reduce the impact of the noise in pairing, and our experimental results show that pairing with Beta estimation performs well across different certainty levels. (v) All strategies investigated are labeling-quality agnostic, which suits real-world applications, and some of them perform better than, or at least very close to, the quality-gnostic strategies.

 


 

OBJECTIVE:

 

This paper addresses five strategies for utilizing multiple noisy labels, based on two basic ideas: majority voting and pairing. Three strategies are based on the first idea, and two on the second. Note that none of the five proposed strategies uses labeling qualities as internal parameters, since labeling qualities are normally unavailable in real-world applications. A small illustration of the two ideas is sketched below.
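To make the two ideas concrete, here is a minimal sketch (our own illustration, not the paper's reference implementation) contrasting hard majority voting, which commits to a single label, with soft majority voting, which keeps the empirical label distribution and thereby some of the uncertainty. The function names and the dictionary representation of the soft label are illustrative assumptions.

from collections import Counter

def majority_vote(labels):
    """Hard majority voting: return the most frequent label.
    Ties are broken arbitrarily by Counter ordering."""
    return Counter(labels).most_common(1)[0][0]

def soft_majority_vote(labels):
    """Soft majority voting: return the empirical class distribution
    instead of a single hard label, preserving the uncertainty."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

noisy_labels = [1, 1, 0, 1, 0]          # five noisy labels for one object
print(majority_vote(noisy_labels))       # -> 1
print(soft_majority_vote(noisy_labels))  # -> {1: 0.6, 0: 0.4}

A learner can consume the soft distribution as weighted training information, which is what makes soft majority voting less rough than a single hard vote.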

 

SCOPE OF THE PROJECT:

 

The experimental results also show that some of the proposed strategies perform better than, or at least very close to, the labeling-quality-gnostic strategies (i.e., strategies that are given the labelers' qualities as input).

 

EXISTING SYSTEM:

 

There are various costs associated with the preprocessing stage of the KDD process, including the costs of acquiring features, formulating data, cleaning data, obtaining expert labeling of data, and so on. For example, in order to build a model that recognizes whether two products described on two web pages are the same, one must extract the product information from the pages, formulate features for comparing the two along relevant dimensions, and label product pairs as identical or not.

 

DISADVANTAGES:

 

This process involves costly manual intervention at several points.

To build a model that recognizes whether an image contains an object of interest, one first needs to take pictures in appropriate contexts, sometimes at substantial cost.

 

PROPOSED SYSTEM:

 

Repeated labeling is a tool that should be considered whenever labeling might be noisy and can be repeated. With repeated labeling, multiple noisy labels become available for each object, so it is both interesting and necessary to study how these multiple noisy labels can be utilized.

This paper proposes five strategies for utilizing multiple noisy labels, based on two basic ideas: majority voting and pairing. The first idea is simple and straightforward, and it is also the approach people tend to apply first. However, it cannot avoid labeling bias, especially when only a few labels are available, as the sketch below quantifies.
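To quantify that bias, here is a small worked sketch (our own illustration, with assumed numbers): it computes the probability that a hard majority vote over n independent labels is wrong when each labeler is correct with probability p. With only 3 labels from labelers of quality 0.6, the majority label is wrong about 35% of the time.

from math import comb

def majority_error(p, n):
    """Probability that a majority of n independent labelers,
    each correct with probability p, gets the label wrong.
    Assumes n is odd so there are no ties."""
    return sum(comb(n, k) * (1 - p) ** k * p ** (n - k)
               for k in range(n // 2 + 1, n + 1))

print(round(majority_error(0.6, 3), 3))   # 0.352: wrong ~35% of the time
print(round(majority_error(0.6, 11), 3))  # 0.247: more labels help, but slowly

This is why the paper turns to soft voting and to pairing when few labels are available or labeler certainty is low.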

 

ADVANTAGES:

 

The soft majority voting strategies can reduce the bias and roughness to some extent.

The second idea can completely avoid the bias by having both sides (potentially correct and incorrect/noisy information) considered.

For this idea, it is very important to reduce the impact of the noisy information. Compared to Paired-Freq, which retains the noise completely, Paired-Beta reduces the impact of the noisy information.
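The sketch below illustrates one plausible reading of the two pairing variants (an assumption on our part; the paper's exact weighting scheme may differ). Each object is duplicated into a positive and a negative training copy, each carrying a weight: Paired-Freq weights by raw label frequencies, while for Paired-Beta we assume the positive weight is the tail mass above 0.5 of a Beta(pos+1, neg+1) posterior over the underlying positive-label rate, which damps the influence of a noisy minority.

from scipy.stats import beta

def paired_weights_freq(pos, neg):
    """Paired-Freq: weight each side of the pair by raw label frequency.
    Noisy minority labels keep their full proportional influence."""
    total = pos + neg
    return pos / total, neg / total

def paired_weights_beta(pos, neg):
    """Paired-Beta (sketch, assumed formulation): weight the positive copy
    by the posterior probability that the underlying positive rate exceeds
    0.5, under a Beta(pos+1, neg+1) posterior with a uniform prior."""
    w_pos = beta.sf(0.5, pos + 1, neg + 1)  # Beta tail mass above 0.5
    return w_pos, 1.0 - w_pos

# One object with 4 positive and 1 negative noisy labels:
print(paired_weights_freq(4, 1))  # (0.8, 0.2)      -- noise fully retained
print(paired_weights_beta(4, 1))  # (~0.89, ~0.11)  -- noisy side damped

Because both weighted copies enter the training set, both the potentially correct and the noisy side are represented, which is how pairing avoids the one-sided bias of hard voting while Beta estimation keeps the noisy side from carrying its full frequency weight.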

 

