NetSpam: A Network-based Spam Detection Framework for Reviews in Online Social Media


ABSTRACT

 

Nowadays, many people rely on content available in social media when making decisions, for example reviews and feedback on a topic or product. Because anybody can leave a review, spammers have a golden opportunity to write spam reviews about products and services for their own interests. Identifying these spammers and their spam content is a hot research topic. Although a considerable number of studies have recently been done toward this end, the methodologies put forth so far still barely detect spam reviews, and none of them shows the importance of each extracted feature type. In this study, we propose a novel framework, named NetSpam, which uses spam features to model review datasets as heterogeneous information networks and maps the spam detection procedure into a classification problem in such networks. Using the importance of spam features helps us obtain better results in terms of different metrics on real-world review datasets from the Yelp and Amazon websites. The results show that NetSpam outperforms the existing methods and that, among four categories of features (review-behavioral, user-behavioral, review-linguistic, and user-linguistic), the first category performs better than the others.

 


 

EXISTING SYSTEM:

 

Despite this great deal of effort, many aspects have been missed or remain unsolved. One of them is a classifier that can calculate feature weights showing each feature’s level of importance in determining spam reviews. The general concept of our proposed framework is to model a given review dataset as a Heterogeneous Information Network (HIN) and to map the problem of spam detection into a HIN classification problem. In particular, we model the review dataset as a HIN in which reviews are connected through different node types. A weighting algorithm is then employed to calculate each feature’s importance. These weights are used to calculate the final labels for reviews using both unsupervised and supervised approaches.
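The idea of connecting reviews through shared nodes and then weighting each feature can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the text above: the feature names (burstiness, rating_dev), the discretization into three levels, and the weighting rule (the share of linked, labelled review pairs in which both reviews are spam).

```python
from itertools import combinations

def level(value, n_levels=3):
    """Discretize a feature value in [0, 1] into one of n_levels buckets."""
    return min(int(value * n_levels), n_levels - 1)

def metapath_pairs(reviews, feature, n_levels=3):
    """Return pairs of review ids linked through a shared feature-level node."""
    pairs = []
    for a, b in combinations(sorted(reviews), 2):
        if level(reviews[a][feature], n_levels) == level(reviews[b][feature], n_levels):
            pairs.append((a, b))
    return pairs

def feature_weights(reviews, labels, features, n_levels=3):
    """Assumed rule: weight = share of linked labelled pairs where both are spam."""
    weights = {}
    for f in features:
        linked = [(a, b) for a, b in metapath_pairs(reviews, f, n_levels)
                  if a in labels and b in labels]
        both_spam = sum(1 for a, b in linked if labels[a] and labels[b])
        weights[f] = both_spam / len(linked) if linked else 0.0
    return weights

# Hypothetical toy data: four reviews, two features scaled to [0, 1].
reviews = {
    "r1": {"burstiness": 0.90, "rating_dev": 0.80},
    "r2": {"burstiness": 0.95, "rating_dev": 0.20},
    "r3": {"burstiness": 0.10, "rating_dev": 0.85},
    "r4": {"burstiness": 0.15, "rating_dev": 0.10},
}
labels = {"r1": True, "r2": True, "r3": False, "r4": False}  # True = spam

weights = feature_weights(reviews, labels, ["burstiness", "rating_dev"])
print(weights)
```

In this toy data the burstiness feature links the two spam reviews to each other, so it receives a higher weight than rating_dev, which only links spam reviews to normal ones.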

 

DISADVANTAGE:

 

  • The approach uses spam features to model review datasets as heterogeneous information networks, mapping the spam detection procedure into a classification problem in such networks.
  • Time complexity.

 

PROPOSED SYSTEM:

 

  • NetSpam is able to find feature importance even without ground truth, relying only on the metapath definition and on values calculated for each review. NetSpam improves on the state of the art in both accuracy and time complexity; the latter depends heavily on the number of features used to identify a spam review, so using the features with higher weights makes detecting fake reviews easier, with less time complexity.
  • A new Content Based Algorithm for spam features is proposed to determine the relative importance of each feature and to show how effective each feature is in distinguishing spam from normal reviews.

 

ADVANTAGE:

 

NetSpam helps identify spam and spammers and supports different types of analysis on this topic. Written reviews also help service providers to enhance the quality of their products and services.

 

ALGORITHM:

 

1. Content Based Algorithm (to filter positive and negative reviews)
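The text does not describe how the content-based filtering works, so the following is only a minimal sketch of one common way to separate positive from negative reviews: counting words from small polarity lexicons. The word lists and the function names polarity and filter_reviews are hypothetical.

```python
import re

# Hypothetical, deliberately tiny polarity lexicons.
POSITIVE = {"good", "great", "excellent", "love", "friendly"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "rude"}

def polarity(review_text):
    """Label a review by comparing counts of positive and negative words."""
    tokens = re.findall(r"[a-z']+", review_text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def filter_reviews(review_texts):
    """Group review texts into positive, negative, and neutral buckets."""
    buckets = {"positive": [], "negative": [], "neutral": []}
    for text in review_texts:
        buckets[polarity(text)].append(text)
    return buckets

sample = [
    "Great food and friendly staff",
    "Terrible service, rude waiter",
    "We went on Tuesday",
]
print(filter_reviews(sample))
```

A real system would likely use a much larger lexicon or a trained classifier; the sketch only shows the filtering step itself.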

 

FUTURE WORK:

 

For future work, the metapath concept can be applied to other problems in this field. For example, a similar framework can be used to find spammer communities.

 



