Sentiment Embeddings with Applications to Sentiment Analysis
We propose learning sentiment-specific word embeddings dubbed sentiment embeddings in this paper. Existing word embedding learning algorithms typically only use the contexts of words but ignore the sentiment of texts.
This is problematic for sentiment analysis because words with similar contexts but opposite sentiment polarity, such as "good" and "bad", are mapped to neighbouring word vectors.
We address this issue by encoding sentiment information of texts (e.g., sentences and words) together with contexts of words in sentiment embeddings.
By combining context- and sentiment-level evidence, the nearest neighbours in the sentiment embedding space are not only semantically similar but also tend to share the same sentiment polarity.
In order to learn sentiment embeddings effectively, we develop a number of neural networks with tailored loss functions, and automatically collect massive amounts of text carrying sentiment signals, such as emoticons, as training data.
Sentiment embeddings can be naturally used as word features for a variety of sentiment analysis tasks without feature engineering.
We apply sentiment embeddings to word-level sentiment analysis, sentence level sentiment classification, and building sentiment lexicons.
EXISTING SYSTEM
Existing embedding learning approaches are mostly based on the distributional hypothesis, which states that the meaning of a word is reflected by the contexts in which it appears.
As a result, words with similar grammatical usages and semantic meanings, such as “hotel” and “motel”, are mapped into neighbouring vectors in the embedding space.
Since word embeddings capture semantic similarities between words, they have been leveraged as inputs or extra word features for a variety of natural language processing tasks.
Collobert and Weston train word embeddings with a ranking-type hinge loss function by replacing the middle word within a window with a randomly selected one.
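The ranking objective described above can be sketched as follows. This is an illustrative simplification: the scores would in practice come from Collobert and Weston's neural network over the genuine window and the window with its middle word replaced, and the margin of 1.0 is an assumed default.

```python
def hinge_rank_loss(score_true, score_corrupt, margin=1.0):
    """Ranking hinge loss: the genuine window should score at least
    `margin` higher than the corrupted window (middle word replaced
    by a randomly selected one)."""
    return max(0.0, margin - score_true + score_corrupt)

# Genuine window already outranks the corrupted one by more than
# the margin, so no loss; otherwise the shortfall is the loss.
print(hinge_rank_loss(2.0, 0.0))  # 0.0
print(hinge_rank_loss(0.0, 0.5))  # 1.5
```

The loss is zero once the ranking constraint is satisfied, so well-separated windows stop contributing gradients.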
Mikolov et al. introduce the continuous bag-of-words (CBOW) and continuous skip-gram models, and release the popular word2vec toolkit.
The CBOW model predicts the current word from the embeddings of its context words, while the skip-gram model predicts the surrounding words from the embedding of the current word.
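A minimal sketch of the CBOW forward pass: context embeddings are averaged and scored against every output embedding via a softmax. The toy corpus, embedding dimension, and random initialisation are illustrative assumptions, not word2vec's actual training setup.

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = "the hotel was good the motel was good".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8            # vocabulary size, embedding dimension

# Input (context) and output (target) embedding matrices.
W_in = rng.normal(scale=0.1, size=(V, D))
W_out = rng.normal(scale=0.1, size=(V, D))

def cbow_probs(context_words):
    """CBOW: average the context embeddings, then softmax the scores
    against every output embedding to predict the middle word."""
    h = W_in[[idx[w] for w in context_words]].mean(axis=0)
    scores = W_out @ h
    e = np.exp(scores - scores.max())
    return e / e.sum()

p = cbow_probs(["the", "was"])  # distribution over the middle word
print(vocab[int(p.argmax())])
```

Training would adjust `W_in` and `W_out` to raise the probability of each observed middle word; skip-gram reverses the direction of prediction.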
Mnih and Kavukcuoglu accelerate the embedding learning procedure with noise-contrastive estimation.
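Noise-contrastive estimation replaces the expensive softmax over the whole vocabulary with binary logistic discrimination between the observed (word, context) pair and a handful of sampled noise words. A minimal sketch, with assumed raw scores and the noise-distribution correction term omitted for brevity:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def nce_loss(score_data, scores_noise):
    """Simplified NCE loss: the observed pair is the positive class,
    the k sampled noise words are negatives."""
    loss = -math.log(sigmoid(score_data))
    loss -= sum(math.log(sigmoid(-s)) for s in scores_noise)
    return loss

# High score for the real pair, low scores for noise -> small loss.
print(nce_loss(3.0, [-2.0, -1.5]))
```

The cost per example scales with the number of noise samples k, not the vocabulary size, which is the source of the speed-up.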
DISADVANTAGES OF EXISTING SYSTEM
The most serious problem of context-based embedding learning algorithms is that they only model the contexts of words but ignore the sentiment information of text.
As a result, words with opposite polarity, such as good and bad, are mapped into close vectors in the embedding space.
PROPOSED SYSTEM
Existing word embedding learning algorithms typically use only the contexts of words and ignore the sentiment of texts.
In this paper, we propose learning sentiment-specific word embeddings dubbed sentiment embeddings for sentiment analysis.
We retain the effectiveness of word contexts and exploit sentiment of texts for learning more powerful continuous word representations.
By capturing both context- and sentiment-level evidence, the nearest neighbors in the embedding space are not only semantically similar but also tend to have the same sentiment polarity, so that good and bad are separated to opposite ends of the spectrum.
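The effect can be illustrated with cosine similarity over hand-crafted toy vectors (these are not learned embeddings; the first two dimensions mimic shared context and the last mimics an added sentiment signal):

```python
import numpy as np

# Toy vectors: shared context dims plus one sentiment dim.
emb = {
    "good":  np.array([0.9, 0.1,  1.0]),
    "great": np.array([0.8, 0.2,  0.9]),
    "bad":   np.array([0.9, 0.1, -1.0]),
}

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# With the sentiment dimension, "good" sits nearer "great" than "bad",
# even though all three share nearly identical context dimensions.
print(cos(emb["good"], emb["great"]), cos(emb["good"], emb["bad"]))
```

Without the sentiment dimension, "good" and "bad" would be almost indistinguishable here, which is exactly the failure mode of purely context-based embeddings.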
We learn sentiment embeddings from tweets, leveraging positive and negative emoticons as pseudo sentiment labels of sentences without manual annotations.
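Emoticon-based pseudo-labeling can be sketched as below; the emoticon lists are illustrative assumptions, as the exact lists used in the paper are not specified here.

```python
import re

# Illustrative emoticon sets (assumed, not the paper's actual lists).
POS = re.compile(r"(:\)|:-\)|:D)")
NEG = re.compile(r"(:\(|:-\()")

def pseudo_label(tweet):
    """Return 'pos'/'neg' if exactly one polarity of emoticon appears;
    ambiguous or emoticon-free tweets get None and would be dropped."""
    has_pos, has_neg = bool(POS.search(tweet)), bool(NEG.search(tweet))
    if has_pos and not has_neg:
        return "pos"
    if has_neg and not has_pos:
        return "neg"
    return None

print(pseudo_label("great movie :)"))      # pos
print(pseudo_label("so disappointed :("))  # neg
```

Such distant supervision is noisy, but it scales to massive tweet collections with no manual annotation.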
We obtain lexical level sentiment supervision from Urban Dictionary based on a small list of sentiment seeds with minor manual annotation.
We propose learning sentiment embeddings that encode the sentiment of texts into continuous word representations.
ADVANTAGES OF PROPOSED SYSTEM
We evaluate the effectiveness of sentiment embeddings empirically by applying them to three sentiment analysis tasks.
Word-level sentiment analysis on benchmark sentiment lexicons helps us see whether sentiment embeddings are useful for discovering similarities between sentiment words.
Sentence-level sentiment classification on tweets and reviews helps us understand whether sentiment embeddings capture discriminative features for predicting the sentiment of text.
Building sentiment lexicons is useful for measuring the extent to which sentiment embeddings improve lexical-level tasks that require finding similarities between words.
Experimental results show that sentiment embeddings consistently outperform context-based word embeddings, and yield state-of-the-art performance on several benchmark datasets for these tasks.
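The lexicon-building task above can be sketched as seed-based polarity scoring over an embedding space: a candidate word's score is its mean similarity to positive seeds minus its mean similarity to negative seeds. The tiny hand-made vectors and seed lists below are assumptions for illustration, not learned sentiment embeddings.

```python
import numpy as np

# Toy embeddings; a real lexicon would use learned sentiment embeddings.
emb = {
    "good":     np.array([1.0, 0.2]),
    "terrible": np.array([-0.9, 0.3]),
    "awesome":  np.array([0.8, 0.1]),
}
POS_SEEDS, NEG_SEEDS = ["good"], ["terrible"]

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def polarity(word):
    """Mean similarity to positive seeds minus mean similarity to
    negative seeds; the sign assigns the lexicon entry's polarity."""
    pos = np.mean([cos(emb[word], emb[s]) for s in POS_SEEDS])
    neg = np.mean([cos(emb[word], emb[s]) for s in NEG_SEEDS])
    return float(pos - neg)

print(polarity("awesome"))  # > 0 -> positive lexicon entry
```

The quality of the resulting lexicon therefore depends directly on whether same-polarity words are neighbours in the embedding space, which is what sentiment embeddings are designed to ensure.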
Duyu Tang, Furu Wei, Bing Qin, Nan Yang, Ting Liu, and Ming Zhou, "Sentiment Embeddings with Applications to Sentiment Analysis", IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 2, February 2016.