Sentiment analysis
Sentiment analysis or opinion mining refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information in source materials.

Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. The attitude may be his or her judgement or evaluation (see appraisal theory), affective state (that is to say, the emotional state of the author when writing), or the intended emotional communication (that is to say, the emotional effect the author wishes to have on the reader).

Subtasks

A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level — whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as "angry," "sad," and "happy."

Early work in that area includes Turney and Pang, who applied different methods for detecting the polarity of product reviews and movie reviews respectively. This work is at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang and Snyder (among others): Pang expanded the basic task of classifying a movie review as either positive or negative to predicting star ratings on either a 3 or a 4 star scale, while Snyder performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-star scale).

A different method for determining sentiment is the use of a scaling system whereby words commonly associated with a negative, neutral, or positive sentiment are given an associated number on a -5 to +5 scale (most negative up to most positive). When a piece of unstructured text is analyzed using natural language processing, the subsequent concepts are analyzed for an understanding of these words and how they relate to the concept. Each concept is then given a score based on the way sentiment words relate to the concept, and their associated score. This allows movement to a more sophisticated understanding of sentiment based on an 11 point scale. Alternatively, texts can be given a positive and negative sentiment strength score if the goal is to determine the sentiment in a text rather than the overall polarity and strength of the text.
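The scale-based scoring described above can be illustrated with a minimal sketch. The lexicon entries and their scores are hypothetical, and the sketch scores a whole text rather than individual concepts, so it is a simplification of the method rather than an implementation of any particular system. It shows both an averaged polarity score and separate positive and negative strength scores.

```python
import re

# Hypothetical mini-lexicon on the -5..+5 scale described above.
# The words and scores are illustrative assumptions, not a published lexicon.
LEXICON = {
    "excellent": 5,
    "good": 3,
    "okay": 0,
    "bad": -3,
    "terrible": -5,
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def polarity_score(text):
    """Return the mean lexicon score of the sentiment words found (0.0 if none)."""
    scores = [LEXICON[t] for t in tokenize(text) if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def strength_scores(text):
    """Return (positive, negative) strengths: the strongest word of each sign."""
    scores = [LEXICON[t] for t in tokenize(text) if t in LEXICON]
    positive = max((s for s in scores if s > 0), default=0)
    negative = min((s for s in scores if s < 0), default=0)
    return positive, negative
```

For the text "The food was excellent but the service was terrible", polarity_score returns 0.0 while strength_scores returns (5, -5). The single averaged score hides the mixed sentiment that the separate strength scores preserve.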

Another research direction is subjectivity/objectivity identification. This task is commonly defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective. This problem can sometimes be more difficult than polarity classification: the subjectivity of words and phrases may depend on their context and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su, results are largely dependent on the definition of subjectivity used when annotating texts. However, Pang showed that removing objective sentences from a document before classifying its polarity helped improve performance.
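The idea of discarding objective sentences before polarity classification can be sketched as a two-stage pipeline. This is a toy illustration only: a cue-word heuristic stands in for a trained subjectivity classifier, and the cue words, the polarity lexicon, and all function names are hypothetical.

```python
import re

# Hypothetical first-person cue words standing in for a trained
# subjectivity classifier.
SUBJECTIVE_CUES = {"i", "my", "me", "think", "felt"}

# Hypothetical polarity lexicon for the second stage.
POLARITY = {"loved": 1, "great": 1, "wonderful": 1,
            "terrible": -1, "awful": -1, "boring": -1}


def split_sentences(text):
    """Split after sentence-ending punctuation (a rough heuristic)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence):
    """Lowercase the sentence and return its word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def is_subjective(sentence):
    """Label a sentence subjective if it contains any cue word."""
    return any(w in SUBJECTIVE_CUES for w in words(sentence))


def subjective_extract(text):
    """Keep only the sentences labelled subjective."""
    return [s for s in split_sentences(text) if is_subjective(s)]


def document_polarity(text, filter_objective=True):
    """Sum lexicon scores, optionally over the subjective sentences only."""
    sentences = subjective_extract(text) if filter_objective else split_sentences(text)
    total = sum(POLARITY.get(w, 0) for s in sentences for w in words(s))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

In a review such as "The film follows a detective hunting a terrible killer. The killer commits awful crimes. I loved every minute.", the plot summary contains negative words that say nothing about the reviewer's opinion. With filtering the sketch labels the review positive; without it, negative.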

A more fine-grained analysis model is called feature/aspect-based sentiment analysis. It refers to determining the opinions or sentiments expressed on different features or aspects of entities, e.g., of a cell phone, a digital camera, or a bank. A feature or aspect is an attribute or component of an entity, e.g., the screen of a cell phone, or the picture quality of a camera. This problem involves several sub-problems, e.g., identifying relevant entities, extracting their features/aspects, and determining whether an opinion expressed on each feature/aspect is positive, negative or neutral. More detailed discussions about this level of sentiment analysis can be found in Liu's NLP Handbook chapter, "Sentiment Analysis and Subjectivity".
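As a rough illustration of the last two sub-problems, the sketch below finds aspect terms in a sentence and attaches each opinion word to the nearest one. The aspect list, the opinion lexicon, and the nearest-term heuristic are all assumptions made for the example; practical systems typically rely on parsing or learned models instead of token distance.

```python
import re

# Hypothetical aspect terms and opinion words for a phone review.
ASPECTS = {"screen", "battery", "camera"}
OPINIONS = {"great": 1, "sharp": 1, "terrible": -1, "weak": -1}


def label(score):
    """Map a summed score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def aspect_sentiment(sentence):
    """Attach each opinion word to the nearest aspect term and label each aspect."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    aspect_positions = [(i, t) for i, t in enumerate(tokens) if t in ASPECTS]
    totals = {aspect: 0 for _, aspect in aspect_positions}
    for i, token in enumerate(tokens):
        if token in OPINIONS and aspect_positions:
            # Nearest aspect by token distance; ties go to the earlier aspect.
            _, nearest = min(aspect_positions, key=lambda p: abs(p[0] - i))
            totals[nearest] += OPINIONS[token]
    return {aspect: label(score) for aspect, score in totals.items()}
```

For "The screen is great but the battery is terrible." the sketch labels the screen positive and the battery negative, where a document-level classifier would return only a single mixed verdict.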

Methods

Computers can perform automated sentiment analysis of digital texts, using elements from machine learning such as latent semantic analysis, support vector machines, "bag of words" and Semantic Orientation - Pointwise Mutual Information (see Peter Turney's work in this area). More sophisticated methods try to detect the holder of a sentiment (i.e. the person who maintains that affective state) and the target (i.e. the named entity or target whose affective state one is interested in). To mine the opinion in context and identify the feature about which an opinion has been expressed, the grammatical relationships of words are used. Grammatical dependency relations are obtained by deep parsing of the text.
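Semantic orientation from pointwise mutual information is usually described as the difference between a word's association with a positive seed word and its association with a negative one. The sketch below estimates that difference from document-level co-occurrence counts in a small corpus. The seed words, the smoothing constant, and the function name are assumptions chosen for illustration, and the sketch handles single words only.

```python
import math


def so_pmi(word, documents, pos_seed="excellent", neg_seed="poor", smoothing=0.01):
    """Estimate semantic orientation as PMI(word, pos_seed) - PMI(word, neg_seed).

    Co-occurrence means appearing in the same document. A small smoothing
    constant is added to every count to avoid division by zero.
    """
    docs = [set(d.lower().split()) for d in documents]
    near_pos = sum(1 for d in docs if word in d and pos_seed in d) + smoothing
    near_neg = sum(1 for d in docs if word in d and neg_seed in d) + smoothing
    n_pos = sum(1 for d in docs if pos_seed in d) + smoothing
    n_neg = sum(1 for d in docs if neg_seed in d) + smoothing
    # The word's own frequency and the corpus size cancel in the difference
    # of the two PMI terms, leaving this ratio of counts.
    return math.log2((near_pos * n_neg) / (near_neg * n_pos))
```

A positive result suggests the word tends to appear with the positive seed, a negative result with the negative seed, and a value near zero suggests no clear orientation in the corpus.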

In sentic computing, a multi-disciplinary approach to opinion mining and sentiment analysis, text processing is not based on statistical learning models but rather on common sense reasoning tools and affective ontologies. Unlike statistical classification, which generally requires large inputs and thus cannot appraise texts with satisfactory granularity, sentic computing enables the analysis of documents not only at the page or paragraph level but also at the sentence and clause level.

Open source software tools deploy machine learning, statistics, and natural language processing techniques to automate sentiment analysis on large collections of texts, including web pages, online news, internet discussion groups, online reviews, web blogs, and social media.

Evaluation

The accuracy of a sentiment analysis system is, in principle, how well it agrees with human judgments. This is usually measured by precision and recall. However, human raters typically agree about 70% of the time (see inter-rater reliability). Thus, a 70% accurate program is doing as well as humans, even though such accuracy may not sound impressive. If a program were "right" 100% of the time, humans would still disagree with it about 30% of the time, since they disagree that much about any answer. More sophisticated measures can be applied, but evaluation of sentiment analysis systems remains a complex matter. For sentiment analysis tasks returning a scale rather than a binary judgement, correlation is a better measure than precision because it takes into account how close the predicted value is to the target value.
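The measures mentioned above can be computed directly from paired labels or scores. The following is a minimal sketch with hypothetical function names: precision and recall for one target class, the raw agreement rate between two raters, and the Pearson correlation for scale-valued predictions.

```python
import math


def precision_recall(gold, predicted, target="positive"):
    """Return (precision, recall) for one target class."""
    true_pos = sum(1 for g, p in zip(gold, predicted) if g == target and p == target)
    predicted_target = sum(1 for p in predicted if p == target)
    gold_target = sum(1 for g in gold if g == target)
    precision = true_pos / predicted_target if predicted_target else 0.0
    recall = true_pos / gold_target if gold_target else 0.0
    return precision, recall


def agreement(rater_a, rater_b):
    """Return the fraction of items on which two raters give the same label."""
    return sum(1 for a, b in zip(rater_a, rater_b) if a == b) / len(rater_a)


def pearson(xs, ys):
    """Return the Pearson correlation; assumes neither list is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

On a star-rating task, predicting 4 stars for a 5-star review and predicting 1 star both count as errors under precision, whereas the correlation reflects that the first prediction is much closer to the target.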

Sentiment analysis has been used to test the relationship between Internet financial message boards and the behavior of the stock market, finding a strong correlation between posts and volume of stock.

Sentiment analysis and Web 2.0

The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations. As businesses look to automate the process of filtering out the noise, understanding the conversations, identifying the relevant content and acting on it appropriately, many are now looking to the field of sentiment analysis. If Web 2.0 was all about democratizing publishing, then the next stage of the web may well be based on democratizing data mining of all the content that is getting published.

Research is one step towards this aim. Several research teams in universities around the world currently focus on understanding the dynamics of sentiment in e-communities through sentiment analysis. The CyberEmotions project, for instance, recently identified the role of negative emotions in driving social network discussions. Sentiment analysis could therefore help understand why certain e-communities die or fade away (e.g., MySpace) while others seem to grow without limits (e.g., Facebook).

Sentiment analysis (together with opinion mining) is becoming a promising topic in the field of CRM 2.0 as well. As a direct consequence of the concept of Web 2.0, CRM 2.0 refers to all CRM solutions where the customer engages with the products/services provided by the enterprise. This way, customer profiling becomes more effective and enterprises can move towards one-to-one marketing. In this perspective, social media are an important source of information for enterprises: the word-of-mouth effect can be highly positive or highly negative with respect to prospective customers' sentiment towards brands and products. Thus, sentiment analysis and opinion mining are expected to become a key component of modern and more innovative CRM solutions (see http://www.crm-trends.com/).

The problem is that most sentiment analysis algorithms use simple terms to express sentiment about a product or service. However, cultural factors, linguistic nuances and differing contexts make it extremely difficult to turn a string of written text into a simple pro or con sentiment. The fact that humans often disagree on the sentiment of text illustrates how big a task it is for computers to get this right. The shorter the string of text, the harder it becomes.

Sentiment analysis on the web has also been the subject of art. Artist Jonathan Harris' We Feel Fine project is an example of the depiction of emotions across the blogosphere, which uses many of the same techniques involved in the commercial application of sentiment analysis.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 