Friday, 5 May 2017 Yige Zhao
What Is Sentiment Analysis
Sentiment analysis is the application of natural language processing methods to identify subjective information in source text.
In text analysis, sentiment is the attitude or opinion expressed towards something. Sentiment can be positive, negative or neutral.
Why Would We Want to Do This
Emotion and psychology influence trading and investment decisions, causing people to behave in an unpredictable or irrational way.
Meanwhile, the glut of data makes reading everything an impossible task.
So we need sentiment analysis to:
- Extract more information
- Automate the analysis of unstructured content
- Speed up the understanding
- Limit the noise
Process of Real-time Sentiment Analysis
The process of real-time sentiment analysis can be roughly divided into the following four steps:
1. Topic Classification on Apache Spark
We consider five different news topics: Economics, Legal, Politics, Security, and Non.
The Non topic covers all other topics, such as Health, Technology, and Sports. The Naive Bayes algorithm from Apache Spark’s MLlib is used to train the model and predict news topics.
The steps are outlined below:
- Extract article titles and contents
- Tokenize the texts and remove non-alphabetic characters and stop words
- Split the articles into training and test sets
- Calculate the tf-idf matrix on the training set
- Train the Naive Bayes classifier on the training set
- Classify the test articles and measure the performance of the result
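The steps above can be sketched in plain Python. This is only a minimal stand-in for illustration, not the Spark MLlib pipeline itself: the stop-word list, the toy articles, the smoothed idf formula, and every function name are assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption, not a production list).
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to"}

def tokenize(text):
    """Lowercase, keep alphabetic runs only, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def fit_idf(token_lists):
    """Smoothed inverse document frequency, computed on the training set only."""
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    return {t: math.log((n_docs + 1) / (df + 1)) + 1.0 for t, df in doc_freq.items()}

def tfidf(tokens, idf):
    """Term count times idf; terms unseen during training are ignored."""
    return {t: n * idf[t] for t, n in Counter(tokens).items() if t in idf}

def train_naive_bayes(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes on tf-idf weights with additive smoothing."""
    weights = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, labels):
        for term, w in vec.items():
            weights[label][term] += w
    model = {}
    for label, n_label in Counter(labels).items():
        total = sum(weights[label].values()) + alpha * len(vocab)
        log_like = {t: math.log((weights[label][t] + alpha) / total) for t in vocab}
        model[label] = (math.log(n_label / len(labels)), log_like)
    return model

def predict(model, vec):
    """Return the label with the highest posterior log-probability."""
    def log_posterior(label):
        log_prior, log_like = model[label]
        return log_prior + sum(w * log_like[t] for t, w in vec.items())
    return max(model, key=log_posterior)

# Toy articles as (title, content, topic); invented for illustration only.
train = [
    ("Central bank raises interest rates", "inflation and growth outlook", "Economics"),
    ("GDP growth slows", "economy inflation data weak", "Economics"),
    ("Court rules on patent lawsuit", "judge dismisses lawsuit appeal", "Legal"),
    ("Regulator files lawsuit", "court hearing set by judge", "Legal"),
    ("Election campaign begins", "parliament vote and party leaders", "Politics"),
    ("President signs bill", "parliament election vote debate", "Politics"),
]
test = [
    ("Judge hears appeal", "court lawsuit continues", "Legal"),
    ("Inflation rises", "economy growth rates", "Economics"),
]

train_tokens = [tokenize(title + " " + content) for title, content, _ in train]
idf = fit_idf(train_tokens)
model = train_naive_bayes([tfidf(t, idf) for t in train_tokens],
                          [topic for _, _, topic in train], vocab=set(idf))
predictions = [predict(model, tfidf(tokenize(title + " " + content), idf))
               for title, content, _ in test]
accuracy = sum(p == topic for p, (_, _, topic) in zip(predictions, test)) / len(test)
print(predictions, accuracy)
```

Note that the idf weights are fitted on the training articles only and then reused for the test articles, so no information from the test set leaks into training.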
2. Named Entity Recognition (NER)
Normally, a reader wants answers to the following two questions about a piece of news:
- What is the news talking about? For example, Apple or Facebook?
- In general, is it bad or good?
Named Entity Recognition (NER) answers the first question.
More specifically, it quickly determines which items in the text map to proper names, such as people or places.
For InfoTrie, we need to go further to determine which company is involved in the news.
We decouple the task into two parts:
- Use popular community packages such as nltk and Stanford NER to narrow down the search space.
- Search for the company name using our own company synonym database.
After the NER process, the news item is filed under the identified company name for delivery or further analysis.
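A minimal sketch of the second part, the synonym lookup, might look as follows. The synonym table and the function names are hypothetical, and a simple capitalized-phrase regex stands in for the candidate names that a tagger such as nltk or Stanford NER would supply.

```python
import re

# Hypothetical synonym table: lowercase alias -> canonical company name.
COMPANY_SYNONYMS = {
    "apple": "Apple Inc.",
    "apple inc": "Apple Inc.",
    "facebook": "Facebook Inc.",
    "fb": "Facebook Inc.",
}

def candidate_names(text):
    """Stand-in for an NER tagger: return runs of capitalized words."""
    return re.findall(r"[A-Z][A-Za-z]*(?:\s+[A-Z][A-Za-z]*)*", text)

def identify_companies(text):
    """Map candidate names to canonical companies via the synonym table."""
    found = []
    for candidate in candidate_names(text):
        phrase = candidate.lower()
        # Try the whole phrase first, then fall back to its individual words.
        keys = [phrase] if phrase in COMPANY_SYNONYMS else phrase.split()
        for key in keys:
            company = COMPANY_SYNONYMS.get(key)
            if company and company not in found:
                found.append(company)
    return found

companies = identify_companies(
    "Apple Inc unveiled a new phone while Facebook faced questions in Europe."
)
print(companies)
```

Restricting the lookup to candidate names keeps the search space small, which is the point of running the NER step first.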
Sometimes one news item mentions several companies. In this scenario, we compute a relevance measure that considers the location of a term in the text. For example, intuitively, a news item is more relevant to a company when the company's name occurs in the title.
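One way such a location-aware relevance measure could work is sketched below. The weighting scheme is an assumption made for illustration (a title mention counts three times, and a body mention counts for less the later it appears), as are the alias table and the function name; the actual measure may differ.

```python
import re

def relevance(title, body, aliases, title_weight=3.0):
    """Score each company by where its names occur, then normalize to sum to 1."""
    scores = {}
    for company, names in aliases.items():
        # Longest alias first so "apple inc" is not also counted as "apple".
        longest_first = sorted(names, key=len, reverse=True)
        pattern = re.compile(
            r"\b(?:" + "|".join(map(re.escape, longest_first)) + r")\b",
            re.IGNORECASE,
        )
        score = title_weight * len(pattern.findall(title))
        for match in pattern.finditer(body):
            # Body mentions decay linearly from 1.0 (start) towards 0.5 (end).
            score += 1.0 - 0.5 * match.start() / len(body)
        scores[company] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()} if total else scores

# Hypothetical aliases and article, for illustration only.
aliases = {"Apple Inc.": ["apple", "apple inc"], "Facebook Inc.": ["facebook", "fb"]}
scores = relevance(
    "Apple beats earnings estimates",
    "Apple reported record revenue. Analysts compared the results with Facebook.",
    aliases,
)
print(scores)
```

In this example the company named in the title receives the larger share of the relevance, which matches the intuition described above.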
3. Sentiment Score Process
To know whether a news item is good or bad for a company, a common approach is to search the text for words that express emotional states, such as “angry,” “sad,” and “happy,” and count their occurrences.
Our method:
In our case, we first collect a library of these emotional-state words specialized for the financial community.
Next, we count all the words that appear in both the library and the text.
Then we normalize the counts of positive and negative words to a score in [0, 10], where a score of 0 means that all matched words are negative and a score of 10 means that all are positive.
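The scoring can be sketched as below, assuming the simplest normalization consistent with the description: ten times the share of positive words among all matched words. The miniature word lists are hypothetical, and returning no score when nothing matches is also an assumption.

```python
import re

# Hypothetical miniature lexicon; a real library would be finance-specific and larger.
POSITIVE = {"gain", "growth", "profit", "upgrade", "beat"}
NEGATIVE = {"loss", "decline", "lawsuit", "downgrade", "miss"}

def sentiment_score(text):
    """Return a score in [0, 10]: 0 = all matches negative, 10 = all positive."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return None  # assumption: no lexicon hits means no score
    return 10.0 * pos / (pos + neg)

score = sentiment_score("Profit growth beat forecasts despite a lawsuit")
print(score)
```

Here three positive words and one negative word are matched, so the score is 10 × 3 / 4 = 7.5.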
Advantages of our method:
These scores can be treated as a quantitative measure of sentiment that can be compared across companies and over time.
Finally, both the NER and the sentiment scoring processes run on distributed computing clusters, so that the analysis results can be delivered and documented in real time.
Practical Application: Real-time Analytics in Trading Business
Let’s look at a practical application: real-time analytics in the trading business.
Data Feed Engine
Traders usually need to make a large number of trading decisions based on multiple dimensions of information, such as news, financial analysis reports, and real-time quotes.
Real-time Analytics Engine
With the help of real-time analytics, the latency of the pre-decision process can be reduced to between milliseconds and a few seconds after a business event occurs.
Last but not least, an alert is sent to the trader, and the system waits for his or her final trigger. Traders become strategy creators and decision makers instead of data collectors and processors.
Professional Product for Sentiment Analysis
Since sentiment analysis is so important, is there a professional product for it that has the following features?
- A Large Number of Users
- Ultra High Processing Speed
FinSentS is a cutting-edge sentiment analysis and news analytics engine.
The FinSentS web dashboard indexes business news, blogs, and social media feeds in real time, in a way similar to what Google or Bing does. It scans thousands of websites, blogs, and business news publications in real time.
KEY FEATURES of FinSentS
Take two minutes to register, save two hours every day!