Tuesday, 28 July 2015
Fernando Pratama, InfoTrie, Singapore
This report explores methods for creating time series of country risk, using the United States as the example country. Four risk topics are discussed: economic risk, legal risk, security risk, and political risk.
First, the keywords for each risk topic need to be extracted. Labeled US articles from Reuters and The Guardian are collected to generate these keywords. Tf-idf term weighting is then applied to each topic's article set, and the 200 words with the highest weights in each topic are chosen as that topic's keywords. Here are the top 10 words for each risk topic.
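The keyword-extraction step can be sketched in Python as below. This is a minimal sketch, not the exact pipeline used for this report: the regex tokenizer, the plain tf × log(N/df) weighting, and the choice to treat each topic's concatenated articles as a single document (so idf is computed across the topics) are all assumptions, and `top_keywords` is a hypothetical helper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(topic_articles, k=200):
    """Return the k highest-weighted tf-idf terms for each topic.

    `topic_articles` maps a topic name to a list of article texts.
    Assumption: each topic's articles form one document and idf is
    computed across topics, so terms frequent in one topic but rare in
    the others score highest.
    """
    # Term counts for each topic "document".
    tf = {
        topic: Counter(tok for art in arts for tok in tokenize(art))
        for topic, arts in topic_articles.items()
    }
    n_docs = len(tf)
    # Document frequency: in how many topics each term appears.
    df = Counter(term for counts in tf.values() for term in counts)
    keywords = {}
    for topic, counts in tf.items():
        total = sum(counts.values())
        scores = {
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        }
        # Sort by descending score; break ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords[topic] = ranked[:k]
    return keywords
```

With this weighting, a term that appears in every topic (for example a generic word like "talks" shared by two toy topics) gets an idf of zero and falls to the bottom of every ranking, which is the behaviour that makes the remaining top terms topic-specific.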
These keywords are then used to query articles from the InfoTrie website. For this report, 10000 articles were downloaded for each topic. The acquired data is then pre-processed into a time series of articles with sentiment scores. Here is a summary of the economic article time series.
One way to measure topic risk is to perform sentiment analysis on each topic, where low sentiment implies high risk and vice versa. Past papers have examined the relationship between article sentiment and stock prices (1), (2). Since InfoTrie already provides a sentiment indicator, we can use it to build a time series of sentiment for estimating country risk. The sentiment score ranges from 1 to 10. Here are the time series plots of average daily sentiment for each topic:
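Averaging sentiment per day is a simple group-by. The sketch below assumes each article has already been reduced to a (day, sentiment) pair, with the day as a `datetime.date` or an ISO "YYYY-MM-DD" string so that sorting is chronological, and the sentiment on the 1 to 10 scale; `daily_average_sentiment` is a hypothetical helper, not part of any InfoTrie API.

```python
from collections import defaultdict


def daily_average_sentiment(articles):
    """Average the sentiment of all articles published on the same day.

    `articles` is an iterable of (day, sentiment) pairs; sentiment is
    assumed to lie on the 1-10 scale used by the indicator.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, score in articles:
        if not 1 <= score <= 10:
            raise ValueError(f"sentiment {score} outside the 1-10 range")
        sums[day] += score
        counts[day] += 1
    # Return (day, mean sentiment) pairs in chronological order.
    return [(day, sums[day] / counts[day]) for day in sorted(sums)]
```

For instance, two articles on 2015-07-01 scoring 4.0 and 6.0 and one article on 2015-07-02 scoring 3.0 give the series `[("2015-07-01", 5.0), ("2015-07-02", 3.0)]`.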
Another approach is to track specific keywords related to particular risk events. First, the articles are grouped by date and converted into a document-term matrix with tf-idf weighting. For each term, the sum of its weights across all documents on a given day becomes the term's value for that day.
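The per-day term values can be sketched as follows. Assumptions: articles arrive as (day, text) pairs, tf is the raw term count divided by the article length, and idf is log(N/df) computed once over the whole collection (the text does not say whether idf is computed per day or globally). `daily_term_values` is a hypothetical name, and the dictionaries stand in for a sparse document-term matrix.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def daily_term_values(articles):
    """Map each day to {term: summed tf-idf weight over that day's articles}.

    Each article is one row of the document-term matrix; summing the rows
    that share a day gives that day's value for every term.
    """
    docs = [(day, Counter(tokenize(text))) for day, text in articles]
    n_docs = len(docs)
    # Document frequency over the whole collection (assumed global idf).
    df = Counter(term for _, counts in docs for term in counts)
    values = defaultdict(lambda: defaultdict(float))
    for day, counts in docs:
        length = sum(counts.values())
        for term, c in counts.items():
            values[day][term] += (c / length) * math.log(n_docs / df[term])
    return {day: dict(terms) for day, terms in values.items()}
```

A term absent from every article on a given day simply has no entry for that day, which is what leaves holes in the per-term series.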
Here are examples for the terms “greek” and “bailout” in the economic time series and “nuclear” in the political time series.
Note that terms that do not appear on certain days produce disconnected lines in the time series graphs.
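One way to make those gaps explicit is to lay each term's values on a complete daily calendar. In this hypothetical `term_series` helper, days without the term default to NaN, which matplotlib draws as a break in the line; passing `fill=0.0` instead treats absence as a summed weight of zero and yields a continuous line. Both options are shown for illustration only and are not necessarily how the graphs in this report were produced.

```python
import math
from datetime import date, timedelta


def term_series(daily_values: dict, term: str, start: date, end: date,
                fill: float = math.nan) -> list:
    """Return [(day, value)] for every day from start to end inclusive.

    `daily_values` maps a datetime.date to {term: value}. Days on which
    the term is missing get `fill` (NaN by default, which keeps the gap).
    """
    series = []
    day = start
    while day <= end:
        series.append((day, daily_values.get(day, {}).get(term, fill)))
        day += timedelta(days=1)
    return series
```

Keeping NaN preserves the distinction between "not mentioned" and "mentioned with low weight", while filling with zero is arguably closer to the sum-of-weights definition, since a term with no occurrences contributes nothing on that day.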