This blog post covers an analysis of 10 large-capitalization companies in the Technology sector: eBay, Hewlett-Packard, Cisco Systems, Qualcomm, Oracle, Microsoft, IBM, Google, Facebook and Apple.
Two data sets were used in the analysis. The first was compiled using only free, readily available web-based information and is referred to hereafter as the Ordinary data set. The second comprises free web-based information plus FinSentS’s supplement of Bloomberg’s premium equity content, and is referred to hereafter as the Premium data set. Both were analyzed using FinSentS’s algorithm to determine volume, sentiment and buzz scores.
Each company was analyzed to determine whether the two data sets differed in predictive ability. The dependent variable, Price, was regressed in turn on independent variables (from both Ordinary and Premium) such as Buzz, Volume and periodic Sentiment data, lagged for 3, 7, 15 or 30 days. The significant variables were then regressed together as independent variables against Price. We took a higher adjusted R2 value to indicate a greater ability of the data set to predict price variation.
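The single-variable step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it treats a "lagged" Sentiment series as a trailing simple moving average (SMA) over the given number of days, the helper names `sma` and `r_squared` are hypothetical, and the series are synthetic placeholders rather than FinSentS or market data.

```python
import random

def sma(values, window):
    """Trailing simple moving average; None until `window` points exist."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

def r_squared(x, y):
    """R^2 of the least-squares line y = a + b*x (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Synthetic placeholder series, NOT real market or sentiment data.
rng = random.Random(0)
sentiment = [rng.uniform(-1, 1) for _ in range(250)]
price = [100 + 2 * s + rng.gauss(0, 5) for s in sentiment]

# Regress Price on each smoothed Sentiment series in turn.
results = {}
for window in (3, 7, 15, 30):
    smoothed = sma(sentiment, window)
    pairs = [(s, p) for s, p in zip(smoothed, price) if s is not None]
    xs, ys = zip(*pairs)
    results[window] = 100 * r_squared(xs, ys)  # R^2 as a percentage
```

Each entry of `results` is the percentage of Price variation explained by one smoothed series, which is the kind of figure reported per variable later in this post.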
Hypothesis: At the 95% confidence level, the Premium data set enables an investor to predict changes in stock prices better than the Ordinary data set.
The following assumptions were made for this analysis:
- When regressing one dependent variable on one independent variable, linear regression is appropriate. In every scenario, plotting the residuals against the actual values supported a linear relationship.
- When there was more than one independent variable, multivariable regression was used, to prevent the model from being over-fitted.
In the next section, this blog post shows how the analysis was conducted on one of the ten companies, eBay. For the analysis of the other nine, please download the white papers available on Ordinary.com.
Detailed Analysis – eBay
Price was regressed on both Ordinary and Premium data for Volume, Buzz and periodic Sentiment, lagged for 3, 7, 15 and 30 days. Independent variables with an R2 value greater than 0.5% were deemed significant. Finally, all significant independent variables were regressed together against Price to determine the adjusted R2 value.
For each pair, the linearity assumption was checked using a normal Q-Q plot of residuals, as shown in Figure 1 below.
Figure 1: Example of checking the linearity assumption for Price vs Sentiment
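For readers who want to reproduce a check of this kind, the points of a normal Q-Q plot can be computed with the Python standard library alone. A normal Q-Q plot pairs each sorted residual with the matching quantile of a standard normal distribution; points lying close to a straight line indicate approximately normal residuals. The sketch below is illustrative only: the helper names are hypothetical, the residuals are synthetic, and the plotting positions `(i + 0.5) / n` are one common convention, not necessarily the one used for Figure 1.

```python
import random
from statistics import NormalDist

def qq_points(residuals):
    """Pair each sorted residual with its theoretical standard-normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()
    # Plotting positions (i + 0.5) / n keep probabilities strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

def straightness(points):
    """Pearson correlation of the Q-Q points; values near 1 mean a near-straight line."""
    n = len(points)
    mx = sum(t for t, _ in points) / n
    my = sum(r for _, r in points) / n
    sxy = sum((t - mx) * (r - my) for t, r in points)
    sxx = sum((t - mx) ** 2 for t, _ in points)
    syy = sum((r - my) ** 2 for _, r in points)
    return sxy / (sxx * syy) ** 0.5

# Synthetic residuals for illustration, NOT residuals from the eBay regressions.
rng = random.Random(1)
residuals = [rng.gauss(0, 5) for _ in range(500)]
points = qq_points(residuals)
score = straightness(points)
```

Plotting `points` as a scatter gives the Q-Q plot itself; `score` is a rough numeric summary of how straight that scatter is.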
Which data set is a better predictor for eBay?
In total, we tested 7 independent variables against Price.
|Dependent Variable||Independent Variable||Premium R2 (%)||Ordinary R2 (%)|
|Price||Sentiment SMA 3 days||1.82||0.09|
|Price||Sentiment SMA 7 days||4.52||0.08|
|Price||Sentiment SMA 15 days||1.56||0.11|
|Price||Sentiment SMA 30 days||0.16||0.26|
|Price (best fit)|
After running a linear regression of Price on each variable, we deemed significant those variables that explained at least 0.5% of the variation in Price.
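The 0.5% screening rule can be applied directly to the four Sentiment rows shown in the table above. The sketch below is a minimal illustration: the function name `significant` is hypothetical, the values are copied from the visible rows only, and the remaining variables from the full set of 7 are not listed here, so they are left out.

```python
THRESHOLD_PCT = 0.5  # "at least 0.5% of variation", per the rule stated above

# R2 values (%) copied from the visible rows of the eBay table.
r2_pct = {
    "Sentiment SMA 3 days": {"Premium": 1.82, "Ordinary": 0.09},
    "Sentiment SMA 7 days": {"Premium": 4.52, "Ordinary": 0.08},
    "Sentiment SMA 15 days": {"Premium": 1.56, "Ordinary": 0.11},
    "Sentiment SMA 30 days": {"Premium": 0.16, "Ordinary": 0.26},
}

def significant(r2_table, data_set, threshold=THRESHOLD_PCT):
    """Return the variables whose R2 (%) for `data_set` meets the threshold."""
    return [name for name, scores in r2_table.items()
            if scores[data_set] >= threshold]

premium_vars = significant(r2_pct, "Premium")
ordinary_vars = significant(r2_pct, "Ordinary")
```

Among these four rows, the 3-, 7- and 15-day Sentiment averages pass the screen for Premium, while none pass for Ordinary.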
All significant variables, highlighted in red, then underwent a multivariable regression against Price. This gave a final adjusted R2 of 8.27% for Premium and 5.40% for Ordinary. We can therefore conclude that, for eBay, the Premium data set was a better predictor of Price.
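As a rough illustration of this final step, the sketch below fits a multivariable least-squares regression with NumPy and reports adjusted R2, using the standard definition 1 - (1 - R2)(n - 1)/(n - p - 1) for n observations and p predictors. It is not the FinSentS implementation: the function name `adjusted_r2` is hypothetical and the data are synthetic placeholders.

```python
import numpy as np

def adjusted_r2(X, y):
    """Fit y = b0 + X @ b by ordinary least squares and return adjusted R^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    # Penalize R^2 for the number of predictors used.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Synthetic predictors and prices for illustration, NOT real data.
rng = np.random.default_rng(0)
n = 250
X = rng.normal(size=(n, 3))  # stand-ins for three significant variables
y = 100 + 1.5 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(scale=5.0, size=n)
score_pct = 100 * adjusted_r2(X, y)  # adjusted R^2 as a percentage
```

Because adjusted R2 only rises when an added variable improves the fit by more than chance would suggest, it allows models with different numbers of significant variables, such as the Premium and Ordinary fits, to be compared on a like-for-like basis.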
Abstract results for all companies
The table below shows the results for all 10 companies analyzed. From the table, the Premium data set currently does a better job of explaining variation in the price changes of large-cap technology companies, beating the Ordinary one 6 to 4. For more information and detailed analysis, please visit FinSentS.com.
|Total Adjusted R-squared values (%)|