Bond Investors Must Harness Big Data’s Brave New World


Mike Rierson


Welcome to the world of big data. As of the middle of last year, Internet users were sending more than 204 million e-mails, running 4 million searches on Google and submitting more than 275,000 new tweets every minute, according to Domo, a software company based in American Fork, Utah (see chart). In aggregate, IBM estimates that the world creates more than 2.5 quintillion bytes of new data each day.

This rapid growth of massive, often previously unavailable sets of data has led to the development of a new field of study at the intersection of statistics and computer science. Data science offers new tools for extracting predictive insights from enormous sets of data that had been unsuited to classic statistical models and techniques. Over the past several years, data science algorithms have become so ingrained in our daily lives that we often don’t even notice them. When we use a search engine, translate a web page into a new language at the click of a button or watch our e-mail software chuck out yet another piece of spam, we are witnessing such machine-learning techniques at work.

Still, the cardinal sin of any statistical modeler is overfitting the data: testing and retesting model specifications to get better and better predictive power that ultimately fails when the model is used on real-world, out-of-sample data. Researchers use a variety of techniques to avoid this danger, such as disciplined use of testing and holdout samples; a focus on predictive variables that make clear sense, that is, variables for which the researcher can explain why they work; and, especially, an emphasis on simpler models with fewer free parameters to be tuned to the data in the first place. Essentially, modelers restrict their attention to models with a few variables and simple linear relations between inputs and forecasts to reduce the temptations of unfettered data mining.
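The holdout discipline described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from the article: the simulated data (a straight line plus noise), the sample sizes, and the two hypothetical models, a two-parameter line and a "memorizer" that simply repeats the outcome of the closest training observation. The memorizer stands in for an over-tuned specification: it fits the training sample perfectly but is expected to do worse than the simple line on data it has never seen.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def make_sample(n):
    """Simulate y = 2x + noise (an assumed, purely illustrative relation)."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares with one input: two free parameters."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_memorizer(xs, ys):
    """1-nearest-neighbor: effectively one free parameter per observation."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared forecast error of a model on a sample."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# The holdout sample is drawn separately and never used for fitting.
train_x, train_y = make_sample(500)
hold_x, hold_y = make_sample(500)

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

results = {
    "line": (mse(line, train_x, train_y), mse(line, hold_x, hold_y)),
    "memorizer": (mse(memorizer, train_x, train_y),
                  mse(memorizer, hold_x, hold_y)),
}

for name, (in_sample, out_of_sample) in results.items():
    print(f"{name:10s} in-sample MSE {in_sample:.3f}   "
          f"holdout MSE {out_of_sample:.3f}")
```

Judged on in-sample error alone, the memorizer looks like the better model. Only the holdout column shows that its apparent predictive power does not carry over to new data.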

The cost of these restrictions is that we lose the ability to identify some of the more subtle predictive features that the new techniques can uncover and that may provide trading insights. Machine learning provides a way around that limitation by effectively letting the data “speak” to reveal nonlinear, or dynamically evolving, relationships across a broader set of potentially predictive variables. These techniques also come with guardrails that tie the complexity of a model to its ability to forecast out of sample. Moreover, traditional statistical techniques are best suited to data sets that are organized so that there is a fixed number of fields for each observation. Machine-learning techniques, by contrast, can be applied to more unstructured data sets, such as large bodies of text: news articles, press releases, blogs and tweets.
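To make the contrast between structured and unstructured data concrete, here is a minimal sketch of one common way to turn free text into a table with a fixed number of fields per observation, a so-called bag-of-words count. The article does not say which text technique any investor actually uses, so this is only an assumed illustration; the headlines are invented, and the helper names `tokenize` and `bag_of_words` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, rows): one row of word counts per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for missing words, so every row has the same fields.
    rows = [[count[word] for word in vocabulary] for count in counts]
    return vocabulary, rows


# Hypothetical headlines, invented for illustration only.
headlines = [
    "Issuer beats earnings forecast, raises outlook",
    "Issuer misses earnings forecast",
    "Rating agency cuts outlook on issuer debt",
]

vocabulary, rows = bag_of_words(headlines)

print(vocabulary)
for row in rows:
    print(row)
```

Each headline becomes a row with one column per vocabulary word, which is the fixed-field layout that statistical models expect. The cost is that word order is discarded and the number of columns grows with the vocabulary, which is one reason the complexity guardrails mentioned above matter.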

Source: Bond Investors Must Harness Big Data’s Brave New World
