Yahoo releases 13.5TB of data to help researchers

Dive Brief: Machine learning is growing in popularity, but the average company can find it difficult to get the large data sets needed to test machine learning programs. In response, Yahoo released what it calls the “largest ever” data set to machine learning scientists. The data comes from interactions with the company’s news feeds, including Yahoo News, Sports, Finance, Movies and Real Estate.

Dive Insight: Computer scientists require large data sets to guide and test machine learning systems. To aid this effort, Yahoo released 110 billion records comprising 13.5TB of data for public use yesterday. The data is now available for download through Yahoo Labs’ Webscope data sharing program and is more than ten times the size of the largest previously released data set, Yahoo said. “Data is the life-blood of research in machine learning,”…

