Yahoo’s Gigantic ‘Anonymized’ User Dataset Isn’t All That Anonymous

Yahoo Labs, the research wing of Yahoo, just released what the company is calling the “largest ever” machine learning dataset, free for artificial intelligence researchers to use in their work, for example to build a Facebook-like recommendation algorithm. In doing so, Yahoo also released information that researchers who download the dataset, and anyone they share it with, could potentially use to identify Yahoo customers. The behemoth dataset consists of 13.5 terabytes of user interactions with news items from some 20 million users, which the company says have been “anonymized.” While there are no names attached to the data, the records for seven million of those users also include their age, gender, the city they were in when they accessed the page, whether they used a mobile device or a…

