Lustre to DAOS: Machine Learning on Intel’s Platform

May 23, 2016 Nicole Hemsoth

Training a machine learning algorithm to accurately solve complex problems requires large amounts of data. The previous article discussed how scalable distributed parallel computing using a high-performance communications fabric like Intel Omni-Path Architecture (Intel OPA) is an essential part of what makes training deep learning models on large, complex datasets tractable, both in the data center and in the cloud. Preparing large unstructured datasets for machine learning can be as intensive a task as the training process itself, especially for the file-system and storage subsystem(s). Starting (and restarting) big data training jobs using tens of thousands of clients also makes severe demands on the file-system. The Lustre* file-system, which is part of the Intel Scalable System Framework (Intel SSF), is the current de…

