Apache Spark: The Future of Big Data Science?

Apache Spark is the go-to tool for Data Science at scale. It is an open source, distributed compute platform and the first tool in the Data Science toolbox built specifically with Data Science in mind. In this blog, I want to talk about why I think Spark is the future of Data Science at scale and why Capgemini are supporting the Spark London Meetup Group. We all know that data volumes are growing at an alarming rate, and to get the best value out of these datasets businesses need to be able to analyse the full breadth and depth of this data. Traditionally this has been achieved with big data platforms and NoSQL datastores such as Hadoop, MongoDB, Elasticsearch and countless others. What has been lacking is the…

