Resource List: Machine Learning, ODPi, Deduping With Scala, OCR and More…

ODPi for Hadoop Standards: The ODPi + ASF to consolidate Hadoop and all the versions.   Too many custom distributions with various versions of the 20 or so tools that make up Apache Big Data.   To be able to move between HDP, CDH, IBM, Pivotal and MapR seemless would be awesome.  For now HDP, Pivotal and IBM are part of the ODPi. Structured Data: Connecting Modern Relational Database and Hadoop is always an architectural challenge that requires decisions, EnterpriseDB (Postgresql) has an interesting article on that.   It let’s you read HDFS/Hive tables from EDB with SQL.  (Github) Semistructured Data: Using Apache NIFI with Tesseract for OCR:   HP and Google have been fine-tuning Tesseract for awhile to handle OCR.   Using dataflow technology from the NSA, you can automate OCR tasks on…


Link to Full Article: Resource List: Machine Learning, ODPi, Deduping With Scala, OCR and More…