Image Processing + Machine Learning in R: Denoising Dirty Documents Tutorial Series

Colin Priest finished 2nd in the Denoising Dirty Documents playground competition on Kaggle. He blogged about his experience in an excellent tutorial series that walks through a number of image processing and machine learning approaches to cleaning up noisy images of text. The series starts with linear regression but quickly moves on to GBMs, CNNs, and deep neural networks. Along the way, you’ll learn techniques such as adaptive thresholding, Canny edge detection, and median filtering. You’ll also use stacking, engineer a key feature, and build a strong final ensemble from the different models created throughout the series.

(Image: Sample image from the Denoising Dirty Documents training set)

You can review the key lessons from the series below and follow the header links to each full tutorial installment on Colin’s blog.

Tutorials 1-6…
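To make two of the techniques named above more concrete, here is a minimal sketch of median filtering followed by adaptive thresholding. It is not Colin's code: his series is written in R, while this example uses Python and plain NumPy so that it stays self-contained. The function names (`median_filter`, `local_mean`, `adaptive_threshold`), the window sizes, and the `offset` value are all hypothetical choices for illustration. The thresholding variant shown is the simple "darker than the local mean minus an offset" rule, which is only one of several adaptive thresholding schemes.

```python
import numpy as np


def median_filter(img, size=3):
    """Median filter over a size x size window (size must be odd), edge-padded."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    # Stack every shifted view of the padded image, then take the per-pixel median.
    windows = np.stack(
        [padded[i:i + h, j:j + w] for i in range(size) for j in range(size)],
        axis=-1,
    )
    return np.median(windows, axis=-1)


def local_mean(img, size):
    """Mean over a size x size window (size must be odd), edge-padded."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    total = np.zeros(img.shape, dtype=float)
    for i in range(size):
        for j in range(size):
            total += padded[i:i + h, j:j + w]
    return total / (size * size)


def adaptive_threshold(img, size=15, offset=0.05):
    """Return 0.0 (ink) where a pixel is darker than its local mean minus offset, else 1.0 (paper)."""
    return np.where(img < local_mean(img, size) - offset, 0.0, 1.0)


# Tiny synthetic "page": light paper, one dark stroke, one isolated dark speck.
page = np.full((40, 40), 0.9)
page[10:30, 18:22] = 0.1  # dark vertical stroke ("ink")
page[3, 3] = 0.0          # single-pixel speck ("noise")

raw_binary = adaptive_threshold(page)        # speck survives as "ink"
smoothed = median_filter(page, size=3)       # speck removed, stroke kept
binary = adaptive_threshold(smoothed)
print(raw_binary[3, 3], binary[3, 3], binary[20, 20])
```

Running the median filter first removes the isolated speck, which thresholding alone would keep as ink, while the 4-pixel-wide stroke survives both steps. Because the threshold adapts to each pixel's neighbourhood, it can cope with uneven backgrounds such as stains or shading better than a single global cutoff.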

