Don’t sabotage your data science efforts with garbage

Image: iStock/Devonyu

The Kryptonite for any data scientist is low-quality data. You could invent the cleverest algorithm the world has ever seen, but it would be rendered useless when fed bad data. As they say, “Garbage in, garbage out.”

I’m currently working with a large oil and gas company to improve the safety of their refineries by helping them adopt a more risk-based inspection strategy. The optimal application of risk would be purely quantitative: use historical inspection data to identify high-risk areas that require more attention. This approach is being challenged because some people lack confidence in the existing historical inspection data. It’s a valid challenge, and one that data professionals commonly face. To defend your data science, you must have good data quality techniques.

1: Clean sources…

