Don’t Be a Big Data Snooper

One of the biggest challenges data scientists face is separating true predictors from false ones. When an airtight causal model can’t be built, data scientists often turn to a secondary class of models that rely on correlations to predict outcomes. With these models, however, great care must be taken to avoid falling victim to data snooping bias. Data snooping is essentially the practice of finding patterns in data that don’t actually reflect the real world. Data scientists may know it by other names, such as overfitting the curve or mistaking the noise for the signal. The simple definition makes it sound as if data snooping would be fairly easy to avoid. However, because of the way the human brain works and how it’s wired to spot connections in…
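
To make the idea of data snooping concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a method taken from the article: the function names `pearson` and `snoop`, the sample sizes, and the use of Gaussian noise are all hypothetical choices. Every candidate "predictor" is pure noise, yet searching through enough of them is very likely to turn up one that correlates well with the target in the sample it was selected on. Scoring the same candidate on fresh data shows whether the pattern holds up.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def snoop(n_samples=30, n_candidates=1000, seed=0):
    """Pick the noise predictor that best fits a noise target.

    Returns (in_sample_r, out_of_sample_r) for the selected candidate.
    All data here is random by construction (an assumption of this
    sketch), so any correlation found is spurious.
    """
    rng = random.Random(seed)

    def noise():
        return [rng.gauss(0, 1) for _ in range(n_samples)]

    # The target and every candidate are independent noise.
    target_in, target_out = noise(), noise()

    best_in, best_out = 0.0, 0.0
    for _ in range(n_candidates):
        cand_in, cand_out = noise(), noise()
        r_in = pearson(cand_in, target_in)
        # Selection happens only on the in-sample data: this is the snooping step.
        if abs(r_in) > abs(best_in):
            best_in = r_in
            best_out = pearson(cand_out, target_out)
    return best_in, best_out


if __name__ == "__main__":
    r_in, r_out = snoop()
    print(f"in-sample r of best candidate:     {r_in:+.2f}")
    print(f"out-of-sample r of same candidate: {r_out:+.2f}")
```

The in-sample correlation looks impressive only because it is the maximum over many tries. The out-of-sample figure is an ordinary draw from noise, which is why holding back data that played no part in the search is a common guard against this bias.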

