Understanding the Data in Data Science

The most time-consuming aspect of any data science project is transforming the data into a format that an analyst can use to build models. This is even more critical for parametric models, which assume that the data follow known distributions. However, before you begin to transform the data, you need to understand it.

What does it mean to “understand” data? The objectives of data understanding are:

- Understand the attributes of the data.
- Summarize the data by identifying key characteristics, such as data volume and the total number of variables.
- Understand the problems with the data, such as missing values, inaccuracies, and outliers.
- Visualize the data to validate its key characteristics or to unearth problems that the summary statistics hide.

In this post, I will explain each of…
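To make the first three objectives concrete, here is a minimal sketch in plain Python. It is an illustration rather than the method used in the full article: the `summarize` function, the toy `records`, the use of `None` to mark a missing value, and the 1.5 × IQR rule for flagging outliers are all assumptions introduced for this example.

```python
from statistics import quantiles


def summarize(rows):
    """Return basic data-understanding facts for a list of dict records."""
    # Attributes: every key that appears in at least one record.
    columns = sorted({key for row in rows for key in row})
    summary = {"n_rows": len(rows), "n_columns": len(columns), "columns": {}}

    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        info = {
            "missing": len(values) - len(present),
            "types": sorted({type(v).__name__ for v in present}),
            "outliers": [],
        }

        # Outlier check only for purely numeric columns with enough values.
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        if len(numeric) >= 4 and len(numeric) == len(present):
            q1, _, q3 = quantiles(numeric, n=4, method="inclusive")
            spread = 1.5 * (q3 - q1)  # assumed rule of thumb, not from the article
            info["outliers"] = [
                v for v in numeric if v < q1 - spread or v > q3 + spread
            ]

        summary["columns"][col] = info
    return summary


# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 31, "city": "Austin"},
    {"age": 29, "city": None},
    {"age": 35, "city": "Boston"},
    {"age": 33, "city": "Denver"},
    {"age": 30, "city": "Austin"},
    {"age": 240, "city": "Boston"},
    {"age": None, "city": "Denver"},
]

report = summarize(records)
print(report["n_rows"], report["n_columns"])
for name, info in report["columns"].items():
    print(name, info)
```

On this toy data the report shows one missing value in each column and flags the age of 240 as a likely data-entry error. A real project would usually reach for a data-frame library and add plots for the fourth objective, which this sketch leaves out.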

