To determine the limitations of your data, be sure to:
- Verify all the variables you’ll use in your model.
- Assess the scope of the data, especially over time, so your model can avoid the seasonality trap.
- Check for missing values, identify them, and assess their impact on the overall analysis.
- Watch out for extreme values (outliers) and decide whether to include them in the analysis.
- Confirm that the pool of training and test data is large enough.
- Make sure each data type (integer, decimal, character, and so forth) is correct, and set the upper and lower bounds of possible values.
- Pay extra attention to data integration when your data comes from multiple sources.
- Choose a relevant dataset that is representative of the whole population.
- Choose the right parameters for your analysis.
- Any values missing from the data.
- Any inconsistencies and/or errors existing in the data.
- Any duplicates or outliers in the data.
- Any normalization or other transformation of the data.
- Any derived data needed for the analysis.

dummies. (2019). The Limitations of the Data in Predictive Analytics - dummies. [online] Available at: https://www.dummies.com/programming/big-data/data-science/the-limitations-of-the-data-in-predictive-analytics/ [Accessed 18 Jan. 2019].
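Several of the checks in the list above (missing values, bounds on possible values, outliers, and duplicates) can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function names `audit_column` and `duplicate_rows`, the sample `ages` and `rows` data, and the choice of Tukey's 1.5 × IQR rule for outliers are assumptions made for the example and do not come from the source article.

```python
from statistics import quantiles


def audit_column(values, lower, upper):
    """Report basic quality problems in one numeric column.

    `values` is a list of numbers, with None marking a missing entry.
    `lower` and `upper` are the plausible bounds chosen by the analyst.
    At least two non-missing values are needed to compute quartiles.
    """
    missing = [i for i, v in enumerate(values) if v is None]
    present = [(i, v) for i, v in enumerate(values) if v is not None]

    # Values outside the range the analyst considers possible.
    out_of_bounds = [i for i, v in present if not lower <= v <= upper]

    # Tukey's rule (an assumed choice): flag points lying more than
    # 1.5 * IQR below the first quartile or above the third quartile.
    q1, _, q3 = quantiles([v for _, v in present], n=4)
    spread = 1.5 * (q3 - q1)
    outliers = [i for i, v in present if v < q1 - spread or v > q3 + spread]

    return {
        "missing": missing,
        "out_of_bounds": out_of_bounds,
        "outliers": outliers,
    }


def duplicate_rows(rows):
    """Return the indices of rows that exactly repeat an earlier row."""
    seen = set()
    repeats = []
    for i, row in enumerate(rows):
        # Sort the items so that key order does not affect the comparison.
        key = tuple(sorted(row.items()))
        if key in seen:
            repeats.append(i)
        else:
            seen.add(key)
    return repeats


# Hypothetical sample data: one missing age and two impossible ages.
ages = [34, 29, None, 41, 38, 250, 36, -1, 33]
report = audit_column(ages, lower=0, upper=120)

# Hypothetical records: the third row repeats the first.
rows = [{"id": 1, "age": 34}, {"id": 2, "age": 29}, {"id": 1, "age": 34}]
repeated = duplicate_rows(rows)

print(report)
print(repeated)
```

With this sample data, index 2 is reported as missing, indices 5 and 7 are flagged both as out of bounds and as outliers, and row 2 is reported as a duplicate. Whether to drop, correct, or keep the flagged entries remains a judgment call for the analyst, and the decision is worth documenting.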