Code snippets
Data pre-processing: missing data, outliers, scaling, transformation
- Data reading & pre-processing (using pandas; code)
- Data scaling (an sklearn tutorial; UCI Wine dataset, available through sklearn.datasets)
- Comparing the effect of different scalers on data with outliers (sklearn tutorial; California housing dataset, whose attributes have very different scales and some of which contain large outliers). Scaling approaches covered: standard scaling, min-max scaling, max-abs scaling, robust scaling (quantile-based), power transformation (Yeo-Johnson), power transformation (Box-Cox), quantile transformation (Gaussian output distribution), quantile transformation (uniform output distribution), and sample-wise L2 normalization.
- Data transformation: FFT (code; see also an online article)
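
As a rough illustration of the scaler comparison listed above, the following sketch applies each approach to a small synthetic dataset. The data is an assumption made here so the example runs without a download; it is not the California housing dataset used in the sklearn tutorial. The two features are kept strictly positive because Box-Cox requires positive inputs.

```python
import numpy as np
from sklearn.preprocessing import (
    MaxAbsScaler, MinMaxScaler, Normalizer, PowerTransformer,
    QuantileTransformer, RobustScaler, StandardScaler,
)

rng = np.random.default_rng(0)

# Synthetic stand-in (assumption): two strictly positive features
# on very different scales.
X = np.column_stack([
    rng.uniform(1.0, 10.0, size=500),
    rng.lognormal(mean=5.0, sigma=1.0, size=500),
])
X[:5, 0] = 100.0  # inject a few large outliers into the first feature

scalers = {
    "standard": StandardScaler(),
    "min-max": MinMaxScaler(),
    "max-abs": MaxAbsScaler(),
    "robust": RobustScaler(quantile_range=(25.0, 75.0)),
    "yeo-johnson": PowerTransformer(method="yeo-johnson"),
    "box-cox": PowerTransformer(method="box-cox"),  # positive data only
    "quantile-normal": QuantileTransformer(
        output_distribution="normal", n_quantiles=100, random_state=0),
    "quantile-uniform": QuantileTransformer(
        output_distribution="uniform", n_quantiles=100, random_state=0),
    "l2-normalizer": Normalizer(norm="l2"),  # rescales each sample (row)
}

# Fit each scaler on the same data and keep the transformed arrays.
results = {name: s.fit_transform(X) for name, s in scalers.items()}

for name, Xt in results.items():
    print(f"{name:17s} min={Xt.min(axis=0).round(2)} "
          f"max={Xt.max(axis=0).round(2)}")
```

Comparing the printed ranges gives a first impression of how strongly the injected outliers compress the remaining values under each method; the robust and quantile-based transforms are typically less affected than min-max or max-abs scaling.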
Association rule analysis
Clustering
Classification
- Comparison of KNN, Logistic Regression, and SVM on the credit card fraud dataset
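
A comparison of this kind might be set up roughly as follows. This is a minimal sketch, not the linked code: the credit card fraud dataset is not bundled with sklearn, so an imbalanced synthetic dataset from `make_classification` stands in for it, and the hyperparameters and the choice of average precision as the metric are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in (assumption): about 3% positives to mimic rare fraud.
X, y = make_classification(
    n_samples=3000, n_features=10, n_informative=5,
    weights=[0.97, 0.03], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0,
)

# All three models are distance- or margin-based, so scale features first.
models = {
    "KNN": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Logistic Regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000)),
    "SVM": make_pipeline(
        StandardScaler(),
        SVC(class_weight="balanced", probability=True, random_state=0)),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]  # probability of the rare class
    # Average precision summarizes the precision-recall curve, which is
    # more informative than accuracy when classes are heavily imbalanced.
    scores[name] = average_precision_score(y_test, proba)
    print(f"{name:20s} average precision = {scores[name]:.3f}")
```

Stratified splitting keeps the rare class represented in both the training and test sets, which matters when positives make up only a few percent of the data.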