Code snippets

Data pre-processing: missing data, outliers, scaling, transformation

  • Reading and pre-processing data (using pandas; code example)
  • Data scaling (a scikit-learn tutorial; uses the UCI Wine dataset, available through sklearn.datasets)
  • Comparing the effect of different scalers on data with outliers (a scikit-learn tutorial; uses the California housing dataset, whose attributes have very different scales and some of which contain large outliers). Scaling approaches covered: standard scaling, min-max scaling, max-abs scaling, robust scaling using quantiles, power transformation (Yeo-Johnson), power transformation (Box-Cox), quantile transformation (Gaussian pdf), quantile transformation (uniform pdf), and sample-wise L2 normalization.
  • Data transformation: FFT (code example; see also an online article)
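
To illustrate why the choice of scaler matters when outliers are present, here is a minimal sketch. It does not reproduce the scikit-learn tutorial: the data is synthetic (an assumption made so the example runs without downloading the California housing dataset), and only three of the listed scalers are compared.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Synthetic single-feature data (an assumption, not the California housing
# data): 99 well-behaved values plus one extreme outlier in the last row.
rng = np.random.default_rng(0)
inlier_values = rng.normal(loc=50.0, scale=5.0, size=99)
X = np.append(inlier_values, 1000.0).reshape(-1, 1)

scalers = {
    "standard": StandardScaler(),  # subtract the mean, divide by the std
    "min-max": MinMaxScaler(),     # map the observed range onto [0, 1]
    "robust": RobustScaler(),      # subtract the median, divide by the IQR
}

# Fit each scaler on the same data and keep the transformed arrays.
scaled = {name: scaler.fit_transform(X) for name, scaler in scalers.items()}

# Spread of the 99 inliers after scaling: a small spread means the outlier
# has squeezed the ordinary values into a narrow band.
spread = {}
for name, Z in scaled.items():
    inliers = Z[:-1, 0]
    spread[name] = inliers.max() - inliers.min()
    print(f"{name:>8}: inlier spread = {spread[name]:.3f}, "
          f"outlier maps to {Z[-1, 0]:.3f}")
```

In this setup the robust scaler should keep the inliers the most spread out, because the median and interquartile range are barely affected by a single extreme value, whereas the mean, standard deviation, minimum, and maximum all are.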

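For the FFT item above, a minimal sketch of a frequency-domain transformation with NumPy follows. The signal, its sampling rate, and its two frequency components are all illustrative assumptions, not taken from the linked code or article.

```python
import numpy as np

# Assumed setup: 1 second of signal sampled at 200 Hz.
fs = 200
t = np.arange(fs) / fs

# Test signal: a 5 Hz sine (amplitude 1.0) plus a 20 Hz sine (amplitude 0.5).
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

# Real-input FFT: returns only the non-negative frequency bins.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)

# Rescale bin magnitudes to sine amplitudes (valid for bins other than
# the DC and Nyquist bins).
amplitude = 2.0 * np.abs(spectrum) / signal.size

# The two strongest bins should sit at the two component frequencies.
top2 = np.sort(freqs[np.argsort(amplitude)[-2:]])
print("dominant frequencies (Hz):", top2)
```

Because both component frequencies fall exactly on FFT bins here (the bin spacing is 1 Hz), there is no spectral leakage; with real data a window function is often applied before the transform.
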
Association rule analysis

Clustering

Classification

  • Comparison of KNN, Logistic Regression, and SVM on the credit card fraud dataset
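
A comparison of this kind could be set up as in the sketch below. The credit card fraud dataset is not bundled with scikit-learn, so this sketch substitutes a synthetic imbalanced dataset from make_classification (an assumption); the hyperparameters and the choice of average precision as the metric are also illustrative, not those of the linked comparison.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a fraud dataset: roughly 5% positive class.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0,
)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf"),
}

scores = {}
for name, model in models.items():
    # All three models are scale-sensitive, so standardize inside a pipeline
    # (the scaler is fit on the training split only).
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(X_train, y_train)

    # Use a continuous score for ranking: the decision function where the
    # model has one, otherwise the predicted positive-class probability.
    if hasattr(pipe, "decision_function"):
        y_score = pipe.decision_function(X_test)
    else:
        y_score = pipe.predict_proba(X_test)[:, 1]

    # Average precision suits imbalanced data better than plain accuracy.
    scores[name] = average_precision_score(y_test, y_score)
    print(f"{name:>20}: average precision = {scores[name]:.3f}")
```

Accuracy is avoided here because, with about 5% positives, a model that always predicts the majority class would already reach roughly 95% accuracy.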

Evaluations