Data Science Toolbox

ELI5

tags: #explainability #interpretability #python

ELI5 is a Python library for visualizing and debugging machine learning models through a unified API. It has built-in support for several ML frameworks and provides a way to explain black-box models.

SHAP

tags: #explainability #interpretability #python

SHAP (SHapley Additive exPlanations) is a game-theoretic approach to explaining the output of any machine learning model. It includes great interactive dashboards. I have only used it on random forests so far.
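
To make the game-theoretic idea concrete, here is a minimal sketch that computes exact Shapley values for a tiny hand-written model. It is not the shap library's API: `shapley_values`, `toy_model`, and the all-zeros baseline are hypothetical names and choices made for illustration, and the brute-force enumeration only works for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one input x, relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this is only for tiny illustrative cases.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Coalition features come from x, all others from the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return model(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(v):
    # Hypothetical model: linear terms plus an interaction of features 1 and 2.
    return 2.0 * v[0] + 1.0 * v[1] + 3.0 * v[1] * v[2]


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the model output at `x` and at the baseline, which is the "additive" part of the name. The interaction term is split evenly between the two features involved.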

Lime

tags: #explainability #interpretability #python

lime (Local Interpretable Model-agnostic Explanations) explains what classifiers are doing by fitting a simple, interpretable model around each individual prediction.

Interpretable Machine Learning - A Guide for Making Black Box Models Explainable.

tags: #explainability #interpretability #book

Online version

UMAP

tags: #dimensionreduction

Code, Paper

Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation, similarly to t-SNE, but also for general non-linear dimension reduction.

t-SNE

Article: Exploring t-SNE

Explanation of Random Forest Feature Importance

tags: #randomforests #featureimportance

Article

dtreeviz

tags: #interpretability #randomforests

Code, Article

A Python library for decision tree visualization and model interpretation.

HDBSCAN

tags: #visualisation #unsupervised

Documentation

The hdbscan library is a suite of tools that use unsupervised learning to find clusters, or dense regions, of a dataset.

Kalman Filters

Huber Loss

If MSE is too sensitive to outliers in your data and MAE is not sensitive enough, try Huber loss.
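
As a rough illustration of that trade-off, the sketch below implements the usual Huber definition (quadratic for residuals up to a threshold `delta`, linear beyond it) next to MSE and MAE. The function names, the default `delta=1.0`, and the sample data are assumptions chosen for the example.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: 0.5*r^2 if |r| <= delta, else delta*(|r| - 0.5*delta)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        if r <= delta:
            total += 0.5 * r * r               # quadratic zone, behaves like MSE
        else:
            total += delta * (r - 0.5 * delta)  # linear zone, behaves like MAE
    return total / len(y_true)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up data: three good predictions and one large outlier residual.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 3.0, 14.0]

print("MSE:  ", mse(y_true, y_pred))
print("MAE:  ", mae(y_true, y_pred))
print("Huber:", huber_loss(y_true, y_pred, delta=1.0))
```

On this data the single outlier dominates the MSE, while the Huber loss grows only linearly with it. Smaller residuals are still penalised quadratically, which keeps the loss smooth near zero.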

Active Learning

tags: #data #labels

If you have lots of unlabelled data but labelling is expensive, active learning can help find the best data to label.
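
One common strategy for picking what to label is uncertainty sampling: ask for labels where the current model is least sure. The sketch below assumes you already have predicted class probabilities for an unlabelled pool. `select_for_labelling` and the probabilities are hypothetical, and entropy is only one of several possible uncertainty measures.

```python
from math import log


def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher = less certain)."""
    return -sum(p * log(p) for p in probs if p > 0)


def select_for_labelling(pool_probs, k):
    """Return indices of the k pool items with the most uncertain predictions."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Made-up model outputs for four unlabelled items (two classes).
pool_probs = [
    [0.98, 0.02],  # confident prediction, little to gain from a label
    [0.55, 0.45],
    [0.70, 0.30],
    [0.50, 0.50],  # maximally uncertain
]

chosen = select_for_labelling(pool_probs, k=2)
print(chosen)
```

In a full loop you would label the chosen items, retrain the model, recompute the pool probabilities, and repeat until the labelling budget runs out.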

Pandas Profiling

tags: #exploration #python #visualization

pandas’ DataFrame.describe() on steroids.

Forecasting: Principles and Practice

tags: #book #timeseries #rlang #forecasting

Book

Mostly covers methods like ARIMA, and includes a chapter on #hierarchical time series.

Time Series Classification Repository

tags: #timeseries #classification http://timeseriesclassification.com/

tsfresh

tags: #python #timeseries #features

tsfresh calculates a large number of time series characteristics (features).
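
To give a feel for what such characteristics look like, here is a hand-rolled sketch that turns one series into a small feature dictionary. It does not use tsfresh. The function name, feature names, and formulas are simplified assumptions and may differ from the library's own definitions.

```python
from statistics import mean, pstdev


def basic_features(series):
    """Compute a few simple summary features for one time series."""
    mu = mean(series)
    sq_dev = sum((v - mu) ** 2 for v in series)
    # Lag-1 autocorrelation: how strongly each value resembles its predecessor.
    if sq_dev == 0:
        lag1 = 0.0
    else:
        lag1 = sum((a - mu) * (b - mu) for a, b in zip(series, series[1:])) / sq_dev
    return {
        "mean": mu,
        "std": pstdev(series),
        "minimum": min(series),
        "maximum": max(series),
        "abs_sum_of_changes": sum(abs(b - a) for a, b in zip(series, series[1:])),
        "lag1_autocorrelation": lag1,
    }


# Made-up zig-zag series.
series = [1.0, 3.0, 2.0, 4.0]
features = basic_features(series)
print(features)
```

Each series becomes one row of such features, which can then feed an ordinary classifier or regressor.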

Featuretools

tags: #python #features

Featuretools automatically creates features from temporal and relational datasets.

Reptile

tags: #fewshot #metalearning

Article

Yellowbrick

tags: #visualization

Website

Yellowbrick extends the Scikit-Learn API to make model selection and hyperparameter tuning easier. Under the hood, it uses Matplotlib.

SMOGN

tags: #imbalanced #machinelearning

SMOGN: Synthetic Minority Over-sampling for regression with Gaussian Noise.