Data Science Toolbox

ELI5

tags: #explainibility #interpretability #python ELI5 is a Python library that helps visualize and debug various machine learning models using a unified API. It has built-in support for several ML frameworks and provides a way to explain black-box models.

SHAP

tags: #explainibility #interpretability #python SHAP (SHapley Additive exPlanations) is a game-theoretic approach to explaining the output of any machine learning model. It includes great interactive dashboards. I have only used it on random forests so far.
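To make the game-theoretic idea concrete, here is a minimal sketch that computes exact Shapley values by brute force for a tiny model. It does not use the SHAP library or its API: `shapley_values`, `toy_model`, and the all-zero baseline are hypothetical names and choices made for illustration, and enumerating every coalition only works for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input by enumerating all feature coalitions.

    Features outside a coalition are replaced by their baseline value
    (an assumption of this sketch, not the only possible choice).
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition.
                values[i] += weight * (predict(with_i) - predict(without_i))
    return values


def toy_model(features):
    # Hypothetical model: linear in two features plus an interaction term.
    return 2.0 * features[0] + 3.0 * features[1] + features[0] * features[1]


x = [1.0, 2.0]
baseline = [0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)  # per-feature contributions
```

The contributions add up to the difference between the prediction for `x` and the prediction for the baseline, which is the "additive" part of the name.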

Lime

tags: #explainibility #interpretability #python lime (Local Interpretable Model-agnostic Explanations) explains what classifiers are doing.

Interpretable Machine Learning - A Guide for Making Black Box Models Explainable.

tags: #explainibility #interpretability #book online version

UMAP

tags: #dimensionreduction Code, Paper Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction.

t-SNE

Explanation of Random Forest Feature Importance

tags: #randomforrests #featureimportance Article

dtreeviz

tags: #interpretability #randomforrests Code Article A Python library for decision tree visualization and model interpretation.

HDBSCAN

tags: #visualisation #unsupervised Documentation The hdbscan library is a suite of tools to use unsupervised learning to find clusters, or dense regions, of a dataset.

Kalman Filters

Huber Loss

If MSE is too sensitive to outliers in your data and MAE is not sensitive enough, try Huber loss.
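As a minimal sketch of why it sits between the two: Huber loss is quadratic (like MSE) for residuals up to a threshold and linear (like MAE) beyond it. The function name `huber_loss` and the default `delta=1.0` are assumptions made for this example.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for small residuals, linear for large ones."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        if r <= delta:
            total += 0.5 * r * r  # MSE-like region
        else:
            total += delta * (r - 0.5 * delta)  # MAE-like region
    return total / len(y_true)


# One small residual (0.5) and one outlier-sized residual (3.0).
loss = huber_loss([0.0, 0.0], [0.5, 3.0], delta=1.0)
print(loss)  # → 1.3125
```

A smaller `delta` makes the loss behave more like MAE, a larger one more like MSE.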

Active Learning

tags: #data #labels If you have lots of unlabelled data but labelling is expensive, active learning can help find the best data to label.

modAL, a Python framework for active learning
Active Learning
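One common way to pick "the best data to label" is uncertainty sampling: query the samples the current model is least sure about. The sketch below uses the least-confidence criterion on made-up class probabilities. It is not modAL's API; `least_confident` and `pool_probabilities` are hypothetical names for illustration.

```python
def least_confident(probabilities, n_queries=1):
    """Return indices of the pool samples whose top class probability is lowest."""
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(confidence)), key=lambda i: confidence[i])
    return ranked[:n_queries]


# Hypothetical predicted class probabilities for four unlabelled samples.
pool_probabilities = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.51, 0.49],
]
to_label = least_confident(pool_probabilities, n_queries=2)
print(to_label)  # → [3, 1]
```

In a real loop, the selected samples would be labelled, added to the training set, and the model retrained before the next query.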

Pandas Profiling

tags: #exploration #python #visualization pandas’ DataFrame.describe() on steroids.

SweetViz

tags: #exploration #python #visualization In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code. An alternative to Pandas Profiling.

Great Expectations

tags: #data #etl #pipeline Helps eliminate data pipeline debt through data testing, documentation, and profiling. In short: assertions for data.

PopMon

tags: #data #pipeline #drift Monitors the stability of a pandas or Spark DataFrame.

Kats “One stop shop for time series analysis in Python”

tags: #timeseries #forecasting #detection Includes 10+ forecasting models, backtesting, hyperparameter tuning, pattern detection, and time series feature extraction.

Darts

Forecasting: Principles and Practice

tags: #book #timeseries #rlang #forecasting Book Mostly covers methods like ARIMA, and includes a chapter on #hierarchical time series.

Matrix Profile

tags: #timeseries #anomalydetection #motif Website Presentation Part 1 and Part 2 Python package #python “The matrix profile is a data structure and associated algorithms that helps solve the dual problem of anomaly detection and motif discovery. It is robust, scalable and largely parameter-free.”
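To illustrate the data structure itself, here is a naive sketch: for every window of length `m`, store the z-normalized Euclidean distance to its nearest non-overlapping neighbour. Low values mark motifs, high values mark anomalies. This quadratic-time version is for intuition only and is not the algorithm or API of the linked Python package; the function names, the `m // 2` exclusion zone, and the toy series are assumptions.

```python
from math import sqrt


def znormalize(window):
    """Shift and scale a window to zero mean and unit standard deviation."""
    mean = sum(window) / len(window)
    std = sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(v - mean) / std for v in window]


def matrix_profile(series, m):
    """Naive matrix profile: distance from each length-m window to its
    nearest neighbour outside a small exclusion zone."""
    windows = [znormalize(series[i:i + m]) for i in range(len(series) - m + 1)]
    exclusion = m // 2  # assumed choice: ignore trivial matches next to i
    profile = []
    for i, a in enumerate(windows):
        best = float("inf")
        for j, b in enumerate(windows):
            if abs(i - j) <= exclusion:
                continue
            dist = sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
            best = min(best, dist)
        profile.append(best)
    return profile


# Toy series: a repeating pattern with one injected spike at index 13.
series = [0.0, 1.0, 2.0, 1.0] * 6
series[13] = 5.0
profile = matrix_profile(series, m=4)

# The window with the largest profile value is the top anomaly candidate.
discord = max(range(len(profile)), key=profile.__getitem__)
print(discord, profile[discord])
```

Windows that do not touch the spike have an exact repeat elsewhere, so their profile value is zero; only windows covering index 13 stand out.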

Time Series Classification Repository

tags: #timeseries #classification http://timeseriesclassification.com/

tsfresh

tags: #python #timeseries #features tsfresh calculates a large number of time series characteristics (features).

tsfel

tags: #python #timeseries #features tsfel (Time Series Feature Extraction Library) is another library for extracting features from time series.

Featuretools

tags: #python #features Featuretools automatically creates features from temporal and relational datasets.

Reptile

Article tags: #fewshot #metalearning

Yellowbrick

tags: #vizualization Website Yellowbrick extends the Scikit-Learn API to make model selection and hyperparameter tuning easier. Under the hood, it’s using Matplotlib.

SMOGN

tags: #imbalanced #machinelearning SMOGN: Synthetic Minority Over-sampling for regression with Gaussian Noise

HanTa Hanover Tagger

tags: #nlp #tagger A simple approach to lemmatization and POS tagging based on heuristics and hidden Markov models of German morphology. Github

Mathworks Predictive Maintenance Toolbox

tags: #predictivemaintenance Predictive Maintenance Toolbox™ lets you manage sensor data, design condition indicators, and estimate the remaining useful life (RUL) of a machine. Website, also see RUL estimation

Text Visualization Browser

tags: #vizualization #nlp Different kinds of plots for text data. Website