Data Science Toolbox
ELI5
tags: #explainibility #interpretability #python ELI5 is a Python library that lets you visualize and debug various machine learning models through a unified API. It has built-in support for several ML frameworks and provides a way to explain black-box models.
SHAP
tags: #explainibility #interpretability #python SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It includes great (interactive) dashboards. I have only used it on random forests so far.
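SHAP builds on Shapley values from cooperative game theory: a feature's attribution is its average marginal contribution to the prediction over all orders in which features could be added. The sketch below computes exact Shapley values by brute force for a hypothetical three-feature model. It does not use the shap library, and filling "absent" features from a fixed baseline vector is an assumption of this sketch. Enumeration is exponential in the number of features, which is why SHAP relies on approximations and model-specific explainers instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, by enumerating all coalitions.

    Assumption: a feature outside the coalition takes its baseline value.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition come from x, all others from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model with one interaction term.
def model(z):
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi)
```

The attributions add up to model(x) - model(baseline), and the interaction term z[0] * z[2] is split evenly between features 0 and 2.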
Lime
tags: #explainibility #interpretability #python lime (Local Interpretable Model-agnostic Explanations) explains what classifiers are doing.
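The core LIME idea is to perturb the instance of interest, query the black-box model on the perturbed samples, weight them by proximity, and fit a simple interpretable model locally. The sketch below does this for a single numeric feature with a weighted linear fit in plain Python. The Gaussian perturbations, the exponential kernel and `kernel_width` are assumptions of this sketch, not the lime package's defaults or API.

```python
import math
import random


def local_linear_explanation(predict, x0, n_samples=500, scale=1.0,
                             kernel_width=0.75, seed=0):
    """LIME-style local surrogate for a 1-D input (hypothetical helper).

    Returns (slope, intercept) of a proximity-weighted linear fit around x0.
    """
    rng = random.Random(seed)
    # 1. Perturb the instance of interest.
    xs = [x0 + rng.gauss(0.0, scale) for _ in range(n_samples)]
    # 2. Query the black-box model on the perturbed samples.
    ys = [predict(x) for x in xs]
    # 3. Weight samples by proximity to x0 (exponential kernel).
    ws = [math.exp(-((x - x0) ** 2) / kernel_width ** 2) for x in xs]
    # 4. Fit a weighted least-squares line (closed form for one feature).
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    return slope, my - slope * mx


# Black box: a curve whose true local slope at x0 = 2 is 4.
slope, intercept = local_linear_explanation(lambda x: x ** 2, x0=2.0)
print(round(slope, 2))
```

The fitted slope is the "explanation": it says how the prediction responds to the feature near this one instance, even though the model is non-linear globally.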
Interpretable Machine Learning - A Guide for Making Black Box Models Explainable.
tags: #explainibility #interpretability #book online version
UMAP
tags: #dimensionreduction Code, Paper Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction.
t-SNE
Explanation of Random Forest Feature Importance
tags: #randomforrests #featureimportance Article
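A common cross-check on a random forest's built-in (impurity-based) importances is permutation importance: shuffle one feature column, re-score the model, and record how much the error grows. The function below is a minimal model-agnostic sketch; `predict` and `metric` are placeholders for any fitted model and any error metric where lower is better, and the toy data is made up. scikit-learn ships a full implementation as `sklearn.inspection.permutation_importance`.

```python
import random


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, metric=mse, n_repeats=5, seed=0):
    """Mean increase in error when each feature column is shuffled.

    predict: callable taking one row (list of floats); metric: lower is better.
    """
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(metric(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Made-up data: y depends only on feature 0; feature 1 is noise.
X = [[float(i), float((i * 7) % 5)] for i in range(50)]
y = [5.0 * row[0] for row in X]


def fitted(row):
    # Stands in for a trained forest's predict function.
    return 5.0 * row[0]


print(permutation_importance(fitted, X, y))
```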
dtreeviz
tags: #interpretability #randomforrests Code Article A Python library for decision tree visualization and model interpretation.
HDBSCAN
tags: #visualisation #unsupervised Documentation The hdbscan library is a suite of tools to use unsupervised learning to find clusters, or dense regions, of a dataset.
Kalman Filters
Huber Loss
If MSE is too sensitive to outliers in your data and MAE is not sensitive enough, try Huber loss: it is quadratic for small errors and linear for large ones.
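Huber loss with threshold δ is 0.5 * r^2 for residuals |r| <= δ and δ * (|r| - 0.5 * δ) otherwise, so large residuals grow linearly (like MAE) while small ones are squared (like MSE), and the two pieces meet continuously at δ. A minimal plain-Python version, averaged over samples (the function name and default δ = 1 are choices of this sketch):

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: squared for |residual| <= delta, linear beyond it."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        if r <= delta:
            total += 0.5 * r ** 2               # MSE-like region
        else:
            total += delta * (r - 0.5 * delta)  # MAE-like region, continuous at delta
    return total / len(y_true)


# A single outlier residual of 10 with delta = 1:
print(huber_loss([0.0], [10.0]))  # 9.5
```

For that same residual of 10, MSE would give 100 and MAE 10, so the outlier no longer dominates the loss the way it does under MSE.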
Active Learning
tags: #data #labels If you have lots of unlabelled data but labelling is expensive, active learning can help find the best data to label.
- modAL: a Python framework for active learning
- Active Learning
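A common query strategy is uncertainty sampling: ask for labels on the unlabelled samples the current model is least confident about. The function below is a hypothetical plain-Python sketch of least-confidence sampling, not modAL's API; it expects one list of predicted class probabilities per unlabelled sample.

```python
def least_confident(probabilities, k):
    """Indices of the k samples whose top predicted class probability is lowest.

    probabilities: one list of class probabilities per unlabelled sample.
    """
    uncertainty = [1.0 - max(p) for p in probabilities]
    ranked = sorted(range(len(probabilities)), key=lambda i: uncertainty[i], reverse=True)
    return ranked[:k]


# Made-up predictions for four unlabelled samples (two classes).
probs = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3], [0.5, 0.5]]
print(least_confident(probs, k=2))  # [3, 1]: send these to the annotator
```

In a full active-learning loop you would label the returned samples, retrain, and query again.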
Pandas Profiling
tags: #exploration #python #visualization pandas’ DataFrame.describe() on steroids.
SweetViz
tags: #exploration #python #visualization In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code; an alternative to Pandas Profiling.
Great Expectations
tags: #data #etl #pipeline Helps eliminate data pipeline debt through data testing, documentation, and profiling. In short: assertions for data.
PopMon
tags: #data #pipeline #drift Monitors the stability of a pandas or Spark DataFrame.
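One common way to quantify drift between a reference and a current sample of a feature is the Population Stability Index, PSI = sum over bins of (c_i - r_i) * ln(c_i / r_i), where r_i and c_i are the reference and current bin proportions. This is a swapped-in illustration, not necessarily a statistic PopMon reports; the equal-width binning and the `eps` floor for empty bins are assumptions of this sketch.

```python
import math


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between two samples, binned on the reference sample's range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant reference

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range go to the edge bins.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays finite.
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


reference = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]
print(population_stability_index(reference, reference))  # 0.0: no drift
print(population_stability_index(reference, shifted))
```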
Kats “One stop shop for time series analysis in Python”
tags: #timeseries #forecasting #detection Includes 10+ forecasting models, backtesting, hyperparameter tuning, pattern detection and time series feature extraction.
Darts
Forecasting: Principles and Practice
tags: #book #timeseries #rlang #forecasting Book Covers mostly classical methods like ARIMA, including a chapter on #hierarchical time series.
Matrix Profile
tags: #timeseries #anomalydetection #motif Website Presentation Part 1 and Part 2 Python package #python “The matrix profile is a data structure and associated algorithms that helps solve the dual problem of anomaly detection and motif discovery. It is robust, scalable and largely parameter-free.”
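The matrix profile stores, for every length-m subsequence of a series, the z-normalized Euclidean distance to its nearest non-trivial match elsewhere in the series. Low values mark repeated patterns (motifs) and high values mark subsequences with no close match (discords, i.e. anomalies). The brute-force sketch below is O(n^2 * m) and only meant to show the definition; the exclusion zone of m/2 is an assumption, and practical libraries use much faster algorithms.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence (zero mean, unit variance)."""
    mu = sum(seq) / len(seq)
    sd = math.sqrt(sum((v - mu) ** 2 for v in seq) / len(seq))
    return [0.0] * len(seq) if sd == 0 else [(v - mu) / sd for v in seq]


def matrix_profile(ts, m):
    """Brute-force matrix profile: nearest non-trivial neighbour distance per subsequence."""
    n = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n)]
    exclusion = max(1, m // 2)  # assumption: ignore matches overlapping by more than m/2
    profile, index = [], []
    for i in range(n):
        best, best_j = math.inf, -1
        for j in range(n):
            if abs(i - j) < exclusion:
                continue  # trivial match: (almost) the same subsequence
            d = math.dist(subs[i], subs[j])
            if d < best:
                best, best_j = d, j
        profile.append(best)
        index.append(best_j)
    return profile, index


# Sine wave with period 10 and one injected spike at position 55.
ts = [math.sin(2 * math.pi * i / 10) for i in range(100)]
ts[55] += 3.0
profile, index = matrix_profile(ts, m=10)
discord = max(range(len(profile)), key=profile.__getitem__)
motif = min(range(len(profile)), key=profile.__getitem__)
print(discord, motif)
```

The discord lands on a subsequence that contains the spike, while every clean window has a near-zero distance to the same window one period away.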
Time Series Classification Repository
tags: #timeseries #classification http://timeseriesclassification.com/
tsfresh
tags: #python #timeseries #features tsfresh calculates a large number of time series characteristics (features).
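To give a feel for what "time series characteristics" means, the hypothetical function below hand-computes a few simple ones per series. tsfresh computes a much larger set automatically, and the feature names here are made up rather than tsfresh's. The resulting dict per series becomes one row of a feature table for an ordinary classifier or regressor.

```python
from statistics import fmean, pstdev


def extract_features(series):
    """A few hand-picked time series features (names are illustrative, not tsfresh's)."""
    mean = fmean(series)
    centred = [v - mean for v in series]
    # Lag-1 autocorrelation: how strongly each value resembles the next one.
    num = sum(a * b for a, b in zip(centred, centred[1:]))
    den = sum(c * c for c in centred)
    return {
        "mean": mean,
        "std": pstdev(series),
        "min": min(series),
        "max": max(series),
        "autocorr_lag1": num / den if den else 0.0,
        # How often the series crosses its own mean.
        "mean_crossings": sum(1 for a, b in zip(centred, centred[1:]) if a * b < 0),
    }


# One dict per series becomes one row of a feature table.
table = [extract_features(s) for s in ([1, 2, 3, 4], [1, -1, 1, -1])]
print(table)
```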
tsfel
tags: #python #timeseries #features tsfel (another) Time Series Feature Extraction Library
Featuretools
tags: #python #features Featuretools automatically creates features from temporal and relational datasets.
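For relational data, a typical generated feature aggregates a child table per parent row, for example the count, sum, mean and max of a customer's transaction amounts. The sketch below hand-rolls that one step in plain Python; the `customer_id` and `amount` columns are hypothetical, and Featuretools' Deep Feature Synthesis stacks many such aggregation and transform primitives automatically.

```python
from collections import defaultdict
from statistics import fmean


def aggregate_child_features(parent_ids, children, key, value):
    """Per-parent count/sum/mean/max of one numeric column of a child table."""
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row[value])
    features = {}
    for pid in parent_ids:
        vals = grouped.get(pid, [])
        features[pid] = {
            f"count_{value}": len(vals),
            f"sum_{value}": sum(vals),
            f"mean_{value}": fmean(vals) if vals else None,  # undefined without children
            f"max_{value}": max(vals) if vals else None,
        }
    return features


# Hypothetical tables: customers (parents) and their transactions (children).
customers = [1, 2, 3]
transactions = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]
feats = aggregate_child_features(customers, transactions, "customer_id", "amount")
print(feats[1])
```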
Reptile
tags: #fewshot #metalearning Article
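Reptile is a first-order meta-learning algorithm: repeatedly sample a task, run k gradient steps on it starting from the shared initialization θ to get adapted weights φ, then move θ a small step toward φ (θ ← θ + ε(φ - θ)). The sketch below applies that loop to a hypothetical task family of 1-D linear regressions y = a*x + 1 with random slope a; the task family, the linear model, full-batch gradient steps and all hyperparameters are assumptions.

```python
import random


def sgd_steps(params, data, lr, k):
    """k full-batch gradient steps on MSE for the model y = w*x + b."""
    w, b = params
    for _ in range(k):
        gw = gb = 0.0
        for x, y in data:
            err = (w * x + b) - y
            gw += 2 * err * x / len(data)
            gb += 2 * err / len(data)
        w, b = w - lr * gw, b - lr * gb
    return w, b


def sample_task(rng, n_points=10):
    """Hypothetical task family: lines y = a*x + 1 with a random slope a."""
    a = rng.uniform(0.5, 1.5)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
    return [(x, a * x + 1.0) for x in xs]


def reptile(meta_iters=2000, inner_lr=0.1, inner_steps=5, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    theta = (0.0, 0.0)
    for _ in range(meta_iters):
        task = sample_task(rng)
        phi = sgd_steps(theta, task, inner_lr, inner_steps)  # adapt to one task
        # Reptile update: move the initialization a fraction epsilon toward phi.
        theta = tuple(t + epsilon * (p - t) for t, p in zip(theta, phi))
    return theta


theta = reptile()
print(theta)
```

The learned initialization ends up near (w, b) = (1, 1), the centre of the task family, from which a few gradient steps adapt to any new slope.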
Yellowbrick
tags: #vizualization Website Yellowbrick extends the Scikit-Learn API to make model selection and hyperparameter tuning easier. Under the hood, it uses Matplotlib.
SMOGN
tags: #imbalanced #machinelearning SMOGN: Synthetic Minority Over-sampling for regression with Gaussian Noise
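Roughly, SMOGN generates synthetic samples for the rare target range either by SMOTE-style interpolation between a rare seed and one of its rare neighbours (when the neighbour is close) or by adding Gaussian noise around the seed (when it is not). The sketch below is simplified and hypothetical: the "safe" distance of half the median neighbour distance, interpolating the target with the same factor as the features, and the fixed `noise_scale` are assumptions, not the reference implementation. It also assumes the rare samples have already been identified.

```python
import math
import random
from statistics import median


def smogn_like(rare, n_new, k=3, noise_scale=0.05, seed=0):
    """Simplified SMOGN-style oversampling (hypothetical helper).

    rare: list of (features, target) pairs from the rare target range.
    Interpolates when the chosen neighbour is close, otherwise adds Gaussian noise.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(rare))
        x, y = rare[i]
        others = [p for j, p in enumerate(rare) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p[0]))[:k]
        # Assumption: "safe" distance = half the median neighbour distance.
        safe = median([math.dist(x, p[0]) for p in neighbours]) / 2
        nx, ny = rng.choice(neighbours)
        if math.dist(x, nx) < safe:
            # SMOTE-style interpolation of features and target.
            t = rng.random()
            new_x = [a + t * (b - a) for a, b in zip(x, nx)]
            new_y = y + t * (ny - y)
        else:
            # Neighbour too far: perturb the seed with Gaussian noise instead.
            new_x = [a + rng.gauss(0.0, noise_scale) for a in x]
            new_y = y + rng.gauss(0.0, noise_scale)
        synthetic.append((new_x, new_y))
    return synthetic


# Made-up rare samples: (features, target).
rare = [((0.0, 0.0), 10.0), ((0.1, 0.0), 10.5), ((3.0, 3.0), 20.0),
        ((3.0, 0.0), 15.0), ((0.0, 3.0), 16.0)]
new_samples = smogn_like(rare, n_new=20)
print(len(new_samples))
```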
HanTa Hanover Tagger
tags: #nlp #tagger A simple approach to lemmatization and POS tagging based on heuristics and hidden Markov models of German morphology. GitHub
Mathworks Predictive Maintenance Toolbox
tags: #predictivemaintenance Predictive Maintenance Toolbox™ lets you manage sensor data, design condition indicators, and estimate the remaining useful life (RUL) of a machine. Website, also see RUL estimation
Text Visualization Browser
tags: #vizualization #nlp A collection of different plots for text. Website