ML In Production
Abstract
“Give me an afternoon and I will train a neural network. Give me six months and I will bring it to production.”
Bio
Christian Geier has more than 10 years of experience solving real-world problems in industry and academia with machine learning and computer science. He is currently a senior data scientist consultant with Ginkgo Analytics.
Title: Machine Learning in Production
Machine Learning (ML) is currently a very trendy topic and is expected to drive the coming Artificial Intelligence revolution. Companies therefore fund and push ML projects in all areas, from accelerating research & development through predictive maintenance on the shop floor to tailoring marketing campaigns to customers. But many of those projects never leave the prototype phase, or go unused once live. The reasons for failure range from trying to solve the wrong problem in the first place to issues with data quality, missing infrastructure, or an ever-changing environment. This talk will focus on how IT and business can together tackle the challenges of not only creating machine learning models but, more importantly, bringing them into production and keeping them there.
Agenda
- Prerequisites for successful Machine Learning projects
- Bringing ML models into production
- Keeping ML models in production
- #MLOps
Inspiration
- https://files.gotocon.com/uploads/slides/conference_5/150/original/Henrik_Brink_ml-talk-goto2017.pdf
- https://files.gotocon.com/uploads/slides/conference_15/955/original/GOTO%20Copenhagen%202019.pdf
- https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems
- https://mlinproduction.com/deploying-machine-learning-models/
- https://codingossip.github.io/2020/test-first-machine-learning/
- https://www.information-age.com/machine-learning-models-production-123485722/
- https://blog.cloudera.com/putting-machine-learning-models-into-production/
Tools
Ideas
ML modeling is only a very small part of bringing an AI solution to production?
- Slide 1: blocks that show the different process steps until the model is working
  - initial data collection, data understanding, business understanding, modelling, validation
- Slide 2: zoom out, many more blocks, everything that is needed for production
  - configuration, data monitoring, monitoring, process management, result validation, infrastructure management
What to optimize for?
Model interpretability
- interpretability vs performance?
- Help to fix problems in production
Model complexity
Infrastructure & Data Engineering
- What roles do you need?
Deployment & Data Engineering
- DevOps
- Containers
- CI
- Ready Made ML Platforms
- Batch or live prediction
- API
- Return time of prediction ->
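As a rough illustration of the batch vs. live distinction and of prediction return time, here is a minimal sketch. `predict_one` is a hypothetical stand-in for a real model, and its weights are made up for the example. The point is only that the same scoring function can sit behind a scheduled batch job or a per-request call, with latency measured in the live path.

```python
import time
from typing import Iterable, List, Tuple


def predict_one(features: List[float]) -> float:
    """Hypothetical stand-in model: a fixed linear score (weights are invented)."""
    weights = [0.5, -0.25, 1.0]
    return sum(w * x for w, x in zip(weights, features))


def predict_batch(rows: Iterable[List[float]]) -> List[float]:
    """Batch mode: score many rows in one scheduled run."""
    return [predict_one(row) for row in rows]


def timed_predict(features: List[float]) -> Tuple[float, float]:
    """Live mode: score one request and report its latency in milliseconds."""
    start = time.perf_counter()
    score = predict_one(features)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return score, latency_ms
```

In a real deployment the live path would sit behind an API endpoint, and the measured latency would be logged and compared against whatever return time the use case can tolerate.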
Model Versioning
Data Versioning
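One simple way to tie model and data versions together is to fingerprint the training data and store that fingerprint next to the model metadata. The sketch below assumes the data fits in memory as a list of dictionaries. The names `fingerprint` and `model_version_record` are hypothetical, and dedicated tools offer far more than this.

```python
import hashlib
import json
from typing import Any, Dict, List


def fingerprint(rows: List[Dict[str, Any]]) -> str:
    """Content hash of a dataset: identical rows in identical order give the same ID."""
    h = hashlib.sha256()
    for row in rows:
        # sort_keys makes the hash independent of key order inside each row
        h.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()[:12]


def model_version_record(model_name: str,
                         training_rows: List[Dict[str, Any]],
                         hyperparameters: Dict[str, Any]) -> Dict[str, Any]:
    """Metadata to store alongside a trained model artifact."""
    return {
        "model": model_name,
        "data_version": fingerprint(training_rows),
        "hyperparameters": hyperparameters,
    }
```

With such a record, a prediction made in production can be traced back to the exact data the model was trained on.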
Problems once deployed and running
- Accuracy dropping?
  - Slow: Drift in data, behavior change?
    - Data changes
      - Trend?
      - World changes
      - Behaviour change
  - Fast/Sudden change in data
    - Data collection
      - Hardware problems / failure
      - Systems Problem
      - software update somewhere
      - credentials update
  - Retraining your models?
    - On what timescale?
    - Labelling necessary?
A/B testing of models in live production
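A common building block for A/B testing models in production is a deterministic traffic split, so that a given user always sees the same model. The sketch below assumes users are identified by a string ID. The function name, the salt, and the labels "champion" and "candidate" are illustrative choices, not an established API.

```python
import hashlib


def assign_variant(user_id: str,
                   treatment_share: float = 0.1,
                   salt: str = "model-ab-test") -> str:
    """Map a user to 'candidate' or 'champion' in a stable, pseudo-random way."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    # Interpret the first 32 bits of the hash as a number in [0, 1)
    bucket = int(digest[:8], 16) / 0x100000000
    return "candidate" if bucket < treatment_share else "champion"
```

Changing the salt reshuffles the assignment for a new experiment, while `treatment_share` controls how much live traffic the new model receives.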
Monitoring
- Performance Metrics
- Data input (e.g. distributions, automatic outlier detection, etc)
- ETL Monitoring
- Cloud Metrics (how long does your model take to train, to predict, consumption tracking)
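To make the data input monitoring point concrete, here is a minimal sketch of one possible drift check, the Population Stability Index (PSI), for a single numeric feature. It compares a live sample against a reference sample, for example the training data. PSI is one option among many, and the bin count and the `eps` floor are assumptions chosen for illustration.

```python
import bisect
import math
from typing import List, Sequence


def psi(reference: Sequence[float],
        current: Sequence[float],
        n_bins: int = 10,
        eps: float = 1e-6) -> float:
    """Population Stability Index between two samples of one numeric feature."""
    ref_sorted = sorted(reference)
    # Inner bin edges sit at the quantiles of the reference sample
    edges = [ref_sorted[int(len(ref_sorted) * i / n_bins)] for i in range(1, n_bins)]

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # Floor at eps so empty bins do not cause log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    p_ref = proportions(reference)
    p_cur = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(p_ref, p_cur))
```

A value near zero means the two samples are distributed alike. Which threshold should raise an alert is a judgment call per feature. An often-quoted rule of thumb treats values above roughly 0.25 as a significant shift, but that is a convention, not a guarantee.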
Retraining if model performance is off?
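One way to turn this question into something actionable is a rolling accuracy check against a baseline. The sketch below assumes that true labels eventually arrive for live predictions. The class name, window size, and tolerance are hypothetical and would need tuning for a real system.

```python
from collections import deque
from typing import Any, Optional


class RetrainTrigger:
    """Flags retraining when rolling accuracy falls too far below a baseline."""

    def __init__(self, baseline_accuracy: float,
                 tolerance: float = 0.05, window: int = 500) -> None:
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        # Only the most recent `window` outcomes are kept
        self.outcomes = deque(maxlen=window)

    def record(self, prediction: Any, label: Any) -> None:
        """Store whether a live prediction matched its (later) true label."""
        self.outcomes.append(prediction == label)

    def rolling_accuracy(self) -> Optional[float]:
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_retrain(self) -> bool:
        # Wait for a full window so a few early errors do not trigger retraining
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline - self.tolerance
```

If labels are slow or expensive to obtain, a trigger based on input drift may be the only practical option, which ties back to the labelling question above.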