
Guide: How To Monitor Machine Learning Models

Your company has invested the time and money to build, train, test, and deploy your machine learning (ML) models, so you may think the hard work is over. But if you’re not routinely monitoring your models once they’re deployed in a production environment, they are prone to bias, drift, corrupt data, and decay, all of which can lead to poor predictions and serious business problems. Learn why ML model monitoring is vital to your operations, which metrics you need to track, and which tools can help you monitor your models effectively.

An overview of monitoring machine learning models

Machine learning is a type of artificial intelligence that detects patterns and makes predictions by processing large, complex volumes of data, including historical data and human inputs, using models. ML models are more vital than ever before—we’re now generating data that is too vast for humans to process and use efficiently. Companies across many industries are investing heavily in ML models to do things like analyze medical imaging, forecast stocks, personalize marketing campaigns, and make informed business decisions.

However, using ML models isn’t without risk. Models aren’t infallible and can generate incorrect predictions. The data sets used to train them may over- or under-represent certain populations, introducing bias into your results. And because models learn and adapt from new data over time, you run the risk of decay and drift. Monitoring ML models throughout their lifecycle is essential to mitigate these risks.

What is ML model monitoring?

Machine learning model monitoring is the process of evaluating model outcomes and performance after you deploy ML models in the real world. Production data rarely matches test conditions exactly, and both the data and the environment change over time, which can erode your model’s accuracy and effectiveness. Continuous ML monitoring is necessary to ensure your models perform as expected and to detect and resolve common post-deployment issues, like concept drift, that affect model outcomes.

Importance of ML model monitoring

Because ML models can degrade or break, you need to monitor and evaluate them continually as they operate. If you rely on a one-time check at deployment, you could miss critical issues that negatively impact operations. Assessing the model in a live environment allows your team to understand its performance in real time rather than relying on outdated findings. You’ll be able to identify and resolve problems quickly, reducing the risk of unexpected predictions.

Monitoring models also offers businesses advantages beyond analyzing model performance. ML monitoring can detect changes and alert you when specific complications occur, like a drop in accuracy, data drift, or a corrupt feature. It can even pinpoint the primary source of an issue, letting you focus your resources on fixing it. 

The ongoing monitoring of machine learning models in production also helps you gain insight into how users interact with your models. You’ll receive important user and model feedback, which is essential for continuously refining your ML model. 

Machine learning monitoring also offers greater visibility so that your team—from data scientists to model users, product managers, executives, and other stakeholders—better understands the model’s risks and performance. They can easily compare models, identify underperforming segments, and see if the models positively or negatively impact their business. Monitoring models is key for optimizing their business value.

Challenges of ML model monitoring

While monitoring your ML models is vital for their effectiveness, it can be difficult. First, you can’t rely on the same monitoring methods used for tracking the health of your existing software systems. ML models share some of the same challenges as traditional software, such as technical debt, but also have their own potential issues that require a separate field of monitoring. 

Second, ML model monitoring centers on model-specific and data quality metrics, and computing and tracking these metrics differs from conventional system monitoring. Some of the challenges you may face include:

  • Silent errors: ML models will keep working as long as they can process the incoming data, even if that data is biased, incorrect, or unreliable. When this happens, your model produces low-quality predictions without alerting you to any problem. Traditional software tends to fail loudly, so your data team must be aware of this failure mode and proactively verify data quality.
  • Entanglement: Even small changes to your input data or features can shift your model’s target function and predictions, so test every change carefully before it reaches production.
  • Delays in feedback: You can’t always measure a model’s performance in real time because of delays in data inputs and outputs. You may need to run two monitoring loops: one using estimates for real-time feedback and a delayed loop that runs once your hard data is in. 
  • Ownership in production: There may be confusion about which department is responsible for the model once it’s in production. Have a plan in place so your data scientists, engineers, or developers understand their responsibilities in the monitoring process. 
  • Stakeholder involvement: Stakeholders’ expertise and insights can improve your model’s effectiveness and reduce risk. However, their involvement complicates the monitoring process, so you’ll want to define clear guidelines for them to follow.

  • Quality is relative: Your model’s performance is subjective, and there isn’t a universal threshold your predictions must hit to count as accurate. An accuracy rate of 85 percent might be great for one model but indicate data quality problems in another. Your organization won’t be able to use a single set of performance metrics; you’ll have to adjust your approach for each model (see the configuration sketch below).
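Because acceptable performance differs from model to model, many teams keep a small per-model configuration of quality thresholds rather than a single global one. The sketch below is a minimal way to express that in Python; the model names, metrics, and threshold values are purely illustrative assumptions, not recommendations.

```python
# Hypothetical per-model quality thresholds; names, metrics, and values
# are illustrative and should come from your own business requirements.
QUALITY_THRESHOLDS = {
    "churn_classifier":   {"metric": "accuracy", "min_value": 0.85},
    "demand_forecaster":  {"metric": "mape",     "max_value": 0.12},
    "lead_scoring_model": {"metric": "auc",      "min_value": 0.90},
}

def passes_quality_bar(model_name: str, observed: float) -> bool:
    """Return True if the observed metric satisfies that model's own threshold."""
    rule = QUALITY_THRESHOLDS[model_name]
    if "min_value" in rule:
        return observed >= rule["min_value"]
    return observed <= rule["max_value"]
```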

What needs to be monitored in ML models?

ML model monitoring centers on two key areas: functional-level monitoring and operational-level monitoring. Let’s dive deeper into each to learn which elements are most important to track.

Functional-level monitoring

You’ll want to monitor three factors that impact the functionality of your ML models: your input data, your model, and the outputs or predictions your model makes. Monitoring this information ensures your model functions optimally and produces relevant, accurate results.

Input data

Model performance hinges on the input data it receives to run and make predictions. Monitoring input data is the foundational step for identifying and resolving functional-based issues before they impact your model’s performance. You’ll want to monitor: 

  • Data quality: Data pipelines must be free of problems like missing values, mismatched formats, lost data, alterations to the source database, and range violations to protect data integrity. Verify that incoming data types are valid and match what your model expects; problematic or unexpected data can cause the model to break (a minimal quality check is sketched after this list).
  • Data drift: The values or statistical properties of data can change, or drift, over time. This happens naturally as your business problems evolve and as context or people’s behavior shifts. Your team must monitor for drift between your training and production data and update models accordingly.
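To make the data quality checks concrete, here is a minimal sketch of input validation with pandas. The column names, expected types, allowed ranges, and null tolerance are assumptions for illustration; replace them with the contract your own model was trained against.

```python
import pandas as pd

# Expected schema and value ranges are assumptions for this example.
EXPECTED_DTYPES = {"age": "int64", "income": "float64", "region": "object"}
VALUE_RANGES = {"age": (18, 100), "income": (0, 1_000_000)}
MAX_NULL_SHARE = 0.01  # tolerate at most 1% missing values per column

def check_input_quality(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable data quality issues (empty means OK)."""
    issues = []
    for col, dtype in EXPECTED_DTYPES.items():
        if col not in df.columns:
            issues.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            issues.append(f"{col}: expected {dtype}, got {df[col].dtype}")
    for col, (low, high) in VALUE_RANGES.items():
        if col in df.columns and not df[col].dropna().between(low, high).all():
            issues.append(f"{col}: values outside [{low}, {high}]")
    for col in df.columns:
        if df[col].isna().mean() > MAX_NULL_SHARE:
            issues.append(f"{col}: too many missing values")
    return issues
```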

Model

Your ML model’s business value is based on its overall performance. If you’re not meeting a certain performance threshold, the model wastes your efforts and resources. Model monitoring is central to your machine learning system, helping you detect and fix issues so your model can meet your performance goals. 

Evaluate your ML model for:

  • Model drift: Changes in the real-world environment can cause model predictions to decay or drift over time, making them less effective. Your team can identify this issue by monitoring your model’s predictive performance over time or by running statistical tests. The Kolmogorov-Smirnov (KS) test compares the distributions of two data sets, in this case your training data and your live production data. If the divergence between the two exceeds the maximum you allow, your model has drifted (a minimal drift test is sketched after this list).
  • Versions: Track your model’s version history and predictions to distinguish between older and updated versions of your model. Make sure you’re using the correct version of your model in production to gain the best performance. 
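Here is a minimal sketch of that KS comparison using scipy; the synthetic feature values and the 0.05 significance threshold are assumptions chosen for illustration, not recommendations.

```python
import numpy as np
from scipy.stats import ks_2samp

# Significance level is a judgment call; 0.05 is a common starting point.
P_VALUE_THRESHOLD = 0.05

def feature_has_drifted(train_values: np.ndarray, prod_values: np.ndarray) -> bool:
    """Two-sample KS test: flag drift when the p-value falls below the threshold."""
    statistic, p_value = ks_2samp(train_values, prod_values)
    return p_value < P_VALUE_THRESHOLD

# Example with synthetic data standing in for a real feature:
rng = np.random.default_rng(42)
train = rng.normal(loc=0.0, scale=1.0, size=5_000)
prod = rng.normal(loc=0.4, scale=1.0, size=5_000)  # shifted mean simulates drift
print(feature_has_drifted(train, prod))  # True: the distributions diverge
```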

Output/predictions

Your ML model is built and put into production to solve a specific problem. Understanding your model’s outputs, including predictions, in your production environment is key to rating its success and ensuring you meet business KPIs. 

Consider monitoring: 

  • Ground truth: This is the reality you want to model or the target you’re using to validate your model. Measuring outputs against your ground truth labels lets you see if you’ve achieved your goal. For example, if your sales team uses an ML model to send personalized emails to leads, the prediction is whether they will click through to your website or not. The personalization and prediction are considered valid if the lead takes this action. In this scenario, comparing the outcome with your ground truth is easy. 
  • Prediction drift: In scenarios where you can’t generate ground truth labels, monitor your predictions closely for drift. Significant changes in the distribution of your predictions can indicate that something has gone wrong and needs further investigation (one way to quantify this is sketched after this list).
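A common way to watch for prediction drift without labels is to compare the distribution of recent prediction scores against a reference window, for example with the Population Stability Index (PSI). The sketch below is a minimal version; the synthetic score distributions and the conventional 0.2 'investigate' cutoff noted in the docstring are illustrative assumptions.

```python
import numpy as np

def population_stability_index(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """PSI between two score distributions; values above ~0.2 commonly warrant investigation."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Clip to avoid division by zero and log(0) in sparse bins.
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

# Example: compare last week's prediction scores to this week's.
rng = np.random.default_rng(7)
last_week = rng.beta(2, 5, size=10_000)  # stand-in for stored prediction scores
this_week = rng.beta(3, 4, size=10_000)  # shifted distribution simulates drift
print(round(population_stability_index(last_week, this_week), 3))
```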

Operational-level monitoring

This stage monitors the health of your machine learning system’s resources across three operational elements: system performance metrics, pipelines, and cost. Operations engineers or your IT department typically monitor and correct any issues at this level. 

ML system performance

It’s vital to stay informed about how your machine learning system’s infrastructure performs and whether it keeps pace with the rest of your software system. Monitoring your model’s system performance tells you whether it’s answering requests quickly enough, whether there are serving limitations, whether it’s using resources efficiently, and whether it can scale to meet requirements. For this, you’ll want to track:

  • Metrics: CPU/GPU use, memory use, total API calls, number of failed requests, and the response time of your prediction service or model server are top metrics for measuring your model’s speed, latency, and performance within your application stack (a minimal latency-logging sketch follows this list).
  • System reliability: The infrastructure needed to run your model, including which machines are running and the number of clusters running.
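As a lightweight starting point, you can wrap your serving call to record latency and failed requests using nothing but the standard library. The scikit-learn-style model.predict interface below is an assumption; in practice you would forward these numbers to whatever metrics or observability system you already run.

```python
import logging
import time

logger = logging.getLogger("ml_serving")

def monitored_predict(model, features):
    """Wrap a model call with latency and failure logging (stdlib only)."""
    start = time.perf_counter()
    try:
        return model.predict(features)  # assumes a scikit-learn-style interface
    except Exception:
        logger.exception("prediction request failed")
        raise
    finally:
        latency_ms = (time.perf_counter() - start) * 1000
        logger.info("prediction latency: %.1f ms", latency_ms)
```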

Pipelines

Unhealthy pipelines can introduce quality issues or leakage into your model and ultimately cause it to break. Monitoring the health of your model’s most critical pipelines and detecting unexpected changes must be top priorities.

  • Data pipelines: Data must be tracked and evaluated at every step, starting from your input data sources. Is the data properly structured, complete, and valid? You’ll also want to check input data against quality-based statistical metrics such as standard deviation, mean, correlation, or the KS test. Then monitor your output data by checking whether the schema and output file size are as expected. Verifying that every workflow step produces output in the right format, and tracking how long each task takes to run, helps you detect problems and keep data operations running smoothly (a minimal output check is sketched after this list).
  • Model pipelines: Consider monitoring any factor that may cause your model to break during production after you’ve completed any new training or redeployments. This includes validating the dependency version and logging model pipeline metadata so it’s easier to find and debug if a failure occurs.
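Here is one minimal sketch of a post-run output check for a batch pipeline; the column names, row-count bounds, and Parquet output format are assumptions you would replace with your own pipeline’s contract.

```python
import time
from pathlib import Path

import pandas as pd

# Expected schema and row-count bounds are assumptions; derive yours from past runs.
EXPECTED_COLUMNS = {"customer_id", "score", "scored_at"}
MIN_ROWS, MAX_ROWS = 1_000, 5_000_000

def check_pipeline_output(path: str) -> list[str]:
    """Validate a batch scoring output file against the expected shape and schema."""
    issues = []
    file = Path(path)
    if not file.exists() or file.stat().st_size == 0:
        return [f"{path}: output file missing or empty"]
    started = time.perf_counter()
    df = pd.read_parquet(file)  # assumes Parquet output with pyarrow installed
    load_seconds = time.perf_counter() - started
    if set(df.columns) != EXPECTED_COLUMNS:
        issues.append(f"unexpected schema: {sorted(df.columns)}")
    if not MIN_ROWS <= len(df) <= MAX_ROWS:
        issues.append(f"row count out of range: {len(df)}")
    print(f"loaded {len(df)} rows in {load_seconds:.1f}s")
    return issues
```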

Cost

While machine learning has the potential to generate significant value for your company, that doesn’t mean you should downplay its costs. The entire machine learning system, from data storage and model training to deployment, retraining, and monitoring, is costly and requires your organization to take on additional tasks and responsibilities. 

Monitoring the cost of your ML models helps you develop and stick to an appropriate budget. You can easily track vendor services, monitor system usage costs, or set alerts when you reach a budgetary threshold. It lets you analyze expenses to optimize your budget or find compromises to help reduce costs.

How to get started monitoring your ML models

Establishing a machine learning model monitoring practice is easier than you think. After deploying your models, use the following steps to monitor your model’s performance and health.

1. Understand your needs and goals 

Though you can track numerous ML model elements, simplify the monitoring process by asking: What do you need to monitor, and who needs to see the results? Instead of creating a complicated monitoring practice that tries to solve every problem, monitor your models with a specific objective. Data scientists and engineers may need to monitor for missing data or feature changes, while a product manager will look for useful insights.

2. Evaluate your existing tools

You may be able to use existing applications or tools for ML model monitoring. Engineering and development tools that track and evaluate system performance and metrics also apply to your model. Using existing tools can help you cut down on monitoring costs.

3. Decide how to deliver results

How are you going to share your monitoring results? Do you already have preferred channels to send monitoring alerts when changes or failures happen? Do you need to visualize results so users and stakeholders can access and understand information? Depending on what tools you already use and how you want to communicate model monitoring with others, you may need a dashboard or other visualization platform to make results accessible to everyone involved. 

No matter what your chosen platform is, it should be intuitive and easy for all users to operate. Your monitoring tool needs to integrate with your existing platforms and data sources, provide the required metrics for your ML models, allow for customization, and offer collaboration so you can share results with others. 

4. Define your monitoring metrics

You’ll need to select monitoring metrics, statistics, and tests relevant to your model and goals. Direct model performance metrics such as daily prediction volume, prediction drift, null value percentage, and the share of drifting features are a few options.

Since ML models will degrade with time, it’s essential to track metrics and set thresholds to automatically detect and alert you when large changes or dips in performance occur. Based on your criteria, you can even alert specific roles or departments or send weekly reports.
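A threshold-and-alert loop can start very simply, as in the sketch below. The webhook URL, metric name, and 2 percent threshold are placeholders; in practice you would wire this into whatever alerting channel your team already uses and let your monitoring tool trigger it automatically.

```python
import json
import logging
import urllib.request

logger = logging.getLogger("ml_monitoring")

# Placeholder endpoint; point this at your own alerting channel or chat webhook.
ALERT_WEBHOOK_URL = "https://example.com/hooks/ml-alerts"

def alert_if_breached(metric_name: str, value: float, threshold: float) -> None:
    """Post an alert when a monitored metric crosses its threshold."""
    if value <= threshold:
        return
    payload = json.dumps(
        {"metric": metric_name, "value": value, "threshold": threshold}
    ).encode()
    request = urllib.request.Request(
        ALERT_WEBHOOK_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        urllib.request.urlopen(request, timeout=5)
    except OSError:
        logger.exception("failed to deliver alert for %s", metric_name)

# Example: alert when more than 2% of incoming rows have null values.
alert_if_breached("null_value_share", value=0.034, threshold=0.02)
```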

5. Set up a reference data set (if needed)

Depending on your model and metrics, you may need to choose one or more baseline data sets as a reference to detect data drift. Your reference data set must reflect your model’s expected patterns, like data from previous model operations. 
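If your tooling expects an explicit baseline, one common approach is to freeze a known-good window of logged production data as the reference set. The sketch below assumes you keep a prediction log with a timestamp column; the column name and dates are illustrative.

```python
import pandas as pd

def build_reference_set(prediction_log: pd.DataFrame,
                        start: str,
                        end: str,
                        timestamp_col: str = "scored_at") -> pd.DataFrame:
    """Freeze a known-good window of past production data as the drift baseline."""
    mask = prediction_log[timestamp_col].between(pd.Timestamp(start), pd.Timestamp(end))
    return prediction_log[mask].reset_index(drop=True)

# Example: use a stable month before any known incidents as the baseline.
# reference = build_reference_set(log_df, "2024-03-01", "2024-03-31")
```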

6. Decide your monitoring schedule

Do you need to monitor your ML model in real time? Or would periodic hourly, daily, or weekly monitoring meet your requirements? When choosing your monitoring schedule, consider factors such as the model’s format, deployment, level of risk, and your system’s existing infrastructure. For many businesses, periodic batch monitoring is sufficient unless the use case is time sensitive. Others may choose a longer cadence so that ground truth labels have time to arrive.
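For a periodic cadence, the monitoring job itself can stay simple. The loop below only shows the shape of a daily batch run; in practice you would hand the same logic to cron, Airflow, or whatever scheduler you already operate, and the checks it calls are the ones sketched earlier in this guide.

```python
import time
from datetime import datetime, timedelta

def run_monitoring_batch() -> None:
    """One batch run: pull recent data, compute metrics, and fire any alerts."""
    # In a real job, this would call the quality, drift, and threshold checks above.
    print(f"{datetime.now():%Y-%m-%d %H:%M} running daily monitoring batch")

# Deliberately simple daily loop; most teams would use an existing scheduler instead.
if __name__ == "__main__":
    while True:
        run_monitoring_batch()
        time.sleep(timedelta(days=1).total_seconds())
```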

7. Automate monitoring actions and troubleshoot solutions

Create a plan of action ahead of time so that if a model breaks or shows signs of bias, drift, or health issues, your company can address it immediately. If your monitoring tools allow it, set up automatic alerts so your team can investigate problems. Also, have a troubleshooting framework in place so individuals or teams know who is responsible and what steps to take to mitigate issues.

Manage your ML models with Domo

Monitoring machine learning models can be challenging but is essential to your model’s health and lifecycle. Fortunately, Domo simplifies the ML model monitoring process so you can quickly identify and solve problems. 

Our AI and ML model management seamlessly integrates your existing ML models and lets you build and train new ones in our controlled and transparent environment. From there, you can easily deploy, refine, and retrain your models to achieve peak performance. In addition, our intuitive visualizations and automation features deliver speedy insights that drive smarter decisions to stay ahead in your business. 

Are you interested in seeing how big data and AI can improve your business? Discover how Domo.AI combines AI innovations with our existing BI platform for powerful analysis and meaningful business insights.
