The Key to Successful AI Adoption – Building a Robust AI Pipeline

June 2, 2020 | Debiprasad Banerjee

Artificial intelligence-based systems and applications, especially deep learning models built from convolutional and recurrent networks, are now everywhere. They analyze massive amounts of audio, image, video, text, and graph data, with applications in machine translation, object and scene understanding, mapping and GIS apps, user-preference ranking, ad placement, and more. Competing frameworks for building these networks, such as TensorFlow, PyTorch, Caffe, and MXNet, make different tradeoffs between usability and expressiveness, research and production orientation, and supported hardware. With rapid advances in all these areas, especially the new hardware now available for inference at the edge, these gaps may widen and the differences become more pronounced as specializations turn into differentiators.

The flip side of these tradeoffs is that data scientists and machine learning engineers face a steep learning curve before they have the expertise needed to take a heterogeneous ensemble of models, built on different frameworks, to production and optimize them for specific hardware.

Even after an ML engineer successfully takes a trained model to production, there is a real likelihood of 'data drift': the live data shifts away from the data the model was trained and tested on, so predictions on live data no longer reach the accuracy seen on the test data set. Deep learning frameworks and libraries don't provide the supporting infrastructure needed to analyze why the model behaves differently, so engineers typically must reinvent the wheel to monitor, debug, and visualize this pathological behavior. A systemic failure, such as the prediction server thrashing the CPU or running out of memory, is also hard to debug, because few readily available tools help an ML reliability engineer pinpoint the exact place in the serving code that causes it.

Thus, taking a trained model, often developed only within a Jupyter Notebook environment, to production with a service level agreement that clients find satisfactory can take several months of effort, and keeping it stable afterward requires several more months of infrastructure work. This calls for a well-planned deployment approach built on a pipeline that abstracts and automates the deployment and monitoring processes. Any framework or platform designed to realize this end-to-end pipeline must essentially consist of the following workflow steps:

  1. Gather and manage data
  2. Train and evaluate models
  3. Deploy and scale models to make predictions
  4. Monitor predictions and initiate remedial action

Data Engineering Pipeline

Identifying the appropriate set of features is often the hardest part of any machine learning application, and we find that building and managing data pipelines is typically one of the most complex and costly pieces of a complete machine learning solution.

Identifying and aggregating the data available within and outside the enterprise is the first step in building a robust data pipeline. Once the data is cleansed, transformed, and stored in the data lake, the platform should provide standard tools to generate labeled feature data sets for training (and retraining) and feature-only data sets for prediction.
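
The split between labeled training sets and feature-only prediction sets can be sketched in Python as follows. This is a minimal illustration, assuming records arrive as dictionaries from the cleansed data lake; the field names (`customer_id`, `spend_30d`, `visits_30d`, `churned`) and the helper `build_datasets` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical cleansed record, standing in for a row read from the data lake.
Record = Dict[str, object]

FEATURES = ["spend_30d", "visits_30d"]  # assumed feature columns
LABEL = "churned"                       # assumed label column


@dataclass
class Example:
    entity_id: str
    features: List[float]
    label: Optional[int] = None  # None for feature-only (prediction) rows


def to_features(record: Record) -> List[float]:
    """Derive the model's feature vector from one cleansed record."""
    return [float(record[name]) for name in FEATURES]


def build_datasets(records: List[Record]) -> Tuple[List[Example], List[Example]]:
    """Split records into a labeled training set and a feature-only prediction set.

    Records whose label is already known go to training (and retraining);
    records still awaiting an outcome are scored at prediction time.
    """
    training, prediction = [], []
    for rec in records:
        example = Example(str(rec["customer_id"]), to_features(rec))
        if rec.get(LABEL) is not None:
            example.label = int(rec[LABEL])
            training.append(example)
        else:
            prediction.append(example)
    return training, prediction


records = [
    {"customer_id": "a1", "spend_30d": 120.0, "visits_30d": 4, "churned": 0},
    {"customer_id": "b2", "spend_30d": 5.5, "visits_30d": 1, "churned": 1},
    {"customer_id": "c3", "spend_30d": 60.0, "visits_30d": 2, "churned": None},
]
train_set, predict_set = build_datasets(records)
print(len(train_set), len(predict_set))
```

Routing both sets through the same `to_features` function is the point of this design: the feature logic stays identical at training and prediction time, which helps avoid training/serving skew.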

The pipelines need to be scalable and performant. They should include integrated monitoring of data flow and data quality, and support both online and offline training and prediction.
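
Integrated data-quality monitoring can start as simple per-batch checks. A minimal sketch, assuming each batch is a list of dictionaries; the 5% null-rate limit and the expected value ranges are hypothetical values a team would tune for its own data:

```python
from typing import Dict, List, Tuple

# Hypothetical expectations for each column: (min allowed, max allowed).
EXPECTED_RANGES: Dict[str, Tuple[float, float]] = {
    "spend_30d": (0.0, 100_000.0),
    "visits_30d": (0.0, 1_000.0),
}
MAX_NULL_RATE = 0.05  # assumed tolerance for missing values per column


def check_batch(rows: List[Dict[str, object]]) -> List[str]:
    """Return human-readable data-quality violations for one batch."""
    if not rows:
        return ["batch is empty"]
    problems: List[str] = []
    for column, (lo, hi) in EXPECTED_RANGES.items():
        values = [row.get(column) for row in rows]
        null_rate = sum(v is None for v in values) / len(values)
        if null_rate > MAX_NULL_RATE:
            problems.append(
                f"{column}: null rate {null_rate:.0%} exceeds {MAX_NULL_RATE:.0%}"
            )
        # Only non-missing values are range-checked.
        out_of_range = [v for v in values if v is not None and not lo <= float(v) <= hi]
        if out_of_range:
            problems.append(f"{column}: {len(out_of_range)} value(s) outside [{lo}, {hi}]")
    return problems


batch = [
    {"spend_30d": 120.0, "visits_30d": 4},
    {"spend_30d": -3.0, "visits_30d": None},
]
for issue in check_batch(batch):
    print(issue)
```

In a production pipeline, a non-empty result would more likely hold back the batch or raise an alert than simply print.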

Training Pipeline

A distributed model training pipeline should scale up to handle millions of training samples and scale down to small datasets for quick iteration.

With the data pipeline set up and the selected features identified and isolated into a feature store, training can begin with the chosen algorithms. This requires a model configuration file that specifies the model type, hyperparameters, and data source reference, as well as compute resource requirements such as the number of machines, the amount of memory, and whether to use GPUs. This file configures the training job, which runs on a cluster such as Hadoop YARN or Apache Mesos.
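
A model configuration file of the kind described above might be JSON that the platform loads and validates before submitting the job. In this sketch the schema, field names, and the `hdfs://` path are hypothetical; they illustrate the categories the text lists (model type, hyperparameters, data source, compute resources) rather than any particular platform's format.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical configuration; a real platform would define its own schema.
CONFIG_JSON = """
{
  "model_type": "gradient_boosted_trees",
  "hyperparameters": {"learning_rate": 0.1, "max_depth": 6, "n_estimators": 300},
  "data_source": "hdfs:///feature_store/churn/train/2020-06-01",
  "resources": {"workers": 4, "memory_gb": 16, "gpus": 0}
}
"""


@dataclass
class Resources:
    workers: int = 1
    memory_gb: int = 4
    gpus: int = 0


@dataclass
class TrainingConfig:
    model_type: str
    data_source: str
    hyperparameters: Dict[str, Any] = field(default_factory=dict)
    resources: Resources = field(default_factory=Resources)

    @classmethod
    def from_json(cls, text: str) -> "TrainingConfig":
        """Parse a JSON config and validate it before any job is submitted."""
        raw = json.loads(text)
        config = cls(
            model_type=raw["model_type"],
            data_source=raw["data_source"],
            hyperparameters=raw.get("hyperparameters", {}),
            resources=Resources(**raw.get("resources", {})),
        )
        config.validate()
        return config

    def validate(self) -> None:
        """Fail fast on impossible resource requests."""
        if self.resources.workers < 1:
            raise ValueError("at least one worker is required")
        if self.resources.gpus < 0 or self.resources.memory_gb <= 0:
            raise ValueError("invalid resource request")


config = TrainingConfig.from_json(CONFIG_JSON)
print(config.model_type, config.resources.workers, config.hyperparameters["max_depth"])
```

Validating the file up front means a malformed request is rejected immediately rather than after it has waited in the cluster's queue.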

After the model is trained, performance metrics (e.g., ROC curve, PR curve, or IoU graph) are computed, stored, and made available on a dashboard. At the end of training, the original configuration, the learned parameters, and the evaluation report are saved to the model repository for analysis and deployment.
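
The evaluate-and-save step can be sketched as below. As assumptions for illustration, the model is a binary classifier, the metrics are points on a precision-recall curve, and the "model repository" is a local directory with one versioned folder per training run; the function names and file layout are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Sequence


def pr_points(labels: Sequence[int], scores: Sequence[float],
              thresholds: Sequence[float]) -> List[Dict[str, float]]:
    """Compute (threshold, precision, recall) points for a PR curve."""
    positives = sum(labels)
    points = []
    for t in thresholds:
        predicted = [s >= t for s in scores]
        tp = sum(p and y == 1 for p, y in zip(predicted, labels))
        fp = sum(p and y == 0 for p, y in zip(predicted, labels))
        # By convention, precision is 1.0 when nothing is predicted positive.
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / positives if positives else 0.0
        points.append({"threshold": t, "precision": precision, "recall": recall})
    return points


def save_run(repo: Path, version: str, config: dict, params: dict, report: dict) -> Path:
    """Write config, learned parameters and evaluation report side by side."""
    run_dir = repo / version
    run_dir.mkdir(parents=True, exist_ok=False)  # never overwrite an earlier run
    for name, payload in [("config", config), ("params", params), ("report", report)]:
        (run_dir / f"{name}.json").write_text(json.dumps(payload, indent=2))
    return run_dir


labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
report = {"pr_curve": pr_points(labels, scores, thresholds=[0.25, 0.5, 0.75])}

with tempfile.TemporaryDirectory() as tmp:
    run = save_run(Path(tmp), "churn-v1", {"model_type": "logreg"},
                   {"weights": [0.4, -1.2]}, report)
    print(sorted(p.name for p in run.iterdir()))
```

Keeping the three artifacts together under one version makes it straightforward to reproduce, compare, or roll back a specific training run.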

Serving Pipeline

The model serving pipeline consists of setting up the infrastructure for model deployment and the data flows for inference.

A model can be deployed for inference in batch mode, where predictions are computed offline and stored in a data store from which the front end consumes them; hence the name 'offline deployment'.
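
A batch (offline) deployment can be sketched as a scheduled job that scores every entity and writes the results to a table the front end reads. This minimal version assumes SQLite as the data store and uses a hypothetical `score` function in place of a trained model:

```python
import sqlite3
from datetime import datetime, timezone
from typing import Iterable, List, Tuple


def score(features: List[float]) -> float:
    """Hypothetical stand-in for a trained model's predict method."""
    spend, visits = features
    return round(1.0 / (1.0 + spend / 100.0 + visits), 4)


def run_batch_job(conn: sqlite3.Connection,
                  rows: Iterable[Tuple[str, List[float]]]) -> int:
    """Score all rows and upsert the predictions; returns the number written."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS predictions (
               entity_id TEXT PRIMARY KEY,
               score REAL NOT NULL,
               scored_at TEXT NOT NULL)"""
    )
    scored_at = datetime.now(timezone.utc).isoformat()
    results = [(entity_id, score(feats), scored_at) for entity_id, feats in rows]
    with conn:  # commits on success, rolls back if anything fails
        conn.executemany("INSERT OR REPLACE INTO predictions VALUES (?, ?, ?)", results)
    return len(results)


conn = sqlite3.connect(":memory:")
written = run_batch_job(conn, [("a1", [120.0, 4.0]), ("b2", [5.5, 1.0])])

# The front end then performs a cheap lookup instead of calling the model.
row = conn.execute("SELECT score FROM predictions WHERE entity_id = ?", ("b2",)).fetchone()
print(written, row[0])
```

Using `INSERT OR REPLACE` keyed on the entity makes the job safe to re-run: a rerun simply overwrites the previous scores.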

Online, or real-time, deployment, on the other hand, makes the model available for front-end applications to invoke in real time while serving a user request. Online models therefore typically receive their input data in real time, either from a data stream or as application session data passed along with the API call to the model.
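
For online deployment, the model sits behind an API that the front end calls while serving a user request, passing session data in the payload. A minimal sketch as a standard-library WSGI application; the `/predict` route, the payload fields, and the `score` function are hypothetical:

```python
import json
from typing import Callable, Iterable, List


def score(features: List[float]) -> float:
    """Hypothetical stand-in for a model loaded once at startup."""
    spend, visits = features
    return round(1.0 / (1.0 + spend / 100.0 + visits), 4)


def _json_response(start_response: Callable, status: str, payload: dict) -> Iterable[bytes]:
    body = json.dumps(payload).encode()
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Handle POST /predict with a body like {"spend_30d": 5.5, "visits_30d": 1}."""
    if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/predict":
        return _json_response(start_response, "404 Not Found", {"error": "not found"})
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        features = [float(payload["spend_30d"]), float(payload["visits_30d"])]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing fields, or non-numeric values.
        return _json_response(start_response, "400 Bad Request", {"error": "invalid payload"})
    return _json_response(start_response, "200 OK", {"score": score(features)})
```

For local experiments, `wsgiref.simple_server.make_server` can serve `app`; a production deployment would run it under a dedicated WSGI server inside the model's container.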

However, these models may still need features that were computed offline in batch mode and made available through the feature store. The serving pipeline also needs to support deploying multiple models side by side to enable A/B testing. Managing deployments in containers decouples the model from the calling applications and enables a 'hot swap' architecture, in which a model can be replaced without interrupting its callers.
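
A/B testing and hot swapping can both be handled by a thin routing layer in front of the deployed models. The sketch below is one possible illustration under stated assumptions, not a prescribed design: users are assigned to a variant by hashing their ID (so the same user always reaches the same model), and a lock lets a new model version replace an old one without restarting the service. The class and variant names are hypothetical.

```python
import hashlib
import threading
from typing import Callable, Dict, List

Model = Callable[[List[float]], float]


class ModelRouter:
    """Route requests between model variants and allow in-place swaps."""

    def __init__(self, variants: Dict[str, Model], traffic: Dict[str, float]):
        if abs(sum(traffic.values()) - 1.0) > 1e-9:
            raise ValueError("traffic shares must sum to 1")
        self._variants = dict(variants)
        self._traffic = dict(traffic)
        self._lock = threading.Lock()

    def assign(self, user_id: str) -> str:
        """Deterministically map a user to a variant via a stable hash bucket."""
        digest = hashlib.sha256(user_id.encode()).hexdigest()
        bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform value in [0, 1]
        names = list(self._traffic)
        cumulative = 0.0
        for name in names:
            cumulative += self._traffic[name]
            if bucket <= cumulative:
                return name
        return names[-1]  # guard against floating-point rounding

    def predict(self, user_id: str, features: List[float]) -> Dict[str, object]:
        variant = self.assign(user_id)
        with self._lock:
            model = self._variants[variant]
        # The model is called outside the lock, so a swap never blocks inference.
        return {"variant": variant, "score": model(features)}

    def hot_swap(self, variant: str, new_model: Model) -> None:
        """Replace a variant's model; calls already in flight finish on the old one."""
        with self._lock:
            self._variants[variant] = new_model


router = ModelRouter(
    variants={"control": lambda f: 0.2, "challenger": lambda f: 0.8},
    traffic={"control": 0.9, "challenger": 0.1},
)
print(router.predict("user-42", [1.0, 2.0]))
router.hot_swap("challenger", lambda f: 0.7)  # e.g. a retrained challenger
```

Hashing rather than random assignment keeps each user's experience consistent across requests, which makes the A/B comparison cleaner.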

Monitoring Pipeline

During training, a model is fit and evaluated on the available historical data. Drift between this historical training data and live production data is common, so adequate monitoring mechanisms are needed to ensure that the model keeps working well after deployment.
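
One common way to quantify drift between training data and live data is the Population Stability Index (PSI), which compares the share of observations that fall into each bin of a feature's distribution. It is one option among several (a Kolmogorov-Smirnov test is another). The bin edges and sample values below are hypothetical, and rules of thumb such as treating a PSI above about 0.2 as significant drift are heuristics to calibrate per feature.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def bin_shares(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values falling into each bin defined by sorted interior edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index: sum over bins of (a - e) * ln(a / e).

    `expected` is the training sample, `actual` the live sample.
    `eps` keeps empty bins from producing log(0) or division by zero.
    """
    total = 0.0
    for e, a in zip(bin_shares(expected, edges), bin_shares(actual, edges)):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


edges = [25.0, 50.0, 100.0]  # hypothetical bins for a spend feature
training = [10, 30, 40, 60, 70, 80, 120, 150]
live_same = [12, 28, 45, 55, 65, 90, 110, 140]
live_shifted = [90, 110, 130, 150, 160, 170, 180, 200]

print(round(psi(training, live_same, edges), 4))
print(round(psi(training, live_shifted, edges), 4))
```

The first live sample has the same bin shares as the training sample, so its PSI is zero; the second has moved almost entirely into the top bin and produces a large value that would trigger an alert.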

It is critical to monitor the prediction stream continuously to ensure that the incoming data is accurate and that the environment has not changed in ways that introduce anomalies. The monitoring pipeline should log both the input data and the prediction streams, and later correlate each prediction with the actual observed outcome. Anomalies detected this way can be labeled and fed into the retraining pipeline to continuously improve model accuracy.
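
Correlating logged predictions with outcomes observed later amounts to a join on a request identifier. A minimal sketch, assuming each prediction is logged with a hypothetical `request_id` and that outcomes arrive afterward keyed the same way; misclassified cases are collected, with their true labels, for the retraining pipeline:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PredictionLog:
    request_id: str
    features: List[float]
    predicted: int  # the label the model predicted at serving time


def reconcile(logs: List[PredictionLog], outcomes: Dict[str, int]
              ) -> Tuple[Optional[float], List[Tuple[List[float], int]]]:
    """Join predictions to observed outcomes.

    Returns live accuracy over the matched requests (None if nothing matched yet)
    and the misclassified examples, labeled with the true outcome, for retraining.
    """
    matched = [(log, outcomes[log.request_id]) for log in logs if log.request_id in outcomes]
    if not matched:
        return None, []
    correct = sum(log.predicted == actual for log, actual in matched)
    retrain = [(log.features, actual) for log, actual in matched if log.predicted != actual]
    return correct / len(matched), retrain


logs = [
    PredictionLog("r1", [120.0, 4.0], predicted=0),
    PredictionLog("r2", [5.5, 1.0], predicted=1),
    PredictionLog("r3", [60.0, 2.0], predicted=0),
    PredictionLog("r4", [8.0, 0.0], predicted=1),  # outcome not yet observed
]
outcomes = {"r1": 0, "r2": 1, "r3": 1}

accuracy, retrain_batch = reconcile(logs, outcomes)
print(accuracy, retrain_batch)
```

Requests whose outcome has not arrived yet (here `r4`) are simply picked up by a later run.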

And what’s more…

A web-based UI and a rich API layer should tie these pipelines together and give end users easy access to the platform. This layer would consist of a management application that serves the web UI and APIs and integrates with the existing enterprise monitoring and alerting infrastructure.

This layer would also house a robust workflow engine to orchestrate the batch data pipelines, create and schedule training and batch prediction jobs, and coordinate the deployment of models into both batch and online containers. The UI should also include a dashboard for viewing model training and evaluation reports and the performance-monitoring statistics of the serving pipeline.
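
At its core, such a workflow engine runs pipeline tasks in dependency order. Production platforms typically rely on a dedicated orchestrator, but the ordering logic can be sketched with Python's standard `graphlib` module (Python 3.9+). The task names and the `run_task` function are hypothetical placeholders for real jobs.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each task maps to the set of tasks that must finish before it starts.
PIPELINE: Dict[str, Set[str]] = {
    "ingest_data": set(),
    "build_features": {"ingest_data"},
    "train_model": {"build_features"},
    "evaluate_model": {"train_model"},
    "deploy_online": {"evaluate_model"},
    "batch_predict": {"evaluate_model", "build_features"},
}


def run_task(name: str) -> None:
    """Hypothetical placeholder; a real engine would launch a job here."""
    print(f"running {name}")


def run_pipeline(graph: Dict[str, Set[str]]) -> List[str]:
    """Execute tasks so that every task runs after all of its dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a loop
    order: List[str] = []
    while sorter.is_active():
        ready = sorter.get_ready()  # tasks whose dependencies have all finished
        for task in sorted(ready):  # independent tasks; these could run in parallel
            run_task(task)
            order.append(task)
            sorter.done(task)
    return order


completed = run_pipeline(PIPELINE)
```

Checking for cycles before anything runs means a misconfigured workflow is rejected up front instead of stalling halfway through.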

In Conclusion

While building this platform may look like a massive investment of time and resources, we can create it step by step while deploying our first few models into production. The key to successfully deploying and scaling enterprise-wide AI systems is a holistic understanding of the end-to-end pipeline.
