Beam Summit 2022: Implementing Cloud Agnostic Machine Learning Workflows with Apache Beam on Kubernetes
Session presented by Charles Adetiloye and Alexander Lerma at Beam Summit 2022
A highly efficient data processing workflow is fast becoming a necessity for every organization that implements and deploys Machine Learning models at scale. In most cases, ML teams rely on the managed services already offered by their chosen cloud infrastructure provider. While this approach is good enough for most teams to get going, the cost of keeping the platform running may become prohibitively high over time.
As an alternative, Charles and Alexander run their ML pipeline tasks as Apache Beam jobs orchestrated with Argo on Kubernetes. Kubernetes gives them a clean abstraction over the underlying compute resources and lets them declaratively configure Apache Beam job runners for either streaming or batch workloads on any cloud or on-premises compute infrastructure.
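To make the idea of a declaratively configured Beam job more concrete, here is a minimal sketch of what such a definition might look like. It is not taken from the talk: the function name `build_beam_workflow`, the image name, the template name, and the `pipeline.py` entry script are all hypothetical. It only assumes the standard Argo Workflow manifest layout and Beam's `--runner` and `--streaming` pipeline options, and uses nothing beyond the Python standard library.

```python
import json


def build_beam_workflow(name, image, runner="DirectRunner", streaming=False):
    """Return an Argo Workflow manifest (as a dict) that runs one Beam job.

    All names here are illustrative assumptions, not the speakers' setup:
    `image` is a container holding the Beam pipeline code, and `pipeline.py`
    is a hypothetical entry script inside that image.
    """
    # Beam pipeline options passed on the command line of the container.
    args = ["pipeline.py", f"--runner={runner}"]
    if streaming:
        # Switch the same job definition from batch to streaming mode.
        args.append("--streaming")

    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": f"{name}-"},
        "spec": {
            "entrypoint": "run-beam-job",
            "templates": [
                {
                    "name": "run-beam-job",
                    "container": {
                        "image": image,
                        "command": ["python"],
                        "args": args,
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = build_beam_workflow(
        "wordcount",
        "registry.example.com/beam-job:latest",  # hypothetical image
        runner="FlinkRunner",
        streaming=True,
    )
    # Print the manifest so it can be inspected or saved to a file.
    print(json.dumps(manifest, indent=2))
```

Because the runner and the batch or streaming mode are plain parameters, the same containerized job could in principle be pointed at a different cluster or runner by changing only the manifest, which is the kind of portability the paragraph above describes.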
In this talk, Charles and Alexander, both MLOps Engineers at MavenCode, discuss how they implemented a continuous integration and deployment stack to containerize and deploy Argo workflows that run their Beam jobs on Kubernetes. They go through the challenges they encountered and the lessons they learned, and recommend best practices for any MLOps team considering this approach.