New Technologies Enable Ubiquitous Machine Learning
Ahmed Khamassi, Vice President Data Science, Equinor UK Ltd
Developing machine learning (ML) processes that run in production and that end users rely on in their day-to-day activities is not an easy task.
These processes need to tackle problem statements that bear significant value to the organization.
Machine learning is seldom only about algorithms. Data pipeline development, feature engineering, model validation and lifecycle management are fundamental components.
ML processes need to be implemented at scale, that is, in a large number of applications within value-generating activities. For example, applications that recognize people in digital photographs are popular. The ML process doing the recognition was designed and, probably, trained once, then deployed as a product feature for thousands of users, adding value to our digital experience at no extra cost to the product owner.
Organizations serious about creating value from ML should emulate this approach: solve once and deploy with ϵ-cost scaling, where each additional deployment in production adds only a small, diminishing marginal cost.
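To make the cost argument concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a bespoke model costs a fixed amount per machine, while a product costs a one-off build plus a small constant ϵ per deployment. The function names and all figures are hypothetical, and a constant ϵ is a simplification of a marginal cost that shrinks with every deployment.

```python
import math


def bespoke_total_cost(n_deployments, cost_per_model):
    """Total cost when every machine gets its own hand-built model."""
    return n_deployments * cost_per_model


def product_total_cost(n_deployments, build_cost, epsilon):
    """Total cost when a product is built once and each deployment costs epsilon."""
    return build_cost + n_deployments * epsilon


def break_even_deployments(cost_per_model, build_cost, epsilon):
    """Smallest number of deployments at which the product is strictly cheaper."""
    if epsilon >= cost_per_model:
        raise ValueError("the product never pays off if epsilon >= cost_per_model")
    # The product wins when build_cost + n * epsilon < n * cost_per_model.
    return math.floor(build_cost / (cost_per_model - epsilon)) + 1


if __name__ == "__main__":
    # Hypothetical cost units, chosen only to show the shape of the argument.
    n = break_even_deployments(cost_per_model=10, build_cost=100, epsilon=1)
    print("product is cheaper from deployment", n)
    for k in (1, 10, 100):
        average = product_total_cost(k, build_cost=100, epsilon=1) / k
        print(k, "deployments: average product cost per deployment =", average)
```

Under these assumptions, the average cost per deployment of the product falls toward ϵ as the number of deployments grows, while the bespoke approach stays at the full per-model cost.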
Design Constraints to ϵ-cost Scaling
Let’s consider an example. Industrial organizations are investing in the early detection of machine failures. Often, data scientists try to build models for specific machines, e.g. the compressor on production line A, using its sensor data. Once a model is built and validated, both statistically and by the engineers, it is deployed so that engineers can act on early warnings. This is an involved process that the data science team may need to repeat for the compressor on production line B. The value of each individual process is low, and the cost of achieving enough implementations for the aggregate value to be substantial is high.
ϵ-cost scaling restates the problem as: can we build a machine learning product that allows engineers to detect degradation in the performance of any list of machines equipped with sensors?
Such a product has three design constraints:
1. The machine learning algorithm must automate feature engineering, that is, the design of variables that best predict the degradation of machine performance.
2. The technology platform needs to allow for the automatic training and deployment of models at scale.
3. A simple user interface needs to allow the engineers to design and build their own models.
1st Constraint: Deep Learning
The first constraint can be tackled with deep learning, which uses deep neural networks and trains them with the backpropagation algorithm.
Neural networks are universal function approximators. They use a series of interconnected layers, each of which transforms the output of the previous layer into a higher-level feature representation. Each layer is a row of nodes. Nodes take the outputs of the previous layer’s nodes, compute their weighted sum and apply an activation function to that sum (e.g. the sigmoid function). The backpropagation algorithm computes how the error changes with each weight, so that training can find the weights that minimize the difference between the network’s outputs and the actual training targets.
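The mechanics described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a production implementation: it assumes one hidden layer, sigmoid activations, mean squared error as the "difference" being minimized, and plain full-batch gradient descent. The function names and the toy data (two normalized sensor readings, labeled 1 only when both are high) are invented for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_weights(n_in, n_hidden, seed=0):
    """Random starting weights; the last entry of each weight list is the bias."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return w_hidden, w_out


def forward(x, w_hidden, w_out):
    """Each node computes a weighted sum of its inputs and applies the sigmoid."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws[:-1], x)) + ws[-1]) for ws in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1])
    return hidden, out


def mean_squared_error(data, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - y) ** 2 for x, y in data) / len(data)


def train(data, w_hidden, w_out, epochs=2000, lr=0.5):
    """Backpropagation with full-batch gradient descent; updates weights in place."""
    n, n_in, n_hidden = len(data), len(data[0][0]), len(w_hidden)
    for _ in range(epochs):
        g_hidden = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        g_out = [0.0] * (n_hidden + 1)
        for x, y in data:
            hidden, out = forward(x, w_hidden, w_out)
            # Error signal at the output node (chain rule through the sigmoid).
            delta_out = 2 * (out - y) * out * (1 - out)
            for j, h in enumerate(hidden):
                g_out[j] += delta_out * h
                # Propagate the error back to hidden node j.
                delta_h = delta_out * w_out[j] * h * (1 - h)
                for i, xi in enumerate(x):
                    g_hidden[j][i] += delta_h * xi
                g_hidden[j][-1] += delta_h
            g_out[-1] += delta_out
        # Step against the averaged gradient.
        for j in range(n_hidden):
            for i in range(n_in + 1):
                w_hidden[j][i] -= lr * g_hidden[j][i] / n
        for j in range(n_hidden + 1):
            w_out[j] -= lr * g_out[j] / n
    return w_hidden, w_out


# Invented toy data: (sensor_1, sensor_2) -> degradation flag.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 0.0), ((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]

if __name__ == "__main__":
    w_h, w_o = init_weights(n_in=2, n_hidden=3)
    print("error before training:", mean_squared_error(DATA, w_h, w_o))
    train(DATA, w_h, w_o)
    print("error after training: ", mean_squared_error(DATA, w_h, w_o))
```

In practice a framework would handle the gradients automatically; the explicit loops are kept here only to show where the weighted sums, the activation function and the backward pass appear.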
The power of deep learning is that it combines the input space into a large number of significant features: it creates the combination of variables that best represent the physical process.
Data scientists still need to find the best neural network architecture(s) and parameters.
Although deep learning satisfies the first design constraint, the job of the data scientist becomes that of an algorithm engineer rather than a modeller.
2nd Constraint: Kubernetes and GitOps
Cloud-native platforms scale automatically with the computational workload. This matters because our engineers may want to build monitoring processes for dozens of machines concurrently.
Kubernetes automates the management of containers and the resources they require when run. So, our product can be a network of containers that complete general tasks such as authentication, fetching sensor data, training neural networks, and generating API endpoints and visualizations.
Data scientists enrich the product features by adopting a GitOps process. GitOps implements a secure Kubernetes controller that listens for changes and synchronizes deployments to the Kubernetes cluster. Data scientists move code from development to deployment by updating a declarative configuration file and pushing it to Git.
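The synchronization idea behind GitOps can be illustrated with a small Python sketch. It is not the API of any real GitOps tool or of Kubernetes: the `reconcile` function, the component names and the image tags are all hypothetical. It only shows the principle of comparing the desired state declared in Git with the state actually running, and deriving the actions that close the gap.

```python
def reconcile(desired, actual):
    """Return the actions needed to bring `actual` in line with `desired`.

    Both arguments map a component name to its declared specification.
    """
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    # Hypothetical state declared in the Git repository.
    desired = {
        "api": {"image": "api:2.0"},
        "trainer": {"image": "trainer:1.1"},
    }
    # Hypothetical state currently running in the cluster.
    actual = {
        "trainer": {"image": "trainer:1.0"},
        "viz": {"image": "viz:0.9"},
    }
    for action, name in reconcile(desired, actual):
        print(action, name)
```

A controller would run such a comparison whenever the repository changes, so pushing an updated configuration file is enough to trigger a deployment.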
Kubernetes and GitOps satisfy the second constraint, but require the data scientist to combine analysis and software engineering.
3rd Constraint: GitOps Again
Our product achieves high aggregate value when adequately applied to many machines, making ML ubiquitous.
Engineers are best placed to select the data suited for the risks they monitor.
The product can be designed so that the engineers interact with it through a simple configuration that maps sensors to machines, and defines the pipelines and algorithms to build. They can do so declaratively or through a UI. Changes to the configuration automatically kick off the build and deploy process through GitOps.
To guarantee quality, the product should include automated validation and QC mechanisms that are transparent to the users, freeing them to apply their innovative skills.
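As an illustration of such a configuration, the Python sketch below maps sensors to machines, checks the configuration, and turns it into a list of build jobs. Everything in it is an assumption made for the example: the machine and sensor names, the algorithm labels, and the `validate` and `plan_builds` helpers are hypothetical and do not describe an existing product.

```python
# Hypothetical configuration an engineer might write or generate through a UI.
CONFIG = {
    "compressor_line_a": {
        "sensors": ["inlet_pressure", "outlet_pressure", "vibration"],
        "algorithm": "autoencoder",
    },
    "compressor_line_b": {
        "sensors": ["inlet_pressure", "vibration"],
        "algorithm": "autoencoder",
    },
}

# Hypothetical set of algorithms the product knows how to build.
SUPPORTED_ALGORITHMS = {"autoencoder", "lstm_forecaster"}


def validate(config):
    """Return a list of human-readable problems; an empty list means the config is valid."""
    errors = []
    for machine, spec in config.items():
        sensors = spec.get("sensors", [])
        if not sensors:
            errors.append(f"{machine}: at least one sensor is required")
        if len(set(sensors)) != len(sensors):
            errors.append(f"{machine}: duplicate sensor names")
        if spec.get("algorithm") not in SUPPORTED_ALGORITHMS:
            errors.append(f"{machine}: unsupported algorithm {spec.get('algorithm')!r}")
    return errors


def plan_builds(config):
    """Turn a valid configuration into one build job per machine."""
    errors = validate(config)
    if errors:
        raise ValueError("; ".join(errors))
    return [
        {"machine": machine, "inputs": list(spec["sensors"]), "algorithm": spec["algorithm"]}
        for machine, spec in sorted(config.items())
    ]


if __name__ == "__main__":
    for job in plan_builds(CONFIG):
        print(job)
```

Keeping the configuration declarative means that adding a machine is a matter of adding one entry, and the validation step can reject a faulty entry before anything is built.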
GitOps satisfies the third constraint, but the engineer’s job needs to involve more statistics and ML.