In the previous blog post about Kubernetes autoscaling, we looked at the concepts and terminology behind autoscaling, such as the Horizontal Pod Autoscaler (HPA) and the Cluster Autoscaler. In this post, we’ll walk through how to implement Kubernetes autoscaling based on custom metrics generated by the application.
Why Custom Metrics?
The CPU or RAM consumption of an application is not always the right metric to scale on. For example, suppose you have a message queue consumer that can handle 500 messages per second without crashing. Once a single instance of this consumer is handling close to 500 messages per second, you may want to scale the application to two instances so that the load is distributed between them. Measuring CPU or RAM is a fundamentally flawed approach to scaling such an application, because neither tells you how close the consumer is to its message-handling limit; you need a metric that relates more closely to the application’s nature. The number of messages that an instance is processing at a given point in time is a better indicator of the actual load on that application. Similarly, other applications may have other metrics that make more sense, and these can be defined using custom metrics in Kubernetes.
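To make the consumer example concrete, here is a minimal Python sketch of the ratio rule the HPA documentation describes for picking a replica count: desired replicas = ceil(current replicas × current metric value / target metric value). The function name and the sample numbers are illustrative assumptions, not part of any Kubernetes API, and the sketch leaves out the HPA's tolerance band, stabilization behaviour, and min/max replica bounds.

```python
import math


def desired_replicas(current_replicas: int,
                     current_per_pod: float,
                     target_per_pod: float) -> int:
    """Replica count suggested by the HPA ratio rule.

    Hypothetical helper for illustration only: it ignores the HPA's
    tolerance, stabilization window, and min/max replica limits.
    """
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    ratio = current_per_pod / target_per_pod
    # Round up so the per-pod load lands at or below the target; never go below 1.
    return max(1, math.ceil(current_replicas * ratio))


if __name__ == "__main__":
    # Assumed numbers: one consumer averaging 900 msg/s against a 500 msg/s target.
    print(desired_replicas(current_replicas=1,
                           current_per_pod=900,
                           target_per_pod=500))
```

In practice you would probably set the target somewhat below the 500 messages per second that a consumer can survive, so that new instances are added before any single one reaches its limit.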