Auto Scaling

EverlyAI automatically adjusts the number of machines based on load. This lets you

  1. Increase the number of machines when there is a traffic spike.
  2. Reduce the number of machines when traffic volume drops, to save cost.

This feature is only available for model serving, not for model training.

Horizontal scaling

At the most basic level, EverlyAI scales on the ratio between the current metric value and the desired metric value:

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]

For example, if the current metric value is 20 and the desired value is 10, the number of replicas will be doubled, since 20 / 10 == 2.0. If the current value is instead 5, the number of replicas will be halved, since 5 / 10 == 0.5. Because the result is rounded up, a fractional result always becomes the next whole replica.
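
The formula can be tried out directly. The following is a minimal Python sketch of the calculation only, not EverlyAI's implementation; the function name desired_replicas and its parameter names are hypothetical and chosen for illustration.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric_value: float,
                     desired_metric_value: float) -> int:
    """Apply desiredReplicas = ceil(currentReplicas * current / desired)."""
    if desired_metric_value <= 0:
        # Assumption: a non-positive target is treated as a configuration error.
        raise ValueError("desired_metric_value must be positive")
    ratio = current_metric_value / desired_metric_value
    return math.ceil(current_replicas * ratio)


# Metric at twice the target: the replica count doubles.
print(desired_replicas(4, 20, 10))
# Metric at half the target: the replica count halves.
print(desired_replicas(4, 5, 10))
# A fractional result (3 * 0.5 = 1.5) is rounded up.
print(desired_replicas(3, 5, 10))
```

With 4 replicas, the two cases from the example give 8 and 2 replicas; the last call shows the effect of rounding up.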

EverlyAI supports a few metrics, such as

  • CPU utilization
  • Memory utilization
  • GPU utilization

The currentMetricValue is computed by taking the average of the given metric across all instances in the project.
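
As an illustration of the averaging step, the sketch below takes per-instance readings of one metric, averages them, and feeds the result into the scaling ratio. The function names, the sample readings, and the 40% target are assumptions made for this example, not values or APIs taken from EverlyAI.

```python
import math
from statistics import mean


def current_metric_value(per_instance_values: list[float]) -> float:
    """Average one metric (for example GPU utilization) across all instances."""
    if not per_instance_values:
        # Assumption: with no running instances there is nothing to average.
        raise ValueError("at least one instance reading is required")
    return mean(per_instance_values)


def replicas_for(per_instance_values: list[float], desired_metric_value: float) -> int:
    """Combine the project-wide average with ceil(current * avg / desired)."""
    average = current_metric_value(per_instance_values)
    return math.ceil(len(per_instance_values) * average / desired_metric_value)


# Hypothetical GPU utilization readings (percent) from three instances.
readings = [90.0, 70.0, 80.0]
print(current_metric_value(readings))  # average utilization across the project
print(replicas_for(readings, 40.0))    # replicas needed for a 40% target
```

Here the average is 80.0, twice the 40% target, so the three instances would grow to six. Because the average is taken over the whole project, one busy instance among several idle ones may not trigger a scale-up on its own.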

Scale to zero

You can set the minimum number of machines to zero. EverlyAI will then scale the project down to zero instances once there has been no user traffic for the scale-to-zero grace period (in seconds).

Set the scale-to-zero grace period to a relatively large value to reduce interruptions, so that brief lulls in traffic do not shut down all instances.
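
The scale-to-zero rule can be pictured as a simple check. This is a sketch under stated assumptions, not EverlyAI's actual logic: the function and parameter names are hypothetical, and treating an idle time exactly equal to the grace period as "expired" is a choice made here for illustration.

```python
def should_scale_to_zero(min_machines: int,
                         seconds_since_last_request: float,
                         grace_period_seconds: float) -> bool:
    """Return True when the project may be scaled down to zero instances."""
    if min_machines > 0:
        # A non-zero minimum keeps at least that many machines running.
        return False
    # Assumption: the grace period counts idle time since the last user request.
    return seconds_since_last_request >= grace_period_seconds


# Idle for 5 minutes with a 30-minute grace period: keep running.
print(should_scale_to_zero(0, 300, 1800))
# Idle for 40 minutes with a 30-minute grace period: scale to zero.
print(should_scale_to_zero(0, 2400, 1800))
# A minimum of one machine disables scale to zero regardless of idle time.
print(should_scale_to_zero(1, 2400, 1800))
```

A larger grace period makes the second case rarer, which is why a relatively large value reduces interruptions.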