MLOps Platform

Network Configuration

When you create a project for serving, we bring up two components:

  1. An L4 load balancer for extremely fast and efficient load balancing.
  2. A set of machines/servers to run your code.

Your client talks directly to the servers that run your code, without hopping through any of our server stack, as shown in the figure below.

The following network configurations can be customized for your applications.

Sticky Session

If this is enabled, requests from the same client always go to the same server. We use a two-tuple hash (source IP and destination IP) to route requests to backend instances, so successive requests from the same client IP address are handled by the same backend instance.
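
The routing idea above can be illustrated with a small sketch. This is not the load balancer's actual hash function, which is not specified here; `pick_backend`, the SHA-256 choice, and the example addresses are all hypothetical. It only shows why hashing the (source IP, destination IP) pair keeps a client on one backend as long as the set of backends does not change.

```python
import hashlib


def pick_backend(source_ip: str, dest_ip: str, backends: list[str]) -> str:
    """Deterministically map a (source IP, destination IP) pair to one backend.

    Illustrative only: the real load balancer's hash is an internal detail.
    """
    key = f"{source_ip}|{dest_ip}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer, then pick a slot.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Hypothetical backend instances and client/load balancer addresses.
backends = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
first = pick_backend("203.0.113.7", "198.51.100.1", backends)
second = pick_backend("203.0.113.7", "198.51.100.1", backends)
print(first == second)  # True: same IP pair, same backend
```

Because the source port is not part of the hash, every connection from one client IP address lands on the same instance. Clients that share a public IP address (for example, behind the same NAT) would therefore also share a backend.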

Project API validation

When this is enabled, users must include the project API key in the request header. For example, with cURL, users need to add -H "Authorization: Bearer <PROJECT_API_KEY>". Otherwise, the request is rejected. Enable this when you do not have access control implemented on your servers.
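
The same header can be set from Python. The following is a minimal sketch using only the standard library; the URL and the `build_request` helper are hypothetical placeholders, and `<PROJECT_API_KEY>` stands for your real key.

```python
import urllib.request


def build_request(url: str, project_api_key: str) -> urllib.request.Request:
    """Build a request that carries the project API key as a Bearer token."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Bearer {project_api_key}")
    return request


# Hypothetical endpoint; replace with the address of your own project.
request = build_request("https://example.invalid/predict", "<PROJECT_API_KEY>")
print(request.get_header("Authorization"))
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to check the header before any traffic leaves the client.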

Health probe path

The load balancer periodically sends a request to the probe path. If it receives a non-200 status, it considers the server unhealthy and stops sending traffic to it. You can leave this empty to use the default check.

To customize this behavior, implement a health check endpoint on your server that returns a 200 status when the server is healthy. For example:

@app.get('/my_health')
def health():
    return 'OK'

You can then use /my_health as the value for the health probe path when creating the project.
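
Since any non-200 status marks a server as unhealthy, a health endpoint can also be used to keep traffic away from a server that is still starting up. The sketch below shows one way to do that with only the Python standard library. The `MODEL_READY` flag, `HealthHandler`, `start_server`, and `probe` are hypothetical names, and `probe` merely imitates what a health probe would observe; it is not the load balancer's implementation.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical readiness flag: set it once your model has finished loading.
MODEL_READY = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/my_health":
            self.send_error(404)
            return
        # 200 means healthy; any other status is treated as unhealthy.
        status = 200 if MODEL_READY.is_set() else 503
        body = b"OK" if status == 200 else b"NOT READY"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_server() -> HTTPServer:
    """Start the health server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(port: int) -> int:
    """Return the status code a probe of /my_health would receive."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/my_health")
        return conn.getresponse().status
    finally:
        conn.close()


server = start_server()
port = server.server_address[1]
print(probe(port))   # status while the model is still loading
MODEL_READY.set()
print(probe(port))   # status once the server is ready
server.shutdown()
server.server_close()
```

In a real deployment the endpoint would live in the same application that serves your code, so that the reported status reflects the process actually handling requests.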