Stable Diffusion Serving

In this tutorial, we will show how to serve a Stable Diffusion model that generates images from text, using EverlyAI's Local Code feature.

Step 1. Develop the server

As the first step, we will develop the Stable Diffusion server. For the purposes of this tutorial, we will use FastAPI as the web framework. The code is shown below.

On lines 13 to 16, we load the Stable Diffusion model, stabilityai/stable-diffusion-2, using the Hugging Face Diffusers library. You can switch to any library or framework you prefer to load and optimize models.

On lines 23 to 29, we define a FastAPI POST endpoint that users call to generate an image. The endpoint is available at /generate and expects a JSON body with a prompt key, for example {"prompt": "an astronaut riding a horse"}. We use FastAPI's FileResponse to return the generated image directly.

EverlyAI allows users to define any number of endpoints. For the purposes of this tutorial, we define another endpoint, /random/secret/message, on lines 31 to 33.

server.py
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler
import fastapi
from fastapi.responses import FileResponse
import os
import uuid
from pydantic import BaseModel
 
class GenerateRequest(BaseModel):
    prompt: str
 
# Model initialization
model_id = "stabilityai/stable-diffusion-2"
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
 
filepath_prefix = "/tmp/sd_server_demo"
os.makedirs(filepath_prefix, exist_ok=True)
 
app = fastapi.FastAPI()
 
@app.post('/generate')
def generate(request: GenerateRequest):
    # Optionally, we can load a different SD model or LoRA weights from the EverlyAI file system here.
    image = pipe(request.prompt).images[0]
    path = os.path.join(filepath_prefix, uuid.uuid4().hex + '.jpg')
    image.save(path)
    return FileResponse(path=path, media_type='image/jpeg')
 
@app.get("/random/secret/message")
def message():
    return "Wow!"

requirements.txt
accelerate==0.27.2
annotated-types==0.6.0
anyio==4.3.0
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
diffusers==0.26.3
exceptiongroup==1.2.0
fastapi==0.110.0
filelock==3.13.1
fsspec==2024.2.0
h11==0.14.0
huggingface-hub==0.20.3
idna==3.6
importlib-metadata==7.0.1
Jinja2==3.1.3
MarkupSafe==2.1.5
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.4
packaging==23.2
pillow==10.2.0
psutil==5.9.8
pydantic==2.6.2
pydantic_core==2.16.3
PyYAML==6.0.1
regex==2023.12.25
requests==2.31.0
safetensors==0.4.2
scipy==1.12.0
sniffio==1.3.0
starlette==0.36.3
sympy==1.12
tokenizers==0.15.2
torch==2.2.1
tqdm==4.66.2
transformers==4.38.1
typing_extensions==4.9.0
urllib3==2.2.1
uvicorn==0.27.1
zipp==3.17.0

Verify locally

It is highly recommended to first verify that the code works locally. If no GPU is available locally, we can comment out line 16 (pipe = pipe.to("cuda")) so the model stays on CPU.
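In that case, the adjusted model initialization might look like the sketch below; note that torch_dtype=torch.float16 is dropped as well, since half precision is generally not supported on CPU. This is for local verification only; the deployed server should keep the GPU lines as-is.

# CPU-only variant of the model initialization, for local verification.
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler)
# pipe = pipe.to("cuda")  # commented out: no GPU available locally

Then run the following command,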

uvicorn server:app

and navigate to http://localhost:8000/docs. You can see all the endpoints there and use the web UI to send requests.
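Alternatively, we can exercise the /generate endpoint from a short client script. The sketch below uses the requests library (already pinned in requirements.txt); the prompt and the output filename are illustrative only.

client.py
import requests

# Send a prompt to the local server and save the returned JPEG.
resp = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "a photograph of an astronaut riding a horse"},
)
resp.raise_for_status()

with open("output.jpg", "wb") as f:
    f.write(resp.content)
print("Saved image to output.jpg")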

Step 2. Package the code and start the server

Now that we have verified the server works locally, we will serve it on EverlyAI for efficiency, scalability, and reliability. As the first step, we will create a file, everlyai_entrypoint.sh, that tells EverlyAI how to run our code.

everlyai_entrypoint.sh
pip install -r requirements.txt
uvicorn server:app --host=0.0.0.0 --port=8000

After that, run the command below to package all the code.

zip code.zip everlyai_entrypoint.sh requirements.txt server.py

Navigate to the Project page and use the following configuration.

  • Job type: Model Serving
  • Code: Local code
  • Upload code zip file: select the zip generated in the previous step
  • Enable project API validation: OFF

💡

We disable project API validation so that we can use the Swagger UI directly in the next step. If your server does not implement its own authorization or access-control logic, you should enable project API validation.

Step 3. Query the server

Once the GPU instance has transitioned to the RUNNING state, we can send requests to the server. The Swagger UI is available at https://<public-domain>/docs. Following is an example of querying the server from Python.
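This is a minimal sketch; replace <public-domain> with the domain shown on your project page, and note that the prompt is illustrative only.

import requests

# Generate an image and save the returned JPEG.
resp = requests.post(
    "https://<public-domain>/generate",
    json={"prompt": "a watercolor painting of a lighthouse at sunset"},
)
resp.raise_for_status()
with open("output.jpg", "wb") as f:
    f.write(resp.content)

# The extra endpoint defined earlier is reachable the same way.
print(requests.get("https://<public-domain>/random/secret/message").json())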