Create a Flux Image-Generation API for Free

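Flux is a 12-billion-parameter model, far too large for most people to run locally. In this post I'll show you how to use a serverless container platform, free of charge, to stand up an image-generation API with a single script. Every account gets $30 of free credit each month, which can be spent on high-end GPUs such as the H100 and A100.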

Sign Up on Modal

First, open the Modal website and register an account:

Then go to the Dashboard page. Here you can manage every app you have created, and the top-right corner shows your available balance, which is the $30 credit.

Create a Token

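Before you can create an app locally and upload it, you need to install the command-line tool and request a token.

First, create a local virtual environment: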

mkdir Models && cd Models
python -m venv .venv
source .venv/bin/activate

Then install the package and request a token:

pip install modal
python3 -m modal setup

One-Click Script

Here is the one-click script. It uses FLUX.1-dev together with a detail-enhancing LoRA; see the code for the specifics:

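Create a new file (you can call it Flux_cli.py), paste in the code below, and fill in the two places marked by the comments: the GPU type and the Hugging Face token. If you have no particular GPU requirement, you only need to add your Hugging Face token.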

import io
import random
import modal


MINUTES = 60

app = modal.App("Flux.1-dev")

# config
image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install(
        "accelerate==0.33.0",
        "diffusers==0.31.0",
        "fastapi[standard]==0.115.4",
        "huggingface-hub[hf_transfer]==0.25.2",
        "sentencepiece==0.2.0",
        "torch==2.5.1",
        "torchvision==0.20.1",
        "transformers~=4.44.0",
        "peft"
    )
    .env({"HF_HUB_ENABLE_HF_TRANSFER": "1"})  # faster downloads
)

with image.imports():
    import diffusers
    import torch
    from fastapi import Response


model_id = "black-forest-labs/FLUX.1-dev"


# inference
# Choose the GPU here; you can switch to an H100, for example
@app.cls(
    image=image,
    gpu="A100-40GB",
    timeout=10 * MINUTES,
)
class Inference:
    @modal.build()
    @modal.enter()
    def initialize(self):
        self.pipe = diffusers.FluxPipeline.from_pretrained(
            model_id,
            # NOTE: request an access token on the Hugging Face website,
            # otherwise you cannot download the gated FLUX model
            token="",
            torch_dtype=torch.bfloat16,
        )
        self.pipe.load_lora_weights("Shakker-Labs/FLUX.1-dev-LoRA-add-details", weight_name="FLUX-dev-lora-add_details.safetensors")
        self.pipe.fuse_lora(lora_scale=0.45)

        
    @modal.enter()
    def move_to_gpu(self):
        self.pipe.to("cuda")

    @modal.method()
    def run(
        self, prompt: str, width: int = 768, height: int = 1024, steps: int = 28, cfg: float = 4.5, batch_size: int = 1, seed: int = None
    ) -> list[bytes]:
        seed = seed if seed is not None else random.randint(0, 2**32 - 1)
        print("seeding RNG with", seed)
        torch.manual_seed(seed)
        images = self.pipe(
            prompt,
            # outputting multiple images per prompt is much cheaper than separate calls
            num_images_per_prompt=batch_size,
            num_inference_steps=steps,
            height=height,
            width=width,
            guidance_scale=cfg,
            # T5-XXL text encoder supports longer sequences, more complex prompts
            max_sequence_length=512,
        ).images

        image_output = []
        for image in images:
            with io.BytesIO() as buf:
                image.save(buf, format="PNG")
                image_output.append(buf.getvalue())
        torch.cuda.empty_cache()  # reduce fragmentation
        return image_output

    @modal.web_endpoint(docs=True)
    def web(self, prompt: str, width: int = 768, height: int = 1024, steps: int = 28, cfg: float = 4.5, seed: int = None):
        return Response(
            content=self.run.local(  # run in the same container
                prompt, width=width, height=height, steps=steps, cfg=cfg, batch_size=1, seed=seed
            )[0],
            media_type="image/png",
        )
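
Once the app is deployed, the `web` endpoint can be called with an ordinary GET request whose query parameters mirror the method signature above (`prompt`, `width`, `height`, `steps`, `cfg`, `seed`). The following is a minimal client sketch using only the standard library. `ENDPOINT` is a placeholder, not a real address, and the helper names `build_url` and `fetch_image` are made up for illustration; substitute the URL that belongs to your own deployment.

```python
import urllib.parse
import urllib.request

# Hypothetical placeholder: replace with the URL of your own deployed endpoint.
ENDPOINT = "https://example.invalid/flux"


def build_url(base: str, prompt: str, **params) -> str:
    """Encode the prompt and any optional parameters as a query string."""
    query = {"prompt": prompt}
    # Skip parameters left as None so the endpoint falls back to its defaults.
    query.update({k: v for k, v in params.items() if v is not None})
    return base + "?" + urllib.parse.urlencode(query)


def fetch_image(url: str, out_path: str = "output.png") -> None:
    """GET the endpoint and write the returned PNG bytes to disk."""
    # Generation can be slow on a cold start, so allow a generous timeout.
    with urllib.request.urlopen(url, timeout=600) as resp:
        data = resp.read()
    with open(out_path, "wb") as f:
        f.write(data)


if __name__ == "__main__":
    url = build_url(ENDPOINT, "a cat", width=768, height=1024, steps=28, cfg=4.5, seed=42)
    print(url)
    # fetch_image(url)  # uncomment once ENDPOINT points at a real deployment
```

Passing a fixed `seed` makes a result reproducible; leaving it out lets the server pick a random one, as the `run` method does.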