Text Generation Inference (TGI)
Deploy TGI for high-performance text generation using the most popular open-source LLMs
Overview
Text Generation Inference (TGI) is a powerful toolkit designed for deploying and serving Large Language Models (LLMs). It supports high-performance text generation across popular open-source LLMs, including Llama, Falcon, StarCoder, BLOOM, GPT-NeoX, and T5.
This Starter deploys Text Generation Inference (TGI) to Koyeb in one click. By default, it runs on an Nvidia RTX 4000 SFF Ada GPU Instance and serves the Qwen/Qwen2.5-1.5B model. To deploy a different model, set the MODEL_ID environment variable during deployment.
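Once the Service is running, you can send generation requests to its public URL. The sketch below uses Python's requests library to call TGI's /generate endpoint; the app URL is a placeholder for your own deployment's URL, and the prompt and parameters are illustrative.

```python
# A minimal sketch of querying a deployed TGI instance over its REST API.
import requests

TGI_URL = "https://<your-app>.koyeb.app"  # hypothetical; use your Service's public URL

payload = {
    "inputs": "What is Text Generation Inference?",
    "parameters": {
        "max_new_tokens": 128,  # cap the length of the generated completion
        "temperature": 0.7,     # sampling temperature
    },
}

# TGI exposes a /generate endpoint for single-shot text generation.
response = requests.post(f"{TGI_URL}/generate", json=payload, timeout=60)
response.raise_for_status()
print(response.json()["generated_text"])
```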
Configuration
Text Generation Inference (TGI) must run on a GPU Instance type. During initialization, TGI downloads the specified model from Hugging Face.
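The download can take several minutes for larger models. As a rough sketch (assuming TGI's standard HTTP API), you can poll the /health and /info endpoints to confirm the model has finished loading; the URL below is a placeholder for your deployment.

```python
# A rough sketch of waiting for a TGI deployment to finish loading its model.
import time

import requests

TGI_URL = "https://<your-app>.koyeb.app"  # hypothetical placeholder

for _ in range(30):
    try:
        # /health returns 200 once the model is loaded and ready to serve.
        if requests.get(f"{TGI_URL}/health", timeout=5).status_code == 200:
            # /info reports which model the server is actually running.
            print(requests.get(f"{TGI_URL}/info", timeout=5).json()["model_id"])
            break
    except requests.RequestException:
        pass  # server not reachable yet; keep polling
    time.sleep(10)
```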
To change the deployed model, update the MODEL_ID value in the Environment variables and files section to the Hugging Face ID of the model you want to deploy (for example, Qwen/Qwen2.5-7B-Instruct).
When deploying Text Generation Inference (TGI) on Koyeb, you can configure the following environment variables to customize the deployment.