Text Generation Inference (TGI)
Deploy TGI for high-performance text generation using the most popular open-source LLMs
Overview
Text Generation Inference (TGI) is a powerful toolkit designed for deploying and serving Large Language Models (LLMs). It supports high-performance text generation across popular open-source LLMs, including Llama, Falcon, StarCoder, BLOOM, GPT-NeoX, and T5.
This Starter deploys Text Generation Inference (TGI) to Koyeb in one click. By default, it runs on an Nvidia RTX 4000 SFF Ada GPU Instance and serves the Qwen/Qwen2.5-1.5B model. To deploy a different model, set the MODEL_ID environment variable during deployment.
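Once the Service is running, you can send generation requests to its public URL. The sketch below uses Python's requests library to call TGI's /generate endpoint; the app URL is a placeholder for your own deployment's URL, and the prompt and parameters are illustrative.

```python
# A minimal sketch of querying a deployed TGI instance over its REST API.
import requests

TGI_URL = "https://<your-app>.koyeb.app"  # hypothetical; use your Service's public URL

payload = {
    "inputs": "What is Text Generation Inference?",
    "parameters": {
        "max_new_tokens": 128,  # cap the length of the generated completion
        "temperature": 0.7,     # sampling temperature
    },
}

# TGI exposes a /generate endpoint for single-shot text generation.
response = requests.post(f"{TGI_URL}/generate", json=payload, timeout=60)
response.raise_for_status()
print(response.json()["generated_text"])
```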
Configuration
Text Generation Inference (TGI) must run on a GPU Instance type. During initialization, TGI downloads the specified model from Hugging Face.
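The download can take several minutes for larger models. As a rough sketch (assuming TGI's standard HTTP API), you can poll the /health and /info endpoints to confirm the model has finished loading; the URL below is a placeholder for your deployment.

```python
# A rough sketch of waiting for a TGI deployment to finish loading its model.
import time

import requests

TGI_URL = "https://<your-app>.koyeb.app"  # hypothetical placeholder

for _ in range(30):
    try:
        # /health returns 200 once the model is loaded and ready to serve.
        if requests.get(f"{TGI_URL}/health", timeout=5).status_code == 200:
            # /info reports which model the server is actually running.
            print(requests.get(f"{TGI_URL}/info", timeout=5).json()["model_id"])
            break
    except requests.RequestException:
        pass  # server not reachable yet; keep polling
    time.sleep(10)
```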
To change the deployed model, update the MODEL_ID value in the Environment variables and files section to the Hugging Face ID of the model you want to deploy (for example, Qwen/Qwen2.5-7B-Instruct).
When deploying Text Generation Inference (TGI) on Koyeb, you can configure the following environment variables to customize the deployment.