Text Generation Inference (TGI)
Deploy TGI for high-performance text generation using the most popular open-source LLMs
Overview
Text Generation Inference (TGI) is a powerful toolkit designed for deploying and serving Large Language Models (LLMs). It supports high-performance text generation across popular open-source LLMs, including Llama, Falcon, StarCoder, BLOOM, GPT-NeoX, and T5.
This Starter deploys Text Generation Inference (TGI) to Koyeb in one click. By default, it runs on an Nvidia RTX 4000 SFF Ada GPU Instance and serves the Qwen/Qwen2.5-1.5B model. You can change the model during deployment by modifying the MODEL_ID environment variable.
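Once the Service is running, you can call it from any HTTP client. As a minimal sketch (the Service URL below is a placeholder for your own Koyeb deployment URL), here is one way to call TGI's /generate endpoint from Python:

```python
import requests

# Placeholder: replace with your own Koyeb Service URL after deployment.
TGI_URL = "https://your-service-yourorg.koyeb.app"

# TGI's /generate endpoint takes a prompt plus generation parameters
# and returns the generated text as JSON.
response = requests.post(
    f"{TGI_URL}/generate",
    json={
        "inputs": "What is Text Generation Inference?",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["generated_text"])
```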
Configuration
You must run Text Generation Inference (TGI) on a GPU Instance type. During initialization, TGI downloads the specified model from Hugging Face, so the first startup may take a few minutes depending on the model's size.
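Because the model download happens at startup, the Service may not answer requests immediately. A small polling sketch like the following (again using a placeholder Service URL) waits for TGI's /health endpoint to respond before sending traffic:

```python
import time
import requests

# Placeholder: replace with your own Koyeb Service URL.
TGI_URL = "https://your-service-yourorg.koyeb.app"

def wait_until_ready(url: str, interval: float = 5.0, attempts: int = 60) -> None:
    """Poll TGI's /health endpoint until it returns 200 (model loaded)."""
    for _ in range(attempts):
        try:
            if requests.get(f"{url}/health", timeout=5).status_code == 200:
                return
        except requests.RequestException:
            pass  # Service still starting; keep polling.
        time.sleep(interval)
    raise TimeoutError("TGI did not become healthy in time")

wait_until_ready(TGI_URL)
# /info reports, among other details, the model the server actually loaded.
print(requests.get(f"{TGI_URL}/info", timeout=5).json()["model_id"])
```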
To change the deployed model, update the MODEL_ID value in the Environment variables section to the Hugging Face ID of the model you want to serve.
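A typo in MODEL_ID will only surface when the Instance fails to download the model. As a quick sanity check before deploying, you could verify that the ID exists on the Hugging Face Hub, for example with the huggingface_hub client (a sketch; the model ID shown is just an example):

```python
from huggingface_hub import model_info
from huggingface_hub.utils import RepositoryNotFoundError

model_id = "Qwen/Qwen2.5-1.5B"  # the value you plan to set as MODEL_ID

try:
    info = model_info(model_id)
    print(f"Found {info.id} (downloads: {info.downloads})")
except RepositoryNotFoundError:
    print(f"{model_id} does not exist on the Hugging Face Hub")
```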
When deploying Text Generation Inference (TGI) on Koyeb, you can configure the following environment variables to customize the deployment.