Text Generation Inference (TGI)

Deploy TGI for high-performance text generation using the most popular open-source LLMs


Overview

Text Generation Inference (TGI) is a powerful toolkit designed for deploying and serving Large Language Models (LLMs). It supports high-performance text generation across popular open-source LLMs, including Llama, Falcon, StarCoder, BLOOM, GPT-NeoX, and T5.

This Starter deploys Text Generation Inference (TGI) to Koyeb in one click. By default, it runs on an NVIDIA RTX 4000 SFF Ada GPU Instance and serves the Qwen/Qwen2.5-1.5B model. You can change the model during deployment by editing the MODEL_ID environment variable.
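
Once the Service is running, you can send generation requests to its public URL. The sketch below uses Python's requests library against TGI's native /generate endpoint; the URL is a placeholder for the one Koyeb assigns to your deployment.

```python
import requests

# Placeholder public URL: replace with the URL shown in the Koyeb
# control panel after your Service is deployed.
TGI_URL = "https://your-app-your-org.koyeb.app"

# TGI's /generate endpoint takes a prompt plus generation parameters
# and returns the completion in a single response.
response = requests.post(
    f"{TGI_URL}/generate",
    json={
        "inputs": "What is Text Generation Inference?",
        "parameters": {"max_new_tokens": 128, "temperature": 0.7},
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["generated_text"])
```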

Configuration

Text Generation Inference (TGI) must run on a GPU Instance type. During initialization, TGI downloads the specified model from Hugging Face.

To change the deployed model, set the MODEL_ID value in the Environment variables section to the ID of the model you want to deploy. You can verify the change with the check shown below.
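
After a redeployment, you can confirm which model is being served by querying TGI's /info endpoint, which reports the loaded model ID. A minimal check, again with a placeholder URL:

```python
import requests

# Placeholder Service URL: replace with your own deployment's URL.
TGI_URL = "https://your-app-your-org.koyeb.app"

# TGI's GET /info endpoint describes the running server, including the
# model it loaded. Handy for confirming a MODEL_ID change took effect.
info = requests.get(f"{TGI_URL}/info", timeout=30).json()
print(info["model_id"])  # e.g. "Qwen/Qwen2.5-1.5B"
# Other fields (dtype, token limits) are also reported, though their
# exact names can vary across TGI versions.
```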

When deploying Text Generation Inference (TGI) on Koyeb, environment variables such as MODEL_ID can be configured to customize the deployment.
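
For example, recent TGI releases (1.4 and later) also expose an OpenAI-compatible Messages API at /v1/chat/completions, so tooling built for OpenAI-style endpoints can target the deployment directly. A sketch, assuming the deployed TGI image is new enough to include that route:

```python
import requests

# Placeholder Service URL: replace with your deployment's URL.
TGI_URL = "https://your-app-your-org.koyeb.app"

# The OpenAI-compatible Messages API accepts chat-style requests.
response = requests.post(
    f"{TGI_URL}/v1/chat/completions",
    json={
        "model": "tgi",  # TGI serves a single model; this name is ignored
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```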
