Sep 04, 2024

Deploy Portkey Gateway to Koyeb to Streamline Requests to 200+ LLMs

Introduction

Since their debut in the ML/AI landscape, large language models (LLMs) have seen widespread adoption and deliver significant value across diverse fields. Today, a variety of LLMs are available, each with its own capabilities and specialized strengths. Integrating several of them into a single software product lets you build AI-powered features that adapt to diverse requirements and workloads with increased reliability and robustness, resulting in an improved overall user experience.

Portkey, a control panel for AI apps, offers a suite of development tools to help with this. Among them is AI Gateway, which connects, load balances, and manages multiple LLMs through a single, consistent API. Portkey's AI Gateway supports over 200 LLMs, offers access to vision, audio, and image generation capabilities, and keeps your application running by allowing model switching during failures.

In this tutorial, you will create a simple LLM querying application with the option to submit questions to two different LLMs, Llama 3.1 served by Together AI and Mixtral 8x7B served by Groq, using Portkey's AI Gateway.

Prerequisites

To successfully follow this tutorial, you'll need:

  • Node.js and npm installed. The demo app in this tutorial uses version 20 of Node.js.
  • A Together AI account.
  • A Groq account.
  • A Koyeb account.

Get LLM API Keys

The two LLMs used in this tutorial require valid API keys for access. In this section, you'll obtain the API keys for both.

First, log into your Together AI account. Click the profile icon in the top right corner and go to the settings page. Then, navigate to the API KEYS tab, copy your API key, and store it securely for future use.

Next, log into your Groq account. In the left sidebar, click the API Keys link and click the Create API Key button to create an API key. Copy your API key and store it securely for future use.

In the next section, you will set up Portkey's AI Gateway on Koyeb using its Docker image.

Deploy the AI Gateway

Portkey provides, amongst other options, a Docker image for deploying the AI Gateway. This ready-to-use service provides an authenticated API on port 8787, with endpoints for chat and image features from supported LLMs.

To access the AI Gateway API, you must first deploy the Docker image and start the service. Begin by logging into your Koyeb control panel and following these steps:

  1. Click the Create Service button in the sidebar.
  2. Choose the Docker web service option.
  3. Type portkeyai/gateway:latest into the Docker image field.
  4. Select your preferred instance and region.
  5. In the Exposed ports section, change the Port value to 8787.
  6. Choose a name for your service in the Service name section.
  7. Click Deploy.

Koyeb pulls and runs the AI Gateway Docker image for you. Once the deployment is finished, make sure to copy the service's public URL and save it for future reference.

In the next section, you'll create an npm project for the demo application.

Create a demo project

In this section, you'll set up an npm project and install the essential packages for the demo application. To get started, run the following command in your terminal:

mkdir example-portkey

The command creates an example-portkey directory on your development machine, which will be the application's root directory. Next, run the commands below to initialize a Git repository within the example-portkey directory:

cd example-portkey
git init

The first command switches your terminal to the example-portkey directory, and the second command initializes a Git repository within the directory.

Next, initialize an npm project in the root directory by running this command in your terminal:

npm init -y

The command above creates an npm project with the default configurations in the example-portkey directory, creating a package.json file in the process. Next, install the required packages by executing the commands below:

npm install axios body-parser ejs express
npm install -D dotenv nodemon

These commands install the specified JavaScript packages from the npm registry. The -D flag in the second command marks its packages as development-only dependencies. The packages installed by the first command include:

  • axios: A promise-based HTTP client for the browser and Node.js.
  • body-parser: A body parsing middleware for Node.js.
  • ejs: A JavaScript templating engine.
  • express: A web framework for Node.js.

The development-only packages include:

  • dotenv: A package for handling environment variables during development.
  • nodemon: A package that automatically restarts development servers whenever code changes are detected.

With the packages installed, you've set up an npm project for the demo application. Next, you'll configure an Express service for the application.

Set up the Express server

In this section, you'll configure an Express web server for the demo application.

First, create a file named index.js in the root directory. Then, add the following code to that file:

require('dotenv').config()

const express = require('express')
const path = require('path')
const bodyParser = require('body-parser')

const app = express()
const port = process.env.PORT || 3000

app.use(express.json())
app.use(bodyParser.urlencoded({ extended: true }))

app.set('view engine', 'ejs')
app.set('views', path.join(__dirname, 'views'))

app.get('/', (_req, res) => {
  res.render('index')
})

app.listen(port, () => {
  console.log(`Server is running on http://localhost:${port}`)
})

The code begins by importing the following packages:

  • dotenv: to manage environment variables.
  • express: to create and manage a web server.
  • path: to handle file and directory paths.
  • body-parser: to parse the body of incoming requests.

It then creates an instance of an Express application and sets the server to listen on the port defined by the PORT environment variable, using port 3000 if the variable is not set. The server is configured to parse JSON and URL-encoded data, uses ejs as the view engine, and looks for EJS templates in the views directory.

A route handler is defined for the root path (/), which renders the index view when accessed. Finally, the server starts listening for requests on the specified port and logs a confirmation message that it is running.
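
To make the port fallback concrete, here is a minimal sketch of the same logic as a standalone function. The `resolvePort` helper is hypothetical and not part of the tutorial's index.js. It mirrors `process.env.PORT || 3000`, but also converts the value to a number and rejects anything that is not a usable port.

```javascript
// Hypothetical helper (not in the tutorial's index.js): resolve the port
// the way `process.env.PORT || 3000` does, but return a validated number.
function resolvePort(env, fallback = 3000) {
  const raw = env.PORT
  // An unset or empty PORT falls back to the default, mirroring `||`.
  if (!raw) return fallback

  const port = Number(raw)
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT value: ${raw}`)
  }
  return port
}

console.log(resolvePort({})) // 3000
console.log(resolvePort({ PORT: '8000' })) // 8000
```

Passing the environment in as an argument, rather than reading `process.env` inside the function, keeps the helper easy to exercise without touching the real environment.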

Now that the Express server is set up, the next section will walk you through creating a page to query the LLMs.

Set up query page

The LLM query page will include a form with an input field for questions, a dropdown menu to select the LLM, and a submit button. Upon submission, the LLM's response will be displayed on the page.

To begin, create a views directory in the root of your project:

mkdir views

Inside this new views directory, create an index.ejs file and add the following code to it:

<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Portkey Gateway Questionnaire</title>
    <link
      href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css"
      rel="stylesheet"
      integrity="sha384-QWTKZyjpPEjISv5WaRU9OFeRpok6YctnYmDr5pNlyT2bRjXh0JMhjY6hW+ALEwIH"
      crossorigin="anonymous"
    />
  </head>

  <body>
    <div class="container py-4">
      <div class="bg-light rounded-3 mb-4 p-5">
        <div class="container-fluid">
          <h1 class="display-5 fw-bold">Ask a Question</h1>
          <div class="col-md-8 fs-6">
            <form id="questionForm" method="POST" action="/ask">
              <div class="mb-3">
                <label for="question">Question</label>
                <input
                  type="text"
                  class="form-control col-6"
                  id="question"
                  name="question"
                  placeholder="Type your question here"
                  required
                />
              </div>
              <div class="mb-3">
                <label for="model">Model</label>
                <select class="form-control col-6" id="model" name="model" required>
                  <option value="together">Together AI</option>
                  <option value="groq">Groq</option>
                </select>
              </div>
              <button type="submit" class="btn btn-primary">Ask</button>
            </form>

            <% if(typeof response !=='undefined' ) {%>
            <h2 class="display-7 fw-bold mt-5">Answer:</h2>
            <p id="answer" class="h-100 text-bg-dark rounded-3 px-3 py-3"><%= response %></p>
            <%}%>
          </div>
        </div>
      </div>
    </div>
  </body>
</html>

The code added in the file above provides the HTML structure for the index view, which is rendered by the root route handler. It contains:

  • Bootstrap for styling.
  • An HTML form with an input field and a select dropdown.
  • A submit button.
  • A section to display the LLM response.
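
To see what the form actually submits, the sketch below builds the URL-encoded body a browser would send for the two named fields and decodes it again. It uses only Node's built-in `URLSearchParams`. The question text is made up for illustration, and the decoded object only approximates what a urlencoded body parser exposes as `req.body`.

```javascript
// Build the body a browser would send for the form. The field names come
// from the `name` attributes in index.ejs; the question text is made up.
const formBody = new URLSearchParams({
  question: 'What is Koyeb?',
  model: 'groq',
}).toString()

console.log(formBody) // question=What+is+Koyeb%3F&model=groq

// Decode it back into a plain object, roughly what a urlencoded body
// parser makes available as req.body.
const parsed = Object.fromEntries(new URLSearchParams(formBody))

console.log(parsed) // { question: 'What is Koyeb?', model: 'groq' }
```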

To view the page, modify the scripts section of the package.json file as follows, adding the dev line:

. . .
"scripts": {
  "dev": "nodemon index.js",
  "test": "echo \"Error: no test specified\" && exit 1"
}
. . .

The code adds a dev script for starting the development server. It executes the index.js file using nodemon.

To run the demo application on your local machine, enter the following command in your terminal:

npm run dev

Running the command starts the Express server and shows a message confirming that it's running on the specified port. To view the page, open your web browser and go to http://localhost:<YOUR_PORT>. You should see the query form displayed.

In the next section, you'll set up the logic to query the LLMs through the AI Gateway.

Add LLM querying functionality

The AI Gateway provides a chat endpoint at /v1/chat/completions where you can send POST requests to generate LLM responses for chat conversations. In this section, you'll add a route handler to process form data, call the chat endpoint to get a response, and return the response to the page.
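
Before wiring this into the application, it can help to look at the shape of such a request in isolation. The sketch below assembles the URL, headers, and JSON body for a chat completion call without sending anything. The `buildChatRequest` helper, the gateway URL, and the API key are placeholders invented for this example. The header names and payload fields are the ones the demo application uses.

```javascript
// Hypothetical helper: assemble a chat completion request for the gateway
// without sending it. This is not part of any Portkey SDK.
function buildChatRequest({ gatewayUrl, provider, apiKey, model, question }) {
  return {
    url: `${gatewayUrl}/v1/chat/completions`,
    method: 'POST',
    headers: {
      // The provider's own API key is passed through as a bearer token.
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
      // Tells the gateway which provider should receive the request.
      'x-portkey-provider': provider,
    },
    body: JSON.stringify({
      model,
      messages: [{ role: 'user', content: question }],
    }),
  }
}

const request = buildChatRequest({
  gatewayUrl: 'https://gateway.example.com', // placeholder URL
  provider: 'groq',
  apiKey: 'placeholder-key', // placeholder, not a real key
  model: 'mixtral-8x7b-32768',
  question: 'What is Koyeb?',
})

console.log(request.url) // https://gateway.example.com/v1/chat/completions
```

Keeping request construction separate from sending makes it possible to inspect or log the outgoing request (with the key redacted) when a call to the gateway fails.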

First, create a .env file in your root directory and add the code below to the file, substituting your own API keys and gateway URL:

TOGETHER_API_KEY="<YOUR TOGETHER API KEY>"
GROQ_API_KEY="<YOUR GROQ API KEY>"
GATEWAY_URL="<YOUR DEPLOYED AI GATEWAY URL>" # URL without the trailing slash (/)

Since the environment variables entered above are sensitive, make sure they aren't committed to your Git history. To prevent this, run the following command in your terminal:

printf "%s\n" ".env" "node_modules" > .gitignore

The command creates a .gitignore file and adds the .env file and node_modules directory to it, excluding them from the Git history.
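
Because a missing key or a trailing slash in the gateway URL only shows up as a failed request later, one option is to validate the settings at startup. The following is a minimal sketch of that idea, not part of the demo application. `loadSettings` and `REQUIRED_VARS` are hypothetical names, while the variable names match the .env file.

```javascript
// Hypothetical startup check; the variable names match the .env file.
const REQUIRED_VARS = ['TOGETHER_API_KEY', 'GROQ_API_KEY', 'GATEWAY_URL']

function loadSettings(env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name])
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`)
  }

  return {
    togetherApiKey: env.TOGETHER_API_KEY,
    groqApiKey: env.GROQ_API_KEY,
    // Drop trailing slashes so `${gatewayUrl}/v1/...` has a single slash.
    gatewayUrl: env.GATEWAY_URL.replace(/\/+$/, ''),
  }
}

// Example with placeholder values instead of the real process.env.
const settings = loadSettings({
  TOGETHER_API_KEY: 'placeholder',
  GROQ_API_KEY: 'placeholder',
  GATEWAY_URL: 'https://gateway.example.com/',
})

console.log(settings.gatewayUrl) // https://gateway.example.com
```

In the real application you would call `loadSettings(process.env)` after `require('dotenv').config()` has run.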

Next, update the code in your index.js file to match the following:

require('dotenv').config()

const express = require('express')
const path = require('path')
const bodyParser = require('body-parser')
const axios = require('axios')

const app = express()
const port = process.env.PORT || 3000

// Middleware
app.use(express.json())
app.use(bodyParser.urlencoded({ extended: true }))

// set up EJS as view engine
app.set('view engine', 'ejs')
app.set('views', path.join(__dirname, 'views'))

// Provider slug, model name, and API key for each selectable LLM
const MODEL_MAP = {
  groq: {
    providerSlug: 'groq',
    model: 'mixtral-8x7b-32768',
    apiKey: process.env.GROQ_API_KEY,
  },
  together: {
    providerSlug: 'together-ai',
    model: 'meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo',
    apiKey: process.env.TOGETHER_API_KEY,
  },
}

app.get('/', (_req, res) => {
  res.render('index')
})

// Handle submissions from the query form
app.post('/ask', async (req, res) => {
  const { question, model } = req.body
  const modelInfo = MODEL_MAP[model]

  if (!modelInfo) {
    return res.status(400).json({ error: 'Model not found' })
  }

  const { providerSlug: provider, apiKey, model: modelName } = modelInfo
  const data = {
    model: modelName,
    messages: [{ role: 'user', content: question }],
  }

  try {
    // Forward the question to the AI Gateway's chat endpoint
    const url = `${process.env.GATEWAY_URL}/v1/chat/completions`
    const response = await axios.post(url, data, {
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
        'x-portkey-provider': provider,
      },
    })

    // Re-render the page with the first choice's message as the answer
    res.render('index', { response: `${response.data.choices[0].message.content}` })
  } catch (error) {
    res.status(500).json({ error: error.message })
  }
})

app.listen(port, () => {
  console.log(`Server is running on http://localhost:${port}`)
})

The modified code imports the axios library and defines a MODEL_MAP object, which stores the configurations for two LLMs. Each configuration includes the provider's name, the model name, and the API key needed for access.

Next, the code sets up a POST route handler for the /ask endpoint. When a request is received, it extracts the question and model from the request body. It then looks up the model's configuration in the MODEL_MAP object and returns an error if the model is not found.
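
One subtlety of looking up user input in a plain object is that inherited property names such as `constructor` also resolve to a truthy value. The sketch below shows one way to guard against that with `Object.hasOwn`, which is available in the Node.js version this tutorial uses. The `findModel` helper is hypothetical, and the map is a trimmed-down copy without API keys.

```javascript
// Trimmed-down copy of MODEL_MAP (API keys omitted for the example).
const MODEL_MAP = {
  groq: { providerSlug: 'groq', model: 'mixtral-8x7b-32768' },
  together: {
    providerSlug: 'together-ai',
    model: 'meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo',
  },
}

// Hypothetical helper: only accept keys defined directly on MODEL_MAP,
// ignoring anything inherited from Object.prototype.
function findModel(name) {
  return Object.hasOwn(MODEL_MAP, name) ? MODEL_MAP[name] : undefined
}

console.log(findModel('groq').providerSlug) // groq
console.log(findModel('constructor')) // undefined
console.log(Boolean(MODEL_MAP['constructor'])) // true (inherited property)
```

With a guard like this, an unexpected `model` value leads to the 400 response instead of a request with undefined headers.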

Afterwards, it creates the request payload for the AI Gateway and sends the request, including the API key in the Authorization header and the provider name in the x-portkey-provider header.

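Finally, the handler renders the index view with the answer taken from the first choice in the AI Gateway's response. If the request to the gateway fails, it responds with a 500 status and the error message.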
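
The handler reads the answer from `choices[0].message.content`, which throws if a provider returns an empty or differently shaped body. As a hedged sketch, a small helper with optional chaining can fall back to a default message instead. `extractAnswer` and its fallback text are assumptions made for this example.

```javascript
// Hypothetical helper: pull the answer text out of a chat completion body,
// returning a fallback string when the expected fields are absent.
function extractAnswer(body, fallback = 'No answer returned.') {
  const content = body?.choices?.[0]?.message?.content
  return typeof content === 'string' && content.length > 0 ? content : fallback
}

// Example bodies; the first follows the shape the route handler expects.
const ok = { choices: [{ message: { role: 'assistant', content: 'Hello!' } }] }
const empty = { choices: [] }

console.log(extractAnswer(ok)) // Hello!
console.log(extractAnswer(empty)) // No answer returned.
```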

To test the functionality, start the server, open the UI page in your browser, enter a question, choose your preferred LLM, and submit. The response should appear on the page.

In the next section, you will deploy the demo application online on Koyeb.

Deploy to Koyeb

The demo application is now complete and interacts with the deployed AI Gateway service to answer questions using two different LLMs. The final step is to deploy the demo application to the cloud on Koyeb.

To get started, update the scripts section in your package.json file, adding the start line:

...
"scripts": {
  "dev": "nodemon index.js",
  "start": "node index.js",
  "test": "echo \"Error: no test specified\" && exit 1"
}
...

The code above modifies the scripts section of the package.json file, adding a start script which runs the index.js file using node.

Next, create a GitHub repository for your code, then use the following commands to commit your local code and push it to the repository:

git add --all
git commit -m "Complete AI Gateway powered LLM query app."
git remote add origin git@github.com:<YOUR_GITHUB_USERNAME>/<YOUR_REPOSITORY_NAME>.git
git branch -M main
git push -u origin main

To deploy the code from the GitHub repository, go to the Koyeb control panel. Then, on the Overview page:

  1. Click Create Service in the left sidebar.
  2. Choose the GitHub deploy option.
  3. Search for and select your repository. Alternatively, you can use the public example repo for this article by pasting the following in the Public GitHub repository field: https://github.com/koyeb/example-portkey.
  4. Choose your preferred instance and deployment region.
  5. Under Environment variables, for each variable in your .env file:
     • Enter the variable name.
     • Select Secret as the type.
     • For the value, click Create secret, then specify the secret name and value, and click Create.
  6. In the Service name section, enter a name for the service or use the default.
  7. Click Deploy to start the deployment.

The Koyeb platform builds and deploys your code, then starts the application using the start script from the package.json file. You can track the deployment progress through the provided logs. Once the deployment is complete and health checks pass, your application will be up and running.

Click the provided public URL to access your live application.

Conclusion

In this tutorial, you built a simple application that queries two different LLMs using Portkey's AI Gateway. The AI Gateway offers more than just chat completion, with features like caching, fallbacks, and load balancing. For more details on these features, refer to the Portkey Gateway documentation.
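
As a rough illustration of the fallback feature, the sketch below builds request headers that carry a gateway config instead of a single provider. It assumes that the gateway accepts a JSON config in an `x-portkey-config` header, with a `strategy.mode` of `fallback` and a `targets` list using `provider`, `api_key`, and `override_params` fields. These names are assumptions and should be checked against the Portkey Gateway documentation before use. `buildFallbackHeaders` is a hypothetical helper.

```javascript
// Sketch only: the header name and config fields below are assumptions
// about Portkey's gateway config format; verify them in the official docs.
function buildFallbackHeaders({ groqApiKey, togetherApiKey }) {
  const config = {
    strategy: { mode: 'fallback' },
    // Targets are listed in the order they should be tried.
    targets: [
      {
        provider: 'groq',
        api_key: groqApiKey,
        override_params: { model: 'mixtral-8x7b-32768' },
      },
      {
        provider: 'together-ai',
        api_key: togetherApiKey,
        override_params: { model: 'meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo' },
      },
    ],
  }

  return {
    'Content-Type': 'application/json',
    'x-portkey-config': JSON.stringify(config),
  }
}

const headers = buildFallbackHeaders({
  groqApiKey: 'placeholder-groq-key',
  togetherApiKey: 'placeholder-together-key',
})

console.log(JSON.parse(headers['x-portkey-config']).strategy.mode) // fallback
```

Under these assumptions, the headers would replace the `Authorization` and `x-portkey-provider` headers in the `/ask` handler, so a failed Groq request is retried against Together AI.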

When your application is deployed from your own repository using the Git deployment option, any code push to the deployed branch will automatically trigger a new build. The changes will go live once the deployment succeeds. If the deployment fails, Koyeb will keep the last successful production deployment active, ensuring your application continues to run without interruption.

