Feb 28, 2024
33 min read

Build a Retrieval-Augmented Generation Chatbot using pgvector

Introduction

The adoption of AI-driven chatbots is accelerating: they respond quickly and save the effort of manually answering every question. Whether searching over documents or helping users with questions about a product, AI is used everywhere.

But what goes into configuring these chatbots so that they understand unique content? How are they able to generate relevant answers to user queries? Retrieval-Augmented Generation is an AI framework that allows LLMs to extend their existing knowledge based on the unique content passed from external sources.

In this tutorial, you will use OpenAI's embedding API alongside pgvector, an open-source vector similarity search extension for PostgreSQL, to create and deploy a RAG chatbot on Koyeb. It will be able to generate relevant responses with AI while continuously updating its knowledge base in real time.

You can deploy the Retrieval-Augmented Generation chatbot as configured in this guide using the Deploy to Koyeb button below:

Deploy to Koyeb

Note: You will need to replace the values of the environment variables in the configuration with your own REPLICATE_API_TOKEN, OPENAI_API_KEY, and POSTGRES_URL. Remember to add the ?sslmode=require parameter to the POSTGRES_URL value.

Requirements

To successfully follow this tutorial, you will need the following:

  • Node.js and npm installed. The demo app in this tutorial uses version 18 of Node.js.
  • Git installed.
  • An OpenAI account.
  • A Replicate account.
  • A Koyeb account to deploy the application.

Steps

To complete this guide and deploy the Retrieval-Augmented Generation chatbot, you'll work through the steps covered in the sections below.

Generate the Replicate API token

HTTP requests to the Replicate API require an authorization token. To generate this token, log in to your Replicate account and navigate to the API Tokens page. Enter a name for your token and click the Create token button to generate a new token. Copy and securely store this token for later use as the REPLICATE_API_TOKEN environment variable.

Locally, set and export the REPLICATE_API_TOKEN environment variable by executing the following command:

export REPLICATE_API_TOKEN="<YOUR_REPLICATE_TOKEN>"

Generate the OpenAI API token

HTTP requests to the OpenAI API require an authorization token. To generate this token, log in to your OpenAI account and navigate to the API Keys page. Enter a name for your token and click the Create new secret key button to generate a new key. Copy and securely store this token for later use as the OPENAI_API_KEY environment variable.

Locally, set and export the OPENAI_API_KEY environment variable by executing the following command:

export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"

Create a PostgreSQL database on Koyeb

To create a PostgreSQL database, log in to the Koyeb control panel and navigate to the Databases tab. Next, click on the Create Database Service button. Here, either accept or replace the default generated name, choose your preferred region, and confirm or customize the default role. When you are ready, click the Create Database Service button to provision your PostgreSQL database service.

Once you've created the database service, a list of your existing database services will be displayed. From there, select the newly created database service, copy the database connection string, and securely store it for later use as the POSTGRES_URL environment variable.

Create a new Remix application

To start building the application, create a new Remix project. Open your terminal and run the following command:

npx create-remix@latest rag-chatbot

npx allows us to execute npm package binaries (create-remix in our case) without having to first install them globally.

When prompted, choose:

  • Yes when prompted to initialize a new git repository
  • Yes when prompted to install the npm dependencies

Once the installation is done, you can move into the project directory and start the app. In another terminal window, start the development server by typing:

cd rag-chatbot
npm run dev

The app should be running on localhost:3000. Currently, it just displays a welcome page with some links to documentation. We are going to leave this running as we continue to build the app.

Note: According to one of the Remix Decisions, using .server in the filename is the only guaranteed way to exclude code from the client. You'll see later how we create the database connection and seed the data using server only modules.

Next, in your first terminal window, run the command below to install the necessary libraries and packages for building the application:

npm install pg pgvector
npm install -D dotenv tsx @types/pg

The above command installs the packages passed to the install command, with the -D flag specifying the libraries intended for development purposes only.

The libraries installed include:

  • pg: A PostgreSQL client for Node.js.
  • pgvector: pgvector support for Node.js, used here to convert embedding vectors to and from the SQL vector format.

The development-specific libraries include:

  • @types/pg: Type definitions for pg.
  • tsx: To execute and rebuild TypeScript efficiently.
  • dotenv: A library for handling environment variables.

Add Tailwind CSS to the application

For styling the app, we will be using Tailwind CSS. Install and set up Tailwind at the root of our project's directory by running:

npm install -D tailwindcss

Next, run the init command to create tailwind.config.ts:

npx tailwindcss init --ts

Next, we need to make use of Tailwind directives in our CSS file. Directives are custom Tailwind-specific at-rules that offer special functionalities for Tailwind CSS projects.

Create a tailwind.css file in the app directory, and add the snippet below in it:

/* File: app/tailwind.css */

@tailwind base;
@tailwind components;
@tailwind utilities;

Tailwind scans our HTML, JavaScript/TypeScript components, and any other template files for class names, and then generates all of the corresponding CSS for those styles. We need to configure our template paths so that Tailwind can generate all of the CSS we need by updating the content array of tailwind.config.ts as below:

// File: tailwind.config.ts
import type { Config } from 'tailwindcss'

export default {
  content: [], // [!code --]
  content: ['./app/**/*.{ts,tsx,js,jsx}'], // [!code ++]
  theme: {
    extend: {},
  },
  plugins: [],
} satisfies Config

Lastly, you'll import and use the compiled app/tailwind.css inside app/root.tsx. Make the following changes to the default root.tsx file to finish setting up Tailwind with your Remix app:

// File: app/root.tsx
 import { cssBundleHref } from "@remix-run/css-bundle"; // [!code --]
 import stylesheet from '~/tailwind.css' // [!code ++]

. . .

export const links: LinksFunction = () => [
  ...(cssBundleHref ? [{ rel: "stylesheet", href: cssBundleHref }] : []), // [!code --]
  { rel: 'stylesheet', href: stylesheet } // [!code ++]
];

. . .

Create vector embeddings of a text using OpenAI and LiteLLM

OpenAI provides an embeddings API to generate vector embeddings of a text string. Among the models offered is text-embedding-3-small, one of the newest and most performant embedding models. By default, the length of the embedding vector is 1536. You'll use this model to generate embeddings for the seed data added to the database, calling it through the litellm package.

In your terminal window, execute the following to install LiteLLM:

npm install litellm

To create vector embeddings of a text, you'll simply use the asynchronous embedding method from litellm with text-embedding-3-small as the model name. For example, you could obtain the vector embedding with the flow described above like this:

import { embedding } from 'litellm'

// Generate embeddings of a message using OpenAI via LiteLLM
const embeddingData = await embedding({
  model: 'text-embedding-3-small',
  input: 'Rishi is enjoying using LiteLLM',
})

// Using the OpenAI output format, obtain the embedding vector stored in
// the first object of the data array
const embeddingVector = embeddingData.data[0].embedding

Note: You need to make sure that the OPENAI_API_KEY exists as an environment variable. Refer to the earlier Generate the OpenAI API token section for instructions on generating it.

In the following sections, we'll implement a similar flow in our application.

Set up the database connection

The node-postgres (pg) library provides a low-level interface to interact directly with PostgreSQL databases using raw SQL queries.

To initiate the setup of a database connection, generate a .env file in the root directory of your project and include the following code, replacing the placeholder values with your own:

# Koyeb Managed Postgres Instance URL
POSTGRES_URL="<YOUR_DATABASE_CONNECTION_URL>?sslmode=require"

The addition of the sslmode=require parameter to the POSTGRES_URL value above indicates that the database connection should be established with SSL enabled.

The values added to the .env file should be kept secret and not included in Git history. By default, Remix CLI ensures that .env is added to the .gitignore file in your project.

Create the database client

Following that, establish a database client to connect to the database. To achieve this, create a postgres directory in the app directory by running the following command:

mkdir app/postgres

Inside this app/postgres directory, create a db.server.ts file with the following code:

// File: app/postgres/db.server.ts
// Load the environment variables
import 'dotenv/config'
// Load the postgres module
import pg from 'pg'

// Create a connection string to the Koyeb managed postgres instance
const connectionString: string = `${process.env.POSTGRES_URL}`

// Create a connection pool so that it can be reused across multiple calls
export default new pg.Pool({ connectionString })

The code imports the dotenv configuration, making sure that all the environment variables in the .env file are present at runtime. Afterwards, the code imports the pg library, retrieves the database URL from the environment variables, and uses it to create a new pool instance, which is subsequently exported.
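
Any other server-only module can now import this shared pool and run queries through it. As a minimal, hypothetical sketch (not one of the tutorial files):

// Hypothetical example: reuse the exported pool from any server-only module
import pool from '~/postgres/db.server'

export async function postgresVersion(): Promise<string> {
  // Run a simple, parameter-free query against the Koyeb managed instance
  const { rows } = await pool.query('SELECT version() AS version')
  return rows[0].version
}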

Create the database schema

Next, create a schema.server.ts file within the app/postgres directory and add the following code to it:

// File: app/postgres/schema.server.ts
import pool from './db.server'

async function createSchema() {
  // Create the vector extension if it does not exist
  await pool.query('CREATE EXTENSION IF NOT EXISTS vector;')
  // Create the data table if it does not exist
  await pool.query(
    'CREATE TABLE IF NOT EXISTS data (id SERIAL PRIMARY KEY, metadata text, embedding vector(1536));'
  )
  console.log('Finished setting up the database.')
}

createSchema()

The code above defines how data will be stored, organized, and managed in the database. Using the pool database instance, it executes an SQL query to create the vector extension within the database if it does not already exist.

The vector extension enables PostgreSQL databases to store vector embeddings. After creating the vector extension, a subsequent SQL query creates a data table within the database. This table comprises three columns:

  • An id column for storing auto-incrementing unique identifiers for each row in the table.
  • A metadata column with a text data type for storing text data.
  • An embedding column with a vector(1536) data type. This column will store vector data with a length of 1536 elements.

After the SQL queries have executed, a confirmation message is printed to the console.

To execute the code added to the schema file, update the script section of your package.json file with the following code:

{
. . .
  "scripts": {
   "db:setup": "tsx app/postgres/schema.server", // [!code ++]
    . . .
  }
. . .
}

The db:setup script runs the code within the schema.server.ts file when executed.

Test the database setup locally

To execute it, run the following command in your terminal window:

npm run db:setup

If the command executes successfully, you will see Finished setting up the database. logged in your terminal window, marking the completion of the database setup.
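
If you want to verify what was created, you can optionally add a small helper like the one below (a hypothetical file, not used in later steps) and run it with npx tsx app/postgres/check.server.ts:

// File: app/postgres/check.server.ts (optional, hypothetical helper)
import pool from './db.server'

async function checkSchema() {
  // List the columns of the data table created by the schema script
  const { rows } = await pool.query(
    "SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'data'"
  )
  console.log(rows)
  // Close the pool so the script exits cleanly
  await pool.end()
}

checkSchema()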

Add seed data to the database

The RAG chatbot will operate by retrieving metadata from the database whose vector embeddings are the closest match to the vector embedding of the user's query. In this section, we will insert five facts about Rishi into the database.

To get started, create an embedding.server.ts file in the app/postgres directory and include the following code within the file:

// File: app/postgres/embedding.server.ts
import { embedding } from 'litellm'
import { toSql } from 'pgvector/pg'

import pool from './db.server'

interface Row {
  metadata: string
  distance: number
}

const getErrorMessage = (error: unknown) => {
  if (error instanceof Error) return error.message
  return String(error)
}

// Utility to save embeddings in the Koyeb managed postgres instance
export const saveEmbedding = async (metadata: string, embedding: string): Promise<void> => {
  try {
    await pool.query({
      text: 'INSERT INTO data (metadata, embedding) VALUES ($1, $2)',
      values: [metadata, embedding],
    })
  } catch (e) {
    console.log(getErrorMessage(e))
  }
}

// Utility to find relevant embeddings from the Koyeb managed postgres instance
export const findRelevantEmbeddings = async (embedding: string): Promise<Row[] | undefined> => {
  try {
    const res = await pool.query(
      'SELECT metadata, embedding <-> $1 AS distance FROM data ORDER BY distance LIMIT 3',
      [embedding]
    )
    return res.rows
  } catch (e) {
    console.log(getErrorMessage(e))
  }
}

// Utility to create embedding vector using OpenAI via LiteLLM
export const generateEmbeddingQuery = async (input: string): Promise<string | undefined> => {
  try {
    // Generate embeddings of a message using OpenAI via LiteLLM
    const embeddingData = await embedding({
      input,
      model: 'text-embedding-3-small',
    })
    return toSql(embeddingData.data[0].embedding)
  } catch (e) {
    console.log(getErrorMessage(e))
  }
}

The code starts by importing:

  • the embedding method from litellm for creating embeddings via OpenAI
  • the toSql helper from pgvector for converting embedding vectors into the SQL vector format
  • the pool database instance from db.server.ts for connecting to the database

The code then defines four functions, exporting the last three (a short usage sketch follows the list):

  1. getErrorMessage: takes an error parameter, checks whether it's an instance of Error, and returns its message if so; otherwise, it returns the string representation of the error. This approach to extracting error messages is recommended by Kent C. Dodds.
  2. saveEmbedding: executes an SQL query to insert a text and its corresponding vector embedding into the data table.
  3. findRelevantEmbeddings: given a vector embedding, executes an SQL query to retrieve the top 3 closest vector embeddings from the database (by <-> distance), along with their corresponding metadata.
  4. generateEmbeddingQuery: receives a text input and obtains a vector embedding from the OpenAI API's response. It then transforms it into an SQL vector using the toSql method from pgvector and returns it.
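
To make the flow concrete, here is a minimal usage sketch combining these utilities (the contextFor helper is hypothetical and not part of the tutorial files): embed a question, then look up the closest stored facts.

// Hypothetical sketch: embed a question, then fetch the three closest facts
// from the data table (a smaller distance means a closer match)
import { findRelevantEmbeddings, generateEmbeddingQuery } from '~/postgres/embedding.server'

export async function contextFor(question: string) {
  const queryVector = await generateEmbeddingQuery(question)
  if (!queryVector) return []
  const rows = await findRelevantEmbeddings(queryVector)
  return (rows ?? []).map(({ metadata, distance }) => ({ metadata, distance }))
}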

To generate seed data for the database, create a seed.server.ts file in the app/postgres directory. Add the code below to the file:

// File: app/postgres/seed.server.ts
import { generateEmbeddingQuery, saveEmbedding } from './embedding.server'

const About = [
  'Rishi is a quick learner.',
  "Rishi is blown away by Koyeb's service.",
  'Rishi has been happy using Postgres so far.',
  'Rishi is having fun marketing www.launchfa.st.',
  'Rishi is super excited to collaborate on technical writing.',
]

async function seed() {
  await Promise.all(
    About.map(async (information: string) => {
      const embedding = await generateEmbeddingQuery(information)
      if (embedding) await saveEmbedding(information, embedding)
    })
  )
  console.log('Finished seeding the database.')
}

seed()

The code above imports the generateEmbeddingQuery and saveEmbedding methods, and defines an About array, which contains 5 facts related to Rishi.

For each fact in the About array, a vector embedding query is generated using the generateEmbeddingQuery function and then saved to the database using the saveEmbedding function. Errors that occur while creating and saving a vector embedding are logged on the console.

To execute the code in the seed file, update the scripts section of your package.json file with the code below:

{
. . .
  "scripts": {
  "db:seed": "tsx app/postgres/seed.server", // [!code ++]
    . . .
  }
. . .
}

Seed the database locally

The db:seed script added above executes the seed.server.ts file. To run the script, run the code below in your terminal window:

npm run db:seed

Successfully running the command above should print Finished seeding the database. in your terminal window, with no error messages.

In this section, you have added five facts about Rishi and their corresponding vector embeddings to the database.

Build the components of our application

It is now time to create the components that'll help you quickly prototype the UI and handle the complexities of creating a chatbot application with Remix.

Using shadcn/ui components

To quickly prototype the chat interface, you'll set up shadcn/ui with Remix. From shadcn/ui you'll use the toast system along with its accessible, pre-styled input and button elements. In your terminal window, run the command below to start setting up shadcn/ui:

npx shadcn-ui@latest init

You will be asked a few questions to configure a components.json, answer with the following:

✔ Would you like to use TypeScript (recommended)? no / **yes**
✔ Which style would you like to use? › **Default**
✔ Which color would you like to use as base color? › **Slate**
✔ Where is your global CSS file? **app/tailwind.css**
✔ Would you like to use CSS variables for colors? no / **yes**
✔ Are you using a custom tailwind prefix eg. tw-? (Leave blank if not)
✔ Where is your tailwind.config.js located? **tailwind.config.ts**
✔ Configure the import alias for components: **~/components**
✔ Configure the import alias for utils: **~/lib/utils**
✔ Are you using React Server Components? **no** / yes
✔ Write configuration to components.json. Proceed? **yes**

With the above, you've set up a CLI that lets you easily add React components to your Remix application.

In your terminal window, run the command below to get the button, input and toast elements:

npx shadcn-ui@latest add button
npx shadcn-ui@latest add input
npx shadcn-ui@latest add toast

With the above, you should now see a ui directory inside the app/components directory containing button.tsx, input.tsx, toaster.tsx, toast.tsx, and use-toast.ts.

Open the app/root.tsx, and make the following changes:

import stylesheet from '~/tailwind.css'
import { Toaster } from '~/components/ui/toaster' // [!code ++]
import type { LinksFunction } from '@remix-run/node'
import { Links, LiveReload, Meta, Outlet, Scripts, ScrollRestoration } from '@remix-run/react'

export const links: LinksFunction = () => [{ rel: 'stylesheet', href: stylesheet }]

export default function App() {
  return (
    <html lang="en">
      <head>
        <meta charSet="utf-8" />
        <meta name="viewport" content="width=device-width, initial-scale=1" />
        <Meta />
        <Links />
      </head>
      <body>
        <Outlet />
        <ScrollRestoration />
        <Scripts />
        <LiveReload />
        <Toaster /> // [!code ++]
      </body>
    </html>
  )
}

In the code above, you import the Toaster component (created by shadcn/ui) and make sure that it's present on each page of your Remix application. This allows you to use the useToast hook (exported by use-toast.ts) in your React components to show toasts in your application.
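
As a minimal sketch, any component can now trigger a toast through the hook (the NotifyButton component below is hypothetical and only illustrates the call):

// Hypothetical example of showing a toast from a component
import { useToast } from '~/components/ui/use-toast'

export function NotifyButton() {
  const { toast } = useToast()
  return (
    <button onClick={() => toast({ description: 'Knowledge base updated.' })}>
      Notify
    </button>
  )
}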

Highlight code blocks with react-syntax-highlighter

To render code blocks in the responses from AI in a visually appealing manner, we'll use the react-syntax-highlighter library. In your terminal window, install react-syntax-highlighter via the following command:

npm install react-syntax-highlighter

Next, create a code-block.tsx file inside the app/components directory:

// File: app/components/code-block.tsx

// Inspired by Chatbot-UI and modified to fit the needs of this project
// https://github.com/mckaywrigley/chatbot-ui/blob/main/components/messages/message-codeblock.tsx

import { Prism as SyntaxHighlighter } from 'react-syntax-highlighter'

interface CodeBlockProps {
  language: string
  value: string
}

const CodeBlock = ({ language, value }: CodeBlockProps) => {
  return (
    <div className="relative w-full font-sans codeblock bg-zinc-950">
      <div className="flex items-center justify-between w-full px-6 py-2 pr-4 bg-zinc-800 text-zinc-100">
        <span className="text-xs lowercase">{language}</span>
      </div>
      <SyntaxHighlighter
        PreTag="div"
        showLineNumbers
        language={language}
        customStyle={{
          margin: 0,
          width: '100%',
          background: 'transparent',
          padding: '1.5rem 1rem',
        }}
        codeTagProps={{
          style: {
            fontSize: '0.9rem',
            fontFamily: 'var(--font-mono)',
          },
        }}
      >
        {value}
      </SyntaxHighlighter>
    </div>
  )
}

CodeBlock.displayName = 'CodeBlock'

export { CodeBlock }

The code above begins with importing the React-compatible Prism syntax highlighter component from the react-syntax-highlighter library. Afterwards, it exports a CodeBlock React component that shows the language of the code block (for example, JavaScript) above the rendered code block.

Creating a memoized React markdown component

You'll want to render the responses from the AI as quickly as possible. For this, you'll set up an endpoint with streaming enabled that'll return the response in the form of tokens. To save re-renders of your React component responsible for showing the response from AI to the user, you'll use memo from React. With memo, you are able to make your UI render faster by skipping re-renders if the props of your React component have not changed.

To render responses from the AI in an HTML-friendly manner, we'll use the react-markdown library. In your terminal window, install react-markdown via the following command:

npm install react-markdown

Next, create a mark.tsx file inside the app/components directory with the following contents:

// File: app/components/mark.tsx
import { memo } from 'react'
import ReactMarkdown from 'react-markdown'

export const MemoizedReactMarkdown = memo(
  ReactMarkdown,
  (prevProps, nextProps) =>
    prevProps.children === nextProps.children && prevProps.className === nextProps.className
)

The code above imports memo from React and the Markdown renderer component from react-markdown, then wraps the renderer with memo so it only re-renders when its children or className props change. You're now done with optimizing re-renders in your Remix application.

Next, you'll use this component to create another component that renders code blocks, GitHub Flavored Markdown, and mathematical expressions beautifully. In your terminal window, execute the following command:

npm install remark-gfm remark-math

The command above installs remark plugins that detect GitHub Flavored Markdown (tables, task lists, strikethrough) and mathematical expressions in a given text and prepare them for rendering.

Create a memoized-react-markdown.tsx file inside the app/components directory with the following content:

// File: app/components/memoized-react-markdown.tsx
import clsx from 'clsx'
import remarkGfm from 'remark-gfm'
import remarkMath from 'remark-math'
import { CodeBlock } from '~/components/code-block'
import { MemoizedReactMarkdown } from '~/components/mark'

const MemoizedMD = ({ message, index }: { message: string; index: number }) => {
  return (
    <MemoizedReactMarkdown
      remarkPlugins={[remarkGfm, remarkMath]}
      components={{
        p({ children }) {
          return <p className="mb-2 last:mb-0">{children}</p>
        },
        code({ node, inline, className, children, ...props }) {
          const match = /language-(\w+)/.exec(className || '')
          if (inline) {
            return (
              <code className={className} {...props}>
                {children}
              </code>
            )
          }
          return (
            <CodeBlock
              key={Math.random()}
              language={(match && match[1]) || ''}
              value={String(children).replace(/\n$/, '')}
              {...props}
            />
          )
        },
      }}
      className={clsx(
        'prose dark:prose-invert prose-p:leading-relaxed prose-pre:p-0 mt-4 w-full break-words pt-4',
        index !== 0 && 'border-t'
      )}
    >
      {message}
    </MemoizedReactMarkdown>
  )
}

export default MemoizedMD

The code above begins by importing the packages you've just installed and the code block component created in the previous subsection. It then uses the components prop from react-markdown, which allows you to control how each rendered HTML element (such as paragraphs and code blocks) is styled.

Create a knowledge base component

Let's say you want to update the database (aka knowledge base) with new information so that the chatbot can learn and give out responses based on the latest data. You'll create a component that'll take in the new information as sentences separated by commas (,) and call an API that will take care of inserting them into the database.

Create a knowledge-base.tsx file inside the app/components directory with the following content:

// File: app/components/knowledge-base.tsx

import { useState } from 'react'
import { Maximize2 } from 'lucide-react'
import { Form, useNavigation } from '@remix-run/react'

export default function KnowledgeBase() {
  const { state } = useNavigation()
  const [expanded, setExpanded] = useState(true)
  return (
    <Form id="rag" method="post" className="absolute top-0 border p-3 m-2 rounded right-0 flex flex-col items-start">
      <div className="cursor-pointer absolute top-1.5 right-1.5">
        <Maximize2
          size={12}
          className="fill-black"
          onClick={() => {
            setExpanded((expanded) => !expanded)
          }}
        />
      </div>
      {expanded && <span className="text-xs font-medium">Update Knowledge Base</span>}
      {expanded && (
        <textarea
          id="content"
          name="content"
          autoComplete="off"
          placeholder="Add to the existing knowledge base. Seperate sentences with comma (,)"
          className="mt-2 p-1 border border-black/25 outline-none text-xs h-[45px] w-[280px] rounded"
        />
      )}
      {expanded && (
        <button disabled={state === 'submitting'} className="mt-3 text-sm px-2 py-1 border rounded" type="submit">
          {state === 'submitting' ? <>Submitting...</> : <>Submit &rarr;</>}
        </button>
      )}
    </Form>
  )
}

The code above imports the Form component and the useNavigation hook from Remix, which are used to handle form submissions via route actions. The navigation state is also used to show a Submitting... label, letting the user know their new content is being processed.

Use Vercel's ai package to prototype the chat UI

To handle the complexity of managing messages between the user and the AI and calling the API to generate responses based on the conversation, you'll use the open-source ai package from Vercel. In your terminal window, execute the following command to install it:

npm install ai

Define the Remix application routes

With Remix, creating a JavaScript or TypeScript file in the app/routes directory maps it to a route in your application. The name of the file created maps to the route's URL pathname (with the exception of _index.tsx, which is the index route).

Creating nested paths that do not rely on the parent layout is done by adding a trailing underscore to the first segment of the file name. For example, to serve requests to /api/something without relying on any parent layout, you would start the file name with api_ (the first segment of the route) and then append .something to it.

The structure below is what our routes folder will look like at the end of this section:

├── _index.tsx
└── api_.chat.tsx

  • _index.tsx will serve as the homepage, i.e. localhost:3000.
  • api_.chat.tsx will serve responses to localhost:3000/api/chat.

Build the homepage as the chatbot interface

To get started, open the app/routes/_index.tsx file and replace the existing code with the following:

import { useChat } from 'ai/react'
import { ChevronRight } from 'lucide-react'
import { Input } from '~/components/ui/input'
import KnowledgeBase from '~/components/knowledge-base'
import MemoizedMD from '~/components/memoized-react-markdown'

export default function Index() {
  const { messages, input, handleInputChange, handleSubmit } = useChat()
  return <>
    <KnowledgeBase />
    <div className="flex flex-col items-center">
      <div className="relative flex flex-col items-start w-full max-w-lg px-5 overflow-hidden">
        <form onSubmit={handleSubmit} className="flex flex-row w-[75vw] max-w-[500px] items-center space-x-2 fixed bottom-4">
          <Input
            id="message"
            value={input}
            type="message"
            autoComplete="off"
            onChange={handleInputChange}
            placeholder="What's your next question?"
            className="border-black/25 hover:border-black placeholder:text-black/75 rounded"
          />
          <button className="size-6 flex flex-col border border-black/50 items-center justify-center absolute right-3 rounded-full hover:bg-black hover:text-white" type="submit">
            <ChevronRight size={18} />
          </button>
        </form>
        <div className="w-full flex flex-col max-h-[90vh] overflow-y-scroll">
          {messages.map((i, _) => (
            <MemoizedMD key={_} index={_} message={i.content} />
          ))}
        </div>
      </div>
    </div>
  </>
}

The code above begins by importing the useChat hook from the ai package, the markdown component you created earlier to render each message, the Input element from shadcn/ui, and the knowledge base component. In the React component for the homepage, you deconstruct the following from the useChat hook (a sketch of the request body it sends is shown after the list):

  • The reactive messages array which contains the conversation between the user and AI
  • The reactive input value inserted by user into the input field
  • The handleInputChange method to make sure the input value is in sync with the changes
  • The handleSubmit method to call the API (/api/chat) to get a response for the user's latest message
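
For reference, handleSubmit sends the whole conversation to /api/chat as JSON. A simplified sketch of the body shape (the ai package also includes fields such as message ids) looks like this:

// Simplified sketch of the JSON body that useChat POSTs to /api/chat
type ChatRequestBody = {
  messages: Array<{
    role: 'system' | 'user' | 'assistant'
    content: string
  }>
}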

Now, remember that the KnowledgeBase component is a form element. To handle form submissions in Remix on the server, you'll use Remix route actions. Update the homepage code in the app/routes/_index.tsx file with the following:

import { useChat } from 'ai/react'
import { useEffect } from 'react' // [!code ++]
import { ChevronRight } from 'lucide-react'
import { Input } from '~/components/ui/input'
import { useActionData } from '@remix-run/react' // [!code ++]
import { useToast } from '~/components/ui/use-toast' // [!code ++]
import KnowledgeBase from '~/components/knowledge-base'
import { ActionFunctionArgs, json } from '@remix-run/node' // [!code ++]
import MemoizedMD from '~/components/memoized-react-markdown'
import { generateEmbeddingQuery, saveEmbedding } from '~/postgres/embedding.server' // [!code ++]

export const action = async ({ request }: ActionFunctionArgs) => { // [!code ++]
  const formData = await request.formData() // [!code ++]
  const content = formData.get('content') as string // [!code ++]
  if (content) { // [!code ++]
    const messages = content.split(',').map((i: string) => i.trim()) // [!code ++]
    if (messages.length > 0) { // [!code ++]
      await Promise.all( // [!code ++]
        messages.map(async (information: string) => { // [!code ++]
          const embedding = await generateEmbeddingQuery(information) // [!code ++]
          if (embedding) await saveEmbedding(information, embedding) // [!code ++]
        }), // [!code ++]
      ) // [!code ++]
      return json({ code: 1 }) // [!code ++]
    } // [!code ++]
  } // [!code ++]
  return json({ code: 0 }) // [!code ++]
} // [!code ++]

export default function Index() {
  const { toast } = useToast() // [!code ++]
  const actionData = useActionData<typeof action>() // [!code ++]
  const { messages, input, handleInputChange, handleSubmit } = useChat()
  useEffect(() => { // [!code ++]
    if (actionData) { // [!code ++]
      if (actionData['code'] === 1) { // [!code ++]
        toast({ // [!code ++]
          description: 'Knowledge base updated successfully.', // [!code ++]
        }) // [!code ++]
        const formSelector = document.getElementById('rag') as HTMLFormElement // [!code ++]
        if (formSelector) formSelector.reset() // [!code ++]
      } else { // [!code ++]
        toast({ // [!code ++]
          description: 'There was an error in updating the knowledge base.', // [!code ++]
        }) // [!code ++]
      } // [!code ++]
    } // [!code ++]
  }, [actionData]) // [!code ++]
  return (
    <>
      <KnowledgeBase />
      {/* Rest of the component as is */}
    </>
  )
}

The changes above begin by importing the following:

  • The functions that create embedding queries and save them to the database
  • The useActionData hook from Remix, which manages the state of the form response
  • The useToast hook from shadcn/ui, which lets you show toasts with a function call
  • The json method from Remix, which creates Response objects as defined by web standards

The changes then create an action function that's responsible for:

  • Listening only to non-GET requests (for example, POST, PUT, DELETE) on the homepage
  • Parsing the form data from the request
  • Splitting the content on commas (,) to get an array of text when content is found inside the form data
  • Creating and saving the embedding vector along with the respective text into the database

The additions also include the useToast and useActionData hooks. Once the form is submitted, the data returned by the action function is accessible via the useActionData hook. From the returned response, you can show toasts with suitable messages indicating whether the knowledge base update was successful.

Build the chat API endpoint

Create a file named api_.chat.tsx in the app/routes directory to handle the POST request created by the useChat hook in our React component.

Use vector search to create relevant context from the query

Before we continue, let's briefly discuss why relevant context creation is important when building a RAG chatbot. By default, an AI API can only respond using the knowledge it has been trained on. We want to make sure that the chatbot's knowledge base is updated with the specific content that a user will ask questions about.

To create such context in realtime, you will search for relevant vector embeddings that closely represent the vector embedding of the user's query. Afterwards, you can obtain the metadata associated with the relevant vectors and set the relevant context to a string containing all of the metadata together.

To do all of that, put the following code in the app/routes/api_.chat.tsx file:

import { json } from '@remix-run/node'
import type { ActionFunctionArgs } from '@remix-run/node'
import { findRelevantEmbeddings, generateEmbeddingQuery } from '~/postgres/embedding.server'

export const action = async ({ request }: ActionFunctionArgs) => {
  // Set of messages between user and chatbot
  const { messages = [] } = await request.json()
  if (messages.length < 1) return json({ message: 'No conversation found.' })

  // Get the latest question stored in the last message of the chat array
  const userMessages = messages.filter((i: { role: string }) => i.role === 'user')
  const input = userMessages[userMessages.length - 1].content

  // Generate embeddings of the latest question using OpenAI
  const embedding = await generateEmbeddingQuery(input)
  if (!embedding) return json({ message: 'Error while generating embedding vector.' })

  // Fetch the relevant set of records based on the embedding
  let similarQuestions = await findRelevantEmbeddings(embedding)
  if (!similarQuestions) {
    similarQuestions = []
    console.log({ message: 'Error while finding relevant vectors.' })
  }

  // Combine all the metadata of the relevant vectors
  const contextFromMetadata = similarQuestions.map((i) => i.metadata).join('\n')
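
  // Note: the model call and streamed response are added in the next section;
  // for now this action only builds the context string.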
}

Use Replicate to obtain LLAMA 2 70B chat model responses

To easily fetch model responses from the Replicate platform, we'll use the replicate SDK. In your terminal window, execute the following command:

npm install replicate

Previously, you were able to successfully create the relevant context for the user's query. It's now time to prompt LLAMA 2 70B, a chat model from Meta, in order to enhance the AI response by inserting the context as part of the system knowledge. Because we want to get the response to the user as quickly as possible, we'll enable streaming using the ReplicateStream functionality exported by the ai package.

To do all of that, update the app/routes/api_.chat.tsx file with the following code:


import { json } from '@remix-run/node'
import type { ActionFunctionArgs } from '@remix-run/node'
import { ReplicateStream, StreamingTextResponse } from 'ai' // [!code ++]
import { experimental_buildLlama2Prompt } from 'ai/prompts' // [!code ++]
import Replicate from 'replicate' // [!code ++]
import { findRelevantEmbeddings, generateEmbeddingQuery } from '~/postgres/embedding.server'

// Instantiate the Replicate API // [!code ++]
const replicate = new Replicate({ // [!code ++]
  auth: process.env.REPLICATE_API_TOKEN, // [!code ++]
}) // [!code ++]

export const action = async ({ request }: ActionFunctionArgs) => {
  // Set of messages between user and chatbot
  const { messages = [] } = await request.json()
  if (messages.length < 1) return json({ message: 'No conversation found.' })
  // Get the latest question stored in the last message of the chat array
  const userMessages = messages.filter((i: { role: string }) => i.role === 'user')
  const input = userMessages[userMessages.length - 1].content
  // Generate embeddings of the latest question using OpenAI
  const embedding = await generateEmbeddingQuery(input)
  if (!embedding) return json({ message: 'Error while generating embedding vector.' })
  // Fetch the relevant set of records based on the embedding
  let similarQuestions = await findRelevantEmbeddings(embedding)
  if (!similarQuestions) {
    similarQuestions = []
    console.log({ message: 'Error while finding relevant vectors.' })
  }
  // Combine all the metadata of the relevant vectors
  const contextFromMetadata = similarQuestions.map((i) => i.metadata).join('\n')
  // Now use Replicate LLAMA 70B streaming to perform the autocompletion with context // [!code ++]
  const response = await replicate.predictions.create({ // [!code ++]
    // You must enable streaming. // [!code ++]
    stream: true, // [!code ++]
    // The model must support streaming. See https://replicate.com/docs/streaming // [!code ++]
    model: 'meta/llama-2-70b-chat', // [!code ++]
    // Format the message list into the format expected by Llama 2 // [!code ++]
    // @see https://github.com/vercel/ai/blob/99cf16edf0a09405d15d3867f997c96a8da869c6/packages/core/prompts/huggingface.ts#L53C1-L78C2 // [!code ++]
    input: { // [!code ++]
      prompt: experimental_buildLlama2Prompt([ // [!code ++]
        { // [!code ++]
          // add the retrieved context as a system message so that the // [!code ++]
          // Llama 2 prompt builder supplies it to the model as context // [!code ++]
          role: 'system', // [!code ++]
          content: contextFromMetadata.substring(0, Math.min(contextFromMetadata.length, 2000)), // [!code ++]
        }, // [!code ++]
        // also, pass the whole conversation! // [!code ++]
        ...messages, // [!code ++]
      ]), // [!code ++]
    }, // [!code ++]
  }) // [!code ++]
  // Convert the response into a friendly text-stream // [!code ++]
  const stream = await ReplicateStream(response) // [!code ++]
  // Respond with the stream // [!code ++]
  return new StreamingTextResponse(stream) // [!code ++]
}

The changes above create an instance of Replicate using their SDK and then prompt the LLAMA 2 70B chat model using the syntax defined for the experimental_buildLlama2Prompt function of the ai package. Each item in the array passed to the prompt builder contains a role key which, in our case, may be one of the following (a rough sketch of the generated prompt follows the list):

  • system: representing the system knowledge
  • user: representing the user message
  • assistant: representing the responses from the model
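
To make this concrete, here is a rough sketch of the prompt string the helper builds for a system message followed by a user message (exact tokens and whitespace may vary slightly between ai package versions):

import { experimental_buildLlama2Prompt } from 'ai/prompts'

const prompt = experimental_buildLlama2Prompt([
  { role: 'system', content: 'Rishi is a quick learner.' },
  { role: 'user', content: 'What is Rishi like?' },
])
// Roughly: "<s>[INST] <<SYS>>\nRishi is a quick learner.\n<</SYS>>\n\nWhat is Rishi like? [/INST]"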

You've successfully created a chat endpoint that uses Retrieval Augmented Generation to provide results closely tied to user input. In the upcoming section, you will proceed to deploy the application online on the Koyeb platform.
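
Before moving on, you can optionally exercise the endpoint locally (assuming the dev server is running on localhost:3000 with OPENAI_API_KEY and REPLICATE_API_TOKEN exported in that terminal); the answer arrives as a stream of tokens:

curl --no-buffer -X POST http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is Rishi excited about?"}]}'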

Deploy the Remix app to Koyeb

Koyeb is a developer-friendly serverless platform to deploy apps globally. No ops, servers, or infrastructure management is required, and it supports different tech stacks including Rust, Golang, Python, PHP, Node.js, Ruby, and Docker.

With the app now complete, the final step is to deploy it online on Koyeb. Since the app uses a managed PostgreSQL service, the deployment process doesn't include a database setup.

We will use git-driven deployment to deploy on Koyeb. To do this, we need to create a new GitHub repository from the GitHub web interface or by using the GitHub CLI with the following command:

gh repo create <YOUR_GITHUB_REPOSITORY> --private

Initialize a new git repository on your machine and add a new remote pointing to your GitHub repository:

git init
git remote add origin git@github.com:<YOUR_GITHUB_USERNAME>/<YOUR_GITHUB_REPOSITORY>.git
git branch -M main

Add all the files in your project directory to the git repository and push them to GitHub:

git add .
git commit -m "Initial commit"
git push -u origin main

To deploy the code on the GitHub repository, visit the Koyeb control panel, and while on the Overview tab, click Create Web Service to start the deployment process:

  1. Select the GitHub deployment method.
  2. Choose the repository for your code from the repository drop-down menu.
  3. In the Environment variables section, click Add variable to include additional environment variables. Add the POSTGRES_URL, OPENAI_API_KEY, and REPLICATE_API_TOKEN environment variables. For each variable, input the variable name, select the Secret type, and in the value field, choose the Create secret option. In the form that appears, specify the secret name along with its corresponding value, and finally, click the Create button. Remember to add the ?sslmode=require parameter to the POSTGRES_URL value.
  4. Choose a name for your App and Service and click Deploy.

During the deployment on Koyeb, the process identifies the build and start scripts outlined in the package.json file, using them to build and launch the application. You can track the deployment progress through the displayed log output. When the deployment completes and the health checks return successfully, your application will be operational. You can visit it using Koyeb's application URL, which should look something like this:

https://<YOUR_APP_NAME>-<KOYEB_ORG_NAME>.koyeb.app/

If you would like to look at the code for the demo application, you can find it in the repository associated with this tutorial.

Conclusion

In this tutorial, you created a Retrieval-Augmented Generation chatbot using vector embeddings and the LLAMA 2 70B chat model with the Remix framework. With Koyeb's managed PostgreSQL service supporting the pgvector extension, you are able to perform vector search in the database and create context relevant to user messages in real time.

Since the application was deployed using the Git deployment option, subsequent code push to the deployed branch will automatically initiate a new build for your application. Changes to your application will become live once the deployment is successful. In the event of a failed deployment, Koyeb retains the last operational production deployment, ensuring the uninterrupted operation of your application.

