Build a Retrieval-Augmented Generation Chatbot using pgvector
Introduction
The adoption of AI-driven chatbots is accelerating because they respond quickly and save the effort of manually answering each question. Whether searching over documents or helping users with questions about a product, AI is used everywhere.
But what goes into configuring these chatbots so that they understand unique content? How are they able to generate relevant answers to user queries? Retrieval-Augmented Generation (RAG) is an AI technique that lets LLMs extend their existing knowledge with content retrieved from external sources.
In this tutorial, you will use OpenAI's embedding API alongside pgvector, an open-source vector similarity search extension for PostgreSQL, to create and deploy a RAG chatbot on Koyeb. It will generate relevant responses with AI while continuously updating its knowledge base in real time.
You can deploy the Retrieval-Augmented Generation chatbot as configured in this guide using the Deploy to Koyeb button below:
Note: You will need to replace the values of the environment variables in the configuration with your own REPLICATE_TOKEN, OPENAI_API_KEY, and POSTGRES_URL. Remember to add the ?sslmode=require parameter to the POSTGRES_URL value.
Requirements
To successfully follow this tutorial, you will need the following:
- Node.js and npm installed. The demo app in this tutorial uses version 18 of Node.js.
- Git installed.
- An OpenAI account.
- A Replicate account.
- A Koyeb account to deploy the application.
Steps
To complete this guide and deploy the Retrieval-Augmented Generation chatbot, you'll need to follow these steps:
- Generate the Replicate API token
- Generate the OpenAI API token
- Create a PostgreSQL database on Koyeb
- Create a new Remix application
- Add Tailwind CSS to the application
- Create vector embeddings of a text using OpenAI and LiteLLM
- Add seed data to the database
- Build the components of our application
- Define the Remix application routes
- Build the homepage as the chatbot interface
- Build the chat API endpoint
- Deploy the Remix app to Koyeb
Generate the Replicate API token
HTTP requests to the Replicate API require an authorization token. To generate this token, log in to your Replicate account and navigate to the API Tokens page. Enter a name for your token and click the Create token button to generate a new token. Copy and securely store this token for later use as the REPLICATE_API_TOKEN environment variable.
Locally, set and export the REPLICATE_API_TOKEN environment variable by executing the following command:
export REPLICATE_API_TOKEN="<YOUR_REPLICATE_TOKEN>"
Generate the OpenAI API token
HTTP requests to the OpenAI API require an authorization token. To generate this token, log in to your OpenAI account and navigate to the API Keys page. Enter a name for your token and click the Create new secret key button to generate a new key. Copy and securely store this key for later use as the OPENAI_API_KEY environment variable.
Locally, set and export the OPENAI_API_KEY environment variable by executing the following command:
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
Create a PostgreSQL database on Koyeb
To create a PostgreSQL database, log in to the Koyeb control panel and navigate to the Databases tab. Next, click on the Create Database Service button. Here, either accept or replace the default generated name, choose your preferred region, and confirm or customize the default role. When you are ready, click the Create Database Service button to provision your PostgreSQL database service.
Once you've created the database service, a list of your existing database services will be displayed. From there, select the newly created database service, copy the database connection string, and securely store it for later use as the POSTGRES_URL environment variable.
Create a new Remix application
To start building the application, create a new Remix project. Open your terminal and run the following command:
npx create-remix@latest rag-chatbot
npx allows us to execute npm package binaries (create-remix in our case) without having to first install them globally.
When prompted, choose:
- Yes when prompted to initialize a new git repository
- Yes when prompted to install the npm dependencies
Once the installation is done, you can move into the project directory and start the app. In another terminal window, start the development server by typing:
cd rag-chatbot
npm run dev
The app should be running on localhost:3000. Currently, it just displays a welcome page with some links to documentation. We are going to leave this running as we continue to build the app.
Note: According to one of the Remix Decisions, using .server in the filename is the only guaranteed way to exclude code from the client. You'll see later how we create the database connection and seed the data using server-only modules.
Next, in your first terminal window, run the command below to install the necessary libraries and packages for building the application:
npm install pg pgvector
npm install -D dotenv tsx @types/pg
The above command installs the packages passed to the install command, with the -D flag specifying the libraries intended for development purposes only.
The libraries installed include:
- pg: A PostgreSQL client for Node.js.
- pgvector: A vector similarity search library for Node.js.
The development-specific libraries include:
- @types/pg: Type definitions for pg.
- tsx: To execute and rebuild TypeScript efficiently.
- dotenv: A library for handling environment variables.
Add Tailwind CSS to the application
For styling the app, we will be using Tailwind CSS. Install and set up Tailwind at the root of our project's directory by running:
npm install -D tailwindcss
Next, run the init command to create tailwind.config.ts:
npx tailwindcss init --ts
Next, we need to make use of Tailwind directives in our CSS file. Directives are custom Tailwind-specific at-rules that offer special functionalities for Tailwind CSS projects.
Create a tailwind.css file in the app directory and add the snippet below to it:
/* File: app/tailwind.css */
@tailwind base;
@tailwind components;
@tailwind utilities;
Tailwind scans our HTML, JavaScript/TypeScript components, and any other template files for class names, and then generates all of the corresponding CSS for those styles. We need to configure our template paths so that Tailwind can generate all of the CSS we need. Do this by updating the content array of tailwind.config.ts as below:
// File: tailwind.config.ts
import type { Config } from 'tailwindcss'
export default {
content: [], // [!code --]
content: ['./app/**/*.{ts,tsx,js,jsx}'], // [!code ++]
theme: {
extend: {},
},
plugins: [],
} satisfies Config
Lastly, you'll import and use the compiled app/tailwind.css inside app/root.tsx. Make the following changes to the default root.tsx file to finish setting up Tailwind with your Remix app:
// File: app/root.tsx
import { cssBundleHref } from "@remix-run/css-bundle"; // [!code --]
import stylesheet from '~/tailwind.css' // [!code ++]
. . .
export const links: LinksFunction = () => [
...(cssBundleHref ? [{ rel: "stylesheet", href: cssBundleHref }] : []), // [!code --]
{ rel: 'stylesheet', href: stylesheet } // [!code ++]
];
. . .
Create vector embeddings of a text using OpenAI and LiteLLM
OpenAI provides an embeddings API to generate vector embeddings of a text string. Among multiple models offered is the text-embedding-3-small
, which is the newest and performant embedding model. By default, the length of the embedding vector will be 1536. You'll use this model to generate embeddings for the seed data to be added to the database. You'll use the litellm
package to call OpenAI embedding models.
In your terminal window, execute the following to install LiteLLM:
npm install litellm
To create vector embeddings of a text, you'll use the asynchronous embedding method from litellm with text-embedding-3-small as the model name. For example, you could obtain the vector embedding with the flow described above like this:
import { embedding } from 'litellm'
// Generate embeddings of a message using OpenAI via LiteLLM
const embeddingData = await embedding({
model: 'text-embedding-3-small',
input: 'Rishi is enjoying using LiteLLM',
})
// Using the OpenAI output format, obtain the embedding vector stored in
// the first object of the data array
const getEmbeddingVector = embeddingData.data[0].embedding
Note: You need to make sure that the OPENAI_API_KEY exists as an environment variable. Refer to the earlier section Generate the OpenAI API token for instructions on generating it.
In this section, we'll implement a similar flow in our application.
Set up the database connection
The node-postgres (pg) library provides a low-level interface to interact directly with PostgreSQL databases using raw SQL queries.
To set up the database connection, create a .env file in the root directory of your project and include the following code, replacing the placeholder values with your own:
# Koyeb Managed Postgres Instance URL
POSTGRES_URL="<YOUR_DATABASE_CONNECTION_URL>?sslmode=require"
The sslmode=require parameter added to the POSTGRES_URL value above indicates that the database connection should be established with SSL enabled.
The values added to the .env file should be kept secret and not included in Git history. By default, the Remix CLI ensures that .env is added to the .gitignore file in your project.
Create the database client
Following that, establish a database client to connect to the database. To do this, create a postgres directory inside the app directory by running the following command:
mkdir app/postgres
Inside this app/postgres directory, create a db.server.ts file with the following code:
// File: app/postgres/db.server.ts
// Load the environment variables
import 'dotenv/config'
// Load the postgres module
import pg from 'pg'
// Create a connection string to the Koyeb managed postgres instance
const connectionString: string = `${process.env.POSTGRES_URL}`
// Create an in-memory pool so that it's cached across multiple calls
export default new pg.Pool({ connectionString })
The code imports the dotenv configuration, making sure that all the environment variables in the .env file are present at runtime. Afterwards, the code imports the pg library, retrieves the database URL from the environment variables, and uses it to create a new pool instance, which is then exported.
Create the database schema
Next, create a schema.server.ts file within the app/postgres directory and add the following code to it:
// File: app/postgres/schema.server.ts
import pool from './db.server'
async function createSchema() {
// Create the vector extension if it does not exist
await pool.query('CREATE EXTENSION IF NOT EXISTS vector;')
// Create the data table if it does not exist
await pool.query(
'CREATE TABLE IF NOT EXISTS data (id SERIAL PRIMARY KEY, metadata text, embedding vector(1536));'
)
console.log('Finished setting up the database.')
}
createSchema()
The code above defines how data will be stored, organized, and managed in the database. Using the pool database instance, it executes an SQL query to create the vector extension within the database if it does not already exist. The vector extension enables PostgreSQL databases to store vector embeddings. After creating the vector extension, a subsequent SQL query creates a data table within the database. This table comprises three columns:
- An id column for storing auto-incrementing unique identifiers for each row in the table.
- A metadata column with a text data type for storing text data.
- An embedding column with a vector(1536) data type. This column will store vector data with a length of 1536 elements.
After the SQL queries are executed, a confirmation message is printed to the console.
To execute the code added to the schema file, update the scripts section of your package.json file with the following code:
{
. . .
"scripts": {
"db:setup": "tsx app/postgres/schema.server", // [!code ++]
. . .
}
. . .
}
The db:setup script runs the code within the schema.server.ts file when executed.
Test the database setup locally
To execute it, run the following command in your terminal window:
npm run db:setup
If the command executes successfully, you will see the message Finished setting up the database. in your terminal window, marking the completion of the database connection setup.
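If you'd like to double-check that the extension and table were created, you could run a small throwaway script against the same pool. The file name check.server.ts below is hypothetical and not part of the app's final code; it is a minimal sketch that queries PostgreSQL's system catalogs:
// File: app/postgres/check.server.ts (hypothetical one-off verification script)
import pool from './db.server'
async function check() {
  // pg_extension lists the extensions installed in the current database
  const ext = await pool.query("SELECT extname FROM pg_extension WHERE extname = 'vector'")
  // to_regclass returns NULL when the relation does not exist
  const table = await pool.query("SELECT to_regclass('public.data') AS data_table")
  console.log('vector extension installed:', ext.rows.length > 0)
  console.log('data table present:', table.rows[0].data_table !== null)
  await pool.end()
}
check()
You could run it once with npx tsx app/postgres/check.server.ts and then delete it.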
Add seed data to the database
The RAG chatbot will operate by retrieving, from the database, the metadata whose vector embeddings are the closest match to the vector embedding of the user's query. In this section, we will insert 5 facts about Rishi into the database.
To get started, create an embedding.server.ts file in the app/postgres directory and include the following code within the file:
// File: app/postgres/embedding.server.ts
import { embedding } from 'litellm'
import { toSql } from 'pgvector/pg'
import pool from './db.server'
interface Row {
metadata: string
distance: number
}
const getErrorMessage = (error: unknown) => {
if (error instanceof Error) return error.message
return String(error)
}
// Utility to save embeddings in the Koyeb managed postgres instance
export const saveEmbedding = async (metadata: string, embedding: string): Promise<void> => {
try {
await pool.query({
text: 'INSERT INTO data (metadata, embedding) VALUES ($1, $2)',
values: [metadata, embedding],
})
} catch (e) {
console.log(getErrorMessage(e))
}
}
// Utility to find relevant embeddings from the Koyeb managed postgres instance
export const findRelevantEmbeddings = async (embedding: string): Promise<Row[] | undefined> => {
try {
const res = await pool.query(
'SELECT metadata, embedding <-> $1 AS distance FROM data ORDER BY distance LIMIT 3',
[embedding]
)
return res.rows
} catch (e) {
console.log(getErrorMessage(e))
}
}
// Utility to create embedding vector using OpenAI via LiteLLM
export const generateEmbeddingQuery = async (input: string): Promise<string | undefined> => {
try {
// Generate embeddings of a message using OpenAI via LiteLLM
const embeddingData = await embedding({
input,
model: 'text-embedding-3-small',
})
return toSql(embeddingData.data[0].embedding)
} catch (e) {
console.log(getErrorMessage(e))
}
}
The code starts by importing various modules:
- The pool database instance for connecting to the database
- litellm for creating embeddings via OpenAI
- pgvector for converting embedding vectors into a SQL-friendly format
The code then defines the following functions:
- getErrorMessage: takes an error parameter, checks if it's an instance of Error, and returns its message if so; otherwise, it returns the string representation of the error. This is the recommended way of handling error messages (by Kent C. Dodds).
- saveEmbedding: executes an SQL query to insert a text and its corresponding vector embedding into the data table.
- findRelevantEmbeddings: given a vector embedding, executes an SQL query to retrieve the top 3 most similar vector embeddings from the database along with their corresponding metadata.
- generateEmbeddingQuery: receives a text input and obtains a vector embedding from the OpenAI API's response. It then transforms it into an SQL vector using the toSql method from pgvector and returns it.
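A quick note on the SQL inside findRelevantEmbeddings: the <-> operator is pgvector's Euclidean (L2) distance operator, so ordering by it in ascending order returns the nearest vectors first. pgvector also provides a cosine distance operator (<=>) and a negative inner product operator (<#>). If you preferred cosine distance, a sketch of the change, keeping everything else in the function the same, could look like this:
// Hypothetical variant of the query in findRelevantEmbeddings using cosine distance
const res = await pool.query(
  'SELECT metadata, embedding <=> $1 AS distance FROM data ORDER BY distance LIMIT 3',
  [embedding]
)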
To generate seed data for the database, create a seed.server.ts file in the app/postgres directory. Add the code below to the file:
// File: app/postgres/seed.server.ts
import { generateEmbeddingQuery, saveEmbedding } from './embedding.server'
const About = [
'Rishi is a quick learner.',
"Rishi is blown away by Koyeb's service.",
'Rishi has been happy using Postgres so far.',
'Rishi is having fun marketing www.launchfa.st.',
'Rishi is super excited to collaborate on technical writing.',
]
async function seed() {
await Promise.all(
About.map(async (information: string) => {
const embedding = await generateEmbeddingQuery(information)
if (embedding) saveEmbedding(information, embedding)
})
)
console.log('Finished seeding the database.')
}
seed()
The code above imports the generateEmbeddingQuery and saveEmbedding methods and defines an About array, which contains 5 facts related to Rishi.
For each fact in the About array, a vector embedding query is generated using the generateEmbeddingQuery function and then saved to the database using the saveEmbedding function. Errors that occur while creating and saving a vector embedding are logged to the console.
To execute the code in the seed file, update the scripts section of your package.json file with the code below:
{
. . .
"scripts": {
"db:seed": "tsx app/postgres/seed.server", // [!code ++]
. . .
}
. . .
}
Test the database locally
The db:seed script added above executes the seed.server.ts file. To run the script, execute the following command in your terminal window:
npm run db:seed
If the command runs successfully, you will see the message Finished seeding the database. in your terminal window.
In this section, you added 5 facts about Rishi and their corresponding vector embeddings to the database.
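If you want to confirm that the five rows actually landed in the table, a throwaway script (hypothetical, not part of the final app) could count them using the same pool:
// Hypothetical one-off script to count the seeded rows
import pool from './db.server'
async function countRows() {
  const res = await pool.query('SELECT COUNT(*) AS count FROM data')
  console.log(`Rows in the data table: ${res.rows[0].count}`)
  await pool.end()
}
countRows()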
Build the components of our application
It is now time to create the components that'll help you quickly prototype the UI and handle the complexities of creating a chatbot application with Remix.
Using shadcn/ui components
To quickly prototype the chat interface, you'll set up shadcn/ui with Remix. Specifically, shadcn/ui gives you the ability to show toasts along with accessible, pre-built input and button elements. In your terminal window, run the command below to start setting up shadcn/ui:
npx shadcn-ui@latest init
You will be asked a few questions to configure a components.json file. Answer with the following:
✔ Would you like to use TypeScript (recommended)? no / **yes**
✔ Which style would you like to use? › **Default**
✔ Which color would you like to use as base color? › **Slate**
✔ Where is your global CSS file? **app/tailwind.css**
✔ Would you like to use CSS variables for colors? no / **yes**
✔ Are you using a custom tailwind prefix eg. tw-? (Leave blank if not)
✔ Where is your tailwind.config.js located? **tailwind.config.ts**
✔ Configure the import alias for components: **~/components**
✔ Configure the import alias for utils: **~/lib/utils**
✔ Are you using React Server Components? **no** / yes
✔ Write configuration to components.json. Proceed? **yes**
With the above, you've set up a CLI that allows us to easily add React components to your Remix application.
In your terminal window, run the commands below to get the button, input, and toast elements:
npx shadcn-ui@latest add button
npx shadcn-ui@latest add input
npx shadcn-ui@latest add toast
With the above, you should now see a ui directory inside the app/components directory containing button.tsx, input.tsx, toaster.tsx, toast.tsx, and use-toast.ts.
Open the app/root.tsx file and make the following changes:
import stylesheet from '~/tailwind.css'
import { Toaster } from '~/components/ui/toaster' // [!code ++]
import type { LinksFunction } from '@remix-run/node'
import { Links, LiveReload, Meta, Outlet, Scripts, ScrollRestoration } from '@remix-run/react'
export const links: LinksFunction = () => [{ rel: 'stylesheet', href: stylesheet }]
export default function App() {
return (
<html lang="en">
<head>
<meta charSet="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<Meta />
<Links />
</head>
<body>
<Outlet />
<ScrollRestoration />
<Scripts />
<LiveReload />
<Toaster /> // [!code ++]
</body>
</html>
)
}
In the code above, you import the Toaster component (created by shadcn/ui) and make sure that it's present on every page of your Remix application. This allows you to use the useToast hook (exported by use-toast.ts) in your React components to show toasts in your application.
Highlight code blocks with react-syntax-highlighter
To render code blocks in the AI responses in a visually appealing manner, we'll use the react-syntax-highlighter library. In your terminal window, install react-syntax-highlighter with the following command:
npm install react-syntax-highlighter
Next, create a code-block.tsx file inside the app/components directory:
// File: app/components/code-block.tsx
// Inspired by Chatbot-UI and modified to fit the needs of this project
// https://github.com/mckaywrigley/chatbot-ui/blob/main/components/messages/message-codeblock.tsx
import { Prism as SyntaxHighlighter } from 'react-syntax-highlighter'
interface CodeBlockProps {
language: string
value: string
}
const CodeBlock = ({ language, value }: CodeBlockProps) => {
return (
<div className="relative w-full font-sans codeblock bg-zinc-950">
<div className="flex items-center justify-between w-full px-6 py-2 pr-4 bg-zinc-800 text-zinc-100">
<span className="text-xs lowercase">{language}</span>
</div>
<SyntaxHighlighter
PreTag="div"
showLineNumbers
language={language}
customStyle={{
margin: 0,
width: '100%',
background: 'transparent',
padding: '1.5rem 1rem',
}}
codeTagProps={{
style: {
fontSize: '0.9rem',
fontFamily: 'var(--font-mono)',
},
}}
>
{value}
</SyntaxHighlighter>
</div>
)
}
CodeBlock.displayName = 'CodeBlock'
export { CodeBlock }
The code above begins by importing the React-compatible Prism syntax highlighter component from the react-syntax-highlighter library. It then exports a CodeBlock React component that shows the language of the code block (for example, JavaScript) above the rendered code block.
Creating a memoized React markdown component
You'll want to render the responses from the AI as quickly as possible. For this, you'll set up an endpoint with streaming enabled that returns the response in the form of tokens. To avoid unnecessary re-renders of the React component responsible for showing the AI response to the user, you'll use memo from React. With memo, you can make your UI render faster by skipping re-renders when the props of your React component have not changed.
To render the AI responses as HTML-friendly markup, we'll use the react-markdown library. In your terminal window, install react-markdown with the following command:
npm install react-markdown
Next, create a mark.tsx file inside the app/components directory with the following contents:
// File: app/components/mark.tsx
import { memo } from 'react'
import ReactMarkdown from 'react-markdown'
export const MemoizedReactMarkdown = memo(
ReactMarkdown,
(prevProps, nextProps) =>
prevProps.children === nextProps.children && prevProps.className === nextProps.className
)
The code above begins by importing memo from React and the Markdown renderer component from react-markdown. It then exports a memoized version of the renderer that only re-renders when its children or className props change, which is all you need to optimize re-renders in your Remix application.
Next, you'll use this component to create another component that highlights code blocks, mathematical expressions, and GitHub Flavored Markdown in the responses. In your terminal window, execute the following command:
npm install remark-gfm remark-math
The command above installs remark-gfm and remark-math, plugins that detect and format GitHub Flavored Markdown and mathematical expressions in a given text.
Create a memoized-react-markdown.tsx file inside the app/components directory with the following content:
// File: app/components/memoized-react-markdown.tsx
import clsx from 'clsx'
import remarkGfm from 'remark-gfm'
import remarkMath from 'remark-math'
import { CodeBlock } from '~/components/code-block'
import { MemoizedReactMarkdown } from '~/components/mark'
const MemoizedMD = ({ message, index }: { message: string; index: number }) => {
return (
<MemoizedReactMarkdown
remarkPlugins={[remarkGfm, remarkMath]}
components={{
p({ children }) {
return <p className="mb-2 last:mb-0">{children}</p>
},
code({ node, inline, className, children, ...props }) {
const match = /language-(\w+)/.exec(className || '')
if (inline) {
return (
<code className={className} {...props}>
{children}
</code>
)
}
return (
<CodeBlock
key={Math.random()}
language={(match && match[1]) || ''}
value={String(children).replace(/\n$/, '')}
{...props}
/>
)
},
}}
className={clsx(
'prose dark:prose-invert prose-p:leading-relaxed prose-pre:p-0 mt-4 w-full break-words pt-4',
index !== 0 && 'border-t'
)}
>
{message}
</MemoizedReactMarkdown>
)
}
export default MemoizedMD
The code above begins by importing the packages you've just installed and the code block component created in the previous subsection. It then uses the components prop from react-markdown, which allows you to style each HTML element in your own desired way.
Create a knowledge base component
Let's say you want to update the database (aka knowledge base) with new information so that the chatbot can learn and give out responses based on the latest data. You'll create a component that'll take in the new information as sentences separated by commas (,
) and call an API that will take care of inserting them into the database.
Create a knowledge-base.tsx file inside the app/components directory with the following content:
// File: app/components/knowledge-base.tsx
import { useState } from 'react'
import { Maximize2 } from 'lucide-react'
import { Form, useNavigation } from '@remix-run/react'
export default function KnowledgeBase() {
const { state } = useNavigation()
const [expanded, setExpanded] = useState(true)
return (
<Form id="rag" method="post" className="absolute top-0 border p-3 m-2 rounded right-0 flex flex-col items-start">
<div className="cursor-pointer absolute top-1.5 right-1.5">
<Maximize2
size={12}
className="fill-black"
onClick={() => {
setExpanded((expanded) => !expanded)
}}
/>
</div>
{expanded && <span className="text-xs font-medium">Update Knowledge Base</span>}
{expanded && (
<textarea
id="content"
name="content"
autoComplete="off"
placeholder="Add to the existing knowledge base. Seperate sentences with comma (,)"
className="mt-2 p-1 border border-black/25 outline-none text-xs h-[45px] w-[280px] rounded"
/>
)}
{expanded && (
<button disabled={state === 'submitting'} className="mt-3 text-sm px-2 py-1 border rounded" type="submit">
{state === 'submitting' ? <>Submitting...</> : <>Submit →</>}
</button>
)}
</Form>
)
}
The code above imports the Form component and useNavigation hook from Remix. These are used to handle form submissions via route actions in Remix. Moreover, the code informs the user supplying the chatbot with new information that their content is being processed.
Use Vercel's ai package to prototype the chat UI
To handle the complexity of managing messages between the user and the AI, and of calling the API to generate responses based on the conversation, you'll use the open-source ai package from Vercel. In your terminal window, execute the following command to install it:
npm install ai
Define the Remix application routes
With Remix, creating a JavaScript or TypeScript file in the app/routes directory maps it to a route in your application. The name of the file maps to the route's URL pathname (with the exception of _index.tsx, which is the index route).
To create nested paths that do not rely on the parent layout, add a trailing underscore to the first segment of the file name. For example, to serve requests to /api/something without relying on any parent layout, you would start the file name with api_ (the first segment of the route) and then append .something to it.
The structure below is what our routes folder will look like at the end of this section:
├── _index.tsx
└── api_.chat.tsx
- _index.tsx will serve as the homepage, i.e. localhost:3000.
- api_.chat.tsx will serve responses to localhost:3000/api/chat.
| URL | Matched Route |
| --- | --- |
| / | app/routes/_index.tsx |
| /api/chat | app/routes/api_.chat.tsx |
Build the homepage as the chatbot interface
To get started, open the app/routes/_index.tsx file and replace the existing code with the following:
import { useChat } from 'ai/react'
import { ChevronRight } from 'lucide-react'
import { Input } from '~/components/ui/input'
import KnowledgeBase from '~/components/knowledge-base'
import MemoizedMD from '~/components/memoized-react-markdown'
export default function Index() {
const { messages, input, handleInputChange, handleSubmit } = useChat()
return <>
<KnowledgeBase />
<div className="flex flex-col items-center">
<div className="relative flex flex-col items-start w-full max-w-lg px-5 overflow-hidden">
<form onSubmit={handleSubmit} className="flex flex-row w-[75vw] max-w-[500px] items-center space-x-2 fixed bottom-4">
<Input
id="message"
value={input}
type="message"
autoComplete="off"
onChange={handleInputChange}
placeholder="What's your next question?"
className="border-black/25 hover:border-black placeholder:text-black/75 rounded"
/>
<button className="size-6 flex flex-col border border-black/50 items-center justify-center absolute right-3 rounded-full hover:bg-black hover:text-white" type="submit">
<ChevronRight size={18} />
</button>
</form>
<div className="w-full flex flex-col max-h-[90vh] overflow-y-scroll">
{messages.map((i, _) => (
<MemoizedMD key={_} index={_} message={i.content} />
))}
</div>
</div>
</div>
</>
}
The code above begins by importing the useChat hook from the ai package, the markdown component that you created earlier to render each message, the Input element from shadcn/ui, and the knowledge base component. In the React component on the homepage, you'll destructure the following from the useChat hook:
- The reactive messages array, which contains the conversation between the user and AI
- The reactive input value entered by the user into the input field
- The handleInputChange method to keep the input value in sync with changes
- The handleSubmit method to call the API (/api/chat) to get a response to the user's latest message
Now, remember that the KnowledgeBase component is a form element. To handle form submissions in Remix on the server, you'll use Remix route actions. Update the homepage code in the app/routes/_index.tsx file with the following:
import { useChat } from 'ai/react'
import { useEffect } from 'react' // [!code ++]
import { ChevronRight } from 'lucide-react'
import { Input } from '~/components/ui/input'
import { useActionData } from '@remix-run/react' // [!code ++]
import { useToast } from '~/components/ui/use-toast' // [!code ++]
import KnowledgeBase from '~/components/knowledge-base'
import { ActionFunctionArgs, json } from '@remix-run/node' // [!code ++]
import MemoizedMD from '~/components/memoized-react-markdown'
import { generateEmbeddingQuery, saveEmbedding } from '~/postgres/embedding.server' // [!code ++]
export const action = async ({ request }: ActionFunctionArgs) => { // [!code ++]
const formData = await request.formData() // [!code ++]
const content = formData.get('content') as string // [!code ++]
if (content) { // [!code ++]
const messages = content.split(',').map((i: string) => i.trim()) // [!code ++]
if (messages.length > 0) { // [!code ++]
await Promise.all( // [!code ++]
messages.map(async (information: string) => { // [!code ++]
const embedding = await generateEmbeddingQuery(information) // [!code ++]
if (embedding) saveEmbedding(information, embedding) // [!code ++]
}), // [!code ++]
) // [!code ++]
return json({ code: 1 }) // [!code ++]
} // [!code ++]
} // [!code ++]
return json({ code: 0 }) // [!code ++]
} // [!code ++]
export default function Index() {
const { toast } = useToast() // [!code ++]
const actionData = useActionData<typeof action>() // [!code ++]
const { messages, input, handleInputChange, handleSubmit } = useChat()
useEffect(() => { // [!code ++]
if (actionData) { // [!code ++]
if (actionData['code'] === 1) { // [!code ++]
toast({ // [!code ++]
description: 'Knowledge base updated successfully.', // [!code ++]
}) // [!code ++]
const formSelector = document.getElementById('rag') as HTMLFormElement // [!code ++]
if (formSelector) formSelector.reset() // [!code ++]
} else { // [!code ++]
toast({ // [!code ++]
description: 'There was an error in updating the knowledge base.', // [!code ++]
}) // [!code ++]
} // [!code ++]
} // [!code ++]
}, [actionData]) // [!code ++]
return (
<>
<KnowledgeBase />
{/* Rest of the component as is */}
</>
)
}
The changes above begin by importing the following:
- The functions that create embedding queries and save them to the database
- The useActionData hook from Remix, which manages the state of the form response
- The useToast hook from shadcn/ui, which lets you show toasts with a function call
- The json method from Remix, which creates Response objects as defined by web standards
The changes then add an action function that is responsible for:
- Listening only to non-GET requests (for example, POST, PUT, DELETE) on the homepage
- Parsing the form data from the request
- Splitting the content on commas (,) to get an array of text when content is found inside the form data
- Creating and saving the embedding vector along with the respective text into the database
The additions also make use of the useToast and useActionData hooks. Once the form is submitted, the data returned by the action function is accessible via the useActionData hook. From the response returned, you'll be able to show toasts with suitable messages indicating whether the update of the knowledge base was successful.
Build the chat API endpoint
Create a file named api_.chat.tsx in the app/routes directory to handle the POST request created by the useChat hook in our React component.
Use vector search to create relevant context from the query
Before we continue, let's briefly discuss why relevant context creation is important when making a RAG chatbot. By default, an AI API can only respond using the knowledge that it has been trained on. We want to make sure that the chatbot's knowledge base is updated with the specific content that a user will ask questions about.
To create such context in real time, you will search for relevant vector embeddings that closely represent the vector embedding of the user's query. Afterwards, you can obtain the metadata associated with the relevant vectors and set the relevant context to a string containing all of the metadata together.
To do all of that, put the following code in the app/routes/api_.chat.tsx file:
import { json } from '@remix-run/node'
import type { ActionFunctionArgs } from '@remix-run/node'
import { findRelevantEmbeddings, generateEmbeddingQuery } from '~/postgres/embedding.server'
export const action = async ({ request }: ActionFunctionArgs) => {
// Set of messages between user and chatbot
const { messages = [] } = await request.json()
if (messages.length < 1) return json({ message: 'No conversation found.' })
// Get the latest question stored in the last message of the chat array
const userMessages = messages.filter((i: { role: string }) => i.role === 'user')
const input = userMessages[userMessages.length - 1].content
// Generate embeddings of the latest question using OpenAI
const embedding = await generateEmbeddingQuery(input)
if (!embedding) return json({ message: 'Error while generating embedding vector.' })
// Fetch the relevant set of records based on the embedding
let similarQuestions = await findRelevantEmbeddings(embedding)
if (!similarQuestions) {
similarQuestions = []
console.log({ message: 'Error while finding relevant vectors.' })
}
// Combine all the metadata of the relevant vectors
const contextFromMetadata = similarQuestions.map((i) => i.metadata).join('\n')
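// Note: nothing is returned yet; the model call that consumes this context is added in the next section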
}
Use Replicate to obtain LLAMA 2 70B chat model responses
To easily fetch model responses from the Replicate platform, we'll use the Replicate SDK. In your terminal window, execute the following command:
npm install replicate
Previously, you created the relevant context for the user's query. It's now time to prompt LLAMA 2 70B, a chat model from Meta, enhancing the AI response by inserting the context as part of the system knowledge. Because we want to get the response to the user as quickly as possible, we'll enable streaming using the ReplicateStream functionality exported by the ai package.
To do all of that, update the app/routes/api_.chat.tsx file with the following code:
import { json } from '@remix-run/node'
import type { ActionFunctionArgs } from '@remix-run/node'
import { ReplicateStream, StreamingTextResponse } from 'ai' // [!code ++]
import { experimental_buildLlama2Prompt } from 'ai/prompts' // [!code ++]
import Replicate from 'replicate' // [!code ++]
import { findRelevantEmbeddings, generateEmbeddingQuery } from '~/postgres/embedding.server'
// Instantiate the Replicate API // [!code ++]
const replicate = new Replicate({ // [!code ++]
auth: process.env.REPLICATE_API_TOKEN, // [!code ++]
}) // [!code ++]
export const action = async ({ request }: ActionFunctionArgs) => {
// Set of messages between user and chatbot
const { messages = [] } = await request.json()
if (messages.length < 1) return json({ message: 'No conversation found.' })
// Get the latest question stored in the last message of the chat array
const userMessages = messages.filter((i: { role: string }) => i.role === 'user')
const input = userMessages[userMessages.length - 1].content
// Generate embeddings of the latest question using OpenAI
const embedding = await generateEmbeddingQuery(input)
if (!embedding) return json({ message: 'Error while generating embedding vector.' })
// Fetch the relevant set of records based on the embedding
let similarQuestions = await findRelevantEmbeddings(embedding)
if (!similarQuestions) {
similarQuestions = []
console.log({ message: 'Error while finding relevant vectors.' })
}
// Combine all the metadata of the relevant vectors
const contextFromMetadata = similarQuestions.map((i) => i.metadata).join('\n')
// Now use Replicate LLAMA 70B streaming to perform the autocompletion with context // [!code ++]
const response = await replicate.predictions.create({ // [!code ++]
// You must enable streaming. // [!code ++]
stream: true, // [!code ++]
// The model must support streaming. See https://replicate.com/docs/streaming // [!code ++]
model: 'meta/llama-2-70b-chat', // [!code ++]
// Format the message list into the format expected by Llama 2 // [!code ++]
// @see https://github.com/vercel/ai/blob/99cf16edf0a09405d15d3867f997c96a8da869c6/packages/core/prompts/huggingface.ts#L53C1-L78C2 // [!code ++]
input: { // [!code ++]
prompt: experimental_buildLlama2Prompt([ // [!code ++]
{ // [!code ++]
// create a system content message to be added as // [!code ++]
// the llama2prompt generator will supply it as the context with the API // [!code ++]
role: 'system', // [!code ++]
content: contextFromMetadata.substring(0, Math.min(contextFromMetadata.length, 2000)), // [!code ++]
}, // [!code ++]
// also, pass the whole conversation! // [!code ++]
...messages, // [!code ++]
]), // [!code ++]
}, // [!code ++]
}) // [!code ++]
// Convert the response into a friendly text-stream // [!code ++]
const stream = await ReplicateStream(response) // [!code ++]
// Respond with the stream // [!code ++]
return new StreamingTextResponse(stream) // [!code ++]
}
The changes above create an instance of Replicate using their SDK and then prompt the LLAMA 2 70B chat model using the syntax defined by the experimental_buildLlama2Prompt function of the ai package. Each item in the array passed to the prompt function contains a role key which, in our case, may be:
- system: representing the system knowledge
- user: representing the user message
- assistant: representing the responses from the model
You've successfully created a chat endpoint that uses Retrieval-Augmented Generation to provide results closely tied to user input. In the next section, you will deploy the application to the Koyeb platform.
Deploy the Remix app to Koyeb
Koyeb is a developer-friendly serverless platform for deploying apps globally. No ops, servers, or infrastructure management is required, and it supports different tech stacks including Rust, Golang, Python, PHP, Node.js, Ruby, and Docker.
With the app now complete, the final step is to deploy it online on Koyeb. Since the app uses a managed PostgreSQL service, the deployment process doesn't include a database setup.
We will use git-driven deployment to deploy on Koyeb. To do this, we need to create a new GitHub repository from the GitHub web interface or by using the GitHub CLI with the following command:
gh repo create <YOUR_GITHUB_REPOSITORY> --private
Initialize a new git repository on your machine and add a new remote pointing to your GitHub repository:
git init
git remote add origin git@github.com:<YOUR_GITHUB_USERNAME>/<YOUR_GITHUB_REPOSITORY>.git
git branch -M main
Add all the files in your project directory to the git repository and push them to GitHub:
git add .
git commit -m "Initial commit"
git push -u origin main
To deploy the code on the GitHub repository, visit the Koyeb control panel, and while on the Overview tab, click Create Web Service to start the deployment process:
- Select the GitHub deployment method.
- Choose the repository for your code from the repository drop-down menu.
- In the Environment variables section, click Add variable to include additional environment variables. Add the POSTGRES_URL, OPENAI_API_KEY, and REPLICATE_API_TOKEN environment variables. For each variable, input the variable name, select the Secret type, and in the value field, choose the Create secret option. In the form that appears, specify the secret name along with its corresponding value, and finally, click the Create button. Remember to add the ?sslmode=require parameter to the POSTGRES_URL value.
- Choose a name for your App and Service and click Deploy.
During the deployment on Koyeb, the process identifies the build and start scripts outlined in the package.json file, using them to build and launch the application. You can track the deployment progress through the displayed log output. When the deployment completes and the health checks pass, your application will be operational. You can visit it using Koyeb's application URL, which should look something like this:
https://<YOUR_APP_NAME>-<KOYEB_ORG_NAME>.koyeb.app/
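For reference, the build and start scripts Koyeb relies on in a Remix v2 project generated with create-remix typically resemble the snippet below. Treat this as an illustration rather than the exact contents of your package.json, since the commands can differ by template version; db:setup and db:seed are the scripts added earlier in this tutorial.
{
  . . .
  "scripts": {
    "build": "remix build",
    "dev": "remix dev",
    "start": "remix-serve ./build/index.js",
    "db:setup": "tsx app/postgres/schema.server",
    "db:seed": "tsx app/postgres/seed.server"
  }
  . . .
}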
If you would like to look at the code for the demo application, you can find it in the repository associated with this tutorial.
Conclusion
In this tutorial, you created a Retrieval-Augmented Generation chatbot using vector embeddings and the LLAMA 2 70B Chat model with the Remix framework. With Koyeb's managed PostgreSQL service supporting the pgvector extension, you are able to perform vector search in the database and create context relevant to user messages in real time.
Since the application was deployed using the Git deployment option, subsequent code pushes to the deployed branch will automatically trigger a new build for your application. Changes to your application will go live once the deployment is successful. In the event of a failed deployment, Koyeb retains the last operational production deployment, ensuring the uninterrupted operation of your application.