Use LlamaIndex to Build a Retrieval-Augmented Generation (RAG) Application
Introduction
The emergence of generative AI has sparked a surge in software products providing AI capabilities driven by large language models (LLMs). These LLMs excel at natural language tasks, closely approaching human capabilities thanks to their extensive pre-training on substantial volumes of public data. However, a challenge arises when these models are applied to personalised or organisational tasks, as their training data does not include private, organisation-specific information.
Efforts to address this challenge have resulted in techniques like Retrieval-Augmented Generation (RAG). RAG combines retrieval-based and generative approaches, forming a system that integrates LLMs with private data. LlamaIndex is a data framework that enables LLMs to process, organise, and retrieve domain-specific or private data. It uses RAG to load your data, organise it into an index, and offer natural language access to query and interact with the data conversationally.
In this tutorial, you will build a document knowledge base application using LlamaIndex and Together AI. By the conclusion of this tutorial, you'll be capable of uploading a document to the application and retrieving information from the document through conversational queries.
You can deploy the LlamaIndex RAG application as configured in this guide using the Deploy to Koyeb button below:
Note: Be sure to change the value of the `API_KEY` environment variable to your Together AI API key when you deploy. You can take a look at the application we will be building in this tutorial in the project GitHub repository.
Requirements
To successfully follow along with this tutorial, you will need the following:
- Node.js and `npm` installed on your development machine. The demo application for this tutorial is developed with Node v20.10.0.
- Git installed on your development machine.
- A Koyeb account.
- A Together AI account.
Get a Together AI API key
Together AI provides state-of-the-art tools to empower AI applications. These tools seamlessly integrate with top LLM frameworks for tasks such as fine-tuning models, RAG integrations, and more. To access these tools, you'll require an API key.
While logged into your Together AI account, go to the settings page by clicking on the profile icon in the top right corner. Once there, navigate to the API KEYS tab, copy your API key, and securely store it for future use.
In the upcoming section, you will prepare the codebase for the demo application.
Set up the project
In this section, you'll create an `npm` project with TypeScript and install the necessary dependencies for developing the demo application. To get started, run the command below in your terminal window:
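One way to create the layout described below is with `mkdir` (any equivalent approach that produces the same directories works):

```bash
mkdir -p example-llamaindex-rag/src
```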
The command above generates an `example-llamaindex-rag` directory, serving as the root directory for the application, along with a `src` directory nested within it. Next, initialise a Git repository in the project's root directory by executing the following commands:
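For example:

```bash
cd example-llamaindex-rag
git init
```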
The first of the two commands above changes your terminal's current directory to the `example-llamaindex-rag` directory, while the second command initialises a Git repository in the directory.
The next step in setting up the project is to initialise an `npm` project in the root directory. Run the command below to do that:
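A typical way to do this is with the `-y` flag, which accepts the default configuration:

```bash
npm init -y
```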
The command above creates an npm
project with the default configuration in the example-llamaindex-rag
directory.
Next, execute the commands below to install the packages required for developing the demo app:
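Assuming the package lists described below, the installation can be split into runtime and development dependencies, for example:

```bash
npm install llamaindex ejs express multer
npm install --save-dev dotenv typescript nodemon ts-node @types/express @types/node @types/multer
```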
The commands above install the listed packages from the `npm` registry, with the `--save-dev` flag specifying development-only packages. The installed packages include:
- `llamaindex`: LlamaIndex's package optimised for TypeScript use.
- `ejs`: A JavaScript templating engine.
- `express`: A web framework for Node.js.
- `multer`: A Node.js middleware for handling file uploads.
The development-only libraries include:
- `dotenv`: A package for handling environment variables.
- `typescript`: A package for enabling TypeScript code execution.
- `nodemon`: A package for restarting the application when code changes are detected during development.
- `ts-node`: A package for executing and rebuilding TypeScript code efficiently.
- `@types/express`: Type definitions for `express`.
- `@types/node`: Type definitions for Node.js.
- `@types/multer`: Type definitions for `multer`.
With the required packages now installed, create a `tsconfig.json` file in the root directory and add the following code to it:
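A minimal configuration along these lines works for this project; it compiles the TypeScript sources in place next to the `.ts` files, and the exact compiler options may vary:

```json
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "CommonJS",
    "moduleResolution": "node",
    "esModuleInterop": true,
    "strict": true,
    "skipLibCheck": true
  },
  "include": ["src/**/*.ts"]
}
```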
The code in the `tsconfig.json` file specifies the configuration for transpiling TypeScript code in the project.
This final code change completes the project set-up. In the upcoming section, you will set up a web server for the demo application with Express.js.
Set up an Express server
This section will focus on creating a web server with Express for the demo application. To get started, create an `index.ts` file in the `src` directory and add the following code to the file:
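A sketch of the server consistent with the description that follows (your exact code may differ slightly):

```typescript
import "dotenv/config";
import express, { Express, Request, Response } from "express";
import path from "path";

const app: Express = express();
const port = process.env.PORT || 3000;

// Parse JSON payloads from incoming requests
app.use(express.json());

// Use EJS as the view engine, with templates stored in the "views" folder
app.set("view engine", "ejs");
app.set("views", path.join(__dirname, "views"));

// Render the index view for requests to the root route
app.get("/", (req: Request, res: Response) => {
  res.render("index");
});

// Start the server and log a confirmation message
app.listen(port, () => {
  console.log(`Server is running on port ${port}`);
});
```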
The code above imports the following:
- The `dotenv/config` module for accessing environment variables.
- The `express` library, alongside the types for the `Express`, `Request`, and `Response` objects, for setting up a web server.
- The `path` module for managing file paths.
The code goes on to create an instance of an Express app and assigns the server's port based on the `PORT` environment variable, defaulting to `3000` if unspecified. It also configures the Express server to handle JSON payloads from incoming requests using the `json` middleware. Furthermore, it sets the server's view engine to EJS, with the views directory defined as a folder named `views`.
Additionally, the code defines a route handler for HTTP requests to the root route (`/`). This route handler renders an `index` view. Lastly, the web server is started, listens for web requests on the designated port, and logs a confirmation message indicating the server is operational.
The Express server set-up is now complete, and the upcoming section will focus on adding the document upload functionality.
Set up document upload
In this section, you will add a page and route handler to handle file uploads. To get started, add the following code to your `index.ts` file:
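A sketch of the upload handling code described below; the form field name `document` is an assumption that must match the upload form created shortly. Place the import at the top of the file and the route above the `app.listen` call:

```typescript
import multer, { Multer } from "multer";

// Store uploaded files in memory as Buffer objects
const upload: Multer = multer({ storage: multer.memoryStorage() });

// Handle file uploads sent to the /upload endpoint
app.post("/upload", upload.single("document"), (req: Request, res: Response) => {
  try {
    // Read the uploaded file's content and log it to the console
    const content = req.file?.buffer.toString("utf-8");
    console.log(content);

    res.json({ message: "Document uploaded successfully." });
  } catch (error) {
    res.status(500).json({ message: "Error uploading document." });
  }
});
```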
The code added above starts by importing the `multer` middleware and its `Multer` type definition. Afterward, the code sets up `multer` to store uploaded files in memory as buffer objects.
Next, a route handler is defined for handling HTTP POST requests to the `/upload` endpoint. In this handler, the `multer` middleware is applied, enabling access to the content of the uploaded file, which is then stored in a `content` variable and logged to the console. The route handler returns a JSON success or error message based on the outcome of the code execution.
Next, create a `views` directory in the `src` directory:
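For example:

```bash
mkdir src/views
```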
In the `views` directory, create an `index.ejs` file. Add the code below to the `index.ejs` file:
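A minimal sketch of the page described below, assuming a Bootstrap stylesheet loaded from a CDN and an upload field named `document` to match the server-side handler:

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Document knowledge base</title>
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.2/dist/css/bootstrap.min.css" rel="stylesheet" />
  </head>
  <body class="container py-5">
    <h1 class="mb-4">Document knowledge base</h1>

    <!-- Upload form: sends the selected text file to the /upload endpoint -->
    <form id="upload-form" class="mb-3">
      <input class="form-control mb-2" type="file" id="document" accept=".txt" required />
      <button class="btn btn-primary" type="submit">Index Document</button>
    </form>
    <p id="upload-message"></p>

    <script>
      const uploadForm = document.getElementById("upload-form");
      uploadForm.addEventListener("submit", async (event) => {
        event.preventDefault();
        const formData = new FormData();
        formData.append("document", document.getElementById("document").files[0]);
        try {
          // Send the document to the /upload endpoint and show the response message
          const response = await fetch("/upload", { method: "POST", body: formData });
          const data = await response.json();
          document.getElementById("upload-message").textContent = data.message;
        } catch (error) {
          console.error(error);
        }
      });
    </script>
  </body>
</html>
```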
The markup introduced above defines a simple Bootstrap-styled HTML page with a form for uploading a text document file. The code incorporates functionality that initiates a POST request to the `/upload` endpoint when the form is submitted, sending the uploaded document as the body of the HTTP request. Upon success, the response message is displayed on the page; otherwise, the error message is logged to the browser console.
To test the functionality, you can use any `.txt` document of your choice or opt for the Sherlock Holmes story available for download here. Next, add the command below to the `scripts` section of your `package.json` file to run your application:
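One way to wire this up is with `nodemon` using `ts-node` as the executor (the exact flags can vary):

```json
"scripts": {
  "dev": "nodemon --exec ts-node src/index.ts"
}
```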
The `dev` script added to the `package.json` file above utilises `nodemon` and `ts-node` to serve the `index.ts` file. It automatically restarts the application whenever a file change is detected.
To run the application, execute the command below in your terminal window:
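This runs the `dev` script defined above:

```bash
npm run dev
```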
Executing the provided command will start the server and show a message confirming the server is up and running. To access the page, open your web browser and navigate to `http://localhost:<YOUR_PORT>`, and you should see the document upload form displayed on the page. Select your preferred text file and submit the form. You should see the text file's content displayed in your terminal window and a `Document uploaded successfully.` message displayed on the page.
In the upcoming section, you will add the functionality to create an index from the uploaded document using LlamaIndex and Together AI.
Create the document index
The RAG process comprises several key stages following the ingestion of data. These stages include, amongst others, the `indexing` stage, where the data is organised or indexed into a format compatible with LLMs, and the `querying` stage, where the relevant context is retrieved from the index based on a provided query. In this section, you'll add the logic to create and store an index for the uploaded document.
To get started, create an `.env` file in the root directory and add the following code to the file:
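At minimum, the file needs the `API_KEY` variable used later to authenticate with Together AI; replace the placeholder with the key you copied earlier:

```
API_KEY=<YOUR_TOGETHER_AI_API_KEY>
```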
Your API key should be kept secret and not checked into Git history. To guard against this, create a `.gitignore` file by running the command below in your terminal window:
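One way to generate the file with the entries described below (the exact ignore patterns, particularly for the compiled JavaScript files, may differ):

```bash
cat <<EOF > .gitignore
.env
node_modules
src/**/*.js
EOF
```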
The provided command generates a `.gitignore` file in the root directory of your project. It adds entries for the `.env` file, the `node_modules` directory, and all TypeScript-generated JavaScript files to the `.gitignore` file. This ensures these files are excluded from Git history.
Next, create a `llama.ts` file in the `src` directory and add the following code to the file:
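A sketch of `llama.ts` consistent with the description that follows; the exact import names and method signatures can vary between `llamaindex` versions, so treat this as a guide rather than a definitive file:

```typescript
import {
  Document,
  TogetherEmbedding,
  TogetherLLM,
  VectorStoreIndex,
  serviceContextFromDefaults,
  storageContextFromDefaults,
} from "llamaindex";

// Together AI's LLM and embedding model, authenticated with the API key
const llm = new TogetherLLM({ apiKey: process.env.API_KEY });
const embedModel = new TogetherEmbedding({ apiKey: process.env.API_KEY });

// Service context bundling the LLM and embedding model for reuse
const serviceContext = serviceContextFromDefaults({ llm, embedModel });

// Build an index from the uploaded document's text and persist it to the "storage" directory
export async function createIndex(text: string) {
  const document = new Document({ text });
  const storageContext = await storageContextFromDefaults({ persistDir: "./storage" });
  const index = await VectorStoreIndex.fromDocuments([document], {
    serviceContext,
    storageContext,
  });
  return index;
}

// Load a previously persisted index from the "storage" directory
export async function loadIndex() {
  const storageContext = await storageContextFromDefaults({ persistDir: "./storage" });
  return VectorStoreIndex.init({ storageContext, serviceContext });
}
```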
The code above imports some modules from the `llamaindex` library. They include:
- `TogetherLLM`: Together AI's LLM.
- `TogetherEmbedding`: Together AI's vector embedding model.
- `serviceContextFromDefaults`: Creates a collection of components used in different parts of the application.
- `Document`: A versatile container that holds data from any data source.
- `VectorStoreIndex`: An index that stores the data only according to their vector embeddings.
- `storageContextFromDefaults`: For persisting indexes.
In the subsequent steps, the code instantiates the Together AI LLM and vector embedding model using the Together AI API key. Then, it constructs a service context object with its `llm` property set to Together AI's LLM and its `embedModel` property set to the embedding model provided by Together AI.
Finally, the code defines and exports two functions: `createIndex` and `loadIndex`. The `createIndex` function takes a `text` argument and uses it to generate a document object containing the provided text. Subsequently, a storage context is created, designating the `storage` directory as the location for storing indexes. The `fromDocuments` method of the `VectorStoreIndex` class is then invoked, accepting the document, service context, and storage context as parameters. This method processes the document, retrieves its embeddings, and builds and stores the index. The `createIndex` function then returns the index.
Conversely, the `loadIndex` function initialises a storage context, specifying the `storage` directory as the location for persistent storage through its `persistDir` attribute. Then, an index is loaded using the `VectorStoreIndex` class, instantiated with both the storage and service contexts. Finally, the loaded index is returned by the function.
To create and store an index for an uploaded document, add the following code to your `index.ts` file:
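A sketch of the updated handler, replacing the earlier `/upload` route; the import goes at the top of `index.ts`:

```typescript
import { createIndex } from "./llama";

// Handle file uploads: read the document's text, then create and persist an index for it
app.post("/upload", upload.single("document"), async (req: Request, res: Response) => {
  try {
    const content = req.file?.buffer.toString("utf-8");
    if (!content) throw new Error("No document uploaded");

    await createIndex(content);

    res.json({ message: "Document indexed successfully." });
  } catch (error) {
    res.status(500).json({ message: "Error indexing document." });
  }
});
```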
The code above imports the `createIndex` function and modifies the route handler for POST requests to the `/upload` route. In the route handler, after reading the content of the uploaded document, it is passed to the `createIndex` function, which creates and stores an index for the document. Additionally, the code modifies the response message returned by the route handler.
To test the new functionality, start the development server again with `npm run dev` if it is not already running.
Open your web browser and go to `http://localhost:<YOUR_PORT>`. Then, upload your text document and click the Index Document button to submit the form. If successful, you'll see a message confirming that the document was indexed successfully displayed on the page. This might take a few moments to complete. During this process, a `storage` directory will be generated in your project's root directory, containing three files: `doc_store.json`, `index_store.json`, and `vector_store.json`. These files contain all the necessary data for LlamaIndex to retrieve the most relevant context for any given query.
With an index now created for your document, in the upcoming section, you will implement the capability to query the document conversationally.
Query the document index
The previously created `loadIndex` function is set up to load the index built for the document in the previous section. In this section, you'll add UI elements and a route handler to load and query the index. To get started, add the following code to the `index.ts` file:
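A sketch of the query route described below; as with `llama.ts`, the query engine's `query` signature may differ slightly depending on your `llamaindex` version:

```typescript
import { loadIndex } from "./llama";

// Answer questions about the indexed document
app.post("/query", async (req: Request, res: Response) => {
  try {
    const { query } = req.body;

    // Load the persisted index and build a query engine from it
    const index = await loadIndex();
    const queryEngine = index.asQueryEngine();

    // Retrieve the relevant context and generate a response with the LLM
    const response = await queryEngine.query({ query });

    res.send(response.toString());
  } catch (error) {
    res.status(500).send("Error querying the document index.");
  }
});
```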
The provided code begins by importing the `loadIndex` function. Subsequently, it defines a route handler for handling HTTP POST requests to the `/query` endpoint. Within this handler, the code accesses a `query` parameter in the request body and loads the index created for the uploaded document.
This loaded index has an `asQueryEngine` method, which returns a query engine capable of retrieving relevant nodes from the index based on a query string and then sending them to an LLM to generate a response. The query engine is instantiated, and the request's `query` parameter is passed to its `query` function to generate a response to the query.
Finally, the response from the LLM is converted to a string using its `toString` function and returned as the response.
Next, replace the code in your `src/views/index.ejs` file with the one below to add the UI functionality to submit queries to the `/query` endpoint:
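For brevity, the sketch below shows only the additions to make inside the `<body>` of the earlier `index.ejs` sketch (a second form, a text area, and its submit handler); it assumes the `/query` route returns the answer as plain text, as in the handler sketch above:

```html
<!-- Query form: sends a question to the /query endpoint and shows the answer -->
<form id="query-form" class="mb-3">
  <input class="form-control mb-2" type="text" id="query" placeholder="Ask a question about the document" required />
  <button class="btn btn-secondary" type="submit">Ask</button>
</form>
<textarea id="answer" class="form-control" rows="6" readonly></textarea>

<script>
  const queryForm = document.getElementById("query-form");
  queryForm.addEventListener("submit", async (event) => {
    event.preventDefault();
    const query = document.getElementById("query").value;
    // Send the question as JSON and display the plain-text answer in the text area
    const response = await fetch("/query", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query }),
    });
    document.getElementById("answer").value = await response.text();
  });
</script>
```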
The code above adds a second form to the page. This form includes an input field for entering questions, a text area for presenting the responses, and a script that runs when the form is submitted. Upon submitting the form, the value entered in the input field is sent via a POST request to the `/query` endpoint, and the resulting response is displayed in the text area on the page.
To check the functionality, run the development server once more with `npm run dev`.
Visit `http://localhost:<YOUR_PORT>` in your web browser, find the question entry form, input a question related to your uploaded text document, and submit it. For instance, if you used the provided sample document, you might ask, "What is the story about?". The page should then display the answer to your question.
With the RAG technique applied by LlamaIndex, the Together AI LLM used in the demo application can now answer questions about the uploaded document, even though its content wasn't included in the original training data. In the next section, you will deploy your application online with Koyeb.
Deploy to Koyeb
With the app development complete, proceed with the online deployment of the app on Koyeb by updating the `scripts` section of your `package.json` file with the provided code:
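With the in-place compilation set up in the `tsconfig.json` sketch above, the scripts could look like this (adjust the `start` path if you compile to a separate output directory):

```json
"scripts": {
  "dev": "nodemon --exec ts-node src/index.ts",
  "build": "tsc",
  "start": "node src/index.js"
}
```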
The code above adds the `build` and `start` command scripts to the `package.json` file. The `build` script compiles the TypeScript code into JavaScript, and the `start` script executes the compiled JavaScript code with `node`.
Following that, create a GitHub repository for your code and use the command below to push your local code to the repository:
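The usual sequence looks like the following; replace the remote URL with your own repository's:

```bash
git add --all
git commit -m "Initial commit"
git remote add origin git@github.com:<YOUR_GITHUB_USERNAME>/<YOUR_REPOSITORY_NAME>.git
git branch -M main
git push -u origin main
```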
To deploy the code now on the GitHub repository, go to the Koyeb control panel. While on the Overview tab, click the Create Web Service button to initiate the deployment process. On the App deployment page:
- Choose GitHub as the deployment method.
- Select your repository from the drop-down menu. Alternatively, deploy from the example repository associated with this tutorial by entering `https://github.com/koyeb/example-llamaindex-rag` in the public repository field.
- In the Builder section, choose Buildpack.
- Pick your preferred options for Instances and Regions.
- Expand the Environment variables section and click Add variable to include an environment variable for your API key. Enter `API_KEY` as the variable name, select the Secret type, and in the value field, choose the Create secret option. In the modal that appears, specify the secret name and its corresponding value, then click the Create button.
- Provide a name for your application or use the default name.
- Finally, kick off the deployment process by clicking Deploy.
During the deployment, the process identifies the `build` and `start` scripts specified in the `package.json` file and uses them to build and launch the application. You can monitor the deployment progress through the presented logs. Once the deployment is complete and essential health checks are successful, your application will be up and running.
Access your live application by clicking on the provided public URL.
Conclusion
In this tutorial, you leveraged LlamaIndex to enhance the capabilities of an LLM, allowing it to answer questions and generate text content in a domain beyond the scope of its initial training data.
LlamaIndex provides additional tools for developing RAG systems. You can explore various other available use cases by referring to their TypeScript documentation.
Because the application was deployed using the Git deployment option, any subsequent code push to the deployed branch will automatically trigger a new build for your application. Once the deployment is successful, changes to your application will go live. In case of a failed deployment, Koyeb will preserve the last operational production deployment, ensuring the continuous operation of your application.