Use LlamaIndex to Build a Retrieval-Augmented Generation (RAG) Application
Introduction
The emergence of generative AI has sparked a surge in software products providing AI capabilities driven by large language models (LLMs). These LLMs excel at natural language tasks, closely approaching human capabilities thanks to their extensive pre-training on substantial volumes of public data. However, a challenge arises when these models are applied to personalised or organisational tasks, as their training data does not include private, organisation-specific information.
Efforts to address this challenge have resulted in techniques like Retrieval-Augmented Generation (RAG). RAG combines retrieval-based and generative approaches, forming a system that integrates LLMs with private data. LlamaIndex is a data framework that enables LLMs to process, organise, and retrieve domain-specific or private data. It uses RAG to load your data, organise it into an index, and offer natural language access to query and interact with the data conversationally.
In this tutorial, you will build a document knowledge base application using LlamaIndex and Together AI. By the conclusion of this tutorial, you'll be capable of uploading a document to the application and retrieving information from the document through conversational queries.
You can deploy the LlamaIndex RAG application as configured in this guide using the Deploy to Koyeb button below:
Note: Be sure to change the value of the `API_KEY` environment variable to your Together AI API key when you deploy. You can take a look at the application we will be building in this tutorial in the project GitHub repository.
Requirements
To successfully follow along with this tutorial, you will need the following:
- Node.js and `npm` installed on your development machine. The demo application for this tutorial is developed with Node v20.10.0.
- Git installed on your development machine.
- A Koyeb account.
- A Together AI account.
Get a Together AI API key
Together AI provides state-of-the-art tools to empower AI applications. These tools seamlessly integrate with top LLM frameworks for tasks such as fine-tuning models, RAG integrations, and more. To access these tools, you'll require an API key.
While logged into your Together AI account, go to the settings page by clicking on the profile icon in the top right corner. Once there, navigate to the API KEYS tab, copy your API key, and securely store it for future use.
In the upcoming section, you will prepare the codebase for the demo application.
Set up the project
In this section, you'll create an `npm` project with TypeScript and install the necessary dependencies for developing the demo application. To get started, run the command below in your terminal window:
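One way to create the layout described below is with `mkdir` (any equivalent approach that produces the same directories works):

```bash
mkdir -p example-llamaindex-rag/src
```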
The command above generates an `example-llamaindex-rag` directory, serving as the root directory for the application, along with a `src` directory nested within it. Next, initialise a Git repository in the project's root directory by executing the following commands:
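For example:

```bash
cd example-llamaindex-rag
git init
```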
The first of the two commands above changes your terminal's current directory to the `example-llamaindex-rag` directory, while the second command initialises a Git repository in the directory.
The next step in setting up the project is to initialise an `npm` project in the root directory. Run the command below to do that:
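A typical way to do this is with the `-y` flag, which accepts the default configuration:

```bash
npm init -y
```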
The command above creates an npm
project with the default configuration in the example-llamaindex-rag
directory.
Next, execute the commands below to install the packages required for developing the demo app:
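Assuming the package lists described below, the installation can be split into runtime and development dependencies, for example:

```bash
npm install llamaindex ejs express multer
npm install --save-dev dotenv typescript nodemon ts-node @types/express @types/node @types/multer
```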
The commands above install the listed packages from the `npm` registry, with the `--save-dev` flag specifying development-only packages. The installed packages include:
- `llamaindex`: LlamaIndex's package optimised for TypeScript use.
- `ejs`: A JavaScript templating engine.
- `express`: A web framework for Node.js.
- `multer`: A Node.js middleware for handling file uploads.
The development-only libraries include:
- `dotenv`: A package for handling environment variables.
- `typescript`: A package for enabling TypeScript code execution.
- `nodemon`: A package for restarting the application when code changes are detected during development.
- `ts-node`: A package for executing and rebuilding TypeScript code efficiently.
- `@types/express`: Type definitions for `express`.
- `@types/node`: Type definitions for Node.js.
- `@types/multer`: Type definitions for `multer`.
With the required packages now installed, create a `tsconfig.json` file in the root directory and add the following code to it:
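A minimal configuration along these lines works for this project; it compiles the TypeScript sources in place next to the `.ts` files, and the exact compiler options may vary:

```json
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "CommonJS",
    "moduleResolution": "node",
    "esModuleInterop": true,
    "strict": true,
    "skipLibCheck": true
  },
  "include": ["src/**/*.ts"]
}
```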
The code in the `tsconfig.json` file specifies the configuration for transpiling TypeScript code in the project.
This final code change completes the project set-up. In the upcoming section, you will set up a web server for the demo application with Express.js.
Set up an Express server
This section will focus on creating a web server with Express for the demo application. To get started, create an `index.ts` file in the `src` directory and add the following code to the file:
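A sketch of the server consistent with the description that follows (your exact code may differ slightly):

```typescript
import "dotenv/config";
import express, { Express, Request, Response } from "express";
import path from "path";

const app: Express = express();
const port = process.env.PORT || 3000;

// Parse JSON payloads from incoming requests
app.use(express.json());

// Use EJS as the view engine, with templates stored in the "views" folder
app.set("view engine", "ejs");
app.set("views", path.join(__dirname, "views"));

// Render the index view for requests to the root route
app.get("/", (req: Request, res: Response) => {
  res.render("index");
});

// Start the server and log a confirmation message
app.listen(port, () => {
  console.log(`Server is running on port ${port}`);
});
```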
The code above imports the following:
- The `dotenv/config` module for accessing environment variables.
- The `express` library, alongside the types for the `Express`, `Request`, and `Response` objects, for setting up a web server.
- The `path` module for managing file paths.
The code goes on to create an instance of an Express app and assigns the server's port based on the `PORT` environment variable, defaulting to `3000` if unspecified. It also configures the Express server to handle JSON payloads from incoming requests using the `json` middleware. Furthermore, it sets the server's view engine to EJS, with the views directory defined as a folder named `views`.
Additionally, the code defines a route handler for HTTP requests to the root route (`/`). This route handler renders an `index` view. Lastly, the web server is started, listens for web requests on the designated port, and logs a confirmation message indicating the server is operational.
The Express server set-up is now complete, and the upcoming section will focus on adding the document upload functionality.
Set up document upload
In this section, you will add a page and route handler to handle file uploads. To get started, add the following code to your `index.ts` file:
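A sketch of the upload handling code described below; the form field name `document` is an assumption that must match the upload form created shortly. Place the import at the top of the file and the route above the `app.listen` call:

```typescript
import multer, { Multer } from "multer";

// Store uploaded files in memory as Buffer objects
const upload: Multer = multer({ storage: multer.memoryStorage() });

// Handle file uploads sent to the /upload endpoint
app.post("/upload", upload.single("document"), (req: Request, res: Response) => {
  try {
    // Read the uploaded file's content and log it to the console
    const content = req.file?.buffer.toString("utf-8");
    console.log(content);

    res.json({ message: "Document uploaded successfully." });
  } catch (error) {
    res.status(500).json({ message: "Error uploading document." });
  }
});
```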
The code added above starts by importing the `multer` middleware and its `Multer` type definition. Afterward, the code sets up `multer` to store uploaded files in memory as buffer objects.
Next, a route handler is defined for handling HTTP POST requests to the `/upload` endpoint. In this handler, the `multer` middleware is applied, enabling access to the content of the uploaded file, which is then stored in a `content` variable and logged to the console. The route handler returns a JSON success or error message based on the outcome of the code execution.
Next, create a `views` directory in the `src` directory:
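For example:

```bash
mkdir src/views
```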
In the `views` directory, create an `index.ejs` file. Add the code below to the `index.ejs` file:
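A minimal sketch of the page described below, assuming a Bootstrap stylesheet loaded from a CDN and an upload field named `document` to match the server-side handler:

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Document knowledge base</title>
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.2/dist/css/bootstrap.min.css" rel="stylesheet" />
  </head>
  <body class="container py-5">
    <h1 class="mb-4">Document knowledge base</h1>

    <!-- Upload form: sends the selected text file to the /upload endpoint -->
    <form id="upload-form" class="mb-3">
      <input class="form-control mb-2" type="file" id="document" accept=".txt" required />
      <button class="btn btn-primary" type="submit">Index Document</button>
    </form>
    <p id="upload-message"></p>

    <script>
      const uploadForm = document.getElementById("upload-form");
      uploadForm.addEventListener("submit", async (event) => {
        event.preventDefault();
        const formData = new FormData();
        formData.append("document", document.getElementById("document").files[0]);
        try {
          // Send the document to the /upload endpoint and show the response message
          const response = await fetch("/upload", { method: "POST", body: formData });
          const data = await response.json();
          document.getElementById("upload-message").textContent = data.message;
        } catch (error) {
          console.error(error);
        }
      });
    </script>
  </body>
</html>
```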
The markup introduced above defines a simple Bootstrap-styled HTML page with a form for uploading a text document file. The code incorporates functionality that initiates a POST request to the `/upload` endpoint when the form is submitted, sending the uploaded document as the body of the HTTP request. Upon success, the response message is displayed on the page; otherwise, the error message is logged to the browser console.
To test the functionality, you can use any `.txt` document of your choice or opt for the Sherlock Holmes story available for download here. Next, add the command below to the `scripts` section of your `package.json` file to run your application:
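One way to wire this up is with `nodemon` using `ts-node` as the executor (the exact flags can vary):

```json
"scripts": {
  "dev": "nodemon --exec ts-node src/index.ts"
}
```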
The `dev` script added to the `package.json` file above utilises `nodemon` and `ts-node` to serve the `index.ts` file. It automatically restarts the application whenever a file change is detected.
To run the application, execute the command below in your terminal window:
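This runs the `dev` script defined above:

```bash
npm run dev
```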
Executing the provided command will start the server and show a message confirming the server is up and running. To access the page, open your web browser and navigate to `http://localhost:<YOUR_PORT>`, and you should see the document upload form displayed on the page. Select your preferred text file and submit the form. You should see the text file's content displayed in your terminal window and a `Document uploaded successfully.` message displayed on the page.
In the upcoming section, you will add the functionality to create an index from the uploaded document using LlamaIndex and Together AI.
Create the document index
The RAG process comprises several key stages following the ingestion of data. These stages include, amongst others, the `indexing` stage, where the data is organised or indexed into a format compatible with LLMs, and the `querying` stage, where the relevant context is retrieved from the index based on a provided query. In this section, you'll add the logic to create and store an index for the uploaded document.
To get started, create an `.env` file in the root directory and add the following code to the file:
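At minimum, the file needs the `API_KEY` variable used later to authenticate with Together AI; replace the placeholder with the key you copied earlier:

```
API_KEY=<YOUR_TOGETHER_AI_API_KEY>
```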
Your API key should be kept secret and not checked into Git history. To guard against this, create a `.gitignore` file by running the command below in your terminal window:
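One way to generate the file with the entries described below (the exact ignore patterns, particularly for the compiled JavaScript files, may differ):

```bash
cat <<EOF > .gitignore
.env
node_modules
src/**/*.js
EOF
```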
The provided command generates a `.gitignore` file in the root directory of your project. It adds entries for the `.env` file, the `node_modules` directory, and all TypeScript-generated JavaScript files to the `.gitignore` file. This ensures these files are excluded from Git history.
Next, create a `llama.ts` file in the `src` directory and add the following code to the file:
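A sketch of `llama.ts` consistent with the description that follows; the exact import names and method signatures can vary between `llamaindex` versions, so treat this as a guide rather than a definitive file:

```typescript
import {
  Document,
  TogetherEmbedding,
  TogetherLLM,
  VectorStoreIndex,
  serviceContextFromDefaults,
  storageContextFromDefaults,
} from "llamaindex";

// Together AI's LLM and embedding model, authenticated with the API key
const llm = new TogetherLLM({ apiKey: process.env.API_KEY });
const embedModel = new TogetherEmbedding({ apiKey: process.env.API_KEY });

// Service context bundling the LLM and embedding model for reuse
const serviceContext = serviceContextFromDefaults({ llm, embedModel });

// Build an index from the uploaded document's text and persist it to the "storage" directory
export async function createIndex(text: string) {
  const document = new Document({ text });
  const storageContext = await storageContextFromDefaults({ persistDir: "./storage" });
  const index = await VectorStoreIndex.fromDocuments([document], {
    serviceContext,
    storageContext,
  });
  return index;
}

// Load a previously persisted index from the "storage" directory
export async function loadIndex() {
  const storageContext = await storageContextFromDefaults({ persistDir: "./storage" });
  return VectorStoreIndex.init({ storageContext, serviceContext });
}
```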
The code above imports some modules from the `llamaindex` library. They include:
- `TogetherLLM`: Together AI's LLM.
- `TogetherEmbedding`: Together AI's vector embedding model.
- `serviceContextFromDefaults`: Creates a collection of components used in different parts of the application.
- `Document`: A versatile container that holds data from any data source.
- `VectorStoreIndex`: An index that stores the data only according to their vector embeddings.
- `storageContextFromDefaults`: For persisting indexes.
In the subsequent steps, the code instantiates the Together AI LLM and vector embedding model using the Together AI API key. Then, it constructs a service context object with its `llm` property set to Together AI's LLM and its `embedModel` property set to the embedding model provided by Together AI.
Finally, the code defines and exports two functions: `createIndex` and `loadIndex`. The `createIndex` function takes a `text` argument and uses it to generate a document object containing the provided text. Subsequently, a storage context is created, designating the `storage` directory as the location for storing indexes. The `fromDocuments` method of the `VectorStoreIndex` class is then invoked, accepting the document, service context, and storage context as parameters. This method processes the document, retrieves its embeddings, and builds and stores the index. The `createIndex` function then returns the index.
Conversely, the `loadIndex` function initialises a storage context, specifying the `storage` directory as the location for persistent storage through its `persistDir` attribute. Then, an index is loaded using the `VectorStoreIndex` class, instantiated with both the storage and service contexts. Finally, the loaded index is returned by the function.
To create and store an index for an uploaded document, add the following code to your `index.ts` file:
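A sketch of the updated handler, replacing the earlier `/upload` route; the import goes at the top of `index.ts`:

```typescript
import { createIndex } from "./llama";

// Handle file uploads: read the document's text, then create and persist an index for it
app.post("/upload", upload.single("document"), async (req: Request, res: Response) => {
  try {
    const content = req.file?.buffer.toString("utf-8");
    if (!content) throw new Error("No document uploaded");

    await createIndex(content);

    res.json({ message: "Document indexed successfully." });
  } catch (error) {
    res.status(500).json({ message: "Error indexing document." });
  }
});
```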
The code above imports the `createIndex` function and modifies the route handler for POST requests to the `/upload` route. In the route handler, after reading the content of the uploaded document, it is passed to the `createIndex` function, which creates and stores an index for the document. Additionally, the code modifies the response message returned by the route handler.
To test the new functionality, start the development server again with `npm run dev` if it is not already running.
Open your web browser and go to `http://localhost:<YOUR_PORT>`. Then, upload your text document and click the Index Document button to submit the form. If successful, you'll see a message confirming that the document was indexed successfully displayed on the page. This might take a few moments to complete. During this process, a `storage` directory will be generated in your project's root directory, containing three files: `doc_store.json`, `index_store.json`, and `vector_store.json`. These files contain all the necessary data for LlamaIndex to retrieve the most relevant context for any given query.
With an index now created for your document, in the upcoming section, you will implement the capability to query the document conversationally.
Query the document index
The previously created `loadIndex` function is set up to load the index built for the document in the previous section. In this section, you'll add UI elements and a route handler to load and query the index. To get started, add the following code to the `index.ts` file:
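A sketch of the query route described below; as with `llama.ts`, the query engine's `query` signature may differ slightly depending on your `llamaindex` version:

```typescript
import { loadIndex } from "./llama";

// Answer questions about the indexed document
app.post("/query", async (req: Request, res: Response) => {
  try {
    const { query } = req.body;

    // Load the persisted index and build a query engine from it
    const index = await loadIndex();
    const queryEngine = index.asQueryEngine();

    // Retrieve the relevant context and generate a response with the LLM
    const response = await queryEngine.query({ query });

    res.send(response.toString());
  } catch (error) {
    res.status(500).send("Error querying the document index.");
  }
});
```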
The provided code begins by importing the `loadIndex` function. Subsequently, it defines a route handler for handling HTTP POST requests to the `/query` endpoint. Within this handler, the code accesses a `query` parameter in the request body and loads the index created for the uploaded document.
This loaded index has an `asQueryEngine` method, which returns a query engine capable of retrieving relevant nodes from the index based on a query string and then sending them to an LLM to generate a response. The query engine is instantiated, and the request's `query` parameter is passed to its `query` function to generate a response to the query.
Finally, the response from the LLM is converted to a string using its `toString` function and returned as the response.
Next, replace the code in your `src/views/index.ejs` file with the one below to add the UI functionality to submit queries to the `/query` endpoint:
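For brevity, the sketch below shows only the additions to make inside the `<body>` of the earlier `index.ejs` sketch (a second form, a text area, and its submit handler); it assumes the `/query` route returns the answer as plain text, as in the handler sketch above:

```html
<!-- Query form: sends a question to the /query endpoint and shows the answer -->
<form id="query-form" class="mb-3">
  <input class="form-control mb-2" type="text" id="query" placeholder="Ask a question about the document" required />
  <button class="btn btn-secondary" type="submit">Ask</button>
</form>
<textarea id="answer" class="form-control" rows="6" readonly></textarea>

<script>
  const queryForm = document.getElementById("query-form");
  queryForm.addEventListener("submit", async (event) => {
    event.preventDefault();
    const query = document.getElementById("query").value;
    // Send the question as JSON and display the plain-text answer in the text area
    const response = await fetch("/query", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query }),
    });
    document.getElementById("answer").value = await response.text();
  });
</script>
```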
The code above adds a second form to the page. This form includes an input field for entering questions, a text area for presenting the responses, and a script that runs when the form is submitted. Upon submitting the form, the value entered in the input field is sent via a POST request to the `/query` endpoint, and the resulting response is displayed in the text area on the page.
To check the functionality, run the development server once more with `npm run dev`.
Visit `http://localhost:<YOUR_PORT>` in your web browser, find the question entry form, input a question related to your uploaded text document, and submit it. For instance, if you used the provided sample document, you might ask, "What is the story about?". The page should then display the answer to your question.
With the RAG technique applied by LlamaIndex, the Together AI LLM used in the demo application can now answer questions about the uploaded document, even though its content wasn't included in the original training data. In the next section, you will deploy your application online with Koyeb.
Deploy to Koyeb
With the app development complete, proceed with the online deployment of the app on Koyeb by updating the `scripts` section of your `package.json` file with the provided code:
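With the in-place compilation set up in the `tsconfig.json` sketch above, the scripts could look like this (adjust the `start` path if you compile to a separate output directory):

```json
"scripts": {
  "dev": "nodemon --exec ts-node src/index.ts",
  "build": "tsc",
  "start": "node src/index.js"
}
```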
The code above adds the `build` and `start` command scripts to the `package.json` file. The `build` script compiles the TypeScript code into JavaScript, and the `start` script executes the compiled JavaScript code with `node`.
Following that, create a GitHub repository for your code and use the command below to push your local code to the repository:
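The usual sequence looks like the following; replace the remote URL with your own repository's:

```bash
git add --all
git commit -m "Initial commit"
git remote add origin git@github.com:<YOUR_GITHUB_USERNAME>/<YOUR_REPOSITORY_NAME>.git
git branch -M main
git push -u origin main
```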
To deploy the code now on the GitHub repository, go to the Koyeb control panel. While on the Overview tab, click the Create Web Service button to initiate the deployment process. On the App deployment page:
- Choose GitHub as the deployment method.
- Select your repository from the drop-down menu. Alternatively, deploy from the example repository associated with this tutorial by entering `https://github.com/koyeb/example-llamaindex-rag` in the public repository field.
- In the Builder section, choose Buildpack.
- Pick your preferred options for Instances and Regions.
- Expand the Environment variables section and click Add variable to include an environment variable for your API key. Enter `API_KEY` as the variable name, select the Secret type, and in the value field, choose the Create secret option. In the modal that appears, specify the secret name and its corresponding value, then click the Create button.
- Provide a name for your application or use the default name.
- Finally, kick off the deployment process by clicking Deploy.
During the deployment, the process identifies the `build` and `start` scripts specified in the `package.json` file and uses them to build and launch the application. You can monitor the deployment progress through the presented logs. Once the deployment is complete and essential health checks are successful, your application will be up and running.
Access your live application by clicking on the provided public URL.
Conclusion
In this tutorial, you leveraged LlamaIndex to enhance the capabilities of an LLM, allowing it to answer questions and generate text content in a domain beyond the scope of its initial training data.
LlamaIndex provides additional tools for developing RAG systems. You can explore various other available use cases by referring to their TypeScript documentation.
Because the application was deployed using the Git deployment option, any subsequent code push to the deployed branch will automatically trigger a new build for your application. Once the deployment is successful, changes to your application will go live. In case of a failed deployment, Koyeb will preserve the last operational production deployment, ensuring the continuous operation of your application.