6  Retrieval Augmented Generation (RAG)

6.1 Create a Large Language Model Cluster

6.1.1 Create the RAG Container

  • Create a volume called rag-data.
docker volume create rag-data
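If you want to confirm the volume was created, you can inspect it:
docker volume inspect rag-data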
  • Create the container with the RAG engine in the following way. The software is documented here.
docker run -d -p 9099:9099 --add-host=host.docker.internal:host-gateway --platform linux/amd64 -v rag-data:/app/pipelines --network workshop_network --name rag --hostname rag --restart always jcppc/tecweb2025-rag:latest
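To confirm the container is up, you can list it and check its logs from the host:
docker ps --filter "name=rag"
docker logs rag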

6.1.2 Configure RAG

  • Access the WebUI at http://localhost:3000 and configure the LLM to use RAG.
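If the page does not load, you can check reachability from the host; this assumes Open WebUI exposes its default /health endpoint:
curl http://localhost:3000/health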

  • Follow these steps:

    • You should see the following screen, or one very similar.
Figure 6.1: LLM Console
  • Navigate to Admin Panel > Settings > Connections from the top-right side of Open WebUI.
Figure 6.2: Admin Settings
Figure 6.3: Connections Settings
Figure 6.4: Connections Settings
  • When you’re on this page, you can press the + button to add another connection.
Figure 6.5: Connections Settings
Figure 6.6: Connections Settings
  • Set the API URL to http://host.docker.internal:9099 and the API key to 0p3n-w3bu!

  • Verify your connection.
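You can also test the pipelines service directly from the host; this sketch assumes the pipelines server exposes an OpenAI-compatible models endpoint on the published port:
curl -H 'Authorization: Bearer 0p3n-w3bu!' http://localhost:9099/v1/models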

Figure 6.7: Connections Settings

Once you’ve added your pipelines connection and verified it, you will see a screen similar to this one.

Figure 6.8: Connections Settings
  • Navigate to the Admin Panel > Settings > Pipelines section in Open WebUI.

  • You should have a screen similar to this one.

Figure 6.9: Pipelines Settings
  • Download the RAG pipeline file from here. The pipeline file is already prepared with the code for this workshop.

  • Uncompress the file and select it in the Open WebUI interface.

  • Press the upload button on the right side of the panel.

Figure 6.10: Pipelines Settings
Figure 6.11: Pipelines Settings
Figure 6.12: Pipelines Settings
  • You should see a screen like the following.
Important

If you don’t see a screen like this, run the following commands:

docker exec -it rag /bin/bash

pip install -r requirements.txt

  • Exit and restart the rag container, then repeat the same steps:

    • Uncompress the file and select it in the Open WebUI interface.

    • Press the upload button on the right side of the panel.

Figure 6.13: Pipelines Settings
  • If you want to use Ollama models, change the ChatGPT Key field from Custom to None and enter the name of an existing Ollama model (it must be downloaded first in the LLMs container) in the Text to SQL Model field.
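If the Ollama model is not downloaded yet, you can pull it first from the host; the container name llms and the model llama3.2 below are placeholders for your own setup:
docker exec -it llms ollama pull llama3.2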

  • If you want to use the ChatGPT integration, keep the field as Custom and enter a ChatGPT key and an OpenAI model name in the Text to SQL Model field.
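To check that a ChatGPT key is valid before saving it, you can call the standard OpenAI models endpoint with it (replace YOUR_CHATGPT_KEY with your own key):
curl https://api.openai.com/v1/models -H "Authorization: Bearer YOUR_CHATGPT_KEY"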

  • Add a ChatGPT key, similar to the one below, in the last field of the form.

Important

This ChatGPT key is no longer valid; it is shown here only as an example.

sk-proj-XlXRiWlfetudLBrRrP6B4C_RRjDpJu03LJ9smXeDz3p-gE7PLg-a1Td6qFCWoydnIL2pxhJL6dT3BlbkFJ5B2u
  • Press the Save button at the bottom of the screen.

  • Restart the rag (pipelines) container in Docker.
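From the command line, this can be done with:
docker restart rag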

  • Connect to your rag container with this command:

docker exec -it rag /bin/bash
  • Once inside the rag container, run the following command to install the missing libraries.
pip install -r requirements.txt
  • Exit and restart the rag container.
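The same install-and-restart sequence can also be run from the host in one go, assuming requirements.txt sits in the container's default working directory as in the interactive steps above:
docker exec rag pip install -r requirements.txt
docker restart rag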

  • At the end of the installation, your Docker environment should look like this.

Figure 6.14: Docker Environment
  • Access your main WebUI console again and select Model > Tecweb2025.
Figure 6.15: RAG Model Selection
  • You can set this as your default model if you wish.
Figure 6.16: RAG Model Selection
  • Start asking questions.
Figure 6.17: Ask Questions
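If you prefer to ask questions programmatically, Open WebUI also exposes an OpenAI-compatible chat endpoint; the sketch below assumes the model id tecweb2025, an API key generated under Settings > Account, and an illustrative question:
curl http://localhost:3000/api/chat/completions \
  -H "Authorization: Bearer YOUR_OPENWEBUI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "tecweb2025", "messages": [{"role": "user", "content": "How many customers are in the database?"}]}'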
Congratulations

You are now using LLMs & RAG to run queries on databases.