6  Retrieval Augmented Generation (RAG)

6.1 Create a Large Language Model Cluster

6.1.1 Create the RAG Container

  • Create a volume called rag-data.
docker volume create rag-data
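If you want to confirm the volume was created, you can inspect it:
docker volume inspect rag-data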
  • Create the container with the RAG engine in the following way. The software is documented here.
docker run -d -p 9099:9099 --add-host=host.docker.internal:host-gateway --platform linux/amd64 -v rag-data:/app/pipelines --network workshop_network --name rag --hostname rag --restart always jcppc/tecweb2025-rag:latest
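To confirm the container is up, you can list it and check its logs from the host:
docker ps --filter "name=rag"
docker logs rag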

6.1.2 Configure RAG

  • Access the WebUI at http://localhost:3000 and configure the LLM to use RAG.
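If the page does not load, you can check reachability from the host; this assumes Open WebUI exposes its default /health endpoint:
curl http://localhost:3000/health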

  • Follow these steps:

    • You should see the following screen, or one very similar.
Figure 6.1: LLM Console
  • Navigate to Admin Panel > Settings > Connections from the top-right side of Open WebUI.
Figure 6.2: Admin Settings
Figure 6.3: Connections Settings
Figure 6.4: Connections Settings
  • When you’re on this page, you can press the + button to add another connection.
Figure 6.5: Connections Settings
Figure 6.6: Connections Settings
  • Set the API URL to http://host.docker.internal:9099 and the API key to 0p3n-w3bu!

  • Verify your connection.
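You can also test the pipelines service directly from the host; this sketch assumes the pipelines server exposes an OpenAI-compatible models endpoint on the published port:
curl -H 'Authorization: Bearer 0p3n-w3bu!' http://localhost:9099/v1/models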

Figure 6.7: Connections Settings

Once you’ve added your pipelines connection and verified it, you will see a screen similar to this one.

Figure 6.8: Connections Settings
  • Navigate to the Admin Panel > Settings > Pipelines section in Open WebUI.

  • You should have a screen similar to this one.

Figure 6.9: Pipelines Settings
  • Download the RAG pipeline file from here. The pipeline file is already prepared with the code for this workshop.

  • Uncompress the file and select it in the Open WebUI interface.

  • Press the upload button on the right side of the panel.

Figure 6.10: Pipelines Settings
Figure 6.11: Pipelines Settings
Figure 6.12: Pipelines Settings
  • You should see a screen like the following.
Important

If you don’t see a screen like this, run the following commands:

docker exec -it rag /bin/bash

pip install -r requirements.txt

  • Exit and restart the rag container, then repeat the same steps:

    • Uncompress the file and select it in the Open WebUI interface.

    • Press the upload button on the right side of the panel.

Figure 6.13: Pipelines Settings
  • If you want to use Ollama models, change the ChatGPT Key field from Custom to None and enter the name of an existing Ollama model (it must be downloaded first in the LLMs container) in the Text to SQL Model field.
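If the Ollama model is not downloaded yet, you can pull it first from the host; the container name llms and the model llama3.2 below are placeholders for your own setup:
docker exec -it llms ollama pull llama3.2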

  • If you want to use the ChatGPT integration, keep the field as Custom and enter a ChatGPT key and an OpenAI model name in the Text to SQL Model field.
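To check that a ChatGPT key is valid before saving it, you can call the standard OpenAI models endpoint with it (replace YOUR_CHATGPT_KEY with your own key):
curl https://api.openai.com/v1/models -H "Authorization: Bearer YOUR_CHATGPT_KEY"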

  • Add a ChatGPT key, similar to the one below, in the last field of the form.

Important

This ChatGPT key is no longer valid; it is shown here only as an example.

sk-proj-XlXRiWlfetudLBrRrP6B4C_RRjDpJu03LJ9smXeDz3p-gE7PLg-a1Td6qFCWoydnIL2pxhJL6dT3BlbkFJ5B2u
  • Press the Save button at the bottom of the screen.

  • Restart the rag (pipelines) container in Docker.
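From the command line, this can be done with:
docker restart rag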

  • Connect to your rag container with this command:

docker exec -it rag /bin/bash
  • Once inside the rag container, run the following command to install the missing libraries.
pip install -r requirements.txt
  • Exit and restart the rag container.
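The same install-and-restart sequence can also be run from the host in one go, assuming requirements.txt sits in the container's default working directory as in the interactive steps above:
docker exec rag pip install -r requirements.txt
docker restart rag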

  • At the end of the installation, your Docker environment should look like this.

Figure 6.14: Docker Environment
  • Access your main WebUI console again and select Model > Tecweb2025.
Figure 6.15: RAG Model Selection
  • You can set this as your default model if you wish.
Figure 6.16: RAG Model Selection
  • Start asking questions.
Figure 6.17: Ask Questions
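If you prefer to ask questions programmatically, Open WebUI also exposes an OpenAI-compatible chat endpoint; the sketch below assumes the model id tecweb2025, an API key generated under Settings > Account, and an illustrative question:
curl http://localhost:3000/api/chat/completions \
  -H "Authorization: Bearer YOUR_OPENWEBUI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "tecweb2025", "messages": [{"role": "user", "content": "How many customers are in the database?"}]}'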
Congratulations

You are now using LLMs & RAG to run queries on databases.