Friday, May 23, 2025

Local LLM using Ollama and open-webUI

I have a local server with Nvidia GPUs that I bought off eBay for $800. The GPUs are RTX cards, and there are four of them in the server. I run Ollama on it and have downloaded a few models that I use mainly to ask how to configure other applications. I have multiple laptops where I run Docker containers with open-webui and ollama-webui, as shown below. Open-webui allows saving chats while ollama-webui does not.
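For reference, here is a minimal sketch of running open-webui in Docker on a laptop and pointing it at the Ollama server; `<server_ip>` is a placeholder for the server's address, and 11434 is Ollama's default port:

```bash
# Run Open-WebUI in Docker and point it at the Ollama server on the LAN
docker run -d \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://<server_ip>:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```

The UI is then available at http://localhost:3000, and the named volume is what keeps the saved chats around across container restarts.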



The performance is about as good as any online LLM service. I routinely get around 20+ tokens/sec, i.e., it is not a long wait to get your queries answered, stop, and add information. In other words, it is usable.
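If you want to see the throughput yourself, `ollama run` has a `--verbose` flag that prints timing statistics after each response (the model and prompt here are just examples):

```bash
# The "eval rate" line in the output is the generation speed in tokens/sec
ollama run mistral --verbose "How do I set up a reverse proxy in nginx?"
```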




All the models are free and open source, so I pay nothing to use them. The only out-of-pocket expense was the server that I purchased. I could have built one with a $99 two-socket Intel motherboard (from AliExpress), but I did not have the time, so I bought it off some student on eBay.

The answers are accurate and not all that different from using the online versions. Give it a try! It is fun and really, really cheap.


Appendix

Once I had the setup working, I asked it to write up, in blog fashion, all the steps necessary. I haven't validated them, but they look roughly like what I did.


Title: Setting Up Ollama and Open-WebUI Across a Server and a Windows Laptop


In this blog post, we'll walk through the process of setting up Ollama, a local LLM runtime, on a server equipped with four NVIDIA GPUs, and connecting it to Open-WebUI, an intuitive user interface for large language models, on a Windows laptop using WSL (Windows Subsystem for Linux).


**Part 1: Server Setup**

First, let's set up Ollama on the server. Begin by updating the packages, then install Ollama with its official install script (pip installs only the Python client, not the server):

```bash
sudo apt-get update && sudo apt-get upgrade -y

# Install the Ollama server with the official install script
curl -fsSL https://ollama.com/install.sh | sh
```
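A quick sanity check confirms the install worked:

```bash
# Prints the installed Ollama version
ollama --version
```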

Now, we'll pull your favorite models from the Ollama model library. Here, we'll use DeepSeek, Mistral, and Qwen:


```bash
# Model names must match the Ollama library listings
ollama pull deepseek-r1
ollama pull mistral
ollama pull qwen
```
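Another quick check to confirm the models downloaded:

```bash
# Lists locally available models and their sizes on disk
ollama list
```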

To run Ollama as a service, create a systemd unit file in `/etc/systemd/system` (the install script usually creates one for you, in which case you can simply edit it):

```bash
sudo nano /etc/systemd/system/ollama.service
```

Add the following content and save the file:

```ini
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
User=<username>
# "ollama serve" starts the API server; adjust the path if `which ollama` reports a different location
ExecStart=/usr/local/bin/ollama serve
Restart=always

[Install]
WantedBy=multi-user.target
```


Next, enable and start the service:


```bash
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
```
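If anything goes wrong, standard systemd tooling shows whether the service came up cleanly:

```bash
# Confirm the service is active and tail its logs
systemctl status ollama
journalctl -u ollama -f
```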

Ollama does not read a config file; it is configured through environment variables. To make the server listen on all interfaces so the laptops can reach it (the default port is 11434), and to choose which GPUs it may use, add `Environment` lines to the `[Service]` section of the unit file:

```ini
# Add to the [Service] section of /etc/systemd/system/ollama.service

# Listen on all interfaces on the default port so other machines can connect
Environment="OLLAMA_HOST=0.0.0.0:11434"

# Optional: limit Ollama to specific GPUs (all four here)
Environment="CUDA_VISIBLE_DEVICES=0,1,2,3"
```

Apply the change with `sudo systemctl daemon-reload && sudo systemctl restart ollama`.
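From one of the laptops, it's worth confirming the server is reachable before setting up the UI. These calls use Ollama's standard HTTP API; `<server_ip>` is a placeholder for the server's LAN address:

```bash
# Returns the Ollama version if the service is reachable
curl http://<server_ip>:11434/api/version

# Lists the models that were pulled earlier
curl http://<server_ip>:11434/api/tags
```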


**Part 2: Windows Laptop Setup**


Install WSL and Ubuntu if you haven't already. Open an Ubuntu terminal, update the package lists, and install Open-WebUI:


```bash
sudo apt-get update && sudo apt-get upgrade -y

# The PyPI package name is open-webui (it needs Python 3.11)
pip install open-webui
```

Point Open-WebUI at the Ollama service on the server and start it:


```bash
# OLLAMA_BASE_URL tells Open-WebUI where the Ollama API is; use the server's LAN address
OLLAMA_BASE_URL=http://<server_ip>:11434 open-webui serve
```


Once it starts, open http://localhost:8080 in your browser; Open-WebUI provides a user-friendly interface for interacting with your language models.


By following this guide, you've set up Ollama on a server with multiple GPUs and connected it to Open-WebUI on a Windows laptop using WSL, giving you access to capable local models from any machine on your network.

