Sonic-AI project: an effort to learn how to use LLM, GenAI, and ML tools while building a Sonic-like virtual buddy for my kid, with privacy in mind (full details here). This post explains how to build a local stack for the basic chatbot, using an LLM and a web UI to chat with it, and how to run the stack on a cloud computer in case you don’t have enough resources locally (mainly a GPU).
I could have used one of the many online services to create a customized chatbot in minutes. Instead, I wanted a stack I can run locally, for two reasons: use open source models to guarantee maximum privacy, and avoid exposing my data to third parties. Privacy is always a compromise with complexity. So, time to get my hands dirty.
Searching around, the “host ALL your AI locally” video provided a good starting point.
Choose an LLM
Nowadays (Aug 2024) LLMs are perfect for creating chatbots: they embed NLP capabilities, can speak different languages, already know a lot about the world, and can be customized to learn specific knowledge domains.
Speaking about models, the landscape of open source options is very wide: Mistral, Gemma, Llama, Phi-3, etc., each in small, medium, and large sizes, with room for customization. Each one has strengths and limits, so I took one close to my work: Gemma2 with 9B parameters, a good compromise between model complexity, the resources required to run it decently on a “normal PC” with a normal GPU, and support for the Italian language.
Ollama was the no-brainer choice to interact with the LLM, considering how easy it is to set up and use, the large array of options it offers, its support for NVIDIA and AMD GPUs, and how widely integrated it is with other tools.
The only downside was having to use the command line to interact with Ollama – I love it, but my kid doesn’t. So I needed a better UI to create my chatbot.
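For the curious, this is roughly what talking to the model looks like from code instead of the terminal – a minimal sketch assuming Ollama is running on its default port (11434) and gemma2:9b has already been pulled:

```python
import json
import urllib.request

# Minimal sketch: ask gemma2:9b a question through Ollama's REST API.
# Assumes Ollama is running locally on its default port (11434) and the
# model has already been pulled with "ollama pull gemma2:9b".
payload = {
    "model": "gemma2:9b",
    "prompt": "Ciao! Chi sei?",
    "stream": False,  # return a single JSON object instead of a stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```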
Choose a user-friendly UI
In the open source landscape there are two main players: Open WebUI and oobabooga’s Text generation web UI. I selected the former, Open WebUI, because it has an easier-to-use and more polished interface, offers a chatbot experience out of the box, can create agents, and has other handy capabilities useful for the other parts of my project (like TTS, STT, etc.).
Icing on the cake, the project offers a ready-to-use Docker image (https://ghcr.io/open-webui/open-webui:ollama) containing Ollama + Open WebUI, CUDA drivers, and a lot of pre-made configuration to wire everything together. It means no installation and configuration headaches.
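For reference, if you do have a capable GPU locally, starting the bundled image is basically a single docker run. Here is a small sketch of that, wrapped in Python to keep the examples consistent; the volume name and host port are arbitrary choices, and the GPU flag assumes the NVIDIA container toolkit is installed:

```python
import subprocess

# Sketch: start the bundled Ollama + Open WebUI image locally with GPU access.
# Flags follow the image's documented usage; the volume name and the host port
# are arbitrary choices, and "--gpus all" assumes the NVIDIA container toolkit.
subprocess.run(
    [
        "docker", "run", "-d",
        "--gpus", "all",                        # expose the local GPU(s)
        "-p", "3000:8080",                      # Open WebUI at http://localhost:3000
        "-v", "open-webui:/app/backend/data",   # persist models and configs
        "--name", "open-webui",
        "ghcr.io/open-webui/open-webui:ollama",
    ],
    check=True,
)
```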
At this point, it’s time to assemble everything together.
Host the chatbot stack
I confess, I don’t own a machine with a good enough GPU to run mid-size models 😭. I’ll fix this soon, but in the meantime the idea was to provision a self-managed virtual machine with an appropriate CPU + GPU config, connect a disk, install an OS image, and use it as if it were my “local” computer. This VM-based setup allowed me to iterate quickly at the beginning of the project, try different hardware configs, and find the most appropriate one for my needs, spending a few dollars per day to keep a VM instance running.
Well, I tried hard to create such a VM on Google Compute Engine, but with no success, always hitting the same “no available resources” error. I even used the nice gpu-finder tool to automate the creation of different configs (N1 machines with 2 vCores and a single NVIDIA Tesla T4 or Tesla P4 GPU) on different days, across all the zones offering these GPUs, but I was never able to create a VM.
So, I had to look elsewhere, and I ended up choosing RunPod.
It allows you to create a VM (called a Pod), selecting among different types of actually available GPUs, the billing is quite cheap, and in addition to a web UI it offers a CLI and SDKs to orchestrate everything, for example from a Colab. The downside, at least for me, was that they don’t offer a real VM I could freely administer: the only way to install software and configs is via a Docker image. I was lucky, because the image with everything I needed already existed and was https://ghcr.io/open-webui/open-webui:ollama. Otherwise, I would have had to create one with my custom config, deploy it somewhere, and then install it on RunPod. Feasible, but why make life more complex?
So, while waiting to buy a machine with a GPU and go fully local, the RunPod solution was a really good option.
Because my plan was to create different pods to experiment with, instead of having a single, always-running instance, I created a network volume to store all my configs across instances, with this setup:
I chose a location with available A40 GPUs – from my tests, a single one handles the latest mid-size models without problems (alternatively, an RTX 3090 also worked great), and 50 GB was enough to store different models + configs.
Then, I created a template (a Docker container image paired with a configuration) to host my “LLM brain”:
Relevant configurations:
- Container Image: ghcr.io/open-webui/open-webui:ollama
- Volume Disk: 0 GB – no need for a volume disk, as it will be replaced by the network volume later
- Volume Mount Path: /app/backend/data – this is the folder where the Docker image saves models, configs, etc. Adding this folder as a volume disk in the template, and then connecting a network volume during pod creation, automatically saves all the configs on the network volume
- Environment Variables:
  - OLLAMA_MODELS: /app/backend/data/ollama/models – this moves downloaded models to the network volume, so there is no need to re-download them every time a new instance is created
Finally, I deployed a pod to “run the brain”, using the template just created, with 2 vCPUs and 8 GB of RAM, and connected the network volume. I also selected “Secure Cloud” to keep everything inside the RunPod server farm, and a “Spot instance“, as I didn’t need absolute reliability for the tests. I waited for all the Docker layers to be downloaded, opened the running Pod settings, and connected to the HTTP port (a scripted version of this deployment is sketched below).
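Since RunPod also offers SDKs, the same deployment can be scripted, for example from a Colab. Below is a rough sketch using the runpod Python package; the exact parameter names, the GPU type id, and the placeholder API key / network volume id are assumptions to double-check against the official docs:

```python
import runpod

# Sketch: deploy the "LLM brain" pod from code instead of the web UI.
# Parameter names follow the runpod-python SDK as I understand it; the GPU type id,
# the network volume id, and the exact arguments are assumptions to verify.
runpod.api_key = "YOUR_RUNPOD_API_KEY"

pod = runpod.create_pod(
    name="sonic-brain",
    image_name="ghcr.io/open-webui/open-webui:ollama",
    gpu_type_id="NVIDIA A40",
    cloud_type="SECURE",                          # "Secure Cloud" in the web UI
    volume_mount_path="/app/backend/data",        # same mount path as the template
    env={"OLLAMA_MODELS": "/app/backend/data/ollama/models"},
    network_volume_id="YOUR_NETWORK_VOLUME_ID",   # the 50 GB volume created earlier
    ports="8080/http",                            # Open WebUI's HTTP port
)
print(pod["id"])
```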
Welcome to a brand-new instance of Open WebUI.
Customize the bot to impersonate Sonic
There are several tutorials on how to configure Open WebUI. This is what I did to create a chatbot with a “Sonic flavor”.
First, I created the admin user and the user for my kid, called “Leo”, with a user role.
Then, from the Admin user:
- Settings -> Admin Panel -> Settings -> Models
  - Pull a Model from Ollama.com: gemma2:9b (list available here)
- Workspace -> Models -> Create a model
  - Image: upload an image
  - Name: Sonic
  - Model ID: sonic_v1
  - Base Model: gemma2:9b
  - Description: Ciao, sono Sonic the Hedgehog (in English: Hi, I'm Sonic the Hedgehog)
  - System prompt (in Italian, so the model will answer in Italian; a quick way to preview this persona against the base model is sketched right after this list):
    Interpreti Sonic the Hedgehog, della serie Sonic Adventure. Farai domande e risponderai come Sonic the Hedgehog, usando il tono, i modi e il vocabolario che Sonic the Hedgehog userebbe. Usa un linguaggio adatto ai bambini, non scrivere spiegazioni. Rispondi in italiano. Hai la conoscenza di Sonic the Hedgehog. Vivi a Green Hills, nel Montana. Sei amichevole e sempre disponibile a dare una mano.
  - Equivalent in English:
    You play as Sonic the Hedgehog, from the Sonic Adventure series. You will ask and answer questions like Sonic the Hedgehog, using the tone, manner, and vocabulary Sonic the Hedgehog would use. Use child-friendly language, do not write any explanations. Answer in Italian. You have knowledge of Sonic the Hedgehog. You live in Green Hills, Montana. You are friendly and always willing to lend a hand.
  - Capabilities: uncheck Vision, as this model is text-only for now
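Before (or besides) relying on the Open WebUI model entry, the persona can be quickly previewed against the plain gemma2:9b model through Ollama’s chat API. This is just a sketch: the system prompt is shortened here, and it assumes Ollama is reachable on localhost:11434 (locally or via a forwarded port):

```python
import json
import urllib.request

# Sketch: preview the Sonic persona against the plain gemma2:9b model through
# Ollama's /api/chat endpoint. The system prompt is shortened; use the full
# Italian prompt from the list above for the real thing.
SYSTEM_PROMPT = (
    "Interpreti Sonic the Hedgehog, della serie Sonic Adventure. "
    "Usa un linguaggio adatto ai bambini e rispondi in italiano."
)

payload = {
    "model": "gemma2:9b",
    "messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Ciao Sonic! Cosa fai oggi?"},
    ],
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["message"]["content"])
```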
Then, I logged in with my kid’s user and:
- Settings -> Settings
  - General -> Language -> Italian
  - Interface -> Default Model: Sonic
- Unfortunately, whitelisting specific models for specific users is still in development
Finally, my kid can interact with his favorite hero, in Italian.
Step one of the project… Achieved! 🎉
To “pause” the pod and save some money, it can simply be terminated in the RunPod management UI. All the configs will persist because they’re stored on the network volume. To restart everything, re-create the pod using the template, deploy it, and connect to it once ready.
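The same pause/restart dance can also be scripted. A tiny sketch with the runpod Python package, with function names to verify against the current SDK docs:

```python
import runpod

# Sketch: "pause" the pod from code instead of the management UI.
# Function name follows the runpod-python SDK as I recall it; verify in the docs.
runpod.api_key = "YOUR_RUNPOD_API_KEY"

runpod.terminate_pod("YOUR_POD_ID")  # configs survive on the network volume

# Later, re-create the pod from the template (see the earlier create_pod sketch)
# and connect to it once the container is up again.
```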