In the following talk (in Italian), I explained the basic concepts behind Home Assistant, the number one choice to manage home automation with 3 core principles in mind: privacy, choice and sustainability.
(MOCA 2024, Pescara)
Sonic-AI project: an effort to learn how to use LLM, GenAI and ML tools, while building a Sonic-like virtual buddy for my kid, with privacy in mind (full details here). This post explains how to build a local stack to create the basic chatbot, using an LLM and a web UI to chat with it. And how to run the stack on a cloud computer, in case you don’t have enough resources locally (mainly a GPU).
I could have used one of the many online services to create a customized chatbot in minutes. Instead, I wanted to create a stack I can run locally, for two reasons: use open source models to guarantee maximum privacy, and avoid exposing my data to third parties. Privacy always comes at the cost of some complexity. So, time to get my hands dirty.
Searching around, the host ALL your AI locally video provided a good starting point.
Nowadays (Aug 2024) LLMs are perfect for creating chatbots. They embed NLP capabilities, can speak different languages, already know a lot about the world and can be customized to learn specific knowledge domains.
Speaking about models, the landscape of open source models is very wide: Mistral, Gemma, Llama, Phi-3, etc., each available in small, medium and large sizes, with room for customization. Each one of them has strengths and limits, so I took one close to my work: Gemma 2 with 9B parameters, a good compromise between the complexity of the model, the resources required to run it decently on a “normal PC” with a normal GPU, and support for the Italian language.
Ollama was the no-brainer choice to interact with the LLM, considering how easy the setup and usage are, the large array of options it offers, the support for nVidia and AMD GPUs, and how widely integrated Ollama is with other tools.
The only downside was the usage of the command line to interact with Ollama – I love it, but my kid doesn’t. So I needed a better UI to create my chatbot.
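For a quick smoke test before adding any UI, the Ollama CLI alone is enough. A minimal sketch, assuming the gemma2:9b tag from the Ollama model library (adjust to the model and size you picked):

# Download the Gemma 2 9B model from the Ollama library
ollama pull gemma2:9b

# Start an interactive chat session with the model in the terminal
ollama run gemma2:9b

Note that ollama run also pulls the model automatically if it is not present yet, so the first command is there only to make the download step explicit.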
In the open source landscape there are two main players: Open WebUI and oobabooga’s Text generation web UI. I selected the former, Open WebUI, because it has an easier-to-use and more polished interface, offers a chatbot experience out of the box, has the ability to create agents, plus other handy capabilities useful for the other parts of my project (like TTS, STT, etc).
Icing on the cake, the project offers a ready-to-use docker image (https://ghcr.io/open-webui/open-webui:ollama) containing Ollama + Open WebUI, CUDA drivers, and a lot of pre-made configurations to wire everything together. It means no installation and configuration headaches.
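For those who do have a local GPU, that image can be started with a single Docker command. A sketch along the lines of what the Open WebUI docs suggest (the port mapping and volume name here are assumptions to adapt to your setup):

# Run the bundled Ollama + Open WebUI image with GPU access,
# persisting data (models, configs, users) in a named volume
docker run -d --gpus=all -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:ollama

With this mapping, Open WebUI answers on http://localhost:3000.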
At this point, it’s time to assemble everything together.
I confess, I don’t own a machine with a GPU good enough to run mid-size models 😭. I’ll solve this soon, but in the meantime the idea was to provision a self-managed virtual machine with an appropriate CPU + GPU config, connect a disk, install an OS image, and use it as if it were my “local” computer. This VM-based setup allowed me to quickly iterate at the beginning of the project, try different hardware configs, find the most appropriate one for what I needed, and spend only a few dollars per day to keep a VM instance running.
Well, I tried hard to create such a VM on Google Compute Engine, but with no success, always hitting the same “no available resources” error. I even used this nice gpu-finder tool to automate the creation of different configs (N1 machines with 2 vCores and a single nVidia Tesla T4 or Tesla P4 GPU) on different days, in all the zones offering these GPUs, but I wasn’t able to create a VM a single time.
So I had to look elsewhere, and I ended up choosing RunPod.
It allows you to create a VM (called Pod) selecting among different types of really available GPUs, the billing is quite cheap, and in addition to a web UI it offers a CLI and SDKs to orchestrate everything, for example from a Colab. The downside, at least for me, was that they don’t offer a real VM I could freely administer: the only way to install software and configs is via a docker image. I was lucky enough, because the image with everything I needed already existed: https://ghcr.io/open-webui/open-webui:ollama. Otherwise, I would have had to create one with my custom config, deploy it somewhere, and then install it on RunPod. Feasible, but why make life more complex?
So, while waiting to buy a machine with a GPU and be fully local, the RunPod solution was a really good option.
Because my plan was to create different pods to experiment with, instead of a single, always-running instance, I created a network volume to store all my configs across instances, with these settings:
I chose a location with available A40 GPUs – from my tests, a single one handles the latest mid-size models without problems (an RTX 3090 worked great too), and 50GB were enough to store different models + configs.
Then, I created a template (a Docker container image paired with a configuration) to host my “LLM brain”:
Relevant configurations:
- Container image: https://ghcr.io/open-webui/open-webui:ollama
- Volume mount path: /app/backend/data – this is the folder where the docker image saves models, configs, etc.
- Models path: /app/backend/data/ollama/models – this moves downloaded models to the network volume, so there is no need to re-download them every time a new instance is created (see the note below).

Finally, I deployed a pod to “run the brain”, using the template just created, with 2 vCPUs and 8 GB of RAM, and connected the network disk. I also selected “Secure Cloud”, to keep everything in the RunPod server farm, and a “Spot instance“, as I didn’t need absolute reliability for the tests. I waited for all the docker layers to be downloaded, opened the running Pod settings and connected to the HTTP port.
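A note on the models path mentioned in the configuration list above: my assumption is that the redirection is done through Ollama’s standard OLLAMA_MODELS environment variable, set among the template’s environment variables. A sketch:

# Assumption: env var set in the RunPod template, so Ollama stores models
# under the network-volume mount instead of its default /root/.ollama/models
OLLAMA_MODELS=/app/backend/data/ollama/models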
Welcome to a brand-new instance of Open WebUI.
There are different tutorials on how to configure Open WebUI. This is what I did to create a chatbot with a “Sonic flavor”.
First, I created the admin user, and a user for my kid, called “Leo”, with the user role.
Then, from the Admin user:
Ciao, sono Sonic the Hedgehog
(Hi, I'm Sonic the Hedgehog)

Interpreti Sonic the Hedgehog, della serie Sonic Adventure. Farai domande e risponderai come Sonic the Hedgehog, usando il tono, i modi e il vocabolario che Sonic the Hedgehog userebbe. Usa un linguaggio adatto ai bambini, non scrivere spiegazioni. Rispondi in italiano. Hai la conoscenza di Sonic the Hedgehog. Vivi a Green Hills, nel Montana. Sei amichevole e sempre disponibile a dare una mano.
(You play Sonic the Hedgehog, from the Sonic Adventure series. You will ask and answer questions as Sonic the Hedgehog, using the tone, manner, and vocabulary Sonic the Hedgehog would use. Use child-friendly language, do not write any explanations. Answer in Italian. You have Sonic the Hedgehog's knowledge. You live in Green Hills, Montana. You are friendly and always willing to lend a hand.)
Then, I logged in with my kid’s user and:
Finally, my kid can interact with his favorite hero, in Italian.
Step one of the project… Achieved! 🎉
To “pause” the pod and save some money, it can simply be terminated in the RunPod management UI. All the configs will persist, because they’re stored in the network volume. To restart everything again, re-create the pod using the template, deploy it and connect to it once ready.
I’ve always considered a “real world” project the best way to learn a new tech: get the hands dirty, be guided by (sort-of) realistic user requirements, and enjoy the excitement of building something step after step, one solved failure at a time.
This is why I decided to “be inspired” by the passion one of my kids has for Sonic the Hedgehog, and use the latest tools available in the ML and GenAI space to create a “Sonic-AI buddy” for him: a virtual chatbot, looking and acting like Sonic, my kid can interact and converse with, safely and while having fun.
To break down the complexity of such a project, so I don’t need to learn everything-about-LLMs before creating something, I want to start with a very basic working prototype providing simple chatbot features (the so-called MVP), and then develop different “skills”, each of them requiring learning and using different ML or GenAI techs to be achieved. Incremental learning and improvements.
There is another prerequisite I want to fulfill: everything must run locally and be based on OSS software. I’m a little bit paranoid about privacy, and for no reason should my kid’s interactions end up in a training dataset, be used for internal model analysis, or land somewhere else. So, privacy first.
Let’s start with “the brain“, the main element to which all the rest can then be attached.
The free tier of the Google Colab runtime, powered only by a CPU, is enough to successfully run Google’s Gemma 2B parameter model and prompt it using the Colab UI.
In addition, it’s possible to set up the Colab to serve the model, so it can be consumed from anywhere via a normal REST call.
Colab with all the instructions is here.
!curl -fsSL https://ollama.com/install.sh | sh
This command installs Ollama on the notebook.
In order to “expose” the Ollama instance installed in the Colab notebook to the external world, a Cloudflare Tunnel is created, using the official client. The following lines install the required packages:
!wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
!dpkg -i cloudflared-linux-amd64.deb
Instead of adding a subdomain to a registered Cloudflare’s account, a random subdomain is generated by TryCloudflare. No registration required.
The following code serves two purposes: start the Cloudflare tunnel as soon as Ollama is ready to serve, and print the random subdomain created by TryCloudflare.
import os

# Set OLLAMA_HOST to specify bind address
# https://github.com/ollama/ollama/blob/main/docs/faq.md#setting-environment-variables-on-linux
os.environ.update({'OLLAMA_HOST': '0.0.0.0'})

import subprocess
import threading
import time
import socket

def iframe_thread(port):
  # Wait until something is listening on the given port (the Ollama server)
  while True:
    time.sleep(0.5)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    result = sock.connect_ex(('127.0.0.1', port))
    if result == 0:
      break
    sock.close()
  # Start the Cloudflare tunnel pointing to the local Ollama server
  p = subprocess.Popen(["cloudflared", "tunnel", "--url", f"http://127.0.0.1:{port}"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  # cloudflared prints the random trycloudflare.com URL on stderr
  for line in p.stderr:
    l = line.decode()
    if "trycloudflare.com " in l:
      print("\n\n\n\n\n")
      print("running ollama server\n\n", l[l.find("http"):], end='')
      print("\n\n\n\n\n")

threading.Thread(target=iframe_thread, daemon=True, args=(11434,)).start()
After setting some environment variables, an iframe_thread function is defined. In the function, a while True loop waits until the Ollama server is up and running. Once this happens, subprocess.Popen creates the Cloudflare tunnel pointing to the local Ollama installation, using the cloudflared command, and prints the randomly generated xxxx.trycloudflare.com subdomain. The last line of code launches iframe_thread as a background Thread, and the wait for the Ollama server begins.
At this point, everything is ready to launch the Ollama server:
!ollama serve
Colab will start the Ollama server, and the previously created thread, which was waiting for this to happen, exits the while loop, creates the tunnel and prints the subdomain. Looking at the output of this Colab cell, something similar will appear:
Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is:
ssh-ed25519 blablablabla
2024/08/15 21:29:34 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
[...]
running ollama server
https://sand-commerce-fields-danger.trycloudflare.com
Bingo! The address https://sand-commerce-fields-danger.trycloudflare.com is the URL to use to reach our Ollama-on-Colab instance, the <your_cloudflare_subdomain> in the following snippets.
Last but not least, Ollama needs to download the Gemma 2B model. From this point on, while the Colab notebook is busy running the Ollama server, the rest of the interaction happens via the Ollama API, available through the newly created Cloudflare tunnel.
From any command line shell available (a local computer, a mobile device, etc.) these two commands need to be launched (only the first one is really mandatory; the second one is useful to improve performance):
curl <your_cloudflare_subdomain>/api/pull -d '{ "name": "gemma:2b" }'
This call instructs Ollama to download the Gemma 2B model via the pull API endpoint.
curl <your_cloudflare_subdomain>/api/generate -d '{"model": "gemma:2b", "keep_alive": -1}'
This call instructs Ollama to keep the gemma:2b model loaded in memory, instead of discarding it after 5 minutes of inactivity (the default behaviour).
It’s time to ask the first question to Gemma:
curl <your_cloudflare_subdomain>/api/generate -d '{"model": "gemma:2b", "stream":false, "prompt": "Create a 10 line poem about love, with rhyming couplets"}'
The generate API endpoint generates a response for a given prompt with a provided model. "stream":false waits for the model to elaborate the whole answer and returns it all at once, instead of a stream of tokens.
To generate the reply, Gemma took approx 60 to 90 seconds. Not the quickest in the world, but all CPU powered! ;)
The model’s reply is in the json message payload. To focus on it, and filter everything else out, pipe the previous command to jq:
curl <your_cloudflare_subdomain>/api/generate -d '{"model": "gemma:2b", "stream":false, "prompt": "Create a 10 line poem about love, with rhyming couplets"}' | jq ".response"
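The same generate endpoint can of course be called from code rather than curl. A minimal Python sketch, mirroring the curl call above (the requests library and the placeholder URL are my additions):

import requests

# Base URL of the Ollama instance exposed through the Cloudflare tunnel
OLLAMA_URL = "https://<your_cloudflare_subdomain>"

payload = {
    "model": "gemma:2b",
    "stream": False,  # wait for the full answer instead of a token stream
    "prompt": "Create a 10 line poem about love, with rhyming couplets",
}

# /api/generate returns a JSON document; the generated text is in the "response" field
reply = requests.post(f"{OLLAMA_URL}/api/generate", json=payload, timeout=300)
reply.raise_for_status()
print(reply.json()["response"])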
Well, no, this can’t run forever: the Google Colab FAQ says a free notebook can run for at most 12 hours.
But no fear: once the notebook has been shut down, it’s just a matter of launching another “Run all“, waiting for the new Cloudflare random subdomain, and restarting the fun.
Tired of juggling event details and missing out on creative inspiration? This session dives deep into the world of AI-powered tools for community managers, sharing practical strategies for optimizing the entire event planning process. We’ll delve into specific tools that can assist with topic selection, creative asset and marketing material creation, survey analysis, attendee data insights, and much more.
(Google North America Community Summit 2024 – Repo with all the prompts)
With a lot of battery-powered ZigBee devices on my home network (thermostats, contact sensors, remote controls, etc.) it’s useful to have a Home Assistant dashboard with the status of all the batteries, to replace them before a device goes off.
Simple task, with an Entities Card:
type: entities
title: Battery level
state_color: true
entities:
  - type: section
    label: Remote controls
  - entity: sensor.switch_01_ikea_e1743_battery
  - type: section
    label: Thermostats
  - entity: sensor.thermo_001_battery
    name: Living room
  - entity: sensor.thermo_002_battery
    name: Parents room
With the following result:
But this approach has three main drawbacks:
Here is a Markdown card with a template that identifies all the entities in the system tracking a battery level, finds the ones with a value below a certain threshold, and lists them:
type: markdown
content: |
  {#- Find all the battery sensors -#}
  {%- set sensors = expand(states.sensor)
    | rejectattr('state', 'in', ['unavailable', 'undefined', 'unknown'])
    | selectattr('attributes.device_class', 'defined')
    | selectattr('attributes.device_class', '==', 'battery')
    | rejectattr('entity_id', "search", "keepout_8p")
    | selectattr('attributes.unit_of_measurement', 'defined')
    | selectattr('attributes.unit_of_measurement', '==', '%')
    | list %}
  {#- Show only the entities with a battery level below a certain threshold -#}
  {%- for s in sensors -%}
    {%- if s.state | int(0) < 30 -%}
      {{ s.attributes.friendly_name + ": " + s.state }}
      {#- s.entity_id can be used too #}
    {% endif -%}
  {% endfor -%}
title: Devices with low battery level
With this final result:
Let’s look at the code block by block.
{#- Find all the battery sensors -#}
{%- set sensors = expand(states.sensor)
  | rejectattr('state', 'in', ['unavailable', 'undefined', 'unknown'])
  | selectattr('attributes.device_class', 'defined')
  | selectattr('attributes.device_class', '==', 'battery')
  | rejectattr('entity_id', "search", "keepout_8p")
  | selectattr('attributes.unit_of_measurement', 'defined')
  | selectattr('attributes.unit_of_measurement', '==', '%')
  | list %}
First, the expand() function returns all the sensor entities in the system, then a series of Jinja2 filters are applied: remove unavailable, undefined and unknown entities, keep only the entities with a battery device class, remove my phone’s entity_id (keepout_8p), and keep the remaining entities with % as the battery’s unit of measurement.
{#- Show only the entities with a battery level below a certain threshold -#}
{%- for s in sensors -%}
  {%- if s.state | int(0) < 30 -%}
    {{ s.attributes.friendly_name + ": " + s.state }}
    {#- s.entity_id can be used too #}
  {% endif -%}
{% endfor -%}
A simple check on all the sensors found previously, to identify whether there are battery levels below a certain threshold. To be sure the comparison is done between numbers, the state value is parsed as an integer and then compared with the threshold. Once an entity is identified, the message to show is assembled, chaining different properties of the sensor.
To get the same information inside a sensor, it could be useful to create a template sensor. The logic is the same:
template:
  - sensor:
      - name: ZigBee devices battery to change
        unique_id: zigbee_devices_battery_to_change
        state: >
          {#- Find all the battery sensors -#}
          {%- set sensors = expand(states.sensor)
            | rejectattr('state', 'in', ['unavailable', 'undefined', 'unknown'])
            | selectattr('attributes.device_class', 'defined')
            | selectattr('attributes.device_class', '==', 'battery')
            | rejectattr('entity_id', "search", "keepout_8p")
            | selectattr('attributes.unit_of_measurement', 'defined')
            | selectattr('attributes.unit_of_measurement', '==', '%')
            | list %}
          {#- Show only the devices with a battery level below a certain threshold -#}
          {%- for s in sensors -%}
            {% if s.state | int(0) < 30 %}
              {{ s.attributes.friendly_name + ": " + s.state }}
              {#- s.entity_id can be used too -#}
            {% endif -%}
          {%- endfor -%}
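As a possible next step, the template sensor can feed an automation that pushes a notification when at least one device needs a new battery. A minimal sketch, assuming the sensor gets the entity_id sensor.zigbee_devices_battery_to_change and that a notify.mobile_app_phone service exists (both names are assumptions to adapt to your setup):

automation:
  - alias: Low battery notification
    trigger:
      # Fires whenever the template sensor state changes
      - platform: state
        entity_id: sensor.zigbee_devices_battery_to_change
    condition:
      # Notify only when the sensor actually lists at least one device
      - condition: template
        value_template: "{{ trigger.to_state.state | trim != '' }}"
    action:
      - service: notify.mobile_app_phone
        data:
          title: Devices with low battery level
          message: "{{ trigger.to_state.state }}"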