Serving Gemma 2B for free using Google Colab

The free tier of the Google Colab runtime, powered only by a CPU, is enough to run Google’s Gemma 2B model and prompt it from the Colab UI.
In addition, the Colab notebook can be set up to serve the model, so it can be consumed from anywhere via a normal REST call.

Colab with all the instructions is here.

Install Ollama in Colab notebook

!curl -fsSL https://ollama.com/install.sh | sh

This command installs Ollama on the notebook.

Expose Ollama via a Cloudflare tunnel

In order to “expose” the Ollama instance installed in the Colab notebook to the external world, a Cloudflare Tunnel is created, using the official client. The following lines install the required packages:

!wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
!dpkg -i cloudflared-linux-amd64.deb

Create the tunnel and capture the TryCloudflare subdomain

Instead of adding a subdomain to a registered Cloudflare account, a random subdomain is generated by TryCloudflare. No registration is required.

The following code exists for two purposes: start the Cloudflare tunnel as soon as Ollama is ready to serve, and return the random subdomain created by TryCloudflare.

import os
# Set OLLAMA_HOST to specify bind address
# https://github.com/ollama/ollama/blob/main/docs/faq.md#setting-environment-variables-on-linux
os.environ.update({'OLLAMA_HOST': '0.0.0.0'})

import subprocess
import threading
import time
import socket

def iframe_thread(port):
    # Poll the local port until the Ollama server accepts connections
    while True:
        time.sleep(0.5)
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        result = sock.connect_ex(('127.0.0.1', port))
        sock.close()  # close the probe socket on success too
        if result == 0:
            break

    # Start the tunnel; cloudflared writes its log to stderr
    p = subprocess.Popen(["cloudflared", "tunnel", "--url", f"http://127.0.0.1:{port}"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    for line in p.stderr:
        text = line.decode()
        # The trailing space matches the log line that carries the public URL
        if "trycloudflare.com " in text:
            print("\n\n\n\n\n")
            print("running ollama server\n\n", text[text.find("http"):], end='')
            print("\n\n\n\n\n")

threading.Thread(target=iframe_thread, daemon=True, args=(11434,)).start()

After setting the environment variable, an iframe_thread function is defined. In the function, a while True loop waits until the Ollama server is up and running. Once this happens, subprocess.Popen creates the Cloudflare tunnel pointing to the local Ollama installation, using the cloudflared command, and prints the randomly generated xxxx.trycloudflare.com subdomain.

The last line of code launches iframe_thread as a background thread, which starts waiting for the Ollama server to accept connections.
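The URL capture described above can also be done with a regular expression instead of string slicing. The following is only a sketch: the helper name extract_tunnel_url and the sample log line are hypothetical, and the exact layout of the cloudflared log is an assumption that may vary between versions.

```python
import re

# Pattern for a TryCloudflare quick-tunnel URL (assumed shape:
# lowercase words, digits and hyphens in front of .trycloudflare.com).
TUNNEL_URL = re.compile(r"https://[a-z0-9-]+\.trycloudflare\.com")


def extract_tunnel_url(log_line):
    """Return the first trycloudflare.com URL found in a log line, or None."""
    match = TUNNEL_URL.search(log_line)
    return match.group(0) if match else None


# Hypothetical log line, shaped like the banner printed by cloudflared.
sample = "2024-08-15T21:29:40Z INF |  https://sand-commerce-fields-danger.trycloudflare.com  |"
print(extract_tunnel_url(sample))
```

Compared with slicing from the first "http", the regular expression returns only the URL, without the trailing characters of the log line.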

Launch the Ollama server

At this point, everything is ready to launch the Ollama server:

!ollama serve

Colab starts the Ollama server, and the previously created thread, which was waiting for this to happen, exits the while loop, creates the tunnel and prints the subdomain. The output of this Colab block will look similar to this:

Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is: 

ssh-ed25519 blablablabla

2024/08/15 21:29:34 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
[...]


running ollama server

 https://sand-commerce-fields-danger.trycloudflare.com

Bingo! The address https://sand-commerce-fields-danger.trycloudflare.com is the URL to use to reach our Ollama-on-Colab instance; it replaces <your_cloudflare_subdomain> in the following snippets.

Set up Ollama using API calls

Last but not least, Ollama needs to download the Gemma 2B model. From this point on, while the Colab notebook is busy running the Ollama server, the rest of the interaction happens via the Ollama API, available through the newly created Cloudflare tunnel.
From any available command-line shell (a local computer, a mobile device, etc.), these two commands need to be launched (only the first one is mandatory; the second one is useful to improve performance):

curl <your_cloudflare_subdomain>/api/pull -d '{ "name": "gemma:2b" }'

This call instructs Ollama to download the Gemma 2B model via the pull API endpoint.

curl <your_cloudflare_subdomain>/api/generate -d '{"model": "gemma:2b", "keep_alive": -1}'

This call instructs Ollama to keep the gemma:2b model loaded in memory, instead of unloading it after 5 minutes of inactivity (the default behaviour).
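For those who prefer Python to curl, the same two calls can be sketched with the standard library only. The helper names build_request and call are hypothetical, and BASE_URL is a placeholder to replace with the subdomain printed by the notebook; the endpoints and payloads are the ones used in the curl commands above.

```python
import json
import urllib.request

# Placeholder (assumption): replace with the URL printed by the Colab notebook.
BASE_URL = "https://example.trycloudflare.com"


def build_request(base_url, endpoint, payload):
    """Build a POST request with a JSON body for an Ollama API endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(request):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Same payloads as the two curl commands.
pull = build_request(BASE_URL, "/api/pull", {"name": "gemma:2b"})
keep = build_request(BASE_URL, "/api/generate", {"model": "gemma:2b", "keep_alive": -1})

# Uncomment once BASE_URL points to a live tunnel:
# print(call(pull))
# print(call(keep))
```

The network calls are left commented out, so the snippet can be pasted and inspected before the tunnel is running.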

Ask Gemma 2B a question

It’s time to ask Gemma the first question:

curl <your_cloudflare_subdomain>/api/generate -d '{"model": "gemma:2b", "stream":false, "prompt": "Create a 10 line poem about love, with rhyming couplets"}'

The generate API endpoint generates a response for a given prompt with the provided model.
"stream":false makes Ollama wait for the model to complete the answer and return it all at once, instead of as a stream of tokens.
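When streaming is left enabled, the reply arrives as one JSON object per line, each carrying a fragment of the text in its "response" field, and the last one marked with "done": true. The sketch below shows how such a stream could be stitched back together; the helper name join_stream and the shortened sample stream are hypothetical.

```python
import json


def join_stream(ndjson_text):
    """Concatenate the "response" fragments of a streamed reply."""
    parts = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # skip empty lines between objects
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object closes the stream
    return "".join(parts)


# Hypothetical, shortened stream with three objects.
sample_stream = "\n".join([
    '{"model": "gemma:2b", "response": "Roses ", "done": false}',
    '{"model": "gemma:2b", "response": "are red", "done": false}',
    '{"model": "gemma:2b", "response": "", "done": true}',
])
print(join_stream(sample_stream))
```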

To generate the reply, Gemma took approx 60 to 90 seconds. Not the quickest in the world, but all CPU powered! ;)

The model’s reply is in the JSON message payload. To focus on it and filter everything else out, pipe the previous command to jq:

curl <your_cloudflare_subdomain>/api/generate -d '{"model": "gemma:2b", "stream":false, "prompt": "Create a 10 line poem about love, with rhyming couplets"}' | jq ".response"
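The same filtering can be sketched in Python when jq is not at hand. The non-streamed reply also carries a total_duration field, expressed in nanoseconds, which gives a quick way to check how long the generation took. The helper name summarize_reply and the sample body (including its values) are hypothetical.

```python
import json


def summarize_reply(body):
    """Return the reply text and the total duration in seconds (or None)."""
    data = json.loads(body)
    nanoseconds = data.get("total_duration")
    seconds = nanoseconds / 1e9 if nanoseconds is not None else None
    return data["response"], seconds


# Hypothetical reply body; the values are made up for illustration.
sample_body = json.dumps({
    "model": "gemma:2b",
    "response": "Love is a flame that warms the night",
    "done": True,
    "total_duration": 75_000_000_000,
})

reply, seconds = summarize_reply(sample_body)
print(reply)
print(seconds)
```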

Does it work forever?

Well, no. The Google Colab FAQ says a free notebook can run for at most 12 hours.
But fear not: once the notebook has been shut down, it’s just a matter of launching another “Run all”, waiting for the new random Cloudflare subdomain, and restarting the fun.

Using AI Tools for Effortless Event Planning

Tired of juggling event details and missing out on creative inspiration? This session dives deep into the world of AI-powered tools for community managers, sharing practical strategies for optimizing the entire event planning process. We’ll delve into specific tools that can assist with argument selection, creative asset and marketing material creation, survey analysis, attendee data insights, and much more.

(Google North America Community Summit 2024 – Repo with all the prompts)

A template to show Zigbee devices with flat battery

With a lot of battery-powered ZigBee devices on my home network (thermostats, contact sensors, remote controls, etc.), it’s useful to have a Home Assistant dashboard with the status of all the batteries, so they can be replaced before a device goes off.

Simple task, with an Entities Card:

type: entities
title: Battery level
state_color: true
entities:
  - type: section
    label: Remote controls
  - entity: sensor.switch_01_ikea_e1743_battery
  - type: section
    label: Thermostats
  - entity: sensor.thermo_001_battery
    name: Living room
  - entity: sensor.thermo_002_battery
    name: Parents room

With the following result:

But this approach has three main drawbacks:

Every time a new device is added to the network, the card needs to be updated
Even if the colors help to identify at a glance the batteries that need to be replaced, the search is still a manual process
It cannot be automated, for example sending a message every time a battery goes under a certain level

Templates to the rescue

Here is a Markdown card with a template that identifies all the entities in the system tracking a battery level, finds the ones with a value below a certain number, and lists them:

type: markdown
content: |
  {#- Find all the battery sensors -#}
  {%- set sensors = expand(states.sensor)
    | rejectattr('state', 'in', ['unavailable', 'undefined', 'unknown'])
    | selectattr('attributes.device_class', 'defined') 
    | selectattr('attributes.device_class', '==', 'battery') 
    | rejectattr('entity_id', "search", "keepout_8p")
    | selectattr('attributes.unit_of_measurement', 'defined') 
    | selectattr('attributes.unit_of_measurement', '==', '%') 
    | list %}
  {#- Show only the entities with a battery level below a certain threshold -#}
  {%- for s in sensors -%}
  {%- if s.state | int(0) < 30 -%}
    {{ s.attributes.friendly_name + ": " + s.state }}
    {#- s.entity_id can be used too #}
  {% endif -%}
  {% endfor -%}
title: Devices with low battery level

With this final result:

Let’s look at the code block by block.

Find all the entities measuring a battery level

  {#- Find all the battery sensors -#}
  {%- set sensors = expand(states.sensor)
    | rejectattr('state', 'in', ['unavailable', 'undefined', 'unknown'])
    | selectattr('attributes.device_class', 'defined') 
    | selectattr('attributes.device_class', '==', 'battery') 
    | rejectattr('entity_id', "search", "keepout_8p")
    | selectattr('attributes.unit_of_measurement', 'defined') 
    | selectattr('attributes.unit_of_measurement', '==', '%') 
    | list %}

First, the expand() function returns all the sensors in the system, then a series of Jinja2 filters is applied to remove unavailable, undefined and unknown entities, keep the entities with a battery device class, remove the entity_id of my phone (keepout_8p), and keep the remaining entities that use % as the unit of measurement of the battery.
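To make the filter chain easier to follow, here is the same logic sketched in plain Python. This is only an illustration, not Home Assistant code: plain dictionaries stand in for the state objects, the sample entities are hypothetical (only the entity ids thermo_001, thermo_002 and keepout_8p come from the examples above), and a substring check approximates the "search" test.

```python
# Hypothetical entities; plain dicts stand in for Home Assistant state objects.
entities = [
    {"entity_id": "sensor.thermo_001_battery", "state": "25",
     "attributes": {"device_class": "battery", "unit_of_measurement": "%"}},
    {"entity_id": "sensor.thermo_002_battery", "state": "unavailable",
     "attributes": {"device_class": "battery", "unit_of_measurement": "%"}},
    {"entity_id": "sensor.keepout_8p_battery_level", "state": "80",
     "attributes": {"device_class": "battery", "unit_of_measurement": "%"}},
    {"entity_id": "sensor.living_room_temperature", "state": "21.5",
     "attributes": {"device_class": "temperature", "unit_of_measurement": "°C"}},
]


def battery_sensors(entities, excluded="keepout_8p"):
    """Apply the same steps as the template, in the same order."""
    result = []
    for entity in entities:
        attributes = entity["attributes"]
        if entity["state"] in ("unavailable", "undefined", "unknown"):
            continue  # rejectattr('state', 'in', [...])
        if attributes.get("device_class") != "battery":
            continue  # selectattr on device_class, defined and equal to battery
        if excluded in entity["entity_id"]:
            continue  # rejectattr('entity_id', 'search', ...), as a substring check
        if attributes.get("unit_of_measurement") != "%":
            continue  # selectattr on unit_of_measurement, defined and equal to %
        result.append(entity)
    return result


print([entity["entity_id"] for entity in battery_sensors(entities)])
```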

Find all the battery levels below a certain threshold

  {#- Show only the entities with a battery level below a certain threshold -#}
  {%- for s in sensors -%}
  {%- if s.state | int(0) < 30 -%}
    {{ s.attributes.friendly_name + ": " + s.state }}
    {#- s.entity_id can be used too #}
  {% endif -%}
  {% endfor -%}

A simple check over all the sensors found previously identifies the battery levels below a certain threshold. To make sure the comparison is done between numbers, the state value is parsed as an integer and then compared with the threshold level. Once an entity is identified, the message to show is assembled by concatenating different properties of the sensor.
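The effect of int(0) can be illustrated with a small Python sketch. The to_int helper below only approximates the behaviour of the filter (try an integer, then a float, then fall back to the default), and the sample sensors, apart from the two room names used earlier, are hypothetical.

```python
def to_int(value, default=0):
    """Approximate the int filter: fall back to default on bad input."""
    try:
        return int(value)
    except (TypeError, ValueError):
        try:
            return int(float(value))
        except (TypeError, ValueError, OverflowError):
            return default


def low_batteries(sensors, threshold=30):
    """Build one "name: state" message per sensor below the threshold."""
    return [
        f'{sensor["name"]}: {sensor["state"]}'
        for sensor in sensors
        if to_int(sensor["state"]) < threshold
    ]


# Hypothetical states, always strings as in Home Assistant.
sensors = [
    {"name": "Living room", "state": "25"},
    {"name": "Parents room", "state": "87.5"},
    {"name": "Remote control", "state": "n/a"},
]
print(low_batteries(sensors))
```

Note that a state that cannot be parsed falls back to 0 and is therefore reported as low, which is one more reason to filter out unavailable and unknown entities before this step.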

A template sensor with the devices

To have the same information inside a sensor, a template sensor can be created. The logic is the same:

template:
  - sensor:
      - name: ZigBee devices battery to change
        unique_id: zigbee_devices_battery_to_change
        state: >
          {#- Find all the battery sensors -#}
          {%- set sensors = expand(states.sensor)
            | rejectattr('state', 'in', ['unavailable', 'undefined', 'unknown'])
            | selectattr('attributes.device_class', 'defined') 
            | selectattr('attributes.device_class', '==', 'battery') 
            | rejectattr('entity_id', "search", "keepout_8p")
            | selectattr('attributes.unit_of_measurement', 'defined') 
            | selectattr('attributes.unit_of_measurement', '==', '%') 
            | list %}
          {#- Show only the devices with a battery level below a certain threshold -#}
          {%- for s in sensors -%}
          {% if s.state | int(0) < 30 %}
          {{ s.attributes.friendly_name + ": " + s.state }}
          {#- s.entity_id can be used too -#}
          {% endif -%}
          {%- endfor -%}

The Community Commitment Curve to architect community engagement

“How to use the Community Commitment Curve to architect community engagement” is a comprehensive session aimed at helping individuals and organizations create and sustain vibrant developer marketing communities.

The session will provide insights into the concept of the Community Commitment Curve, a strategic framework designed to map and optimize community engagement.

(Developer Marketing Alliance webinar, 11 Oct 2023)

A playbook for a successful developer community operations team

Shared lessons learned from years of hands-on experience in community building, including:

Defining “Community Operations”: Demystifying the role and establishing its significance within an organization.
The core building blocks of a Community Operations team.
The power of data-informed decision-making for community strategy.
The Metrics That Matter: Identifying KPIs that go beyond vanity metrics to justify the existence, and impact, of a community program.
Actionable steps to build a high-performing Community Operations team

(DevRelCon London 2023 – Slides)

Leadership Lessons From A Team of Community Builders

Good managers are made, not born. It makes no difference for a team of community builders, with added complexities such as remote working, high burnout risk, unclear career path, etc. I’ll share my stories covering topics like hiring, setting a vision, building and tracking metrics, managing a remote team, keeping work-life in harmony, scaling, etc – while working in the context of a community team.

(Community Rebellion Conference, June 2023)

Slides, with a lot of speaker’s notes

Rainbowbreeze
