Open WebUI and mcpo to use all kinds of MCP servers

The integration of Model Context Protocol (MCP) servers within Open WebUI significantly extends its functionality by allowing access to external capabilities. Open WebUI offers both native support for MCP servers exposing the Streamable HTTP format, and the mcpo (MCP-to-OpenAPI proxy server) for broader compatibility.

This article explains how to configure mcpo to leverage all kinds of MCP servers: local via stdio, or remote via SSE (Server-Sent Events) or Streamable HTTP. This other article explains the native support for Streamable HTTP MCP servers in Open WebUI (added at the end of Sept 2025).

What is MCP?

MCP is an open standard that functions as a universal communication bridge, connecting LLMs to external tools and data sources. This protocol enables AI assistants to access real-time information and perform tasks across a variety of domains.

MCP servers communicate with clients (the LLMs) using three primary channels: stdio (standard input/output), SSE (Server-Sent Events), or Streamable HTTP.

Because of the core architecture of Open WebUI, which is a web-based, multi-tenant environment rather than a local desktop process, and because long-lived stdio or SSE connections are difficult to maintain securely across users and sessions, the Open WebUI team decided to create mcpo (MCP-to-OpenAPI proxy server) – an open-source proxy that translates stdio, SSE-based or Streamable HTTP MCP servers into OpenAPI-compatible endpoints.

In addition, mcpo automatically discovers MCP tools dynamically, generates REST endpoints, and creates interactive, human-readable OpenAPI documentation accessible at http://localhost:8000/docs.

Configure an mcpo server

mcpo can run MCP servers distributed as npm packages via npx or as Python packages via uvx, and can also wrap calls to SSE or Streamable HTTP MCP servers.

When launched from the command line, it’s possible to specify either a single MCP server mcpo will run, or a configuration file defining the MCP servers it will manage, their exposed names, and so on. The project page has a lot of examples of how to configure the different servers.
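
For reference, these are the two typical ways to launch it from the command line – a minimal sketch based on the project documentation, so flags may differ slightly between mcpo versions:

# Wrap a single stdio MCP server: everything after "--" is the server command
uvx mcpo --port 8000 -- uvx mcp-server-time --local-timezone=America/New_York

# Or point mcpo at a configuration file describing multiple servers
uvx mcpo --port 8000 --config config.json --hot-reload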

This article will use docker compose to launch mcpo, and a configuration file to define the MCP servers to expose. A simple docker-compose.yaml file follows:

services:
  mcpo:
    container_name: mcpo
    image: ghcr.io/open-webui/mcpo:main
    restart: unless-stopped
    volumes:
      # Map your local config directory to /app/config inside MCPO container
      - /Volumes/Data/development/open-webui/volumes/mcpo:/app/config
    ports:
      - 8000:8000
    # Command to launch MCPO using the mounted config file, with hot-reload enabled
    command: --config /app/config/config.json --hot-reload

In this example, the configuration file is created under /Volumes/Data/development/open-webui/volumes/mcpo/config.json on the host running Docker, which maps to /app/config/config.json in the command line passed to mcpo.

Four MCP servers are exposed via mcpo using the following config file:

{
  "mcpServers": {
    "time": {
      "command": "uvx",
      "args": ["mcp-server-time", "--local-timezone=America/New_York"]
    },
    "youtube-transcript": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/jkawamoto/mcp-youtube-transcript",
        "mcp-youtube-transcript"
      ]
    },
    "open-meteo": {
      "command": "npx",
      "args": ["open-meteo-mcp-server"]
    },
    "coingecko": {
      "type": "streamable-http",
      "url": "https://mcp.api.coingecko.com/mcp"
    }
  }
}

Easy to spot, the configuration entries added to the config.json file use the same format as Gemini CLI, Claude, Visual Studio, or other MCP clients.

Once the configuration has been changed, it’s possible to check whether it’s correct by looking at the mcpo container logs, using the docker logs -f mcpo command. For example:

2025-10-26 15:12:07,498 - INFO - Config file modified: /app/config/config.json
2025-10-26 15:12:08,011 - INFO - Adding servers: ['coingecko_mcp_streamable_http']
2025-10-26 15:12:10,158 - INFO - HTTP Request: POST https://mcp.api.coingecko.com/mcp "HTTP/1.1 200 OK"
2025-10-26 15:12:10,159 - INFO - Received session ID: 7a3c7019bf4ec1aaa7b213d017cd968ca1c609ee6bb5612622a0d4ad41b8579d
2025-10-26 15:12:10,162 - INFO - Negotiated protocol version: 2025-06-18
2025-10-26 15:12:10,736 - INFO - HTTP Request: POST https://mcp.api.coingecko.com/mcp "HTTP/1.1 202 Accepted"
2025-10-26 15:12:10,806 - INFO - HTTP Request: GET https://mcp.api.coingecko.com/mcp "HTTP/1.1 404 Not Found"
2025-10-26 15:12:11,204 - INFO - HTTP Request: POST https://mcp.api.coingecko.com/mcp "HTTP/1.1 200 OK"
2025-10-26 15:12:12,038 - INFO - Successfully connected to new server: 'coingecko'
2025-10-26 15:12:12,038 - INFO - Config reload completed successfully

Otherwise, an ERROR log entry will be present.

The list of the MCP servers exposed, and their docs (what the LLM sees) can be browsed at http://localhost:8000/docs.
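
It’s also possible to test an exposed server directly with curl, without going through Open WebUI. A minimal sketch, assuming the time server exposes a get_current_time tool (the actual route names are listed in the /docs page):

# Fetch the generated OpenAPI spec for the time server
curl http://localhost:8000/time/openapi.json

# Call one of the generated tool endpoints (tool name and parameters are assumptions)
curl -X POST http://localhost:8000/time/get_current_time \
  -H "Content-Type: application/json" \
  -d '{"timezone": "America/New_York"}'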

Connect to mcpo in Open WebUI

There are two ways to connect to mcpo in Open WebUI: via a User Tool Server in the User Settings, and via a Global Tool Server in the Admin Settings.

  • User Tool Servers utilize the client-side (your browser) to make the connection.
    • Ideal for accessing highly specific, local, or private development endpoints on your machine, as the resource exposure is isolated only to your session.
    • It would be possible to launch an mcpo server instance on the local machine and connect to it, while Open WebUI runs on another, remote server that was not configured to use MCP and where the user has no admin privileges.
    • For example, a filesystem MCP can access files accessible only from the local machine where mcpo runs, while the Open WebUI server doesn’t have access to them.
  • Global Tool Servers utilize the server-side (Open WebUI’s backend) to make the connection.
    • This means the tool must be reachable from the server environment, typically using internal Docker network names or the host alias (host.docker.internal). It’s also possible to access an mcpo instance running on another server / remotely, as long as it’s reachable from the Open WebUI host machine.
    • For example, a filesystem MCP can access files present on the server, even if they’re not accessible to the user.
    • It would be possible to configure authentication credentials shared among all the Open WebUI users, like the same Bearer token or session.
    • Once configured on the Open WebUI server, the MCP could be made available to all the users.

Of course, the distinction between these two options fades away if both Open WebUI and mcpo run on the same local machine used to connect to Open WebUI. But it’s important to keep it in mind.

Each server exposed by mcpo has to be configured separately.

For configuring Global Tool Servers, using the time server as example:

  1. Navigate to Admin Panel -> Settings -> External Tools.
  2. Click “+” (Add Connection).
  3. Set the Type to OpenAPI.
  4. Set the URL to http://host.docker.internal:8000/time
    • If Open WebUI and mcpo are in the same docker network, http://mcpo:8000/time can be used, assuming the mcpo container has the name mcpo, like in the docker compose file used above.
  5. Set OpenAPI Spec to URL, and openapi.json.
  6. Set Auth to None.
  7. Set ID to time_mcp_mcpo.
    • This is the string used in the logs to identify the MCP tool call.
  8. Set Name to Time MCP via mcpo.
    • This is the string used in the UI to configure the available tools and MCP servers for the model (see below).
  9. Set Description to Get the current time and date
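
Before (or after) adding the connection, it can help to verify that the mcpo endpoint is actually reachable from the Open WebUI backend. A minimal sketch, assuming the Open WebUI container is named open-webui and has curl available:

# Test server-side reachability from inside the Open WebUI container
docker exec -it open-webui curl -s http://host.docker.internal:8000/time/openapi.json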

For configuring User Tool Servers, instead:

  1. Navigate to Settings -> External Tools.
  2. Click “+” (Add Connection).
  3. Set the URL to http://localhost:8000/time
    • localhost, because the mcpo server is accessible from the browser user session using the localhost or 127.0.0.1 address.
    • http://host.docker.internal:8000/time or http://mcpo:8000/time won’t work, as they refer to docker-network-specific addresses, which are not available in the browser user session of the local machine.
  4. Set OpenAPI Spec to URL, and openapi.json.
  5. Set Auth to None.

The time MCP server is now available inside Open WebUI, with two different names: Time MCP via mcpo if configured as Global Tool Server, or mcp-time if configured as User Tool Server.

Create an agent which uses MCP

To be sure the MCP call is considered, and then executed, by the LLM, ensure the model has tool support, and that the Function Calling parameter is set to Native in the Advanced Params section of the model configuration.

Here is an example of creating a specialized agent that returns the current time, using the MCP server.

  1. Navigate to Workspace -> Models -> New Model.
  2. Set Model Name to Qwen3-Assistant.
  3. Set Base Model to qwen3:8b.
    • Or any other model supporting tool calling.
  4. Set Description to Return the current time.
  5. No need to set any System Prompt.
  6. Advanced Params -> Show.
    • Set Function Calling to Native.
  7. In the Tools, check Time MCP via mcpo or mcp-time.

Save and start chatting with the agent, for example asking What's the current time?. Here is what the result could be, with the output of the time MCP call expanded for additional clarity:

Open WebUI chat window
Example of an Open WebUI agent using a local MCP time server exposed via mcpo

Open WebUI and native MCP integration

The integration of Model Context Protocol (MCP) servers within Open WebUI significantly extends its functionality by allowing access to external capabilities. Open WebUI offers both native support for MCP servers exposing the Streamable HTTP format, and the mcpo (MCP-to-OpenAPI proxy server) for broader compatibility.

This article explains how to configure Open WebUI’s native support for remote MCP servers offering Streamable HTTP capabilities. For a guide on how to configure mcpo to leverage all kinds of MCP servers (local via stdio, or remote via SSE (Server-Sent Events) or Streamable HTTP), please refer to this article.

What is MCP?

MCP is an open standard that functions as a universal communication bridge, connecting LLMs to external tools and data sources. This protocol enables AI assistants to access real-time information and perform tasks across a variety of domains.

MCP servers communicate with clients (the LLMs) using three primary channels: stdio (standard input/output), SSE (Server-Sent Events), or Streamable HTTP.

Connect to MCP via HTTP Streaming

In v0.6.31 Open WebUI added MCP (Streamable HTTP) server support, alongside the existing OpenAPI server integration. This allows connecting directly to an MCP server that exposes its functionality over a streaming HTTP endpoint. It supports Bearer token, session and OAuth authentication, if necessary (doc page, but very basic so far).

To find MCP servers, the “Remote MCP Servers” page of Awesome MCP Servers is a good starting point. Looking at all the servers with http support, let’s use the one from CoinGecko.
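
Before configuring it in Open WebUI, a quick sanity check that the remote endpoint answers MCP requests can be done with curl. A minimal sketch of the standard MCP initialize call over Streamable HTTP (the headers and protocol version follow the MCP specification, not anything CoinGecko-specific):

# Send a JSON-RPC initialize request to the Streamable HTTP MCP endpoint
curl -X POST https://mcp.api.coingecko.com/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-06-18","capabilities":{},"clientInfo":{"name":"curl-test","version":"1.0"}}}'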

Once logged into Open WebUI:

  1. Navigate to Admin Panel -> Settings -> External Tools.
  2. Click “+” (Add Connection).
  3. Set the Type to MCP Streamable HTTP.
  4. Set the URL to https://mcp.api.coingecko.com/mcp
  5. Set Auth to None.
  6. Set ID to coingecko_mcp_http.
    • This is the string used in the logs to identify the MCP tool call.
  7. Set Name to CoinGecko MCP via http.
    • This is the string used in the UI to configure the available tools and MCP servers for the model (see below).

Create an agent which uses MCP

To be sure the MCP call is considered, and then executed, by the LLM, ensure the model has tool support, and that the Function Calling parameter is set to Native in the Advanced Params section of the model configuration.

Here is an example of creating a specialized agent that returns the value of crypto assets:

  1. Navigate to Workspace -> Models -> New Model.
  2. Set Model Name to Crypto expert.
  3. Set Base Model to qwen3:8b.
    • Or any other model supporting tool calling.
  4. Set Description to Return value of crypto assets.
  5. Set System Prompt to You are a cryptocurrency price lookup agent. When the user specifies one or more cryptocurrency names (e.g., "bitcoin", "ethereum", "BTC", "CRO"), output ONLY the current market price in USD for each, formatted as: [Name]: $[price]. Do not add explanations, context, errors, or any text beyond this. If a crypto is unrecognized, output: [Name]: Not found.
  6. Advanced Params -> Show.
    • Set Function Calling to Native.
  7. In the Tools, check CoinGecko MCP via http.
  8. In the Capabilities, uncheck everything except Status Updates.

Save and start chatting with the agent, for example asking BNB price. Here is what the result could be, with the output of the CoinGecko MCP call expanded for additional clarity:

Open WebUI chat window
Example of an Open WebUI agent using MCP server calls

If an MCP server doesn’t support Streamable HTTP, it’s possible to use mcpo to access it.

“Servant leader” – Interview with Alfredo Morresi

[…] It was during that period that I met Alfredo Morresi, who even then was the community manager for developers. Ensoul was developing a prototype of a webVR viewer, and Alfredo immediately stood out for his kindness and attentiveness. Among other things, we were fortunate to receive an early physical prototype of the Google Pixel and an invitation to the Google VR Workshop in London. […]

Link to the original post, and thanks Fulvio for the interview!

Home Assistant for privacy, choice and sustainability

In the following talk (in Italian), I explained the basic concepts behind Home Assistant, the number one choice to manage home automation with 3 core principles in mind: privacy, choice and sustainability.

(MOCA 2024, Pescara)

The Sonic-AI project: the brain

Sonic-AI project: an effort to learn how to use LLM, GenAI and ML tools, while building a Sonic-like virtual buddy for my kid, with privacy in mind (full details here). This post explains how to build a local stack to create the basic chatbot, using an LLM and a web UI to chat with it. And how to run the stack on a cloud computer, in case you don’t have enough resources locally (mainly a GPU).

I could have used one of the many online services to create a customized chatbot in minutes. Instead, I wanted to create a stack I can run locally, for two reasons: use open source models to guarantee maximum privacy, and avoid exposing my data to third parties. Privacy is always a compromise with complexity. So, time to get hands dirty.

Searching around, the host ALL your AI locally video provided a good idea to start with.

Choose an LLM

Nowadays (Aug 2024) LLMs are perfect for creating chatbots. They embed NLP capabilities, can speak different languages, already know a lot about the world and can be customized to learn specific knowledge domains.

Speaking about models, the landscape of open source models to use is very wide: Mistral, Gemma, Llama, Phi-3, etc., in their respective small, medium and large sizes, with potential customizations. Each one of them has strengths and limits, so I took one close to my work: Gemma2 with 9B parameters, a good compromise between the complexity of the model, the resources required to decently run it on a “normal pc” with a normal GPU, and the support for the Italian language.


Ollama was the no-brainer choice to interact with the LLM, considering how easy the setup and usage are, the large array of options it offers, the support for nVidia and AMD GPUs, and how widely integrated Ollama is with other tools.
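
Getting the chosen model up and running with Ollama is a matter of a couple of commands:

# Download the Gemma2 9B weights locally
ollama pull gemma2:9b

# Start an interactive chat session from the terminal
ollama run gemma2:9b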

The only downside was the usage of the command line to interact with Ollama – I love it, but my kid doesn’t. So I needed a better UI to create my chatbot.

Choose a user-friendly UI

In the open source landscape there are two main players: Open WebUI and oobabooga’s Text generation web UI. I selected the former, Open WebUI, because it has an easier-to-use and more polished interface, offers a chat experience out-of-the-box, has the ability to create agents, plus other handy capabilities useful for the other parts of my project (like TTS, STT, etc).

Icing on the cake, the project offers a ready-to-use docker image (https://ghcr.io/open-webui/open-webui:ollama) containing Ollama + Open WebUI, CUDA drivers, and a lot of pre-made configurations to wire everything together. It means no installation and configuration headaches.

At this point, it’s time to assemble everything together.

Host the chatbot stack

I confess, I don’t own a machine with a good enough GPU to run mid-size models 😭. I’ll solve this soon, but in the meantime I had the idea to provision a self-managed virtual machine with an appropriate CPU + GPU config, connect a disk, install an OS image, and use it as if it were my “local” computer. This VM-based setup allowed me to quickly iterate at the beginning of the project, try different hardware configs and find the one most appropriate for what I needed, spending a few $ per day to keep using a VM instance.

Well, I tried hard to create such a VM on Google Compute Engine, but with no success, every time with the same error of no available resources. I even used this nice gpu-finder tool to automate the creation of different configs (N1 machines with 2 vCores and either an nVidia Tesla T4 or a Tesla P4 single GPU) on different days in all the zones offering these GPUs, but I wasn’t able to create a VM a single time.

So, I had to look elsewhere. And I ended up choosing RunPod.

It allows creating a VM (called a Pod) selecting among different types of actually available GPUs, the billing is quite cheap, and in addition to a web UI, it offers a CLI and SDKs to orchestrate everything, for example from a Colab. The downside, at least for me, was that they didn’t offer a real VM which I could freely administer: the only way to install software and configs was via a docker image. I was lucky enough because the image with everything I needed existed, and it was https://ghcr.io/open-webui/open-webui:ollama. Otherwise, I would have had to create one with my custom config, deploy it somewhere, and then install it on RunPod. Feasible, but why make life more complex?

So, while waiting to buy a machine with a GPU to be fully local, the RunPod solution was a really good option.

Because my plan was to create different pods to experiment, instead of having a single, always-running instance, I created a network volume to store all my configs across instances, with these configs:

I chose a location with available A40 GPUs – from my tests, a single one handles the latest mid-size models without problems (alternatively, an RTX3090 worked great too) – and 50GB were enough to store different models + configs.

Then, I created a template (a Docker container image paired with a configuration) to host my “LLM brain”:

Relevant configurations:

  • Container Image: https://ghcr.io/open-webui/open-webui:ollama
  • Volume disk: 0Gb – no need to have a volume disk, as it will be replaced by the network volume later
  • Volume Mount Path: /app/backend/data – this is the folder where the docker image saves models, configs, etc.
    • Adding the folder with all the configs as a volume disk in the template, and then connecting a network volume during pod creation, automatically saves all the configs on the network volume
  • Environment Variables
    • OLLAMA_MODELS: /app/backend/data/ollama/models – this will move downloaded models to the network volume, so there is no need to redownload models every time a new instance is created
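
For reference, the same image can also be run on a local machine with a GPU, using settings equivalent to the template above. A minimal sketch, where the host data path, port mapping and container name are assumptions (and the GPU flag requires the NVIDIA container toolkit):

# Local equivalent of the RunPod template: data folder mounted at /app/backend/data,
# models redirected to that folder via OLLAMA_MODELS
docker run -d --name open-webui --gpus all \
  -p 3000:8080 \
  -v /path/to/open-webui-data:/app/backend/data \
  -e OLLAMA_MODELS=/app/backend/data/ollama/models \
  ghcr.io/open-webui/open-webui:ollama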

Finally, I deployed a pod to “run the brain”, using the template just created, with 2 vCPUs and 8GB of RAM, and connected the network disk. I also selected “Secure Cloud” to keep everything in the RunPod server farm, and a “Spot instance”, as I didn’t need absolute reliability for the tests. I waited for all the docker layers to be downloaded, opened the running Pod settings and connected to the HTTP port.

Welcome to a brand-new instance of Open WebUI.

Customize the bot to impersonate Sonic

There are different tutorials on how to configure Open WebUI. This is what I did to create a chatbot with a “Sonic flavor”.

First, I created the admin user, and a user for my kid, called “Leo”, with the user role.

Then, from the Admin user:

  • Settings -> Admin Panel -> Settings -> Models
    • Pull a Model from Ollama.com
      • gemma2:9b (list available here)
  • Workspace -> Models -> Create a model
    • Image: upload an image
    • Name: Sonic
    • Model ID: sonic_v1
    • Base Model: gemma2:9b
    • Description: Ciao, sono Sonic the Hedgehog
      • Equivalent in English: Hi, I'm Sonic the Hedgehog
    • System prompt:
      • Interpreti Sonic the Hedgehog, della serie Sonic Adventure. Farai domande e risponderai come Sonic the Hedgehog, usando il tono, i modi e il vocabolario che Sonic the Hedgehog userebbe. Usa un linguaggio adatto ai bambini, non scrivere spiegazioni. Rispondi in italiano. Hai la conoscenza di Sonic the Hedgehog. Vivi a Green Hills, nel Montana. Sei amichevole e sempre disponibile a dare una mano.
        • The prompt is in Italian, so the model will speak in Italian.
      • Equivalent in English: You play as Sonic the Hedgehog, from the Sonic Adventure series. You will ask and answer questions like Sonic the Hedgehog, using the tone, manner, and vocabulary Sonic the Hedgehog would use. Use child-friendly language, do not write any explanations. Answer in Italian. You have knowledge of Sonic the Hedgehog. You live in Green Hills, Montana. You are friendly and always willing to lend a hand.
    • Capabilities: uncheck Vision, as this model is text-only for now

Then, I logged in with my kid’s user and:

  • Settings -> Settings
    • General -> Language -> Italian
    • Interface -> Default Model: Sonic

Finally, my kid can interact with his preferred hero, in Italian.

Step one of the project… Achieved! 🎉

To “pause” the pod and save some money, it can simply be terminated in the RunPod management UI. All the configs will persist because they’re stored in the network volume. To restart everything again, re-create the pod using the template, deploy it and connect to it once ready.

The Sonic-AI project – intro

I’ve always considered a “real world” project the best way to learn a new tech: get the hands dirty, be guided by (sort-of) realistic user requirements, and the excitement of building something step after step, one solved failure at a time.

This is why I decided to “be inspired” by the passion one of my kids has for Sonic the Hedgehog, and use the latest tools available in the ML and GenAI space to create a “Sonic-AI buddy” for him. A virtual chatbot, looking and acting like Sonic, that my kid can interact and converse with, safely and while having fun.

To break down the complexity of such a project, so I don’t need to learn everything-about-LLMs before creating something, I want to start with a very basic working prototype providing simple chatbot features (the so-called MVP), and then develop different “skills”, each one requiring learning and using different ML or GenAI techs to be achieved. Incremental learning and improvements.

  • The “Brain” (done): the core part of the project, a text chatbot agent able to impersonate Sonic, to give my kid the feeling he can ask him basic questions, and get replies coherent with the style of his preferred hero.
    • Technologies: an LLM used as a chatbot, a UI to interact with it, a system prompt to give the basic characterization.
  • The “Memories” (in progress): enrich the chatbot with domain-specific knowledge of the world of Sonic and his friends, so conversations won’t only be “in the tone” of Sonic, but also relevant to the Sonic-verse.
    • Technologies: a mix of better prompting, fine tuning, RAG or something else to give the LLM the right knowledge about the character to impersonate
  • The “Voice” (in progress): what if the bot can speak with the voice my kid associates with Sonic?
    • Technologies: a customized Text-to-Speech model trained on the voice to reproduce, and a speaker
  • The “Hearing” (in progress): to completely get rid of text interaction, questions should be asked via voice
    • Technologies: connect the chatbot with a Speech-To-Text engine, and a mic
  • The “Eyes” (in progress): Sonic should be able to see the world around him
    • Technologies: something to capture a video stream, and a multimodal LLM to process images and text.
  • The “Body” (in progress): This is something that will connect the different input/output sensors. I’m still unsure how to create it. In addition to a voice, the bot should have some sort of tangible body.
    • Technologies: it could be a 3D printed figure of Sonic, an animated character or something else

There is another prerequisite I want to fulfill: everything must run locally and be based on OSS software. I’m a little bit paranoid (let’s say mindful) about privacy, and under no circumstances should my kid’s interactions end up in a training dataset, be used for internal model analysis, or go anywhere else. So, privacy first.

Let’s start with “the brain”, the main element to which all the rest can then be attached.