By Bernat Sampera · 13 min read

Using Ollama in a simple App with the MCP Protocol

Learn to build AI agents with the MCP protocol. This tutorial provides a complete Python example of setting up a tool server and using a local Ollama model to perform structured function calls. A practical introduction to MCP for real-world AI applications.

Ever felt that AI chatbots were cool but wished they could do things instead of just talking? What if you could tell an AI, "Add a new post to my blog about space exploration," and it would actually do it?

This concept is called function calling (or tool use), and it's the bridge between language models and real-world action. Today, we're going to build a simple but powerful application that does exactly this. We'll use a local LLM running on Ollama to control a simple blog database, and we'll connect them using the MCP (Model Context Protocol), a modern protocol designed for this exact purpose.

By the end of this tutorial, you will have:

  1. A server that exposes Python functions as "tools."

  2. A client that uses a local LLM (Llama 3.1) to understand a command and call the right tool.

  3. A clear understanding of how MCP makes this communication simple and robust.

Let's get started!

Step 0: Prerequisites and Setup

Before we write any code, let's get our environment ready.

1. Install Ollama and Llama 3.1:
If you don't have it already, install Ollama. Once installed, open your terminal and pull the model we'll be using:

ollama pull llama3.1:latest

This might take a few minutes. Once the download finishes, you'll have a powerful LLM available locally on your machine.
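
To confirm the download, you can list the models Ollama has installed locally:

ollama list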

2. Set up the Python Project:
Create a new folder for our project, let's call it mcp_ollama_blog, and initialize it with uv. The uv init command generates a pyproject.toml, which will manage our project's dependencies.

mkdir mcp_ollama_blog
cd mcp_ollama_blog
uv init
uv venv
source .venv/bin/activate

Paste the following content into your pyproject.toml:

[project]
name = "mcp_ollama_blog"
version = "0.1.0"
dependencies = [
    "aiosqlite>=0.21.0",
    "mcp[cli]>=1.13.1",
    "uvicorn",
    "langchain-ollama",
]
  

Now install everything into that environment with one command.

uv sync
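
If you want to double-check that the packages installed correctly, a quick import check (using the module names of the dependencies above) should print "ok":

python -c "import mcp, aiosqlite, uvicorn, langchain_ollama; print('ok')"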

Our environment is now ready!

Part 1: The Server - Creating MCP Tools

The server's job is to define the actions our AI can perform. We'll define a set of Python functions and expose them as "tools" using the mcp library.

Create a file named mcp_tools.py:

# mcp_tools.py

from typing import Optional
import aiosqlite
from mcp.server.fastmcp import FastMCP

# Create an MCP server instance
mcp = FastMCP("Samperalabs")

DB_FILE = "content.db"

async def init_db():
    async with aiosqlite.connect(DB_FILE) as db:
        await db.execute("""
            CREATE TABLE IF NOT EXISTS posts (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                title TEXT NOT NULL,
                content TEXT NOT NULL
            )
        """)
        await db.commit()

@mcp.tool()
async def get_blog_posts(limit: Optional[int] = None) -> str:
    """Get blog posts with basic metadata."""
    query = "SELECT id, title, content FROM posts "
    params: list = []
    if limit is not None and isinstance(limit, int) and limit > 0:
        query += " LIMIT ?"
        params.append(limit)

    async with aiosqlite.connect(DB_FILE) as db:
        cursor = await db.execute(query, params)
        rows = await cursor.fetchall()

    list_of_rows = [_format_post_row(r) for r in rows]
    return "\n".join(list_of_rows)

@mcp.tool()
async def add_blog_post(title: str, content: str) -> str:
    """Add a blog post."""
    async with aiosqlite.connect(DB_FILE) as db:
        await db.execute(
            "INSERT INTO posts (title, content) VALUES (?, ?)",
            (title, content),
        )
        await db.commit()
        return "Blog post added successfully"

@mcp.tool()
async def remove_blog_post(id: int) -> str:
    """Remove a blog post."""
    async with aiosqlite.connect(DB_FILE) as db:
        await db.execute("DELETE FROM posts WHERE id = ?", (id,))
        await db.commit()
        return "Blog post removed successfully"

def _format_post_row(row: tuple) -> str:
    id_, title, content = row
    return f"id: {id_} | title: {title} | content={content}"
  

What's happening here?

  • @mcp.tool(): This is the magic decorator from the mcp library's FastMCP server. It takes a regular Python function and registers it as a tool that can be called remotely over the MCP protocol. It automatically picks up the function's name, arguments (title, content), and their types, so you never write a schema by hand (see the sketch after this list).

  • async / await: You'll see these keywords a lot. In simple terms, they allow our program to handle network requests and database operations efficiently without freezing. Think of it as telling Python, "This might take a moment, feel free to work on something else while you wait." You don't need to be an expert for this tutorial, just know it's what makes modern network apps fast.

  • aiosqlite: We're using a simple SQLite database to store our blog posts. aiosqlite is the async-compatible version.
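
To see how little ceremony a new tool needs, here is a sketch of one more tool you could add to mcp_tools.py. The update_blog_post name and behaviour are hypothetical (it is not part of this tutorial's server), but the pattern is identical: type hints plus a docstring, wrapped in @mcp.tool(), and the library derives the schema for you. It reuses the imports already at the top of mcp_tools.py.

# Hypothetical extra tool for mcp_tools.py -- a sketch, not used later in this tutorial
@mcp.tool()
async def update_blog_post(
    id: int, title: Optional[str] = None, content: Optional[str] = None
) -> str:
    """Update the title and/or content of an existing blog post."""
    async with aiosqlite.connect(DB_FILE) as db:
        if title is not None:
            await db.execute("UPDATE posts SET title = ? WHERE id = ?", (title, id))
        if content is not None:
            await db.execute("UPDATE posts SET content = ? WHERE id = ?", (content, id))
        await db.commit()
    return "Blog post updated successfully"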

Next, we need a way to run this server. Create a file called server_http.py:

# server_http.py

import asyncio
from uvicorn import run as uvicorn_run
from mcp_tools import mcp, init_db

def main() -> None:
    app = mcp.streamable_http_app()  # This creates a web app with an /mcp endpoint
    uvicorn_run(app, host="127.0.0.1", port=8000, log_level="info")

if __name__ == "__main__":
    asyncio.run(init_db()) # Initialize the database before starting
    main()
  

This small file uses uvicorn to start a web server that listens for MCP requests and directs them to our tools.
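
As a side note, newer versions of the mcp SDK can also start this transport for you, without a separate uvicorn script. The following is a minimal sketch, assuming your SDK version supports FastMCP.run with the "streamable-http" transport; the uvicorn script above remains the approach used in the rest of this tutorial.

# Alternative runner (sketch) -- assumes mcp's FastMCP.run supports transport="streamable-http"
import asyncio
from mcp_tools import mcp, init_db

if __name__ == "__main__":
    asyncio.run(init_db())                # initialize the database first
    mcp.run(transport="streamable-http")  # serves an /mcp endpoint, much like the uvicorn setup above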

Part 2: The Client - Calling the Server with Ollama

Now for the exciting part! The client will take a natural language command (like "remove post number 1"), ask Ollama to translate it into a structured tool call, and then send that call to our server.

Create a file named mcp_ollama_client.py:

# mcp_ollama_client.py

import os
import asyncio
from typing import Dict
import json
from langchain_ollama import ChatOllama
from mcp.client.streamable_http import streamablehttp_client
from mcp import ClientSession

# Connect to our local Ollama model
llm = ChatOllama(model="llama3.1:latest", temperature=0, format="json")

def choose_with_langchain(expression: str) -> Dict:
    # This is our "system prompt" that teaches the LLM about our tools
    system = (
        "You are a tool selector. Return ONLY JSON with keys 'name' and 'arguments'. "
        "Allowed tools: "
        "1. 'get_blog_posts'. 'arguments' MUST be {\"limit\": number}. "
        "2. 'add_blog_post'. 'arguments' MUST be {'title': string, 'content': string}. "
        "3. 'remove_blog_post'. 'arguments' MUST be {'id': number}. "
        "Use numeric literals; do NOT return a JSON schema or descriptions."
    )
    prompt = system + "\n\nInput: " + expression
    response = llm.invoke(prompt)
    return json.loads(response.content)  # parse the model's JSON reply into a dict

async def call_tool_with_llm(expr: str, model_name: str) -> None:
    # 1. Ask the LLM to choose a tool based on our input
    choice = choose_with_langchain(expr)
    print("choice", choice)

    # 2. Connect to the MCP server
    async with streamablehttp_client("http://127.0.0.1:8000/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # 3. Call the chosen tool with the chosen arguments
            result = await session.call_tool(
                name=choice["name"], arguments=choice["arguments"]
            )
            text_blocks = [
                b for b in result.content if getattr(b, "type", None) == "text"
            ]
            print(f"Input: {expr}")
            print(f"Tool: {choice['name']}  Args: {choice['arguments']}")
            if text_blocks:
                print(f"Result: {text_blocks}")
            else:
                print(result.model_dump_json(indent=2))

async def main():
    model_name = os.getenv("OLLAMA_MODEL", "llama3.1:latest")

    # --- CHOOSE ONE COMMAND TO RUN ---
    # To run a command, uncomment the line. To run a different one,
    # comment the old one and uncomment the new one.

    await call_tool_with_llm(
        "add a blog post about the ww2, the title is 'World War 2' and the content is 'World War 2 was a global conflict that lasted from 1939 to 1945'",
        model_name,
    )
    # await call_math_with_llm("remove the blog post with id 1", model_name)
    # await call_math_with_llm("list the blog posts", model_name)


if __name__ == "__main__":
    asyncio.run(main())
  

Dissecting the Client:

  • choose_with_langchain: This function is our "prompt engineer." The system prompt is a set of instructions for the LLM: we tell it exactly which tools are available and that we expect back strict JSON with 'name' and 'arguments' keys. This strict formatting is crucial for our program to work reliably (see the illustrative snippet after this list).

  • call_tool_with_llm: This function orchestrates the process.

    1. It sends our plain English command to the LLM.

    2. It connects to our server using streamablehttp_client. This opens the communication line.

    3. The key line is session.call_tool(...). This is where the mcp client does its job. It takes the tool name and arguments (provided by the LLM) and sends them over the MCP protocol to the server. The server receives the request, runs the corresponding Python function, and sends the result back.
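
For example, for the command "remove the blog post with id 1", a well-behaved model reply should parse into exactly the structure our server expects. The snippet below is purely illustrative (a hand-written example, not a captured model run):

import json

# Illustrative only: the JSON we expect the model to return for
# "remove the blog post with id 1"
raw = '{"name": "remove_blog_post", "arguments": {"id": 1}}'
choice = json.loads(raw)

assert choice["name"] == "remove_blog_post"
assert choice["arguments"] == {"id": 1}  # ready to hand to session.call_tool(...)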

Part 3: Putting It All Together

Now, let's see our creation in action. This requires two separate terminal windows.

Terminal 1: Start the Server

In your first terminal, start the MCP server. It will also create the content.db file for the first time.

# Make sure your virtual environment is active
# source .venv/bin/activate
python server_http.py
  

You should see output indicating the server is running, like this:

INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
  

Leave this terminal running!

Terminal 2: Run the Client

Open a second terminal window and navigate to the same project directory.

Our mcp_ollama_client.py file is set up to let you easily switch between commands by commenting and uncommenting lines in the main function.
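
(Optional) If you would rather not edit the file for every command, a small sketch that reuses the functions above is to read the command from the command line instead:

# Optional tweak (sketch): pass the command as a CLI argument instead of editing main()
# Usage: python mcp_ollama_client.py "list the blog posts"
import sys

async def main():
    model_name = os.getenv("OLLAMA_MODEL", "llama3.1:latest")
    command = " ".join(sys.argv[1:]) or "list the blog posts"
    await call_tool_with_llm(command, model_name)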

Action 1: Add a Blog Post

First, make sure the add_blog_post call is uncommented in mcp_ollama_client.py:

# ...
async def main():
    # ...
    await call_tool_with_llm(
        "add a blog post about the ww2, the title is 'World War 2' and the content is 'World War 2 was a global conflict that lasted from 1939 to 1945'",
        model_name,
    )
    # await call_tool_with_llm("remove the blog post with id 1", model_name)
    # await call_tool_with_llm("list the blog posts", model_name)
# ...
  

Now, run the client:

# Make sure your virtual environment is active
python mcp_ollama_client.py
  

Client Output (Terminal 2):

choice {'name': 'add_blog_post', 'arguments': {'title': 'World War 2', 'content': 'World War 2 was a global conflict that lasted from 1939 to 1945'}}
Input: add a blog post about the ww2, the title is 'World War 2' and the content is 'World War 2 was a global conflict that lasted from 1939 to 1945'
Tool: add_blog_post  Args: {'title': 'World War 2', 'content': 'World War 2 was a global conflict that lasted from 1939 to 1945'}
Result: [TextBlock(type='text', text='Blog post added successfully')]
  

Action 2: List the Blog Posts

Now, go back to mcp_ollama_client.py. Comment out the add_blog_post line and uncomment the get_blog_posts line:

# ...
async def main():
    # ...
    # await call_tool_with_llm( ... )  # The add-post line is now commented out
    # await call_tool_with_llm("remove the blog post with id 1", model_name)
    await call_tool_with_llm("list the blog posts", model_name)
# ...

Save the file and run the client again in Terminal 2:

python mcp_ollama_client.py
  

Client Output (Terminal 2):

choice {'name': 'get_blog_posts', 'arguments': {}}
Input: list the blog posts
Tool: get_blog_posts  Args: {}
Result: [TextBlock(type='text', text="id: 1 | title: World War 2 | content=World War 2 was a global conflict that lasted from 1939 to 1945")]
  

It works! The client called the server and fetched the post we just added.

A Quick Word on MCP

So what is MCP doing behind the scenes? Think of it as a specialized language for programs to talk to each other about executing tasks. When you run session.call_tool(...), the client doesn't just send a generic HTTP request. It sends a structured MCP message that says:

  • "I want to execute a tool."

  • "The tool's name is add_blog_post."

  • "Here are the arguments: {'title': '...', 'content': '...'}."

The server, which understands the MCP protocol, receives this, knows exactly what to do, runs the function, and sends a structured response back. This protocol makes the communication explicit, reliable, and easily extendable for more complex scenarios like streaming data.
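
Concretely, MCP messages are JSON-RPC 2.0 payloads. The exact fields can vary between protocol versions, so treat the following as a rough, hand-written illustration of a tools/call request rather than a verbatim capture:

# Rough illustration (not a verbatim capture) of the JSON-RPC message behind
# session.call_tool("add_blog_post", {...}); details may differ by MCP version
tool_call_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "add_blog_post",
        "arguments": {"title": "World War 2", "content": "..."},
    },
}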

Conclusion

Congratulations! You have successfully built a full-stack application where a local, private LLM can execute real-world actions on your behalf. You've seen how to:

  • Define tools in Python using the mcp library's FastMCP server.

  • Use a system prompt to make Ollama a reliable tool selector.

  • Connect the two using the mcp client (streamablehttp_client and ClientSession).

This pattern is incredibly powerful. Imagine extending this to control your smart home devices ("Ollama, turn on the living room lights"), manage your calendar, or even interact with complex APIs, all powered by an AI that runs entirely on your own machine. You've just taken your first step into that world.

Let's connect!

Get in touch if you want updates, examples, and insights on how AI agents, Langchain and more are evolving and where they’re going next.