Safely Running Ollama with CPU Temperature Checks in Python
Learn how to wrap Ollama models with a CPU temperature check in Python. This tutorial shows you how to prevent overheating by pausing execution when system temperatures exceed safe limits.
Introduction
When running large language models (LLMs) locally, especially on laptops or small servers, overheating can quickly become a serious problem. High CPU temperatures not only impact performance but can also shorten hardware lifespan.
In this article, we’ll build a Python proxy wrapper for LangChain chat models that monitors CPU temperature before executing any LLM calls. If the temperature exceeds a threshold, the code will wait until it cools down before continuing.
This approach is useful if you:
Run local LLMs on laptops with limited cooling.
Want to prevent thermal throttling during long inference loops.
Need both async and sync support in LangChain workflows.
Let’s break down the implementation step by step.
Importing Required Libraries
import asyncio
import inspect
import subprocess
import time
from typing import Any
from langchain.chat_models import init_chat_model
from langchain_core.language_models import BaseChatModel
asyncio → for handling asynchronous calls to LLMs.
inspect → to detect whether a function is async or sync.
subprocess → to execute the system command smctemp that reads CPU temperature.
time → for synchronous delays when the CPU overheats.
typing.Any → for flexible typing.
langchain.chat_models.init_chat_model → initializes a LangChain LLM.
BaseChatModel → the base class for type safety.
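Since the wrapper shells out to smctemp, it is worth confirming the binary is actually installed before relying on it. This is an optional pre-flight sketch, not part of the original code; it assumes smctemp is on your PATH and that the -c flag used later in the article prints a single CPU reading.
import shutil
import subprocess

# Hypothetical pre-flight check: make sure the smctemp binary is available.
if shutil.which("smctemp") is None:
    raise RuntimeError("smctemp not found on PATH; install it before using the wrapper.")

# One manual reading to confirm the tool works on this machine.
print(subprocess.check_output("smctemp -c", shell=True, text=True).strip())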
Creating the Temperature-Checked Wrapper
class LLMWithTemperatureCheck:
"""A dynamic proxy wrapper that adds a CPU temperature check before any
call to the underlying LLM. Forwards all attributes and methods.
"""
This class acts as a proxy for any LangChain LLM. It intercepts method calls and checks the CPU temperature before execution.
Initializing the Wrapper
def __init__(
self,
llm_instance: BaseChatModel,
temp_threshold: float = 85.0,
wait_seconds: int = 2,
):
self._llm = llm_instance
self.temp_threshold = temp_threshold
self.wait_seconds = wait_seconds
llm_instance → the actual LangChain LLM.
temp_threshold → maximum allowed CPU temperature (default: 85°C).
wait_seconds → how long to wait before retrying if the system is too hot.
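For illustration, here is how the wrapper might be instantiated directly with a stricter threshold. The model name "llama3" and the model_provider="ollama" argument are placeholders and assume the langchain-ollama integration is installed; they are not values from the original post.
from langchain.chat_models import init_chat_model

# Placeholder model name; assumes the Ollama integration for LangChain is installed.
base_llm = init_chat_model(model="llama3", model_provider="ollama", temperature=0)

# Wrap it with an 80°C threshold and a longer cool-down between checks.
safe_llm = LLMWithTemperatureCheck(base_llm, temp_threshold=80.0, wait_seconds=5)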
Reading CPU Temperature
def _get_cpu_temperature(self):
"""Get the current CPU temperature using smctemp command.
Retries until the output is not 40, which seems to be a bug.
"""
while True:
try:
temp_output = subprocess.check_output(
"smctemp -c -i25 -n180 -f", shell=True, text=True
)
temp_value = float(temp_output.strip())
if temp_value == 40: # buggy reading, retry
print("Got temperature=40 (buggy). Retrying...")
time.sleep(0.1)
continue
print(f"Current temperature is: {temp_value}")
return temp_value
except (subprocess.SubprocessError, ValueError) as e:
print(f"Error getting temperature: {e}")
return 0
Runs smctemp to fetch the CPU temperature.
Handles a bug where 40°C is returned incorrectly.
Falls back to 0 if something goes wrong.
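smctemp is macOS-specific. If you want the same wrapper on Linux, one possible substitute for _get_cpu_temperature is psutil, which exposes hardware sensors there. This is a hedged alternative sketch, not part of the original code; the psutil dependency and the "coretemp" sensor label are assumptions that depend on your hardware.
import psutil  # assumed extra dependency; sensors_temperatures() is Linux-only

def get_cpu_temperature_linux() -> float:
    """Hypothetical replacement for _get_cpu_temperature using psutil."""
    sensors = psutil.sensors_temperatures()
    # "coretemp" is a common label on Intel machines; adjust for your hardware.
    readings = sensors.get("coretemp") or next(iter(sensors.values()), [])
    if not readings:
        return 0.0  # mirror the original fallback when no reading is available
    return max(entry.current for entry in readings)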
Intercepting LLM Calls
def __getattr__(self, name: str) -> Any:
original_attr = getattr(self._llm, name)
if not callable(original_attr):
return original_attr
This method dynamically forwards attribute access to the original LLM. Non-callable attributes are returned as-is; callable methods are wrapped with a temperature check before being handed back.
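If the proxy pattern is new to you, this tiny standalone example (unrelated to LangChain, purely for illustration) shows the same __getattr__ forwarding idea in isolation:
class LoggingProxy:
    """Toy proxy: forwards everything to the wrapped object and logs method calls."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Only called when the attribute is not found on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def wrapper(*args, **kwargs):
            print(f"calling {name}")
            return attr(*args, **kwargs)

        return wrapper

proxy = LoggingProxy([3, 1, 2])
proxy.append(4)        # prints "calling append", then forwards to the list
print(proxy.count(3))  # prints "calling count", then 1
LLMWithTemperatureCheck does exactly this, except its wrapper also polls the CPU temperature before forwarding the call.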
Handling Asynchronous LLM Calls
if inspect.iscoroutinefunction(original_attr):
async def async_wrapper(*args, **kwargs):
while (
temp := self._get_cpu_temperature()
) and temp > self.temp_threshold:
print(
f"CPU temp high ({temp:.1f}°C). Waiting {self.wait_seconds}s..."
)
await asyncio.sleep(self.wait_seconds)
return await original_attr(*args, **kwargs)
return async_wrapper
Detects if the method is async.
If the CPU temperature is too high, it waits (await asyncio.sleep) until it is safe.
Finally, it calls the original async LLM method.
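A quick sketch of how the async path gets exercised in practice; the model name is again a placeholder and assumes the Ollama integration is installed:
import asyncio
from langchain.chat_models import init_chat_model

async def main():
    safe_llm = LLMWithTemperatureCheck(
        init_chat_model(model="llama3", model_provider="ollama", temperature=0)
    )
    # ainvoke is a coroutine function, so __getattr__ returns async_wrapper here.
    reply = await safe_llm.ainvoke("Summarize why CPU temperature monitoring matters.")
    print(reply.content)

asyncio.run(main())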
Handling Synchronous LLM Calls
else:
def sync_wrapper(*args, **kwargs):
while (
temp := self._get_cpu_temperature()
) and temp > self.temp_threshold:
print(
f"CPU temp high ({temp:.1f}°C). Waiting {self.wait_seconds}s..."
)
time.sleep(self.wait_seconds)
return original_attr(*args, **kwargs)
return sync_wrapper
Same as the async wrapper, but it uses blocking time.sleep instead of await asyncio.sleep.
Helper Function to Initialize a Safe LLM
def get_llm(model_name: str, **kwargs: Any) -> LLMWithTemperatureCheck:
"""Get LLM service wrapped with temperature checking."""
original_llm = init_chat_model(temperature=0, model=model_name, **kwargs)
return LLMWithTemperatureCheck(original_llm)
Calls LangChain’s init_chat_model.
Wraps the result inside LLMWithTemperatureCheck.
Ensures all future calls are temperature-aware.
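Putting it together, a typical synchronous call would look like this. The Ollama model name and the model_provider keyword are assumptions for the sketch; the post itself does not show a concrete call.
# invoke is synchronous, so the blocking sync_wrapper path is used.
llm = get_llm("llama3", model_provider="ollama")
response = llm.invoke("Explain thermal throttling in one sentence.")
print(response.content)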
Complete File
import asyncio
import inspect
import subprocess
import time
from typing import Any
from langchain.chat_models import init_chat_model
from langchain_core.language_models import BaseChatModel
class LLMWithTemperatureCheck:
"""A dynamic proxy wrapper that adds a CPU temperature check before any
call to the underlying LLM. Forwards all attributes and methods.
"""
def __init__(
self,
llm_instance: BaseChatModel,
temp_threshold: float = 85.0,
wait_seconds: int = 2,
):
self._llm = llm_instance
self.temp_threshold = temp_threshold
self.wait_seconds = wait_seconds
def _get_cpu_temperature(self):
"""Get the current CPU temperature using smctemp command.
Retries until the output is not 40, which seems to be a bug.
"""
while True:
try:
temp_output = subprocess.check_output(
"smctemp -c -i25 -n180 -f", shell=True, text=True
)
# Extract the temperature value from the output
temp_value = float(temp_output.strip())
if temp_value == 40: # buggy reading, retry
print("Got temperature=40 (buggy). Retrying...")
time.sleep(0.1) # small delay before retry
continue
print(f"Current temperature is: {temp_value}")
return temp_value
except (subprocess.SubprocessError, ValueError) as e:
print(f"Error getting temperature: {e}")
return 0 # Return a safe value if temperature check fails
def __getattr__(self, name: str) -> Any:
original_attr = getattr(self._llm, name)
if not callable(original_attr):
return original_attr
if inspect.iscoroutinefunction(original_attr):
async def async_wrapper(*args, **kwargs):
while (
temp := self._get_cpu_temperature()
) and temp > self.temp_threshold:
print(
f"CPU temp high ({temp:.1f}°C). Waiting {self.wait_seconds}s..."
)
await asyncio.sleep(self.wait_seconds)
return await original_attr(*args, **kwargs)
return async_wrapper
else:
def sync_wrapper(*args, **kwargs):
while (
temp := self._get_cpu_temperature()
) and temp > self.temp_threshold:
print(
f"CPU temp high ({temp:.1f}°C). Waiting {self.wait_seconds}s..."
)
time.sleep(self.wait_seconds)
return original_attr(*args, **kwargs)
return sync_wrapper
def get_llm(model_name: str, **kwargs: Any) -> LLMWithTemperatureCheck:
"""Get LLM service wrapped with temperature checking."""
original_llm = init_chat_model(temperature=0, model=model_name, **kwargs)
return LLMWithTemperatureCheck(original_llm)
Final Thoughts
This wrapper ensures that your Ollama models don’t overheat your machine by automatically pausing execution when CPU temperature exceeds a safe threshold.
It’s especially useful for:
Long training/inference sessions.
Laptops without strong cooling systems.
Experimenting with LLMs locally without risking damage.
By combining LangChain with system-level monitoring, you can run AI models in a much safer and more controlled environment.