Safely Running Ollama with CPU Temperature Checks in Python
Learn how to wrap Ollama models with a CPU temperature check in Python. This tutorial shows you how to prevent overheating by pausing execution when system temperatures exceed safe limits.
Introduction
When running large language models (LLMs) locally, especially on laptops or small servers, overheating can quickly become a serious problem. High CPU temperatures not only impact performance but can also shorten hardware lifespan.
In this article, we’ll build a Python proxy wrapper for LangChain chat models that monitors CPU temperature before executing any LLM calls. If the temperature exceeds a threshold, the code will wait until it cools down before continuing.
This approach is useful if you:
Run local LLMs on laptops with limited cooling.
Want to prevent thermal throttling during long inference loops.
Need both async and sync support in LangChain workflows.
Let’s break down the implementation step by step.
Importing Required Libraries
import asyncio
import inspect
import subprocess
import time
from typing import Any
from langchain.chat_models import init_chat_model
from langchain_core.language_models import BaseChatModel
asyncio → for handling asynchronous calls to LLMs.
inspect → to detect whether a function is async or sync.
subprocess → to execute the system command smctemp that reads CPU temperature.
time → for synchronous delays when the CPU overheats.
typing.Any → for flexible typing.
langchain.chat_models.init_chat_model → initializes a LangChain LLM.
BaseChatModel → the base class for type safety.
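Since the wrapper shells out to smctemp, it is worth confirming the binary is actually installed before relying on it. This is an optional pre-flight sketch, not part of the original code; it assumes smctemp is on your PATH and that the -c flag used later in the article prints a single CPU reading.
import shutil
import subprocess

# Hypothetical pre-flight check: make sure the smctemp binary is available.
if shutil.which("smctemp") is None:
    raise RuntimeError("smctemp not found on PATH; install it before using the wrapper.")

# One manual reading to confirm the tool works on this machine.
print(subprocess.check_output("smctemp -c", shell=True, text=True).strip())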
Creating the Temperature-Checked Wrapper
class LLMWithTemperatureCheck:
"""A dynamic proxy wrapper that adds a CPU temperature check before any
call to the underlying LLM. Forwards all attributes and methods.
"""
This class acts as a proxy for any LangChain LLM. It intercepts method calls and checks the CPU temperature before execution.
Initializing the Wrapper
def __init__(
self,
llm_instance: BaseChatModel,
temp_threshold: float = 85.0,
wait_seconds: int = 2,
):
self._llm = llm_instance
self.temp_threshold = temp_threshold
self.wait_seconds = wait_seconds
llm_instance → the actual LangChain LLM.
temp_threshold → maximum allowed CPU temperature (default: 85°C).
wait_seconds → how long to wait before retrying if the system is too hot.
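For illustration, here is how the wrapper might be instantiated directly with a stricter threshold. The model name "llama3" and the model_provider="ollama" argument are placeholders and assume the langchain-ollama integration is installed; they are not values from the original post.
from langchain.chat_models import init_chat_model

# Placeholder model name; assumes the Ollama integration for LangChain is installed.
base_llm = init_chat_model(model="llama3", model_provider="ollama", temperature=0)

# Wrap it with an 80°C threshold and a longer cool-down between checks.
safe_llm = LLMWithTemperatureCheck(base_llm, temp_threshold=80.0, wait_seconds=5)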
Reading CPU Temperature
def _get_cpu_temperature(self):
"""Get the current CPU temperature using smctemp command.
Retries until the output is not 40, which seems to be a bug.
"""
while True:
try:
temp_output = subprocess.check_output(
"smctemp -c -i25 -n180 -f", shell=True, text=True
)
temp_value = float(temp_output.strip())
if temp_value == 40: # buggy reading, retry
print("Got temperature=40 (buggy). Retrying...")
time.sleep(0.1)
continue
print(f"Current temperature is: {temp_value}")
return temp_value
except (subprocess.SubprocessError, ValueError) as e:
print(f"Error getting temperature: {e}")
return 0
Runs smctemp to fetch the CPU temperature.
Handles a bug where 40°C is returned incorrectly.
Falls back to 0 if something goes wrong.
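smctemp is macOS-specific. If you want the same wrapper on Linux, one possible substitute for _get_cpu_temperature is psutil, which exposes hardware sensors there. This is a hedged alternative sketch, not part of the original code; the psutil dependency and the "coretemp" sensor label are assumptions that depend on your hardware.
import psutil  # assumed extra dependency; sensors_temperatures() is Linux-only

def get_cpu_temperature_linux() -> float:
    """Hypothetical replacement for _get_cpu_temperature using psutil."""
    sensors = psutil.sensors_temperatures()
    # "coretemp" is a common label on Intel machines; adjust for your hardware.
    readings = sensors.get("coretemp") or next(iter(sensors.values()), [])
    if not readings:
        return 0.0  # mirror the original fallback when no reading is available
    return max(entry.current for entry in readings)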
Intercepting LLM Calls
def __getattr__(self, name: str) -> Any:
original_attr = getattr(self._llm, name)
if not callable(original_attr):
return original_attr
This method dynamically forwards attribute access to the original LLM. Non-callable attributes are returned as-is; callable methods are wrapped with a temperature check before being handed back.
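If the proxy pattern is new to you, this tiny standalone example (unrelated to LangChain, purely for illustration) shows the same __getattr__ forwarding idea in isolation:
class LoggingProxy:
    """Toy proxy: forwards everything to the wrapped object and logs method calls."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Only called when the attribute is not found on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def wrapper(*args, **kwargs):
            print(f"calling {name}")
            return attr(*args, **kwargs)

        return wrapper

proxy = LoggingProxy([3, 1, 2])
proxy.append(4)        # prints "calling append", then forwards to the list
print(proxy.count(3))  # prints "calling count", then 1
LLMWithTemperatureCheck does exactly this, except its wrapper also polls the CPU temperature before forwarding the call.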
Handling Asynchronous LLM Calls
if inspect.iscoroutinefunction(original_attr):
async def async_wrapper(*args, **kwargs):
while (
temp := self._get_cpu_temperature()
) and temp > self.temp_threshold:
print(
f"CPU temp high ({temp:.1f}°C). Waiting {self.wait_seconds}s..."
)
await asyncio.sleep(self.wait_seconds)
return await original_attr(*args, **kwargs)
return async_wrapper
Detects if the method is async.
If the CPU temperature is too high, it waits (await asyncio.sleep) until it is safe.
Finally, it calls the original async LLM method.
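A quick sketch of how the async path gets exercised in practice; the model name is again a placeholder and assumes the Ollama integration is installed:
import asyncio
from langchain.chat_models import init_chat_model

async def main():
    safe_llm = LLMWithTemperatureCheck(
        init_chat_model(model="llama3", model_provider="ollama", temperature=0)
    )
    # ainvoke is a coroutine function, so __getattr__ returns async_wrapper here.
    reply = await safe_llm.ainvoke("Summarize why CPU temperature monitoring matters.")
    print(reply.content)

asyncio.run(main())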
Handling Synchronous LLM Calls
else:
def sync_wrapper(*args, **kwargs):
while (
temp := self._get_cpu_temperature()
) and temp > self.temp_threshold:
print(
f"CPU temp high ({temp:.1f}°C). Waiting {self.wait_seconds}s..."
)
time.sleep(self.wait_seconds)
return original_attr(*args, **kwargs)
return sync_wrapper
Same as the async wrapper, but it uses blocking time.sleep instead of await asyncio.sleep.
Helper Function to Initialize a Safe LLM
def get_llm(model_name: str, **kwargs: Any) -> LLMWithTemperatureCheck:
"""Get LLM service wrapped with temperature checking."""
original_llm = init_chat_model(temperature=0, model=model_name, **kwargs)
return LLMWithTemperatureCheck(original_llm)
Calls LangChain’s init_chat_model.
Wraps the result inside LLMWithTemperatureCheck.
Ensures all future calls are temperature-aware.
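Putting it together, a typical synchronous call would look like this. The Ollama model name and the model_provider keyword are assumptions for the sketch; the post itself does not show a concrete call.
# invoke is synchronous, so the blocking sync_wrapper path is used.
llm = get_llm("llama3", model_provider="ollama")
response = llm.invoke("Explain thermal throttling in one sentence.")
print(response.content)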
Complete File
import asyncio
import inspect
import subprocess
import time
from typing import Any
from langchain.chat_models import init_chat_model
from langchain_core.language_models import BaseChatModel
class LLMWithTemperatureCheck:
"""A dynamic proxy wrapper that adds a CPU temperature check before any
call to the underlying LLM. Forwards all attributes and methods.
"""
def __init__(
self,
llm_instance: BaseChatModel,
temp_threshold: float = 85.0,
wait_seconds: int = 2,
):
self._llm = llm_instance
self.temp_threshold = temp_threshold
self.wait_seconds = wait_seconds
def _get_cpu_temperature(self):
"""Get the current CPU temperature using smctemp command.
Retries until the output is not 40, which seems to be a bug.
"""
while True:
try:
temp_output = subprocess.check_output(
"smctemp -c -i25 -n180 -f", shell=True, text=True
)
# Extract the temperature value from the output
temp_value = float(temp_output.strip())
if temp_value == 40: # buggy reading, retry
print("Got temperature=40 (buggy). Retrying...")
time.sleep(0.1) # small delay before retry
continue
print(f"Current temperature is: {temp_value}")
return temp_value
except (subprocess.SubprocessError, ValueError) as e:
print(f"Error getting temperature: {e}")
return 0 # Return a safe value if temperature check fails
def __getattr__(self, name: str) -> Any:
original_attr = getattr(self._llm, name)
if not callable(original_attr):
return original_attr
if inspect.iscoroutinefunction(original_attr):
async def async_wrapper(*args, **kwargs):
while (
temp := self._get_cpu_temperature()
) and temp > self.temp_threshold:
print(
f"CPU temp high ({temp:.1f}°C). Waiting {self.wait_seconds}s..."
)
await asyncio.sleep(self.wait_seconds)
return await original_attr(*args, **kwargs)
return async_wrapper
else:
def sync_wrapper(*args, **kwargs):
while (
temp := self._get_cpu_temperature()
) and temp > self.temp_threshold:
print(
f"CPU temp high ({temp:.1f}°C). Waiting {self.wait_seconds}s..."
)
time.sleep(self.wait_seconds)
return original_attr(*args, **kwargs)
return sync_wrapper
def get_llm(model_name: str, **kwargs: Any) -> LLMWithTemperatureCheck:
"""Get LLM service wrapped with temperature checking."""
original_llm = init_chat_model(temperature=0, model=model_name, **kwargs)
return LLMWithTemperatureCheck(original_llm)
Final Thoughts
This wrapper ensures that your Ollama models don’t overheat your machine by automatically pausing execution when CPU temperature exceeds a safe threshold.
It’s especially useful for:
Long training/inference sessions.
Laptops without strong cooling systems.
Experimenting with LLMs locally without risking damage.
By combining LangChain with system-level monitoring, you can run AI models in a much safer and more controlled environment.