By Bernat Sampera

How to create your own OCR service with docTR and Modal

Modal lets you host your own small services in the cloud and run their workloads there. It is a good fit for open-source LLMs, OCR and, in general, anything that requires heavy computation or that you want to ship as a small, easily deployable microservice.

The Problem

For a recent project I had to convert and translate academic records like the one below. These records came as images, so I needed a tool to extract the text from them.
I started with a combination of easyocr and pytesseract, but the results were pretty poor. I then tried docling and docTR, and the results were much better.
In the end I chose docTR because the GPU cost was much lower and I didn't really need the extra features of docling.

In this post I will briefly explain how to set up docTR on Modal, which is where I decided to host it.

example image for ocr

The extracted text looks like this:

"Wilhem School\nSt. Wilhem\nCo Montgomery\nIreland\nTo whom it may concern,\nName: Fulano de Tal\nDate of Birth: 05/07/2025\nFulano successfully completed 4th year at Wilhem School for the academic year September 2024 to June\n2025.\nEnd of year report:\nSUBJECT\n% GRADE\nEnglish\n60\nMathematics\n92\nFrench\n60\nHistory\n80\nChemistry\n100\nBiology\n80\nPhysical Education\n60\nTechnology\n85\nTechnical Graphics\n88\nIrish Secondary School grading legend:\nGrade\nGrade\nGrade\nGrade Scale\nDescription Grade Scale\nDescription Grade Scale\nDescription\nA\n85.00- Excellent\nC\n55.00 Good\nE\n25.00-\n100.00\n69.99\n39.99\nWeak\nB\n70.00- Very Good\nD\n40.00. Satisfactory F\n10.00-\n84.99\n54.99 Pass\n24.99\nFail\n- -\nMax Mustermann\nActing Principal"

Services and Libraries used.

modal.com: a service for running code remotely. You use it from Python and the syntax is a bit unusual, but once you learn the basics it can make complex problems much easier.

docTR: a lightweight open-source OCR library for Python that reads images and converts their content to text. You can choose which models it uses. Alternatives: easyocr (in my experience, worse results and slower) and docling (converts to Markdown while preserving the layout). As a paid option, NVIDIA NeMo Retriever performs very well.

You can see the example in this gist.

Challenges

The main problem I had was that the docTR model was downloaded again every time I triggered the function. To solve it, I set up a custom volume where the model is stored, so it is downloaded only once.

Set up the Modal image

When using docTR on Modal, it is very important to add an extra volume so that the model is not downloaded every time a new container is created.

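from pathlib import Path

import modal

# Configuration: a persistent volume for the docTR model files, mounted at MODEL_DIR
models_vol = modal.Volume.from_name("doctr-models", create_if_missing=True)
MODEL_DIR = Path("/models")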

# Enhanced image with DocTR dependencies
image = (
    modal.Image.debian_slim()
    .apt_install("libgl1-mesa-glx", "libglib2.0-0", "libsm6", "libxext6", "libxrender-dev", "libgomp1")
    .pip_install("fastapi[standard]", "python-doctr>=0.7.0", "torch", "torchvision", "huggingface_hub[hf_transfer]==0.26.2")
    .env({"HF_HUB_ENABLE_HF_TRANSFER": "1"})
)

app = modal.App(name="extract-doctr-markdown", image=image)
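
The volume only helps if the model files are actually written under its mount point. Below is a minimal sketch of one way to arrange that. It assumes that docTR reads the DOCTR_CACHE_DIR environment variable and that huggingface_hub reads HF_HOME, so check both against the versions you install. The cache_env helper and the subdirectory names are hypothetical and not part of the original example.

```python
from pathlib import PurePosixPath

MODEL_DIR = PurePosixPath("/models")  # same mount point as in the post


def cache_env(model_dir: PurePosixPath) -> dict[str, str]:
    """Environment variables that redirect model downloads into the volume."""
    return {
        # Assumption: docTR stores downloaded weights under DOCTR_CACHE_DIR.
        "DOCTR_CACHE_DIR": str(model_dir / "doctr"),
        # Assumption: Hugging Face downloads are stored under HF_HOME.
        "HF_HOME": str(model_dir / "huggingface"),
    }


if __name__ == "__main__":
    print(cache_env(MODEL_DIR))
```

The resulting dictionary could be merged into the existing .env({...}) call on the image, so that the first container downloads into the volume and later containers find the files already there.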

Create the POST function that triggers the Modal execution

It is also important to attach the volume to the function here, mounted at MODEL_DIR.

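from fastapi import File, UploadFile

@app.function(
    image=image,
    volumes={str(MODEL_DIR): models_vol},  # mount the model volume at /models
    scaledown_window=300,  # keep an idle container alive for up to 300 seconds
)
@modal.fastapi_endpoint(method="POST")
def extract_text(file: UploadFile = File(...)):
    ...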
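
The body of extract_text is elided above. As a rough sketch of what such a function could do with the uploaded bytes (not necessarily what the original service does), the snippet below runs a docTR predictor and flattens its exported result into newline-separated lines. The helper names export_to_text and ocr_image_bytes are hypothetical, and the line-joining rule is an assumption based on the sample output shown earlier.

```python
from typing import Any


def export_to_text(export: dict[str, Any]) -> str:
    """Flatten a docTR export dict (pages > blocks > lines > words) into text."""
    lines = []
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                words = [word["value"] for word in line.get("words", [])]
                if words:
                    lines.append(" ".join(words))
    return "\n".join(lines)


def ocr_image_bytes(image_bytes: bytes, predictor=None) -> str:
    """Run OCR on one image given as raw bytes and return plain text."""
    # Imported lazily so this module can be loaded where docTR is not installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    if predictor is None:
        predictor = ocr_predictor(pretrained=True)
    document = DocumentFile.from_images([image_bytes])
    return export_to_text(predictor(document).export())
```

Inside the endpoint, the bytes could come from file.file.read(), and the returned string could be sent back as the response.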

How to call this service

The Modal URL will look something like this; you can find it in the Modal dashboard.

import requests

DOCTR_API_URL = "https://<<username>>--<<function_name>>.modal.run"
with open("example.png", "rb") as f:  # example.png is a placeholder path
    response = requests.post(DOCTR_API_URL, files={"file": f})
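
The call can also be wrapped in a small helper that sets the content type, a timeout and basic error handling. This is only a sketch: the helper names build_files and call_ocr are hypothetical, the form field name "file" is taken from the parameter name of extract_text, and the response is assumed to be JSON.

```python
import mimetypes
from pathlib import Path


def build_files(image_path: str, field: str = "file") -> dict:
    """Build the multipart payload for requests: field -> (name, bytes, type)."""
    path = Path(image_path)
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    return {field: (path.name, path.read_bytes(), content_type)}


def call_ocr(url: str, image_path: str, timeout: float = 120.0):
    """POST one image to the OCR endpoint and return the decoded response."""
    import requests  # imported lazily; requires the requests package

    response = requests.post(url, files=build_files(image_path), timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.json()  # assumption: the endpoint returns JSON
```

A generous timeout is used because the first call after a period of inactivity has to wait for a new container to start.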

modal example doctr call

Metrics

Each of these calls costs less than $0.01, and you can try it out on the free plan, which includes $30 per month.
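
As a back-of-the-envelope check of those two figures, the sketch below works in whole cents to avoid floating-point rounding. The function name is hypothetical, and since each call costs less than one cent, the result is a lower bound.

```python
def min_calls_covered(credit_cents: int, max_cost_per_call_cents: int) -> int:
    """Lower bound on the number of calls a credit covers at a given max cost."""
    return credit_cents // max_cost_per_call_cents


if __name__ == "__main__":
    # $30 of monthly credit, at most 1 cent per call
    print(min_calls_covered(30 * 100, 1))  # 3000
```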

It is not the fastest option: a call takes about 10 seconds, or much less if another call was made in the past few minutes, because the container is already running.
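
One way to make sure warm calls skip the model load entirely is to load the predictor at most once per process, and therefore once per container. This is a generic sketch using functools.lru_cache, not the approach from the original example; load_once, _load_doctr and get_predictor are hypothetical names.

```python
from functools import lru_cache
from typing import Callable, TypeVar

T = TypeVar("T")


def load_once(loader: Callable[[], T]) -> Callable[[], T]:
    """Wrap a zero-argument loader so that its result is computed only once."""
    return lru_cache(maxsize=1)(loader)


def _load_doctr():
    # Imported lazily so this module can be loaded where docTR is not installed.
    from doctr.models import ocr_predictor

    return ocr_predictor(pretrained=True)


# The first call pays for loading the model; later calls reuse the same object.
get_predictor = load_once(_load_doctr)
```

With this in place, the endpoint would call get_predictor() on every request, and only the first request in each container would trigger the load.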