
Technology

OCRopus

A modular document analysis system using Python and LSTM networks for high-accuracy optical character recognition.

OCRopus is a suite of document analysis tools for high-accuracy text recognition and layout analysis. Originally developed by Thomas Breuel (Google/DFKI), it uses Long Short-Term Memory (LSTM) networks for line recognition, which lets it handle complex scripts and historical fonts. The system is a modular pipeline of three stages: binarization, page segmentation, and line recognition. Each stage is a separate command-line tool (ocropus-nlbin for binarization, ocropus-gpageseg for page segmentation, ocropus-rpred for line recognition). This means each stage can be run, inspected, and tuned on its own. A companion tool, ocropus-rtrain, trains custom recognition models for specialized datasets.
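Because each stage is its own command, the pipeline can be scripted as a plain sequence of calls. The Python sketch below chains the three tools for one page image. The command names come from the ocropy README. The `-o` and `-m` flags, the `book/` output layout, the `en-default.pyrnn.gz` model name, and the assumption that the tools expand the quoted glob patterns themselves are all assumptions to verify against `--help` for your installed version. The helper names `build_pipeline` and `run_pipeline` are hypothetical. By default the sketch only prints the commands (dry run).

```python
from __future__ import annotations

import shlex
import subprocess


def build_pipeline(image: str, outdir: str = "book",
                   model: str = "en-default.pyrnn.gz") -> list[list[str]]:
    """Return the command for each pipeline stage, in order (hypothetical helper).

    Flags, output layout, and model name are assumptions; check them with
    `ocropus-nlbin --help` etc. for your version.
    """
    return [
        # 1. Binarization: writes binarized page images into outdir
        ["ocropus-nlbin", image, "-o", outdir],
        # 2. Page segmentation: splits each binarized page into line images.
        #    The glob is passed literally; the tool is assumed to expand it.
        ["ocropus-gpageseg", f"{outdir}/????.bin.png"],
        # 3. Line recognition with an LSTM model (assumed -m flag)
        ["ocropus-rpred", "-m", model, f"{outdir}/????/??????.bin.png"],
    ]


def run_pipeline(image: str, dry_run: bool = True, **kwargs) -> list[str]:
    """Print (dry run) or execute each stage, stopping on the first failure."""
    executed = []
    for cmd in build_pipeline(image, **kwargs):
        line = shlex.join(cmd)  # shell-quoted form, e.g. for logging
        executed.append(line)
        if dry_run:
            print(line)
        else:
            # check=True raises CalledProcessError if a stage fails
            subprocess.run(cmd, check=True)
    return executed


if __name__ == "__main__":
    run_pipeline("scan.png")
```

Keeping the stages as separate commands mirrors the design described above: if segmentation misbehaves on a page, you can rerun only `ocropus-gpageseg` on the existing binarized output instead of repeating the whole pipeline.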

https://github.com/ocropus/ocropus
2 projects · 2 cities

