Technology
OCRopus
A modular document analysis system using Python and LSTM networks for high-accuracy optical character recognition.
OCRopus is a suite of document analysis tools for text recognition and layout analysis. Originally developed by Thomas Breuel (Google/DFKI), the framework uses Long Short-Term Memory (LSTM) networks for line recognition, which helps it handle complex scripts and historical fonts. The system runs as a modular pipeline: binarization, page segmentation, and line recognition. Each stage is a separate command-line tool (for example, ocropus-nlbin for binarization and ocropus-rpred for line recognition), so intermediate results can be inspected between stages, and recognition models can be trained on specialized datasets.
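The three-stage pipeline described above can be sketched as a short Python helper that assembles one command line per stage. This is a minimal sketch, not the project's own driver: the function name `build_pipeline`, the working directory `book`, and the model file name are illustrative assumptions, and the glob-style path patterns follow the layout shown in the ocropy README, where the tools expand the patterns themselves. The helper only builds the commands; running them requires an OCRopus installation.

```python
import shlex


def build_pipeline(image, workdir="book", model="en-default.pyrnn.gz"):
    """Return the command lines for the three pipeline stages, in order.

    `workdir` and `model` are placeholder values chosen for illustration.
    """
    # Patterns for the per-page and per-line images written by earlier stages
    # (assumed layout: book/0001.bin.png and book/0001/010001.bin.png).
    pages = f"{workdir}/????.bin.png"
    lines = f"{workdir}/????/??????.bin.png"
    return [
        # 1. Binarization: normalize and binarize the scanned page.
        ["ocropus-nlbin", image, "-o", workdir],
        # 2. Page segmentation: split each page into text-line images.
        ["ocropus-gpageseg", pages],
        # 3. Line recognition: run the LSTM model over each line image.
        ["ocropus-rpred", "-m", model, lines],
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, so the sketch
    # works without OCRopus installed.
    for cmd in build_pipeline("page.png"):
        print(shlex.join(cmd))
```

To execute the stages for real, each list could be passed to `subprocess.run(cmd, check=True)` in order; keeping the stages as separate commands mirrors the modular design, since any one of them can be rerun or swapped out without touching the others.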