OCR4all is software designed for the digital processing of primarily very early printed texts, whose elaborate typefaces and often irregular layouts are beyond the recognition capabilities of most other OCR software. OCR4all's suggested semi-automatic workflow is easy to understand and can be used independently. It explicitly targets users with no technical background and combines different tools in one consistent interface, so frequent switching between programs is no longer necessary.
OCR4all covers a complete OCR workflow within a single application. It starts with image preparation ("Preprocessing") and continues through layout segmentation ("Region Segmentation" with LAREX), extraction of the classified layout regions ("Region Extraction"), line segmentation ("Line Segmentation") and character recognition ("Recognition"). It then supports correction of the recognized characters ("Ground Truth Production") and the training of book-specific OCR models.
OCR4all achieves very good character recognition results, particularly because it can train book-specific recognition models, which in principle can also be applied to other prints.
OCR4all has been covered by the following news outlets and websites:
- einBlick: Modernes Tool für alte Texte (23.04.2019)
- Augsburger Allgemeine: Computer liest alte Texte (24.04.2019)
- Der Standard: Zuverlässiges Texterkennungs-Tool für historische Druckschriften (24.04.2019)
- Der Tagesspiegel: Computertool für alte Texte (24.04.2019)
- SWR2 Impuls: Mittelalterliche Handschriften werden Textdokumente (03.05.2019)
OCR4all is still under active development. To stay up to date, especially regarding new image releases and other developments around OCR4all, please subscribe to our mailing list.