Zentrum für Philologie und Digitalität "Kallimachos"

OCR4all

OCR4all is software designed for the digital exploration of primarily very early printed texts, whose elaborate typefaces and often irregular layouts are beyond the capabilities of most other recognition software. OCR4all's semi-automatic workflow is easy to understand and can be used independently. It explicitly targets users without a technical background and combines different tools in one consistent interface, so users no longer need to switch frequently between programs.

OCR4all provides a complete OCR workflow in a single application: from image preparation ("Preprocessing"), via layout segmentation ("Region Segmentation" with LAREX), extraction of the classified layout regions ("Region Extraction"), line segmentation ("Line Segmentation") and character recognition ("Recognition"), to correction of the recognized characters ("Ground Truth Production") and the training of book-specific OCR models.

OCR4all achieves very good character recognition results, above all because it lets users create and train book-specific recognition models, which in principle can also be applied to other printings.

Media coverage

OCR4all has been covered by the following news outlets and websites:

Mailing list

OCR4all is still under active development. To stay up to date, especially regarding new image releases and other innovations around OCR4all, please subscribe to our mailing list.