THEMIS is both a production and a project management software platform. Fully hosted and managed by Max, THEMIS ensures that all stages of a digitisation project are managed from within a single platform. It also allows our customers to approach digitisation projects differently, if appropriate: for instance, using THEMIS it is possible to capture and ingest the digitised images prior to cataloguing and indexing the material.
For the majority of our indexing and transcription projects we use our in-house project management platform, THEMIS. THEMIS is built on a relational database for secure internal and external access, and can accommodate any existing catalogue information while allowing further metadata to be added post-digitisation.
To discuss THEMIS or any of Max's services please feel free to telephone us on 020 8309 5445 or to contact us via our contact page.
For structured data, we have developed automated algorithms to extract specific fields from templated documents based on their spatial relationship to “marker” text floats. Text floats are the blocks of text, with their bounding box coordinates, that OCR programs such as Tesseract produce. For example, if a series of printed forms has the words “Invoice Number” at the top of a column, individual text blocks that fall within a set range below this “marker” text float can be identified as invoice numbers.
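The marker-based approach described above can be illustrated with a minimal sketch. This is not the THEMIS implementation: the `TextFloat` class, the `fields_below_marker` function, and the `max_drop` and `x_tolerance` parameters are hypothetical names chosen for illustration, and the coordinates are assumed to be pixel values with the origin at the top left of the page, as in typical OCR output.

```python
from dataclasses import dataclass


@dataclass
class TextFloat:
    """A block of OCR text with its bounding box (hypothetical structure)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def fields_below_marker(floats, marker, max_drop=500, x_tolerance=20):
    """Return the text of floats that sit in the column below a marker float.

    max_drop and x_tolerance are illustrative thresholds in pixels.
    """
    marker_float = next(
        (f for f in floats if f.text.strip().lower() == marker.lower()), None
    )
    if marker_float is None:
        return []

    marker_bottom = marker_float.top + marker_float.height
    marker_right = marker_float.left + marker_float.width

    matches = []
    for f in floats:
        if f is marker_float:
            continue
        # The float must start below the marker, within the allowed range.
        below = marker_bottom <= f.top <= marker_bottom + max_drop
        # The float must overlap the marker horizontally (with some slack).
        overlaps = (
            f.left < marker_right + x_tolerance
            and f.left + f.width > marker_float.left - x_tolerance
        )
        if below and overlaps:
            matches.append(f)

    # Return values in reading order, top to bottom.
    return [f.text for f in sorted(matches, key=lambda f: f.top)]


sample_floats = [
    TextFloat("Invoice Number", 100, 50, 160, 20),
    TextFloat("Amount", 400, 50, 80, 20),
    TextFloat("ABC123", 105, 90, 70, 20),
    TextFloat("45.00", 405, 90, 60, 20),
    TextFloat("XYZ789", 105, 130, 70, 20),
]
invoice_numbers = fields_below_marker(sample_floats, "Invoice Number")
```

In this sample, only the two blocks in the column under “Invoice Number” are selected; the blocks under “Amount” are excluded because they do not overlap the marker horizontally.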
Once data has been separated into fields, content-specific heuristic checks can be made against the expected format. For example, analysis may show that an invoice number should follow a specific format, e.g. XXX000, and THEMIS can mark records that do not match it for review and QA.
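A format check of this kind might look like the following sketch. It assumes, purely for illustration, that XXX000 means three uppercase letters followed by three digits; the `flag_for_review` function and the `invoice_number` field name are hypothetical and do not describe how THEMIS itself is built.

```python
import re

# Assumed reading of "XXX000": three uppercase letters, then three digits.
INVOICE_PATTERN = re.compile(r"[A-Z]{3}[0-9]{3}")


def flag_for_review(records, field="invoice_number", pattern=INVOICE_PATTERN):
    """Return the records whose field value does not match the expected format."""
    flagged = []
    for record in records:
        value = str(record.get(field, "")).strip()
        # fullmatch requires the whole value to fit the pattern.
        if not pattern.fullmatch(value):
            flagged.append(record)
    return flagged


sample_records = [
    {"invoice_number": "ABC123"},
    {"invoice_number": "AB0123"},   # likely OCR error: digit in place of a letter
    {"invoice_number": "abc123"},   # wrong case
    {},                             # field missing entirely
]
needs_review = flag_for_review(sample_records)
```

Records that fail the check are returned rather than corrected, so a person can review them during QA.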
THEMIS offers an efficient means of viewing, assessing and editing imported OCRed data. It provides essential project management information, including accuracy rates, remedial work statistics (with volumes), and trend analysis for continuous improvement.