Generic Information Extraction Platform for Unstructured Documents

Overview

ASTRI's Generic Information Extraction Platform uses an OCR engine and computer vision techniques to convert the text in documents to digital format and extract useful information based on the user's requirements. The platform enables financial and government institutions to automate data entry for documents such as bank statements, land registry documents and salary proof, which reduces typos and labour costs.

Research completion
2020 and 2021
Commercialisation opportunities
IP licensing; technology co-development
Problem addressed
  • The conventional approach requires manual data entry to transfer data from documents to a system.
  • Manual entry is time-consuming and prone to typos, especially when entering large amounts of data into tables.
  • Service charges and labour costs are high.

To address these issues, ASTRI's Generic Information Extraction Platform uses an OCR engine and computer vision techniques to convert the text in documents to digital format and extract useful information based on the user's requirements. The platform enables financial and government institutions to automate data entry of bank statements, land registry documents, salary proof and similar documents.

Innovation

The Generic Information Extraction Platform allows users to define a ruleset for extracting information from unstructured documents and to export the results to human-readable or machine-readable formats.
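As an illustration only, a ruleset and the export of extracted results might look like the sketch below. The rule structure, the field names and the functions `export_json` and `export_csv` are hypothetical and are not taken from the platform, whose actual schema and export formats are not described here.

```python
import csv
import io
import json

# Hypothetical ruleset: each rule names an output field and the keyword
# expected to appear next to its value in the document.
RULESET = [
    {"field": "account_number", "keyword": "Account No."},
    {"field": "statement_date", "keyword": "Statement Date"},
]


def export_json(result: dict) -> str:
    """Serialise extracted values to JSON (a machine-readable format)."""
    return json.dumps(result, indent=2, sort_keys=True)


def export_csv(result: dict) -> str:
    """Serialise extracted values to a two-column CSV that opens in a spreadsheet."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["field", "value"])
    # Follow the ruleset order so the output layout is stable.
    for rule in RULESET:
        writer.writerow([rule["field"], result.get(rule["field"], "")])
    return buf.getvalue()
```

Here the ruleset drives the column order of the CSV, so a reviewer sees fields in the order the user defined them, while the JSON output is intended for downstream systems.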

Outline of the innovation:

  • The user defines a ruleset that specifies how each value is extracted.
  • Field values are extracted from text near a field name or a specific keyword.
  • Tables are extracted when their column names match the ruleset.
  • Auto Workflow enables users to export results to a machine-readable format for further processing.
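The two extraction rules above can be sketched roughly as follows. This is a minimal illustration, not the platform's method: it assumes the OCR output is already available as a list of text lines, that table columns are separated by two or more spaces, and that a field value follows its keyword on the same line or on the next non-empty line. The function names `extract_field` and `extract_table` are hypothetical.

```python
import re


def extract_field(lines, keyword):
    """Return the text that follows `keyword`, or None if the keyword is absent.

    Assumption: the value sits on the same line after the keyword,
    or on the next non-empty line.
    """
    for i, line in enumerate(lines):
        pos = line.lower().find(keyword.lower())
        if pos == -1:
            continue
        rest = line[pos + len(keyword):].strip(" :\t")
        if rest:
            return rest
        for following in lines[i + 1:]:
            if following.strip():
                return following.strip()
    return None


def extract_table(lines, columns):
    """Return table rows as dicts once a header row matches `columns`.

    Assumption: cells are separated by two or more spaces, and the table
    ends at the first line whose cell count differs from the header.
    """
    wanted = [c.lower() for c in columns]
    for i, line in enumerate(lines):
        cells = re.split(r"\s{2,}", line.strip())
        if [c.lower() for c in cells] != wanted:
            continue
        rows = []
        for row_line in lines[i + 1:]:
            row = re.split(r"\s{2,}", row_line.strip())
            if len(row) != len(columns):
                break
            rows.append(dict(zip(columns, row)))
        return rows
    return []
```

A production system would more likely work on word bounding boxes from the OCR engine rather than on plain text lines, so that "near" can mean spatial proximity on the page; the line-based version is used here only to keep the sketch short.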

Client-side processing enables secure and fast processing of sensitive documents.

Material UI allows users to validate and update results easily.

Key impact
  • Streamline the workflow of processing unstructured documents.
  • Assist data entry of financial and government documents.
  • Improve OCR accuracy with advanced computer vision techniques.
Application
  • Unstructured document processing
  • Automate data entry
  • Locally and securely process documents

Patent

  • US App. No. 16/823,398; CN App. No. 202080000398.2; and HK App. No. 62020017194.5
Hong Kong Applied Science and Technology Research Institute (ASTRI)

Hong Kong Applied Science and Technology Research Institute (ASTRI) was founded by the Government of the Hong Kong Special Administrative Region in 2000 with the mission of enhancing Hong Kong's competitiveness through applied research. ASTRI's core R&D competence is grouped under four Technology Divisions: Trust and AI Technologies; Communications Technologies; IoT Sensing and AI Technologies; and Integrated Circuits and Systems. This competence is applied across six core areas: Smart City, Financial Technologies, New-Industrialisation and Intelligent Manufacturing, Digital Health, Application Specific Integrated Circuits, and Metaverse.
