Detecting Tables with Weakly Supervised Bounding Box Extraction

Tables in Ancient Manuscripts: A Wealth of Information

Historic documents record long-term studies in a wide range of research areas. Because these documents are scarce, the information they hold is at risk of decay and irretrievable loss. To preserve and retrieve some of the most important parts of this vast body of information, we focused on detecting document pages that contain tables.

Tables are very useful to scientists because they present essential information in a condensed form. Detecting them is an object detection task, a field that has progressed rapidly with the advent of deep-learning algorithms. One of these algorithms is Faster-RCNN [1], which we combined with a Gabor filter for pre-processing [2], weakly supervised bounding box extraction [3], and pseudo-labeling to address the following challenges:

  1. Generalizing well enough to find the images that contain tables among 32 million images
  2. Detecting tables with various structures (figure 1)
  3. Training deep-learning algorithms with insufficient labeled data

Figure 1. Samples of tables in historic documents

Applying a Gabor filter

In the first step of our system design, we applied the Gabor filter to:

  1. Make the data set more compatible with the Faster-RCNN-based framework.
  2. Discriminate better between the target object (the table) and other parts of an image by exaggerating the gap, or white background, between text and tables.
  3. Remove visual noise, such as ink stains.

Figure 2 shows the preprocessed image with the Gabor filter.

Figure 2. Processed image with Gabor filter
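
The article does not state the filter parameters or the implementation used. As a minimal sketch only, the standard real-valued Gabor kernel (a Gaussian envelope multiplied by a cosine carrier) can be built and applied in plain Python as follows. The function names and all parameter values (kernel size, sigma, theta, lambd, gamma, psi) are illustrative assumptions, not the settings used in this research.

```python
import math


def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Return a size x size real Gabor kernel as a list of rows (size should be odd).

    All parameter values are illustrative assumptions, not the article's settings.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by the filter orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope times cosine carrier.
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * xr / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def filter_image(image, kernel):
    """Correlate a grayscale image (list of rows) with the kernel, zero-padding the borders."""
    height, width = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = 0.0
            for ki, kernel_row in enumerate(kernel):
                for kj, kval in enumerate(kernel_row):
                    ii, jj = i + ki - half, j + kj - half
                    if 0 <= ii < height and 0 <= jj < width:
                        acc += kval * image[ii][jj]
            out[i][j] = acc
    return out


# Toy page: a single vertical ruling line in column 4 of a 9 x 9 image.
page = [[1.0 if col == 4 else 0.0 for col in range(9)] for _ in range(9)]
response = filter_image(page, gabor_kernel(size=5, sigma=2.0, theta=0.0, lambd=4.0))
```

With theta set to 0 the carrier varies along the horizontal axis, so the filter responds to vertical strokes such as column rulings. In practice a library routine and a bank of several orientations would likely replace these hand-written loops.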

Terms and Definitions

In this research, we used two sources of scanned historic documents as follows:

  • ECCO: Eighteenth-Century Collections Online (ECCO) is an enormous collection of historic documents with over 32 million pages. Based on the timeline of collected data, ECCO is divided into ECCO1 and ECCO2.
  • NAS: This data set contains around 0.5 million scanned document images from a longer time period than ECCO (1666 to 1916).

For this binary detection task, we defined two labels:

  • Table: Presentation of important data in text or numerical format in rows and columns to summarize information in a compact manner.
  • Non-table: All scanned document images without tables, such as diagrams, illustrations, maps, and images either on a blank page or on a page with text (figure 3).

Figure 3. Samples of non-tables in historic documents

Faster-RCNN

Given our data sets and the characteristics of the Faster-RCNN algorithm, we chose it as the main object detection module in our research for the following reasons:

  1. Better performance on low-resolution images
  2. Detection of both large and small objects
  3. One of the best balances between speed and accuracy

Weakly Supervised Bounding Box Extraction

To perform well, a Faster-RCNN-based model must be trained with enough labeled data, including bounding boxes around the objects. But manually labeling data and extracting bounding boxes are costly procedures. To solve this issue, we introduced the weakly supervised bounding box extraction technique (figure 4), an automatic spiral learning approach. It consists of the following five phases:

  1. Phase 1: Train the model on tables only, which biases it toward the table label
  2. Phase 2: Test the biased model on non-table images (output: weak bounding boxes for non-tables)
  3. Phase 3: Train with two labels, i.e., tables with accurate bounding boxes and non-tables with weak bounding boxes
  4. Phase 4: Pseudo-labeling (test on unlabeled data to augment the training set)
  5. Phase 5: Retrain the model with the data added in the previous phase

Figure 4. Weakly supervised bounding box extraction
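
The five phases can be sketched as a short training loop. This is only an outline under stated assumptions: `train` and `predict` are hypothetical callables supplied by the caller (for example, wrappers around a Faster-RCNN implementation), the data layout is invented for illustration, and the `score_threshold` used to select pseudo-labels is an assumption that the article does not specify.

```python
def weakly_supervised_training(train, predict, table_data, non_table_images,
                               unlabeled_images, score_threshold=0.9):
    """Run the five phases and return (model, final_training_set).

    Hypothetical interfaces, assumed for this sketch:
      train(samples) -> model, where samples is a list of (image, boxes)
          and boxes is a list of (label, box) pairs.
      predict(model, image) -> list of (label, box, score) tuples.
    """
    # Phase 1: train on tables only, so the model is biased toward "table".
    model = train(table_data)

    # Phase 2: run the biased model on non-table images. Whatever it boxes
    # is kept as a weak bounding box and relabeled "non-table".
    weak_non_table = []
    for image in non_table_images:
        boxes = [("non-table", box) for _label, box, _score in predict(model, image)]
        if boxes:
            weak_non_table.append((image, boxes))

    # Phase 3: train with both labels.
    train_set = list(table_data) + weak_non_table
    model = train(train_set)

    # Phase 4: pseudo-labeling. Keep only confident predictions on unlabeled
    # data (the threshold is an assumption of this sketch).
    pseudo_labeled = []
    for image in unlabeled_images:
        confident = [(label, box) for label, box, score in predict(model, image)
                     if score >= score_threshold]
        if confident:
            pseudo_labeled.append((image, confident))

    # Phase 5: retrain on the augmented training set.
    final_set = train_set + pseudo_labeled
    model = train(final_set)
    return model, final_set
```

Passing the detector in as two callables keeps the loop independent of any particular framework. Phases 4 and 5 could be repeated to enlarge the training set further, which may be what the spiral in "spiral learning" refers to, although this sketch runs them once.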

Results

We compared the Faster-RCNN-based model with and without weakly supervised bounding box extraction on subsets of the ECCO (a mix of ECCO1 and ECCO2) and NAS data sets:

Table 1. Results of the Faster-RCNN-based model with and without weakly supervised bounding box extraction on the subset of the ECCO data set

Table 2. Results of the Faster-RCNN-based model with and without weakly supervised bounding box extraction on the subset of the NAS data set

To detect all images with tables, we applied our model to three different data sets, which include 32 million images in total (figure 5).

Figure 5. Results of our model
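
The article does not describe how box-level detections are turned into a page-level decision. One plausible way to flag pages with tables, shown here purely as an assumption, is to mark a page as soon as any detection labeled "table" reaches a confidence threshold. The function names, the detection tuple layout, and the threshold value are all hypothetical.

```python
def page_has_table(detections, score_threshold=0.5):
    """Return True if any detection is labeled 'table' with a high enough score.

    detections is a list of (label, box, score) tuples; this layout and the
    threshold are assumptions of the sketch.
    """
    return any(label == "table" and score >= score_threshold
               for label, _box, score in detections)


def select_table_pages(pages, score_threshold=0.5):
    """pages maps a page id to its detections; return the ids flagged as tables."""
    return [page_id for page_id, detections in pages.items()
            if page_has_table(detections, score_threshold)]


# Toy example with made-up detections for four pages.
pages = {
    "p1": [("table", (10, 10, 200, 300), 0.9)],
    "p2": [("non-table", (0, 0, 50, 50), 0.99)],
    "p3": [("table", (5, 5, 40, 40), 0.3)],
    "p4": [],
}
flagged = select_table_pages(pages)
```

Raising the threshold trades missed tables for fewer false alarms, a choice that matters when pages with tables are rare among millions of scanned images.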

Conclusion

By taking advantage of the Gabor filter and weakly supervised bounding box extraction, we prepared better input data and enough bounding boxes around the target objects for the training phase, which led to high performance at low cost. The methodology is also general and robust for detecting tables with various layouts among 32 million scanned historical document images.

The high labor cost of extracting bounding boxes and the need for reliable performance on unbalanced data sets are two common challenges in machine learning tasks. We addressed both with a spiral learning approach based on the weakly supervised bounding box extraction technique.

Additional Information

For more information on this research, please read the following research paper:

Samari, A., Piper, A., Hedley, A., Cheriet, M. (2021). Weakly Supervised Bounding Box Extraction for Unlabeled Data in Table Detection. In: Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12667. Springer, Cham. https://doi.org/10.1007/978-3-030-68787-8_25

About the authors
Arash Samari is a master's student at ÉTS.
Mohamed Cheriet is a professor in the Department of Systems Engineering at ÉTS and Director of the Synchromedia laboratory. His research focuses on eco-cloud computing, knowledge acquisition and artificial intelligence systems, learning algorithms, computer networks and intelligent collaborative work.
Andrew Piper is a professor at McGill University.
Alison Hedley is an operations assistant at Antimodular Research.