Learning features for Offline Handwritten Signature Verification

Luiz G. Hafemann, Robert Sabourin, Ph.D., Luiz S. Oliveira, Ph.D.*

* Dr. Luiz S. Oliveira is with the Department of Informatics, Federal University of Parana (UFPR), Brazil


Introduction

Signature verification systems aim to verify the identity of individuals from their handwritten signatures, relying on the recognition of a specific, well-learned gesture to identify a person.

In spite of the many advances in the field over the last few decades, building classifiers that can separate genuine signatures from skilled forgeries (forgeries made to target a particular individual) remains hard, as evidenced by the large error rates reported on large public datasets. In particular, defining discriminative feature extractors for offline signatures is a difficult task: the question "What characterizes a signature?" is hard to formalize as a feature descriptor, and most of the research effort in this field has been devoted to finding a good representation for signatures. To address both the issue of obtaining a good feature representation for signatures and that of improving classification performance, we investigate techniques to learn representations directly from signature images.


Learning Features for Offline Handwritten Signature Verification

The task of signature verification has some properties that make learning features from data very challenging. The final objective of such systems is to discriminate between genuine signatures and skilled forgeries for each user, where skilled forgeries are attempts by a forger to replicate a person's signature after having access to one or more genuine samples (often after practicing the forgery). Effectively, this can be seen as N 2-class classification problems, where N is the number of users enrolled in the system, with the property that we cannot expect skilled forgeries to be available for every user during training (alternatively, it can be seen as N 1-class classification problems, with only a few samples available to estimate the probability density of each user's signatures). Another challenge is that N is not fixed: new users may be added to the system at any time. Given these constraints, it is not straightforward to define how to learn features from signature data.

In the article Learning Features for Offline Handwritten Signature Verification using Deep Convolutional Neural Networks (article, preprint), we propose two ways of addressing the problem, using ideas from transfer learning and multi-task learning, covering both the scenario where only genuine signatures are available for training and the scenario where skilled forgeries from a subset of users are available. The key insight is to learn Writer-Independent features (i.e. features not specific to a particular set of users) with Convolutional Neural Networks (CNNs), and to subsequently train Writer-Dependent classifiers that specialize for each user.
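To make the second stage concrete, the sketch below trains a Writer-Dependent classifier on top of CNN features with scikit-learn, using the user's genuine signatures as positive samples and random forgeries (genuine signatures from other users) as negatives, as in the paper. The feature matrices here are random placeholders, and the SVM hyperparameters are illustrative, not the values tuned in the paper.

    import numpy as np
    from sklearn.svm import SVC

    # Hypothetical feature matrices; in practice these are SigNet feature
    # vectors (2048-d) extracted from preprocessed signature images.
    rng = np.random.default_rng(0)
    genuine = rng.normal(size=(12, 2048))     # genuine signatures of one user
    negatives = rng.normal(size=(200, 2048))  # random forgeries: genuines from other users

    X = np.vstack([genuine, negatives])
    y = np.concatenate([np.ones(len(genuine)), -np.ones(len(negatives))])

    # Writer-Dependent SVM with an RBF kernel. Balanced class weights
    # compensate for the skewed positive/negative ratio; the
    # hyperparameters are placeholders, not the paper's values.
    clf = SVC(kernel='rbf', C=1.0, gamma='scale', class_weight='balanced')
    clf.fit(X, y)

    # A query signature is accepted as genuine if its decision value
    # exceeds a (global or user-specific) threshold.
    query = rng.normal(size=(1, 2048))
    print('decision value:', clf.decision_function(query)[0])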

We conducted extensive experiments on four signature verification datasets, showing that features learned on a subset of users indeed generalize to other users (including users in other datasets). Classifiers trained with the learned feature representations achieved state-of-the-art performance on all four datasets. To illustrate how the features generalize to new users, consider the figures below, which give an overall sense of how genuine signatures and skilled forgeries are dispersed in a given feature space. We used the trained models to extract features from a validation set (a disjoint set of users), and applied a dimensionality reduction algorithm (t-SNE) to project the samples onto two dimensions, so that points that are close in the 2D projection are also close in the high-dimensional feature space. With a baseline (a CNN trained on ImageNet), signatures from different users already cluster in different parts of the feature space, but skilled forgeries lie very close to the genuine signatures of each user. With representations learned from signatures (as proposed in the paper), we see a better separation between genuine signatures and skilled forgeries.


Fig 1: 2D projections (using t-SNE) of signatures from the validation set. Each point represents a signature: blue points are genuine signatures and orange points are skilled forgeries. (a) Baseline: a model trained for object recognition used as the feature extractor; (b) Genuine only: a model trained with genuine signatures; (c) Genuine + Forgeries: a model trained with genuine signatures and skilled forgeries (from a subset of users).
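As a rough sketch of how such projections can be produced, the snippet below embeds feature vectors in 2D with scikit-learn's t-SNE implementation; the feature array is a random placeholder, and the perplexity value is illustrative.

    import numpy as np
    from sklearn.manifold import TSNE
    import matplotlib.pyplot as plt

    # Hypothetical inputs: 'features' holds CNN feature vectors for the
    # validation-set signatures; 'is_forgery' flags skilled forgeries.
    rng = np.random.default_rng(0)
    features = rng.normal(size=(600, 2048))
    is_forgery = rng.integers(0, 2, size=600).astype(bool)

    # Project the high-dimensional vectors onto 2D; nearby points in the
    # projection are nearby in the original feature space.
    embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)

    plt.scatter(*embedded[~is_forgery].T, s=5, label='genuine')
    plt.scatter(*embedded[is_forgery].T, s=5, label='skilled forgery')
    plt.legend()
    plt.show()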

More details can be found in the paper: DOI 10.1016/j.patcog.2017.05.012

Handling signatures of varying size

The methods presented in the paper above require all signatures to have the same size, which is achieved, for instance, by centering each signature in a canvas of fixed size. This creates a problem when generalizing to users whose signatures are larger than this maximum size: simply resizing the images to a smaller size changes the signal in ways not found in regular signatures (signatures are scanned at a known DPI, so invariance to scale is not learned).
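For reference, a minimal sketch of this fixed-canvas preprocessing is shown below. The 952 x 1360 canvas size follows the setting used with GPDS in our published code; the preprocessing described in the paper includes further steps (e.g. noise removal) omitted here.

    import numpy as np

    def center_in_canvas(signature, canvas_size=(952, 1360)):
        # Center a grayscale signature on a fixed-size white canvas,
        # without rescaling, so every input ends up with the same shape.
        h, w = signature.shape
        H, W = canvas_size
        if h > H or w > W:
            raise ValueError('signature is larger than the canvas')
        top, left = (H - h) // 2, (W - w) // 2
        canvas = np.full(canvas_size, 255, dtype=np.uint8)  # white background
        canvas[top:top + h, left:left + w] = signature
        return canvas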

We address this issue in the paper Fixed-sized representation learning from Offline Handwritten Signatures of different sizes (article, preprint), changing the network architecture to learn a fixed-sized representation regardless of the input size, using Spatial Pyramid Pooling (SPP). In this paper, we also investigated the impact of the image resolution on classification performance, and the impact of fine-tuning the representations for different operating conditions.
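The core idea of Spatial Pyramid Pooling can be sketched in a few lines. Below is a PyTorch illustration; the pyramid levels are illustrative and not necessarily those used in SigNet-SPP.

    import torch
    import torch.nn.functional as F

    def spatial_pyramid_pool(feature_maps, levels=(1, 2, 4)):
        # Max-pool a (batch, channels, H, W) tensor on 1x1, 2x2 and 4x4
        # grids and concatenate the results, yielding a vector whose
        # length does not depend on H and W.
        batch = feature_maps.size(0)
        pooled = [F.adaptive_max_pool2d(feature_maps, output_size=n).view(batch, -1)
                  for n in levels]
        return torch.cat(pooled, dim=1)

    # Two inputs with different spatial sizes map to same-length vectors:
    a = spatial_pyramid_pool(torch.randn(1, 256, 17, 25))
    b = spatial_pyramid_pool(torch.randn(1, 256, 30, 40))
    assert a.shape == b.shape  # (1, 256 * (1 + 4 + 16))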

Code and trained models

We are sharing the trained models so that other researchers can use them as specialized feature extractors for offline handwritten signatures. The code for using the trained models is available at https://github.com/luizgh/sigver_wiwd, and the trained weights can be downloaded from the following links: SigNet models, SigNet-SPP models.
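A rough sketch of feature extraction with a downloaded model is shown below. It follows the example usage from the sigver_wiwd README as we recall it; the module and function names (preprocess_signature, signet, CNNModel, get_feature_vector) and the weights path are assumptions to be checked against the repository (the original example used scipy.misc.imread, which has since been removed from SciPy).

    # Sketch only: names below follow the sigver_wiwd README and should
    # be verified against the current repository.
    from skimage.io import imread
    from skimage import img_as_ubyte
    from preprocess.normalize import preprocess_signature
    import signet
    from cnn_model import CNNModel

    canvas_size = (952, 1360)  # canvas size used in the repository example

    # Load as 0-255 grayscale (matching the old scipy.misc.imread behavior).
    original = img_as_ubyte(imread('some_signature.png', as_gray=True))  # hypothetical file
    processed = preprocess_signature(original, canvas_size)

    model = CNNModel(signet, 'models/signet.pkl')  # path to the downloaded weights
    feature_vector = model.get_feature_vector(processed)
    print(feature_vector.shape)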
 

Download extracted features

To facilitate further research, we are also making available the features extracted from each of the four datasets used in this work (GPDS, MCYT, CEDAR, Brazilian PUC-PR), using the models SigNet, SigNet-F (with lambda = 0.95) and SigNet-SPP-300dpi:

Dataset           | SigNet           | SigNet-F           | SigNet-SPP-300dpi
------------------|------------------|--------------------|---------------------------
GPDS              | GPDS_signet      | GPDS_signet_f      | GPDS_signetspp_300dpi
MCYT              | MCYT_signet      | MCYT_signet_f      | MCYT_signetspp_300dpi
CEDAR             | CEDAR_signet     | CEDAR_signet_f     | CEDAR_signetspp_300dpi
Brazilian PUC-PR* | brazilian_signet | brazilian_signet_f | Brazilian_signetspp_300dpi

To use these extracted features, please follow the instructions at this link.

References on Offline Signature Verification

L. G. Hafemann, R. Sabourin, L. S. Oliveira, Fixed-sized representation learning from offline handwritten signatures of different sizes. doi:10.1007/s10032-018-0301-6

L. G. Hafemann, R. Sabourin, L. S. Oliveira, Learning features for offline handwritten signature verification using deep convolutional neural networks. doi:10.1016/j.patcog.2017.05.012

D. Impedovo, G. Pirlo, Automatic signature verification: The state of the art. doi:10.1109/TSMCC.2008.923866

L. G. Hafemann, R. Sabourin, L. S. Oliveira, Offline handwritten signature verification - literature review. arXiv:1507.07909

M. B. Yilmaz, B. Yanikoglu, Score level fusion of classifiers in off-line signature verification. doi:10.1016/j.inffus.2016.02.003

G. Eskander, R. Sabourin, E. Granger, Hybrid writer-independent-writer-dependent offline signature verification system. doi:10.1049/iet-bmt.2013.0024

D. Bertolini, L. S. Oliveira, E. Justino, R. Sabourin, Reducing forgeries in writer-independent off-line signature verification through ensemble of classifiers. doi:10.1016/j.patcog.2009.05.009

M. A. Ferrer, F. Vargas, A. Morales, A. Ordoñez, Robustness of off-line signature verification based on gray level features. doi:10.1109/TIFS.2012.2190281


If using any of the four datasets mentioned above, please cite the paper that introduced the dataset:

GPDS: Vargas, J.F., M.A. Ferrer, C.M. Travieso, and J.B. Alonso. 2007. Off-Line Handwritten Signature GPDS-960 Corpus. doi:10.1109/ICDAR.2007.4377018.

MCYT: Ortega-Garcia, Javier, J. Fierrez-Aguilar, D. Simon, J. Gonzalez, M. Faundez-Zanuy, V. Espinosa, A. Satue, et al. 2003. MCYT Baseline Corpus: A Bimodal Biometric Database. IEE Proceedings-Vision, Image and Signal Processing 150 (6): 395–401. doi:10.1049/ip-vis:20031078

CEDAR: Kalera, Meenakshi K., Sargur Srihari, and Aihua Xu. 2004. “Offline Signature Verification and Identification Using Distance Statistics.” International Journal of Pattern Recognition and Artificial Intelligence 18 (7): 1339–60. doi:10.1142/S0218001404003630.

Brazilian PUC-PR: Freitas, C., M. Morita, L. Oliveira, E. Justino, A. Yacoubi, E. Lethelier, F. Bortolozzi, and R. Sabourin. 2000. “Bases de Dados de Cheques Bancarios Brasileiros.” In XXVI Conferencia Latinoamericana de Informatica.


Acknowledgements

We would like to thank Dr. M. Ferrer for sharing the GPDS-960 dataset used in this work, as well as the other research groups that shared the MCYT, CEDAR and Brazilian PUC-PR datasets for scientific research. This project is financially supported by CNPq and NSERC.

