Deploying Forward-Looking Accuracy with Facial Recognition Systems 

Facial recognition, a method for mapping and identifying people's faces in photographs or videos, is powered by artificial intelligence. With rapid improvements in computing power and digital imagery, the technology now puts biometric data to use in many ways: unlocking mobile devices for self-representation, addressing fraud and security issues in the public sphere, and supporting efficient management and sales revenue growth in the private sector. Its entanglements are wide-ranging.
 
What remains challenging is that facial expressions are highly distinctive. They can be encoded and mapped with the Facial Action Coding System (FACS), a taxonomy of facial action units (FAU), but there are still limitations in capturing a wide range of expressions.
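
To make the FACS encoding concrete, here is a minimal sketch of how an expression can be written as a multi-hot vector over action units. The action unit numbers and names come from FACS, but the six-unit subset, the function names, and the choice of AU6 plus AU12 as a stand-in for a smile are illustrative assumptions, not the researchers' setup.

```python
# Illustrative subset of FACS action units (number -> descriptive name).
# Real datasets annotate a different, larger set; this subset is an assumption.
AU_NAMES = {
    1: "inner brow raiser",
    2: "outer brow raiser",
    4: "brow lowerer",
    6: "cheek raiser",
    12: "lip corner puller",
    15: "lip corner depressor",
}
AU_ORDER = sorted(AU_NAMES)  # fixed column order: [1, 2, 4, 6, 12, 15]


def encode_action_units(active):
    """Return a multi-hot list with 1 wherever an action unit is active."""
    active = set(active)
    unknown = active - set(AU_ORDER)
    if unknown:
        raise ValueError(f"unknown action units: {sorted(unknown)}")
    return [1 if au in active else 0 for au in AU_ORDER]


def decode_action_units(vector):
    """Return the names of the action units flagged in a multi-hot list."""
    if len(vector) != len(AU_ORDER):
        raise ValueError("vector length does not match the action unit list")
    return [AU_NAMES[au] for au, flag in zip(AU_ORDER, vector) if flag]


# A smile is commonly described as cheek raiser (AU6) plus lip corner puller (AU12).
smile = encode_action_units({6, 12})
print(smile)                       # [0, 0, 0, 1, 1, 0]
print(decode_action_units(smile))  # ['cheek raiser', 'lip corner puller']
```

Because several action units can be active in the same frame, detecting them is a multi-label problem rather than a choice of one class among many.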
 
At CVPR, the annual Conference on Computer Vision and Pattern Recognition, Bjorn Stenger and Geethu Jacob discussed their recent experiments and methods for accurately capturing the subtleties of facial changes and the spatial regions of FAU. In their accepted conference paper, "Facial Action Unit Detection With Transformers," they propose a new architecture with two main parts: an attention branch network, a branch structure that estimates attention in order to focus on the spatial regions of action units, and a module that uses a transformer encoder to capture the relationships between different FAU. This is combined with a Tversky loss function and a contrastive loss term, which improve the flexibility and performance of the model when detecting multiple labels at once.
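
As a rough illustration of the Tversky loss mentioned above, the sketch below computes it for one action unit from predicted probabilities and binary labels. The function name, the default weights `alpha` and `beta`, and the smoothing constant `eps` are assumptions made for this example; the values used in the paper are not given here.

```python
def tversky_loss(probs, targets, alpha=0.5, beta=0.5, eps=1e-7):
    """Tversky loss for one label: 1 - TP / (TP + alpha*FP + beta*FN).

    probs   : predicted probabilities in [0, 1]
    targets : ground-truth labels, each 0 or 1
    alpha   : weight on false positives (assumed default)
    beta    : weight on false negatives (assumed default)
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    # Soft counts, so the loss stays smooth in the predicted probabilities.
    tp = sum(p * t for p, t in zip(probs, targets))
    fp = sum(p * (1 - t) for p, t in zip(probs, targets))
    fn = sum((1 - p) * t for p, t in zip(probs, targets))
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - index


targets = [1, 1, 0, 0]
missed = [1.0, 0.0, 0.0, 0.0]  # one active frame is missed

# Raising beta makes the same missed detection cost more.
print(tversky_loss(missed, targets, alpha=0.3, beta=0.7))
print(tversky_loss(missed, targets, alpha=0.7, beta=0.3))
```

With `alpha` equal to `beta` at 0.5 the index reduces to the Dice coefficient. Shifting weight toward `beta` penalizes missed detections more heavily, which is one reason this family of losses is used when active labels are rare.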
 
Although more experiments are warranted, the researchers' results so far suggest that the proposed method outperforms prior work on action unit detection on two public datasets, BP4D and DISFA. On both datasets, measured by F1 score, their method is at least 2% better than other state-of-the-art methodologies with significantly more attention maps and supervision.
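
For readers unfamiliar with the metric, the following sketch shows how an F1 score can be computed for one action unit and then averaged across action units. Averaging per-unit scores is a common convention and is assumed here; the function names and the toy labels are made up for illustration and are not taken from BP4D or DISFA.

```python
def f1_score(y_true, y_pred):
    """F1 for one binary label: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # If the label never occurs and is never predicted, report 0.0.
    return 2 * tp / denom if denom else 0.0


def mean_f1(true_rows, pred_rows):
    """Average the per-action-unit F1 over all columns.

    Each row is one frame; each column is one action unit.
    """
    n_units = len(true_rows[0])
    scores = [
        f1_score([row[j] for row in true_rows], [row[j] for row in pred_rows])
        for j in range(n_units)
    ]
    return sum(scores) / n_units


# Toy example: four frames, two action units.
true_rows = [[1, 0], [1, 1], [0, 1], [0, 0]]
pred_rows = [[1, 0], [0, 1], [1, 1], [0, 0]]
print(mean_f1(true_rows, pred_rows))  # 0.75 (per-unit scores 0.5 and 1.0)
```

F1 balances precision and recall, so it is less flattering than plain accuracy when an action unit is active in only a small share of frames.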
 
While facial recognition technology is not yet invincible, Bjorn and Geethu are taking incremental but important steps toward improving accuracy, creating better accountability, and pushing confidence thresholds for more widespread deployment. At Rakuten, several models often do the same work on different data. Designing a single unified deep learning model that performs about as well as the individual models is an area they will continue to research. They will also continue to explore multi-task learning from black-box state-of-the-art deep learning models and multi-label classification on long-tailed datasets, where data imbalances frequently occur, and to review the current literature for ways it can be useful for Rakuten data moving forward.