Face Recognition – The essential part of “Face ID”

When we see a person, the first thing we notice is the face. The human face plays an important role in daily life when we interact and communicate with others. Unlike other biometrics such as fingerprints, identifying a person by face can be a contactless process: we can easily acquire face images from a distance and recognize the person without any direct interaction. It is therefore natural to use the human face as the key to a Face Recognition system.

 

 

Over the last ten years, Face Recognition was a popular research area mainly within computer vision. However, with the rapid development of deep learning techniques in recent years, Face Recognition has become an AI topic, and more and more people are interested in this field. Many companies, such as Google, Microsoft and Amazon, have developed their own Face Recognition tools and applications. In late 2017, Apple also introduced the iPhone X with Face ID, a Face Recognition system intended to replace the fingerprint-scanning Touch ID for unlocking the phone.

 

What can Face Recognition be used for?

  • automated border systems for arrival and departure at the airport
  • access control systems for companies
  • criminal surveillance systems for governments
  • transaction certification for consumers
  • unlocking systems for phones or computers

 

How Does Face Recognition Work?

A Face Recognition system can be divided into three parts:

  • Face Detection : tell where the face is in the image
  • Face Representation : encode the facial features of a face image
  • Face Classification : determine which person it is

Face Detection

Face Detection locates the face in the image and finds its size. It is essentially an object-class detection problem for a single class, the human face. In classical computer vision, object detection first extracts a set of features from the image and then runs classifiers or localizers in a sliding window across the whole image to find potential bounding boxes, which is time-consuming and complex. With deep learning, object detection can be accomplished by a single neural network that goes from image pixels to bounding box coordinates and class probabilities, with the benefit of end-to-end training and real-time prediction. We use YOLO, an open-source real-time object detection system, for Face Detection in our Face Recognition pipeline.
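To get a feel for why the sliding-window approach is costly, the sketch below only enumerates the candidate boxes that a classifier would have to score. It is a minimal illustration: the function names, image size, window sizes and stride are assumptions chosen for the example, not values from our pipeline.

```python
def sliding_windows(image_w, image_h, window, stride):
    """Yield (x, y, w, h) boxes for one square window size."""
    for y in range(0, image_h - window + 1, stride):
        for x in range(0, image_w - window + 1, stride):
            yield (x, y, window, window)


def count_candidates(image_w, image_h, window_sizes, stride):
    """Count every box a sliding-window detector would have to classify."""
    return sum(
        1
        for size in window_sizes
        for _ in sliding_windows(image_w, image_h, size, stride)
    )


if __name__ == "__main__":
    # Hypothetical settings: a 640x480 frame, three window sizes, stride of 8 pixels.
    total = count_candidates(640, 480, [64, 128, 256], 8)
    print("classifier evaluations per frame:", total)
```

Even with only three window sizes, a single frame needs thousands of classifier evaluations, whereas a single-network detector predicts all boxes in one forward pass.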

 

Face Representation

When the goal is to compare two faces, computing the distance between two face images pixel by pixel is impractical because of the computing time and resources it requires. What we need to do instead is extract facial features that represent the face image.

“The distance between your eyes and ears” and “The size of your nose and mouth”….

These facial features give us an easy measurement for deciding whether two unknown faces belong to the same person. Eigenfaces and genetic algorithms were used in the past to help discover such features. With newer deep learning techniques, a deep neural network projects each face image onto a 128-dimensional unit hypersphere and generates a feature vector for each image.
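Once faces are mapped to feature vectors, comparing two faces reduces to measuring the distance between two vectors. The following is a minimal sketch of that idea in plain Python. The helper names and the 0.6 threshold are illustrative assumptions; in practice the vectors would be 128-dimensional outputs of the embedding network, and the threshold would be tuned on real data.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so that it lies on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def euclidean_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def same_person(emb_a, emb_b, threshold=0.6):
    """Treat two embeddings as the same person if they are close enough.

    The threshold is an assumed value for illustration only.
    """
    return euclidean_distance(l2_normalize(emb_a), l2_normalize(emb_b)) < threshold


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for 128-dimensional embeddings.
    print(same_person([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]))
    print(same_person([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```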

For transforming face images into Face Representations, OpenFace and DLIB are two commonly used models for generating feature vectors. We ran experiments on both models and found that the face representations from the DLIB model are more consistent across frames for the same person, and that it outperformed the OpenFace model in our accuracy test. As a result, we chose DLIB as our face representation model.

 

Each vertical slice represents the face representation of a specific person from one image frame. The x-axis is the timestamp of each video frame. These results show that the dlib model does a better job of producing consistent image-to-representation transformations for face images of the same person across frames.

 

Face Classification

By gathering the face representations of each person into a face database, we can train a classifier to tell the people apart. To stabilize the final classification results, we introduced a “weighted moving average” into our system: when determining the current classification result, we also take the results from previous frames into consideration. We found that this mechanism smooths the final classification results and performs better in our accuracy test than classification from a single image.
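The weighted moving average can be sketched as follows. This is only one possible form of the idea: the class name, the window of five frames and the exponential decay of 0.6 are assumptions for illustration, not the weights used in our system.

```python
from collections import deque


class SmoothedClassifier:
    """Smooth per-frame class probabilities with a weighted moving average."""

    def __init__(self, window=5, decay=0.6):
        self.history = deque(maxlen=window)  # most recent frames only
        self.decay = decay                   # weight multiplier per frame of age

    def update(self, probs):
        """Add the current frame's {label: probability} and return the smoothed result."""
        self.history.append(probs)
        totals = {}
        weight_sum = 0.0
        # The newest frame has weight 1; each older frame is weighted by decay ** age.
        for age, frame in enumerate(reversed(self.history)):
            weight = self.decay ** age
            weight_sum += weight
            for label, p in frame.items():
                totals[label] = totals.get(label, 0.0) + weight * p
        averaged = {label: s / weight_sum for label, s in totals.items()}
        best = max(averaged, key=averaged.get)
        return best, averaged


if __name__ == "__main__":
    smoother = SmoothedClassifier()
    for _ in range(3):
        smoother.update({"alice": 0.9, "bob": 0.1})
    # One noisy frame that, taken alone, would be classified as "bob".
    label, scores = smoother.update({"alice": 0.4, "bob": 0.6})
    print(label, scores)
```

In this toy run, the single noisy frame does not flip the decision, because the earlier frames still carry most of the weight.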

 

Feature image by Ars Electronica / CC BY


AI frontdesk – improve office security and working conditions

Imagine someone in your office who serves as a doorkeeper, takes care of visitors and even looks after your working conditions, 24/7. One of our missions at Ailabs.tw is to explore AI solutions that address society’s problems and improve people’s quality of life, and we have developed an AI-powered front desk that does all of the tasks mentioned above.

According to the 2016 annual report from Taiwan’s MOL (Ministry of Labor), the average Taiwanese employee works 2,106 hours per year. Compared with OECD statistics, this number ranks No. 3 in the world, just below Mexico and Costa Rica.

Recently, on December 4, 2017, the first review of the Labor Standards Act revision was passed. The new version of the law will allow flexible work-time arrangements and raise the monthly maximum work hours to 300. Other major changes in the amendment include conditionally allowing employees to work 12 days in a row and reducing the minimum break between shifts from 11 hours to 8 hours. The ruling party plans to finish the second and third readings of this revision early next year (2018), and it will put 9 million Taiwanese workers in a worse working environment. To shed the bad reputation of “Taiwan – The Island of Overwork”, we need a system that notifies both employee and employer when someone has been working extreme overtime, and whose attendance reports cannot easily be manipulated.

In May 2017, Luo Yufen, an employee of Pxmart, one of Taiwan’s major supermarket chains, died from prolonged overwork after seven days in a coma. However, the OSHA (Occupational Safety and Health Administration) initially found no evidence of overwork after reviewing the clocking report provided by Pxmart, which looked ‘normal’. It was not until August, when further investigation of Luo’s case was requested, that her real working hours before her death proved her overwork.


PTT Hired First AI Reporter Named Copycat (記者快抄)

In early July this year, Ailabs.tw released an AI reporter named Copycat (記者快抄) that produces news covering content from PTT, Taiwan’s largest online forum. It does its job faster than its human colleagues and produces more content, in real time.

 

 

Copycat can now automatically write about 500 news articles on popular topics every day.

The Requirements of the Media Industry Nowadays

How to attract readers’ attention to the content they produce, and how to make that content rank higher on social networks or search engines, are becoming more and more important for the media industry. To meet this goal, reporters need to produce as many articles as they can, update them quickly and search for interesting material all over the world. Copycat (記者快抄), an AI reporter, can do this job as well, generating news based on the most discussed topics from PTT, Taiwan’s largest online forum.

In the beginning this was a side project. However, we found that people were interested in the website, so we put some effort into improving it.

 

PTT, Taiwan’s biggest forum, which is non-commercial.

 

Generate News Automatically

PTT is the largest terminal-based bulletin board system (BBS) in Taiwan. It has more than 1.5 million registered users, with over 150,000 users online at peak time. This BBS is a non-commercial, open-source online platform with over 20,000 boards covering a multitude of topics, and it generates 500,000 comments every day.

Our system now fetches important articles and posts from PTT every 30 minutes, parses them and posts the results on the dashboard. Likes and Boos are also collected and displayed on each post, indicating the general public’s reactions.
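As a small illustration of the Likes and Boos step, the sketch below tallies reactions from the comment tags of one post. On PTT, comments carry a tag such as 推 (like), 噓 (boo) or → (neutral); the function name and the input format, a plain list of tags, are assumptions made for this example and say nothing about how our parser represents a post.

```python
from collections import Counter

LIKE_TAG = "推"  # PTT "push", shown as a Like
BOO_TAG = "噓"   # PTT "boo"


def count_reactions(comment_tags):
    """Count likes and boos from the list of comment tags of one post."""
    counts = Counter(comment_tags)
    return {"likes": counts[LIKE_TAG], "boos": counts[BOO_TAG]}


if __name__ == "__main__":
    # Neutral "→" comments are ignored in the tally.
    print(count_reactions(["推", "推", "噓", "→", "推"]))
```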

Three Steps to Generate News Articles

Summary

First, summarization. Based on popular posts on the PTT forum, we describe the main idea in a few sentences. The article content is broken down into sentences, and each sentence is given a score that represents how tightly it connects with the other sentences in the article. In addition, deep learning techniques such as word embeddings are used to support the algorithm.
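The sentence-scoring idea can be sketched in a few lines. This is a simplified stand-in, not Copycat's actual algorithm: it measures how connected two sentences are by word overlap (Jaccard similarity) instead of word embeddings, and it assumes space-delimited text, so Chinese posts would first need a word segmenter. All function names are hypothetical.

```python
import re


def split_sentences(text):
    """Split text after sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Lower-cased set of words in a sentence."""
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two token sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def score_sentences(sentences):
    """Score each sentence by its total similarity to all other sentences."""
    tokens = [tokenize(s) for s in sentences]
    return [
        sum(jaccard(ti, tj) for j, tj in enumerate(tokens) if j != i)
        for i, ti in enumerate(tokens)
    ]


def summarize(text, k=2):
    """Return the k best-connected sentences, in their original order."""
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    post = (
        "The typhoon closed schools in Taipei. "
        "Schools in Taipei will reopen after the typhoon. "
        "I had noodles for lunch."
    )
    print(summarize(post, k=2))
```

Sentences that share vocabulary with many others score high and are kept, while an off-topic sentence scores near zero and is dropped.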

 

AI-generated news from PTT

 

Fill-In

With a list of candidate sentences, we algorithmically pick and compile them into an article. We have collected some widely used news templates, so Copycat can mix the key sentences with these templates and turn out a typical daily news article.
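A minimal sketch of the template step, using Python's standard `string.Template`. The template wording, the field names (`board`, `title`, `likes`, `boos`) and the `fill_in` helper are all hypothetical and only show how key sentences could be merged into a fixed news frame.

```python
from string import Template

# A hypothetical news template; real templates would be collected from news articles.
NEWS_TEMPLATE = Template(
    'Users on the $board board are discussing "$title". '
    "$summary The post drew $likes likes and $boos boos."
)


def fill_in(template, post, key_sentences):
    """Merge the selected key sentences and post metadata into a template."""
    return template.substitute(
        board=post["board"],
        title=post["title"],
        summary=" ".join(key_sentences),
        likes=post["likes"],
        boos=post["boos"],
    )


if __name__ == "__main__":
    post = {"board": "Gossiping", "title": "Typhoon day off", "likes": 120, "boos": 8}
    print(fill_in(NEWS_TEMPLATE, post, ["Schools are closed.", "Trains are delayed."]))
```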

Generate

The last part is to make the news article more readable. PTT users often write posts in their own styles and formats, with unexpected line breaks and spaces, for example. This makes it hard for a machine to read and understand the content. To deal with this problem, we build a model from newspaper text and use it as a grammar corrector that teaches Copycat to write like a professional reporter.
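The learned grammar corrector is beyond the scope of a short example, but the formatting problem it faces is easy to illustrate. The sketch below is only a rule-based cleanup of stray line breaks and spaces, not the model described above; the function name is hypothetical, and joining lines with a space assumes space-delimited text (for Chinese text, lines would be joined without one).

```python
import re


def normalize_post(text):
    """Collapse stray line breaks and repeated spaces, keeping paragraph breaks."""
    text = text.replace("\r\n", "\n")
    # A blank line (possibly containing spaces) separates paragraphs.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = []
    for paragraph in paragraphs:
        # Inside a paragraph, any run of whitespace becomes a single space.
        paragraph = re.sub(r"\s+", " ", paragraph).strip()
        if paragraph:
            cleaned.append(paragraph)
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    raw = "The  price\nrose   again.\n\n\nUsers\nare upset. "
    print(normalize_post(raw))
```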

Feature Image Selection

Text alone is not enough. A news article should have images. Posts on the PTT forum often include image links, which can be a great resource. However, many posts do not have an image associated with them.

To search for an image the way a human editor does, we trained a multi-layer document retrieval RNN model as an image search engine. This engine retrieves an image by comparing the text similarity between the image’s description and the news content.

Now our AI reporter Copycat can not only copy images from the original post, but also find a related image when needed.
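The retrieval idea, matching an image's description against the news text, can be sketched without a neural network. The example below swaps the RNN model for a plain bag-of-words cosine similarity, so it shows only the overall flow; the function names, file names and descriptions are made up for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Word counts of a lower-cased text."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_image(news_text, images):
    """Return the image whose description is most similar to the news text.

    images is a list of (image_name, description) pairs.
    """
    query = bag_of_words(news_text)
    best = max(images, key=lambda item: cosine(query, bag_of_words(item[1])), default=None)
    return best[0] if best else None


if __name__ == "__main__":
    candidates = [
        ("typhoon.jpg", "Typhoon waves hit the coast of Taiwan"),
        ("noodles.jpg", "A bowl of beef noodles"),
    ]
    news = "A typhoon is approaching the east coast of Taiwan this weekend."
    print(pick_image(news, candidates))
```

A learned retrieval model plays the same role as `cosine` here, but can match descriptions and articles that share meaning without sharing exact words.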

 

The figure is auto-selected by Copycat based on text content

More to Come

The original categories on PTT and the topics extracted by Copycat are useful tags that help people find related news articles. The discussions and re-posts on the forum are potential data for showing further, differing standpoints on a topic.

After importing our face and speech recognition modules, Copycat can search video clips across the Internet for celebrities’ comments on a specific topic. This news knowledge graph can also benefit human reporters.

We believe that artificial intelligence will be a support rather than a threat, helping reporters produce higher-quality news. With the process of picking topics and generating articles online automated, reporters can move the needle on content generation and focus on creating insights or stories for readers.

Copycat is constantly improving and on its way to becoming a better reporter.

 

Featured image by filipe ferreira / CC BY

Recognize The Speech of Taiwan

We are exploring new ways for people to interact with technology in the age of AI, and speech is one of the most common and natural means of communication. In this post we introduce our core recipes for an automatic speech recognition system in Taiwan.

Cornerstone of Natural Human-Computer Interaction

Mobile phones, IoT, wearable devices and robots: our daily lives are more and more likely to be surrounded by smart devices in the future. To interact with them naturally, just as we do with human beings, we need to develop related AI techniques such as machine learning, computer vision, natural language processing and speech processing.

Speech recognition, also known as ASR, is one of the cornerstones that link all these interactions together. With deep-learning-based models and a graphical decoder, ASR is becoming more reliable in both accuracy and speed.

 

Unique Language Habits in Taiwan

New word usages, phrases and sentence structures emerge every day in modern society and across cultures. This is especially true in Taiwan, where people’s language habits are different from those of other Mandarin speakers.

For these reasons, current ASR solutions in the Mandarin-speaking space have limitations when it comes to supporting general usage in Taiwanese people’s daily lives. For example, PTT, the biggest forum and Internet community in Taiwan, invents hundreds of words and phrases every month. The newly created words may be used repeatedly or spread quickly by millions of users in online chats and posts.

Therefore, the challenge of building a localized ASR system is not only to train a local neural network model, but also to make the system update and adapt rapidly to a dynamically evolving language.

 

 

With a Taiwan-specific language model, our ASR can be much friendlier for speech-related applications in Taiwan.

 

Multi-Language Speech Recognition

Although Mandarin is the official language in Taiwan, a Mandarin-only ASR system cannot satisfy our goals. Taiwan is a place with many different cultures. In addition to Mandarin, other languages such as English, Taiwanese, Hakka and Indigenous languages are also used quite often. To deal with this problem, Ailabs.tw gathered linguistics, phonetics and machine learning experts to set up a standard process for handling cross-language requirements in ASR.

 

 

This process includes enriching the language model with multiple languages and handling mixed-language words and sentences. Our early ASR experiments on Taiwanese have worked, and we are now bringing our system up to production level.

 

ASR Applications in Ailabs.tw

Our ASR system is already powering the front-desk system at Ailabs.tw. When employees arrive at the office, they interact with the ASR system for door access and no longer need ID cards or badges.

An employee asks the ASR system for door access

Another application is generating automatic transcripts or captions. Videos of news, conferences and interviews can be converted to text files in real time using ASR.

Live captions can now be generated for news videos with ASR

Our ASR API is ready to open. Contact us if you are interested in further cooperation.

 

Looking Forward

Speed, accuracy, multi-language support and rapid updates are the core aspects of an easy-to-use ASR system. We are continuously improving these core aspects and trying different deep learning algorithms to reach the point where AI does a better job than humans in this field. If you are interested in working on this problem, please contact us. We are actively hiring!

 

featured image by Peter Coombe / CC BY