Replacing Faces in a Video Stream


Description

We evaluated several approaches to the problem. The main task is to find faces and their anchor points in each video frame, so we tried a range of face detection methods (DeepFace, Keypoint R-CNN, HRNet, Dlib, OpenPose, OpenCV, MediaPipe, and others). All of them gave inaccurate results: they are designed for video in which the face is captured at a good resolution, whereas in our video the face occupies only a very small part of the frame. We also considered pose detection methods, and these failed as well. In the picture, you can see that the system reports a face where there is none.

Features

Inanimate object face replacement

Unlike human faces, animals and inanimate objects do not share a common geometric structure or set of landmarks. Placing a human face on them requires a dynamic approach that can adjust to various shapes and structures. In addition, some objects and animals move unpredictably, which complicates tracking the target and anchoring the human face onto it.

The Onix team implemented a mapping algorithm that does not rely solely on traditional facial landmarks but also uses the broader contours and shape of the target entity, whether it is an animal's face or an object. We also used advanced tracking algorithms in OpenCV to keep the human face consistently placed on the moving object.
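To make the contour-based idea concrete, here is a minimal sketch of how a face image could be fitted into a target's outline and kept steady between frames. It is an illustration under stated assumptions, not the Onix algorithm: the function names, the `fill` margin, and the smoothing factor `alpha` are all hypothetical, and the contour is assumed to arrive as a list of (x, y) points (for example from OpenCV's `cv2.findContours`).

```python
def bounding_box(contour):
    """Axis-aligned box (x, y, width, height) around a list of (x, y) points."""
    xs = [p[0] for p in contour]
    ys = [p[1] for p in contour]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


def fit_face_to_contour(contour, face_w, face_h, fill=0.8):
    """Place a face image inside the contour's bounding box.

    The face keeps its aspect ratio and is centered in the box.
    `fill` (hypothetical parameter) is the fraction of the box to occupy.
    Returns (left, top, width, height) in frame pixels.
    """
    x, y, w, h = bounding_box(contour)
    # Use the tighter of the two axes so the face never spills out of the box.
    scale = fill * min(w / face_w, h / face_h)
    new_w, new_h = face_w * scale, face_h * scale
    left = x + (w - new_w) / 2
    top = y + (h - new_h) / 2
    return left, top, new_w, new_h


def smooth_box(previous, current, alpha=0.5):
    """Exponentially smooth a box between frames to reduce jitter.

    `alpha` (hypothetical parameter) is the weight of the current frame:
    1.0 follows the target immediately, smaller values move more gradually.
    """
    return tuple(alpha * c + (1 - alpha) * p for p, c in zip(previous, current))
```

In use, `fit_face_to_contour` would run on each frame's contour, and `smooth_box` would blend the result with the previous frame's placement so that an unpredictably moving target does not make the overlay jump.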

Accessory augmentation

As users move or change expressions, accessories should adjust accordingly. A hat, for instance, should remain on the head, not float above it if a user raises their eyebrows. Users also have different face sizes and shapes, so the accessories must fit proportionally, looking neither too large nor too small.

We used facial landmarks to create "anchor points" for accessories. For instance, glasses anchor to the bridge of the nose and the ears, ensuring they move with the face realistically. We also developed an algorithm that adjusts accessory size based on detected face dimensions: the distance between the eyes, for example, can determine the size of glasses.
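The sizing rule described above can be sketched in a few lines. This is a hedged illustration rather than the project's actual algorithm: the `Placement` type, the `fit_glasses` function, and the default `width_to_eye_ratio` of 2.0 are assumptions made for the example, and the two eye landmarks are assumed to be given in frame pixel coordinates.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Placement:
    center: Tuple[float, float]  # where the accessory's center goes, in frame pixels
    scale: float                 # factor to apply to the accessory image size
    angle_deg: float             # in-plane rotation of the line between the eyes


def fit_glasses(left_eye, right_eye, accessory_width, width_to_eye_ratio=2.0):
    """Derive position, size, and tilt for glasses from two eye landmarks.

    `left_eye` and `right_eye` are (x, y) points as they appear in the image,
    left to right. `width_to_eye_ratio` is a hypothetical tuning value:
    how wide the glasses should be relative to the distance between the eyes.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    eye_distance = math.hypot(dx, dy)
    if eye_distance == 0:
        raise ValueError("eye landmarks coincide; cannot size the accessory")

    # Larger faces give a larger eye distance, so the accessory scales with it.
    scale = (eye_distance * width_to_eye_ratio) / accessory_width
    center = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
    # Image y grows downward, so a positive angle means the right-hand eye is lower.
    angle_deg = math.degrees(math.atan2(dy, dx))
    return Placement(center, scale, angle_deg)
```

Because every value is recomputed from the landmarks on each frame, the accessory follows the face as it moves, tilts, or comes closer to the camera.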

Performance efficiency

To maintain an enjoyable user experience, the solution needed to achieve real-time performance with minimal processing delay. Any noticeable lag would detract from the humor and user engagement.

Our specialists used RetinaFace, a neural network that demonstrated exceptional real-time face detection and alignment performance. It consistently detected faces within video frames in 10-20 milliseconds, regardless of variations in scale or orientation. This made it ideal for scenarios where quick face detection and replacement were essential.
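To put the 10-20 millisecond figure in context, the sketch below computes how much of each frame's time budget remains after detection and shows one way to measure a detector's average latency. The helper names are hypothetical, the frame rate is an assumption (the source does not state one), and `detect` stands in for any callable that processes a single frame.

```python
import time


def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def remaining_budget_ms(fps, detection_ms):
    """Time left for replacement and rendering once detection has run."""
    return frame_budget_ms(fps) - detection_ms


def mean_latency_ms(detect, frames):
    """Average wall-clock time of detect(frame) over a list of frames."""
    if not frames:
        raise ValueError("need at least one frame to measure latency")
    start = time.perf_counter()
    for frame in frames:
        detect(frame)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / len(frames)
```

At an assumed 30 frames per second, each frame has a budget of about 33 ms, so detection at 10-20 ms would leave roughly 13-23 ms for face replacement and rendering.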
