In the evolution of identity authentication technology, balancing security and convenience has always been a core challenge for the industry. A single biometric modality can meet the needs of specific scenarios, but it struggles to cover every application context: iris recognition excels in high-security scenarios, facial recognition is more convenient, and fingerprint recognition rests on long-established user habits. This practical demand has driven us to extend our R&D beyond iris recognition into the broader multi-modal domain.

Through systematic research and development, Homsh has largely completed a unified framework for multi-modal biometric recognition. The core design idea is to use our iris segmentation and encoding model, refined over many years, as the base architecture, and to bring in the two new modalities (facial and fingerprint recognition) through transfer learning that reuses the underlying feature extraction layer. This route was not chosen by accident: the image segmentation algorithms, feature encoding methods, and matching logic we have built up for iris recognition are structurally very similar to the processing workflows of other biometric traits. Based on this insight, the unified framework transfers knowledge across modalities instead of building three independent systems from scratch.
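
The structure described above, one reused feature extraction stage feeding small modality-specific encoders, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names, the mean-centring "extractor", and the thresholds are hypothetical stand-ins, not the actual models, which would be trained networks.

```python
from typing import Callable, Dict, List

Vector = List[float]


def shared_feature_extractor(pixels: Vector) -> Vector:
    """Toy stand-in for the reused feature extraction layer.

    It only mean-centres the input; a real layer would be a trained network.
    """
    mean = sum(pixels) / len(pixels)
    return [p - mean for p in pixels]


def make_head(threshold: float) -> Callable[[Vector], List[int]]:
    """Build a modality-specific encoder that binarises the shared features."""
    def head(features: Vector) -> List[int]:
        return [1 if f > threshold else 0 for f in features]
    return head


# One shared extractor, three small modality-specific heads (all hypothetical).
HEADS: Dict[str, Callable[[Vector], List[int]]] = {
    "iris": make_head(0.0),
    "face": make_head(0.1),
    "fingerprint": make_head(-0.1),
}


def encode(modality: str, pixels: Vector) -> List[int]:
    """Run the shared extractor, then the head for the requested modality."""
    features = shared_feature_extractor(pixels)
    return HEADS[modality](features)
```

Adding a modality in this sketch means registering one more head in `HEADS`; the shared extractor is left untouched, which is the property the unified framework relies on.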

In performance verification, the facial recognition module achieved an equal error rate (EER) of 0.4% on the internationally used CASIA Face V5 evaluation dataset, and the fingerprint recognition module reached an EER of 0.35% on the standard dataset of the FVC2004 fingerprint verification competition. These two figures are not our end goal; they are interim results that verify the feasibility of the framework. Compared with dedicated algorithms deeply optimized for a single modality, the unified framework may not be the best performer in each individual modality, but its core value lies in laying the technical foundation for subsequent fusion recognition and flexible deployment.
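
For readers less familiar with the metric, the EER is the operating point at which the false accept rate (FAR) equals the false reject rate (FRR). The sketch below shows one simple way to approximate it from lists of match scores. It assumes higher scores mean a better match; the function names and the sample scores are made up for illustration and are unrelated to the results reported above.

```python
from typing import List, Tuple


def error_rates(genuine: List[float], impostor: List[float],
                threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) when scores >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine: List[float], impostor: List[float]) -> float:
    """Approximate the EER by trying every observed score as a threshold.

    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    best_gap, best_eer = float("inf"), 1.0
    for threshold in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, threshold)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Made-up similarity scores, for illustration only.
genuine_scores = [0.91, 0.85, 0.78, 0.66, 0.95]
impostor_scores = [0.12, 0.30, 0.45, 0.70, 0.22]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

With real evaluation sets the score lists are far larger, and the threshold scan is usually replaced by interpolation along the ROC curve, but the definition is the same.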

Currently, three-modal fusion recognition has completed preliminary verification. Fusion recognition means that the system combines three kinds of biometric information (iris, face, and fingerprint) to determine identity, which in theory significantly improves recognition reliability and resistance to attacks. This capability has direct application value in high-security scenarios such as financial payment, border clearance, and access to critical infrastructure.
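
One common way to combine several modalities is score-level fusion, where each matcher outputs a score and the scores are merged before the accept/reject decision. The sketch below assumes that approach purely for illustration; the weights, the threshold, and the function names are hypothetical and do not describe the fusion method used in the framework.

```python
from typing import Dict

# Hypothetical weights; how the modalities are actually weighted is not stated.
WEIGHTS: Dict[str, float] = {"iris": 0.5, "face": 0.2, "fingerprint": 0.3}


def fuse_scores(scores: Dict[str, float],
                weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted average of per-modality match scores in [0, 1].

    Only modalities present in `scores` contribute, and the weights are
    renormalised so a missing modality does not deflate the result.
    """
    if not scores:
        raise ValueError("at least one modality score is required")
    for name, score in scores.items():
        if name not in weights:
            raise KeyError(f"unknown modality: {name}")
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name} must be in [0, 1]")
    total_weight = sum(weights[name] for name in scores)
    return sum(weights[name] * s for name, s in scores.items()) / total_weight


def accept(scores: Dict[str, float], threshold: float = 0.8) -> bool:
    """Accept the identity claim when the fused score reaches the threshold."""
    return fuse_scores(scores) >= threshold
```

For example, `accept({"iris": 0.9, "face": 0.6, "fingerprint": 0.8})` fuses the three scores into roughly 0.81 and accepts, whereas a weak iris score would pull the fused result below the threshold even if the face score were high.
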

In subsequent R&D, we will focus on two technical directions.

First, we will systematically compare two technical routes: independent channel decoding for each modality, and joint training of a shared base network. The goal is to find the best balance between recognition performance and model size, which is crucial for deploying the algorithm on resource-constrained edge devices.

Second, we will promptly complete the engineering of free switching between recognition modes, so that end users can choose the recognition method that matches the security level of the actual application scenario. For example, iris or fingerprint recognition can be enabled in high-security scenarios, facial recognition can be used in everyday convenience scenarios, and multi-modal fusion suits special situations that require the highest level of assurance.

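
The mode-switching idea in the second direction can be sketched as a small policy table. The mapping follows the example given in the text; the enum, the table, and the function names are hypothetical, since the switching function is still under development.

```python
from enum import Enum
from typing import Dict, Tuple


class SecurityLevel(Enum):
    CONVENIENCE = 1  # everyday, low-friction use
    HIGH = 2         # high-security scenarios
    HIGHEST = 3      # situations that need the highest level of assurance


# Mapping taken from the example in the text; names are illustrative only.
MODES: Dict[SecurityLevel, Tuple[str, ...]] = {
    SecurityLevel.CONVENIENCE: ("face",),
    SecurityLevel.HIGH: ("iris", "fingerprint"),
    SecurityLevel.HIGHEST: ("iris", "face", "fingerprint"),
}


def select_modalities(level: SecurityLevel) -> Tuple[str, ...]:
    """Return the modalities that may be used at the given security level."""
    return MODES[level]


def requires_fusion(level: SecurityLevel) -> bool:
    """Only the highest level demands all listed modalities together.

    At the HIGH level either iris or fingerprint alone is sufficient.
    """
    return level is SecurityLevel.HIGHEST
```

Keeping the policy in a single table means a deployment could change which modalities serve which level without touching the recognition code itself.
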
From a longer-term perspective, building the multi-modal unified framework is a key step in Homsh's evolution from a specialized iris recognition manufacturer into a comprehensive biometric recognition solution provider. In application fields such as smart wearables, VR/AR terminals, smart security, and financial identity authentication, multi-modal fusion recognition will become one of the basic capabilities of next-generation human-computer interaction systems. We look forward to this work bringing a safer and more flexible identity authentication experience to partners and end users in the near future.