Detection and Analysis of Human Mental State Based on Multimodal Information
DOI: https://doi.org/10.54097/997zeb80

Keywords: Human mental state, multimodal information, framework

Abstract
With the continuous evolution and refinement of intelligent sensing technologies, applying multimodal information to the detection of human mental states has become a key direction in the advancement of intelligent technologies, surpassing traditional unimodal detection methods. Multimodal detection technology, which integrates multiple information sources that reflect human mental states, has been widely applied in fields such as medical care and interrogation. However, existing detection methods generally suffer from low precision, and detection equipment is susceptible to environmental influences. Against this background, this paper explores a framework for detecting and analyzing human mental states based on multimodal information. The framework integrates multi-source data, including non-contact sensing, contact-based physiological signal acquisition, and newly constructed datasets, and combines feature-level, decision-level, and end-to-end multimodal fusion methods to achieve deep synergy among cross-modal features. It enables the coordinated use of multiple human-signal sensing technologies, establishing a comprehensive data-analysis framework for multimodal feature fusion. Rational use of this framework can significantly improve the robustness of detection. This study proposes innovative improvements for application scenarios such as medical monitoring and judicial interrogation.
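To illustrate the distinction the abstract draws between feature-level and decision-level fusion, the sketch below contrasts the two in minimal form. It is not the paper's implementation: the modality names, feature dimensions, scores, and weights are all hypothetical placeholders chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality feature vectors (e.g. facial, vocal, physiological);
# the names and dimensions are illustrative only, not taken from the paper.
facial = rng.normal(size=8)
vocal = rng.normal(size=6)
physio = rng.normal(size=4)

# Feature-level (early) fusion: concatenate modality features into one vector
# that a single downstream classifier would consume.
early_features = np.concatenate([facial, vocal, physio])

# Decision-level (late) fusion: each modality produces its own score, and the
# scores are combined afterwards, here by a weighted average.
scores = {"facial": 0.7, "vocal": 0.55, "physio": 0.6}   # hypothetical per-modality probabilities
weights = {"facial": 0.5, "vocal": 0.3, "physio": 0.2}
fused_score = sum(weights[m] * scores[m] for m in scores)

print(early_features.shape)   # (18,)
print(round(fused_score, 3))  # 0.635
```

End-to-end fusion, the third strategy the abstract names, would instead learn the combination jointly inside one model rather than fixing a concatenation or weighting rule by hand.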
Copyright (c) 2025 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.