Method and Analysis of Violent Behavior Recognition Based on Multimodal Information Fusion

Authors

  • Jiale Chen

DOI:

https://doi.org/10.54097/ve0k3231

Keywords:

Violent behavior recognition, Multimodal fusion, Feature extraction network, Fusion strategy.

Abstract

This article provides a systematic review and analysis of violent behavior recognition methods based on multimodal information fusion. The review is organized along two coupled themes: feature extraction networks and multimodal fusion strategies. Within a unified experimental framework, the article compares 3D CNN, ConvLSTM, GhostNet, and other networks in terms of parameter count, computational efficiency, and detection accuracy, and analyzes the performance of early fusion, late fusion, and attention fusion strategies on public datasets such as RWF-2000. By quantitatively comparing the interaction between feature extraction networks and fusion strategies, this study aims to fill a gap in current reviews, which offer little engineering implementation guidance. Existing reviews tend to focus on a single technological path, either listing network structures or comparing fusion strategies in isolation, and lack a systematic synthesis of the network-fusion coupling effect within a single framework; this makes it difficult for researchers to make quick deployment decisions under computing power, real-time, and privacy constraints. By analyzing several representative works within a unified frame of reference, this article offers guidance for choosing appropriate feature extraction networks and fusion strategies in practical applications.
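As a concrete illustration of the fusion strategies compared in the review, the following minimal PyTorch sketch (not taken from the paper; the 512-dimensional RGB and optical-flow feature vectors, the two-class output, and all class names are illustrative assumptions) contrasts early fusion, late fusion, and a simple gated attention fusion applied to per-clip features produced by backbones such as a 3D CNN or ConvLSTM.

# Minimal sketch (assumptions noted above): three ways to fuse RGB and
# optical-flow clip features that have already been encoded into fixed-size
# vectors by separate backbone networks.
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Concatenate modality features, then classify the joint vector."""
    def __init__(self, rgb_dim=512, flow_dim=512, num_classes=2):
        super().__init__()
        self.classifier = nn.Linear(rgb_dim + flow_dim, num_classes)

    def forward(self, rgb_feat, flow_feat):
        return self.classifier(torch.cat([rgb_feat, flow_feat], dim=-1))

class LateFusion(nn.Module):
    """Classify each modality separately, then average the logits."""
    def __init__(self, rgb_dim=512, flow_dim=512, num_classes=2):
        super().__init__()
        self.rgb_head = nn.Linear(rgb_dim, num_classes)
        self.flow_head = nn.Linear(flow_dim, num_classes)

    def forward(self, rgb_feat, flow_feat):
        return 0.5 * (self.rgb_head(rgb_feat) + self.flow_head(flow_feat))

class AttentionFusion(nn.Module):
    """Weight each modality with a learned gate, then classify the mixture."""
    def __init__(self, dim=512, num_classes=2):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, rgb_feat, flow_feat):
        w = self.gate(torch.cat([rgb_feat, flow_feat], dim=-1))  # (batch, 2)
        fused = w[:, :1] * rgb_feat + w[:, 1:] * flow_feat
        return self.classifier(fused)

# Example: a batch of 4 clips with 512-d features per modality.
rgb, flow = torch.randn(4, 512), torch.randn(4, 512)
for model in (EarlyFusion(), LateFusion(), AttentionFusion()):
    print(model(rgb, flow).shape)  # torch.Size([4, 2])

In broad terms, early fusion exposes cross-modal correlations to the classifier at the cost of a larger input dimension, late fusion keeps the branches independent (and degrades gracefully if one modality is unavailable at inference time), and the gated attention variant learns per-sample modality weights; these are the kinds of trade-offs the review quantifies across networks and datasets.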

Published

12-03-2026

How to Cite

Chen, J. (2026). Method and Analysis of Violent Behavior Recognition Based on Multimodal Information Fusion. Highlights in Science, Engineering and Technology, 161, 34-42. https://doi.org/10.54097/ve0k3231