The deployment of surveillance cameras has increased significantly in recent years to enhance security in public and private spaces. Many businesses still employ staff to monitor these cameras, but manual monitoring is time-consuming and inefficient, and unusual or suspicious activity in the video feeds is often overlooked because of human error. This study investigates a hybrid approach that combines convolutional neural networks (CNNs) with support vector machines (SVMs) to detect violence in surveillance video streams. Gamma correction is applied to video frames as a preprocessing step before the CNN extracts spatial features, which improves the accuracy of violence detection. The study uses the RLV dataset, which comprises a range of violent and non-violent scenarios. The CNN–SVM hybrid model achieved 99% accuracy, outperforming traditional methods and demonstrating strong spatial feature extraction capabilities. The study also addresses the challenges of real-time video surveillance, with attention to scalability and practical applicability, and offers a robust solution for enhancing security in public and private spaces.
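
As an illustration of the gamma correction preprocessing step, the following is a minimal sketch and not the authors' implementation. It assumes 8-bit greyscale frames stored as nested lists and the common convention v' = 255 · (v / 255)^(1/γ), under which γ > 1 brightens dark regions. The function names `build_gamma_lut` and `gamma_correct` are hypothetical, and the gamma value used in the study is not given here.

```python
def build_gamma_lut(gamma):
    """Build a 256-entry lookup table for 8-bit gamma correction.

    Assumed convention: out = 255 * (in / 255) ** (1 / gamma),
    so gamma > 1 brightens and gamma < 1 darkens.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    inv_gamma = 1.0 / gamma
    return [round(255.0 * (v / 255.0) ** inv_gamma) for v in range(256)]


def gamma_correct(frame, gamma):
    """Apply gamma correction to one greyscale frame (rows of 0-255 ints)."""
    lut = build_gamma_lut(gamma)  # computed once per frame, reused per pixel
    return [[lut[pixel] for pixel in row] for row in frame]


# Hypothetical 2x3 frame; real frames would come from a video decoder.
frame = [[0, 64, 128],
         [192, 255, 32]]
corrected = gamma_correct(frame, gamma=2.0)
```

A lookup table is used because every pixel takes one of only 256 values, so the power function is evaluated 256 times per gamma setting and not once per pixel. In a pipeline such as the one described, the corrected frames would then be passed to the CNN.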