This research addresses major obstacles to real-time object detection, namely changing lighting conditions, occlusion, background clutter, and variation in object size, which arise frequently in autonomous driving, surveillance, and smart-infrastructure applications. We built a modular detection pipeline that processes static images, video files, and live webcam streams in real time, with optimizations specific to each input type. The YOLOv5s model was chosen for its balance of accuracy and speed and was deployed on Google Colab, a cloud-based environment with GPU acceleration. The system was evaluated on real-world data. The quantitative evaluation measured inference time, frame rate, detection accuracy, and confidence scores. The qualitative evaluation assessed bounding-box precision and label correctness. Inference took 28 to 35 milliseconds per image or video frame and 1.8 to 2.3 seconds per webcam frame, with confidence scores between 0.70 and 0.93. These results show stable detection, adaptability to varied environments, and practical applicability, which make the system suitable for real-time applications. The test results support the robustness of YOLOv5 and suggest its potential for deploying intelligent visual monitoring systems in dynamic environments.
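The latency figures above can be related to frame rate through the identity FPS = 1000 / (latency in ms). Below is a minimal sketch of how per-frame inference time might be measured and summarized. It assumes a generic `detect` callable standing in for the YOLOv5s model. The helper names `measure_latency_ms`, `fps_from_latency_ms`, and `summarize` are hypothetical and are not the pipeline's actual code.

```python
import statistics
import time
from typing import Any, Callable, Dict, Iterable, List


def measure_latency_ms(detect: Callable[[Any], Any], frames: Iterable[Any]) -> List[float]:
    """Time each call to the (hypothetical) detect callable; return per-frame latency in ms."""
    latencies = []
    for frame in frames:
        start = time.perf_counter()
        detect(frame)  # stand-in for model inference on one image/frame
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def fps_from_latency_ms(latency_ms: float) -> float:
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Report mean/min/max latency and the frame rate implied by the mean latency."""
    mean_ms = statistics.mean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "min_ms": min(latencies_ms),
        "max_ms": max(latencies_ms),
        "mean_fps": fps_from_latency_ms(mean_ms),
    }


if __name__ == "__main__":
    # Reported bounds for image/video frames: 28 ms and 35 ms per frame.
    for ms in (28.0, 35.0):
        print(f"{ms:.0f} ms/frame -> {fps_from_latency_ms(ms):.1f} FPS")
```

Under this conversion, 28 to 35 ms per frame corresponds to roughly 28.6 to 35.7 FPS. The reported 1.8 to 2.3 s per webcam frame corresponds to roughly 0.43 to 0.56 FPS. Note that FPS computed from the mean latency is not the same as the mean of per-frame FPS values.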