Enhanced Semantic-Driven Anomaly Detection: A Multi-class Strategy for Video Data
Conference paper, Communications in Computer and Information Science, 2026, DOI Link
Abnormal event detection in videos has predominantly been explored as a one-class or binary classification problem, limiting the granularity and interpretability of detected anomalies. This paper introduces the Squeeze-Attentive Encoder Network (SAE-Net), a novel encoder architecture that integrates semantic-driven analysis with a Squeeze-and-Excitation (SE) attention mechanism. By emphasizing semantic features, the model effectively discerns and categorizes various types of anomalous events within video sequences, enabling precise multi-class classification. SAE-Net is thoroughly evaluated on the CUHK Avenue and ShanghaiTech datasets, two well-known large-scale benchmarks for video anomaly detection. Incorporating the SE attention mechanism into a deep convolutional neural network enhances the network's ability to focus on semantically significant features within complex video frames, leading to more accurate and reliable anomaly classification across multiple categories. Experimental results highlight the model's superior performance, demonstrating its robustness and applicability in real-world scenarios, particularly in surveillance and security systems.
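The paper does not publish its implementation, but the Squeeze-and-Excitation mechanism it builds on has a well-known general form. A minimal NumPy sketch of SE channel attention, with weight shapes and the reduction ratio assumed purely for illustration:

```python
import numpy as np

def se_attention(feature_map, w1, w2):
    """Squeeze-and-Excitation channel attention on a (C, H, W) feature map.
    w1: (C//r, C) and w2: (C, C//r) are the excitation MLP weights."""
    # Squeeze: global average pooling gives one descriptor per channel
    z = feature_map.mean(axis=(1, 2))                # shape (C,)
    # Excitation: bottleneck MLP with ReLU, then sigmoid gating
    s = np.maximum(w1 @ z, 0.0)                      # shape (C//r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))              # shape (C,), values in (0, 1)
    # Recalibrate: rescale each channel by its attention weight
    return feature_map * s[:, None, None]
```

Because the gate is a sigmoid, every channel is scaled by a factor in (0, 1), so the output never exceeds the input in magnitude; semantically important channels are simply attenuated less.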
STAD-AI: Spatio-Temporal Anomaly Detection in Videos with Attentive Dual-Stage Integration
Article, Neurocomputing, 2025, DOI Link
This paper presents a novel and comprehensive framework for video anomaly detection, distinguished by its specialized spatio-temporal feature extraction and precise anomaly prediction capabilities. The proposed system employs an advanced spatio-temporal attention-based framework designed for effective video frame reconstruction. By isolating and amplifying critical feature regions within the frames, it enables the extraction of fine-grained spatial and temporal representations, which are crucial for detecting subtle anomalies. Complementing this, an attentive U-Net architecture is employed to predict anomalies with high precision, incorporating motion features to enhance temporal coherence and anomaly localization. The attention mechanism in both components is strategically designed to focus on critical areas within each frame and sequence, where abnormal activities are likely to occur, improving detection accuracy and reducing false positives. The two components are seamlessly integrated using a fusion strategy, combining their complementary strengths to enhance the system's overall robustness and effectiveness. Extensive evaluations on benchmark datasets, including UCSD Peds1, UCSD Peds2, CUHK Avenue, and ShanghaiTech, demonstrate that STAD-AI achieves superior performance with AUC scores of 86.6%, 99.1%, 91.4%, and 77.7%, respectively. These results highlight the framework's ability to effectively leverage spatial and temporal features for detecting anomalies with high precision, advancing the state-of-the-art in video anomaly detection.
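The abstract describes fusing the reconstruction and prediction streams without giving the exact rule. A common late-fusion choice, sketched here as an assumption rather than the paper's actual formula, is a weighted sum of min-max normalized per-frame scores:

```python
import numpy as np

def fuse_scores(recon_scores, pred_scores, alpha=0.5):
    """Late fusion of two per-frame anomaly score streams.
    Each stream is min-max normalized over the clip, then combined
    with weight alpha (an assumed rule, not necessarily STAD-AI's)."""
    def norm(s):
        s = np.asarray(s, dtype=float)
        return (s - s.min()) / (s.max() - s.min() + 1e-8)
    return alpha * norm(recon_scores) + (1 - alpha) * norm(pred_scores)
```

Normalizing each stream before mixing keeps one component's error scale from dominating the other, which is the usual motivation for this style of fusion.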
DAST-Net: Dense visual attention augmented spatio-temporal network for unsupervised video anomaly detection
Article, Neurocomputing, 2024, DOI Link
This paper introduces an innovative end-to-end trainable framework named Dense Attention-aware Spatio-Temporal Network (DAST-Net) for video anomaly detection. The framework adeptly leverages both spatial and temporal data in an unsupervised manner, eliminating the need for manually crafted features. To enhance spatial feature representation, DAST-Net incorporates visual attention-aware residual connections within the denser residual network (DenserResNet), deviating from traditional identity skip connections. The rationale behind this connection choice is to augment the contextual understanding of features across various scales. For capturing temporal patterns, the framework employs a Convolutional LSTM Autoencoder (ConvLSTM-AE) module, enabling effective learning and representation of temporal dependencies in video data. The discriminative features from the attention modules are then combined with those extracted by the ConvLSTM-AE module, enhancing visual recognition capabilities for both spatial and temporal aspects. Our proposed architecture outperforms state-of-the-art methods on four benchmark datasets, achieving AUC scores of 85.4% on Ped1, 97.9% on Ped2, 89.8% on Avenue, and 73.7% on the ShanghaiTech dataset. The results demonstrate the effectiveness of our method in identifying unusual events in video data.
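The contrast between an identity skip connection and an attention-aware one can be shown in a few lines. This is only an illustrative form of the idea; DAST-Net's actual gating is more elaborate, and `gate_w` here is a hypothetical weight matrix:

```python
import numpy as np

def identity_residual(x, transform):
    """Classic residual unit: output = F(x) + x."""
    return transform(x) + x

def attention_residual(x, transform, gate_w):
    """Residual unit whose skip path is modulated by a learned
    attention gate instead of passed through unchanged."""
    gate = 1.0 / (1.0 + np.exp(-(x @ gate_w)))  # per-position weights in (0, 1)
    return transform(x) + gate * x              # attended skip + main branch
```

With an all-zero gate weight the sigmoid outputs 0.5 everywhere, so the attended skip contributes exactly half of what the identity skip would.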
Anomaly detection in video surveillance: a supervised inception encoder approach
Article, Multimedia Tools and Applications, 2024, DOI Link
Unsupervised video anomaly detection approaches often demand complex models and substantial computational resources for effective performance. In contrast, we introduce a supervised, end-to-end trainable deep learning approach that achieves both strong performance and computational efficiency by harnessing frame-level annotated data. The framework begins with an Inception encoder network in the initial stage to learn feature representations. Notably, the Inception network's proficiency in capturing intricate, high-level features in frames extends seamlessly to the analysis of video data. Using these extracted features, the model excels at identifying deviations from learned patterns, making it highly adept at detecting anomalies in video sequences. The subsequent stage consists of a sequence of fully connected layers followed by a classifier that labels input frames as either normal or anomalous based on the extracted features. To thoroughly validate this methodology, extensive experiments are carried out on widely used benchmark datasets, including comprehensive comparisons with contemporary approaches in the field. The experimental findings consistently validate the efficacy and efficiency of the proposed approach, underscoring its accuracy in identifying anomalies. Additionally, the approach operates with significantly reduced computational overhead, rendering it an appealing solution for real-world applications that demand timely and precise anomaly detection.
A Supervised Approach for Efficient Video Anomaly Detection Using Transfer Learning
Conference paper, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2023, DOI Link
Video anomaly detection is a complex task that has numerous applications in video surveillance. It involves identifying unusual patterns or events in a video stream that deviate from the expected or typical behavior. This paper introduces a new framework for supervised video anomaly detection using transfer learning from a pre-trained model. The approach presented in this paper utilizes the MobileNetV2 architecture as a feature extractor, which is further fine-tuned using a small set of annotated data. The fine-tuned model is then utilized to classify video frames into normal or anomalous classes. The suggested methodology is evaluated on benchmark datasets and compared with state-of-the-art methods. The experimental results demonstrate the effectiveness and efficiency of the proposed method in detecting anomalies in videos with high accuracy and low computational cost.
Bi-READ: Bi-Residual AutoEncoder based feature enhancement for video anomaly detection
Article, Journal of Visual Communication and Image Representation, 2023, DOI Link
Video anomaly detection (VAD) refers to identifying abnormal events in surveillance video. Typically, reconstruction-based video anomaly detection techniques employ convolutional autoencoders with a limited number of layers, which extract insufficient features and lead to inadequate network training. To address this challenge, an end-to-end unsupervised feature enhancement network, namely the Bi-Residual Convolutional AutoEncoder (Bi-ResCAE), is proposed that can learn normal events with low reconstruction error and detect anomalies with high reconstruction error. The proposed Bi-ResCAE network incorporates long–short residual connections to enhance feature reusability and training stabilization. In addition, we formulate a novel VAD model that extracts appearance and motion features by fusing both the Bi-ResCAE network and an optical flow network in the objective function to recognize anomalous objects in the video. Extensive experiments on three benchmark datasets validate the effectiveness of the model. The proposed model achieves an AUC (Area Under the ROC Curve) of 84.7% on Ped1, 97.7% on Ped2, and 86.71% on the Avenue dataset. The results show that Bi-READ performs better than state-of-the-art techniques.
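The criterion of low reconstruction error for normal events and high error for anomalies reduces to a simple per-frame score. A minimal sketch, with the min-max normalization over the clip assumed rather than taken from the paper:

```python
import numpy as np

def anomaly_scores(frames, reconstructions):
    """Per-frame anomaly score from reconstruction error, min-max
    normalized over the clip; values near 1 flag likely anomalies."""
    err = ((frames - reconstructions) ** 2).reshape(len(frames), -1).sum(axis=1)
    return (err - err.min()) / (err.max() - err.min() + 1e-8)
```

Thresholding these scores (or sweeping the threshold to trace a ROC curve) is how AUC figures like those reported above are typically computed.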
IoT-based vehicle (car) theft detection
Kommaraju R., Kommanduri R., Rama Lingeswararao S., Sravanthi B., Srivalli C.
Conference paper, Advances in Intelligent Systems and Computing, 2021, DOI Link
Property crimes are expected to reach ten million, and vehicles top the theft list in most parts of the world. Recent technical advances have produced new strategies to address this problem; existing vehicle theft detection methods rely on partial or complete deterrents such as shields. This research helps prevent vehicle theft by third parties with the help of an RFID card and an authorized key. An RFID reader is connected to the car, and anyone entering the car must first present the card for authorization. A keypad is connected to the engine, so only an authorized person can start the car by entering the correct password. If the password is incorrect, a buzzer sounds continuously. The proposed system also gives car owners regular updates on the location of the car using GPS technology.
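The described two-factor flow (RFID authorization first, then keypad password, with a buzzer on failure) can be sketched as a simple decision function. The function name, arguments, and return values are hypothetical, not the paper's actual firmware:

```python
def check_access(rfid_id, password, authorized_ids, correct_password):
    """Two-factor entry check mirroring the described flow:
    valid RFID card first, then the keypad password."""
    if rfid_id not in authorized_ids:
        return "deny"     # unrecognized card: keep the engine locked
    if password != correct_password:
        return "buzzer"   # valid card, wrong password: sound the alarm
    return "start"        # both factors valid: enable the engine
```

On real hardware each branch would drive an actuator (ignition relay, buzzer) and the GPS module would report the car's position to the owner independently of this check.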