A Comparative Study of 2D and 3D Convolutional Neural Networks for Melanoma Classification
Conference paper, Intelligent Computing and Emerging Communication Technologies, ICEC 2024, 2024, DOI Link
Skin melanoma is a lethal type of cancer, and its early diagnosis is crucial to improving patient survival rates. Convolutional neural networks (CNNs) are at the heart of deep learning algorithms. In the present work, the authors experimentally compare 2D and 3D CNN models for identifying melanoma. We employ three datasets, namely PH2, the ISIC archive, and the ISIC skin cancer dataset, and apply both models to each of them to determine their accuracy, precision, recall, F1 score, and ROC curves. The experimental results provide insights into the advantages and limitations of using 2D and 3D CNN models for the identification of skin melanoma. The authors observe that the 2D CNN model detects skin lesion structures better than the 3D CNN, and its classification accuracy is also found to be higher than that of the 3D CNN.
Comparative Study of ML Techniques for Classification of Crop Pests
Pamidimukkala J.S., Tarun Teja P., Suman Paul K., Kosaraju D.S., Mahamkali N.
Conference paper, 2024 4th International Conference on Artificial Intelligence and Signal Processing, AISP 2024, 2024, DOI Link
Crop pests pose a great threat to global food security; thus, effective pest prevention measures must be implemented. By applying different machine learning (ML) techniques to crop pest classification, this research provides ways to improve the accuracy and speed of identifying pests in the agricultural sector. Conventional methods for identifying pests frequently depend on manual observation, which is tedious, error-prone, and labor-intensive. ML, on the other hand, presents an effective way to automate this procedure, using sophisticated techniques to analyze massive datasets and produce precise predictions. The study applies a variety of ML approaches, such as Random Forests, K-Nearest Neighbors, and Naive Bayes, to classify agricultural pests according to features extracted from images. For model training and validation, an extensive collection of high-resolution images of different agricultural pests, taken in a range of environmental settings, is used. Metrics such as accuracy are used to determine how well the models perform. The results indicate how accurately ML approaches can identify and classify agricultural pests, demonstrating their potential to revolutionize pest management in agriculture. The suggested method improves the overall effectiveness of pest management procedures and drastically reduces the time and effort required to identify pests. Ultimately, this research promotes more resilient and productive farming systems by supporting the development of sustainable and technologically advanced solutions to agricultural challenges. The results demonstrate the potential of ML as an invaluable tool for farmers, agronomists, and policymakers, encouraging a proactive, data-driven approach to pest management in contemporary agriculture.
Synergistic Integration of Skeletal Kinematic Features for Vision-Based Fall Detection
Article, Sensors, 2023, DOI Link
According to the World Health Organisation, falling is a major health problem with potentially fatal implications. Each year, thousands of people die as a result of falls, with seniors making up 80% of these fatalities. The automatic detection of falls may reduce the severity of the consequences. Our study focuses on developing a vision-based fall detection system, proposing a new feature descriptor that results in a new fall detection framework. In our method, the body geometry of the subject is analyzed and patterns that help to distinguish falls from non-fall activities are identified. An AlphaPose network is employed to identify 17 keypoints on the human skeleton. Thirteen of these keypoints are used in our study, and we compute two additional keypoints. These 15 keypoints are divided into five segments, each consisting of a group of three non-collinear points; the five segments represent the left hand, right hand, left leg, right leg, and craniocaudal section. A novel feature descriptor is generated by extracting the distances within the segmented parts, the angles within the segmented parts, and the angle of inclination of every segmented part. As a result, we extract three features from each segment, giving us 15 features per frame that preserve spatial information. To capture temporal dynamics, the extracted spatial features are arranged in temporal sequence, so the feature descriptor in the proposed approach preserves the spatio-temporal dynamics. Thus, a feature descriptor of size 15 × m is formed, where m is the number of frames. To recognize fall patterns, machine learning approaches such as decision trees, random forests, and gradient boosting are applied to the feature descriptor. Our system was evaluated on the UP-Fall benchmark dataset and has shown very good performance compared to state-of-the-art approaches.
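The per-segment computation described in the abstract can be sketched as follows. This is a minimal illustration under assumptions: the exact keypoint pairings inside each three-point segment and the function names (`segment_features`, `frame_descriptor`) are hypothetical, since the paper defines the features formally.

```python
import numpy as np

def segment_features(p1, p2, p3):
    """Three features for one three-keypoint segment: a distance across the
    segment, the angle at the middle keypoint, and the segment's angle of
    inclination w.r.t. the horizontal axis (2D image keypoints assumed)."""
    p1, p2, p3 = map(np.asarray, (p1, p2, p3))
    dist = np.linalg.norm(p1 - p3)                    # distance across the segment
    v1, v2 = p1 - p2, p3 - p2
    cos_a = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    angle = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))  # angle inside segment
    dx, dy = p3 - p1
    incline = np.degrees(np.arctan2(dy, dx))          # inclination of the segment
    return dist, angle, incline

def frame_descriptor(segments):
    """Stack the three features of all five segments -> 15 values per frame."""
    return np.concatenate([segment_features(*s) for s in segments])
```

Stacking these 15-value frame vectors over m frames yields the 15 × m descriptor the paper feeds to the tree-based classifiers.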
Spatio Temporal Joint Distance Maps for Skeleton-Based Action Recognition Using Convolutional Neural Networks
Article, International Journal of Image and Graphics, 2021, DOI Link
Skeleton-based action recognition has become popular with recent developments in sensor technology and fast pose estimation algorithms. Existing research has attempted to address the action recognition problem by considering either the spatial or the temporal dynamics of actions, yet both kinds of features contribute to solving the problem. In this paper, we address action recognition from 3D skeleton data by introducing eight Joint Distance Maps, referred to as Spatio-Temporal Joint Distance Maps (ST-JDMs), to capture spatio-temporal variations in skeleton data. Among these, four maps are defined in the spatial domain and the remaining four in the temporal domain. After the ST-JDMs are constructed from an action sequence, they are encoded into color images. This representation enables us to fine-tune a Convolutional Neural Network (CNN) for action classification. The empirical results on two datasets, UTD MHAD and NTU RGB+D, show that ST-JDMs outperform other state-of-the-art skeleton-based approaches, achieving recognition accuracies of 91.63% and 80.16%, respectively.
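The idea of one spatial-domain map can be sketched as below: pairwise joint distances within each frame are stacked over time and min-max normalized into a texture image for a CNN. This is an illustrative sketch only; the paper defines eight specific maps and its own encoding, and the function names here are assumptions.

```python
import numpy as np

def joint_distance_map(skeleton_seq):
    """skeleton_seq: (T, J, 3) array of 3D joint coordinates.
    One spatial-domain map: pairwise joint distances within each frame,
    stacked over time into a (T, J*(J-1)/2) matrix."""
    T, J, _ = skeleton_seq.shape
    iu = np.triu_indices(J, k=1)          # each unordered joint pair once
    rows = []
    for t in range(T):
        d = np.linalg.norm(
            skeleton_seq[t, :, None, :] - skeleton_seq[t, None, :, :], axis=-1)
        rows.append(d[iu])
    return np.stack(rows)

def encode_as_image(dmap):
    """Min-max normalize to 0..255 so the map can be treated as a texture
    image and fed to a CNN (pseudo-color mapping omitted)."""
    lo, hi = dmap.min(), dmap.max()
    return np.uint8(255 * (dmap - lo) / (hi - lo + 1e-8))
```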
Learning representations from quadrilateral based geometric features for skeleton-based action recognition using LSTM networks
Article, Intelligent Decision Technologies, 2020, DOI Link
With recent developments in sensor technology and pose estimation algorithms, skeleton-based action recognition has become popular. Classical machine learning methods based on hand-crafted features fail on large-scale datasets due to their limited representation power, while recent recurrent neural network (RNN) based methods focus on the temporal evolution of body joints and neglect the geometric relations between them. In this paper, we propose eleven quadrilaterals to capture the geometric relations among joints for action recognition. An end-to-end 3-layer Bi-LSTM network is designed as a Base-Net to learn robust representations. We propose two subnets based on the Base-Net to extract discriminative spatio-temporal features: the first subnet (SQuadNet) uses four spatial features and the second (TQuadNet) uses two temporal features. The empirical results on two benchmark datasets, NTU RGB+D and UTD MHAD, show that our method achieves state-of-the-art performance compared to recent methods in the literature.
Deep ensemble network using distance maps and body part features for skeleton based action recognition
Article, Pattern Recognition, 2020, DOI Link
Human action recognition is a hot research topic in the field of computer vision. The availability of low-cost depth sensors has made the extraction of reliable skeleton maps of human subjects easier. This paper proposes three subnets, referred to as SNet, TNet, and BodyNet, to capture diverse spatio-temporal dynamics for the action recognition task. Specifically, SNet captures pose dynamics from distance maps in the spatial domain, TNet captures the temporal dynamics along the sequence, and BodyNet extracts distinct features from fine-grained body parts in the temporal domain. With the motivation of ensemble learning, a hybrid network, referred to as HNet, is modeled from two of the subnets (TNet and BodyNet) to capture robust temporal dynamics. Finally, SNet and HNet are fused into one ensemble network for the action classification task. Our method achieves competitive results on three widely used datasets: UTD MHAD, UT Kinect, and NTU RGB+D.
Ensemble spatio-temporal distance net for Skeleton based action recognition
Article, Scalable Computing, 2019, DOI Link
With recent developments in sensor technology and pose estimation algorithms, skeleton-based action recognition has become popular. This paper proposes a deep learning framework for the action recognition task using ensemble learning. We design two subnets to capture the spatial and temporal dynamics of the entire video sequence, referred to as the Spatial-distance Net (SdNet) and the Temporal-distance Net (TdNet), respectively. More specifically, SdNet is a Convolutional Neural Network based subnet that captures the spatial dynamics of joints within a frame, and TdNet is a Long Short-Term Memory based subnet that exploits the temporal dynamics of joints between frames along the sequence. Finally, the two subnets are fused into one ensemble network, referred to as the Spatio-Temporal distance Net (STdNet), to explore both spatial and temporal information. The efficacy of the proposed method is evaluated on two widely used datasets, UTD MHAD and NTU RGB+D, on which STdNet achieved accuracies of 91.16% and 82.55%, respectively.
Learning Representations from Spatio-Temporal Distance Maps for 3D Action Recognition with Convolutional Neural Networks
Article, Advances in Distributed Computing and Artificial Intelligence Journal, 2019, DOI Link
This paper addresses the action recognition problem using skeleton data. A novel method is proposed that employs five Distance Maps (DMs), named Spatio-Temporal Distance Maps (ST-DMs), to capture spatio-temporal information from skeleton data for 3D action recognition. Among the five DMs, four capture the pose dynamics within a frame in the spatial domain and one captures the variations between consecutive frames along the action sequence in the temporal domain. All DMs are encoded into texture images, and a Convolutional Neural Network is employed to learn informative features from these texture images for the action classification task. A statistics-based normalization method is also introduced to deal with the variable heights of subjects. The efficacy of the proposed method is evaluated on two datasets, UTD MHAD and NTU RGB+D, achieving recognition accuracies of 91.63% and 80.36%, respectively.
Skeleton Joint Difference Maps for 3D Action Recognition with Convolutional Neural Networks
Conference paper, Communications in Computer and Information Science, 2019, DOI Link
Action recognition is a leading research topic in the field of computer vision. This paper proposes an effective method for the action recognition task based on skeleton data. Four features are proposed based on joint differences in 3D skeleton data. From the differences of the 3D coordinates of corresponding joints in successive frames, three maps are extracted, relating to the x, y, and z coordinates respectively, and these maps are then encoded into 2D color images, named Joint Difference Maps (JDMs). The fourth JDM is formed by mapping the individual x, y, and z difference maps to red, green, and blue values. Hence, the 3D action recognition problem is converted into a 2D image classification problem, which enables us to fine-tune CNNs to learn informative features for the 3D action recognition problem. The proposed method achieved a 79.30% recognition rate on the UTD MHAD dataset.
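The construction described above can be sketched as follows. This is a minimal sketch assuming simple min-max normalization for the color encoding; the function name and normalization are assumptions, not the paper's exact encoding.

```python
import numpy as np

def joint_difference_maps(seq):
    """seq: (T, J, 3) array of 3D joint coordinates.
    Differences of corresponding joints in successive frames give three
    (T-1, J) maps for the x, y, and z coordinates; the fourth JDM maps
    these three to the R, G, B channels of one color image."""
    diff = seq[1:] - seq[:-1]                     # (T-1, J, 3) joint differences
    maps = [diff[..., c] for c in range(3)]       # x, y, z difference maps

    def norm(m):
        lo, hi = m.min(), m.max()
        return np.uint8(255 * (m - lo) / (hi - lo + 1e-8))

    rgb = np.stack([norm(m) for m in maps], axis=-1)  # fourth JDM: color image
    return [norm(m) for m in maps], rgb
```

Treating the resulting images as ordinary 2D inputs is what lets a pretrained CNN be fine-tuned for the 3D recognition task.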
Vector Quantization based Pairwise Joint Distance Maps (VQ-PJDM) for 3D Action Recognition
Conference paper, Procedia Computer Science, 2018, DOI Link
This paper presents an approach for 3D action recognition using vector quantization with pairwise joint distance maps, which we name VQ-PJDM. The main difficulty in 3D action recognition using skeleton data is dealing with the variable length of action sequences. We solve this problem by approximating each action sequence as a codebook, the output of a Vector Quantization (VQ) method; the codebook size is fixed for any length of action sequence. After all actions in the dataset are approximated by the VQ method, the Pairwise Joint Distance Maps (PJDMs) are calculated from the approximated actions. A voting classifier is employed for action classification. The empirical results on the UT Kinect dataset show that the proposed method gives better results than the state of the art.
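The fixed-size-codebook idea can be sketched with plain k-means, a standard VQ method. This is an assumption for illustration: the abstract does not specify which VQ variant, codebook size, or initialization is used.

```python
import numpy as np

def vq_codebook(seq, k=16, iters=20, seed=0):
    """Approximate a variable-length action sequence seq of shape (T, D)
    by a codebook of k code vectors via k-means-style vector quantization.
    The returned codebook has shape (k, D) regardless of T."""
    rng = np.random.default_rng(seed)
    codebook = seq[rng.choice(len(seq), k, replace=False)]  # init from frames
    for _ in range(iters):
        # assign each frame to its nearest code vector
        assign = np.argmin(((seq[:, None] - codebook[None]) ** 2).sum(-1), axis=1)
        # move each code vector to the mean of its assigned frames
        for j in range(k):
            pts = seq[assign == j]
            if len(pts):
                codebook[j] = pts.mean(axis=0)
    return codebook
```

Because every sequence is reduced to the same (k, D) shape, the downstream PJDM computation sees fixed-size inputs.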
3-D Projected PCA Based DMM Feature Fusing with SMO-SVM for Human Action Recognition
Conference paper, Procedia Computer Science, 2016, DOI Link
Action recognition in video sequences remains an important and challenging problem. This paper presents an efficient feature extraction method for human action recognition in depth video sequences. For a video sequence acquired by a depth sensor, all three projections (xy, yz, and zx) are calculated for each depth frame. For each projection view, the differences between alternate frames are accumulated to form a Depth Motion Map (DMM). Principal Component Analysis is applied to reduce the dimensionality of the DMM features, and Sequential Minimal Optimization (SMO) is used to train a Support Vector Machine (SVM). The proposed approach is evaluated on the MSR Action-3D dataset and compared with existing approaches; the empirical results show that the proposed approach achieves better results than the existing approaches.
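The DMM accumulation for a single projection view can be sketched as below. This is a minimal sketch under assumptions: the frame step of 2 reflects a reading of "alternate frames", and the function name is hypothetical; the full method repeats this for the yz and zx projections before PCA and the SMO-trained SVM.

```python
import numpy as np

def depth_motion_map(depth_frames, step=2):
    """depth_frames: (T, H, W) depth video for one projection view.
    Accumulate absolute differences between frames `step` apart into a
    single (H, W) Depth Motion Map."""
    dmm = np.zeros(depth_frames.shape[1:], dtype=np.float64)
    for t in range(0, len(depth_frames) - step, step):
        dmm += np.abs(depth_frames[t + step].astype(np.float64)
                      - depth_frames[t].astype(np.float64))
    return dmm
```

The flattened DMMs of the three views are then concatenated per video before the PCA dimensionality-reduction step.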