
Dr Saleti Sumalatha

Assistant Professor

Department of Computer Science and Engineering

Contact Details

sumalatha.s@srmap.edu.in

Office Location

SR Block, Level 5, Cabin No: 20

Education

  • 2020 – Ph.D. – National Institute of Technology, Warangal, India
  • 2010 – M.Tech – Annamacharya Institute of Science and Technology, JNTU Anantapur, India
  • 2004 – B.Tech – Narayana Engineering College, JNTU Hyderabad, India

Experience

  • January 2019 to March 2020 – Associate Professor – Geethanjali Institute of Science and Technology, Nellore, Andhra Pradesh.
  • July 2015 to December 2018 – Research Scholar – National Institute of Technology, Warangal, Telangana.
  • May 2012 to May 2015 – Associate Professor – Geethanjali Institute of Science and Technology, Nellore, Andhra Pradesh.
  • October 2010 to April 2012 – Assistant Professor – Narayana Engineering College, Nellore, Andhra Pradesh.
  • July 2004 to August 2008 – Assistant Professor – Narayana Engineering College, Nellore, Andhra Pradesh.

Research Interest

  • To implement a learning management system and study the navigational patterns to enhance students' learning.
  • To develop incremental mining algorithms.

Awards

  • 2022 - Undergraduate students under my supervision received one gold medal and one silver medal (September 2022) for paper presentation in Research Day organized by SRM University, AP, India.
  • 2022 - Undergraduate students under my supervision received a gold medal (January 2022) for paper presentation in Research Day organized by SRM University, AP, India.
  • 2015 to 2018 - Ph.D. Fellowship - Ministry of Human Resource Development
  • 2010 – Secured first rank in Master’s Degree.

Memberships

  • Life Member of ISTE

Publications

  • Federated learning-based disease prediction: A fusion approach with feature selection and extraction

    Dr Saleti Sumalatha, Mr Ramdas Kapila

    Source Title: Biomedical Signal Processing and Control, Quartile: Q1, DOI Link

    The ability to predict diseases is critical in healthcare for early intervention and better patient outcomes. Data security and privacy are significant concerns when classified medical data from several institutions is analyzed. Federated Learning (FL) provides cooperative model training while preserving data privacy. In this study, we offer a fusion strategy for illness prediction, combining FL with ANOVA and Chi-Square Feature Selection (FS) and Linear Discriminant Analysis (LDA) Feature Extraction (FE) techniques. This research aims to use FS and FE techniques to improve prediction performance while exploiting the beneficial aspects of FL. A comprehensive analysis of the distributed data is ensured by updating aggregate models with information from all participating institutions. Through collaboration, a robust disease prediction system surpasses the limited possibilities of individual datasets. We assessed the fusion strategy on the Cleveland heart disease and diabetes datasets from the UCI repository. Compared to standalone FL or conventional ML techniques, the fusion methodology improves prediction performance. Our proposed models, Chi-Square with LDA and ANOVA with LDA leveraging FL, exhibited exceptional performance on the diabetes dataset, achieving accuracy, precision, recall, and f1-score of 92.3%, 94.36%, 94.36%, and 94.36%, respectively. Similarly, on the Cleveland heart disease dataset, these models achieved accuracy, precision, recall, and f1-score of 88.52%, 87.87%, 90.62%, and 89.23%, respectively. The results have the potential to revolutionize disease prediction, maintain privacy, advance healthcare, and outperform state-of-the-art models.
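The federated setup described in this abstract rests on institutions sharing model updates rather than raw data. As a rough illustration of that idea (not the paper's code; all names and numbers here are hypothetical), a FedAvg-style aggregation step weights each institution's parameters by its local data size:

```python
def fedavg(client_params, client_sizes):
    """Weighted average of per-client parameter vectors (FedAvg-style).

    client_params: list of equal-length parameter lists, one per institution
    client_sizes:  number of local training samples at each institution
    """
    total = sum(client_sizes)
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, size in zip(client_params, client_sizes):
        weight = size / total  # institutions with more data count more
        for i, p in enumerate(params):
            global_params[i] += weight * p
    return global_params

# Two hypothetical institutions holding 25 and 75 samples respectively
merged = fedavg([[1.0, 2.0], [3.0, 4.0]], [25, 75])
# merged == [2.5, 3.5]
```

Each round, the server would broadcast `merged` back to the institutions for further local training; no raw patient records ever leave an institution.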
  • Student Placement Chance Prediction Model using Machine Learning Techniques

    Dr Saleti Sumalatha, Manoj Manike, Priyanshu Singh, Purna Sai Madala, Steve Abraham Varghese

    Source Title: 2021 5th Conference on Information and Communication Technology, DOI Link

    Obtaining employment upon graduation from university is one of the highest, if not the highest, priorities for students and young adults. Developing a system that can help these individuals obtain placement advice, analyze labor market trends, and assist educational institutions in assessing growing fields and opportunities would serve immense value. With the emergence of heavily refined data mining techniques and machine learning boilerplates, a model based on predictive analysis can help estimate a variety of realistic placement metrics, such as the types of companies a junior-year student can be placed in, or the companies that are likely to look for the specific skill sets of a student. Various attributes such as academic results, technical skills, training experiences, and projects can serve prediction purposes. We employed the XGBoost technique, a structured or tabular data-focused approach that has recently dominated applied machine learning and Kaggle competitions. XGBoost is a high-speed and high-performance implementation of gradient boosted decision trees. We created a model and ran numerous EDAs to determine whether a student will be placed or not, as well as the type of organization in which they will be placed [Day Sharing, Dream, Super Dream, Marquee].
  • Assessing Performance Across Various Machine Learning Algorithms with Integrated Feature Selection for Fetal Heart Classification

    Dr Saleti Sumalatha, Laura Rizka Amanda, Mila Desi Anasanti, Thunakala Bala Kokil

    Source Title: International journal of artificial intelligence research, DOI Link

  • Optimizing Predictive Models for Parkinson’s Disease Diagnosis

    Dr Saleti Sumalatha, Mr Ramdas Kapila

    Source Title: Intelligent Technologies and Parkinson’s Disease: Prediction and Diagnosis, DOI Link

  • Enhancing Forecasting Accuracy with a Moving Average-Integrated Hybrid ARIMA-LSTM Model

    Dr Saleti Sumalatha, Panchumarthi L Y., Kallam Y R., Parchuri L., Jitte S

    Source Title: SN Computer Science, Quartile: Q1, DOI Link

    This research provides a hybrid time series forecasting model that combines Long Short-Term Memory (LSTM) and Autoregressive Integrated Moving Average (ARIMA) models with moving averages. LSTM models are utilized for modelling stationary time series, while non-stationary time series are modelled using ARIMA. While LSTM models are more suited to capturing long-term dependencies, ARIMA models are superior at catching short-term relationships in time series data. The hybrid model combines the short-term dependency modelling of ARIMA with LSTM's long-term dependency modelling. This combination leads to more accurate predictions for time series data that are both stationary and non-stationary. Triple Exponential Moving Average (TEMA), Weighted Moving Average (WMA), Simple Moving Average (SMA), and six other moving averages, including Kaufman Adaptive Moving Average (KAMA), MIDPOINT, and MIDPRICE, were also examined to determine how well the hybrid model performed with each and which methods give the most precision. The study compares the hybrid model's forecasting performance to that of standalone ARIMA and LSTM models, in addition to other prominent forecasting approaches like linear regression and random forest. The findings indicate that the hybrid model surpasses the individual models and other benchmark methods, achieving increased precision in terms of Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE). The research also investigates the impact of different hyperparameters and model configurations on forecast performance, giving insight into the ideal settings for the hybrid model. Overall, the proposed ARIMA-LSTM hybrid model with moving averages is a promising approach for accurate and reliable stock price forecasting, with practical implications for financial decision-making and risk management.
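Among the moving averages the abstract compares, SMA and WMA are the simplest. A minimal sketch of both over plain Python lists (my own illustrative implementations, not the paper's code):

```python
def sma(series, window):
    """Simple Moving Average: unweighted mean over a sliding window."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def wma(series, window):
    """Weighted Moving Average: linear weights, most recent value heaviest."""
    weights = list(range(1, window + 1))  # 1, 2, ..., window
    denom = sum(weights)
    return [sum(w * x for w, x in zip(weights, series[i - window + 1:i + 1])) / denom
            for i in range(window - 1, len(series))]

prices = [10.0, 11.0, 12.0, 13.0]
print(sma(prices, 3))  # [11.0, 12.0]
print(wma(prices, 3))  # weights the most recent price 3x: ~[11.33, 12.33]
```

In the hybrid setting, such smoothed series would be fed to the ARIMA/LSTM stages in place of the raw prices.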
  • Enhancing the Accuracy of Manufacturing Process Error Detection Through SMOTE-Based Oversampling Using Machine Learning and Deep Learning Techniques

    Dr Saleti Sumalatha, Boyapati S V., Rakshitha G B S., Reddy M R.

    Source Title: International Conference on Integrated Circuits, Communication, and Computing Systems, ICIC3S 2024 - Proceedings, DOI Link

    A production competency study leads to a rise in the strategic emphasis of the manufacturing sectors. Developing semiconductor materials is a highly complex endeavour that necessitates numerous evaluations, and the significance of product quality cannot be overstated. We put forward a number of methods for automatically creating a prognostic model that is effective at identifying equipment flaws throughout the wafer fabrication process for semiconductor materials. The SECOM dataset is representative of semiconductor production procedures that go through numerous tests. Since the dataset is imbalanced, our proposed methodology incorporates SMOTE (Synthetic Minority Over-sampling Technique) to mitigate the imbalance of the training dataset by leveling off unbalanced attributes. Detecting faults in the manufacturing process improves semiconductor quality and testing efficiency, and is used to validate both Machine Learning and Deep Learning algorithms. This is accomplished by collecting performance metrics during the development process. Our research report also highlights our effort to cut down the training time for testing.
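SMOTE, as used in the abstract above, balances a dataset by synthesizing new minority-class points along line segments between existing minority samples and their nearest minority neighbours. A toy sketch of that interpolation step (illustrative only; real use would rely on a library implementation such as imbalanced-learn):

```python
import random

def smote_like(minority, n_new, seed=0):
    """SMOTE-style oversampling sketch: synthesize points by interpolating
    between a random minority sample and its nearest minority neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # nearest neighbour of a among the other minority points (squared distance)
        b = min((p for p in minority if p is not a),
                key=lambda p: sum((x - y) ** 2 for x, y in zip(a, p)))
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(x + gap * (y - x) for x, y in zip(a, b)))
    return synthetic

# Three hypothetical minority-class samples in 2-D feature space
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new = smote_like(pts, 2)
# each synthetic point lies on a segment between two original minority points
```

The synthesized points are appended to the training set until the class proportions even out, which is what "leveling off unbalanced attributes" amounts to in practice.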
  • Comparative Analysis of Optimization Algorithms for Feature Selection in Heart Disease Classification

    Dr Saleti Sumalatha, Mr Ramdas Kapila

    Source Title: Lecture notes in networks and systems, DOI Link

    In order to classify the Cleveland Heart Disease dataset, this study evaluates the performance of three optimization methods, namely Fruit Fly Optimization (FFO), Particle Swarm Optimization (PSO), and Grey Wolf Optimizer (GWO). The top 10 features are identified using FFO, with remarkable results for accuracy (88.52%), precision (87.88%), recall (90.63%), and f1-score (89.23%). After using PSO, the accuracy, precision, recall, and f1-score are 85.25%, 87.10%, 84.38%, and 85.71%, respectively. Finally, GWO is used, which results in precision, accuracy, recall, specificity, and f1-score values of 93.33%, 90.16%, 90.11%, 87.5%, and 90.32%, respectively, highlighting its consistently superior performance. A comparative examination shows that FFO yields competitive outcomes with notable accuracy and recall. PSO displays comparable precision and recall while showing somewhat poorer accuracy. In contrast, GWO performs better than both FFO and PSO, displaying great accuracy and precision along with remarkable recall and specificity. These results provide important information on the effectiveness of the feature selection methods used within optimization algorithms for heart disease classification. The study also highlights the need for further investigation into the potential of these optimization algorithms in other fields, broadening their use beyond disease classification. Such work might advance the field of feature selection and aid in the creation of better classification models.
  • A Survey on Occupancy-Based Pattern Mining

    Dr Saleti Sumalatha, Inaganti Bhavana

    Source Title: Lecture notes in networks and systems, DOI Link

    Occupancy-based pattern mining has emerged as a significant research topic in recent times. This paper presents a comprehensive survey on occupancy, which serves as a measure to augment the significance of patterns. The survey covers various types of patterns, including frequent itemsets, high utility itemsets, frequent sequences, and high utility sequences, all in the context of occupancy. Additionally, the paper delves into techniques aimed at reducing the search space in the aforementioned pattern mining problems. These techniques are crucial for improving the efficiency and scalability of the mining process, especially when dealing with large-scale datasets. Furthermore, the paper discusses potential research extensions for occupancy-based pattern mining. These extensions could explore new applications, investigate novel algorithms, or further enhance the effectiveness of occupancy as a measure for pattern evaluation. Overall, this survey provides an important resource for researchers interested in understanding and advancing occupancy-based pattern mining techniques.
  • Insights into Gun-Related Deaths: A Comprehensive Machine Learning Analysis

    Dr Saleti Sumalatha, Lovely Yeswanth Panchumarthi, Lavanya Parchuri

    Source Title: 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), DOI Link

    This work employs both supervised and unsupervised machine learning techniques to examine firearm-related fatalities in the US and identify trends, patterns, and risk factors within the data. During the supervised learning phase, techniques such as logistic regression, decision trees, random forests, and neural networks were used to predict the kind of death (suicide, homicide, accidental, or unknown) based on demographic data like sex, age, race, place, and education. Findings show that the neural network and random forest models exhibit promising precision and recall values across several classes, and that they obtained the highest accuracy, reaching 79.88% and 83.59%, respectively. Using clustering techniques including Agglomerative clustering, K-means, and Gaussian mixture models, gun-related fatalities were categorized based on demographic and temporal data during the unsupervised learning stage. The analysis revealed distinct clusters of deaths, providing insights into the varying patterns and trends over time and across demographic groups. The K-means algorithm, with a silhouette score of 0.42, demonstrated meaningful separation among clusters. The research contributes to understanding the complex dynamics of gun-related deaths, shedding light on both individual risk factors and broader trends. However, further analysis could explore additional dimensions of the dataset or delve deeper into the interpretation of clustering results. The study also highlights how crucial it is to take into consideration the moral consequences and constraints of machine learning applications in complex fields like public health.
  • Enhancing Customer Churn Prediction: Advanced Models and Resampling Techniques in Dynamic Business Environments

    Dr Saleti Sumalatha, Yaswanth Chowdary Thotakura, Dinesh Manikanta Yarramsetty, Kalyan Kumar Doppalapudi, Sai Shasank Alaparthi

    Source Title: 2024 International Conference on Intelligent Computing and Emerging Communication Technologies (ICEC), DOI Link

    Customer churn analysis is critical for businesses looking to hold onto market share in today's dynamic business environment. The development of e-Finance presents additional difficulties for the traditional banking sector as the digital marketplace grows. Banks face several challenges, including fintech competition, dwindling client loyalty, and digital transformation. By analyzing probable causes of bank customer turnover from multiple perspectives and building models for predicting churn, bank managers can identify problems, spot potential churn customers early on, and develop effective retention strategies based on client traits and preferences. Not only banks but also large corporate sectors like telecommunications and over-the-top (OTT) platforms face customer churn. This study proposes the Random Leaf Model (RLM) and also explores the Logit Leaf Model (LLM) and the Neural Network Ensemble Model, three sophisticated predictive modeling methodologies. Proactive strategies are necessary in today's marketplaces due to their competitive nature. The primary problem with current automatic churn prediction algorithms is the substantial gap between majority and minority class proportions in the datasets, which might lead to model bias in favor of the dominant class. The shortcomings of conventional churn analysis techniques underscore the necessity of implementing advanced cutting-edge algorithms to achieve precise forecasts.
  • Leveraging ResNet for Efficient ECG Heartbeat Classification

    Dr Saleti Sumalatha, Lovely Yeswanth Panchumarthi, Sriya Padmanabhuni

    Source Title: 2024 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), DOI Link

    This paper provides a novel approach that uses a modified version of the ResNet architecture to classify heartbeats on an electrocardiogram (ECG). Padding, convolution, max pooling, convolutional blocks, average pooling, and fully connected layers are the six stages of the approach. The MIT-BIH Arrhythmia Database is used to test the approach on five different types of heartbeats: unclassifiable, supraventricular premature, premature ventricular contraction, fusion of ventricular and normal, and normal. The outcomes demonstrate that the suggested approach outperforms other current techniques like LSTM, CNN, and EfficientNet, achieving an accuracy of 98.6%. The performance, restrictions, and future directions of the model are also thoroughly examined. The article thereby advances automated ECG heartbeat categorization with deep learning techniques for cardiac diagnosis.
  • Optimizing Recommendation Systems: Analyzing the Impact of Imputation Techniques on Individual and Group Recommendation Systems

    Dr Saleti Sumalatha, Sri Phani Bhushan Mada, Rahul Tata, Sneha Teja Sree Reddy Thondapu

    Source Title: 2024 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), DOI Link

    In today’s world, Recommendation Systems play a significant role in guiding and simplifying the decision-making process for individuals and groups. However, the presence of missing data in user-item interaction matrices poses a challenge to accurately identify user preferences and provide relevant suggestions. This is particularly true for group recommendation systems that cater to multiple users. To address this challenge, we have applied four imputation techniques to individual and group recommendation models, including User-based Collaborative filtering, Matrix factorization using Singular Value Decomposition, and deep learning-based models like Autoencoders. We evaluated the effectiveness of these techniques using root mean squared error and mean absolute error metrics and observed a significant impact on the quality of recommendations. Additionally, we implemented aggregation strategies like Borda count, Additive Utilitarian, Multiplicative Utilitarian, Least Misery, and Most Pleasure for Group Recommendations. We evaluated the performance of these strategies using satisfaction score and disagreement score. Overall, our findings suggest that imputation techniques can significantly improve the quality of recommendations in both individual and group recommendation systems
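The group aggregation strategies named in the abstract reduce a matrix of per-member predicted ratings to one group score per item. A minimal sketch of three of them, using hypothetical ratings (illustrative only, not the paper's code):

```python
def additive_utilitarian(group_ratings):
    """Sum of the group's predicted ratings per item."""
    return [sum(col) for col in zip(*group_ratings)]

def least_misery(group_ratings):
    """Minimum rating per item: no member should be too unhappy."""
    return [min(col) for col in zip(*group_ratings)]

def most_pleasure(group_ratings):
    """Maximum rating per item: favour any member's enthusiasm."""
    return [max(col) for col in zip(*group_ratings)]

# rows = group members, columns = items (hypothetical predicted ratings)
ratings = [[4.0, 2.0, 5.0],
           [3.0, 5.0, 1.0]]
print(additive_utilitarian(ratings))  # [7.0, 7.0, 6.0]
print(least_misery(ratings))          # [3.0, 2.0, 1.0]
print(most_pleasure(ratings))         # [4.0, 5.0, 5.0]
```

Note how the strategies disagree: Least Misery would rank item 0 first, while Most Pleasure ties items 1 and 2 for the top spot, which is exactly the tension the satisfaction and disagreement scores quantify.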
  • A comparison of various machine learning algorithms and execution of flask deployment on essay grading

    Dr Saleti Sumalatha, Udhika Meghana Kotha, Haveela Gaddam, Deepthi Reddy Siddenki

    Source Title: International Journal of Electrical and Computer Engineering, Quartile: Q3, DOI Link

    Students' performance can be assessed by grading the answers written by the students during their examination. Currently, students are assessed manually by teachers. This is a cumbersome task due to an increase in the student-teacher ratio. Moreover, due to the coronavirus disease (COVID-19) pandemic, most educational institutions have adopted online teaching and assessment. To measure the learning ability of a student, we need to assess them. The current grading system works well for multiple choice questions, but there is no grading system for evaluating essays. In this paper, we studied different machine learning and natural language processing techniques for automated essay scoring/grading (AES/G). Data imbalance is an issue which creates problems in predicting the essay score due to uneven distribution of essay scores in the training data. We handled this issue using the random oversampling technique, which generates an even distribution of essay scores. Also, we built a web application using Flask and deployed the machine learning models. Subsequently, all the models were evaluated using accuracy, precision, recall, and F1-score. It is found that the random forest algorithm outperformed the other algorithms with an accuracy of 97.67%, precision of 97.62%, recall of 97.67%, and F1-score of 97.58%.
  • Analyzing the Health Data: An Application of High Utility Itemset Mining

    Dr Saleti Sumalatha, Lakshmi Sai Bhargavi G., Shanmukh R., Lokesh T., Sobin C C., Padmavathi K., Tottempudi S S

    Source Title: 2023 International Conference on Advances in Computation, Communication and Information Technology, ICAICCIT 2023, DOI Link

    A data science endeavour called "high utility pattern mining" entails finding important patterns based on different factors like profit, frequency, and weight. High utility itemsets are among the various patterns that have undergone thorough study. These itemsets must exceed a minimum utility threshold specified by the user. This is particularly useful in practical applications like retail marketing and web services, where items have diverse characteristics. High-utility itemset mining facilitates decision-making by uncovering patterns that have a significant impact. Unlike frequent itemset mining, which identifies commonly occurring itemsets, high-utility itemsets often include rare items in real-world applications. Considering the application to the medical field, data mining has been employed in various ways. In this context, the primary method involves analyzing a health dataset that spans from 2014 to 2017 in the United States. The dataset includes categories such as diseases, states, and deaths. By examining these categories and mortality rates, we can derive high-utility itemsets that reveal the causes of the most deaths. In conclusion, high-utility pattern mining is a data science activity that concentrates on spotting significant patterns based on objective standards. It has proven valuable in various fields, including the medical domain, where analyzing datasets can uncover high-utility itemsets related to mortality rates and causes of death.
  • An Enhancement in the Efficiency of Disease Prediction Using Feature Extraction and Feature Selection

    Dr Saleti Sumalatha, Mr Ramdas Kapila

    Source Title: Contemporary Applications of Data Fusion for Advanced Healthcare Informatics, DOI Link

    Cardiovascular diseases constitute one of the most dangerous and fatal illnesses. According to statistics, 17.9 million deaths were reported from cardiovascular diseases in 2019. As a result, it is essential to detect the sickness early on to minimize the death rate. To handle data efficiently and precisely forecast the symptoms of illness, data mining and machine learning approaches may be applied. This study employs seven supervised machine learning (ML) techniques to anticipate heart disease. The study's main objective is the adoption of ML algorithms and an investigation of how feature extraction (FE) and feature selection (FS) methods might increase the effectiveness of ML models. The experimental results indicate that models with feature selection and extraction techniques outperformed the model with the entire feature set from the dataset. As a case study, the authors considered three additional datasets, namely Parkinson's, diabetes, and lung cancer, in addition to the Cleveland Heart Disease dataset. However, the main focus of this study is on predicting heart disease.
  • Optimizing fetal health prediction: Ensemble modeling with fusion of feature selection and extraction techniques for cardiotocography data

    Dr Saleti Sumalatha, Mr Ramdas Kapila

    Source Title: Computational Biology and Chemistry, Quartile: Q2, DOI Link

    Cardiotocography (CTG) captures the fetal heart rate and the timing of uterine contractions. Throughout pregnancy, intelligent CTG categorization is crucial for monitoring fetal health and preserving proper fetal growth and development. Since CTG provides information on the fetal heartbeat and uterine contractions, which helps determine whether the fetus is pathologic, obstetricians frequently use it to evaluate a child's physical health during pregnancy. In the past, obstetricians have manually analyzed CTG data, which is time-consuming and inaccurate. Developing a fetal health categorization model is therefore crucial, as it may help speed up diagnosis and treatment and conserve medical resources. The CTG dataset is used in this study. To diagnose the illness, 7 machine learning models are employed, as well as ensemble strategies including voting and stacking classifiers. To choose and extract the most significant and critical attributes from the dataset, Feature Selection (FS) techniques like ANOVA and Chi-square, as well as Feature Extraction (FE) strategies like Principal Component Analysis (PCA) and Independent Component Analysis (ICA), are used. Because the dataset is unbalanced, we used the Synthetic Minority Oversampling Technique (SMOTE) to balance it. The top 5 models are selected to forecast the illness, and these 5 models are used in ensemble methods such as voting and Stacking Classifiers (SC), with AdaBoost and Random Forest (RF) as meta-classifiers for disease detection. The proposed SC with RF as the meta-classifier, incorporating Chi-square with PCA, outperformed all other state-of-the-art models, achieving scores of 98.79%, 98.88%, 98.69%, 96.32%, and 98.77% for accuracy, precision, recall, specificity, and f1-score, respectively.
  • An efficient ensemble-based Machine Learning for breast cancer detection

    Dr Saleti Sumalatha, Mr Ramdas Kapila

    Source Title: Biomedical Signal Processing and Control, Quartile: Q1, DOI Link

    Breast cancer is a very severe type of cancer that often develops in breast cells. Despite substantial advancements in the management of symptomatic breast cancer over the past ten years, an effective predictive model for breast cancer prognosis is urgently needed. Precise prediction will offer numerous advantages, including the ability to diagnose cancer at an early stage and protect patients from needless medical care and related costs. In the medical field, recall is just as important as model accuracy; a model is not very good if its accuracy is high but its recall is low. To boost accuracy while still assigning equal weight to recall, we proposed a model that ensembles Feature Selection (FS), Feature Extraction (FE), and 5 Machine Learning (ML) models. There are three stages in our proposed model. In the first stage, the Correlation Coefficient (CC) and ANOVA (Anv) feature selection methodologies choose the features. In the second stage, Uniform Manifold Approximation and Projection (UMAP), t-distributed Stochastic Neighbour Embedding (t-SNE), and Principal Component Analysis (PCA) extract the features without compromising crucial information. In the last stage, the 5 ML models and ensemble models such as the Voting Classifier (VC) and Stacking Classifier (SC) predict the disease from the selected and extracted features. The results show that the proposed model CC-Anv with PCA using a SC outperformed all existing methodologies with 100% accuracy, precision, recall, and f1-score.
  • A Comparative Analysis of the Evolution of DNA Sequencing Techniques along with the Accuracy Prediction of a Sample DNA Sequence Dataset using Machine Learning

    Dr Saleti Sumalatha, Khizar Baig Mohammed, Sai Venkat Boyapati, Manasa Datta Kandimalla, Madhu Babu Kavati

    Source Title: 2023 2nd International Conference on Paradigm Shifts in Communications Embedded Systems, Machine Learning and Signal Processing (PCEMS), DOI Link

    DNA is widely considered the blueprint of life: the instructions required for all life forms to evolve, breed, and thrive are found in DNA. Deoxyribonucleic acid (DNA) is a biological macromolecule and among the most essential chemicals in living cells. Sequencing of DNA has progressed exponentially due to the immense increase in data production in today's world. In this paper, we evaluate the evolution of DNA sequencing methods and perform a comparative analysis of modern-day DNA sequencing techniques against those of the past. We also illuminate the potential of machine learning in this domain by taking an exploratory approach and predicting labels for a sample DNA sequence dataset using a Multinomial Naive Bayes classifier.
  • Exploring Patterns and Correlations Between Cryptocurrencies and Forecasting Crypto Prices Using Influential Tweets

    Dr Saleti Sumalatha, Mohit Kumar, Gurram Sahithi Priya, Praneeth Gadipudi, Ishita Agarwal

    Source Title: Communications in Computer and Information Science, Quartile: Q3, DOI Link

    The crypto market, as we know, is full of various kinds of investors and influencers. We all know the pizza incident in 2010, where a guy purchased two pizzas for 10000 BTC, worth nearly 80 million in current times. That describes how much the market has progressed in these 10–12 years. You can see drastic changes in the price of several coins in the past few years, which brings many new investors into this market. The crypto market has highly volatile currencies: Bitcoin was around 5K INR in 2013, and by 2021 it had reached 48 lakh INR, which shows how volatile the market is. The dataset provides many fascinating and valuable insights that help us gather practical knowledge. As data scientists, we are very keen to understand such a market, whose data is unstable, keeps changing frequently, and forms new patterns with time. This emergence of new patterns makes the problem an interesting one and motivates us to find valuable information. Through this manuscript, we analyze two specific crypto coins over a particular period, covering more than 2900 records. We found several interesting patterns in the dataset and explored the historical return using several statistical models. We plotted the opening and closing prices of each coin using NumPy, SciPy, and Matplotlib. We also predicted the price of the specific currency, plotted the predicted price line against the actual price line, and examined the differences between the prediction model and the fundamental price model. To do so, we used the Simple Exponential Smoothing (SES) model and performed sentiment analysis based on influential tweets on Twitter, which makes our prediction more accurate and reliable than existing techniques. Lastly, we used a linear regression model to establish the relationship between the returns of Ripple and Bitcoin.
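Simple Exponential Smoothing, mentioned in the abstract, forecasts by recursively blending each new observation with the previous smoothed level. A minimal sketch of the recursion (illustrative values, not the paper's data; libraries such as statsmodels provide a full implementation):

```python
def ses(series, alpha):
    """Simple Exponential Smoothing: level_t = alpha*x_t + (1-alpha)*level_{t-1}.

    Returns the final smoothed level, which serves as the
    one-step-ahead forecast for the next period.
    """
    level = series[0]  # initialize the level at the first observation
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

# alpha close to 1 tracks recent prices; close to 0 smooths heavily
print(ses([10.0, 12.0, 11.0, 13.0], 0.5))  # 12.0
```

A larger smoothing factor `alpha` makes the forecast react faster to recent price swings, which matters in a market as volatile as the one described.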
  • Incremental mining of high utility sequential patterns using MapReduce paradigm

    Dr Saleti Sumalatha

    Source Title: Cluster Computing, Quartile: Q1, DOI Link


    High utility sequential pattern (HUSP) mining considers the nonbinary frequency values of items purchased in a transaction and the utility of each item. Incremental updates are very common in many real-world applications, and mining the high utility sequences by rerunning the algorithm every time the data grows is not a simple task. Moreover, the centralized algorithms for mining HUSPs incrementally cannot handle big data. Hence, an incremental algorithm for high utility sequential pattern mining using the MapReduce paradigm (MR-INCHUSP) is introduced in this paper. The proposed algorithm includes a backward mining strategy that profoundly exploits the knowledge acquired from past mining results. Further, elicited from the co-occurrence relation between items, novel sequence extension rules are introduced to increase the speed of the mining process. The experimental results exhibit the performance of MR-INCHUSP on several real and synthetic datasets.
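The MapReduce structure that MR-INCHUSP builds on can be illustrated in miniature: mappers emit (pattern, utility) pairs from their data partitions, and reducers sum the utilities before thresholding. This is only a toy sketch of the paradigm, not the paper's algorithm; the patterns, utilities, and threshold are invented.

```python
from collections import defaultdict

def map_phase(partition):
    # Each mapper emits (pattern, utility) pairs found in its partition.
    return [(pattern, utility) for pattern, utility in partition]

def reduce_phase(emitted):
    # The reducer sums utilities for each pattern across all partitions.
    totals = defaultdict(int)
    for pattern, utility in emitted:
        totals[pattern] += utility
    return dict(totals)

# Two data partitions of (sequence pattern, utility) pairs (illustrative)
p1 = [("<a,b>", 30), ("<a,c>", 12)]
p2 = [("<a,b>", 25), ("<b,c>", 9)]
totals = reduce_phase(map_phase(p1) + map_phase(p2))

min_utility = 20   # invented threshold
high_utility = {p: u for p, u in totals.items() if u >= min_utility}
```

In the incremental setting, the point of strategies like backward mining is to reuse such aggregated results rather than recomputing them when new transactions arrive.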
  • Ontology Based Food Recommendation

    Dr T Jaya Lakshmi, Dr Saleti Sumalatha, Rohit Chivukula., Kandula Lohith Ranganadha Reddy

    Source Title: Smart Innovation, Systems and Technologies, Quartile: Q4, DOI Link


    Eating right is the most crucial aspect of healthy living. A nutritious, balanced diet helps our bodies fight off diseases. Many lifestyle-related diseases, such as diabetes and thyroid disorders, can often be avoided through active living and better nutrition, so having diet-related knowledge is essential for all. With this motivation, an ontology for the food domain is discussed and developed in this work. The aim of this work is to create an ontology model in the food domain to help people get the right recommendations about food, based on their health conditions, if any.
  • Constraint Pushing Multi-threshold Framework for High Utility Time Interval Sequential Pattern Mining

    Dr Saleti Sumalatha, N Naga Sahithya., K Rasagna., K Hemalatha., B Sai Charan., P V Karthik Upendra

    Source Title: Communications in Computer and Information Science, Quartile: Q3, DOI Link


    This paper aims to detect high utility sequential patterns including time intervals and multiple utility thresholds. Many algorithms mine sequential patterns while considering the utility factor; these can find the order between the items purchased, but they exclude the time interval among items. Further, they use a single utility threshold for every item in the dataset, which is not convincing because it assigns equal importance to all items. The time interval between items plays a vital role in forecasting valuable real-world situations such as the retail sector and market basket data analysis. Recently, the UIPrefixSpan algorithm was introduced to mine sequential patterns including utility and time intervals. Nevertheless, it considers only a single minimum utility threshold, assuming the same unit profit for each item. Hence, to solve the aforementioned issues, in the current work we propose the UIPrefixSpan-MMU algorithm, which utilizes a pattern growth approach and four time constraints. Experiments on real datasets prove that UIPrefixSpan-MMU is more efficient and scales linearly when generating time interval sequences with high utility.
  • Mining Spatio-Temporal Sequential Patterns Using MapReduce Approach

    Dr Saleti Sumalatha, P Radha Krishna., D Jaswanth Reddy

    Source Title: Communications in Computer and Information Science, Quartile: Q3, DOI Link


    Spatio-temporal sequential pattern mining (STSPM) plays an important role in many applications such as mobile health, criminology, social media, solar events, and transportation. Most current studies assume the data is located in a centralized database on which a single machine performs the mining. Thus, the existing centralized algorithms are not suitable for the big data environment, where the data cannot be handled by a single machine. In this paper, our main aim is to find spatio-temporal sequential patterns in event data using a distributed framework suitable for mining big data. We propose two distributed algorithms, namely MR-STBFM (MapReduce based spatio-temporal breadth first miner) and MR-SPTreeSTBFM (MapReduce based sequential pattern tree spatio-temporal breadth first miner), for mining spatio-temporal sequential patterns using the Hadoop MapReduce framework. MR-SPTreeSTBFM, an extension of MR-STBFM, uses a spatio-temporal tree structure to reduce the candidate generation cost; the tree structure significantly improves the performance of the proposed approach. A top-most significant pattern approach has also been proposed to mine the top-most significant sequential patterns. Experiments on the Boston crime dataset evaluate the performance of the proposed algorithms.
  • Mining High Utility Time Interval Sequences Using MapReduce Approach: Multiple Utility Framework

    Dr T Jaya Lakshmi, Dr Saleti Sumalatha, Mohd Wazih Ahmad

    Source Title: IEEE Access, Quartile: Q1, DOI Link


    Mining high utility sequential patterns is a significant research area in data mining. Several methods mine sequential patterns while taking utility values into consideration. Patterns of this type can determine the order in which items were purchased, but not the time interval between them. The time interval among items is important for predicting useful real-world circumstances, including retail market basket data analysis, stock market fluctuations, DNA sequence analysis, and so on. Very few algorithms for mining sequential patterns consider both the utility and the time interval, and those that do assume the same threshold for each item, maintaining the same unit profit. Moreover, with the rapid growth in data, the traditional algorithms cannot handle big data and are not scalable. To handle this problem, we propose a distributed three-phase MapReduce framework that considers multiple utilities and is suitable for handling big data. The time constraints are pushed into the algorithm instead of using pre-defined intervals, and the proposed upper bound minimizes the number of candidate patterns during the mining process. The approach has been tested, and the experimental results show its efficiency in terms of run time, memory utilization, and scalability.
  • A Comparison of Various Class Balancing and Dimensionality Reduction Techniques on Customer Churn Prediction

    Dr Saleti Sumalatha, Sri Phani Bhushan Mada., Nandini Thimmireddygari., Rahul Tata., Snehatejasree Reddy Thondapu

    Source Title: 2022 IEEE 7th International Conference on Recent Advances and Innovations in Engineering, DOI Link


    With the advancement of technology, companies can foresee well in advance which customers are going to leave their organization. This problem of customer churn prediction is handled in the current work. Real-world data is not balanced: it has more observations for a few classes and fewer observations for the others, yet giving equal importance to each class is significant for building an efficient prediction model. Moreover, real-world data contains many attributes, meaning that its dimensionality is high. In this paper, we discuss three data balancing techniques and two methods of dimensionality reduction, i.e., feature selection and feature extraction. Further, selecting the best machine learning model for churn prediction is an important issue, which is also dealt with in this paper. We aim to improve the efficiency of customer churn prediction by evaluating various class balancing and dimensionality reduction techniques, and we evaluate the performance of the models using AUC curves and K-fold cross-validation.
  • Distributed Mining of High Utility Time Interval Sequential Patterns with Multiple Minimum Utility Thresholds

    Dr T Jaya Lakshmi, Dr Saleti Sumalatha, Thirumalaisamy Ragunathan

    Source Title: Lecture Notes in Computer Science, Quartile: Q3, DOI Link


    The problem of mining high utility time interval sequential patterns with multiple utility thresholds in a distributed environment is considered. Mining high utility sequential patterns (HUSP) is an emerging issue, and the existing HUSP algorithms can mine the order of items but do not consider the time interval between successive items. In real-world applications, time interval patterns provide more useful information than conventional HUSPs. Recently, we proposed the distributed high utility time interval sequential pattern mining (DHUTISP) algorithm using MapReduce in support of the big data environment. That algorithm was designed around a single minimum utility threshold, and it is not convincing to use the same utility threshold for all the items in a sequence, as doing so gives every item the same importance. Hence, in this paper, a new distributed framework is proposed to efficiently mine high utility time interval sequential patterns with multiple minimum utility thresholds (DHUTISP-MMU) using the MapReduce approach. The experimental results show that the proposed approach can efficiently mine HUTISPs under multiple minimum utility thresholds.
  • Distributed Mining of High Utility Sequential Patterns with Negative Item Values

    Dr Saleti Sumalatha, Dr Manojkumar V, Akhileshwar Reddy

    Source Title: International Journal of Advanced Computer Science and Applications, Quartile: Q3, DOI Link


    Sequential pattern mining is widely used to solve various business problems, including frequent user click patterns, customer buying analysis, gene microarray data analysis, etc. Many studies continue to extract insightful information from such patterns, but most have concentrated on high utility sequential pattern (HUSP) mining with positive values and without a distributed approach. The existing solutions are centralized, which incurs greater computation and communication costs. In this paper, we introduce a novel algorithm for mining HUSPs that includes negative item values and supports a distributed approach. We use Hadoop MapReduce algorithms to process the data in parallel, and we propose various pruning techniques to minimize the search space in a distributed environment, thus reducing the expense of processing. To our knowledge, no algorithm has been proposed to mine high utility sequential patterns with negative item values in a distributed environment. So, we design a novel algorithm called DHUSP-N (Distributed High Utility Sequential Pattern mining with Negative values). DHUSP-N can mine high utility sequential patterns considering negative item utilities from big data.

Patents

  • A system and method for detection and mitigation of cyber threats in social networking platforms

    Dr M Krishna Siva Prasad, Dr Elakkiya E, Dr Saleti Sumalatha

    Patent Application No: 202441036235, Date Filed: 07/05/2024, Date Published: 17/05/2024, Status: Published

  • A system and a method for an essay grading system

    Dr Saleti Sumalatha

    Patent Application No: 202241043045, Date Filed: 27/07/2022, Date Published: 19/08/2022, Status: Granted

  • System and Method for Mining of Constraint Based High Utility Time Interval Sequential Patterns

    Dr Saleti Sumalatha

    Patent Application No: 202241044001, Date Filed: 01/08/2022, Date Published: 19/08/2022, Status: Published

  • A system and a method for privacy-preserving disease prediction using a federated learning technique

    Dr Saleti Sumalatha

    Patent Application No: 202341076138, Date Filed: 08/11/2023, Date Published: 15/12/2023, Status: Published

  • System and method for predicting customer churn using random leaf model

    Dr Saleti Sumalatha

    Patent Application No: 202441036236, Date Filed: 07/05/2024, Date Published: 17/05/2024, Status: Published

  • An error detection system for manufacturing process and a method thereof

    Dr Saleti Sumalatha

    Patent Application No: 202441069636, Date Filed: 14/09/2024, Date Published: 15/11/2024, Status: Published

  • A method and system for disease prediction using machine learning models during medical diagnoses of patients

    Dr Saleti Sumalatha

    Patent Application No: 202441032199, Date Filed: 23/04/2024, Date Published: 26/04/2024, Status: Published

Projects

Scholars

Post-Doctoral Scholars

  • Dr Mohamad Mulham Belal

Doctoral Scholars

  • Mr Ramdas Kapila
  • Ms A Sai Sunanda

Interests

  • Artificial Intelligence
  • Data Science
  • Distributed Computing
  • Machine Learning

Thought Leaderships

There are no Thought Leaderships associated with this faculty.

Publications
  • Federated learning-based disease prediction: A fusion approach with feature selection and extraction

    Dr Saleti Sumalatha, Mr Ramdas Kapila

    Source Title: Biomedical Signal Processing and Control, Quartile: Q1, DOI Link


    The ability to predict diseases is critical in healthcare for early intervention and better patient outcomes. Data security and privacy are significant concerns when classified medical data from several institutions is analyzed. Cooperative model training provided by Federated Learning (FL) preserves data privacy. In this study, we offer a fusion strategy for illness prediction, combining FL with ANOVA and Chi-Square feature selection (FS) and Linear Discriminant Analysis (LDA) feature extraction (FE) techniques. This research aims to use FS and FE techniques to improve prediction performance while retaining the beneficial aspects of FL. A comprehensive analysis of the distributed data is ensured by updating aggregate models with information from all participating institutions. Through collaboration, a robust disease prediction system exceeds the limited possibilities of individual datasets. We assessed the fusion strategy on the Cleveland heart disease and diabetes datasets from the UCI repository. Compared with standalone FL or conventional ML techniques, the fusion strategy improves prediction performance. Our proposed models, Chi-Square with LDA and ANOVA with LDA leveraging FL, exhibited exceptional performance on the diabetes dataset, achieving identical accuracy, precision, recall, and f1-score of 92.3%, 94.36%, 94.36%, and 94.36%, respectively. Similarly, on the Cleveland heart disease dataset, these models achieved accuracy, precision, recall, and f1-score of 88.52%, 87.87%, 90.62%, and 89.23%, respectively. The results have the potential to revolutionize disease prediction, maintain privacy, advance healthcare, and outperform state-of-the-art models.
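The collaborative model training that FL provides is typically a FedAvg-style weighted average of client model parameters. The sketch below illustrates that aggregation step only, under invented parameter vectors and sample counts; it is not the paper's implementation.

```python
def fed_avg(client_weights, client_sizes):
    """Aggregate per-client parameter vectors by a weighted average,
    where each client's weight is its share of the total samples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_w = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        for i in range(dim):
            global_w[i] += w[i] * (n / total)
    return global_w

# Two hypothetical hospitals with different data volumes (illustrative)
w = fed_avg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[100, 300])
```

Each institution trains locally on its own records and only these parameter vectors travel to the aggregator, which is what keeps the raw medical data private.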
  • Student Placement Chance Prediction Model using Machine Learning Techniques

    Dr Saleti Sumalatha, Manoj Manike., Priyanshu Singh., Purna Sai Madala., Steve Abraham Varghese

    Source Title: 2021 5th Conference on Information and Communication Technology, DOI Link


    Obtaining employment upon graduation from university is one of the highest, if not the highest, priorities for students and young adults. A system that helps these individuals obtain placement advice, analyzes labor market trends, and assists educational institutions in assessing growing fields and opportunities would provide immense value. With the emergence of heavily refined data mining techniques and machine learning boilerplates, a model based on predictive analysis can help estimate a variety of realistic placement metrics, such as the types of companies a junior-year student can be placed in, or the companies likely to look for a student's specific skill set. Attributes such as academic results, technical skills, training experiences, and projects can help for prediction purposes. We used the XGBoost technique, a structured (tabular) data-focused approach that has recently dominated applied machine learning and Kaggle competitions; XGBoost is a high-speed, high-performance implementation of gradient boosted decision trees. We created a model and ran numerous EDAs to determine whether a student will be placed, as well as the type of organization: Day Sharing, Dream, Super Dream, or Marquee.
  • Assessing Performance Across Various Machine Learning Algorithms with Integrated Feature Selection for Fetal Heart Classification

    Dr Saleti Sumalatha, Laura Rizka Amanda., Mila Desi Anasanti., Thunakala Bala Kokil

    Source Title: International Journal of Artificial Intelligence Research, DOI Link

  • Optimizing Predictive Models for Parkinson’s Disease Diagnosis

    Dr Saleti Sumalatha, Mr Ramdas Kapila

    Source Title: Intelligent Technologies and Parkinson’s Disease: Prediction and Diagnosis, DOI Link

  • Enhancing Forecasting Accuracy with a Moving Average-Integrated Hybrid ARIMA-LSTM Model

    Dr Saleti Sumalatha, Panchumarthi L Y., Kallam Y R., Parchuri L., Jitte S

    Source Title: SN Computer Science, Quartile: Q1, DOI Link


    This research presents a hybrid time series forecasting model that combines Long Short-Term Memory (LSTM) and Autoregressive Integrated Moving Average (ARIMA) models with moving averages. LSTM models are utilized for modelling stationary time series, while ARIMA models are used for non-stationary time series. ARIMA models are superior at catching short-term relationships in time series data, while LSTM models are better suited to capturing long-term dependencies. The hybrid model combines ARIMA's short-term dependency modelling with LSTM's long-term dependency modelling, a combination that yields more accurate predictions for both stationary and non-stationary time series data. Nine moving averages, including the Triple Exponential Moving Average (TEMA), Weighted Moving Average (WMA), Simple Moving Average (SMA), Kaufman Adaptive Moving Average (KAMA), MIDPOINT, and MIDPRICE, were examined individually to determine which gives the hybrid model the most precision. The study compares the hybrid model's forecasting performance to that of standalone ARIMA and LSTM models, as well as other prominent forecasting approaches such as linear regression and random forest. The findings indicate that the hybrid model surpasses the individual models and other benchmark methods, achieving greater precision in terms of mean absolute percentage error (MAPE) and root mean squared error (RMSE). The research also investigates the impact of different hyperparameters and model configurations on forecasting performance, giving insight into the ideal settings for the hybrid model. Overall, the proposed ARIMA-LSTM hybrid model with moving averages is a promising approach for accurate and reliable stock price forecasting, with practical implications for financial decision-making and risk management.
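Of the moving averages the study compares, SMA and WMA have especially compact definitions; the sketch below shows both, with an invented price series and window (not data from the study).

```python
def sma(prices, window):
    # Simple Moving Average: unweighted mean of the last `window` prices.
    return sum(prices[-window:]) / window

def wma(prices, window):
    # Weighted Moving Average: linearly increasing weights, newest price
    # weighted highest.
    recent = prices[-window:]
    weights = range(1, window + 1)
    return sum(p * w for p, w in zip(recent, weights)) / sum(weights)

prices = [10.0, 11.0, 12.0, 13.0]   # illustrative closing prices
s = sma(prices, 3)                  # (11 + 12 + 13) / 3
w = wma(prices, 3)                  # (11*1 + 12*2 + 13*3) / 6
```

In a rising series the WMA sits above the SMA because it leans toward the most recent prices, which is why adaptive variants such as KAMA are often preferred for volatile data.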
  • Enhancing the Accuracy of Manufacturing Process Error Detection Through SMOTE-Based Oversampling Using Machine Learning and Deep Learning Techniques

    Dr Saleti Sumalatha, Boyapati S V., Rakshitha G B S., Reddy M R

    Source Title: International Conference on Integrated Circuits, Communication, and Computing Systems, ICIC3S 2024 - Proceedings, DOI Link


    A production competency study raises the strategic emphasis of the manufacturing sectors. Developing semiconductor materials is a highly complex process that necessitates numerous evaluations, and the significance of product quality cannot be overemphasized. We put forward a number of methods for automatically creating a prognostic model that is effective at identifying equipment flaws throughout the wafer fabrication process for semiconductor materials. The SECOM dataset is representative of semiconductor production procedures that undergo numerous tests. Because the dataset is imbalanced, our proposed methodology incorporates SMOTE (Synthetic Minority Over-sampling Technique) to mitigate the imbalance of the training dataset by levelling off unbalanced attributes. Detecting faults in the manufacturing process improves semiconductor quality and testing efficiency; both machine learning and deep learning algorithms are validated by collecting performance metrics during the development process. Our research also highlights an effort to cut down the training time for testing.
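SMOTE's core step interpolates between a minority-class point and one of its minority-class neighbours. The sketch below shows just that interpolation, with invented feature vectors and a fixed gap for determinism; a real SMOTE picks the neighbour via k-nearest neighbours and draws the gap at random.

```python
import random

def smote_sample(x, neighbor, gap=None, rng=random):
    """Create one synthetic minority sample on the line segment between a
    minority point and one of its minority-class nearest neighbours:
    x_new = x + gap * (neighbor - x), with gap drawn from [0, 1)."""
    if gap is None:
        gap = rng.random()
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]

# Two minority-class feature vectors (illustrative only)
synthetic = smote_sample([1.0, 2.0], [3.0, 6.0], gap=0.5)
```

Repeating this for each minority sample levels the class proportions without simply duplicating existing rows, which is what makes SMOTE less prone to overfitting than plain oversampling.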
  • Comparative Analysis of Optimization Algorithms for Feature Selection in Heart Disease Classification

    Dr Saleti Sumalatha, Mr Ramdas Kapila

    Source Title: Lecture notes in networks and systems, DOI Link


    In order to classify the Cleveland Heart Disease dataset, this study evaluates the performance of three optimization methods, namely Fruit Fly Optimization (FFO), Particle Swarm Optimization (PSO), and Grey Wolf Optimizer (GWO). The top 10 features are identified using FFO, with remarkable results for accuracy (88.52%), precision (87.88%), recall (90.63%), and f1-score (89.23%). With PSO, the reported accuracy, precision, recall, and f1-score are 85.25%, 87.10%, 84.38%, and 85.71%, respectively. Finally, GWO results in precision, accuracy, recall, specificity, and f1-score values of 93.33%, 90.16%, 90.11%, 87.5%, and 90.32%, respectively, highlighting its consistently superior performance. A comparative examination shows FFO delivering competitive outcomes with notable accuracy and recall, while PSO displays comparable precision and recall with somewhat lower accuracy. In contrast, GWO performs better than both FFO and PSO, displaying great accuracy and precision along with remarkable recall and specificity. These results provide important information on the effectiveness of the feature selection methods utilized in optimization algorithms for heart disease classification. The study also highlights the need for further investigation into the potential of these optimization algorithms in other fields, broadening their use beyond disease classification. Such work might advance the field of feature selection and aid in the creation of better classification models.
  • A Survey on Occupancy-Based Pattern Mining

    Dr Saleti Sumalatha, Inaganti Bhavana

    Source Title: Lecture notes in networks and systems, DOI Link


    Occupancy-based pattern mining has emerged as a significant research topic in recent times. This paper presents a comprehensive survey on occupancy, which serves as a measure to augment the significance of patterns. The survey covers various types of patterns in the context of occupancy, including frequent itemsets, high utility itemsets, frequent sequences, and high utility sequences. Additionally, the paper delves into techniques aimed at reducing the search space in the aforementioned pattern mining problems; these techniques are crucial for improving the efficiency and scalability of the mining process, especially when dealing with large-scale datasets. Furthermore, the paper discusses potential research extensions for occupancy-based pattern mining, which could explore new applications, investigate novel algorithms, or further enhance the effectiveness of occupancy as a measure for pattern evaluation. In general, this survey provides an important resource for researchers interested in understanding and advancing occupancy-based pattern mining techniques.
  • Insights into Gun-Related Deaths: A Comprehensive Machine Learning Analysis

    Dr Saleti Sumalatha, Lovely Yeswanth Panchumarthi., Lavanya Parchuri

    Source Title: 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), DOI Link


    This work employs both supervised and unsupervised machine learning techniques to examine firearm-related fatalities in the US and identify trends, patterns, and risk factors within the data. During the supervised learning phase, techniques such as logistic regression, decision trees, random forests, and neural networks were used to predict the kind of death (suicide, homicide, accidental, or unknown) based on demographic data like sex, age, race, place, and education. Findings show that the neural network and random forest models exhibit promising precision and recall values across several classes, and that they obtained the highest accuracy, reaching 79.88% and 83.59%, respectively. Using clustering techniques including Agglomerative clustering, K-means, and Gaussian mixture models, gun-related fatalities were categorized based on demographic and temporal data during the unsupervised learning stage. The analysis revealed distinct clusters of deaths, providing insights into the varying patterns and trends over time and across demographic groups. The K-means algorithm, with a silhouette score of 0.42, demonstrated meaningful separation among clusters. The research contributes to understanding the complex dynamics of gun-related deaths, shedding light on both individual risk factors and broader trends. However, further analysis could explore additional dimensions of the dataset or delve deeper into the interpretation of clustering results. The study also highlights how crucial it is to take into consideration the moral consequences and constraints of machine learning applications in complex fields like public health.
  • Enhancing Customer Churn Prediction: Advanced Models and Resampling Techniques in Dynamic Business Environments

    Dr Saleti Sumalatha, Yaswanth Chowdary Thotakura., Dinesh Manikanta Yarramsetty., Kalyan Kumar Doppalapudi., Sai Shasank Alaparthi

    Source Title: 2024 International Conference on Intelligent Computing and Emerging Communication Technologies (ICEC), DOI Link


    Customer churn analysis is critical for businesses looking to hold onto market share in today's dynamic business environment. The development of e-Finance presents additional difficulties for the traditional banking sector as the digital marketplace grows. Banks face several challenges, including fintech competition, dwindling client loyalty, and digital transformation. By analyzing probable causes of customer turnover from multiple perspectives and building churn prediction models, bank managers can identify problems, spot potential churn customers early, and develop effective retention strategies based on client traits and preferences. Not only banks but also large corporate sectors such as telecommunications and over-the-top (OTT) platforms face customer churn. This study proposes the Random Leaf Model (RLM) and also explores the Logit Leaf Model (LLM) and a Neural Network Ensemble Model, three sophisticated predictive modeling methodologies. Proactive strategies are necessary in today's competitive marketplaces. The primary problem with current automatic churn prediction algorithms is the substantial gap between majority and minority class proportions in the datasets, which can bias a model in favor of the dominant class. The shortcomings of conventional churn analysis techniques underscore the necessity of advanced, cutting-edge algorithms to achieve precise forecasts.
  • Leveraging ResNet for Efficient ECG Heartbeat Classification

    Dr Saleti Sumalatha, Lovely Yeswanth Panchumarthi., Sriya Padmanabhuni

    Source Title: 2024 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), DOI Link


    This paper provides a novel approach that uses a modified version of the ResNet architecture to classify heartbeats on an electrocardiogram (ECG). The approach comprises six stages: padding, convolution, max pooling, convolutional blocks, average pooling, and fully connected layers. It is tested on the MIT-BIH Arrhythmia Database with five types of heartbeats: normal, supraventricular premature, premature ventricular contraction, fusion of ventricular and normal, and unclassifiable. The outcomes demonstrate that the suggested approach outperforms other current techniques such as LSTM, CNN, and EfficientNet, achieving an accuracy of 98.6%. The model's performance, restrictions, and future directions are also thoroughly examined. The article thereby advances automated ECG heartbeat categorization for cardiac diagnosis using deep learning techniques.
  • Optimizing Recommendation Systems: Analyzing the Impact of Imputation Techniques on Individual and Group Recommendation Systems

    Dr Saleti Sumalatha, Sri Phani Bhushan Mada., Rahul Tata., Sneha Teja Sree Reddy Thondapu

    Source Title: 2024 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), DOI Link


    In today’s world, Recommendation Systems play a significant role in guiding and simplifying the decision-making process for individuals and groups. However, the presence of missing data in user-item interaction matrices poses a challenge to accurately identify user preferences and provide relevant suggestions. This is particularly true for group recommendation systems that cater to multiple users. To address this challenge, we have applied four imputation techniques to individual and group recommendation models, including User-based Collaborative filtering, Matrix factorization using Singular Value Decomposition, and deep learning-based models like Autoencoders. We evaluated the effectiveness of these techniques using root mean squared error and mean absolute error metrics and observed a significant impact on the quality of recommendations. Additionally, we implemented aggregation strategies like Borda count, Additive Utilitarian, Multiplicative Utilitarian, Least Misery, and Most Pleasure for Group Recommendations. We evaluated the performance of these strategies using satisfaction score and disagreement score. Overall, our findings suggest that imputation techniques can significantly improve the quality of recommendations in both individual and group recommendation systems
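Two of the aggregation strategies listed above, Additive Utilitarian and Least Misery, amount to a column-wise sum and minimum over the group's predicted ratings; the sketch below uses an invented rating matrix for illustration.

```python
def additive_utilitarian(ratings):
    # Score each item by the sum of the group members' predicted ratings.
    return [sum(col) for col in zip(*ratings)]

def least_misery(ratings):
    # Score each item by its least satisfied member's rating.
    return [min(col) for col in zip(*ratings)]

# Rows: group members; columns: candidate items (illustrative ratings,
# e.g. after imputation has filled the missing entries)
group = [[4.0, 2.0, 5.0],
         [3.0, 5.0, 1.0]]
add_scores = additive_utilitarian(group)   # [7.0, 7.0, 6.0]
lm_scores = least_misery(group)            # [3.0, 2.0, 1.0]
```

Note how the two strategies can disagree: Additive Utilitarian ties the first two items, while Least Misery clearly prefers the first, which is exactly the trade-off the satisfaction and disagreement scores are meant to capture.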
  • A comparison of various machine learning algorithms and execution of flask deployment on essay grading

    Dr Saleti Sumalatha, Udhika Meghana Kotha, Haveela Gaddam, Deepthi Reddy Siddenki

    Source Title: International Journal of Electrical and Computer Engineering, Quartile: Q3, DOI Link


    Students’ performance can be assessed by grading the answers written by the students during their examination. Currently, students are assessed manually by teachers. This is a cumbersome task due to the increase in the student-teacher ratio. Moreover, due to the coronavirus disease (COVID-19) pandemic, most educational institutions have adopted online teaching and assessment. To measure the learning ability of a student, we need to assess them. The current grading system works well for multiple choice questions, but there is no grading system for evaluating essays. In this paper, we studied different machine learning and natural language processing techniques for automated essay scoring/grading (AES/G). Data imbalance creates a problem in predicting the essay score due to the uneven distribution of essay scores in the training data. We handled this issue using the random oversampling technique, which generates an even distribution of essay scores. Also, we built a web application using Flask and deployed the machine learning models. Subsequently, all the models were evaluated using accuracy, precision, recall, and F1-score. It is found that the random forest algorithm outperformed the other algorithms with an accuracy of 97.67%, precision of 97.62%, recall of 97.67%, and F1-score of 97.58%.
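The random-oversampling step used to balance the essay scores can be sketched with the standard library alone; the essays and scores below are hypothetical placeholders:

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=42):
    """Duplicate minority-class samples at random until every class
    matches the majority-class count (a hypothetical essay-score setting)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(v) for v in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

essays = ["e1", "e2", "e3", "e4", "e5"]
scores = [2, 2, 2, 5, 5]          # imbalanced score distribution
X, y = random_oversample(essays, scores)
```

After oversampling, each score class appears equally often, so a classifier no longer favors the majority score.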
  • Analyzing the Health Data: An Application of High Utility Itemset Mining

    Dr Saleti Sumalatha, Lakshmi Sai Bhargavi G., Shanmukh R., Lokesh T., Sobin C C., Padmavathi K., Tottempudi S S

    Source Title: 2023 International Conference on Advances in Computation, Communication and Information Technology, ICAICCIT 2023, DOI Link


    A data science endeavour called "high utility pattern mining" entails finding important patterns based on different factors like profit, frequency, and weight. High utility itemsets are among the various patterns that have undergone thorough study. These itemsets must exceed a minimum threshold specified by the user. This is particularly useful in practical applications like retail marketing and web services, where items have diverse characteristics. High-utility itemset mining facilitates decision-making by uncovering patterns that have a significant impact. Unlike frequent itemset mining, which identifies commonly occurring itemsets, high-utility itemsets often include rare items in real-world applications. In the medical field, data mining has been employed in various ways. In this context, the primary method involves analyzing a health dataset that spans from 2014 to 2017 in the United States. The dataset includes categories such as diseases, states, and deaths. By examining these categories and mortality rates, we can derive high-utility itemsets that reveal the leading causes of death. In conclusion, high-utility pattern mining is a data science activity that concentrates on spotting significant patterns based on objective standards. It has proven valuable in various fields, including the medical domain, where analyzing datasets can uncover high-utility itemsets related to mortality rates and causes of death.
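The core utility computation behind high-utility itemset mining can be sketched in a few lines; the transactions and the minimum threshold below are hypothetical stand-ins, not the paper's health dataset:

```python
# Each transaction maps item -> utility (e.g., deaths attributed, or profit).
transactions = [
    {"heart_disease": 120, "stroke": 40},
    {"heart_disease": 90, "diabetes": 25, "stroke": 30},
    {"diabetes": 20},
]

def itemset_utility(itemset, transactions):
    """Total utility of an itemset: sum its items' utilities over every
    transaction that contains ALL items of the set."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] for i in itemset)
    return total

def high_utility(itemset, transactions, min_util):
    """An itemset is 'high utility' when its total utility meets the
    user-specified minimum threshold."""
    return itemset_utility(itemset, transactions) >= min_util
```

Note that, unlike frequency, utility is not anti-monotone, which is why real miners need dedicated upper bounds to prune the search space.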
  • An Enhancement in the Efficiency of Disease Prediction Using Feature Extraction and Feature Selection

    Dr Saleti Sumalatha, Mr Ramdas Kapila

    Source Title: Contemporary Applications of Data Fusion for Advanced Healthcare Informatics, DOI Link


    Cardiovascular diseases constitute one of the most dangerous and fatal illnesses. According to statistics, 17.9 million deaths from cardiovascular diseases were reported in 2019. As a result, it is essential to detect the sickness early on to minimize the death rate. To handle data efficiently and precisely forecast the symptoms of illness, data mining and machine learning approaches may be applied. This study employs seven supervised machine learning (ML) techniques to anticipate heart disease. The study's main objectives are the adoption of ML algorithms and an investigation of how feature extraction (FE) and feature selection (FS) methods might increase the effectiveness of ML models. The experimental results indicate that models with feature selection and extraction techniques outperformed the model with the entire feature set of the dataset. As a case study, the authors considered three additional datasets, namely Parkinson's, diabetes, and lung cancer, in addition to the Cleveland Heart Disease dataset. However, the main focus of this study is on predicting heart disease.
  • Optimizing fetal health prediction: Ensemble modeling with fusion of feature selection and extraction techniques for cardiotocography data

    Dr Saleti Sumalatha, Mr Ramdas Kapila

    Source Title: Computational Biology and Chemistry, Quartile: Q2, DOI Link


    Cardiotocography (CTG) captures the fetal heart rate and the timing of uterine contractions. Throughout pregnancy, intelligent CTG categorization is crucial for monitoring fetal health and preserving proper fetal growth and development. Since CTG provides information on the fetal heartbeat and uterine contractions, which helps determine whether the fetus is pathologic, obstetricians frequently use it to evaluate a child’s physical health during pregnancy. In the past, obstetricians have manually analyzed CTG data, which is time-consuming and inaccurate. So, developing a fetal health categorization model is crucial, as it may help speed up diagnosis and treatment and conserve medical resources. The CTG dataset is used in this study. To diagnose the illness, 7 machine learning models are employed, as well as ensemble strategies including voting and stacking classifiers. To choose and extract the most significant and critical attributes from the dataset, Feature Selection (FS) techniques like ANOVA and Chi-square, as well as Feature Extraction (FE) strategies like Principal Component Analysis (PCA) and Independent Component Analysis (ICA), are used. We used the Synthetic Minority Oversampling Technique (SMOTE) to balance the dataset because it is unbalanced. The top 5 models are selected to forecast the illness and are used in ensemble methods such as voting and stacking classifiers. Stacking Classifiers (SC) employ AdaBoost and Random Forest (RF) as meta-classifiers for disease detection. The proposed SC with RF as the meta-classifier, incorporating Chi-square with PCA, outperformed all other state-of-the-art models, achieving scores of 98.79%, 98.88%, 98.69%, 96.32%, and 98.77% for accuracy, precision, recall, specificity, and f1-score respectively.
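The SMOTE idea used here, synthesizing minority samples by interpolating toward nearby minority neighbours, can be sketched in NumPy; this is a bare-bones illustration on toy 2-D points, not the library implementation used in the study:

```python
import numpy as np

def smote_like(X_min, n_new, k=2, seed=0):
    """Generate synthetic minority samples by interpolating each chosen
    sample toward one of its k nearest minority neighbours (SMOTE idea)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]      # skip the point itself
        j = rng.choice(neighbours)
        gap = rng.random()                       # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # hypothetical minority class
X_new = smote_like(X_minority, n_new=4)
```

Because each synthetic point lies on a segment between two real minority samples, SMOTE densifies the minority region rather than merely duplicating points.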
  • An efficient ensemble-based Machine Learning for breast cancer detection

    Dr Saleti Sumalatha, Mr Ramdas Kapila

    Source Title: Biomedical Signal Processing and Control, Quartile: Q1, DOI Link


    Breast cancer is a very severe type of cancer that often develops in breast cells. An effective predictive model for breast cancer prognosis is urgently needed despite substantial advancements in the management of symptomatic breast cancer over the past ten years. Precise prediction offers numerous advantages, including the ability to diagnose cancer at an early stage and to protect patients from needless medical care and related costs. In the medical field, recall is just as important as model accuracy; a model is not very good if its accuracy is high but its recall is low. To boost accuracy while still assigning equal weight to recall, we proposed a model that ensembles Feature Selection (FS), Feature Extraction (FE), and 5 Machine Learning (ML) models. There are three stages in our proposed model. The first stage uses the Correlation Coefficient (CC) and Anova (Anv) feature selection methodologies to choose the features. The second stage applies Uniform Manifold Approximation and Projection (UMAP), t-distributed Stochastic Neighbour Embedding (t-SNE), and Principal Component Analysis (PCA) to extract the features without compromising the crucial information. The last stage predicts the disease with 5 ML models and ensemble models such as the Voting Classifier (VC) and Stacking Classifier (SC) after selecting and extracting features from the dataset. The results show that the proposed model CC-Anv with PCA using a SC outperformed all the existing methodologies with 100% accuracy, precision, recall, and f1-score.
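The hard-voting half of such ensemble stages can be sketched with the standard library; the per-model predictions below are hypothetical (1 = malignant):

```python
from collections import Counter

def majority_vote(predictions):
    """Hard-voting ensemble: each row is one model's predictions;
    the ensemble label per sample is the most common vote."""
    n_samples = len(predictions[0])
    out = []
    for i in range(n_samples):
        votes = [model[i] for model in predictions]
        out.append(Counter(votes).most_common(1)[0][0])
    return out

# Three hypothetical classifiers voting on four samples.
preds = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
print(majority_vote(preds))  # [1, 0, 1, 0]
```

A stacking classifier replaces the vote with a meta-model trained on these per-model predictions, which is the SC variant the abstract reports as strongest.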
  • A Comparative Analysis of the Evolution of DNA Sequencing Techniques along with the Accuracy Prediction of a Sample DNA Sequence Dataset using Machine Learning

    Dr Saleti Sumalatha, Khizar Baig Mohammed, Sai Venkat Boyapati, Manasa Datta Kandimalla, Madhu Babu Kavati

    Source Title: 2023 2nd International Conference on Paradigm Shifts in Communications Embedded Systems, Machine Learning and Signal Processing (PCEMS), DOI Link


    DNA is widely considered the blueprint of life: the instructions required for all life forms to evolve, breed, and thrive are found in it. Deoxyribonucleic acid (DNA) is a biological macromolecule and among the most essential chemicals in living cells. Sequencing of DNA has progressed exponentially due to the immense increase in data production in today's world. By means of this paper, we intend to evaluate the evolution of DNA sequencing methods and perform a comparative analysis of modern-day DNA sequencing techniques against those of the past. We also illuminate the potential of machine learning in this domain by taking an exploratory approach and predicting the DNA sequence using a Multinomial Naive Bayes classifier.
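A common way to feed DNA sequences to a Multinomial Naive Bayes classifier is a bag-of-k-mers representation, sketched below on a toy sequence with an assumed k; this is the usual featurization, not necessarily the paper's exact preprocessing:

```python
from collections import Counter

def kmer_counts(seq, k=3):
    """Bag-of-k-mers featurization: slide a window of length k over the
    sequence and count each substring, yielding the count vector that a
    Multinomial Naive Bayes classifier consumes."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

counts = kmer_counts("ATGCGATG", k=3)
```

Each distinct k-mer becomes one "word" in the multinomial vocabulary, so sequence classification reduces to text classification.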
  • Exploring Patterns and Correlations Between Cryptocurrencies and Forecasting Crypto Prices Using Influential Tweets

    Dr Saleti Sumalatha, Mohit Kumar, Gurram Sahithi Priya, Praneeth Gadipudi, Ishita Agarwal

    Source Title: Communications in Computer and Information Science, Quartile: Q3, DOI Link


    The crypto market, as we know, is a market full of various kinds of investors and influencers. We all know the pizza incident in 2010, where a man purchased two pizzas for 10,000 BTC, worth nearly 80 million in current times. That describes how much the market has progressed in these 10–12 years. Drastic changes in the prices of several coins in the past few years have brought many new investors into this market. The crypto market has highly volatile currencies: Bitcoin was around 5K INR in 2013, and by 2021 it reached 48 lakh INR, which shows how volatile the market is. The dataset provides many fascinating and valuable insights that help us gather practical knowledge. As data scientists, we are keen to understand a market whose data is unstable, changes frequently, and forms new patterns with time. This emergence of new patterns makes the problem an interesting one and motivates us to find valuable information. Through this manuscript, we analyze two specific crypto coins over a particular period, covering more than 2900 records. We found several interesting patterns in the dataset and explored the historical return using several statistical models. We plotted the opening and closing prices of each coin using NumPy, SciPy, and Matplotlib. We also predicted the price of the specific currency, plotted the predicted price line against the actual price line, and examined the difference between the prediction model and the fundamental price model. To do so, we used the Simple Exponential Smoothing (SES) model and performed sentiment analysis based on influential tweets on Twitter, which makes our prediction more accurate and reliable than existing techniques. Lastly, we used a linear regression model to establish the relationship between the returns of Ripple and Bitcoin.
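The Simple Exponential Smoothing (SES) model mentioned above reduces to a single recurrence; here is a minimal sketch with hypothetical prices and an assumed smoothing factor:

```python
def simple_exp_smoothing(series, alpha=0.5):
    """Simple Exponential Smoothing: level_t = alpha*y_t + (1-alpha)*level_{t-1}.
    The one-step-ahead forecast is the final level."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

prices = [100.0, 102.0, 101.0, 105.0]   # hypothetical closing prices
forecast = simple_exp_smoothing(prices)
```

Larger `alpha` weights recent observations more heavily, which suits a fast-moving market; the study layers tweet sentiment on top of this baseline.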
  • Incremental mining of high utility sequential patterns using MapReduce paradigm

    Dr Saleti Sumalatha

    Source Title: Cluster Computing, Quartile: Q1, DOI Link


    High utility sequential pattern (HUSP) mining considers the nonbinary frequency values of items purchased in a transaction and the utility of each item. Incremental updates are very common in many real-world applications. Mining the high utility sequences by rerunning the algorithm every time the data grows is not a simple task. Moreover, the centralized algorithms for mining HUSPs incrementally cannot handle big data. Hence, an incremental algorithm for high utility sequential pattern mining using the MapReduce paradigm (MR-INCHUSP) has been introduced in this paper. The proposed algorithm includes the backward mining strategy that profoundly handles the knowledge acquired from past mining results. Further, elicited from the co-occurrence relation between the items, novel sequence extension rules have been introduced to increase the speed of the mining process. The experimental results exhibit the performance of MR-INCHUSP on several real and synthetic datasets.
  • Ontology Based Food Recommendation

    Dr T Jaya Lakshmi, Dr Saleti Sumalatha, Rohit Chivukula, Kandula Lohith Ranganadha Reddy

    Source Title: Smart Innovation, Systems and Technologies, Quartile: Q4, DOI Link


    Eating right is the most crucial aspect of healthy living. A nutritious, balanced diet helps our bodies fight off diseases. Many lifestyle-related diseases, such as diabetes and thyroid disorders, can often be avoided by active living and better nutrition. Diet-related knowledge is essential for all. With this motivation, an ontology for the food domain is discussed and developed in this work. The aim of this work is to create an ontology model in the food domain to help people get the right recommendations about food, based on their health conditions if any.
  • Constraint Pushing Multi-threshold Framework for High Utility Time Interval Sequential Pattern Mining

    Dr Saleti Sumalatha, N Naga Sahithya., K Rasagna., K Hemalatha., B Sai Charan., P V Karthik Upendra

    Source Title: Communications in Computer and Information Science, Quartile: Q3, DOI Link


    This paper aims to detect high utility sequential patterns including time intervals and multiple utility thresholds. Many algorithms mine sequential patterns considering the utility factor; these can find the order between the items purchased, but they exclude the time interval among items. Further, they consider only the same utility threshold for each item present in the dataset, and it is not convincing to assign equal importance to all the items. The time interval between items plays a vital role in forecasting valuable real-world situations such as the retail sector, market basket data analysis, etc. Recently, the UIPrefixSpan algorithm was introduced to mine sequential patterns including utility and time intervals. Nevertheless, it considers only a single minimum utility threshold, assuming the same unit profit for each item. Hence, to solve the aforementioned issues, in the current work we propose the UIPrefixSpan-MMU algorithm, which utilizes a pattern growth approach and four time constraints. The experiments done on real datasets prove that UIPrefixSpan-MMU is more efficient and linearly scalable for generating the time interval sequences with high utility.
  • Mining Spatio-Temporal Sequential Patterns Using MapReduce Approach

    Dr Saleti Sumalatha, P Radha Krishna, D Jaswanth Reddy

    Source Title: Communications in Computer and Information Science, Quartile: Q3, DOI Link


    Spatio-temporal sequential pattern mining (STSPM) plays an important role in many applications such as mobile health, criminology, social media, solar events, transportation, etc. Most of the current studies assume the data is located in a centralized database on which a single machine performs mining. Thus, the existing centralized algorithms are not suitable for the big data environment, where data cannot be handled by a single machine. In this paper, our main aim is to find spatio-temporal sequential patterns from the event dataset using a distributed framework suitable for mining big data. We propose two distributed algorithms, namely MR-STBFM (MapReduce based spatio-temporal breadth first miner) and MR-SPTreeSTBFM (MapReduce based sequential pattern tree spatio-temporal breadth first miner), for mining spatio-temporal sequential patterns using the Hadoop MapReduce framework. MR-SPTreeSTBFM, an extension of MR-STBFM, uses a spatio-temporal tree structure to reduce the candidate generation cost. The tree structure significantly improves the performance of the proposed approach. Also, an approach has been proposed to mine the top-most significant sequential patterns. Experiments are conducted to evaluate the performance of the proposed algorithms on the Boston crime dataset.
  • Mining High Utility Time Interval Sequences Using MapReduce Approach: Multiple Utility Framework

    Dr T Jaya Lakshmi, Dr Saleti Sumalatha, Mohd Wazih Ahmad

    Source Title: IEEE Access, Quartile: Q1, DOI Link


    Mining high utility sequential patterns is observed to be a significant research area in data mining. Several methods mine sequential patterns while taking utility values into consideration. Patterns of this type can determine the order in which items were purchased, but not the time interval between them. The time interval among items is important for predicting the most useful real-world circumstances, including retail market basket data analysis, stock market fluctuations, DNA sequence analysis, and so on. There are very few algorithms for mining sequential patterns that consider both the utility and the time interval. However, they assume the same threshold for each item, maintaining the same unit profit. Moreover, with the rapid growth in data, the traditional algorithms cannot handle big data and are not scalable. To handle this problem, we propose a distributed three-phase MapReduce framework that considers multiple utilities and is suitable for handling big data. The time constraints are pushed into the algorithm instead of using pre-defined intervals. Also, the proposed upper bound minimizes the number of candidate patterns during the mining process. The approach has been tested and the experimental results show its efficiency in terms of run time, memory utilization, and scalability.
  • A Comparison of Various Class Balancing and Dimensionality Reduction Techniques on Customer Churn Prediction

    Dr Saleti Sumalatha, Sri Phani Bhushan Mada, Nandini Thimmireddygari, Rahul Tata, Snehatejasree Reddy Thondapu

    Source Title: 2022 IEEE 7th International Conference on Recent Advances and Innovations in Engineering, DOI Link


    With the advancement of technology, companies are able to foresee, well in advance, the customers who are going to leave their organization. This problem of customer churn prediction is handled in the current work. In the real world, data is not balanced: there are more observations for a few classes and fewer for the others. But giving equal importance to each class is really significant for building an efficient prediction model. Moreover, real-world data contains many attributes, meaning that the dimensionality is high. In the current paper, we discuss three data balancing techniques and two methods of dimensionality reduction, i.e. feature selection and feature extraction. Further, selecting the best machine learning model for churn prediction is an important issue, which has been dealt with in the current paper. We aim to improve the efficiency of customer churn prediction by evaluating various class balancing and dimensionality reduction techniques, and we evaluated the performance of the models using AUC curves and K-fold cross-validation.
  • Distributed Mining of High Utility Time Interval Sequential Patterns with Multiple Minimum Utility Thresholds

    Dr T Jaya Lakshmi, Dr Saleti Sumalatha, Thirumalaisamy Ragunathan

    Source Title: Lecture Notes in Computer Science, Quartile: Q3, DOI Link


    The problem of mining high utility time interval sequential patterns with multiple utility thresholds in a distributed environment is considered. Mining high utility sequential patterns (HUSP) is an emerging issue and the existing HUSP algorithms can mine the order of items and they do not consider the time interval between the successive items. In real-world applications, time interval patterns provide more useful information than the conventional HUSPs. Recently, we proposed distributed high utility time interval sequential pattern mining (DHUTISP) algorithm using MapReduce in support of the BigData environment. The algorithm has been designed considering a single minimum utility threshold. It is not convincing to use the same utility threshold for all the items in the sequence, which means that all the items are given the same importance. Hence, in this paper, a new distributed framework is proposed to efficiently mine high utility time interval sequential patterns with multiple minimum utility thresholds (DHUTISP-MMU) using the MapReduce approach. The experimental results show that the proposed approach can efficiently mine HUTISPs with multiple minimum utility thresholds.
  • Distributed Mining of High Utility Sequential Patterns with Negative Item Values

    Dr Saleti Sumalatha, Dr Manojkumar V, Akhileshwar Reddy

    Source Title: International Journal of Advanced Computer Science and Applications, Quartile: Q3, DOI Link


    Sequential pattern mining is widely used to solve various business problems, including frequent user click patterns, customer product-purchase analysis, gene microarray data analysis, etc. Many studies on such pattern mining aim to extract insightful data. Most of them have concentrated on high utility sequential pattern mining (HUSP) with positive values and without a distributed approach. The existing solutions are centralized, which incurs greater computation and communication costs. In this paper, we introduce a novel algorithm for mining HUSPs including negative item values with a distributed approach. We use Hadoop MapReduce algorithms for processing the data in parallel. Various pruning techniques have been proposed to minimize the search space in a distributed environment, thus reducing the expense of processing. To our understanding, no algorithm has been proposed to mine high utility sequential patterns with negative item values in a distributed environment. So, we design a novel algorithm called DHUSP-N (Distributed High Utility Sequential Pattern mining with Negative values). DHUSP-N can mine high utility sequential patterns considering negative item utilities from big data.

Scholars

Doctoral Scholars

  • Mr Ramdas Kapila
  • Ms A Sai Sunanda

Interests

  • Artificial Intelligence
  • Data Science
  • Distributed Computing
  • Machine Learning

Publications
  • Federated learning-based disease prediction: A fusion approach with feature selection and extraction

    Dr Saleti Sumalatha, Mr Ramdas Kapila

    Source Title: Biomedical Signal Processing and Control, Quartile: Q1, DOI Link


    The ability to predict diseases is critical in healthcare for early intervention and better patient outcomes. Data security and privacy are significant concerns when classified medical data from several institutions is analyzed. Cooperative model training provided by Federated Learning (FL) preserves data privacy. In this study, we offer a fusion strategy for illness prediction, combining FL with Anova and Chi-Square Feature Selection (FS) and Linear Discriminant Analysis (LDA) Feature Extraction (FE) techniques. This research aims to use FS and FE techniques to improve prediction performance while exploiting the beneficial aspects of FL. A comprehensive analysis of the distributed data is ensured by updating aggregate models with information from all participating institutions. Through collaboration, a robust disease prediction system overcomes the limitations of individual datasets. We assessed the fusion strategy on the Cleveland heart disease and diabetes datasets from the UCI repository. Compared to standalone FL or conventional ML techniques, the fusion strategy improves prediction performance. Our proposed models, Chi-Square with LDA and Anova with LDA leveraging FL, exhibited exceptional performance on the diabetes dataset, achieving an accuracy of 92.3% and identical precision, recall, and f1-score of 94.36%. Similarly, on the Cleveland heart disease dataset, these models demonstrated significant performance, achieving accuracy, precision, recall, and f1-score of 88.52%, 87.87%, 90.62%, and 89.23%, respectively. The results have the potential to revolutionize disease prediction, maintain privacy, advance healthcare, and outperform state-of-the-art models.
  • Student Placement Chance Prediction Model using Machine Learning Techniques

    Dr Saleti Sumalatha, Manoj Manike, Priyanshu Singh, Purna Sai Madala, Steve Abraham Varghese

    Source Title: 2021 5th Conference on Information and Communication Technology, DOI Link


    Obtaining employment upon graduation from university is one of the highest, if not the highest, priorities for students and young adults. Developing a system that can help these individuals obtain placement advice, analyze labor market trends, and assist educational institutions in assessing growing fields and opportunities would provide immense value. With the emergence of heavily refined data mining techniques and machine learning boilerplates, a model based on predictive analysis can help estimate a variety of realistic placement metrics, such as the types of companies a junior-year student can be placed in, or the companies that are likely to look for the specific skill sets of a student. Various attributes such as academic results, technical skills, training experiences, and projects can serve as predictors. We employed the XGBoost technique, a structured or tabular data-focused approach that has recently dominated applied machine learning and Kaggle tournaments. XGBoost is a high-speed and high-performance implementation of gradient boosted decision trees. We created a model and ran numerous EDAs to determine whether a student will be placed or not, as well as in which type of organization [Day Sharing, Dream, Super Dream, Marquee].
  • Assessing Performance Across Various Machine Learning Algorithms with Integrated Feature Selection for Fetal Heart Classification

    Dr Saleti Sumalatha, Laura Rizka Amanda, Mila Desi Anasanti, Thunakala Bala Kokil

    Source Title: International Journal of Artificial Intelligence Research, DOI Link


    -
  • Optimizing Predictive Models for Parkinson’s Disease Diagnosis

    Dr Saleti Sumalatha, Mr Ramdas Kapila

    Source Title: Intelligent Technologies and Parkinson’s Disease: Prediction and Diagnosis, DOI Link


    -
  • Enhancing Forecasting Accuracy with a Moving Average-Integrated Hybrid ARIMA-LSTM Model

    Dr Saleti Sumalatha, Panchumarthi L Y., Kallam Y R., Parchuri L., Jitte S

    Source Title: SN Computer Science, Quartile: Q1, DOI Link


    This research provides a hybrid time series forecasting model which combines Long Short-Term Memory (LSTM) and Autoregressive Integrated Moving Average (ARIMA) models with moving averages. For modelling stationary time series, LSTM models are utilized, while modelling non-stationary time series is done using ARIMA models. While LSTM models are more suited for capturing long-term dependencies, ARIMA models are superior in catching short-term relationships in time series data. The hybrid model combines the short-term dependency modelling of ARIMA with LSTM's long-term dependency modelling. This combination leads to more accurate predictions for time series data that are both stationary and non-stationary. Also, Triple Exponential Moving Average (TEMA), Weighted Moving Average (WMA), Simple Moving Average (SMA), and six other moving averages, including Kaufman Adaptive Moving Average (KAMA), MIDPOINT, and MIDPRICE, were examined individually to determine which methods give the best precision for the hybrid model. The study compares the hybrid model's forecasting performance to that of standalone ARIMA and LSTM models, in addition to other prominent forecasting approaches like linear regression and random forest. The findings indicate that the hybrid model surpasses the individual models and other benchmark methods, achieving increased precision in terms of Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE). The research also investigates the impact of different hyperparameters and model configurations on forecast performance, giving information about the ideal settings for the hybrid model. Overall, the proposed ARIMA-LSTM hybrid model with moving averages is a promising approach for accurate and reliable stock price forecasting, which has practical implications for financial decision-making and risk management.
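Two of the moving averages examined, SMA and WMA, reduce to short formulas; the following is a minimal sketch on hypothetical prices with an assumed window size:

```python
def sma(series, n):
    """Simple Moving Average: unweighted mean over the last n points."""
    return [sum(series[i - n + 1:i + 1]) / n for i in range(n - 1, len(series))]

def wma(series, n):
    """Weighted Moving Average: weights 1..n, the newest point weighted most."""
    w = list(range(1, n + 1))
    denom = sum(w)
    return [sum(v * wi for v, wi in zip(series[i - n + 1:i + 1], w)) / denom
            for i in range(n - 1, len(series))]

prices = [10.0, 11.0, 12.0, 13.0]   # hypothetical closing prices
print(sma(prices, 3))  # [11.0, 12.0]
```

WMA reacts faster to recent movement than SMA, which is why hybrid models often test both as smoothing front-ends.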
  • Enhancing the Accuracy of Manufacturing Process Error Detection Through SMOTE-Based Oversampling Using Machine Learning and Deep Learning Techniques

    Dr Saleti Sumalatha, Boyapati S V., Rakshitha G B S., Reddy M R.

    Source Title: International Conference on Integrated Circuits, Communication, and Computing Systems, ICIC3S 2024 - Proceedings, DOI Link

    View abstract ⏷

    A production competency study leads to a rise in the strategic emphasis of the manufacturing sector. Developing semiconductor materials is a highly complex process that necessitates numerous evaluations, and the significance of product quality cannot be overstated. We propose several methods for automatically building a prognostic model that is effective at identifying equipment faults during the wafer fabrication stage of semiconductor manufacturing. The SECOM dataset is representative of semiconductor production procedures, which undergo numerous tests. Because the dataset is imbalanced, our proposed methodology incorporates SMOTE (Synthetic Minority Over-sampling Technique) to mitigate the imbalance in the training data by levelling off the under-represented class. Detecting faults in the manufacturing process improves semiconductor quality and testing efficiency; both Machine Learning and Deep Learning algorithms are validated by collecting performance metrics during model development. Our research report also highlights an effort to cut down the training time required for testing. © 2024 IEEE.
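    SMOTE's core idea, interpolating new minority-class samples between existing ones, can be sketched in a few lines. This is a toy illustration under our own naming, not the paper's implementation; production pipelines typically use imbalanced-learn's `SMOTE`:

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Minimal SMOTE sketch: each synthetic point is interpolated between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority points
        nbrs = sorted((p for p in minority if p != x),
                      key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
        nb = rng.choice(nbrs)
        gap = rng.random()  # interpolation factor in [0, 1)
        out.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return out
```

    Every synthetic point lies on a segment between two real minority samples, so the class region is densified rather than merely duplicated.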
  • Comparative Analysis of Optimization Algorithms for Feature Selection in Heart Disease Classification

    Dr Saleti Sumalatha, Mr Ramdas Kapila

    Source Title: Lecture notes in networks and systems, DOI Link

    View abstract ⏷

    This study evaluates the performance of three optimization methods, namely Fruit Fly Optimization (FFO), Particle Swarm Optimization (PSO), and Grey Wolf Optimizer (GWO), for classifying the Cleveland Heart Disease dataset. The top 10 features are identified using FFO, with remarkable results for accuracy (88.52%), precision (87.88%), recall (90.63%), and f1-score (89.23%). With PSO, the accuracy, precision, recall, and f1-score are 85.25%, 87.10%, 84.38%, and 85.71%, respectively. Finally, GWO yields precision, accuracy, recall, specificity, and f1-score values of 93.33%, 90.16%, 90.11%, 87.5%, and 90.32%, respectively, highlighting its consistently superior performance. In a comparative examination, FFO shows competitive outcomes with notable accuracy and recall, while PSO displays comparable precision and recall with somewhat lower accuracy. In contrast, GWO outperforms both FFO and PSO, combining high accuracy and precision with remarkable recall and specificity. These results provide important information on the effectiveness of feature selection driven by optimization algorithms for heart disease classification. The study also highlights the need to investigate the potential of these optimization algorithms in other fields, broadening their use beyond disease classification. Such work could advance the field of feature selection and aid in the creation of better classification models.
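    A generic binary PSO for feature selection, of the kind benchmarked here, can be sketched as follows. This is a toy skeleton with an illustrative fitness function of our own, not the authors' implementation; in practice the fitness would be a classifier's cross-validated accuracy on the selected features:

```python
import math, random

def binary_pso(n_feats, fitness, n_particles=8, iters=40, seed=1):
    """Toy binary PSO: each particle is a 0/1 mask over features; velocities
    pass through a sigmoid to give bit-set probabilities."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_feats)] for _ in range(n_particles)]
    vel = [[0.0] * n_feats for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_feats):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                # sigmoid(velocity) = probability that this feature is selected
                pos[i][d] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            f = fitness(pos[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

# Illustrative fitness: reward selecting the "informative" features 0 and 2,
# with a small penalty per selected feature (stand-in for a model's accuracy).
informative = {0, 2}
def fit(mask):
    return sum(1 for j, b in enumerate(mask) if b and j in informative) \
           - 0.2 * sum(mask)
```

    FFO and GWO follow the same wrapper pattern with different position-update rules.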
  • A Survey on Occupancy-Based Pattern Mining

    Dr Saleti Sumalatha, Inaganti Bhavana

    Source Title: Lecture notes in networks and systems, DOI Link

    View abstract ⏷

    Occupancy-based pattern mining has emerged as a significant research topic in recent times. This paper presents a comprehensive survey on occupancy, which serves as a measure to augment the significance of patterns. The survey covers various types of patterns, including frequent itemsets, high utility itemsets, frequent sequences, and high utility sequences, all in the context of occupancy. Additionally, the paper delves into techniques aimed at reducing the search space in these pattern mining problems; such techniques are crucial for improving the efficiency and scalability of the mining process, especially when dealing with large-scale datasets. Furthermore, the paper discusses potential research extensions for occupancy-based pattern mining, which could explore new applications, develop novel algorithms, or further enhance the effectiveness of occupancy as a measure for pattern evaluation. Overall, this survey provides an important resource for researchers interested in understanding and advancing occupancy-based pattern mining techniques.
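    One common formulation of occupancy, the average fraction of a supporting transaction that the pattern covers, can be computed directly. This is an illustrative sketch; exact definitions vary across the surveyed papers:

```python
def occupancy(itemset, transactions):
    """Occupancy of an itemset: over the transactions that contain it,
    the average fraction of the transaction it covers."""
    X = set(itemset)
    covers = [len(X) / len(t) for t in transactions if X <= set(t)]
    return sum(covers) / len(covers) if covers else 0.0
```

    For example, {a, b} covers all of a 2-item transaction (fraction 1.0) but only half of a 4-item transaction (0.5), giving an occupancy of 0.75 over those two supporters.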
  • Insights into Gun-Related Deaths: A Comprehensive Machine Learning Analysis

    Dr Saleti Sumalatha, Lovely Yeswanth Panchumarthi., Lavanya Parchuri

    Source Title: 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), DOI Link

    View abstract ⏷

    This work employs both supervised and unsupervised machine learning techniques to examine firearm-related fatalities in the US and identify trends, patterns, and risk factors within the data. During the supervised learning phase, techniques such as logistic regression, decision trees, random forests, and neural networks were used to predict the kind of death (suicide, homicide, accidental, or unknown) based on demographic data like sex, age, race, place, and education. Findings show that the neural network and random forest models exhibit promising precision and recall values across several classes, and that they obtained the highest accuracy, reaching 79.88% and 83.59%, respectively. Using clustering techniques including Agglomerative clustering, K-means, and Gaussian mixture models, gun-related fatalities were categorized based on demographic and temporal data during the unsupervised learning stage. The analysis revealed distinct clusters of deaths, providing insights into the varying patterns and trends over time and across demographic groups. The K-means algorithm, with a silhouette score of 0.42, demonstrated meaningful separation among clusters. The research contributes to understanding the complex dynamics of gun-related deaths, shedding light on both individual risk factors and broader trends. However, further analysis could explore additional dimensions of the dataset or delve deeper into the interpretation of clustering results. The study also highlights how crucial it is to take into consideration the moral consequences and constraints of machine learning applications in complex fields like public health.
  • Enhancing Customer Churn Prediction: Advanced Models and Resampling Techniques in Dynamic Business Environments

    Dr Saleti Sumalatha, Yaswanth Chowdary Thotakura., Dinesh Manikanta Yarramsetty., Kalyan Kumar Doppalapudi., Sai Shasank Alaparthi

    Source Title: 2024 International Conference on Intelligent Computing and Emerging Communication Technologies (ICEC), DOI Link

    View abstract ⏷

    Customer churn analysis is critical for businesses looking to hold onto market share in today's dynamic business environment. The growth of e-Finance presents additional difficulties for the traditional banking sector as the digital marketplace expands. Banks face several challenges, including fintech competition, dwindling client loyalty, and digital transformation. By analyzing probable causes of customer turnover from multiple perspectives and building churn prediction models, bank managers can identify problems, spot potential churn customers early, and develop effective retention strategies based on client traits and preferences. Customer churn affects not only banks but also large corporate sectors such as telecommunications and over-the-top (OTT) platforms. This study proposes the Random Leaf Model (RLM) and also explores the Logit Leaf Model (LLM) and the Neural Network Ensemble Model, three sophisticated predictive modeling methodologies. The competitive nature of today's marketplaces makes proactive strategies necessary. The primary problem with current automatic churn prediction algorithms is the substantial gap between majority and minority class proportions in the datasets, which can bias a model in favor of the dominant class. The shortcomings of conventional churn analysis techniques underscore the necessity of implementing advanced algorithms to achieve precise forecasts.
  • Leveraging ResNet for Efficient ECG Heartbeat Classification

    Dr Saleti Sumalatha, Lovely Yeswanth Panchumarthi., Sriya Padmanabhuni

    Source Title: 2024 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), DOI Link

    View abstract ⏷

    This paper presents a novel approach that uses a modified version of the ResNet architecture to classify heartbeats in an electrocardiogram (ECG). The approach comprises six stages: padding, convolution, max pooling, convolutional blocks, average pooling, and fully connected layers. It is tested on the MIT-BIH Arrhythmia Database across five heartbeat classes: normal, supraventricular premature, premature ventricular contraction, fusion of ventricular and normal, and unclassifiable. The results demonstrate that the proposed approach outperforms other current techniques such as LSTM, CNN, and EfficientNet, achieving an accuracy of 98.6%. The model's performance, limitations, and future directions are also thoroughly examined. By automating ECG heartbeat classification with deep learning techniques, the paper advances the field of cardiac diagnosis.
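    The residual connection at the heart of ResNet can be illustrated with a toy single-channel 1-D sketch. This is our own minimal illustration of the idea, not the paper's modified architecture:

```python
def conv1d_same(x, kernel):
    """1-D convolution with zero 'same' padding (pure-Python sketch)."""
    k = len(kernel)
    pad = k // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * xp[i + j] for j in range(k)) for i in range(len(x))]

def relu(x):
    return [max(0.0, v) for v in x]

def residual_block(x, kernel):
    """Core ResNet idea: output = ReLU(conv(x)) + x, so the block only has
    to learn a residual on top of the identity mapping."""
    return [a + b for a, b in zip(relu(conv1d_same(x, kernel)), x)]
```

    The skip connection (`+ x`) is what lets very deep stacks of such blocks train without vanishing gradients.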
  • Optimizing Recommendation Systems: Analyzing the Impact of Imputation Techniques on Individual and Group Recommendation Systems

    Dr Saleti Sumalatha, Sri Phani Bhushan Mada., Rahul Tata., Sneha Teja Sree Reddy Thondapu

    Source Title: 2024 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), DOI Link

    View abstract ⏷

    In today’s world, recommendation systems play a significant role in guiding and simplifying the decision-making process for individuals and groups. However, the presence of missing data in user-item interaction matrices poses a challenge to accurately identifying user preferences and providing relevant suggestions. This is particularly true for group recommendation systems that cater to multiple users. To address this challenge, we applied four imputation techniques to individual and group recommendation models, including user-based collaborative filtering, matrix factorization using Singular Value Decomposition, and deep learning-based models such as autoencoders. We evaluated the effectiveness of these techniques using root mean squared error and mean absolute error metrics and observed a significant impact on the quality of recommendations. Additionally, we implemented aggregation strategies such as Borda count, Additive Utilitarian, Multiplicative Utilitarian, Least Misery, and Most Pleasure for group recommendations, evaluating their performance using satisfaction and disagreement scores. Overall, our findings suggest that imputation techniques can significantly improve the quality of recommendations in both individual and group recommendation systems.
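    Several of the aggregation strategies named above can be sketched over a small users-by-items rating matrix. This is an illustrative implementation; the paper's exact scoring may differ:

```python
def aggregate(ratings, strategy):
    """Aggregate a users x items rating matrix into one group score per item."""
    n_items = len(ratings[0])
    cols = [[row[i] for row in ratings] for i in range(n_items)]
    if strategy == "additive":        # Additive Utilitarian: sum of ratings
        return [sum(c) for c in cols]
    if strategy == "multiplicative":  # Multiplicative Utilitarian: product
        out = []
        for c in cols:
            p = 1
            for v in c:
                p *= v
            out.append(p)
        return out
    if strategy == "least_misery":    # Least Misery: the unhappiest user decides
        return [min(c) for c in cols]
    if strategy == "most_pleasure":   # Most Pleasure: the happiest user decides
        return [max(c) for c in cols]
    raise ValueError(strategy)
```

    With ratings `[[5, 1], [3, 4]]`, Least Misery ranks item 0 first (scores `[3, 1]`) while Most Pleasure gives `[5, 4]`, showing how the choice of strategy changes the group ranking.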
  • A comparison of various machine learning algorithms and execution of flask deployment on essay grading

    Dr Saleti Sumalatha, Udhika Meghana Kotha., Haveela Gaddam., Deepthi Reddy Siddenki

    Source Title: International Journal of Electrical and Computer Engineering, Quartile: Q3, DOI Link

    View abstract ⏷

    Students’ performance can be assessed by grading the answers they write during examinations. Currently, students are assessed manually by teachers, which is a cumbersome task due to the increase in the student-teacher ratio. Moreover, due to the coronavirus disease (COVID-19) pandemic, most educational institutions have adopted online teaching and assessment. To measure a student's learning ability, we need to assess them. The current grading system works well for multiple choice questions, but there is no grading system for evaluating essays. In this paper, we studied different machine learning and natural language processing techniques for automated essay scoring/grading (AES/G). Data imbalance creates problems in predicting essay scores due to the uneven distribution of scores in the training data. We handled this issue using a random oversampling technique, which generates an even distribution of essay scores. We also built a web application using Flask and deployed the machine learning models. Subsequently, all the models were evaluated using accuracy, precision, recall, and F1-score. Random forest outperformed the other algorithms with an accuracy of 97.67%, precision of 97.62%, recall of 97.67%, and F1-score of 97.58%.
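    Random oversampling, as used here to even out the essay-score distribution, amounts to duplicating randomly chosen minority-class rows until the classes match. A minimal sketch, not the paper's pipeline:

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each minority class until every
    class matches the majority class count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    Xo, yo = list(X), list(y)
    for cls, c in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == cls]
        for _ in range(target - c):
            i = rng.choice(idx)
            Xo.append(X[i])
            yo.append(cls)
    return Xo, yo
```

    Unlike SMOTE, this adds exact duplicates rather than interpolated samples, which is simple but can encourage overfitting to the repeated rows.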
  • Analyzing the Health Data: An Application of High Utility Itemset Mining

    Dr Saleti Sumalatha, Lakshmi Sai Bhargavi G., Shanmukh R., Lokesh T., Sobin C C., Padmavathi K., Tottempudi S S

    Source Title: 2023 International Conference on Advances in Computation, Communication and Information Technology, ICAICCIT 2023, DOI Link

    View abstract ⏷

    High utility pattern mining is a data science endeavour that entails finding important patterns based on factors such as profit, frequency, and weight. Among the various pattern types, high utility itemsets have undergone thorough study; these itemsets must exceed a minimum utility threshold specified by the user. This is particularly useful in practical applications such as retail marketing and web services, where items have diverse characteristics. High utility itemset mining facilitates decision-making by uncovering patterns that have a significant impact. Unlike frequent itemset mining, which identifies commonly occurring itemsets, high utility itemsets in real-world applications often include rare items. In the medical field, data mining has been employed in various ways. Here, the primary method involves analyzing a health dataset spanning 2014 to 2017 in the United States, with categories such as diseases, states, and deaths. By examining these categories and mortality rates, we can derive high utility itemsets that reveal the leading causes of death. In conclusion, high utility pattern mining concentrates on spotting significant patterns based on objective standards and has proven valuable in various fields, including the medical domain, where analyzing datasets can uncover high utility itemsets related to mortality rates and causes of death. © 2023 IEEE.
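    The utility computation underlying high utility itemset mining can be sketched directly. The definition (quantity times unit profit, summed over supporting transactions) is standard in HUIM; the toy dataset encoding below is our own:

```python
def itemset_utility(itemset, transactions, profit):
    """Utility of an itemset: summed quantity x unit-profit over every
    transaction that contains all of its items."""
    X = set(itemset)
    total = 0
    for t in transactions:  # each t maps item -> purchased quantity
        if X <= set(t):
            total += sum(t[i] * profit[i] for i in X)
    return total
```

    An itemset is "high utility" when this total exceeds the user-specified minimum utility threshold, regardless of how frequently it occurs.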
  • An Enhancement in the Efficiency of Disease Prediction Using Feature Extraction and Feature Selection

    Dr Saleti Sumalatha, Mr Ramdas Kapila

    Source Title: Contemporary Applications of Data Fusion for Advanced Healthcare Informatics, DOI Link

    View abstract ⏷

    Cardiovascular diseases constitute one of the most dangerous and fatal classes of illness. According to statistics, 17.9 million deaths were reported from cardiovascular diseases in 2019. As a result, it is essential to detect the disease early to minimize the death rate. Data mining and machine learning approaches may be applied to handle data efficiently and precisely forecast the symptoms of illness. This study employs seven supervised machine learning (ML) techniques to predict heart disease. Its main objectives are the adoption of ML algorithms and an investigation of how feature extraction (FE) and feature selection (FS) methods can increase the effectiveness of ML models. The experimental results indicate that models with feature selection and extraction techniques outperformed the model using the entire feature set. As a case study, the authors considered three additional datasets, namely Parkinson's, diabetes, and lung cancer, in addition to the Cleveland Heart Disease dataset. However, the main focus of this study is predicting heart disease.
  • Optimizing fetal health prediction: Ensemble modeling with fusion of feature selection and extraction techniques for cardiotocography data

    Dr Saleti Sumalatha, Mr Ramdas Kapila

    Source Title: Computational Biology and Chemistry, Quartile: Q2, DOI Link

    View abstract ⏷

    Cardiotocography (CTG) captures the fetal heart rate and the timing of uterine contractions. Throughout pregnancy, intelligent CTG categorization is crucial for monitoring fetal health and preserving proper fetal growth and development. Since CTG provides information on the fetal heartbeat and uterine contractions, which helps determine whether the fetus is pathologic, obstetricians frequently use it to evaluate a child’s physical health during pregnancy. In the past, obstetricians have manually analyzed CTG data, which is time-consuming and inaccurate. Developing a fetal health categorization model is therefore crucial, as it may speed up diagnosis and treatment and conserve medical resources. The CTG dataset is used in this study. To diagnose the condition, 7 machine learning models are employed, along with ensemble strategies including voting and stacking classifiers. To choose and extract the most significant attributes from the dataset, Feature Selection (FS) techniques such as ANOVA and Chi-square, as well as Feature Extraction (FE) strategies such as Principal Component Analysis (PCA) and Independent Component Analysis (ICA), are used. Because the dataset is unbalanced, we applied the Synthetic Minority Oversampling Technique (SMOTE) to balance it. The top 5 models are selected for forecasting the condition and combined in ensemble methods such as voting and stacking classifiers. Stacking Classifiers (SC) with AdaBoost and Random Forest (RF) as meta-classifiers are used for disease detection. The proposed SC with RF as the meta-classifier, incorporating Chi-square with PCA, outperformed all other state-of-the-art models, achieving 98.79%, 98.88%, 98.69%, 96.32%, and 98.77% for accuracy, precision, recall, specificity, and f1-score, respectively.
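    The voting-classifier idea used in the ensemble stage can be sketched as a simple majority vote over base-model predictions. This is an illustrative hard-voting sketch; the paper's stacking meta-classifiers go further by training a second-level model on these outputs:

```python
from collections import Counter

def hard_vote(predictions):
    """Majority-vote ensemble: predictions is a list of per-model label lists;
    returns one label per sample (ties broken by first-seen order)."""
    n = len(predictions[0])
    return [Counter(model[i] for model in predictions).most_common(1)[0][0]
            for i in range(n)]
```

    With three base models predicting `['p', 'n']`, `['p', 'p']`, and `['n', 'p']`, the ensemble outputs `['p', 'p']`.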
  • An efficient ensemble-based Machine Learning for breast cancer detection

    Dr Saleti Sumalatha, Mr Ramdas Kapila

    Source Title: Biomedical Signal Processing and Control, Quartile: Q1, DOI Link

    View abstract ⏷

    Breast cancer is a very severe type of cancer that often develops in breast cells. Despite substantial advancements in the management of symptomatic breast cancer over the past ten years, an effective predictive model for breast cancer prognosis is urgently needed. Precise prediction offers numerous advantages, including the ability to diagnose cancer at an early stage and to protect patients from needless medical care and its costs. In the medical field, recall is just as important as accuracy: a model with high accuracy but low recall is not very useful. To boost accuracy while still assigning equal weight to recall, we proposed a model that ensembles Feature Selection (FS), Feature Extraction (FE), and 5 Machine Learning (ML) models. The proposed model has three stages. In the first stage, the Correlation Coefficient (CC) and ANOVA (Anv) feature selection methodologies choose the features. In the second stage, Uniform Manifold Approximation and Projection (UMAP), t-distributed Stochastic Neighbour Embedding (t-SNE), and Principal Component Analysis (PCA) extract the features without compromising crucial information. The last stage predicts the disease using 5 ML models and ensemble models such as the Voting Classifier (VC) and Stacking Classifier (SC) on the selected and extracted features. The results show that the proposed model, CC-Anv with PCA using an SC, outperformed all existing methodologies with 100% accuracy, precision, recall, and f1-score.
  • A Comparative Analysis of the Evolution of DNA Sequencing Techniques along with the Accuracy Prediction of a Sample DNA Sequence Dataset using Machine Learning

    Dr Saleti Sumalatha, Khizar Baig Mohammed., Sai Venkat Boyapati., Manasa Datta Kandimalla., Madhu Babu Kavati

    Source Title: 2023 2nd International Conference on Paradigm Shifts in Communications Embedded Systems, Machine Learning and Signal Processing (PCEMS), DOI Link

    View abstract ⏷

    DNA, or deoxyribonucleic acid, is widely considered the blueprint of life. It is a biological macromolecule, among the most essential chemicals in living cells, and it encodes the instructions required for all life forms to evolve, breed, and thrive. DNA sequencing has progressed exponentially due to the immense increase in data production in today's world. In this paper, we evaluate the evolution of DNA sequencing methods and perform a comparative analysis of modern-day sequencing techniques against those of the past. We also illustrate the potential of machine learning in this domain by taking an exploratory approach and classifying sequences from a sample DNA dataset using a Multinomial Naive Bayes classifier.
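    One common way to feed DNA sequences to a Multinomial Naive Bayes classifier is to convert each sequence into overlapping k-mer counts, the genomic analogue of bag-of-words features. This sketches the feature-extraction step; the paper's exact preprocessing is not specified:

```python
from collections import Counter

def kmer_counts(seq, k=3):
    """Turn a DNA string into overlapping k-mer counts -- count features a
    Multinomial Naive Bayes classifier can consume directly."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
```

    For example, `kmer_counts("ATGATG", 3)` counts `ATG` twice and `TGA` and `GAT` once each; a vectorizer then maps these counters onto a fixed k-mer vocabulary.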
  • Exploring Patterns and Correlations Between Cryptocurrencies and Forecasting Crypto Prices Using Influential Tweets

    Dr Saleti Sumalatha, Mohit Kumar., Gurram Sahithi Priya., Praneeth Gadipudi., Ishita Agarwal

    Source Title: Communications in Computer and Information Science, Quartile: Q3, DOI Link

    View abstract ⏷

    The crypto market, as we know, is full of various kinds of investors and influencers. The well-known 2010 pizza incident, in which two pizzas were purchased for 10,000 BTC, worth nearly 80 million today, shows how much the market has progressed in 10-12 years. Drastic changes in the prices of several coins over the past few years have brought many new investors into the market. Cryptocurrencies are highly volatile: Bitcoin was around 5K INR in 2013 and reached 48 lakh INR by 2021. The dataset provides many fascinating and valuable insights. As data scientists, we are keen to understand a market whose data is unstable, changes frequently, and forms new patterns over time; this makes the problem interesting and motivates us to search for valuable information. In this manuscript, we analyze two specific crypto coins over a particular period, covering more than 2900 records. We found several interesting patterns in the dataset and explored historical returns using several statistical models. We plotted the opening and closing prices of each coin using NumPy, SciPy, and Matplotlib. We also predicted the price of a specific currency, plotted the predicted price line against the actual price line, and examined the differences between the prediction model and the fundamental price model. To do so, we used the Simple Exponential Smoothing (SES) model and performed sentiment analysis based on influential tweets on Twitter, which makes our prediction more accurate and reliable than existing techniques. Lastly, we used a linear regression model to establish the relationship between the returns of Ripple and Bitcoin.
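    Simple Exponential Smoothing, the forecasting model used here, reduces to a one-line recurrence. A minimal sketch; libraries such as statsmodels provide a full implementation with fitted smoothing parameters:

```python
def ses(series, alpha):
    """Simple Exponential Smoothing: level_t = alpha*x_t + (1-alpha)*level_{t-1}.
    The final level is the one-step-ahead forecast."""
    level = float(series[0])
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level
```

    With `alpha = 0.5`, the series `[2, 4]` gives a forecast of `3.0`; larger `alpha` weights recent prices more heavily, which suits volatile series like crypto prices.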
  • Incremental mining of high utility sequential patterns using MapReduce paradigm

    Dr Saleti Sumalatha

    Source Title: Cluster Computing, Quartile: Q1, DOI Link

    View abstract ⏷

    High utility sequential pattern (HUSP) mining considers the nonbinary frequency values of items purchased in a transaction and the utility of each item. Incremental updates are very common in many real-world applications, and rerunning the mining algorithm every time the data grows is not a simple task. Moreover, the centralized algorithms for mining HUSPs incrementally cannot handle big data. Hence, an incremental algorithm for high utility sequential pattern mining using the MapReduce paradigm (MR-INCHUSP) is introduced in this paper. The proposed algorithm includes a backward mining strategy that thoroughly reuses the knowledge acquired from past mining results. Further, novel sequence extension rules, elicited from the co-occurrence relation between items, have been introduced to increase the speed of the mining process. The experimental results exhibit the performance of MR-INCHUSP on several real and synthetic datasets.
  • Ontology Based Food Recommendation

    Dr T Jaya Lakshmi, Dr Saleti Sumalatha, Rohit Chivukula., Kandula Lohith Ranganadha Reddy

    Source Title: Smart Innovation, Systems and Technologies, Quartile: Q4, DOI Link

    View abstract ⏷

    Eating right is the most crucial aspect of healthy living. A nutritious, balanced diet helps our bodies fight off disease. Many lifestyle-related diseases, such as diabetes and thyroid disorders, can often be avoided through active living and better nutrition, so diet-related knowledge is essential for everyone. With this motivation, an ontology for the food domain is discussed and developed in this work. The aim is to create an ontology model that helps people get the right food recommendations based on their health conditions, if any.
  • Constraint Pushing Multi-threshold Framework for High Utility Time Interval Sequential Pattern Mining

    Dr Saleti Sumalatha, N Naga Sahithya., K Rasagna., K Hemalatha., B Sai Charan., P V Karthik Upendra

    Source Title: Communications in Computer and Information Science, Quartile: Q3, DOI Link

    View abstract ⏷

    This paper aims to detect high utility sequential patterns with time intervals and multiple utility thresholds. Many algorithms mine sequential patterns considering the utility factor; these can find the order in which items were purchased, but they exclude the time intervals among items. Further, they apply the same utility threshold to every item in the dataset, which is not convincing, as it assigns equal importance to all items. The time intervals between items play a vital role in forecasting valuable real-world situations such as the retail sector and market basket data analysis. Recently, the UIPrefixSpan algorithm was introduced to mine sequential patterns including utility and time intervals. Nevertheless, it considers only a single minimum utility threshold, assuming the same unit profit for each item. Hence, to solve these issues, we propose the UIPrefixSpan-MMU algorithm, which uses a pattern growth approach and four time constraints. Experiments on real datasets prove that UIPrefixSpan-MMU is more efficient and linearly scalable for generating time interval sequences with high utility.
  • Mining Spatio-Temporal Sequential Patterns Using MapReduce Approach

    Dr Saleti Sumalatha, P Radha Krishna., D Jaswanth Reddy

    Source Title: Communications in Computer and Information Science, Quartile: Q3, DOI Link

    View abstract ⏷

    Spatio-temporal sequential pattern mining (STSPM) plays an important role in many applications such as mobile health, criminology, social media, solar events, and transportation. Most current studies assume the data is located in a centralized database on which a single machine performs the mining, so the existing centralized algorithms are not suitable for a big data environment, where data cannot be handled by a single machine. In this paper, our main aim is to find spatio-temporal sequential patterns from an event dataset using a distributed framework suitable for mining big data. We propose two distributed algorithms, namely MR-STBFM (MapReduce based spatio-temporal breadth first miner) and MR-SPTreeSTBFM (MapReduce based sequential pattern tree spatio-temporal breadth first miner), for mining spatio-temporal sequential patterns using the Hadoop MapReduce framework. MR-SPTreeSTBFM, an extension of MR-STBFM, uses a spatio-temporal tree structure to reduce the candidate generation cost; the tree structure significantly improves the performance of the proposed approach. A top-most significant pattern approach has also been proposed to mine the top-most significant sequential patterns. Experiments are conducted to evaluate the performance of the proposed algorithms on the Boston crime dataset.
  • Mining High Utility Time Interval Sequences Using MapReduce Approach: Multiple Utility Framework

    Dr T Jaya Lakshmi, Dr Saleti Sumalatha, Mohd Wazih Ahmad

    Source Title: IEEE Access, Quartile: Q1, DOI Link

    View abstract ⏷

    Mining high utility sequential patterns is a significant research area in data mining. Several methods mine sequential patterns while taking utility values into consideration. Patterns of this type can determine the order in which items were purchased, but not the time interval between them. The time interval among items is important for predicting useful real-world circumstances, including retail market basket data analysis, stock market fluctuations, DNA sequence analysis, and so on. There are very few algorithms for mining sequential patterns that consider both the utility and the time interval. However, they assume the same threshold for each item, maintaining the same unit profit. Moreover, with the rapid growth of data, traditional algorithms cannot handle big data and are not scalable. To handle this problem, we propose a distributed three-phase MapReduce framework that considers multiple utilities and is suitable for big data. The time constraints are pushed into the algorithm instead of using pre-defined intervals. Also, the proposed upper bound minimizes the number of candidate patterns during the mining process. The approach has been tested, and the experimental results show its efficiency in terms of run time, memory utilization, and scalability.
  • A Comparison of Various Class Balancing and Dimensionality Reduction Techniques on Customer Churn Prediction

    Dr Saleti Sumalatha, Sri Phani Bhushan Mada., Nandini Thimmireddygari., Rahul Tata., Snehatejasree Reddy Thondapu

    Source Title: 2022 IEEE 7th International Conference on Recent Advances and Innovations in Engineering, DOI Link

    View abstract ⏷

    With the advancement of technology, companies can foresee well in advance which customers are going to leave their organization. This problem of customer churn prediction is handled in the current work. Real-world data is not balanced: it has more observations for some classes and fewer for others, yet giving equal importance to each class is significant for building an efficient prediction model. Moreover, real-world data contains many attributes, meaning the dimensionality is high. In the current paper, we discuss three data balancing techniques and two methods of dimensionality reduction, i.e. feature selection and feature extraction. Further, selecting the best machine learning model for churn prediction is an important issue, which is also dealt with here. We aim to improve the efficiency of customer churn prediction by evaluating various class balancing and dimensionality reduction techniques, and we evaluated the performance of the models using AUC curves and K-fold cross-validation.
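    K-fold cross-validation, used above to evaluate the models, partitions the data so every observation is held out exactly once. A minimal index-splitting sketch (not the paper's code; scikit-learn's `KFold` is the usual choice):

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle the n sample indices, then deal them into k folds whose
    sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]
```

    Each fold serves once as the validation set while the remaining k-1 folds train the model, and the k scores are averaged for a less optimistic estimate than a single train/test split.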
  • Distributed Mining of High Utility Time Interval Sequential Patterns with Multiple Minimum Utility Thresholds

    Dr T Jaya Lakshmi, Dr Saleti Sumalatha, Thirumalaisamy Ragunathan

    Source Title: Lecture Notes in Computer Science, Quartile: Q3, DOI Link


    The problem of mining high utility time interval sequential patterns with multiple utility thresholds in a distributed environment is considered. Mining high utility sequential patterns (HUSP) is an emerging issue, and the existing HUSP algorithms mine only the order of items; they do not consider the time interval between successive items. In real-world applications, time interval patterns provide more useful information than conventional HUSPs. Recently, we proposed the distributed high utility time interval sequential pattern mining (DHUTISP) algorithm using MapReduce in support of the big data environment. That algorithm was designed with a single minimum utility threshold. It is not convincing to use the same utility threshold for all the items in a sequence, as this gives every item the same importance. Hence, in this paper, a new distributed framework is proposed to efficiently mine high utility time interval sequential patterns with multiple minimum utility thresholds (DHUTISP-MMU) using the MapReduce approach. The experimental results show that the proposed approach can efficiently mine HUTISPs with multiple minimum utility thresholds.
  • Distributed Mining of High Utility Sequential Patterns with Negative Item Values

    Dr Saleti Sumalatha, Dr Manojkumar V, Akhileshwar Reddy

    Source Title: International Journal of Advanced Computer Science and Applications, Quartile: Q3, DOI Link


    Sequential pattern mining has been widely used to solve various business problems, including frequent user click pattern analysis, customer purchase analysis, gene microarray data analysis, etc. Many ongoing studies aim to extract insightful information from such patterns. Most have concentrated on high utility sequential pattern (HUSP) mining with positive values and without a distributed approach. The existing solutions are centralized, which incurs greater computation and communication costs. In this paper, we introduce a novel algorithm for mining HUSPs including negative item values in a distributed setting. We use Hadoop MapReduce to process the data in parallel. Various pruning techniques are proposed to minimize the search space in a distributed environment, thus reducing the processing cost. To our knowledge, no algorithm has been proposed to mine high utility sequential patterns with negative item values in a distributed environment. So, we design a novel algorithm called DHUSP-N (Distributed High Utility Sequential Pattern mining with Negative values). DHUSP-N can mine high utility sequential patterns considering negative item utilities from big data.
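    The role of negative item values in utility mining can be shown with a minimal sketch. The data and function below are hypothetical and illustrative only, not the DHUSP-N implementation: an item sold at a loss (e.g., a promotional giveaway) contributes negative utility, yet the sequence it triggers can still be high utility overall:

    ```python
    # Illustrative sketch: total utility of an itemset sequence when some
    # unit profits are negative (e.g., loss-leader promotional items).

    def sequence_utility(sequence, unit_profit):
        """Sum quantity * unit_profit over every item occurrence in the sequence."""
        return sum(qty * unit_profit[item]
                   for itemset in sequence
                   for item, qty in itemset)

    # A customer buys a discounted printer (sold at a loss), then ink and paper.
    profit = {"printer": -20, "ink": 15, "paper": 2}
    seq = [[("printer", 1)], [("ink", 2), ("paper", 5)]]

    print(sequence_utility(seq, profit))  # -20 + 30 + 10 = 20
    ```

    Negative values are what make pruning harder: an upper bound that assumes every extension adds utility no longer holds, which is why dedicated pruning techniques are needed.
    
    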

Scholars

Doctoral Scholars

  • Mr Ramdas Kapila
  • Ms A Sai Sunanda