Pneumonia Detection from X-Ray Images Using Deep Transfer Learning
Dr Tapas Kumar Mishra, Ms Annam Nandini, Sri Sahithya Vemuri., Sowmya Kotha., Sravya Voruganti., Praneeth Reddy Kunam
Source Title: Communications in computer and information science, Quartile: Q3, DOI Link
Pneumonia is a global health concern, especially in underserved regions. Traditional diagnostic methods, which rely on costly chest X-rays, suffer from variance in interpretation. To overcome these challenges, we have developed a computer-assisted evaluation system that uses deep transfer learning techniques. Our approach aims to improve diagnostic accuracy and accessibility, particularly in resource-limited settings. Using a large dataset gathered from Kaggle, our approach uses convolutional neural network (CNN) models such as VGG16, ResNet-50, and InceptionNet-v5 to autonomously detect pneumonia in chest X-ray images. Deep transfer learning helps our models overcome data scarcity, allowing them to accurately recognise relevant visual attributes and patterns. Our methodology employs an ensemble approach that combines the strengths of each CNN model. We introduce a strategy for calculating optimal weights based on key evaluation metrics such as accuracy. Evaluation on the RSNA dataset shows an accuracy of 92% with the CNN model and 84% with ResNet-50, promising improved diagnosis, especially in resource-constrained settings, and potentially saving lives. To sum up, by combining ensemble modeling with deep transfer learning methods, we have created a useful tool for detecting pneumonia in chest X-ray images, offering a scalable solution for healthcare providers. With additional development and use, this technology holds promise for enhancing healthcare delivery and outcomes globally.
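The abstract does not state how the ensemble weights are computed. The following is a minimal sketch of one plausible reading, assuming each model's weight is its validation accuracy normalised so the weights sum to 1. The function names, accuracies, and probabilities are hypothetical illustrations, not values from the paper.

```python
def accuracy_weights(accuracies):
    """Normalise validation accuracies so the resulting weights sum to 1."""
    total = sum(accuracies)
    return [a / total for a in accuracies]


def weighted_ensemble(probabilities, weights):
    """Weighted average of per-model pneumonia probabilities for one image."""
    return sum(p * w for p, w in zip(probabilities, weights))


# Hypothetical validation accuracies for three models (illustrative only).
weights = accuracy_weights([0.90, 0.80, 0.70])

# Hypothetical per-model probabilities that a single X-ray shows pneumonia.
score = weighted_ensemble([0.9, 0.6, 0.4], weights)
label = "pneumonia" if score >= 0.5 else "normal"
```

More accurate models pull the combined score towards their own prediction, which is the intuition behind weighting an ensemble by an evaluation metric.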
Prediction of Suicidal Behaviour among the users on Social Media using NLP and ML
Dr Tapas Kumar Mishra, G Sucharitha., Narala Siddhartha., Bommena Raju., Sachi Nandan Mohanty
Source Title: 2024 International Conference on Emerging Systems and Intelligent Computing (ESIC), DOI Link
Suicides are happening more often these days. Social media messages and chats are the preferred means of expression for those at risk of suicide. Several studies have shown that it is possible to identify at-risk users from their online conversations and posts on social media. Consequently, it is imperative to create a machine learning system for automatic early identification of suicidal ideation or any abrupt changes in a user's behaviour by examining his or her posts and chats on social media such as Twitter, Instagram, WhatsApp, and Facebook. Our research is focused on significantly improving the performance of our model so that it recognizes early indications of suicide attempts with high precision, thus contributing to the prevention of such tragic events. To achieve this, we utilized advanced text pre-processing techniques and feature extraction approaches, including CountVectorizer and word embedding. Our study involved training both XGBoost and NLP models, which were rigorously evaluated using a substantial dataset of 34,100 samples. Furthermore, to assess real-world applicability, we conducted live tests of our model using tweets collected via Streamlit, a Python-based web interface tool. The results of our experiments are promising, demonstrating the potential of our approach in detecting early signs of distress and aiding timely intervention to prevent suicide attempts. Our findings contribute to the growing field of suicide prevention through cutting-edge machine learning techniques applied to natural language data. According to our experimental results, the PM model outperformed other machine learning algorithms with the highest classification score. It achieved an accuracy of 92% and an F1 score of 85%. These findings highlight the effectiveness and potential of the PM model for the task at hand.
Application of Machine Learning Algorithms and Feature Selection using Genetic Algorithm: A Case Study on Cyber Attack Detection
Dr Tapas Kumar Mishra, Sai Karthik., Pradeep Sai Teja., Raja Pavan Vignesh., Yeswanth Venkata Kumar
Source Title: 2024 IEEE 9th International Conference for Convergence in Technology (I2CT), DOI Link
Feature reduction is one of the important aspects of machine learning (ML). It is used to reduce the number of features in a dataset while preparing a model. In ML, high-dimensional data refers to data with a large number of features or variables. The curse of dimensionality is one of the problems that occur while preparing a model: the performance of the model decreases as the number of features increases. This is because the complexity of the model grows with the number of features, and it becomes more difficult to find a good solution. In addition, high-dimensional data can lead to overfitting, where the model fits the training data too closely and does not generalize well to new data. Traditionally, feature reduction can be done in many ways, such as principal component analysis (PCA), singular value decomposition (SVD), and linear discriminant analysis (LDA). Each procedure works differently, but all of them reduce the dimensions while retaining as much information as possible. Even though many techniques are available, each has its own limitations. In this research, we used a genetic algorithm, a bio-inspired algorithm that takes inspiration from Darwin's theory of evolution. The main objective of using this method is to optimize the overall computation time of the model. We applied the genetic algorithm to a dataset for feature reduction. Further, different classifiers were used to test the performance of the resultant dataset. After a set of observations, it was found that genetic algorithms can improve the model's performance on the existing dataset and reduce computation time by around 50%.
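As an illustration of how a genetic algorithm can search for a feature subset, here is a minimal sketch in which each individual is a bit mask over the features. The fitness function, population size, and operators are assumptions made for the example; in practice the fitness would typically be a classifier's validation score, and the paper's actual configuration is not given in the abstract.

```python
import random


def toy_fitness(mask, useful):
    """Hypothetical fitness: reward useful features, penalise every kept feature."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in useful)
    return hits - 0.1 * sum(mask)


def select_features(n_features, fitness, generations=30, pop_size=20, seed=0):
    """Evolve bit masks (1 = keep the feature) and return the fittest one."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]       # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)      # single-point crossover
            child = a[:cut] + b[cut:]
            flip = rng.randrange(n_features)        # mutation: flip one random bit
            child[flip] = 1 - child[flip]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Hypothetical setup: 10 features, of which indices 0, 3 and 7 are informative.
useful = {0, 3, 7}
best = select_features(10, lambda mask: toy_fitness(mask, useful))
selected = [i for i, bit in enumerate(best) if bit]
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next.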
Detecting Threats in IoT based Healthcare using Machine Learning Algorithms
Dr Tapas Kumar Mishra, Ms Annam Nandini, Ms Arati Behera
Source Title: 2024 IEEE 9th International Conference for Convergence in Technology (I2CT), DOI Link
Internet of Things (IoT) technology enables medical devices, sensors, and other healthcare-related equipment to connect and communicate with each other. This connectivity enables enhanced monitoring, data collection, and analysis, resulting in better patient outcomes and more effective healthcare delivery. However, with the increased use of connected devices, detecting and preventing cyberattacks is essential to protect sensitive patient information. To address this challenge, the article discusses the use of machine learning algorithms such as Logistic Regression (LR), Random Forest (RF), Gradient Boosting (GB), Support Vector Machine (SVM), Naive Bayes (NB), and K-Nearest Neighbour (KNN) to detect attacks. The WUSTL EHMS 2020 dataset was used to test these algorithms and identify the one with the best accuracy. Overall, the article highlights the potential benefits of IoT technology and machine learning in healthcare while emphasizing the importance of data privacy and attack detection.
Enhancing DDoS detection in SDIoT through effective feature selection with SMOTE-ENN
Dr Kshira Sagar Sahoo, Dr Tapas Kumar Mishra, Ms Arati Behera, Anand Nayyar., Muhammad Bilal
Source Title: PLoS ONE, Quartile: Q1, DOI Link
The Internet of Things (IoT) enables a variety of heterogeneous devices to connect over various network architectures to gather and exchange real-time information. On the other hand, the rise of IoT creates security threats such as Distributed Denial of Service (DDoS) attacks. The recent Software Defined-Internet of Things (SDIoT) architecture can provide better security solutions than conventional networking approaches. However, limited computing resources and heterogeneous network protocols are major challenges in the SDIoT ecosystem. Given these circumstances, it is essential to design a low-cost DDoS attack classifier. The current study employs an improved feature selection (FS) technique that determines the most relevant features in order to improve the detection rate and reduce the training time. First, to overcome the data imbalance problem, Edited Nearest Neighbor-based Synthetic Minority Oversampling (SMOTE-ENN) was applied. The study proposes SFMI, an FS method that combines Sequential Feature Selection (SFE) and Mutual Information (MI) techniques. The top k common features were extracted from the features nominated by SFE and MI. Further, Principal Component Analysis (PCA) is employed to address multicollinearity issues in the dataset. Comprehensive experiments have been conducted on two benchmark datasets, KDDCup99 and CIC IoT-2023. For classification purposes, Decision Tree, K-Nearest Neighbor, Gaussian Naive Bayes, Random Forest (RF), and Multilayer Perceptron classifiers were employed. The experimental results demonstrate that the proposed SMOTE-ENN+SFMI+PCA with the RF classifier achieves 99.97% accuracy and 99.39% precision with 10 features.
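The step that extracts the top k common features can be read as an intersection of two rankings. The sketch below assumes exactly that reading: take the k best features under each scoring method and keep those that appear in both. The feature names and scores are hypothetical, and the paper's SFMI procedure may differ in detail.

```python
def top_k_common(scores_a, scores_b, k):
    """Return features that rank in the top k under both scoring methods."""
    top_a = sorted(scores_a, key=scores_a.get, reverse=True)[:k]
    top_b = set(sorted(scores_b, key=scores_b.get, reverse=True)[:k])
    # Preserve the ordering of the first ranking for readability.
    return [feature for feature in top_a if feature in top_b]


# Hypothetical per-feature scores from sequential selection and mutual information.
sequential_scores = {"duration": 0.9, "src_bytes": 0.8, "flag": 0.4, "count": 0.7}
mutual_info_scores = {"duration": 0.5, "src_bytes": 0.1, "flag": 0.6, "count": 0.7}

common = top_k_common(sequential_scores, mutual_info_scores, k=3)
```

With these illustrative scores, only "duration" and "count" are in the top three of both rankings, so they are the features passed on to the next stage.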
A combination learning framework to uncover cyber attacks in IoT networks
Dr Kshira Sagar Sahoo, Dr Tapas Kumar Mishra, Ms Arati Behera, Monowar Bhuyan
Source Title: Internet of Things (Netherlands), Quartile: Q1, DOI Link
The Internet of Things (IoT) is rapidly expanding, connecting an increasing number of devices daily. Diverse, extensive networking combined with resource-constrained devices creates vulnerabilities to various cyber-attacks. IoT under the supervision of a Software Defined Network (SDN) enhances network performance through its flexibility and adaptability. Different methods have been employed for detecting security attacks; however, they are often computationally inefficient and unsuitable for such resource-constrained environments. Consequently, there is a significant need to develop efficient security measures against a range of attacks. Recent advancements in deep learning (DL) models have paved the way for designing effective attack detection methods. In this study, we leverage a Genetic Algorithm (GA) with a correlation coefficient as the fitness function for feature selection. Additionally, mutual information (MI) is applied for feature ranking to measure each feature's dependency on the target variable. The selected optimal features were used to train a hybrid DNN model to uncover attacks in IoT networks. The hybrid DNN integrates a Convolutional Neural Network, Bidirectional Gated Recurrent Units (Bi-GRU), and Bidirectional Long Short-Term Memory (Bi-LSTM). The performance of our proposed model is evaluated against several baseline DL models, and an ablation study is provided. Three key datasets containing various types of attacks, InSDN, UNSW-NB15, and CICIoT 2023, were used to assess the performance of the model. The proposed model demonstrates better accuracy and detection time than the existing model, with lower resource consumption.
Assessment of Data Augmentation Paradigms in Pathology Identification
Dr Tapas Kumar Mishra, Sai Karthik Nallamothu., Pradeep Sai Teja Sanka., Srikar Vaka., Raja Pavan Vignesh Kajjayam., Yeswanthvenkatakumar Vidavaluru
Source Title: 2024 OITS International Conference on Information Technology (OCIT), DOI Link
Data augmentation plays an important role in improving the performance of models in the medical realm. This research shows the impact of various data augmentation techniques by evaluating their performance on two medical datasets. To balance imbalanced datasets, it employs undersampling and oversampling strategies, including Repeated Edited Nearest Neighbors (RENN), Random Undersampling (RUS), Near Miss, and the SVM, Borderline, and simple SMOTE techniques. The study explores the connection between data preprocessing, model development, and the effectiveness of the various data augmentation techniques. Three different deep learning models, Artificial Neural Networks (ANN), Long Short-Term Memory networks (LSTM), and their variant Gated Recurrent Units (GRU), were used with each method, and their performance was observed and analyzed. Comparing these models across the augmentation techniques provides insight into how each method affects model performance. These insights can be useful in real-world settings where the data is not evenly distributed, such as the medical and security industries. In this research, two medical datasets with imbalanced labels, related to heart disease and cancer, were used for the sampling experiments.
A Novel Approach for Diabetic Retinopathy Screening Using Asymmetric Deep Learning Features
Dr Tapas Kumar Mishra, Pradeep Kumar Jena., Charulata Palai., Manjushree Nayak., Bonomali Khuntia., Sachi Nandan Mohanty
Source Title: Big Data and Cognitive Computing, Quartile: Q1, DOI Link
Automatic screening of diabetic retinopathy (DR) is a well-identified area of research in the domain of computer vision. It is challenging due to structural complexity and a marginal contrast difference between the retinal vessels and the background of the fundus image. As bright lesions are prominent in the green channel, we applied contrast-limited adaptive histogram equalization (CLAHE) on the green channel for image enhancement. This work proposes a novel diabetic retinopathy screening technique using an asymmetric deep learning feature. The asymmetric deep learning features are extracted using U-Net for segmentation of the optic disc and blood vessels. Then a convolutional neural network (CNN) with a support vector machine (SVM) is used for the DR lesions classification. The lesions are classified into four classes, i.e., normal, microaneurysms, hemorrhages, and exudates. The proposed method is tested with two publicly available retinal image datasets, i.e., APTOS and MESSIDOR. The accuracy achieved for non-diabetic retinopathy detection is 98.6% and 91.9% for the APTOS and MESSIDOR datasets, respectively. The accuracies of exudate detection for these two datasets are 96.9% and 98.3%, respectively. The accuracy of the DR screening system is improved due to the precise retinal image segmentation.
Heart Disease Prediction using Machine Learning Algorithms from ECG images: A short Summary
Dr Tapas Kumar Mishra, Ms Annam Nandini, Ms Arati Behera
Source Title: 2023 OITS International Conference on Information Technology (OCIT), DOI Link
Heart disease refers to a group of disorders that affect the heart's ability to circulate blood and oxygen throughout the body. It ranges from heart rhythm problems to more serious conditions such as coronary artery disease or heart failure. This paper's main goal is to use image processing to categorize Coronary Artery Disease (CAD). Further, the major contribution is to transform Electrocardiogram (ECG) recordings into a 1-D signal using image segmentation. An ECG image dataset that includes P, QRS, and T waves is used. Several classifiers, including k-Nearest Neighbours (KNN), Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Decision Tree (DT), are evaluated. The performance of each classifier is measured using widely accepted standard metrics, including recall, accuracy, precision, and F1-score. Finally, the performance is analyzed and presented for further research. Future research will focus on data extraction from the image and minority eradication of the dataset.
5G-Enabled Secure IoT Applications in Smart Cities Using Software-Defined Networks
Dr Tapas Kumar Mishra, Dr Kshira Sagar Sahoo, Ms Arati Behera, Syed Yaser Mahmood., Aashrit S., Venkatesh Reddy B
Source Title: Handbook of Research on Network-Enabled IoT Applications for Smart City Services, DOI Link
With the idea of shifting towards a smart future, a lot of research is being done in the area of the Internet of Things (IoT) and wireless communication, especially 5G network technology. These technologies are moving society towards a world of high connectivity through secure, evolutionary telecommunication methodologies. This chapter examines the role of 5G networks in enhancing IoT devices and discusses their security aspects. The integration of IoT and software-defined networking, termed SDIoT, enables automatic traffic rerouting, device reconfiguration, and bandwidth allocation seamlessly. Smart cities utilize SDIoT integrated with 5G to gather real-time data, better understand how demand patterns are changing, and respond with quicker and more affordable solutions. The authors survey the existing research on 5G networks and IoT and the areas being considered for future improvement.
Content-based Image Retrieval using Encoder based RGB and Texture Feature Fusion
Dr Tapas Kumar Mishra, Satya Ranjan Pattanaik., Charulata Palai., Pradeep Kumar Jena
Source Title: International Journal of Advanced Computer Science and Applications, Quartile: Q3, DOI Link
The recent development of digital photography and the use of social media on smartphones have boosted the demand for querying images by their visual semantics. Content-Based Image Retrieval (CBIR) is a well-identified research area in the domain of image and video data analysis. The major challenges of a CBIR system are (a) to derive the visual semantics of the query image and (b) to find all the similar images in the repository. The objective of this paper is to precisely define the visual semantics using hybrid feature vectors. In this paper, a CBIR system using encoder-based feature fusion is proposed. The CNN-encoded features of the RGB channels are fused with the encoded texture features of LBP, CSLBP, and LDP separately. The retrieval performance of the different fused features is tested using three public datasets, i.e., Corel-1K, Caltech, and 102flower. The results show that the class properties are better retained using LDP with RGB encoded features, which helps to enhance the classification and retrieval performance for all three datasets. The average precision is 94.5% for Corel-1K, 89.7% for Caltech, and 88.7% for 102flower. The average F1-score is 89.5% for Caltech and 88.5% for 102flower. The improvement in the F1-score implies that the proposed fused feature is more stable in dealing with the class imbalance problem.
Content-Based Image Retrieval using Adaptive CIE Color Feature Fusion
Dr Tapas Kumar Mishra, Pradeep Kumar Jena., Bonomali Khuntia., Charulata Palai
Source Title: Revue d'Intelligence Artificielle, DOI Link
This work proposes a novel content-based image retrieval framework using adaptive weight feature fusion in the International Commission on Illumination (CIE) color space. To enhance the weights of the saliency region features of an image, an adaptive wrapper model is proposed for adaptive feature selection. Initially, the images are transferred to the CIE color space, i.e., the L*, a*, b* color space. The local binary pattern (LBP) texture features of all four channels are analyzed class-wise. For each class, the weights of the LBP features for the a* and b* axes are calculated dynamically according to their class variance. The weighted LBP features along the a* and b* axes are merged, and the result is referred to as the LBPCW feature in the CIE color space. To test the performance of the proposed LBPCW feature, we developed a CBIR system in which two standard classifiers, Support Vector Machine (SVM) and Naïve Bayes (NB), are used for classification and the Euclidean distance measure is used for image retrieval. The model is tested with two public datasets, Wang-1K and Corel-5K. It is observed that our proposed LBPCW feature outperforms the LBP and local binary pattern with saliency map (LBPSM) features.
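For readers unfamiliar with the local binary pattern that underlies the LBPCW feature, the following is a minimal sketch of the basic 8-neighbour LBP code for a single pixel. It assumes the common convention of comparing each neighbour with the centre and setting a bit when the neighbour is at least as bright. The neighbour ordering and the sample patch are illustrative choices, and the class-variance weighting described in the abstract is not reproduced here.

```python
def lbp_code(image, row, col):
    """8-neighbour LBP code of an interior pixel in a 2-D list of intensities."""
    centre = image[row][col]
    # Neighbours visited clockwise, starting from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= centre:
            code |= 1 << bit        # set the bit when the neighbour >= centre
    return code


# Hypothetical 3x3 patch from one colour channel (for example a*).
patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
code = lbp_code(patch, 1, 1)
```

A texture descriptor is then usually formed as a histogram of these codes over the image or over a region of interest.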
Applications of Federated Learning in Computing Technologies
Dr Sambit Kumar Mishra, Dr Tapas Kumar Mishra, Kotipalli Sindhu., Mogaparthi Surya Teja., Vutukuri Akhil., Ravella Hari Krishna., Pakalapati Praveen
Source Title: Convergence of Cloud with AI for Big Data Analytics: Foundations and Innovation, DOI Link
Federated learning is a technique that trains a model across different decentralized devices holding local data samples, without exchanging those samples. The concept is also called collaborative learning. In federated learning, clients separately train deep neural network models on their local data, and the results are combined into the deep neural network model at the central server. Unlike conventional approaches, in which all the local datasets are uploaded to a server and local data samples are assumed to be identically distributed, federated learning does not transmit the local data to the server. Because it addresses security and privacy concerns, it is widely utilized in many applications such as IoT, cloud computing, edge computing, vehicular edge computing, and many more. The implementation details for protecting the privacy of local data in federated learning are described. Since there will be trillions of edge devices, system efficiency and privacy should be taken into consideration when evaluating federated learning algorithms in computing technologies. The chapter covers the effectiveness, privacy, and usage of federated learning in several computing technologies. Different applications of federated learning, its privacy concerns, and its definition in various fields of computing such as IoT, edge, and cloud computing are presented.
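The server-side combination step can be illustrated with federated averaging, a widely used aggregation scheme. The sketch below assumes each client sends back a flat list of model parameters together with its local sample count, and the server takes a sample-weighted mean. The chapter does not specify its aggregation rule, so this is an illustration rather than the authors' method, and all values are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Sample-weighted mean of client parameter vectors (federated averaging)."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical updates: two clients, two parameters each.
# Client 0 trained on 1 sample, client 1 on 3 samples.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only the parameter vectors and sample counts reach the server in this scheme; the raw training data stays on the clients.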
An Improved Machine Learning Framework for Cardiovascular Disease Prediction
Dr Tapas Kumar Mishra, Dr Kshira Sagar Sahoo, Ms Arati Behera, Sarathchandra B
Source Title: Communications in Computer and Information Science, Quartile: Q3, DOI Link
Cardiovascular diseases have the highest fatality rate among the world's most deadly syndromes. Stress, age, gender, cholesterol, Body Mass Index, physical inactivity, and an unhealthy diet are all key risk factors for cardiovascular disease. Based on these parameters, researchers have suggested various early diagnosis methods. However, the correctness of the proposed treatments and approaches needs considerable fine-tuning because of the intrinsic criticality and life-threatening hazards of cardiovascular illnesses. This paper proposes a framework for accurate cardiovascular disorder prediction based on machine learning techniques. To this end, the method employs the synthetic minority over-sampling technique (SMOTE). Benchmark datasets are used to validate the framework on metrics such as recall and accuracy. Finally, a comparison with existing state-of-the-art approaches is presented, which shows 99.16% accuracy for a collaborative model combining logistic regression and KNN.
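The core idea of SMOTE is to create a synthetic minority sample by interpolating between a minority sample and one of its minority-class neighbours. The sketch below shows only that interpolation step. It assumes the neighbour has already been chosen (full SMOTE picks it at random from the k nearest minority neighbours), and the feature values are hypothetical.

```python
import random


def smote_sample(point, neighbour, rng):
    """Synthetic sample on the line segment between a point and its neighbour."""
    gap = rng.random()  # uniform in [0, 1): how far to move towards the neighbour
    return [p + gap * (q - p) for p, q in zip(point, neighbour)]


# Hypothetical minority-class samples with two features each.
rng = random.Random(0)
synthetic = smote_sample([1.0, 2.0], [3.0, 6.0], rng)
```

Repeating this step for many minority samples raises the minority class count without duplicating existing records exactly.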
Adaptive Congestion Control Mechanism to Enhance TCP Performance in Cooperative IoV
Dr Tapas Kumar Mishra, Dr Kshira Sagar Sahoo, Manas Kumar Mishra., Muhammad Bilal
Source Title: IEEE Access, Quartile: Q1, DOI Link
One of the main causes of energy consumption in Internet of Vehicles (IoV) networks is an ill-designed network congestion control protocol, which results in numerous packet drops, lower throughput, and increased packet retransmissions. In an IoV network, higher throughput can be achieved by minimizing packet retransmission and optimizing bandwidth utilization. It has been observed that the congestion control mechanism (i.e., the congestion window) plays a vital role in mitigating these challenges. Thus, this paper presents a cross-layer technique for controlling congestion in an IoV network based on throughput and buffer use. In the proposed approach, the receiver appends two bits to the acknowledgment (ACK) packet that describe the status of the buffer space and link utilization. The sender then uses this information to monitor congestion and limit its transmission of packets. The proposed model has been tested extensively, and the results demonstrate significantly higher network performance in terms of buffer utilization, link utilization, throughput, and packet loss.
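To make the two-bit feedback idea concrete, here is a minimal sketch of how a sender might adjust its congestion window from the two ACK bits. The specific policy (grow by one segment when both bits are clear, hold when one is set, halve when both are set) is an assumption chosen for illustration. The abstract does not describe the paper's actual update rule, and the function and parameter names are hypothetical.

```python
def next_cwnd(cwnd, buffer_busy, link_busy, min_cwnd=1):
    """Hypothetical window update driven by two feedback bits carried in the ACK."""
    if buffer_busy and link_busy:
        return max(min_cwnd, cwnd // 2)   # both signals set: back off sharply
    if buffer_busy or link_busy:
        return cwnd                       # one signal set: hold the current window
    return cwnd + 1                       # no congestion signalled: probe for more


# Replaying a hypothetical sequence of (buffer_busy, link_busy) bits from ACKs.
window = 10
for bits in [(False, False), (True, False), (True, True)]:
    window = next_cwnd(window, *bits)
```

The point of the explicit bits is that the sender can react before packets are dropped, instead of inferring congestion from loss alone.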
Combination of Reduction Detection Using TOPSIS for Gene Expression Data Analysis
Dr Sambit Kumar Mishra, Dr Tapas Kumar Mishra, Jogeswar Tripathy., Rasmita Dash., Binod Kumar Pattanayak., Deepak Puthal
Source Title: Big Data and Cognitive Computing, Quartile: Q1, DOI Link
In high-dimensional data analysis, Feature Selection (FS) is one of the most fundamental issues in machine learning and requires the attention of researchers. These datasets are characterized by a huge feature space, of which only a few features are significant for analysis. Thus, significant feature extraction is crucial. Various techniques are available for feature selection; among them, filter techniques are significant because they can be used with any type of learning algorithm, drastically lower the running time of optimization algorithms, and improve the performance of the model. However, the suitability of a filter approach depends on the characteristics of the dataset as well as on the machine learning model. To avoid these issues, this research considers a combination of feature reduction (CFR) approach that designs a pipeline of filter approaches for high-dimensional microarray data classification. Considering four filter approaches, sixteen combinations of pipelines are generated. The feature subset is reduced at different levels, and ultimately the significant feature set is evaluated. The pipelined filter techniques are Correlation-Based Feature Selection (CBFS), Chi-Square Test (CST), Information Gain (InG), and Relief Feature Selection (RFS), and the classification techniques are Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), and k-Nearest Neighbor (k-NN). The performance of CFR depends highly on the datasets as well as on the classifiers. Thereafter, the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) method is used to rank all reduction combinations and identify the superior filter combination.
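TOPSIS ranks alternatives by their closeness to an ideal solution. The sketch below follows the standard steps: vector-normalise each criterion, apply weights, find the ideal and anti-ideal points, and score each alternative by its relative distance to them. The decision matrix, weights, and criteria (accuracy as a benefit, number of retained features as a cost) are hypothetical, not the paper's.

```python
import math


def topsis(matrix, weights, benefit):
    """Closeness score in [0, 1] for each row; higher means closer to the ideal.

    benefit[j] is True when larger values of criterion j are better.
    Assumes the alternatives are not all identical (otherwise both distances are 0).
    """
    n_cols = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_cols)]
                for row in matrix]
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_ideal = math.dist(row, ideal)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_ideal + d_worst))
    return scores


# Hypothetical filter pipelines scored on (accuracy, number of retained features).
pipelines = [[0.95, 50], [0.90, 20], [0.80, 80]]
scores = topsis(pipelines, weights=[0.5, 0.5], benefit=[True, False])
ranking = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
```

In this toy matrix the third pipeline is worst on both criteria, so it coincides with the anti-ideal point and is ranked last.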
Crop Recommendation System Using Support Vector Machine Considering Indian Dataset
Dr Tapas Kumar Mishra, Dr Sambit Kumar Mishra, Kanaparthi Jeevan Sai., Shreyas Peddi., Manideep Surusomayajula
Source Title: Lecture Notes in Networks and Systems, Quartile: Q4, DOI Link
For many years, agriculture has been a major source of livelihood for Indians. Still, agriculture is often not profitable, and many farmers take drastic steps because they cannot bear the burden of loans. Agriculture is therefore an area with large scope for development. In comparison with other countries, India has the highest production rate in agriculture. However, most agricultural fields are still underdeveloped due to the lack of deployment of ecosystem control technologies. Agriculture combined with technology can bring the finest results. Crop yield depends on multiple climatic conditions such as air temperature, soil temperature, humidity, and soil moisture. In general, farmers depend on self-monitoring and experience for harvesting fields. Scarcity of water is a major issue today and affects people worldwide. Water is also a vital component of crop yield; here we consider rainfall rather than direct water supply. Predicting the crop selection or yield in advance of harvest would help policymakers and farmers take appropriate measures for farming, marketing, and storage. Thus, in this paper we propose crop selection using machine learning techniques, namely support vector machine (SVM) and polynomial regression. This model will help farmers know the yield of their crop before cultivating the agricultural field and thus help them make appropriate decisions. It attempts to solve the issue by building a prototype of an interactive prediction system. Accurate yield prediction requires understanding the functional relationship between yield and these parameters, because along with advances in the machines and technologies used in farming, useful and accurate information also plays a significant role. In this paper, we have simulated the SVM and polynomial regression techniques to predict which crop can yield better profit. Both models are simulated comprehensively on the Indian dataset, and an analytical report is presented.
Crop Recommendation System using KNN and Random Forest considering Indian Data set
Dr Tapas Kumar Mishra, Mishra S K., Sai K J., Alekhya B S., Nishith A R
Source Title: Proceedings - 2021 19th OITS International Conference on Information Technology, OCIT 2021, DOI Link
Agriculture plays a crucial role in the growth of the country's economy. In comparison to other countries, India has the highest production rate in agriculture. Agriculture combined with technology can bring the finest results. Crop prediction is highly complex and is determined by multiple factors such as nitrogen, phosphorus, and potassium content, rainfall, temperature, humidity, and pH level. Predicting the crop in advance would help policymakers and farmers take appropriate measures for farming, marketing, and storage. Thus, in this paper we propose crop selection using machine learning techniques such as K-Nearest Neighbour (KNN) and Random Forest. Both models are simulated comprehensively on an Indian dataset, and an analytical report is presented. This model will help farmers know the type of crop before cultivating the agricultural field and thus help them make appropriate decisions.