Application and user-specific data prefetching and parallel read algorithms for distributed file systems
Ms Anusha Nalajala, T Ragunathan, Ranesh Naha, Sudheer Kumar Battula
Source Title: Cluster Computing, Quartile: Q1, DOI Link
Cloud computing systems are widely used to deploy big data-based applications because of their high storage and computation capacity. The key storage component in a cloud computing environment is the distributed file system, which can effectively store and process the data produced by big data-based applications. The users of such applications issue read requests more frequently than write requests, so most of these cloud-based applications demand optimal performance from the distributed file system, especially for read operations. Numerous caching and prefetching techniques have been proposed in the literature to enhance the performance of distributed file systems. However, these techniques typically adopt a synchronous approach, focusing on either application data prefetching or user data prefetching when the user application starts executing, which may result in an extended read access time. Furthermore, the data is prefetched based on either access frequency or reuse distance without considering the access recency of the data, which may result in a lower cache hit ratio. In this paper, we propose application-specific and user-specific data prefetching algorithms that prefetch data from the distributed file system and store it in the multi-level caches present in the distributed file system, based on a combination of the access frequency and recency ranking of file blocks previously accessed by client application programs. Additionally, we divide the cache into two partitions, namely user and application caches, to store the prefetched data according to the popularity value calculated from user-level and application-level accesses. We also introduce a parallel read algorithm to read data simultaneously from the multiple caches present in the distributed file system environment. The simulation results demonstrate that the proposed algorithms improve the distributed file system's performance by a minimum of 8 percent and a maximum of 92 percent in terms of average read access time when compared with different existing approaches.
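The parallel read idea in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the function name `parallel_read`, the use of plain dictionaries as caches, the lookup order, and the `fetch_from_dfs` callback are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_read(block_ids, caches, fetch_from_dfs, max_workers=4):
    """Read several blocks concurrently (hypothetical helper).

    caches: list of dict-like caches, checked in order for each block
            (for example a user cache followed by an application cache).
    fetch_from_dfs: fallback callable used when every cache misses.
    """
    def read_one(block_id):
        for cache in caches:
            if block_id in cache:
                return cache[block_id]
        return fetch_from_dfs(block_id)  # miss in all caches

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as block_ids
        return list(pool.map(read_one, block_ids))

user_cache = {"b1": b"u1"}
application_cache = {"b2": b"a2"}
result = parallel_read(
    ["b1", "b2", "b3"],
    [user_cache, application_cache],
    lambda block_id: b"dfs-" + block_id.encode(),
)
```

Here each block request runs on its own worker thread, so a slow miss on one block does not delay hits served from the caches for the others.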
HRFP: Highly Relevant Frequent Patterns-Based Prefetching and Caching Algorithms for Distributed File Systems
Ms Anusha Nalajala, Ranesh Naha, Sudheer Kumar Battula, T Ragunathan
Source Title: Electronics, Quartile: Q3, DOI Link
Data-intensive applications generate massive amounts of data, which are stored on cloud computing platforms where distributed file systems provide the back-end storage. Most users of the applications deployed on cloud computing systems read data more often than they write it. Hence, enhancing the performance of read operations is an important research issue. Prefetching and caching are important techniques for improving the performance of read operations in distributed file systems. In this research, we introduce a novel highly relevant frequent patterns (HRFP)-based algorithm that prefetches content from the distributed file system environment and stores it in the client-side caches present in the same environment. We also introduce a new replacement policy and an efficient migration technique that moves patterns from the main memory caches to the caches on solid-state devices, based on a new metric, namely the relevancy of the patterns. According to the simulation results, the proposed approach outperforms other algorithms suggested in the literature by a minimum of 15% and a maximum of 53%.
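The migration between main memory and solid-state caches can be pictured with a small sketch. The abstract does not define how relevancy is computed, so the sketch below assumes the caller supplies a numeric relevance score; the class name `TwoTierCache` and its eviction rule (demote the least relevant pattern) are hypothetical and are not taken from the paper.

```python
class TwoTierCache:
    """Hypothetical two-tier cache: main memory in front of an SSD tier."""

    def __init__(self, memory_capacity, ssd_capacity):
        self.memory_capacity = memory_capacity
        self.ssd_capacity = ssd_capacity
        self.memory = {}  # pattern -> relevance score (assumed given)
        self.ssd = {}

    def insert(self, pattern, relevance):
        self.ssd.pop(pattern, None)  # keep a pattern in one tier only
        self.memory[pattern] = relevance
        if len(self.memory) > self.memory_capacity:
            # Migrate the least relevant pattern from memory to the SSD tier.
            victim = min(self.memory, key=self.memory.get)
            self.ssd[victim] = self.memory.pop(victim)
            if len(self.ssd) > self.ssd_capacity:
                # SSD tier is full too: drop its least relevant pattern.
                dropped = min(self.ssd, key=self.ssd.get)
                del self.ssd[dropped]

cache = TwoTierCache(memory_capacity=2, ssd_capacity=1)
for pattern, relevance in [("p1", 0.9), ("p2", 0.2), ("p3", 0.5), ("p4", 0.1)]:
    cache.insert(pattern, relevance)
```

In this run the two most relevant patterns stay in memory, one lower-relevance pattern is held on the SSD tier, and the least relevant one is discarded.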
Efficient Prefetching and Client-side Caching Algorithms for Improving the Performance of Read Operations in Distributed File Systems
Ms Anusha Nalajala, Ranesh Naha, Thirumalaisamy Ragunathan
Source Title: IEEE Access, Quartile: Q1, DOI Link
Modern web applications are deployed in cloud computing systems because they support unlimited storage and computing power. One of the main back-end storage components of a cloud computing system is the distributed file system, which allows massive amounts of data to be stored and accessed. In most web applications deployed in such systems, read operations are performed more frequently than write operations. Consequently, increasing the efficiency of read operations in distributed file systems is a challenging and important research problem. The two main procedures used in distributed file systems to improve the performance of read operations are prefetching and caching. In this paper, we propose novel prefetching and multi-level caching algorithms based on the Access-Frequency and Access-Recency ranking of file blocks that were previously accessed by client application programs. We also propose new augmented ranking algorithms for prefetching file blocks by combining the Access-Frequency and Access-Recency rankings of the file blocks. We use rank-based replacement algorithms to replace file blocks in the cache. The simulation results show that the proposed algorithms improve the performance of read operations on distributed file systems by 29% to 77% in comparison to algorithms proposed in the literature.
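One simple way to combine a frequency ranking with a recency ranking is to add the two rank positions. The sketch below does exactly that; it is an assumption for illustration, since the abstract does not state how the paper's augmented ranking combines the two, and the function name `augmented_ranking` and the tie-breaking rule are hypothetical.

```python
def augmented_ranking(access_log):
    """Order blocks by frequency rank plus recency rank (best first).

    access_log: list of block ids in access order, oldest first.
    Ties on the combined rank are broken in favour of the more recent block.
    """
    freq = {}
    last_seen = {}
    for t, block in enumerate(access_log):
        freq[block] = freq.get(block, 0) + 1
        last_seen[block] = t

    by_freq = sorted(freq, key=lambda b: -freq[b])
    by_recency = sorted(last_seen, key=lambda b: -last_seen[b])
    freq_rank = {b: i for i, b in enumerate(by_freq)}
    rec_rank = {b: i for i, b in enumerate(by_recency)}
    return sorted(freq, key=lambda b: (freq_rank[b] + rec_rank[b], rec_rank[b]))

ranking = augmented_ranking(["a", "b", "a", "c", "a", "b"])
prefetch_candidates = ranking[:2]  # best-ranked blocks to prefetch
eviction_victim = ranking[-1]      # worst-ranked block to replace first
```

Under this scheme the same ordering serves both purposes: the head of the list suggests what to prefetch, and the tail suggests what a rank-based replacement policy would evict.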
Rank-Based Prefetching and Multi-level Caching Algorithms to Improve the Efficiency of Read Operations in Distributed File Systems
Ms Anusha Nalajala, Rathnamma Gopisetty, Vignesh Garrapally, Thirumalaisamy Ragunathan
Source Title: Lecture Notes in Computer Science, Quartile: Q3, DOI Link
In the era of big data, web-based applications deployed in cloud computing systems have to store and process large volumes of data generated by their users. Distributed file systems serve as the back-end storage component of cloud computing systems and are used to store large volumes of data efficiently. Improving the read performance of the distributed file system is an important research problem, as most web-based applications deployed in cloud computing systems carry out read operations more frequently than write operations. Prefetching and caching are the two important techniques used for improving the performance of read operations in the distributed file system. In this paper, we propose novel rank-based prefetching, multi-level caching and rank-based replacement algorithms for an effective caching process. Our simulation results reveal that the proposed algorithms improve the performance of read operations in distributed file systems more than the algorithms proposed in the literature.