Multi-Dimensional Resource Allocation Strategy for Large-Scale Computational Grid Systems

In this thesis, we propose a novel distributed resource-scheduling algorithm capable of handling multiple resource requirements for jobs that arrive in a Grid computing environment. In our proposed algorithm, referred to as the Multi-Dimension Resource Scheduling (MRS) algorithm, we take into account both the site capabilities and the resource requirements of jobs. The main objective of the algorithm is to obtain a minimal execution schedule through efficient management of the available Grid resources. We first propose a model in which the job and site resource characteristics can be captured together and used in the scheduling algorithm. To do so, we introduce the concepts of an n-dimensional virtual map and resource potential. Based on the proposed model, we conduct rigorous simulation experiments with real-life workload traces reported in the literature to quantify the performance. We compare our strategy with most of the commonly used scheduling algorithms on performance metrics such as job wait times, queue completion times, and average resource utilization. Our combined consideration of job and resource characteristics is shown to yield high performance with respect to these metrics. Our study also reveals that the MRS scheme can adapt to both serial and parallel job requirements, especially when job fragmentation occurs. Our experimental results clearly show that MRS outperforms the other strategies, and we highlight the impact and importance of our strategy.
We further investigate the capability of this algorithm to handle failures through dimension expansion. Three types of proactive failure-handling strategies for Grid environments are proposed. These strategies estimate the availability of resources in the Grid and preemptively calculate the expected long-term capacity of the Grid. Using these strategies, we create modified versions of the backfill and replication algorithms that incorporate all three proactive strategies, in order to ascertain the effectiveness of each in preventing job failures during execution. A variation of MRS called 3D-MRS is also presented. The extended algorithm continues to show improvement when operating under the same execution environment. In our experiments, we compare these enhanced algorithms to their original forms and show that proactive failure handling is able, in some cases, to achieve a 0% job failure rate during execution. We also show that, among the algorithms we have considered, a combination of node-based prediction and a site capacity filter used with MRS provides the best balance between enhanced throughput and job failures during execution.