Optimizing Resource Provisioning by Using GPU Usage Pattern Extraction in a GPU-Based Cloud Environment

Outline
• Introduction
• Background and Motivation
• System Overview
• Scheduling Policies
• Experimental Evaluation
• Related Work
• Conclusion and Future Work

Introduction
• Many large-scale cloud providers, such as Amazon EC2, Nimbix, Peer1 Hosting, and Penguin Computing, offer GPU services.
• GPU usage in such cloud environments suffers from low resource utilization, long turnaround times, and low system throughput.
• These problems stem from static provisioning of GPU resources (dedicated access).
• One approach to optimizing resource provisioning is to schedule multiple applications across multiple GPUs.
• However, running multiple applications on the same multi-tasked GPU device may degrade the performance of one or more of them.

Introduction (cont.)
• To optimize resource provisioning, it is therefore crucial to:
  • Obtain the characteristics/behaviors of applications before actual execution.
  • Explore suitable scheduling algorithms.
• Support for acquiring application characteristics and behaviors has advanced over the years:
  • CUPTI
  • PAPI, Tau, Vampir
  • Mystic
• Disadvantages of the existing approaches:

| Related work | Obtains behavior before execution | Modifies source code | Extra overhead | Accuracy of result |
| CUPTI [11] | No | No | Little | High |
| PAPI [12], Tau [13], Vampir [14] | No | Yes | Little | High |
| Mystic [10] | Yes | No | Large | Low |
| XXX (this work) | Yes | No | Little | High |

Introduction (cont.)
• In this paper, we define a GPU usage pattern as an application's access pattern to a GPU device during its execution, represented by a directed graph in which each vertex indicates a pivotal CUDA activity, such as a GPU kernel execution, a GPU memory allocation, a host-to-device memory copy, or a device-to-host memory copy.
• We extract the GPU usage pattern.
• We propose two scheduling algorithms.
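The directed-graph definition above can be illustrated with a minimal sketch (not the paper's implementation): vertices are pivotal CUDA activities, and edges follow the order in which the application performs them. The activity names and the example trace are assumptions for illustration.

```python
# Minimal sketch of a GPU usage pattern as a directed graph:
# vertices are pivotal CUDA activities, edges are observed transitions.

from collections import defaultdict

class UsagePattern:
    def __init__(self):
        self.edges = defaultdict(list)   # activity -> successor activities

    def add_transition(self, src, dst):
        self.edges[src].append(dst)

# Hypothetical trace for an application that allocates device memory,
# copies input to the device, runs a kernel, and copies results back.
pattern = UsagePattern()
for src, dst in [("cudaMalloc", "HtoD_memcpy"),
                 ("HtoD_memcpy", "kernel_launch"),
                 ("kernel_launch", "DtoH_memcpy")]:
    pattern.add_transition(src, dst)

print(pattern.edges["HtoD_memcpy"])   # -> ['kernel_launch']
```

Two patterns can then be compared structurally (e.g. by shared vertices and edges), which is what the graph-based scheduling policy later relies on.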
• Our contributions:
  • A method for extracting GPU usage patterns from intermediate code.
  • Two scheduling methods for different application scenarios.
  • A system, XXX, that is implemented and readily deployable in current data centers without hardware modification.

Background and Motivation
• The rationale for using intermediate code:
  • Source code vs. intermediate code (safety, ease of use).
  • CUDA also supports Java, Python, and other interpreted languages. Compilation produces byte code, which is what we call the intermediate code. Similarly, intermediate code produced from a C/C++ program can be analyzed in the same way.
  • Among these intermediate codes, C/C++ intermediate code is the most difficult to analyze; this paper takes its analysis as the example.
• The feasibility of static analysis:
  • Feature 1 of GPU-based applications: total time using the CPU << total time using the GPU.
  • Feature 2 of GPU-based applications: CPU code controls the process and prepares data, while GPU code is responsible for computation.

Background and Motivation (cont.)
• Why extract program features to optimize scheduling:
  • Applications have different GPU usage patterns, and running multiple applications on the same multi-tasked GPU device may degrade performance. Resource contention is the fundamental cause of this decline, which manifests as longer program turnaround times and lower system throughput.
• Lexical analyzer:
  • A lexical analyzer parses the meaning of each token in the code. In this paper, we use it to identify key functions, syntax modules, and so on.
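The token-recognition step such a lexical pass might perform on intermediate code can be sketched as follows. This is a simplification under stated assumptions: the set of "key functions", the regex-based scan, and the IR-like fragment are illustrative, not the paper's three-stage extraction.

```python
# Minimal sketch: a lexical pass over intermediate code (here, an
# LLVM-IR-like text) that flags key CUDA API calls in order.

import re

# Hypothetical set of key functions a lexical analyzer might look for.
KEY_CALLS = re.compile(r"\b(cudaMalloc|cudaMemcpy|cudaLaunchKernel|cudaFree)\b")

def scan_key_calls(ir_text):
    """Return the key CUDA calls in the order they appear in the text."""
    return KEY_CALLS.findall(ir_text)

# Toy intermediate-code fragment (illustrative, not real compiler output).
ir = """
call i32 @cudaMalloc(i8** %p, i64 1024)
call i32 @cudaMemcpy(i8* %d, i8* %h, i64 1024, i32 1)
call i32 @cudaLaunchKernel(...)
call i32 @cudaMemcpy(i8* %h, i8* %d, i64 1024, i32 2)
call i32 @cudaFree(i8* %d)
"""
print(scan_key_calls(ir))
```

The ordered call sequence recovered this way is exactly the raw material from which a usage-pattern graph can be built without executing the application.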
System Overview
• Architecture
• Key algorithms:
  • GPU usage pattern extraction (3 stages)
  • Computing the time of key GPU CUDA calls

Scheduling Policies
• Interference-aware scheduler using GPU resource demand:
  • If an idle GPU exists, the application is assigned to it.
  • Otherwise, obtain the application's GPU resource demand vector v1 and compute its similarity to the vector of every application already on each GPU. The higher the similarity, the higher the interference score.
• Interference-aware scheduler using the GPU key call graph:
  • If an idle GPU exists, the application is assigned to it.
  • Otherwise, obtain the application's GPU key call graph and compute its similarity to every graph already on each GPU. The higher the similarity, the higher the interference score.

Experimental Evaluation
• Accuracy of GPU usage pattern extraction
• Scheduling performance (our scheduling vs. least-loaded (LL) and round-robin (RR) scheduling)
• System performance (ANTT: average normalized turnaround time; STP: system throughput)
• GPU utilization (our scheduling vs. LL)
• Quality of launch-sequence selection (COV: coefficient of variation)
• Scheduling decision quality (scheduling fairness)
• Scheduling overhead of GPU usage pattern extraction

Related Work
• Obtaining the characteristics/behaviors of applications:
  • S. Browne, J. Dongarra, et al. "A Portable Programming Interface for Performance Evaluation on Modern Processors." International Journal of High Performance Computing Applications 14.3 (2000): 189-204.
  • S. S. Shende, A. D. Malony. "The Tau Parallel Performance System." International Journal of High Performance Computing Applications 20.2 (2006): 287-311.
  • A. Knüpfer, et al. "The Vampir Performance Analysis Tool-Set." Tools for High Performance Computing: Proceedings of the 2nd International Workshop on Parallel Tools for High Performance Computing, HLRS, Stuttgart, 2008: 139-155.
• Scheduling policies:
  • R. Phull, et al. "Interference-driven resource management for GPU-based heterogeneous clusters."
International Symposium on High-Performance Parallel and Distributed Computing. ACM, 2012: 109-120.
  • D. Sengupta, et al. "Scheduling multi-tenant cloud workloads on accelerator-based systems." SC14: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2014: 513-524.
  • Y. Ukidave, X. Li, and D. Kaeli. "Mystic: Predictive Scheduling for GPU Based Cloud Servers Using Machine Learning." IEEE International Parallel and Distributed Processing Symposium. IEEE, 2016: 353-362.

Thanks!
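The first interference-aware policy described under Scheduling Policies above can be sketched as follows. This is a hedged sketch, not the paper's exact formulation: the choice of cosine similarity, the three demand-vector components (compute, memory bandwidth, PCIe), and the use of the maximum resident similarity as the interference score are all assumptions.

```python
# Minimal sketch of the demand-vector scheduling policy: assign to an
# idle GPU if one exists; otherwise place the application on the GPU
# whose resident applications are least similar to it (lower similarity
# is assumed to mean lower interference).

import math

def cosine(a, b):
    """Cosine similarity between two resource demand vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def schedule(app_vec, gpus):
    """gpus: one list per GPU of the demand vectors already resident on it."""
    for i, running in enumerate(gpus):
        if not running:                 # idle GPU: assign directly
            return i
    # Interference score of a GPU = highest similarity to any resident app.
    scores = [max(cosine(app_vec, v) for v in running) for running in gpus]
    return scores.index(min(scores))    # pick the least-interfering GPU

gpus = [[(0.9, 0.1, 0.2)],              # GPU 0: compute-heavy resident app
        [(0.1, 0.9, 0.3)]]              # GPU 1: memory-heavy resident app
print(schedule((0.8, 0.2, 0.1), gpus))  # compute-heavy arrival -> GPU 1
```

The graph-based policy follows the same skeleton, with the cosine similarity replaced by a similarity measure over GPU key call graphs.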