GPU Processing for Distributed Live Video Database
Jun Ye
Data Systems Group

Outline
• Introduction to GPU
• GPU language (OpenCL or CUDA)
• OpenCL programming
• Case Study: Live Video Database Management System (LVDBMS)

Introduction
• Current GPUs are more than graphics cards that render images for video games.
• They are used for general-purpose parallel computing of all kinds (e.g. mining Bitcoin, training deep neural networks).
• GPGPU: general-purpose GPU.
(Pictured: nVidia Tesla K20, nVidia GeForce GTX 580)

GPU language
• Two main options: CUDA and OpenCL
• CUDA (2007)
  • Compute Unified Device Architecture, created and owned by nVidia
• OpenCL (2009)
  • Open Computing Language. Designed by Apple and Khronos; a public standard.

CUDA or OpenCL?
CUDA
• Proprietary
• Only works on nVidia's cards
• Normally achieves higher performance without any tuning
OpenCL
• Open standard
• Broad hardware support: ATI, Intel, Apple, nVidia, Qualcomm, Xilinx, and more
• Heterogeneous: PC, mobile device, FPGA, DSP, ...
• Performance is generally not as good as CUDA
• Needs knowledge of the hardware to tune the performance

Tip
One thing is for sure: ATI has better support for OpenCL than nVidia. OpenCL+ATI seems a better option than OpenCL+nVidia.

Brief intro to OpenCL programming
• Best fit for parallel computing problems (1D, 2D, 3D data)
• A large number of simple computations
• E.g. array addition, matrix multiplication, image processing (e.g. Gaussian blur)
• Can improve speed by orders of magnitude (hardware specific)
• Costs: overhead, resource initialization, GPU/CPU memory swap

OpenCL programming: GPU memory model
(Figure: OpenCL memory model)
http://de.wikipedia.org/wiki/Datei:OpenCL_Memory_model.svg

OpenCL programming: GPU memory model
• NDRange configuration
• Global work size
• Local work size
• Thread (work-item)
http://gpgpu-computing4.blogspot.com/2009/09/matrix-multiplication-2-opencl.html

OpenCL programming: coding
• Host code: runs on the CPU (can be C/C++, Python, MATLAB, JavaScript)
  • Initialize resources
  • Configure the environment (global and local work-item sizes)
  • Buffer swapping
• Kernel code: runs on the device (GPU) (kernel language: .cl)
  • Executes the parallel computation

OpenCL programming: an example (C)
• Matrix multiplication
• A and B are both 1024 by 1024 square matrices
• Compute C = A x B

OpenCL programming: host code
• #include <CL/cl.h>
• Initialize device
  • clGetPlatformIDs
  • clGetDeviceIDs
  • clCreateContext
  • clCreateCommandQueue
• Create program
  • LoadOpenCLKernel("*.cl")
  • clCreateProgramWithSource
  • clBuildProgram
  • clCreateKernel

OpenCL programming: host code (OpenCL binding code)
• Create buffer
  • clCreateBuffer
  • clSetKernelArg
  • Set localworksize (must consider the hardware specs)
  • Set globalworksize (the dimension of your problem)
• Enqueue kernel
  • clEnqueueNDRangeKernel
• Read result from kernel
  • clEnqueueReadBuffer

OpenCL programming: kernel code

    /* kernel.cl
       Matrix multiplication: C = A * B. */

    // OpenCL kernel: each work-item computes one element of C
    __kernel void matrixMul(__global float* C,
                            __global float* A,
                            __global float* B,
                            int wA, int wB)
    {
        int tx = get_global_id(0);  // column index in C
        int ty = get_global_id(1);  // row index in C

        // value stores the element that is computed by the thread
        float value = 0;
        for (int k = 0; k < wA; ++k) {
            float elementA = A[ty * wA + k];
            float elementB = B[k * wB + tx];
            value += elementA * elementB;
        }

        // Write the result to device memory; each thread writes one element.
        // The row stride of C is wB (equal to wA in the square case).
        C[ty * wB + tx] = value;
    }

Demo
• I will show you the execution of the program
• And compare it against a naive CPU solution
• Source code available at http://www.es.ele.tue.nl/~mwijtvliet/5KK73/?page=mmopencl

Case Study
• 1. Realistic ray tracing rendering
  • http://webcl.nokiaresearch.com/
• 2. Real-time 3D spatial-query in live video database
  • http://www.eecs.ucf.edu/~jye/demo.html
  • Jun Ye and Kien A. Hua, "Octree-based 3D Logic and Computation of Spatial Relationships in Live Video Query Processing," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 11 (2), December 2014.
  • Jun Ye and Kien A. Hua, "Exploiting Depth Camera for 3D Spatial Relationship Interpretation," in Proceedings of ACM Multimedia Systems 2013, Oslo, Norway.

Real-time 3D spatial-query in live video database
• Background: a live video database management system
• Technique: distributed live video computing
• Components:
  • Distributed 3D cameras (Microsoft Kinect)
  • Camera servers
  • Query processing servers

Real-time 3D spatial-query in live video database
• 3D spatial operators
• GPU-accelerated computing algorithm

Real-time 3D spatial-query in live video database
• Spatial-temporal event query
• E.g. a person walks out of a room and enters the room next door

Thank you.
• Questions?
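
Backup: a naive CPU reference for the matrix multiplication example
The demo compares the OpenCL kernel against a naive CPU solution. The sketch below shows what such a reference might look like in plain C; it is not the code from the demo. It keeps the kernel's indexing, so each (tx, ty) pair in the loops plays the role of one work-item in a 2D NDRange. The function names matrix_mul_item and matrix_mul_cpu and the parameter hA (number of rows of A) are assumptions introduced here for illustration.

```c
/* Body of one work-item, mirroring the matrixMul kernel:
   the item at global id (tx, ty) computes element C[ty][tx].
   Hypothetical helper, not part of the original slides. */
static void matrix_mul_item(float *C, const float *A, const float *B,
                            int wA, int wB, int tx, int ty)
{
    float value = 0.0f;
    for (int k = 0; k < wA; ++k) {
        /* row ty of A times column tx of B */
        value += A[ty * wA + k] * B[k * wB + tx];
    }
    /* C has wB columns, so its row stride is wB */
    C[ty * wB + tx] = value;
}

/* Naive CPU reference: visits sequentially the same 2D index space
   that the NDRange would cover in parallel (wB columns by hA rows).
   A is hA x wA, B is wA x wB, C is hA x wB, all row-major. */
void matrix_mul_cpu(float *C, const float *A, const float *B,
                    int hA, int wA, int wB)
{
    for (int ty = 0; ty < hA; ++ty) {
        for (int tx = 0; tx < wB; ++tx) {
            matrix_mul_item(C, A, B, wA, wB, tx, ty);
        }
    }
}
```

For the 1024 by 1024 example, a call such as matrix_mul_cpu(C, A, B, 1024, 1024, 1024) would fill C on the CPU, and the result could then be compared element by element (within a small floating-point tolerance) against the buffer read back with clEnqueueReadBuffer.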