GPU Processing for Distributed Live Video Database
Jun Ye
Data Systems Group
Outline
• Introduction to GPU
• GPU language (OpenCL or CUDA)
• OpenCL programming
• Case Study: Live Video Database Management System (LVDBMS)
Introduction
• Modern GPUs are more than graphics cards for rendering video game images.
• They are also used for general-purpose parallel computing (e.g. mining Bitcoin, training deep neural networks).
• GPGPU: general-purpose computing on GPUs.
nVidia Tesla K20
nVidia GeForce GTX 580
GPU language
• Two main options: CUDA and OpenCL
• CUDA (2007)
  • Compute Unified Device Architecture, created and owned by nVidia
• OpenCL (2009)
  • Open Computing Language, a public standard initially designed by Apple and maintained by the Khronos Group
CUDA or OpenCL?
CUDA
• Proprietary
• Only works on nVidia cards
• Normally gives higher performance without any tuning
OpenCL
• Open standard
• Broad hardware support: ATI, Intel, Apple, nVidia, Qualcomm, Xilinx, and more
• Heterogeneous: PC, mobile device, FPGA, DSP, ...
• Performance is generally not as good as CUDA
• Needs knowledge of the hardware to tune the performance
Tip
One thing is for sure:
ATI has better support for OpenCL than nVidia.
OpenCL + ATI seems a better option than OpenCL + nVidia.
Brief intro to OpenCL Programming
• Best fit for data-parallel problems (1D, 2D, 3D data)
• A large number of simple computations
• E.g. array addition, matrix multiplication, image processing (e.g. Gaussian blur)
• Can speed up computation by orders of magnitude (hardware specific)
• Overheads: resource initialization, data transfer between CPU and GPU memory
OpenCL programming
GPU memory model
http://de.wikipedia.org/wiki/Datei:OpenCL_Memory_model.svg
OpenCL programming
GPU memory model
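• NDRange configuration
• Global work size: total number of work-items in each dimension
• Local work size: number of work-items per work-group in each dimension
• Thread: called a work-item in OpenCL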
http://gpgpu-computing4.blogspot.com/2009/09/matrix-multiplication-2-opencl.html
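How global work size, local work size, and work-items relate can be illustrated on the CPU. The sketch below is plain C, not OpenCL, and the type and function names are hypothetical. It assumes a 1D NDRange with zero global offset whose global size is a multiple of the local size, and shows how a global ID splits into a work-group ID and a local ID, mirroring what get_global_id(0), get_group_id(0), and get_local_id(0) report inside a kernel. The same arithmetic applies independently to each dimension of a 2D or 3D NDRange.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: position of one work-item in a 1D NDRange. */
typedef struct {
    size_t global_id;  /* corresponds to get_global_id(0) */
    size_t group_id;   /* corresponds to get_group_id(0)  */
    size_t local_id;   /* corresponds to get_local_id(0)  */
} work_item_pos;

/* Number of work-groups along one dimension. */
size_t num_work_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0);
    assert(global_size % local_size == 0);
    return global_size / local_size;
}

/* Split a global ID into (work-group ID, local ID), zero offset assumed. */
work_item_pos locate_work_item(size_t global_id,
                               size_t global_size,
                               size_t local_size)
{
    assert(local_size > 0);
    assert(global_size % local_size == 0);
    assert(global_id < global_size);

    work_item_pos p;
    p.global_id = global_id;
    p.group_id  = global_id / local_size;
    p.local_id  = global_id % local_size;
    return p;
}
```

With a global size of 1024 and an assumed local size of 16, num_work_groups(1024, 16) gives 64 work-groups per dimension, and work-item 35 is local item 3 of work-group 2.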
OpenCL programming
coding
• Host code: runs on the CPU (can be C/C++, Python, MATLAB, JavaScript)
  • Initialize resources
  • Configure the environment (global and local work sizes)
  • Transfer buffers between host and device
• Kernel code: runs on the device (GPU) (kernel language: OpenCL C, in .cl files)
  • Executes the parallel computation
OpenCL programming
An example (C)
• Matrix multiplication
• A and B are both 1024 × 1024 square matrices
• Compute C = A × B
OpenCL programming
Host code:
• #include <CL/cl.h>
• Initialize device
  • clGetPlatformIDs
  • clGetDeviceIDs
  • clCreateContext
  • clCreateCommandQueue
• Create program
  • LoadOpenCLKernel("*.cl") (helper that reads the kernel source file; not an OpenCL API call)
  • clCreateProgramWithSource
  • clBuildProgram
  • clCreateKernel
OpenCL programming
Host code: (OpenCL binding code)
• Create buffers
  • clCreateBuffer
  • clSetKernelArg
  • Set localworksize (must consider the hardware specs)
  • Set globalworksize (the dimensions of your problem)
• Enqueue the kernel
  • clEnqueueNDRangeKernel
• Read the result from the device
  • clEnqueueReadBuffer
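One practical detail when choosing the two sizes: in OpenCL 1.x, if a local work size is passed to clEnqueueNDRangeKernel, each global work size must be a multiple of the corresponding local work size. The 1024 × 1024 example satisfies this for common local sizes such as 16, but other problem sizes may not. A minimal sketch of the usual workaround, rounding the global size up, is shown below; the function name is hypothetical and this is plain C rather than code from the demo.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest multiple of local_size that is >= problem_size. */
size_t round_up_global_size(size_t problem_size, size_t local_size)
{
    assert(local_size > 0);
    size_t remainder = problem_size % local_size;
    if (remainder == 0) {
        return problem_size;  /* already a multiple, no padding needed */
    }
    return problem_size + (local_size - remainder);
}
```

For a 1000-element dimension and a local size of 16 this gives 1008. When the global size is padded like this, the kernel needs a bounds check so that the extra work-items do not read or write past the end of the buffers.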
OpenCL programming
/* kernel.cl  Matrix multiplication: C = A * B. */
// OpenCL kernel: each work-item computes one element of C.
// A has wA columns; B and C have wB columns (row-major storage).
__kernel void
matrixMul(__global float* C,
          __global float* A,
          __global float* B,
          int wA, int wB)
{
    int tx = get_global_id(0);  // column index in C
    int ty = get_global_id(1);  // row index in C
    // value accumulates the element computed by this work-item
    float value = 0;
    for (int k = 0; k < wA; ++k)
    {
        float elementA = A[ty * wA + k];
        float elementB = B[k * wB + tx];
        value += elementA * elementB;
    }
    // Each work-item writes one element of C to device memory.
    // The row stride of C is wB (equal to wA for square matrices).
    C[ty * wB + tx] = value;
}
Demo
• I will show you the execution of the program
• And compare it against a naive CPU solution
• Source code available at
http://www.es.ele.tue.nl/~mwijtvliet/5KK73/?page=mmopencl
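The slides do not show the CPU baseline. For reference, a naive CPU solution typically looks like the sketch below: the same inner loop as the kernel, wrapped in two outer loops that take the place of the work-item IDs. This is an illustration under the same row-major layout, not the code from the linked page; the function name and the hA parameter (number of rows of A) are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Naive CPU reference: C = A * B with row-major storage.
 * A is hA x wA, B is wA x wB, C is hA x wB. */
void matmul_cpu(float *C, const float *A, const float *B,
                size_t hA, size_t wA, size_t wB)
{
    assert(C != NULL && A != NULL && B != NULL);
    for (size_t row = 0; row < hA; ++row) {        /* plays the role of ty */
        for (size_t col = 0; col < wB; ++col) {    /* plays the role of tx */
            float value = 0.0f;
            for (size_t k = 0; k < wA; ++k) {
                value += A[row * wA + k] * B[k * wB + col];
            }
            C[row * wB + col] = value;
        }
    }
}
```

For the 1024 × 1024 case this performs 1024 × 1024 × 1024 multiply-add steps sequentially, which is the work the GPU version spreads across one work-item per output element.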
Case Study
• 1. Realistic ray tracing rendering
  • http://webcl.nokiaresearch.com/
• 2. Real-time 3D spatial-query in live video database
  • http://www.eecs.ucf.edu/~jye/demo.html
  • Jun Ye and Kien A. Hua, "Octree-based 3D Logic and Computation of Spatial Relationships in Live Video Query Processing," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 11 (2), December 2014.
  • Jun Ye and Kien A. Hua, "Exploiting Depth Camera for 3D Spatial Relationship Interpretation," in Proceedings of ACM Multimedia Systems 2013, Oslo, Norway.
Real-time 3D spatial-query in live video database
• Background: a live video database management system
• Technique: distributed live video computing
• Components:
  • Distributed 3D cameras (Microsoft Kinect)
  • Camera servers
  • Query processing servers
Real-time 3D spatial-query in live video database
• 3D spatial operators
• GPU-accelerated computing algorithm
Real-time 3D spatial-query in live video database
• Spatio-temporal event query
• E.g. a person walks out of a room and enters the room next door
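To make the example query concrete, here is a toy sketch of the temporal part only. It is not the LVDBMS implementation or its 3D spatial operators: it assumes some upstream step has already labeled, for each frame, which room the tracked person is in. The room labels, the NO_ROOM value, and the function name are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NO_ROOM (-1)  /* hypothetical label: person is in no known room */

/* Returns the index of the first frame in which the person is in room
 * `to` after having been in room `from`, without visiting any other
 * known room in between. Returns -1 if the event never occurs. */
long find_room_transition(const int *room_per_frame, size_t n_frames,
                          int from, int to)
{
    assert(room_per_frame != NULL || n_frames == 0);
    assert(from != to);

    int was_in_from = 0;
    for (size_t i = 0; i < n_frames; ++i) {
        int room = room_per_frame[i];
        if (room == from) {
            was_in_from = 1;           /* person seen in the source room */
        } else if (room == to) {
            if (was_in_from) {
                return (long)i;        /* left `from`, now in `to` */
            }
        } else if (room != NO_ROOM) {
            was_in_from = 0;           /* went somewhere else first */
        }
    }
    return -1;
}
```

For the per-frame labels {1, 1, NO_ROOM, 2, 2}, a query for "leaves room 1 and enters room 2" matches at frame index 3.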
Thank you.
• Questions?