Background
Founded 2006 by NVIDIA Chief Scientist David Kirk
Mission: long-term strategic research
Discover & invent new markets
Influence product roadmaps
Follow, support, and focus academic research
Improve parallel computing education
© NVIDIA Corporation 2009
Topics
Visual computing
Real-time rendering, cinematic rendering, animation, modeling, visualization, computational photography
Parallel computing
Programming languages, compilers, numerics, HPC applications, architecture, circuit design, interconnects
Mobile computing
Low-power computing, networks, HCI
Personnel
Currently 25 full-time researchers in CA, NC, MI, MN, VA, UT, Berlin, Helsinki
2 National Academy members
1 Academy Award
5 recent former faculty
External Research Collaborations
UC Berkeley – parallel programming
UC Davis – parallel algorithms
U British Columbia – imaging, architecture
U North Carolina – ray tracing, hybrid rendering
U Virginia – architecture, perceptual psychology
UCLA – oceanography
U Massachusetts – real-time rendering
Chalmers University – real-time rendering
U Utah – HPC, ray tracing
NC State – rendering algorithms
Johns Hopkins – data-intensive computing
Brown – computer vision
Saarland U – ray tracing
U Illinois – parallel programming
Weta – cinematic rendering
Williams College – real-time rendering
Example: Skin Rendering
Real-time subsurface scattering
Multilayer translucent materials
~5 minutes (offline) vs. ~11 ms (real time)
No precomputation
Key insight: project diffusion profiles onto a sum-of-Gaussians basis
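The key insight above is to approximate each diffusion profile as a weighted sum of Gaussians. The Python sketch below shows that representation, assuming normalized 2D radial Gaussians G(v, r) = exp(-r²/(2v)) / (2πv). The (weight, variance) pairs in TERMS are hypothetical placeholders, not fitted skin parameters. The sketch checks two properties that make such a basis convenient: each term integrates over the plane to its weight, and each 2D Gaussian factors into two 1D Gaussians, so the blur for one term can be run as a horizontal pass followed by a vertical pass.

```python
import math


def gaussian(variance: float, r: float) -> float:
    """Normalized 2D radial Gaussian: exp(-r^2 / (2v)) / (2*pi*v)."""
    return math.exp(-r * r / (2.0 * variance)) / (2.0 * math.pi * variance)


def sum_of_gaussians(terms, r: float) -> float:
    """Approximate a radial diffusion profile as sum_i w_i * G(v_i, r).

    `terms` is a list of (weight, variance) pairs.
    """
    return sum(w * gaussian(v, r) for w, v in terms)


def total_energy(terms, r_max: float, steps: int = 20000) -> float:
    """Integrate the profile over the plane (integral of R(r) * 2*pi*r dr
    on [0, r_max]) with the trapezoid rule."""
    h = r_max / steps
    f = lambda r: sum_of_gaussians(terms, r) * 2.0 * math.pi * r
    acc = 0.5 * (f(0.0) + f(r_max))
    for i in range(1, steps):
        acc += f(i * h)
    return acc * h


def separable_matches(variance: float, x: float, y: float) -> bool:
    """Check that the 2D Gaussian equals the product of two 1D Gaussians."""
    g1 = lambda t: math.exp(-t * t / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)
    return math.isclose(gaussian(variance, math.hypot(x, y)), g1(x) * g1(y))


# Hypothetical (weight, variance) pairs for illustration only; weights sum to 1.
TERMS = [(0.25, 0.01), (0.25, 0.1), (0.3, 0.5), (0.2, 2.0)]
```

Because the weights in TERMS sum to 1, the whole profile integrates to approximately 1; with real fitted data, the weights would instead encode how much light each Gaussian scale carries.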
Raytracing
NVIRT: CUDA Ray Tracing API
Example: Programming Languages
Copperhead: Cu + Python
Copperhead is a subset of Python designed for data parallelism
Python: an existing, widely accepted high-level scripting language
Already understands things like map and reduce
Comes with a parser & lexer
The current Copperhead compiler takes a subset of Python and produces CUDA code
Copperhead is not pure Python
Copperhead is not for arbitrary Python code
Most features of Python are unsupported
Connecting Python & Copperhead code will require bindings similar to Python-C interaction
Copperhead is compiled, not interpreted
Statically typed
[Figure: Copperhead as a subset of Python]
Saxpy: Hello world
def saxpy(a, x, y):
    return map(lambda xi, yi: a*xi + yi, x, y)
Some things to notice:
Types are implicit
The Copperhead compiler uses a Hindley-Milner type system with typeclasses similar to Haskell's
Typeclasses are fully resolved in CUDA via C++ templates
Functional programming:
map, lambda (or the equivalent list comprehension)
Functions can be passed as arguments to other functions
Closure: the variable ‘a’ is free in the lambda function but bound to the ‘a’ in its enclosing scope
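To illustrate the functional features listed above outside the Copperhead toolchain, here is the same SAXPY written for ordinary CPython 3. It is not run through the Copperhead compiler and models none of its type inference or restrictions. Because Python 3's map returns a lazy iterator, the result is wrapped in list(); apply_binary is a hypothetical helper that only shows a function being passed as an argument.

```python
def saxpy(a, x, y):
    # 'a' is free in the lambda and captured from saxpy's scope (a closure).
    # list() materializes the lazy map object that Python 3 returns.
    return list(map(lambda xi, yi: a * xi + yi, x, y))


def saxpy_comprehension(a, x, y):
    # Equivalent formulation using a list comprehension instead of map/lambda.
    return [a * xi + yi for xi, yi in zip(x, y)]


def apply_binary(f, x, y):
    # Hypothetical helper: f is passed in like any other value and applied
    # elementwise to the two input sequences.
    return list(map(f, x, y))
```

For example, saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]) and saxpy_comprehension with the same arguments both produce [12.0, 24.0, 36.0].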
Example: Parallel Programming
Data Structures
• thrust::device_vector
• thrust::host_vector
• thrust::device_ptr
• Etc.
Algorithms
• thrust::sort
• thrust::reduce
• thrust::exclusive_scan
• Etc.
Thrust is a CUDA library of data-parallel algorithms & data structures with an interface similar to the C++ Standard Template Library
C++ template metaprogramming automatically chooses the fastest code path at compile time
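thrust::exclusive_scan, listed above, writes at each position the combination of all elements before it, so for a sum the first output is 0. The sketch below is not Thrust's implementation; it simulates one well-known parallel formulation, Blelloch's work-efficient up-sweep/down-sweep scan, sequentially in Python for integer sums. The iterations of each inner loop touch disjoint elements, which is the work a GPU version would run in parallel.

```python
def exclusive_scan(values):
    """Exclusive prefix sum using Blelloch's up-sweep / down-sweep scheme.

    Runs sequentially; each inner for-loop is one parallel step on a GPU.
    """
    n = len(values)
    size = 1
    while size < n:
        size *= 2
    data = list(values) + [0] * (size - n)  # pad to a power of two with 0

    # Up-sweep (reduce phase): build partial sums up a balanced tree.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefixes back down the tree.
    data[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2
    return data[:n]
```

For instance, exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]) yields [0, 3, 4, 11, 11, 15, 16, 22]. Both phases do O(n) total work, which is why this formulation is called work-efficient.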
thrust::sort
sort.cu
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <cstdlib>

int main(void)
{
    // generate random data on the host
    thrust::host_vector<int> h_vec(1000000);
    thrust::generate(h_vec.begin(), h_vec.end(), rand);

    // transfer to device and sort
    thrust::device_vector<int> d_vec = h_vec;

    // sort 140M 32b keys/sec on GT200
    thrust::sort(d_vec.begin(), d_vec.end());

    return 0;
}
thrust::reduce
reduce.cu
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <cstdlib>

int main(void)
{
    // generate random data on the host
    thrust::host_vector<int> h_vec(1000000);
    thrust::generate(h_vec.begin(), h_vec.end(), rand);

    // transfer to device and compute the sum
    thrust::device_vector<int> d_vec = h_vec;

    // accumulate in 64 bits: a million rand() values can overflow a 32-bit int
    long long sum = thrust::reduce(d_vec.begin(), d_vec.end(),
                                   0LL, thrust::plus<long long>());

    return 0;
}
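thrust::reduce combines all elements with an associative operator. One common way to parallelize that on a GPU is a pairwise tree: each round combines neighbors independently, so n values take about log2(n) rounds. The Python sketch below simulates such a tree sequentially; it illustrates the idea and is not how Thrust is implemented internally. The operator must be associative, and `identity` must be its identity element, because odd-length rounds are padded with it.

```python
import operator


def tree_reduce(values, op=operator.add, identity=0):
    """Reduce `values` with an associative `op` in ceil(log2 n) pairwise rounds.

    Within a round every adjacent pair is combined independently, which is the
    work a GPU would spread across threads; here the rounds run sequentially.
    """
    level = list(values)
    if not level:
        return identity
    while len(level) > 1:
        if len(level) % 2:  # odd length: pad with the identity element
            level.append(identity)
        level = [op(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Because only adjacent pairs are combined, element order is preserved, so the operator needs to be associative but not commutative (string concatenation works, for example).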
Thrust
thrust.googlecode.com
Open source (Apache2 license)
Example: Sparse Matrix-Vector Multiplication
© NVIDIA Corporation 2008
CPU results from “Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms,” Williams et al., Supercomputing 2007
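As a reference for what a sparse matrix-vector multiplication (y = Ax) kernel computes, here is a sequential Python sketch using the compressed sparse row (CSR) format. The slide does not say which storage formats were benchmarked; CSR is an assumption chosen because it is common. Each row's dot product is independent of the others, which is the parallelism a GPU kernel can exploit, for example by assigning rows to threads.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format.

    row_ptr[i]:row_ptr[i + 1] delimits the nonzeros of row i; col_idx and
    values hold their column indices and values.
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # Dot product of one sparse row with the dense vector x.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```

For the 3×3 matrix with rows [1, 0, 2], [0, 0, 3], [4, 5, 0], the CSR arrays are row_ptr = [0, 2, 3, 5], col_idx = [0, 2, 2, 0, 1], values = [1, 2, 3, 4, 5]; multiplying by x = [1, 2, 3] gives [7.0, 9.0, 14.0].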
Example: Sort
[Figure: radix sorting rate (millions of key-value pairs/sec, 0 to 160) vs. sequence size (1,000 to 10,000,000 key-value pairs) for GTX 280, 9800 GTX+, 8800 Ultra, 8800 GT, and 8600 GTS]
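The chart measures radix sort throughput on key-value pairs. To show what each pass of a least-significant-digit (LSD) radix sort does, here is a sequential Python sketch for unsigned keys below 2**32; the 8-bit digit width is an assumption, and the sketch says nothing about GPU performance. Each pass is a stable counting sort: histogram the current digit, exclusive-scan the histogram into bucket offsets, then scatter the pairs. Stability is what lets later passes preserve the order established by earlier ones.

```python
def radix_sort_pairs(keys, values, bits_per_pass=8, key_bits=32):
    """Stable LSD radix sort of (key, value) pairs.

    Assumes unsigned integer keys below 2**key_bits.
    """
    radix = 1 << bits_per_pass
    mask = radix - 1
    for shift in range(0, key_bits, bits_per_pass):
        # 1. Histogram of the current digit.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # 2. Exclusive scan of the histogram gives each bucket's start offset.
        offsets, running = [0] * radix, 0
        for d in range(radix):
            offsets[d], running = running, running + counts[d]
        # 3. Stable scatter: equal digits keep their relative input order.
        out_k, out_v = [0] * len(keys), [None] * len(keys)
        for k, v in zip(keys, values):
            d = (k >> shift) & mask
            out_k[offsets[d]], out_v[offsets[d]] = k, v
            offsets[d] += 1
        keys, values = out_k, out_v
    return keys, values
```

With 32-bit keys and 8-bit digits this makes four passes; a wider digit means fewer passes but a larger histogram per pass.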
Example: Fluid Dynamics
Rayleigh-Bénard Convection
[Figure: fluid layer heated from below (hot) and cooled from above (cold); initial temperature and resulting circulating cells]
Rayleigh-Bénard Results
Double precision
384 × 384 × 192 grid (the largest that fits in 4 GB)
Vertical slice of temperature at y=0
Transition from stratified (left) to turbulent (right)
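Regime depends on the Rayleigh number $\mathrm{Ra} = g\alpha\,\Delta T\,L^3/(\kappa\nu)$ (gravitational acceleration $g$, thermal expansion coefficient $\alpha$, temperature difference $\Delta T$, layer depth $L$, thermal diffusivity $\kappa$, kinematic viscosity $\nu$)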
8.5x speedup versus Fortran code running on an 8-core 2.5 GHz Xeon
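As a worked example of the standard definition Ra = gαΔT·L³/(κν), the sketch below evaluates it in Python. The property values are hypothetical, roughly water-like numbers chosen for illustration (a 1 cm layer with a 1 K temperature difference); they are not the parameters of the simulation reported above.

```python
def rayleigh_number(g, alpha, delta_t, depth, kappa, nu):
    """Ra = g * alpha * delta_t * depth**3 / (kappa * nu), with SI inputs.

    g: gravitational acceleration (m/s^2)
    alpha: thermal expansion coefficient (1/K)
    delta_t: bottom-minus-top temperature difference (K)
    depth: layer depth L (m)
    kappa: thermal diffusivity (m^2/s)
    nu: kinematic viscosity (m^2/s)
    """
    return g * alpha * delta_t * depth ** 3 / (kappa * nu)


# Hypothetical, roughly water-like values for illustration only.
ra = rayleigh_number(g=9.81, alpha=2.07e-4, delta_t=1.0,
                     depth=0.01, kappa=1.43e-7, nu=1.0e-6)
print(f"Ra = {ra:.0f}")
```

Because Ra scales with L³, doubling the layer depth multiplies it by eight, which is why deep layers reach turbulent regimes even at modest temperature differences.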
Mission: Support Academic Research
Serve as academic liaison
Follow, inform, and influence external research
Direct support – funding and equipment
Sponsored Research
Donate and discount equipment
Professor Partnerships
Ph.D. Fellowships
CUDA Centers of Excellence
New programs:
CUDA Fellows
CUDA Research Awards
Mission: Support Parallel Computing Education
Supporting courses & curricular efforts
Creating & gathering online training materials
Teaching courses (and putting them online)
Writing textbooks
Final Thoughts – Education
We should teach parallel computing in CS 1 or CS 2
Computers don’t get faster, just wider now
Manycore is the future of computing
Insertion Sort
Heap Sort
Merge Sort
Which goes faster on large data?
ALL students need to understand this, and early!
NVIDIA Research Summit
Sept 30 – Oct 2, 2009 – The Fairmont San Jose, California
A cross-disciplinary forum for researchers using GPUs across science and engineering
Join your colleagues, researchers in other fields, and the NVIDIA Research team for this valuable opportunity to gather, learn, and collaborate.
Share your work with peers from many disciplines; learn from experts at NVIDIA and elsewhere.
In-depth sessions on numeric computing, computational science, visual computing trends, and advanced CUDA programming & optimization
Opportunities:
Call for Posters open: showcase your work and learn from your peers.
Research Roundtables: moderated discussions led by your peers. Submit a roundtable to shape the hot topics in GPU computing!
Co-located with the GPU Technology Conference, a technical event focused on developers, engineers, researchers, senior executives, venture capitalists, press, and analysts