
THE PHILIPS NEXPERIA
DIGITAL VIDEO PLATFORM
The Digital Video Revolution

Transition from Analog to Digital Video


Navigate, store, retrieve, and share digital programs, as well as access new
interactive services and connectivity possibilities

Home entertainment systems will be implemented with a number of Digital Video
appliances:
- Digital Televisions (DTVs)
- DVD Players
- Digital Video Recorders
- Set-top Boxes
The Philips Nexperia Platform Approach

Philips Semiconductors decided to serve the application domains of digital
video and mobile with a platform approach: Nexperia

Nexperia: ‘next experience’

Nexperia platform properties:
- Flexibility
- Innovation
- Future-proof
- Using an architecture framework and IP blocks
Nexperia-DVP Platform Concepts

Nexperia-DVP is a Reference Architecture

A set of documents that describes how the products of the Digital Video
Product Family will be partitioned into subsystems and how functionality will
be split over these subsystems:
1) Nexperia-DVP SoC Reference Architecture
2) Nexperia-DVP Software Reference Architecture
3) Nexperia-DVP System Reference Architecture
Nexperia-DVP SoC Reference Architecture
<Block diagram: DVP system silicon. A MIPS™ CPU (PRxxxx, general-purpose
scalable RISC processor, 50 to 300+ MHz, 32-bit or 64-bit) and a TriMedia™
CPU (TM-xxxx, scalable VLIW media processor, 100 to 300+ MHz, 32-bit or
64-bit), each with I$ and D$, connect through the PI bus and the DVP memory
bus (Nexperia™ system buses, 32-128 bit) to SDRAM and a library of Device IP
blocks: image coprocessors, DSPs, UART, 1394, USB, and more.>
Three Levels of Abstraction
Level 1: DVP Software-Hardware Rules

Deals with the software view of the hardware (a minimal register-access
sketch follows the list):
- Unified Memory Architecture
- Rules for Data Movement
- Endianness
- Ordering & Coherency
- Interrupts
- Data formats, including Pixel Formats
- TriMedia-MIPS Communication
- Protection
- Boot
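To make these rules more tangible, here is a minimal sketch of how driver code on the MIPS or TriMedia side might touch a memory-mapped Device IP under the unified memory architecture. The base address, register offsets, and bit fields are invented for illustration and are not taken from any Nexperia-DVP documentation.

#include <stdint.h>

/* Hypothetical Device IP block in the unified memory map (address is made up). */
#define DEMO_IP_BASE     0x1BE00000u
#define DEMO_IP_CTRL     (DEMO_IP_BASE + 0x00u)   /* control register  */
#define DEMO_IP_STATUS   (DEMO_IP_BASE + 0x04u)   /* status register   */
#define DEMO_CTRL_START  (1u << 0)                /* made-up bit field */
#define DEMO_STAT_DONE   (1u << 0)

/* Registers are read and written as 32-bit words; 'volatile' keeps the
 * compiler from caching or reordering the accesses, in line with the
 * ordering and coherency rules above. */
static inline uint32_t reg_read32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}

static inline void reg_write32(uintptr_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

/* Start the block and poll for completion; an interrupt handler would
 * normally replace the polling loop. */
void demo_ip_run(void)
{
    reg_write32(DEMO_IP_CTRL, DEMO_CTRL_START);
    while ((reg_read32(DEMO_IP_STATUS) & DEMO_STAT_DONE) == 0)
        ;   /* wait for the Device IP to finish */
}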
Three Levels of Abstraction
Level 2: Device Transaction Level (DTL)

Deals with point-to-point transfers between Device IPs and the Connection
Network, specifying a Device IP partition and architecture that is compatible
with Level 1

Two main types of ports, typical of a Device IP (sketched below):
- Device Control & Status ports
- DMA ports
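A hedged sketch of how the two port types might appear to software: the register layout and names below are hypothetical, but they illustrate the split between a control & status port, programmed by a CPU, and a DMA port through which the Device IP moves data to and from unified memory.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical register view of a Device IP with both DTL port types.
 * The offsets and names are illustrative, not the actual Nexperia-DVP map. */
typedef struct {
    volatile uint32_t ctrl;       /* control & status port: command register   */
    volatile uint32_t status;     /* control & status port: status register    */
    volatile uint32_t dma_addr;   /* DMA port: physical start address in SDRAM */
    volatile uint32_t dma_bytes;  /* DMA port: transfer length in bytes        */
} demo_dtl_regs_t;

#define DEMO_CTRL_KICK  (1u << 0)
#define DEMO_STAT_BUSY  (1u << 0)

/* Program a single transfer through the control & status port and let the
 * Device IP pull the buffer over its DMA port. */
void demo_dma_submit(demo_dtl_regs_t *dev, uint32_t phys_addr, size_t len)
{
    while (dev->status & DEMO_STAT_BUSY)
        ;                              /* previous transfer still in flight */
    dev->dma_addr  = phys_addr;        /* where the DMA port should read    */
    dev->dma_bytes = (uint32_t)len;
    dev->ctrl      = DEMO_CTRL_KICK;   /* start the transfer                */
}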
Three Levels of Abstraction
Level 3: Connection Network

Deals with the more traditional bus hierarchy and bus level

In the second-generation Nexperia-DVP:

The Device Control & Status (DCS) Network
- Primarily a low-latency communication path for the CPUs and other
initiators to access the control & status registers in the Device IPs

The Memory Connection Network
- Will automate the generation of structures
- The current generation is implemented with two key elements: Memory
Transaction Level (MTL) ports and protocol, and the Pipelined Memory Access
Network
Chiplet-based Design


- Chiplet partitioning divides the logic hierarchy into manageably sized
blocks
- A chiplet is a group of modules that are placed together either because
they are synchronous to each other or because they are not timing critical
<PNX8550 chip layout>
Chiplet-based Design (Cont.)

The partitioning of the top-level netlist among chiplets followed these
guidelines (a toy crossing count follows the floorplan):
- There should be as few synchronous signal crossings between chiplets as
possible
- The clock module is placed into a separate chiplet because of its
complexity
- Cross-chiplet scan chains are not allowed
<PNX8550 floorplan>
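As a rough illustration of the first guideline, the toy program below counts synchronous cross-chiplet signal crossings in a made-up net list. The real PNX8550 flow works on the actual top-level netlist with dedicated tooling, so this only sketches the metric, not the tool.

#include <stdio.h>

/* Toy model: each top-level net records its source and destination chiplet
 * and whether it is synchronous.  The data is invented for illustration. */
typedef struct {
    int src_chiplet;
    int dst_chiplet;
    int synchronous;   /* 1 if the signal is synchronous, 0 otherwise */
} net_t;

static int count_sync_crossings(const net_t *nets, int n)
{
    int crossings = 0;
    for (int i = 0; i < n; i++)
        if (nets[i].synchronous && nets[i].src_chiplet != nets[i].dst_chiplet)
            crossings++;   /* counts against the quality of the partition */
    return crossings;
}

int main(void)
{
    const net_t nets[] = {
        { 0, 0, 1 },   /* synchronous, stays inside chiplet 0: fine  */
        { 0, 1, 0 },   /* asynchronous crossing: allowed             */
        { 1, 2, 1 },   /* synchronous crossing: should be minimized  */
    };
    printf("synchronous cross-chiplet signals: %d\n",
           count_sync_crossings(nets, 3));
    return 0;
}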
Nexperia-DVP Software Architecture

- Scalable from low-end to high-end
- Consistent API (on MIPS or TriMedia)
- Single Streaming Architecture for MIPS and TriMedia
- Aligned to the Nexperia-DVP hardware architecture and IP blocks
- Operating-system-independent software layers
- Re-use of software components on any instance of the platform
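The sketch below shows one way such an OS-independent, CPU-agnostic streaming component interface could look in C. The names and the packet layout are hypothetical; the actual Nexperia-DVP software stack defines its own component model and APIs.

#include <stddef.h>
#include <stdint.h>

/* Hypothetical streaming packet exchanged between components. */
typedef struct {
    void    *data;
    size_t   bytes;
    uint64_t timestamp;
} stream_packet_t;

/* Hypothetical component: the same entry points are meant to be implementable
 * on MIPS or TriMedia; an OS-specific layer underneath supplies threads,
 * queues, and interrupt handling. */
typedef struct stream_component {
    int  (*open)   (struct stream_component *self, const char *config);
    int  (*process)(struct stream_component *self,
                    const stream_packet_t *in, stream_packet_t *out);
    void (*close)  (struct stream_component *self);
    void  *private_state;
} stream_component_t;

/* A pipeline is then a chain of components with identical APIs, regardless
 * of which CPU each component is scheduled on. */
int stream_run_once(stream_component_t *c,
                    const stream_packet_t *in, stream_packet_t *out)
{
    return c->process(c, in, out);
}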
Nexperia-DVP
System Reference Architecture

Performance Characteristics

- The system must work under normal conditions
- The system must have a given (short) maximum input/output delay
- The system must behave gracefully under exceptional conditions
- The system must be as cost-effective as possible
Critical Resources

- Memory Bandwidth
- CPU Cycles
- (RAM) Memory Size
Effect of bandwidth
on transaction latency

The memory transaction latency experienced by a CPU is influenced by four
different factors (a toy calculation follows):
- Minimum transaction cost
- Transactions of other blocks that take precedence
- Pending transactions of blocks that do not take precedence
- Other transactions of the CPU itself
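A toy additive model helps show how the four factors combine into the latency a CPU observes for one transaction; the cycle counts below are invented, not measured Nexperia-DVP figures.

#include <stdio.h>

int main(void)
{
    /* Invented cycle counts for the four contributions listed above. */
    int min_cost        = 20;  /* minimum transaction cost (the DDR access itself)      */
    int higher_priority = 35;  /* transactions of other blocks that take precedence     */
    int pending_lower   = 10;  /* already-started transaction of a non-precedence block */
    int own_pending     = 15;  /* earlier transactions issued by the CPU itself         */

    int latency = min_cost + higher_priority + pending_lower + own_pending;
    printf("memory transaction latency seen by the CPU: %d cycles\n", latency);
    return 0;
}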
Effect of transaction latency
on CPU performance

Typical average behavior of two types of code on a Nexperia-DVP system

DSP code:
- Characteristics: long expressions, highly repetitive loops
- Working set size: typically smaller than the CPU's I-cache size
- Typical instruction repetition rate before cache line invalidate: 40 or more
- Predominant cache misses: D-cache
- Typical CPU stall cycles due to cache misses on a moderately loaded system:
20% and below on TM
- Typical bandwidth consumed for each effective CPU Mcycle: 400KB/s on TM

Control code:
- Characteristics: lots of switch/if statements, many function calls
- Working set size: typically much larger than the CPU's I-cache size
- Typical instruction repetition rate before cache line invalidate: between 1
and 3
- Predominant cache misses: I-cache
- Typical CPU stall cycles due to cache misses on a moderately loaded system:
80% and above on TM, 60% and above on MIPS
- Typical bandwidth consumed for each effective CPU Mcycle: 6.4MB/s on TM,
1.5MB/s on MIPS
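The two C fragments below caricature the code styles in the comparison above: a tight DSP-style loop whose instructions fit in the I-cache, and branchy control code whose instruction footprint does not. They are illustrative only.

#include <stdio.h>
#include <stddef.h>

/* "DSP code": one long expression inside a highly repetitive loop.  The loop
 * body fits easily in the I-cache, so the remaining stalls come mainly from
 * streaming operands through the D-cache. */
static float dot_product(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* "Control code": many branches and function calls, so the instruction
 * footprint is much larger than the I-cache and I-cache misses dominate. */
static void start_playback(void) { puts("play"); }
static void stop_playback(void)  { puts("stop"); }
static void show_menu(void)      { puts("menu"); }

static void handle_key(int key)
{
    switch (key) {
    case 1:  start_playback(); break;
    case 2:  stop_playback();  break;
    case 3:  show_menu();      break;
    default:                   break;
    }
}

int main(void)
{
    float a[4] = { 1, 2, 3, 4 }, b[4] = { 4, 3, 2, 1 };
    printf("dot product: %f\n", dot_product(a, b, 4));
    handle_key(1);
    return 0;
}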
Memory Arbitration

To regulate the distribution of available bandwidth over the requesting
blocks, an advanced DDR arbitration scheme is used:
- Fair distribution of bandwidth over requesting blocks
- Shortest possible transaction latency for CPUs

The Nexperia-DVP memory arbiter deploys a sophisticated algorithm for
distributing bandwidth over the requesting blocks (modeled in the sketch
below):
- Guaranteed bandwidth
- Priority-based distribution of the remainder
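The program below is a simplified software model of this two-step policy: every requester first gets its guaranteed share, and whatever bandwidth is left is handed out in priority order. The agents and numbers are invented; the real arbiter is a hardware DDR scheduler with a more refined algorithm.

#include <stdio.h>

#define N_AGENTS 3

typedef struct {
    const char *name;
    int guaranteed;   /* MB/s reserved for this block           */
    int requested;    /* MB/s the block currently asks for      */
    int priority;     /* 0 = highest priority for the remainder */
    int granted;      /* MB/s actually allocated this period    */
} agent_t;

int main(void)
{
    agent_t agents[N_AGENTS] = {
        { "MIPS CPU",     100, 180, 0, 0 },
        { "TriMedia CPU", 200, 260, 1, 0 },
        { "Video out",    300, 300, 2, 0 },
    };
    int available = 700;   /* invented total DDR bandwidth in MB/s */

    /* Step 1: guaranteed bandwidth (capped by what is actually requested). */
    for (int i = 0; i < N_AGENTS; i++) {
        int g = agents[i].requested < agents[i].guaranteed
                    ? agents[i].requested : agents[i].guaranteed;
        agents[i].granted = g;
        available -= g;
    }

    /* Step 2: hand the remainder out in priority order. */
    for (int p = 0; p < N_AGENTS && available > 0; p++)
        for (int i = 0; i < N_AGENTS; i++)
            if (agents[i].priority == p) {
                int want = agents[i].requested - agents[i].granted;
                int give = want < available ? want : available;
                agents[i].granted += give;
                available -= give;
            }

    for (int i = 0; i < N_AGENTS; i++)
        printf("%-12s granted %d MB/s\n", agents[i].name, agents[i].granted);
    return 0;
}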
Scheduling Techniques

Scheduling of Hardware Accelerators

<Timeline diagrams of three ways to schedule accelerators A1 and A2>
- Sequencing: “Perform A2 after A1 is finished”
- Slicing: “Run a slice of A2 when A1 has finished a slice”
- Staggering: “Start A2 after a slice of A1, and guarantee that A2 will not
overtake A1”
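The toy program below mimics two of these techniques with the accelerators replaced by plain functions; the slice count and the work per slice are invented.

#include <stdio.h>

#define N_SLICES 4

static void a1_slice(int s) { printf("A1 processes slice %d\n", s); }
static void a2_slice(int s) { printf("A2 processes slice %d\n", s); }

/* Sequencing: perform A2 only after A1 is completely finished.  Simplest to
 * program, but needs buffer space for all of A1's output. */
static void run_sequenced(void)
{
    for (int s = 0; s < N_SLICES; s++) a1_slice(s);
    for (int s = 0; s < N_SLICES; s++) a2_slice(s);
}

/* Slicing: run a slice of A2 as soon as A1 has finished that slice.  Only one
 * slice of intermediate buffer is needed and the input/output delay shrinks,
 * at the price of more scheduling work in software. */
static void run_sliced(void)
{
    for (int s = 0; s < N_SLICES; s++) {
        a1_slice(s);
        a2_slice(s);
    }
}

int main(void)
{
    puts("-- sequencing --");
    run_sequenced();
    puts("-- slicing --");
    run_sliced();
    return 0;
}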
Scheduling Techniques (Cont.)

Properties of Scheduling Techniques

Sequencing:
- Software effort: lowest
- Buffer memory: largest
- Input/output delay: longest
- Requirements on processors: nothing

Slicing:
- Software effort: higher, depends on slice size
- Buffer memory: smallest
- Input/output delay: shorter
- Requirements on processors: must support slicing

Staggering:
- Software effort: low
- Buffer memory: middle
- Input/output delay: shorter
- Requirements on processors: must be sequential and localized