Developing high performance applications with .NET Compact Framework
Deepak Gulati
ISV Developer Evangelist
Microsoft
Hardware/Drivers: OEM/IHV Supplied BSP (ARM, SH4, MIPS); OEM Hardware and Standard Drivers; Standard PC Hardware and Drivers
Device Building Tools: Platform Builder; Windows Embedded Studio; Windows XP DDK
Programming Model:
  Native: Win32; MFC 8.0, ATL 8.0
  Managed: .NET Compact Framework; .NET Framework
  Server Side: ASP.NET Mobile Controls; ASP.NET
Data:
  Lightweight: EDB
  Relational: SQL Server 2005 Mobile Edition; SQL Server 2005 Express Edition; SQL Server 2005
Multimedia: Windows Media; DirectX
Location Services: MapPoint
Development Tools: Visual Studio 2005
Communications & Messaging: Internet Security and Acceleration Server; Exchange Server; Live Communications Server; Speech Server
Management Tools: Device Update Agent; Image Update; Software Update Services; Systems Management Server; Microsoft Operations Manager
Measuring Performance
Overview
Basic technique involves:
Find start time
Find end time
Calculate delta
Measuring Performance
Overview
Start and End times can be measured in
various ways
GetTickCount, a Win32 API function
Environment.TickCount is its managed
code equivalent
Both return int that represents time in ms that
has passed since the device was booted
Can also use System.DateTime and get
System.TimeSpan by subtracting Start
and End values
Measuring Performance
Overview
There can be issues with these
techniques:
For a device that has been on for a long
time, TickCount wraps around and goes negative
Not great for measuring ‘short’ operations;
there can be a variation of up to 500 ms
System.DateTime also suffers from accuracy
issues
Measuring Performance
Overview
QueryPerformanceCounter/QueryPerformanceFrequency to the rescue!
High resolution timer – OEM specific
implementation
Defaults to GetTickCount if not available
Measuring Performance
Overview
No managed implementation available for
QueryPerformanceCounter or QueryPerformanceFrequency
PInvoke QueryPerformanceFrequency to
get the counter frequency of the device in ticks/sec.
Divide by 1000 to get the frequency in ticks/ms
PInvoke QueryPerformanceCounter before
your call. Make your call. PInvoke
QueryPerformanceCounter again
(End - Start) / (frequency in ticks/ms) will give you the time
for your call in ms
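The steps above can be sketched roughly as follows. This is a minimal illustration, not the demo code from the talk: the class name HiResTimer, the Work delegate, and the helper names are assumptions made for this example. The DllImport targets coredll.dll, which is where these functions live on Windows CE (on desktop Windows they are in kernel32.dll).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper; names are illustrative, not from the slides.
public static class HiResTimer
{
    // On Windows CE these Win32 functions are exported by coredll.dll.
    [DllImport("coredll.dll")]
    private static extern int QueryPerformanceCounter(out long count);

    [DllImport("coredll.dll")]
    private static extern int QueryPerformanceFrequency(out long frequency);

    // The code being timed is passed in as a delegate.
    public delegate void Work();

    // Pure arithmetic: convert two counter readings into milliseconds.
    // frequencyPerSec is the value reported by QueryPerformanceFrequency.
    public static double ElapsedMs(long start, long end, long frequencyPerSec)
    {
        return (end - start) * 1000.0 / frequencyPerSec;
    }

    // Reads the counter before and after the call and returns elapsed ms.
    public static double Measure(Work work)
    {
        long frequency, start, end;
        if (QueryPerformanceFrequency(out frequency) == 0 || frequency == 0)
            throw new NotSupportedException("No performance counter available.");

        QueryPerformanceCounter(out start);
        work();
        QueryPerformanceCounter(out end);
        return ElapsedMs(start, end, frequency);
    }
}
```

A call such as `HiResTimer.Measure(delegate { myCollection.Sort(); })` would then return the duration in milliseconds. Keeping the arithmetic in ElapsedMs separate from the P/Invoke calls makes it easy to check on a desktop machine.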
Demo
Using QueryPerformanceCounter
Measuring Performance
Overview
Micro-benchmarks versus Scenarios
Benchmarking tips
Start from known state
Ensure nothing else is running
Measure multiple times, take average
Run each test in own AppDomain / Process
Log results at the end
Understand JIT-time versus runtime cost
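A few of these tips (measure multiple times, take the average, keep JIT time apart from runtime cost) could be combined in a small harness like the one below. It is only a sketch: MicroBenchmark and its Work delegate are hypothetical names, and Environment.TickCount is used for brevity, so it suits operations that run much longer than the timer's resolution.

```csharp
using System;

// Hypothetical harness; names are illustrative, not from the slides.
public static class MicroBenchmark
{
    public delegate void Work();

    // Runs 'work' once untimed so that JIT compilation is excluded,
    // then returns the average duration in ms over 'runs' timed runs.
    public static double AverageMs(Work work, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException("runs");

        work(); // warm-up run: pays the JIT cost up front

        long totalMs = 0;
        for (int i = 0; i < runs; i++)
        {
            int start = Environment.TickCount;
            work();
            // unchecked subtraction stays correct if TickCount wraps around
            totalMs += unchecked(Environment.TickCount - start);
        }
        return (double)totalMs / runs;
    }
}
```

Results are returned rather than printed, so the caller can log them once all measurements are done.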
.NET Compact Framework
.NET Compact Framework Performance v1->v2
(Pocket PC 2003, XScale 400MHz)
Columns: 1.0 | 1.0 SP3 | V2 Beta1 | V2 B2+

Bigger is better
Method Calls (Calls/sec): 3.7M | 7.1M | 8.1M
Virtual Calls (Calls/sec): 2.4M | 2.7M | 5.6M
Simple P/Invoke (Calls/sec): 733K
Primes (to 1500) (iterations/sec): 562 | 832 | 853
GC Small (8 bytes) (Bytes/sec): 1M | 7M | 7.5M
GC Array (100 ints) (Bytes/sec): 25M | 43M | 112M
1.7M

Smaller is better
XML Text Reader 200KB (seconds): 1.7 | 1.2 | 0.72 | 0.69
DataSet (static data), 4 tables, 1000 records (seconds): 13.1 | 6.6 | 7.3 | 3.3
DataSet (ReadXml), 3 tables, 100 records (seconds): 12.3 | 6.5 | 5.2 | 4.4
Measuring Performance
Performance Counters
There will be times when an application runs slow and the code
looks fine
.NET Compact Framework can be made to report performance
statistics
<My App>.stat (formerly mscoree.stat)
http://msdn.microsoft.com/library/en-us/dnnetcomp/html/netcfperf.asp
Registry
HKLM\SOFTWARE\Microsoft\.NETCompactFramework\PerfMonitor
Counters (DWORD) = 1
What does .stat tell you?
Working set and performance statistics
More counters added in v2
Generics usage
COM interop usage
Number of boxed valuetypes
Threading and timers
GUI objects
Network activity (socket bytes sent/received)
Demo
Enabling .NET Compact Framework
Performance Statistics
.stat
counter
Total Program Run Time (ms)
App Domains Created
App Domains Unloaded
Assemblies Loaded
Classes Loaded
Methods Loaded
Closed Types Loaded
Closed Types Loaded per Definition
Open Types Loaded
Closed Methods Loaded
Closed Methods Loaded per Definition
Open Methods Loaded
Threads in Thread Pool
Pending Timers
Scheduled Timers
Timers Delayed by Thread Pool Limit
Work Items Queued
Uncontested Monitor.Enter Calls
Contested Monitor.Enter Calls
Peak Bytes Allocated (native + managed)
Managed Objects Allocated
Managed Bytes Allocated
Managed String Objects Allocated
Bytes of String Objects Allocated
Garbage Collections (GC)
Bytes Collected By GC
Managed Bytes In Use After GC
Total Bytes In Use After GC
GC Compactions
Code Pitchings
Calls to GC.Collect
GC Latency Time (ms)
Pinned Objects
Objects Moved by Compactor
Objects Not Moved by Compactor
Objects Finalized
Boxed Value Types
Process Heap
Short Term Heap
JIT Heap
App Domain Heap
GC Heap
Native Bytes Jitted
Methods Jitted
Bytes Pitched
total
55937
18
18
323
18852
37353
730
730
78
46
46
0
46
0
46
57240
0
4024363
1015100
37291444
112108
4596658
33
25573036
17
6
0
279
156
73760
11811
6383
350829
7202214
26910
1673873
last datum
8
1
0
0
28
41592
23528
3091342
16
1626
0
0
0
0
152
0
n
385
40
6
93
1015100
33
33
33
33
430814
178228
88135
741720
376
26910
7047
mean
1
1
1
0
36
774940
259414
2954574
8
511970
718
357796
647240
855105
267
237
min
1
1
0
0
8
41592
23176
1833928
0
952
0
0
0
0
80
0
Peak Bytes Allocated (native + managed)
JIT Heap
App Domain Heap
GC Heap
Garbage Collections (GC)
GC Latency Time (ms)
Boxed Value Types
Managed String Objects Allocated
max
8
2
3
1
55588
1096328
924612
3988607
31
962130
21532
651663
833370
2097152
5448
5448
.NET Compact Framework
How are we different?
Portable JIT Compiler
Fast code generation, less optimized
May pitch JIT-compiled code under
memory pressure
No NGen, install time or persisted code
Interpreted virtual calls (no v-tables)
Simple mark and sweep GC,
non generational
Common Language Runtime
Execution Engine
Call path
Managed calls are more expensive than native
Instance call: ~2-3X the cost of a native function call
Virtual call: ~1.4X the cost of a managed instance call
Platform invoke: ~5X the cost of managed instance call
(*Marshal int parameter)
Properties are calls
JIT compilers
All platforms have the same optimizing JIT compiler
architecture in v2
Optimizations
Method inlining for simple methods
Variable enregistration
Common Language Runtime
Call path (sample)
// Version 1: Volume is a virtual property
public class Shape
{
    protected int m_volume;
    public virtual int Volume
    {
        get { return m_volume; }
    }
}
public class Cube : Shape
{
    public Cube(int vol)
    {
        m_volume = vol;
    }
}

// Version 2: Volume is a non-virtual property
public class Shape
{
    protected int m_volume;
    public int Volume
    {
        get { return m_volume; }
    }
}
public class Cube : Shape
{
    public Cube(int vol)
    {
        m_volume = vol;
    }
}
Common Language Runtime
Call path (sample)
public class MyCollection
{
    private const int m_capacity = 10000;
    private Shape[] storage = new Shape[m_capacity];
    …
    public void Sort()
    {
        // Each storage[x].Volume access below compiles to:
        //   callvirt instance int32 Shape::get_Volume()
        Shape tmp;
        for (int i=0; i<m_capacity-1; i++) {
            for (int j=0; j<m_capacity-1-i; j++)
                if (storage[j+1].Volume < storage[j].Volume){
                    tmp = storage[j];
                    storage[j] = storage[j+1];
                    storage[j+1] = tmp;
                }
        }
    }
}
Common Language Runtime
Call path (sample)
Results for the two versions of Shape/Cube:
public virtual int Volume: 57 sec
public int Volume: 39 sec
•No virtual call overhead
•Inlined (no call overhead at all)
•~ Equal to accessing field
Common Language Runtime
Garbage Collector
What triggers a GC?
Memory allocation failure
1M of GC objects allocated (v2)
Application going to background
GC.Collect() (Avoid “helping” the GC!)
What happens at GC time?
Freezes all threads at safe point
Finds all live objects and marks them
An object is live if it is reachable from root location
Unmarked objects are freed; those with finalizers are first added to the finalizer queue
Finalizers are run on a separate thread
GC pools are compacted if required (less than 750K of
free space)
Return free memory to the operating system
In general, if you don’t allocate objects,
GC won’t occur
Beware of side-effects of calls that may allocate objects
http://blogs.msdn.com/stevenpr/archive/2004/07/26/197254.aspx
Common Language Runtime
Garbage Collector
GC Latency per collection
[Chart. Y axis: GC latency (ms), 0 to 90. X axis: Number of Live Objects, 0 to 500000.]
Common Language Runtime
Garbage Collector
Allocation rate
[Chart. Y axis: Allocation rate (iter/sec), 0 to 160000. X axis: Object size (bytes), ticks at 400, 4000, 20000, 40000, 80000.]
Common Language Runtime
Garbage Collector
Allocation throughput
[Chart. Y axis: Allocation throughput (Mb/sec), 0 to 90. X axis: Object size (bytes), ticks at 8, 400, 4000, 20000, 40000, 80000.]
Common Language Runtime
Where does garbage come from?
Unnecessary string copies
Strings are immutable
String manipulations (Concat(), etc.)
cause copies
Use StringBuilder
String result = "";
for (int i=0; i<10000; i++) {
    result += ".NET Compact Framework";
    result += " Rocks!";
}

StringBuilder result = new StringBuilder();
for (int i=0; i<10000; i++) {
    result.Append(".NET Compact Framework");
    result.Append(" Rocks!");
}
.stat
counter
Total Program Run Time (ms)
App Domains Created
App Domains Unloaded
Assemblies Loaded
Classes Loaded
Methods Loaded
Closed Types Loaded
Closed Types Loaded per Definition
Open Types Loaded
Closed Methods Loaded
Closed Methods Loaded per Definition
Open Methods Loaded
Threads in Thread Pool
Pending Timers
Scheduled Timers
Timers Delayed by Thread Pool Limit
Work Items Queued
Uncontested Monitor.Enter Calls
Contested Monitor.Enter Calls
Peak Bytes Allocated (native + managed)
Managed Objects Allocated
Managed Bytes Allocated
Managed String Objects Allocated
Bytes of String Objects Allocated
Garbage Collections (GC)
Bytes Collected By GC
Managed Bytes In Use After GC
Total Bytes In Use After GC
GC Compactions
Code Pitchings
Calls to GC.Collect
GC Latency Time (ms)
Pinned Objects
Objects Moved by Compactor
Objects Not Moved by Compactor
Objects Finalized
Boxed Value Types
Process Heap
Short Term Heap
JIT Heap
App Domain Heap
GC Heap
Native Bytes Jitted
Methods Jitted
Bytes Pitched
Methods Pitched
Method Pitch Latency Time (ms)
Exceptions Thrown
Platform Invoke Calls
total
11843
1
1
2
175
198
0
0
0
0
0
0
1
0
1
2
0
3326004
60266
5801679432
20041
5800480578
4912
5918699036
0
0
0
686
0
0
0
1
3
22427
98
0
0
0
0
last datum
0
0
0
0
28
1160076
580752
1810560
0
278
0
0
0
0
140
0
0
-
n
0
0
2
2
60266
4912
4912
4912
4912
235
278
360
1341
35524
98
0
0
-
mean
0
0
0
0
96267
1204946
381831
1611885
0
2352
986
12103
46799
2095727
228
0
0
-
min
0
0
0
0
8
597824
8364
1097856
0
68
0
0
0
0
68
0
0
-
Run time 173 sec
max
0
0
1
1
580020
1572512
580752
1810560
16
8733
10424
24444
64562
3276800
1367
0
0
-
String result = "";
for (int i=0; i<10000; i++) {
    result += ".NET Compact Framework";
    result += " Rocks!";
}
Managed String Objects Allocated: 20040
Garbage Collections (GC): 4912
Bytes of String Objects Allocated: 5,800,480,574
Bytes Collected By GC: 5,918,699,036
GC Latency: 107128 ms
Run time 0.1 sec
StringBuilder result = new StringBuilder();
for (int i=0; i<10000; i++) {
    result.Append(".NET Compact Framework");
    result.Append(" Rocks!");
}
Managed String Objects Allocated: 56
Bytes of String Objects Allocated: 2097718
Garbage Collections (GC): 2
Bytes Collected By GC: 1081620
GC Latency: 21 ms
Last notes on StringBuilder
Remember it's all about reducing
memory traffic
If you roughly know the expected
length of your final string, allocate that
much beforehand (StringBuilder
constructor)
Getting the string out of a StringBuilder
doesn't cause a new alloc, the existing
buffer is converted into a string
http://weblogs.asp.net/ricom/archive/2003/12/02/40778.aspx
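As an illustration of pre-allocating through the StringBuilder constructor, the earlier loop could be written as below. The class and method names are made up for this sketch; the capacity is computed from the two literals, so the internal buffer should not need to grow during the loop.

```csharp
using System;
using System.Text;

// Hypothetical example class; names are illustrative.
public static class StringBuilderSketch
{
    public static string Build(int iterations)
    {
        const string part1 = ".NET Compact Framework";
        const string part2 = " Rocks!";

        // Final length is known up front, so reserve it once.
        int capacity = iterations * (part1.Length + part2.Length);
        StringBuilder result = new StringBuilder(capacity);

        for (int i = 0; i < iterations; i++)
        {
            result.Append(part1);
            result.Append(part2);
        }
        return result.ToString();
    }
}
```

When the final length is only roughly known, a reasonable overestimate still avoids most of the intermediate buffer copies.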
Common Language Runtime
Where does garbage come from?
Unnecessary boxing
Value types allocated on the stack
(fast to allocate)
Boxing causes a heap allocation and a copy
Use strongly typed arrays and collections
(framework collections are NOT strongly typed)
class Hashtable {
    struct bucket {
        Object key;
        Object val;
    }
    bucket[] buckets;
    public Object this[Object key] { get; set; }
}
Demo
String vs. StringBuilder
Common Language Runtime
Generics
Fully specialized implementation in .NET
Compact Framework v2
Pros
Strongly typed
No unnecessary boxing and type casts
Specialized code is more efficient than shared
Cons
Internal execution engine data structures and JIT-compiled code aren’t shared
List<int>, List<string>, List<MyType>
http://blogs.msdn.com/romanbat/archive/2005/01/06/348114.aspx
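To make the boxing difference concrete, here is a small sketch that sums integers stored in an ArrayList and in a List<int>. The class and method names are invented for this example. Both methods return the same sum; the ArrayList version allocates a boxed object per element, which should show up in the Boxed Value Types counter.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical example class; names are illustrative.
public static class BoxingSketch
{
    // ArrayList stores Object, so every int is boxed on Add
    // and unboxed (with a cast) on read.
    public static int SumBoxed(int count)
    {
        ArrayList list = new ArrayList(count);
        for (int i = 0; i < count; i++)
            list.Add(i);            // heap allocation + copy per element

        int sum = 0;
        for (int i = 0; i < list.Count; i++)
            sum += (int)list[i];    // unboxing cast
        return sum;
    }

    // List<int> stores the ints directly: no boxing, no casts.
    public static int SumGeneric(int count)
    {
        List<int> list = new List<int>(count);
        for (int i = 0; i < count; i++)
            list.Add(i);

        int sum = 0;
        for (int i = 0; i < list.Count; i++)
            sum += list[i];
        return sum;
    }
}
```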
Common Language Runtime
Finalization and Dispose
Cost of finalizers
Non-deterministic cleanup
Extends lifetime of object
In general, rely on GC for automatic memory
cleanup
The exceptions to the rule…
If your object contains an unmanaged resource
that the GC is unaware of, you need to implement a
finalizer
Also implement Dispose pattern to release unmanaged
resource in deterministic manner
Dispose method should suppress finalization
If the object you are using implements Dispose,
call it when you are done with the object
Assumes an unmanaged resource in the object chain
Common Language Runtime
Sample Code: Finalization and Dispose
class SerialPort : IDisposable {
    IntPtr SerialPortHandle;

    public SerialPort(String name) {
        // Platform invoke to native code to open serial port
        SerialPortHandle = SerialOpen(name);
    }

    ~SerialPort() {
        // Platform invoke to native code to close serial port
        SerialClose(SerialPortHandle);
    }

    public void Dispose() {
        // Platform invoke to native code to close serial port
        SerialClose(SerialPortHandle);
        GC.SuppressFinalize(this);
    }
}
Common Language Runtime
Sample Code: Finalization and Dispose
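class SerialTrace : IDisposable {
    SerialPort serialPort;

    public SerialTrace(String name) {
        // SerialPort only has a constructor that takes the port name
        serialPort = new SerialPort(name);
    }

    public void Dispose() {
        // Release the owned SerialPort deterministically
        serialPort.Dispose();
    }
}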
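On the calling side, the C# using statement is the usual way to make sure Dispose is called when you are done with the object. The sketch below uses a stand-in class, NativeResource, with a counter in place of a real native handle; both the class and the counter are assumptions for illustration only.

```csharp
using System;

// Stand-in for a class that wraps an unmanaged resource (hypothetical).
public sealed class NativeResource : IDisposable
{
    // Counts resources currently "open"; replaces a real native handle here.
    public static int OpenCount;
    private bool disposed;

    public NativeResource() { OpenCount++; }

    // Safety net: runs only if Dispose was never called.
    ~NativeResource() { Release(); }

    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this); // cleanup done, skip finalization
    }

    private void Release()
    {
        if (disposed) return;      // makes repeated calls harmless
        disposed = true;
        OpenCount--;
    }
}

public static class DisposeSketch
{
    // Returns the number of resources still open after the using block.
    public static int UseAndRelease()
    {
        using (NativeResource resource = new NativeResource())
        {
            // ... work with resource ...
        }   // Dispose() is called here, even if an exception was thrown
        return NativeResource.OpenCount;
    }
}
```

The disposed flag is a small addition over the slide code so that calling Dispose twice does not release the resource twice.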
Common Language Runtime
Exceptions
Exceptions are cheap…until you throw
Throw exceptions in exceptional
circumstances
Do not use exceptions for normal
flow control
Use performance counters to track the
number of exceptions thrown
Replace “On Error/Goto” with
“Try/Catch/Finally” in Microsoft Visual
Basic® .NET
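As one example of keeping exceptions out of normal flow control, compare two ways of looking up a key that is often missing. The class and method names are hypothetical; Dictionary<TKey, TValue>.TryGetValue is the standard API used for the non-throwing path.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical example class; names are illustrative.
public static class LookupSketch
{
    // Exception-driven: every miss throws and catches KeyNotFoundException.
    public static int GetOrDefaultByException(Dictionary<string, int> map,
                                              string key, int fallback)
    {
        try
        {
            return map[key];
        }
        catch (KeyNotFoundException)
        {
            return fallback;
        }
    }

    // Same result without throwing: a miss is just a false return value.
    public static int GetOrDefault(Dictionary<string, int> map,
                                   string key, int fallback)
    {
        int value;
        return map.TryGetValue(key, out value) ? value : fallback;
    }
}
```

Both methods return the same values; the difference only shows when misses are frequent, which is where the Exceptions Thrown counter helps.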
Common Language Runtime
Reflection
Reflection can be expensive
Reflection performance cost
Type comparisons (for example: typeof() )
Member enumerations (for example: Type.GetFields())
Member access (for example: Type.InvokeMember())
Think ~10-100x slower
Working set cost
Runtime data structures
Think ~100 bytes per loaded type, ~80 bytes per loaded method
Be aware of APIs that use reflection as a side effect
Override
Object.ToString()
GetHashCode() and Equals() (for value types)
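A value type with these overrides might look like the sketch below. Point3 and its fields are invented for this example; the point is only that Equals, GetHashCode and ToString are implemented by hand, so the inherited implementations are not used.

```csharp
using System;

// Hypothetical value type; names are illustrative.
public struct Point3
{
    public readonly int X;
    public readonly int Y;
    public readonly int Z;

    public Point3(int x, int y, int z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    // Field-by-field comparison written by hand.
    public override bool Equals(object obj)
    {
        if (!(obj is Point3)) return false;
        return Equals((Point3)obj);
    }

    // Strongly typed overload: also avoids boxing the argument.
    public bool Equals(Point3 other)
    {
        return X == other.X && Y == other.Y && Z == other.Z;
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return ((X * 397) ^ Y) * 397 ^ Z;
        }
    }

    public override string ToString()
    {
        return "(" + X.ToString() + ", " + Y.ToString() + ", " + Z.ToString() + ")";
    }
}
```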
Common Language Runtime
Building a Cost Model for Managed Math
Math performance
32 bit integers: Similar to native math
64 bit integers: ~5-10X cost of native math
Floating point: Similar to native math
ARM processors do not have FPU
.NET Compact Framework
[Architecture diagram; components shown:]
Redist: FX; MSI Setup (ActiveSync); Per Device CAB Install (SMS, etc)
Class libraries: mscorlib; System; System.Data; System.Xml; System.Reflection; System.Globalization; System.Cryptography; System.IO.Ports; System.IO.File; System.Net.Http*; System.Net.Sockets; System.WebServices; System.Drawing; Windows.Forms; DirectX.DirectD3DM; Microsoft.VisualBasic; Microsoft.Win32.Registry
Library areas: Globalization; Crypto; I/O; Net; GUI
CLR: Debugger; JIT Compiler & GC; Class Loader; Assembly Cache; App Domain Loader; Process Loader; Managed Loader; Native Interop; Memory and Threading; Verification; File Mapping; Calendar Data; Culture Data; Sorting; Encodings; Casing
Host: Visual Studio Debug Engine; ICorDbg
Windows CE: File I/O; Registry; Sockets; SSL; NTLM; Crypto API; Cert/Security; Common Controls; GDI/GWES; D3DM
Base Class Library
Collections
Pre-size collection classes appropriately
Resizing creates unnecessary copies
Beware of foreach overhead, use indexer
when available
ArrayList al = new ArrayList(string_array);
foreach (MyType mt in al){//do something;}
will be compiled into:
callvirt instance class IEnumerator IEnumerable::GetEnumerator()
…
callvirt instance object IEnumerator::get_Current()
…
callvirt instance bool IEnumerator::MoveNext()
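Both collection tips can be shown in a few lines. The sketch below (class and method names are made up) creates the ArrayList with its final capacity and walks it with the indexer instead of foreach, so no enumerator is created and there are no Current/MoveNext interface calls per element.

```csharp
using System;
using System.Collections;

// Hypothetical example class; names are illustrative.
public static class CollectionSketch
{
    public static int TotalLength(string[] items)
    {
        // Pre-sized: the internal array never has to grow and copy.
        ArrayList list = new ArrayList(items.Length);
        for (int i = 0; i < items.Length; i++)
            list.Add(items[i]);

        // Indexer access instead of foreach.
        int total = 0;
        for (int i = 0; i < list.Count; i++)
            total += ((string)list[i]).Length;
        return total;
    }
}
```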
Windows Forms
Best Practices
Load and cache Forms in the background
Populate data separate from Form.Show()
Pre-populate data, or
Load data async to Form.Show()
Use BeginUpdate/EndUpdate when it is available
e.g. ListView, TreeView
Use SuspendLayout/ResumeLayout when
repositioning controls
Keep event handling code tight
Process bigger operations asynchronously
Blocking in event handlers will affect UI responsiveness
Form load performance
Reduce the number of method calls during initialization
Graphics And Games
Best Practices
Compose to off-screen buffers to minimize
direct to screen blitting
Approximately 50% faster
Avoid transparent blitting in areas that
require performance
Approximately 1/3 the speed of normal blitting
Consider using pre-rendered images versus
using System.Drawing rendering primitives
Need to measure on a case-by-case basis
XML
Best Practices for Managing Large XML Data Files
Use XmlTextReader/XmlTextWriter
Smaller memory footprint than using XmlDocument
XmlTextReader is a pull model parser which only reads a
“window” of the data
XmlDocument builds a generic, untyped object model
using a tree
Type stored as string
OK to use with smaller documents (64K XML: ~0.25s)
Optimize the structure of XML document
Use elements to group
Allows use of Skip() in XmlReader
Use attributes to reduce size – processing attribute-centric
documents is faster
Keep it short! (attribute and element names)
Avoid gratuitous use of white space
XML
Creating optimized Reader/Writer
In v2 use XmlReader/XmlWriter factory
classes to create optimized reader or writer
Applying proper XmlReaderSettings can
improve performance
XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreWhitespace = true;
XmlReader reader = XmlReader.Create("my.xml", settings);
Up to 30% performance increase when
IgnoreWhitespace = true is specified
(depends on document format)
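A complete read loop built on the factory method might look like this. It is a sketch under assumptions: the class name, the method, and the idea of counting elements are invented for illustration, and the XML is read from a string so the example is self-contained (on a device you would pass a file path or stream to XmlReader.Create instead).

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical example class; names are illustrative.
public static class XmlReadSketch
{
    // Counts elements with the given name using the pull-model reader.
    public static int CountElements(string xml, string elementName)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true; // skip insignificant whitespace nodes

        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            // Only the current node is held in memory at any time.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element &&
                    reader.Name == elementName)
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```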
Demo
XmlDocument vs. XmlTextReader
XML
Reading local data with DataSet
DataSet is a database independent
container of relational data
Allows you to work with XML
ReadXml allows you to load XML data into
DataSet
Simple to use, but performs badly,
especially with large XML files
If you must use DS.ReadXml, make sure
that you first supply the schema
Use XmlReader wherever possible for
traversing through your data
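One way to supply the schema first is to define the tables and columns in code and then load the data with XmlReadMode.IgnoreSchema, so ReadXml does not have to infer anything from the document. The table and column names below are invented for this sketch, and the XML comes from a string to keep the example self-contained; loading an .xsd with ReadXmlSchema before ReadXml is another option.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical example class; table and column names are illustrative.
public static class DataSetSketch
{
    public static DataSet LoadOrders(string xml)
    {
        // Schema supplied up front; names must match the XML elements.
        DataSet ds = new DataSet("orders");
        DataTable orders = ds.Tables.Add("order");
        orders.Columns.Add("id", typeof(int));
        orders.Columns.Add("customer", typeof(string));

        // IgnoreSchema: read data into the existing schema, no inference.
        ds.ReadXml(new StringReader(xml), XmlReadMode.IgnoreSchema);
        return ds;
    }
}
```

Defining the columns also gives typed values: the id column holds int values rather than strings.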
Demo
DataSet and .NET Compact Framework
Non-XML local data
Reading files locally
You may need to read a text file
stored locally on the device
StreamReader and FileStream classes
are typically employed
For large file sizes (>100 K), FileStream
outperforms StreamReader
StreamReader specifically looks for line breaks, FileStream does not
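Reading through FileStream means working with raw byte blocks and doing any parsing yourself. A minimal sketch, with invented class and method names and an arbitrary 4 KB buffer size:

```csharp
using System;
using System.IO;

// Hypothetical example class; names are illustrative.
public static class FileReadSketch
{
    // Reads the file in fixed-size blocks and counts occurrences of one byte.
    public static long CountBytes(string path, byte target)
    {
        byte[] buffer = new byte[4096]; // reused for every block: no per-line strings
        long count = 0;

        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == target) count++;
                }
            }
        }
        return count;
    }
}
```

For example, `CountBytes(path, (byte)'\n')` counts the lines of a text file without creating a string per line, which is the work StreamReader.ReadLine would do.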
Web Services
Where is the bottleneck?
Are you network bound or CPU bound?
Use perf counters: socket bytes sent / received
Do you come close to the network capacity?
If you are network bound – work on reducing the size
of the message
Create a “canned” message, send over HTTP;
Compare performance with the web service;
If you are CPU bound, optimize the serialization
scheme for speed
http://blogs.msdn.com/mikezintel/archive/2005/03/30/403941.aspx
Moving Forward
More tools
Live Remote Performance Counters
(new in v2)
Under construction:
Allocation profiler (CLR profiler)
Call profiler
Working set improvements
More speed
Summary
Make performance a requirement
and measure
Understand the APIs
Isolate exactly what is being measured
Repeat tests several times and ignore the first time which is
affected by JITting
Track the results in order for later comparisons and review
Ensure comparison of Apples to Apples
Use real code when possible
Test multiple designs and strategies - Understand the
differences or variation
Avoid unnecessary object allocation and copies due to
String manipulations
Boxing
Not pre-sized collections
Performance FAQ
http://blogs.msdn.com/netcfteam/archive/2005/05/04/414820.aspx