
OPNFV testing strategy
Testing working group
23/06/2017
Reporter for the testing working group
Gabriel (Bottlenecks)
Morgan (Functest)
Stakes




We (test projects) mainly validate a release so far
Tooling for stress/resiliency/robustness testing is available in the different frameworks but underused
Stress tests have been introduced in Danube (but not in CI)
Stress / robustness / resiliency / long duration tests are key
Towards a stress test strategy


Community discussions (etherpad, mail)
https://wiki.opnfv.org/display/bottlenecks/Sress+Testing+over+OPNFV+Platform
1. Test Cases Discussed in Danube
Data-plane Traffic (for a virtual or bare metal POD)
  TC1 – Determine baseline for throughput
  TC2 – Determine baseline for CPU limit
Life-cycle Events (for VM pairs/stacks)
  TC3 – Perform life-cycle events for ping
  TC4 – Perform life-cycle events for throughput
  TC5 – Perform life-cycle events for CPU limit
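As an illustration only (not from the slides), a TC3-style life-cycle test can be sketched as a loop that creates and deletes VM pairs while checking connectivity; the OpenStack calls are injected as stubs here, so only the control flow is shown:

```python
import concurrent.futures

def lifecycle_ping_test(create_vm_pair, ping, delete_vm_pair,
                        iterations=5, concurrency=4):
    """Sketch of TC3 (perform life-cycle events for ping).

    create_vm_pair / ping / delete_vm_pair are hypothetical injected
    callables; in a real run they would wrap the OpenStack API.
    Returns the fraction of iterations whose ping check succeeded.
    """
    successes = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = []
        for i in range(iterations):
            def one_cycle(i=i):
                pair = create_vm_pair(i)   # spawn a VM pair (stack)
                try:
                    return ping(pair)      # connectivity check
                finally:
                    delete_vm_pair(pair)   # always tear down
            futures.append(pool.submit(one_cycle))
        for f in concurrent.futures.as_completed(futures):
            if f.result():
                successes += 1
    return successes / iterations

# Stubbed run: every pair "pings" successfully.
rate = lifecycle_ping_test(lambda i: i, lambda p: True, lambda p: None)
```

The stress aspect comes from raising `iterations` and `concurrency`; the pass rate then measures stability under concurrent life-cycle events.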
2. Test Cases Planned for Euphrates (Under Discussion)
Scaling (Yardstick)
  Scale-out test
  Scale-up test
  Set up VMs until maximum throughput is reached
Compute & Memory (VSPerf & StorPerf)
  Test different Nova schedulers for different compute nodes
  Run VSPERF and record numbers (throughput, latency, etc.)
  Run StorPerf and record numbers (throughput, latency, etc.)
  Run both at the same time and compare numbers
Cooperation with the Bottlenecks project as load manager for the planned test cases is also under discussion
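A hedged sketch of the scale-out idea above (not Yardstick's actual implementation): add VMs until aggregate throughput stops improving, then report the saturation point. `measure_throughput` is a hypothetical injected probe:

```python
def scale_out_until_max(measure_throughput, max_vms=64, tolerance=0.01):
    """Add VMs one at a time; stop when aggregate throughput no longer
    improves by more than `tolerance` (relative).

    measure_throughput(n) is an assumed callable returning the measured
    throughput with n VMs deployed (in a real run, a traffic generator).
    """
    best_n, best_tp = 1, measure_throughput(1)
    for n in range(2, max_vms + 1):
        tp = measure_throughput(n)
        if tp <= best_tp * (1 + tolerance):
            break                      # saturation reached
        best_n, best_tp = n, tp
    return best_n, best_tp

# Toy model: throughput grows linearly, then saturates at 8 VMs.
n, tp = scale_out_until_max(lambda n: min(n, 8) * 100.0)
```

With the toy model this stops at 8 VMs; on a real POD the probe would drive actual tenant traffic.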

2 Test Cases Implemented in Danube
 TC1 – Determine baseline for throughput
 TC3 – Perform life-cycle events for ping
TC3 measures the reliability/stability of the system under a large number of concurrent requests / heavy traffic.
Problems detected on both OPNFV & commercial solutions
Some rough numbers for a general view:
  Tenant Network – VSPERF throughput: 1000 mbit/s
  Storage Network – StorPerf bandwidth: 1000 mbit/s
  Combined Tenant / Storage Network – VSPERF throughput: 500 mbit/s; StorPerf bandwidth: 500 mbit/s
Bottlenecks could act as the load manager while monitoring system behavior
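The comparison the slide implies (isolated vs. combined runs) can be expressed as a simple interference ratio; the function below is purely illustrative, with the numbers taken from the slide:

```python
def interference(isolated, combined):
    """Relative throughput loss when two workloads share the network."""
    return 1 - combined / isolated

# From the slide: VSPERF 1000 -> 500 mbit/s when run alongside StorPerf,
# StorPerf 1000 -> 500 mbit/s when run alongside VSPERF.
vsperf_loss = interference(1000, 500)
storperf_loss = interference(1000, 500)
```

Both workloads lose roughly half their isolated performance when the tenant and storage networks are combined.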
Cross Project Stress
Back to OPNFV release...
Theory versus reality
Going beyond release verification

Impossible to include stress / robustness tests in the CI chain because:
  the target version is delivered late and unstable due to proximity to upstream
  6-month cadence... (assuming that we would like to perform tests over weeks)
Needs
2 CI chains (as it is today…)
  CI chain master => release validation (as it is today, i.e. deploy/functest/yardstick looping on different scenarios)
  CI chain "stable"
    Focus on "generic" scenarios first (starting with os-nosdn-nofeature-ha)
    Reinstallation on demand (CI still needed for clean reinstallation, but reinstallation not automated, to allow long duration tests + troubleshooting)
    Schedule to be created by the testing group (allocate N weeks for project X, N' weeks for project X'...)
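The week allocation on the shared "stable" chain could be sketched as a back-to-back schedule; project names and durations below are hypothetical examples, not from the slides:

```python
def allocate_weeks(requests, start_week=1):
    """requests: list of (project, n_weeks) pairs.

    Returns {project: (first_week, last_week)}, allocating consecutive
    week slots on the shared long-duration CI chain, back to back.
    """
    schedule, week = {}, start_week
    for project, n in requests:
        schedule[project] = (week, week + n - 1)
        week += n
    return schedule

# Hypothetical allocation: one project gets 3 weeks, the next 2.
sched = allocate_weeks([("Bottlenecks", 3), ("Yardstick", 2)])
```

A real schedule would of course be negotiated in the testing working group rather than computed, but the data shape (project, duration, slot) is the same.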
Questions for the TSC
  Any feedback/comments?
  How to manage that from a release perspective... the release management process will be incomplete, as the testing group would work in parallel on 2 releases (decoupling release validation and release qualification)
  What about commitment from the installers:
    can the installers deal with 2 versions in parallel?
    Shall we work directly on bifrost/openstack-ansible?
    Is it compatible with Infra resource management (Infra group)?
  The testing working group is elaborating its stress strategy; shall it be validated by the TSC?