Professional Tester
Essential for software testers
August 2014, number 28
Subscribe: it's free for testers
This issue of Professional Tester is sponsored by Neotys
Including articles by:
Henrik Rexed (Neotys)
Gregory Solovey and Anca Iorgulescu (Alcatel-Lucent)
Nick Mayes (PAC UK)
Normand Glaude (Protecode)
Edwin van Vliet (Suprida)
Sakis Ladopoulos (Intrasoft International)
From the editor
Keep testing in the lead
Is software testing as important as software requirements engineering, design, programming or project management?

No. It is much more important than all of these. Testing should be the framework upon which they are built. Their only aims should be first to facilitate the creation of all the right tests, then to ensure they all pass. Testing is not a safety net to rescue the incompetent from failure: it is the systematic, generic method applied throughout the lifecycle and by which success is achieved.
This issue is made possible by our sponsor Neotys: a performance testing tool vendor committed to testing principles. We recommend readers evaluate its product NeoLoad, which is available in a completely free edition limited only in the number of virtual users it simulates.

If you are one of the tens of thousands of testers who like PT, please help us to keep it free to read by considering the offerings of those who support it, and letting them know you appreciate that.
Edward Bishop
Editor
IN THIS ISSUE

Testing in the lead

Model answers: testing-led specification with Henrik Rexed
QA of testing: testing-led process with Gregory Solovey and Anca Iorgulescu
Get ready for testing in the front line: testing-led digital transformation strategy with Nick Mayes
Zero tolerance: testing-led project management with Sakis Ladopoulos
Open but hidden: testing-led third-party code management with Normand Glaude
Forget me not: testing-led test data management with Edwin van Vliet

Visit professionaltester.com for the latest news and commentary
Contact
Editor
Edward Bishop
editor@professionaltester.com
Managing Director
Niels Valkering
ops@professionaltester.com
Art Director
Christiaan van Heest
art@professionaltester.com
Sales
Rikkert van Erp
advertise@professionaltester.com
Publisher
Jerome H. Mol
publisher@professionaltester.com
Subscriptions
subscribe@professionaltester.com
Contributors to this issue
Henrik Rexed
Gregory Solovey
Anca Iorgulescu
Nick Mayes
Sakis Ladopoulos
Normand Glaude
Edwin van Vliet
Professional Tester is published by Professional Tester Inc.

We aim to promote editorial independence and free debate: views expressed by contributors are not necessarily those of the editor nor of the proprietors.

© Professional Tester Inc 2014

All rights reserved. No part of this publication may be reproduced in any form without prior written permission. Professional Tester is a trademark of Professional Tester Inc.
Testing in the lead

Model answers

by Henrik Rexed

Henrik Rexed explains how to get the numbers nearly right first time

Specifying accurate and testable performance requirements
Performance testing is often done in a way that is contrary to the principles of testing. An application is put under arbitrary load and its response times measured. Measurements that seem large, relative to one another or compared with arbitrary expectations, are investigated and addressed, and the same test is run again to demonstrate that the change has been effective.
But how do we know the right things are being measured under the right conditions? If not, there may have been no need for the changes. In fact the changes may well have worsened, or even caused, performance issues that will matter in production but have been missed by testing.
Just as functional testing has no meaning without trusted requirements, performance testing can do nothing to provide assurance unless what needs to be assured has been defined formally before testing starts. In both kinds of testing the requirements will change, for many reasons including in the light of test results: but adequate investment in specifying the right requirements means that testing can provide a clear result (requirements are met, or not: that is, the application will or will not fail in production). The closer to right first time those specifications are, the more empowered testing is to save massive effort in management, operations and development, and the less testing costs.
When an application passes performance testing then fails in production, proving the testing to have been unrealistic, it is easy but wrong to blame the testing itself or the tools used to execute it. The real problem is test design without a correct basis. It is necessary to ask: what did we need to know which, if we had known it, would have allowed us to predict this failure before production? In other words: for what should we have been testing?
This article will attempt to provide a generic answer to that question by defining a model minimum set of performance specifications suitable for use as test basis, and explaining how to estimate accurate quantitative information, working from high-level business requirements, to populate that model set. The availability of these accurate estimates gives performance testing, regardless of how it is carried out, a far greater chance of success.
Important user transactions

The method described here is for online user-facing applications and specifies performance from the user's point of view, by considering the time between action and response. Each different action available to the user (that is, the user transactions that, individually or in sequences, provide the functions of the application) must be treated separately. Many of them will require similar performance and can be specified in groups, but these are the least important, ie the least likely to cause performance or reliability failure. Attempting to specify the more important ones as a group will result in very inaccurate figures for most of them.

Which are the important user transactions to specify is usually obvious from a high-level understanding of the application: those user transactions that require complex data transactions or processing, especially with or by subsystems. However it is not always so simple for applications which perform transactions not triggered, or not immediately, by single or specific user actions: ie push rather than pull transactions. Here a more detailed understanding of the technical design might be needed.
Application and transaction capacity

The capacity of an application is a set of values defining the maximum number of simultaneous users to whom it must deliver the performance levels (whose specification will be discussed below). These values are:

• session initiations per unit time (eg hour). This is the rate at which users are expected to begin interacting with the application and reflects the expected busyness or traffic

• concurrent users. This is related to session initiations but also takes into account the expected length of use sessions, that is the rate at which users stop interacting with the application

• user accounts. For applications which require users to be registered and to log in, this figure is the ceiling of the previous one: the maximum number of users who could be interacting with the application at any time. However many applications offer certain functions to unregistered users too. In this case the maximum number of user accounts (and where applicable the maximum number of accounts of each of the different types) is used to estimate the next figure

• concurrent user transactions (for each of the user transactions).
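As a sketch only, this value set could be recorded in a simple structure along the following lines; the type and field names are illustrative, not part of the method:

    from dataclasses import dataclass, field
    from typing import Dict, Optional

    @dataclass
    class CapacitySpec:
        """The capacity value set described above (illustrative names)."""
        session_initiations_per_hour: int    # expected busyness or traffic
        concurrent_users: int                # simultaneous active users
        user_accounts: Optional[int]         # ceiling on the above; None if not applicable
        # maximum concurrent executions of each important user transaction,
        # keyed by transaction name
        concurrent_transactions: Dict[str, int] = field(default_factory=dict)

    # Example with hypothetical figures:
    spec = CapacitySpec(
        session_initiations_per_hour=5000,
        concurrent_users=1200,
        user_accounts=40000,
        concurrent_transactions={"checkout": 150, "search": 900},
    )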
Obviously the correct entity to define required capacity is the acquirer, ie the business. Sometimes this is easy: for example where an application or user transaction is available to a limited group of employees, associates or customers whose growth is reasonably predictable. For a public-facing application that aims to increase its usership, the key source of information is the business plan. This must contain estimates of ROI and therefore of market share, turnover, average sale price or similar, which can be used to derive the expected usership and so the necessary capacity. Moreover, these estimates will tend toward the optimistic, which reduces the risk of performing testing under insufficient load.
Unless the usership of the application or a user transaction is truly closed, so the maximum usership is known with good accuracy, capacity figures should be derived from the estimated maximum usership figures multiplied by 3. This is to assure against peak loads, which may occur for multiple reasons, including:

• recovery: if the application becomes unavailable for functional or non-functional reasons (including transient conditions), usership is multiplied when it becomes available again

• coincidence/external/unknown: sometimes demand reaches high peaks for no predictable reason. Anyone who has had a job serving the public will recognize this phenomenon: for example, every supermarket worker has experienced the shop suddenly becoming very busy in the middle of Tuesday afternoon, usually the quietest time. We need not be concerned here with its causes, but it is interesting to note that an external reason is not necessarily required: it can be explained purely mathematically

• transient conditions: network-layer events such as latency, packet loss, packets arriving out of order etc increase effective load by preventing threads from completing, holding database connections open, filling queues and caches and complicating system resource management. The same effects can also be caused by application or system housekeeping events such as web or memory cache management, backup, reconciliation etc. While the second group of events, unlike the first, is to some extent predictable and controllable, as we have already seen the production load is unpredictable, so we must assume that all can happen at the same time as peaks in production load.
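A minimal sketch of this rule, assuming the estimated maximum usership figures have already been obtained as described; the function name and example figures are hypothetical:

    PEAK_FACTOR = 3  # the "multiply by 3" rule: headroom for recovery,
                     # coincidence and transient-condition peaks

    def capacity_figure(estimated_max_usership: int,
                        usership_is_closed: bool = False) -> int:
        """Derive a capacity figure from an estimated maximum usership.

        If the usership is truly closed (the maximum is known with good
        accuracy) no headroom is added; otherwise the estimate is
        multiplied by 3 to assure against peak loads.
        """
        if usership_is_closed:
            return estimated_max_usership
        return estimated_max_usership * PEAK_FACTOR

    # Example: a business plan implying at most 1,200 concurrent users
    # yields a specified capacity of 3,600.
    assert capacity_figure(1200) == 3600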
Applying the 3 times rule should mitigate these risks sufficiently, but a greater one remains: unexpected popularity, where testing is based on expected capacity but usership then expands much faster than expected. As well as failing to serve the new users, perhaps losing a once-in-a-lifetime business opportunity, their presence may well cause the application to fail to serve existing customers, who may be even more valuable.
If such an event is considered possible and cost is no object, one would build the application and provision infrastructure to be capable of passing performance testing at extremely high load. Realistically however it must be mitigated by scalability testing, that is testing to assure against failing to increase the application's capacity economically when needed. For performance testing purposes we need to know the highest currently-expected capacity, and every effort should be made to help the business to estimate this accurately and commit to those estimates.
Unfortunately many testers still find this impossible to accomplish and are forced to revert to an empirical method. This is possible only if the application is already in production or available for testing in a reasonably production-like environment. A preliminary load test is performed on the application using a simple virtual user script such as a typical visit or just random navigation (without infrequently-occurring user transactions). The same exercise is repeated on the important user transactions, again using a simple script which simply carries out a typical transaction without variation or mistakes. In all cases, the number of VUs is increased slowly until the performance requirements (discussed below) are not being met: that is, until too-slow responses are being experienced by 25% of them. This is taken to be the point at which the application is reaching its capacity.
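As a sketch, that ramp could be automated along these lines; run_load_test is a hypothetical hook into whatever load tool is in use, and all names and defaults are assumptions:

    from typing import Callable

    def find_empirical_capacity(run_load_test: Callable[[int], float],
                                start_vus: int = 10,
                                step_vus: int = 10,
                                max_vus: int = 100_000) -> int:
        """Increase the VU count slowly until the requirements are not met.

        run_load_test(vus) executes the simple script at the given VU
        count and returns the fraction of responses exceeding the
        maximum response time. The 25% threshold is the (arbitrary)
        figure discussed in the text.
        """
        vus = start_vus
        while vus <= max_vus:
            if run_load_test(vus) >= 0.25:
                return vus  # taken as the point where capacity is reached
            vus += step_vus
        raise RuntimeError("requirements still met at max_vus")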
Obviously this figure is arbitrary and the method far from satisfactory. However it does stand a reasonable chance of being sufficiently accurate, used as a starting point for test design, to at least help limit waste. In fact, if the organizational, technical and project situation makes it cheap, it may well be worth doing even if accurate estimates based on the business plan are available, as a check which might show that the performance required by that plan is unfeasible, so testing should not begin. Note that, because of the simplicity of the scripts, it must not be used the other way round: that is, do not say "the preliminary testing shows the current performance specifications can be met easily, so the business can consider making them more ambitious".
The worst situation of all is having to begin test design based on the testers' own estimates of needed capacity, or to proceed to test execution with no clear idea of these and having to find them out empirically. Only very good luck can prevent this leading to great inaccuracy, delay and waste in the testing, development and, usually, business effort.
User behaviour

Unusual action sequences, making many mistakes, breaking out of user transactions before they are completed and rapid repetition of actions have the effect of increasing the load caused by one user to that which would be caused by more than one normally-behaving user. Long pauses between actions ("think times") increase concurrency and can also cause sudden changes in total load as actions of users working at different speeds become, and cease to be, simultaneous.
It is futile at the specification stage to try to define a "normal" user. What test design needs to know is the required range of speed (average pause time) and accuracy (tendency not to make mistakes). Note that in modern responsive web and mobile applications, pause times are shorter than for older, less responsive web applications: do not consider the time between form submissions, but that between clicks and keypresses on a form which may interact with the user after each of them via AJAX etc.

Now, for the target audience, decide how much longer this time may be for the slowest user compared with the average user, and how much faster for the fastest user. Taking the time for the average user to be 0 arbitrary units, the range can now be expressed as, for example, -0.5 to +0.7. The same approach is used for the tendency of users to make mistakes, where a user who makes the average number of mistakes per n actions is taken as 0 arbitrary units.
It is also vital to realise that no real user works at a constant rate or accuracy, due to being affected by interruptions, distractions etc. In the most extreme case, VUs must vary from the lowest to the highest points on the defined range within a single user journey.
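One possible reading of these relative units, sketched below; interpreting the offsets as fractions of the average pause is an assumption, as are all names and figures:

    import random

    def journey_pauses(average_pause_s: float,
                       up_min: float = -0.5,
                       up_max: float = 0.7,
                       actions: int = 20) -> list:
        """Per-action think times for one VU journey (one interpretation).

        Reads the arbitrary units above as fractions of the average pause:
        an offset of -0.5 gives half the average, +0.7 gives 1.7 times it.
        Sampling per action lets a single journey swing between the lowest
        and highest points of the defined range, as required above.
        """
        return [average_pause_s * (1.0 + random.uniform(up_min, up_max))
                for _ in range(actions)]

    # Example: average pause 4s, range -0.5 to +0.7
    # -> each pause falls between 2.0s and 6.8s
    print(journey_pauses(4.0))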
Deciding the distribution of VUs with these different characteristics, and how they should vary dynamically, is the job of test design. In practice, the average-speed user will be taken as working at the fastest speed the load injector can achieve. In the same way, initial VU scripts tend to assume no user errors. Both are fine, provided the necessary variation is then added. Not doing that is like trying to check the structural soundness of a building without going in, just by hammering on the front door.

Specifying the range of variation for which assurance against failure must be provided enables test design to find creative ways to do so. Without that specification, testing is degraded to guesswork.
Load levels and response times

Once the capacity figures are known, load levels based upon them are defined: "no load" (less than 40% of capacity), "low load" (40-80%) and "high load" (80-90%). There is no need to deal with higher loads: any exact load, including 100%, is practically impossible to achieve accurately by any means, therefore any test result obtained by an unknown approximation of it is meaningless. For the same reason, in testing it is best to aim for the middle of each band, that is 20%, 60% and 85% load. It might well be desired to apply load above capacity, but there is no need to specify performance at that load; there the aim is to predict other types of failure and their likely or possible impact. In estimating the capacity, we assume that performance failure will occur at loads higher than it.
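The band midpoints translate directly into target VU counts, as in this sketch (function name and example capacity are hypothetical):

    def load_level_targets(capacity_vus: int) -> dict:
        """VU counts at the middle of each load band defined above."""
        return {
            "no load": round(capacity_vus * 0.20),    # band: below 40% of capacity
            "low load": round(capacity_vus * 0.60),   # band: 40-80%
            "high load": round(capacity_vus * 0.85),  # band: 80-90%
        }

    # Example: capacity of 3,600 concurrent users
    # -> {'no load': 720, 'low load': 2160, 'high load': 3060}
    print(load_level_targets(3600))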
For each of the important user transactions, and for each of the three load levels, maximum and average response times are decided. It must also be decided whether these should or should not include network transport times: in other words, whether the time should be measured from the requestor's point of view, from the time the request is made (eg user touches button) to the time the response is fully displayed, or from the responder's, from the time the request is fully received to the time the response has been fully sent.
There are arguments for both approaches. The transient network conditions are unknown and beyond control; including them in the specification necessarily includes error in the testing. Even if conditions are fairly stable, entropic events such as packet loss will always cause some VUs to report performance failure. On the other hand, these factors all affect the user experience, which is the most important consideration. Moreover, that consideration is really the only way to decide what are the desirable and tolerable response times. Beware of oversimplified statements along the lines of "x% of users abandon after y seconds delay". It is necessary to consider the sequence of actions taken by the user before the transaction and the amount and nature of the data to be displayed after it, especially when errors occur (such as the submission of incomplete or invalid data). All of these affect the users' expectations and behaviour.
There is one special case where the response time must be specified as purely the time taken by the responder: when that responder (a component such as a database or mainframe) is responding to more than one requestor. A typical example of this situation would be both web and mobile applications connected to the same external subsystem. While empirical testing including both working simultaneously may well be desirable, and is made possible by modern developments in tools and environment provision, that will happen far too late to discover that a key component on which both depend cannot provide adequate performance. Testing of each requestor must assure that the response time under maximum load of that requestor is at most that needed to meet the user-experienced performance specifications, divided by the number of requestors which will be connected to the responder in production. This response time is the value that should be specified. Note that in almost all cases it will not be necessary to delve into system architecture: the only important point is that at which requests from multiple requestors converge. Subcomponents beyond this point can be considered as a single component.
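A sketch of that division, with hypothetical names and figures; the resulting value corresponds to SCRTmax in the template of figure 1:

    def scrt_max(user_budget_s: float, num_requestors: int) -> float:
        """Maximum response time to specify for a shared responder.

        user_budget_s is the responder's share of the user-experienced
        response-time budget; it is divided by the number of requestors
        connected to the responder in production, so that all of them
        together cannot exceed the budget under maximum load.
        """
        return user_budget_s / num_requestors

    # Example: a 2-second budget for a subsystem shared by web and
    # mobile applications -> specify a maximum of 1.0 second.
    assert scrt_max(2.0, 2) == 1.0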
The minimum set of performance specifications

In summary, we have defined the following values. For the application overall: maximum session initiations per unit time, maximum concurrent users and maximum number of user accounts. For the users: relative range of variation of time between actions and of rate of making mistakes. Then for every important user transaction: maximum concurrent users carrying it out, maximum response time and minimum response time. Setting these values before test design begins gives performance testing clear targets and makes it more quickly effective and less wasteful. A template to record these values is shown in figure 1. When all the symbols have been replaced by quantitative values, the minimum performance specification is complete.
Henrik Rexed is a performance specialist at Neotys (http://neotys.com)
Figure 1: template for recording the minimum set of performance specifications

Symbol     Definition                                             Note
SIR        maximum session initiation rate per second             of entire application
CU         maximum number of concurrent users
UA         maximum number of user accounts                        if applicable
UPmax      maximum user pause (time between                       relative to average
           actions, seconds)
UPmin      minimum user pause (time between
           actions, seconds)
UMmax      maximum number of user mistakes
           per n actions
UMmin      minimum number of user mistakes
           per n actions
SCRTmax    maximum response time of shared component              not including network
           connected to this and other apps, divided by           transport time. For complex
           total number of apps connected to it (seconds)         applications there may be
                                                                  more than one such
                                                                  responder component
IUT        number of identified important user transactions
RTmax[n]   maximum response time of important user
           transaction n (seconds)