Computation of the expected value and variance of the
average annual yield for a stochastic simulation of
rainwater tank clusters
John Mashford, Shiroma Maheepala, Luis Neumann and Esther Coultas
Commonwealth Scientific and Industrial Research Organization (CSIRO), Australia
PO Box 56, Highett, Vic. 3190, Australia
Abstract - The problem of obtaining a detailed
understanding of the behavior of a cluster (or a collection)
of rainwater tanks is complex and can only be solved by
simulation. If the collection of houses is very large, a
tractable solution can only be obtained by stochastic
simulation in which the parameters defining the houses and
tanks are sampled from probability distributions. An
important output from the rainwater tank simulation is the
average annual yield of the cluster and it is of interest to
know its expected value and variance for planning of urban
water systems. This paper carries out a theoretical
calculation of the expected value and variance of the
average annual yield as functions of cluster size and
presents an experimental confirmation of these results.
Keywords: rainwater tank, stochastic simulation, yield,
expected value, variance
1
Introduction
The amount of water that can be supplied from a
rainwater tank (i.e. yield of a rainwater tank) depends on a
number of properties of the house, the tank and the climate
[1, 2, 3]. The relevant properties of the house can be
modeled as the roof area connected to the rainwater tank,
the depression storage (i.e. retention storage of the roof
which depends on the type of roof material and shape of the
roof), the roof area loss factor (i.e. losses from the roof,
which depends on roof material) and the way in which water
stored in the rainwater tank is used by occupants of the
house (i.e. demand time series). The demand time series is a
function of the occupancy status of the house and the type of
end uses for which rainwater is used. For example,
rainwater can be used for garden use, toilet use, laundry use
and hot water use. The principal relevant property of the
rainwater tank is its volumetric capacity. The properties
of the environment relevant to the rainwater tank’s evolution
are the rainfall time series and the potential
evapotranspiration (PET) time series; the latter matters
because evaporation from the roof of the house and from the
tank depends on temperature, wind, humidity and so on. A quantity of
considerable interest to urban water management planners is
the yield of a collection of tanks over a period of time [4].
The behavior of a tank can be represented by time series
{Vt} and {Yt} where:
Vt = the volume of water in the tank at the end of time
period t; and
Yt = the yield from the tank during time period t.
It has been shown that a daily time step is sufficient to
accurately model a rainwater tank if there is no trickle
supply to the tank from mains supply [3]. This study
assumes that each rainwater tank is fitted with an
appropriate valve which allows end uses of the tank to
switch to mains supply when the tank has run out of water
(i.e. there is no trickle supply from the mains). The behavior
of the tank can be simulated by recursively solving the
storage behavior equations for Vt and Yt , t = 1, …, P where
P is the number of time periods in the simulation [3]. In our
simulation, the demand pattern is taken from a finite
number, N_ds, of demand scenarios.
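The recursion for Vt and Yt can be sketched in Python as follows. The storage behavior equations of [3] are not reproduced in this paper, so the sketch rests on assumptions: a daily inflow of A(1 - L)max(rain - δ, 0), a rule in which inflow is added first, overflow spills and demand is then met from what remains, and no PET term. The function name `simulate_tank`, the units and the input values are illustrative only and are not the authors' implementation.

```python
def simulate_tank(capacity, roof_area, depression, loss_factor, rain, demand, v0=0.0):
    """Return (volumes, yields) for one tank over len(rain) daily time steps.

    Assumed units: capacity, demand and volumes in litres; roof_area in m^2;
    rain and depression in mm (1 mm of rain on 1 m^2 of roof is 1 litre).
    """
    volumes, yields = [], []
    v = v0
    for r, d in zip(rain, demand):
        # Assumed inflow rule: rain beyond depression storage, reduced by the loss factor.
        inflow = roof_area * (1.0 - loss_factor) * max(r - depression, 0.0)
        # Inflow is added first; anything above the tank capacity spills.
        v = min(v + inflow, capacity)
        # Demand is met from the tank; any shortfall is switched to mains supply.
        y = min(d, v)
        v -= y
        volumes.append(v)
        yields.append(y)
    return volumes, yields


rain = [0.0, 10.0, 0.0, 0.0, 25.0]  # mm/day, made-up values
demand = [300.0] * 5                 # litres/day, one made-up demand scenario
volumes, yields = simulate_tank(2000.0, 100.0, 1.0, 0.1, rain, demand)
print(volumes)
print(yields)
```

Because the yield on each day depends on the volume carried over from the previous day, the yield series is a nonlinear function of the parameters even under this simple rule.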
The yield obtained in this way is a time series which is a
function of the rainwater tank capacity C, the roof area A,
the depression storage δ, the roof area loss factor L and the
demand scenario number d. Thus:
Yt = Yt(C,A,δ,L,d).
(1)
It is also implicitly a function of the rainfall time series and
the PET time series which we consider to be the same for all
houses. It can be shown using induction that Yt is a
nonlinear continuous function of its continuous parameters
for all t = 1, …, P. Let N_years be the number of years over
which the simulation is taken. The average annual yield for
a house defined by parameters C, A, δ, L and d is:
$$Y = Y(C,A,\delta,L,d) = \frac{1}{N_{\text{years}}} \sum_{t=1}^{P} Y_t(C,A,\delta,L,d). \tag{2}$$
We will consider the problem of simulating a cluster of
houses with rainwater tanks, where the parameters defining
each house are chosen randomly according to probability
distributions and the demand scenario for a house is chosen
randomly from a number of possibilities associated with its
occupancy status. In this paper, we will show that the
expected value of the average annual yield is independent of
the cluster size, while the variance of the average annual
yield depends on the cluster size according to a hyperbolic
function. This theoretical result will be confirmed by
experimental computation.
In Section 2, two deterministic examples motivating the
development of the general formulation of the first result are
presented. In Section 3, the first result of the paper
concerning the expected value of the average annual yield is
proved. In Section 4, the second result, concerning the
variance of the average annual yield, is proved. In Section 5,
the experimental confirmation of these results is presented,
while Section 6 provides a conclusion to the paper.
2
Two motivating examples
Consider a cluster made up of N identical copies of a
house with parameters C, A, δ, L and d. The total yield of
the cluster is:
$$\sum_{t=1}^{P} \sum_{i=1}^{N} Y_t(C,A,\delta,L,d) = \sum_{t=1}^{P} N\,Y_t(C,A,\delta,L,d). \tag{3}$$
Therefore the average annual yield is:
$$Y = \frac{1}{N_{\text{years}} N} \sum_{t=1}^{P} N\,Y_t(C,A,\delta,L,d) = \frac{1}{N_{\text{years}}} \sum_{t=1}^{P} Y_t(C,A,\delta,L,d), \tag{4}$$
which is independent of the number of houses in the cluster.
Averaging is being taken in two ways. Firstly, the annual
yield is being averaged over all houses in the cluster.
Secondly, the average annual value is being computed by
summing over all time periods in the simulation and then
dividing by the number of years in the simulation.
Now, consider a slightly more complicated example.
Suppose that we have a cluster made up of N different
houses defined by parameters Ci, Ai, δi, Li, di; i = 1, …, N.
Now, scale up the cluster M times to form a cluster of MN
houses in which each house in the original cluster is
duplicated M times. The total yield of the cluster is:
$$\text{Total yield} = \sum_{t=1}^{P} \sum_{i=1}^{N} M\,Y_t(C_i,A_i,\delta_i,L_i,d_i). \tag{5}$$
Therefore the average annual yield (per house) is:
$$Y = \frac{1}{N_{\text{years}} N M} \sum_{t=1}^{P} \sum_{i=1}^{N} M\,Y_t(C_i,A_i,\delta_i,L_i,d_i) = \frac{1}{N_{\text{years}} N} \sum_{t=1}^{P} \sum_{i=1}^{N} Y_t(C_i,A_i,\delta_i,L_i,d_i), \tag{6}$$
which is the same as the average annual yield of the original
cluster.
Thus, scaling up a cluster has no effect on the average
annual yield. This is true no matter how small the cluster is.
3
The expected value of the yield as a
function of cluster size
We now want to consider the case in which a cluster is
generated by sampling the parameters from probability
distributions. The yield for a house during any time period is
a function of the parameters for the house. We want to work
out the average value of the average annual yield over a
number of runs in which the probability distributions are
sampled.
We will first consider the case of one variable. Let v be a
real non-negative random variable distributed according to a
probability distribution ρ : [0,∞) → [0,∞). Then:
$$\Pr(v \in [a,b]) = \int_{a}^{b} \rho(x)\,dx, \tag{7}$$
for 0 ≤ a < b. Let f : [0,∞) → R be a continuous function.
Define, as usual [5], the expectation value of f to be:
$$\langle f \rangle = \int_{0}^{\infty} f(x)\rho(x)\,dx. \tag{8}$$
A quick argument can be used to show that the mean of f(v)
over a large number of trials is given by the expectation
value of f. Suppose that we carry out n trials resulting in
values vi of v. Then:
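$$\begin{aligned}
\text{mean} &= \frac{1}{n} \sum_{i=1}^{n} f(v_i) \\
&= \frac{1}{n} \sum_{j=0}^{\infty} \sum \{ f(v_i) : v_i \in I_j \} \\
&\approx \frac{1}{n} \sum_{j=0}^{\infty} f(a_j)\, n \int_{a_j}^{b_j} \rho(x)\,dx \\
&= \sum_{j=0}^{\infty} f(a_j) \int_{a_j}^{b_j} \rho(x)\,dx \\
&\approx \sum_{j=0}^{\infty} \int_{a_j}^{b_j} f(x)\rho(x)\,dx \\
&= \int_{0}^{\infty} f(x)\rho(x)\,dx \\
&= \langle f \rangle.
\end{aligned}$$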
In the above computation, {Ij = [a_j,b_j]} is a fine partition
of the interval [0,∞). The approximations become exact in
the limit as the partition becomes sufficiently fine and the
number of trials becomes infinite.
If one has suitable closed form representations of the
functions f and ρ, then it may be possible to evaluate the
integral expression for <f> analytically. In other cases, it
may be necessary to evaluate the integral numerically which
may be as computationally expensive as evaluating the mean
value by simulation [6, 7].
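Both routes to <f> can be illustrated on a case where the exact answer is known. The sketch below assumes an exponential density ρ(x) = exp(-x) and f(x) = x², for which <f> = 2. These choices, the function names, the truncation of the integral at x = 50 and the trapezoidal rule are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def rho(x):
    """Example density on [0, inf): exponential with unit rate (an assumption)."""
    return math.exp(-x)


def f(x):
    """Example continuous function (an assumption)."""
    return x * x


def expectation_by_quadrature(upper=50.0, steps=200_000):
    """Trapezoidal approximation of the integral of f(x) * rho(x) over [0, upper]."""
    h = upper / steps
    total = 0.5 * (f(0.0) * rho(0.0) + f(upper) * rho(upper))
    for k in range(1, steps):
        x = k * h
        total += f(x) * rho(x)
    return total * h


def mean_by_simulation(n=200_000, seed=0):
    """Mean of f(v_i) over n trials, with each v_i sampled from rho."""
    rng = random.Random(seed)
    return sum(f(rng.expovariate(1.0)) for _ in range(n)) / n


quad = expectation_by_quadrature()
sim = mean_by_simulation()
print(quad, sim)  # both should be close to the exact value <f> = 2
```

In one dimension the quadrature is cheap, but its cost grows quickly with the number of variables, which is why simulation becomes competitive for functions of several parameters.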
A similar argument holds for computing the long run
mean value of a continuous function of more than one
random variable. Consider a housing cluster of size N
generated by random parameters Ci, Ai, δi, Li, di; i = 1, …,
N which are sampled from distributions ρ1 : [0,∞) → [0,∞),
ρ2 : [0,∞) → [0,∞), ρ3 : [0,∞) → [0,∞), ρ4 : [0,∞) → [0,∞)
and ρ5 : {1, …, N_ds} → [0,1] with:
$$\int_{0}^{\infty} \rho_j(x)\,dx = 1, \quad \forall\, j = 1, \ldots, 4, \tag{9}$$
and:
$$\sum_{k=1}^{N_{\text{ds}}} \rho_5(k) = 1. \tag{10}$$
The average annual yield for one run or trial is:
$$Y = Y(C_1,\ldots,C_N,A_1,\ldots,A_N,\delta_1,\ldots,\delta_N,L_1,\ldots,L_N,d_1,\ldots,d_N) = \frac{1}{N_{\text{years}} N} \sum_{t=1}^{P} \sum_{i=1}^{N} Y_t(C_i,A_i,\delta_i,L_i,d_i). \tag{11}$$
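The expected average annual yield is given by the following,
where the second equality uses Equations 9 and 10 to integrate
out the parameters of every house other than house i, and the
third holds because all houses are sampled from the same
distributions:
$$\begin{aligned}
\langle Y \rangle &= \sum_{d_1=1}^{N_{\text{ds}}} \cdots \sum_{d_N=1}^{N_{\text{ds}}} \rho_5(d_1) \cdots \rho_5(d_N) \int_{0}^{\infty} \cdots \int_{0}^{\infty} Y(C_1,\ldots,C_N,A_1,\ldots,A_N,\delta_1,\ldots,\delta_N,L_1,\ldots,L_N,d_1,\ldots,d_N) \\
&\qquad \times \rho_1(C_1) \cdots \rho_1(C_N)\,\rho_2(A_1) \cdots \rho_2(A_N)\,\rho_3(\delta_1) \cdots \rho_3(\delta_N)\,\rho_4(L_1) \cdots \rho_4(L_N) \\
&\qquad \times dC_1 \cdots dC_N\, dA_1 \cdots dA_N\, d\delta_1 \cdots d\delta_N\, dL_1 \cdots dL_N \\
&= \frac{1}{N_{\text{years}} N} \sum_{t=1}^{P} \sum_{i=1}^{N} \sum_{d_i=1}^{N_{\text{ds}}} \rho_5(d_i) \int_{0}^{\infty} \cdots \int_{0}^{\infty} Y_t(C_i,A_i,\delta_i,L_i,d_i)\,\rho_1(C_i)\rho_2(A_i)\rho_3(\delta_i)\rho_4(L_i)\, dC_i\,dA_i\,d\delta_i\,dL_i \\
&= \frac{1}{N_{\text{years}} N} \sum_{t=1}^{P} \sum_{i=1}^{N} \sum_{d=1}^{N_{\text{ds}}} \rho_5(d) \int_{0}^{\infty} \cdots \int_{0}^{\infty} Y_t(C,A,\delta,L,d)\,\rho_1(C)\rho_2(A)\rho_3(\delta)\rho_4(L)\, dC\,dA\,d\delta\,dL \\
&= \frac{1}{N_{\text{years}}} \sum_{t=1}^{P} \sum_{d=1}^{N_{\text{ds}}} \rho_5(d) \int_{0}^{\infty} \cdots \int_{0}^{\infty} Y_t(C,A,\delta,L,d)\,\rho_1(C)\rho_2(A)\rho_3(\delta)\rho_4(L)\, dC\,dA\,d\delta\,dL.
\end{aligned}$$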
This expression is independent of the number of houses in the
cluster. In expectation, the same average annual yield is
obtained by doing many runs with one house as by doing fewer
runs with many houses, as long as the house and tank
parameters are sampled from the same probability
distributions. For a fixed number of runs, however, the mean
over the runs of the average annual yield is only an estimate
of <Y>, and this estimate varies with the cluster size.
As the cluster size increases (with the same number of runs)
the average annual yield becomes less variable and approaches
the expected average annual yield more closely. Evidence for
this deduction is shown in Section 5.
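The independence of the expected value from the cluster size can be checked numerically on a toy problem. The sketch below is not the rainwater tank simulation: `toy_house_yield` is a hypothetical closed-form stand-in for the simulated average annual yield of one house, and the uniform parameter ranges, demand cap and seeds are assumptions chosen only to make the example run quickly.

```python
import random
import statistics


def toy_house_yield(rng):
    """Average annual yield (kL) of one randomly sampled house; a toy stand-in."""
    capacity = rng.uniform(1.0, 5.0)      # kL, assumed range
    roof_area = rng.uniform(50.0, 250.0)  # m^2, assumed range
    supply = 0.5 * roof_area              # kL/year reaching the tank, assumed
    return min(60.0, supply) * capacity / (capacity + 1.0)


def cluster_yield(n_houses, rng):
    """Average annual yield per house for one run of a cluster of n_houses."""
    return sum(toy_house_yield(rng) for _ in range(n_houses)) / n_houses


def mean_over_runs(n_houses, n_runs, seed):
    """Mean over n_runs independent runs of the cluster's average annual yield."""
    rng = random.Random(seed)
    return statistics.fmean([cluster_yield(n_houses, rng) for _ in range(n_runs)])


# Both cases sample the same total number of houses (runs * cluster size = 100000).
many_runs_one_house = mean_over_runs(n_houses=1, n_runs=100_000, seed=1)
few_runs_many_houses = mean_over_runs(n_houses=1_000, n_runs=100, seed=2)
print(many_runs_one_house, few_runs_many_houses)
```

The two estimates should agree to within sampling error, since each is an average over the same number of houses drawn from the same distributions.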
It is difficult to evaluate the integrals representing the
expected average annual yield <Y> because the yield
functions Yt are not given in closed form but can only be
obtained by simulation. Thus, in practice <Y> must be
obtained by simulation. The number of trials required for
the accurate simulation of <Y> can vary depending on the
nature of the probability distributions of tank and house
related variables and the nature of the yield functions Yt.
4
Variance of the yield as a function of
cluster size
While the expected (average annual) yield is independent
of the cluster size, the variance of the yield is dependent on
the cluster size. The standard deviation σ of the yield is the
square root of the variance σ², where σ² is the long run
average of the square of the difference between the yield
and the average yield. By the argument given above, this is
given by:
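$$\sigma^2 = \langle (Y - \langle Y \rangle)^2 \rangle. \tag{12}$$
Using Equation 11 we have:
$$\begin{aligned}
\langle Y^2 \rangle &= \sum_{d_1=1}^{N_{\text{ds}}} \cdots \sum_{d_N=1}^{N_{\text{ds}}} \rho_5(d_1) \cdots \rho_5(d_N) \int_{0}^{\infty} \cdots \int_{0}^{\infty} \Bigl( \frac{1}{N_{\text{years}} N} \sum_{t=1}^{P} \sum_{i=1}^{N} Y_t(C_i,A_i,\delta_i,L_i,d_i) \Bigr)^2 \\
&\qquad \times \rho_1(C_1) \cdots \rho_1(C_N)\,\rho_2(A_1) \cdots \rho_2(A_N)\,\rho_3(\delta_1) \cdots \rho_3(\delta_N)\,\rho_4(L_1) \cdots \rho_4(L_N)\, dC_1 \cdots dC_N\, dA_1 \cdots dA_N\, d\delta_1 \cdots d\delta_N\, dL_1 \cdots dL_N \\
&= \Bigl( \frac{1}{N_{\text{years}} N} \Bigr)^2 \sum_{d_1=1}^{N_{\text{ds}}} \cdots \sum_{d_N=1}^{N_{\text{ds}}} \rho_5(d_1) \cdots \rho_5(d_N) \int_{0}^{\infty} \cdots \int_{0}^{\infty} \sum_{t,s=1}^{P} \sum_{i,j=1}^{N} Y_t(C_i,A_i,\delta_i,L_i,d_i)\, Y_s(C_j,A_j,\delta_j,L_j,d_j) \\
&\qquad \times \rho_1(C_1) \cdots \rho_1(C_N)\,\rho_2(A_1) \cdots \rho_2(A_N)\,\rho_3(\delta_1) \cdots \rho_3(\delta_N)\,\rho_4(L_1) \cdots \rho_4(L_N)\, dC_1 \cdots dC_N\, dA_1 \cdots dA_N\, d\delta_1 \cdots d\delta_N\, dL_1 \cdots dL_N \\
&= \Bigl( \frac{1}{N_{\text{years}} N} \Bigr)^2 \sum_{t,s=1}^{P} \Bigl( \sum_{i=1}^{N} \sum_{d_i=1}^{N_{\text{ds}}} \rho_5(d_i) \int_{0}^{\infty} \cdots \int_{0}^{\infty} Y_t(C_i,A_i,\delta_i,L_i,d_i)\, Y_s(C_i,A_i,\delta_i,L_i,d_i)\,\rho_1(C_i)\rho_2(A_i)\rho_3(\delta_i)\rho_4(L_i)\, dC_i\,dA_i\,d\delta_i\,dL_i \\
&\qquad + \sum_{\substack{i,j=1 \\ i \neq j}}^{N} \sum_{d_i=1}^{N_{\text{ds}}} \sum_{d_j=1}^{N_{\text{ds}}} \rho_5(d_i)\rho_5(d_j) \int_{0}^{\infty} \cdots \int_{0}^{\infty} Y_t(C_i,A_i,\delta_i,L_i,d_i)\, Y_s(C_j,A_j,\delta_j,L_j,d_j) \\
&\qquad\qquad \times \rho_1(C_i)\rho_1(C_j)\,\rho_2(A_i)\rho_2(A_j)\,\rho_3(\delta_i)\rho_3(\delta_j)\,\rho_4(L_i)\rho_4(L_j)\, dC_i\,dC_j\,dA_i\,dA_j\,d\delta_i\,d\delta_j\,dL_i\,dL_j \Bigr) \\
&= \Bigl( \frac{1}{N_{\text{years}} N} \Bigr)^2 \sum_{t,s=1}^{P} \Bigl( N \sum_{d=1}^{N_{\text{ds}}} \rho_5(d) \int_{0}^{\infty} \cdots \int_{0}^{\infty} Y_t(C,A,\delta,L,d)\, Y_s(C,A,\delta,L,d)\,\rho_1(C)\rho_2(A)\rho_3(\delta)\rho_4(L)\, dC\,dA\,d\delta\,dL \\
&\qquad + (N^2 - N) \sum_{d=1}^{N_{\text{ds}}} \rho_5(d) \int_{0}^{\infty} \cdots \int_{0}^{\infty} Y_t(C,A,\delta,L,d)\,\rho_1(C)\rho_2(A)\rho_3(\delta)\rho_4(L)\, dC\,dA\,d\delta\,dL \\
&\qquad\qquad \times \sum_{d=1}^{N_{\text{ds}}} \rho_5(d) \int_{0}^{\infty} \cdots \int_{0}^{\infty} Y_s(C,A,\delta,L,d)\,\rho_1(C)\rho_2(A)\rho_3(\delta)\rho_4(L)\, dC\,dA\,d\delta\,dL \Bigr).
\end{aligned}$$
Now:
$$\begin{aligned}
\langle (Y - \langle Y \rangle)^2 \rangle &= \langle Y^2 + \langle Y \rangle^2 - 2Y\langle Y \rangle \rangle \\
&= \langle Y^2 \rangle + \langle Y \rangle^2 - 2\langle Y \rangle \langle Y \rangle \\
&= \langle Y^2 \rangle - \langle Y \rangle^2.
\end{aligned} \tag{13}$$
Thus the variance is given by:
$$\begin{aligned}
\sigma^2 &= \Bigl( \frac{1}{N_{\text{years}} N} \Bigr)^2 \sum_{t,s=1}^{P} \bigl( N\gamma_1(t,s) + (N^2 - N)\gamma_2(t)\gamma_2(s) \bigr) - \langle Y \rangle^2 \\
&= \Bigl( \frac{1}{N_{\text{years}}} \Bigr)^2 \frac{1}{N} \Bigl( \sum_{t,s=1}^{P} \gamma_1(t,s) - \Bigl( \sum_{t=1}^{P} \gamma_2(t) \Bigr)^2 \Bigr),
\end{aligned} \tag{14}$$
where:
$$\gamma_1(t,s) = \sum_{d=1}^{N_{\text{ds}}} \rho_5(d) \int_{0}^{\infty} \cdots \int_{0}^{\infty} Y_t(C,A,\delta,L,d)\, Y_s(C,A,\delta,L,d)\,\rho_1(C)\rho_2(A)\rho_3(\delta)\rho_4(L)\, dC\,dA\,d\delta\,dL, \tag{15}$$
and:
$$\gamma_2(t) = \sum_{d=1}^{N_{\text{ds}}} \rho_5(d) \int_{0}^{\infty} \cdots \int_{0}^{\infty} Y_t(C,A,\delta,L,d)\,\rho_1(C)\rho_2(A)\rho_3(\delta)\rho_4(L)\, dC\,dA\,d\delta\,dL. \tag{16}$$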
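If f : [0,∞)^4 × {1, …, N_ds} → R is a function, define <f>_1
by:
$$\langle f \rangle_1 = \sum_{d=1}^{N_{\text{ds}}} \rho_5(d) \int_{0}^{\infty} \cdots \int_{0}^{\infty} f(C,A,\delta,L,d)\,\rho_1(C)\rho_2(A)\rho_3(\delta)\rho_4(L)\, dC\,dA\,d\delta\,dL. \tag{17}$$
Then:
$$\gamma_1(t,s) = \langle Y_t Y_s \rangle_1 \tag{18}$$
and:
$$\gamma_2(t) = \langle Y_t \rangle_1. \tag{19}$$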
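The step between the two lines of Equation 14 is not spelled out. A sketch of the algebra follows, assuming only that <Y> equals the sum over t of γ2(t) divided by N_years (the final expression for <Y> in Section 3, rewritten with Equation 16) and that the double sum of γ2(t)γ2(s) over t and s equals the square of the single sum:

```latex
\begin{align*}
\sigma^2 &= \frac{1}{N_{\text{years}}^2 N^2} \sum_{t,s=1}^{P} \bigl( N\gamma_1(t,s) + (N^2 - N)\gamma_2(t)\gamma_2(s) \bigr)
          - \frac{1}{N_{\text{years}}^2} \Bigl( \sum_{t=1}^{P} \gamma_2(t) \Bigr)^2 \\
         &= \frac{1}{N_{\text{years}}^2} \Bigl[ \frac{1}{N} \sum_{t,s=1}^{P} \gamma_1(t,s)
          + \Bigl( 1 - \frac{1}{N} \Bigr) \Bigl( \sum_{t=1}^{P} \gamma_2(t) \Bigr)^2
          - \Bigl( \sum_{t=1}^{P} \gamma_2(t) \Bigr)^2 \Bigr] \\
         &= \frac{1}{N_{\text{years}}^2 N} \Bigl[ \sum_{t,s=1}^{P} \gamma_1(t,s)
          - \Bigl( \sum_{t=1}^{P} \gamma_2(t) \Bigr)^2 \Bigr].
\end{align*}
```

The bracketed term in the last line does not depend on N, so the cluster size enters only through the factor 1/N.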
The variance is a hyperbolic function of N (by Equation 14 it is
proportional to 1/N) and:
$$\lim_{N \to \infty} \sigma^2 = 0. \tag{20}$$
5
Experimental confirmation
The yield Yt is a nonlinear function of t and the system
parameters which can only be computed by simulation. The
result of computing the total yield over the period of
simulation, projected onto the parameters of tank size and
roof area, is shown in Figure 1.
Simulation was carried out where the continuous system
parameters were sampled from truncated normal
distributions as described in [8]. In order to confirm the
calculations of Section 3 and Section 4 in numerical detail it
would be necessary to numerically compute multiple
integrals such as those of Equations 15 and 16. However,
the essential correctness of the results can be seen by
examination of Figure 2 which shows the result of carrying
out 50 runs for a variety of values for the cluster size. The
curve labeled “Variable” shows the average over the 50 runs
of the average annual yield and it is seen that this is
approximately independent of the cluster size. The
approximation becomes more accurate as the cluster size
increases. This is because the number of houses sampled in
the simulation is RN where R is the number of runs and N is
the cluster size and as this number increases the average of
the average annual yields tends more closely to the expected
value, as described in Section 3.
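An experiment of this shape can be sketched on a toy problem to show the spread across runs narrowing with cluster size. The code below does not reproduce Figure 2: `toy_house_yield` is a hypothetical closed-form stand-in for the simulated yield of one house, uniform distributions are used in place of the truncated normal distributions of [8], and the cluster sizes and seed are arbitrary assumptions.

```python
import random
import statistics


def toy_house_yield(rng):
    """Average annual yield (kL) of one randomly sampled house; a toy stand-in."""
    capacity = rng.uniform(1.0, 5.0)      # kL, assumed range
    roof_area = rng.uniform(50.0, 250.0)  # m^2, assumed range
    supply = 0.5 * roof_area              # kL/year reaching the tank, assumed
    return min(60.0, supply) * capacity / (capacity + 1.0)


def run_statistics(n_houses, n_runs, rng):
    """Mean and standard deviation, over n_runs, of the cluster's average annual yield."""
    results = []
    for _ in range(n_runs):
        total = sum(toy_house_yield(rng) for _ in range(n_houses))
        results.append(total / n_houses)
    return statistics.fmean(results), statistics.stdev(results)


rng = random.Random(42)
# 50 runs per cluster size, matching the number of runs used for Figure 2.
summary = {n: run_statistics(n, 50, rng) for n in (1, 10, 100, 1000)}
for n, (mean, sd) in summary.items():
    print(f"N={n:5d}  mean={mean:7.3f}  mean-1SD={mean - sd:7.3f}  mean+1SD={mean + sd:7.3f}")
```

If the variance is proportional to 1/N, the standard deviation should fall by a factor of roughly the square root of 10 for each tenfold increase in cluster size, while the mean stays approximately constant.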
The curves labeled “Variable-1SD” and “Variable+1SD”,
which show one standard deviation of the average annual
yield over the 50 runs on either side of “Variable”, have a
shape consistent with the rational function form of the
variance given by Equation 14. The curve labeled
“Average” shows the result of computing the average annual
yield for a house with parameters equal to the expected
values of their respective probability distributions. Since
the yield function is nonlinear, this is not equal to the
expected value of the average annual yield (“Variable”)
[8, 9].
Figure 1: Annual yield as a function of tank size and roof area (source: Neumann et al. [9]).
Figure 2: Average value and standard deviation of average annual yield as a function of cluster size for Melbourne based
data (source: Maheepala et al. [4]). Each cluster of household rainwater tanks was run 50 times.
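The difference between the “Average” and “Variable” curves is a general consequence of nonlinearity and can be reproduced with a very small example. The sketch below assumes a hypothetical yield function that is capped by demand, `toy_yield`, and a uniform distribution of roof areas; neither is taken from the paper, and the numbers carry no physical meaning.

```python
import random
import statistics


def toy_yield(roof_area, demand=60.0):
    """Toy annual yield (kL): supply from the roof, capped by demand (an assumption)."""
    return min(demand, 0.5 * roof_area)


rng = random.Random(7)
areas = [rng.uniform(50.0, 250.0) for _ in range(100_000)]  # m^2, assumed range

# Analogue of "Average": one house whose parameter equals the mean of the distribution.
yield_at_mean_area = toy_yield(statistics.fmean(areas))
# Analogue of "Variable": the mean yield over houses with sampled parameters.
mean_of_yields = statistics.fmean([toy_yield(a) for a in areas])
print(yield_at_mean_area, mean_of_yields)
```

Because this toy yield function is concave in the roof area, the yield of the average house overestimates the average yield; for a general nonlinear yield function the two values differ, but the sign of the difference depends on the function.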
6
Conclusion
The determination of the evolution of the state of a
cluster of household rainwater tanks given the relevant
properties of the houses, the tanks and the climate is too
complex to solve analytically (i.e. mathematically).
Stochastic simulation seems to be an attractive method to
compute the expected value and variance of the average
annual yield of a cluster of household rainwater tanks. This
paper proves that the expected value of the average annual
yield does not vary with the cluster size and that the
variance of the average annual yield is a hyperbolic
function of the cluster size, and presents experimental
results of stochastic simulation which confirm these
theoretical derivations.
In addition, the theoretical derivation reported in this
paper provides a sound basis for the results reported in [4]
using stochastic simulation. The practical application of the
theoretical derivation is that it shows how the variability of
average annual yield of a cluster of household rainwater
tanks varies with the cluster size. Practitioners can define
an acceptable variance, which in turn defines the acceptable
cluster size for linearly scaling up the average annual yield
to the large number of household rainwater tanks, potentially
millions, spread across a city.
Acknowledgements
This research was funded by the Urban Water Security
Research Alliance (http://www.urbanwateralliance.org.au/)
which is a partnership between the Queensland State
Government, Griffith University, the University of
Queensland and CSIRO (Commonwealth Scientific and
Industrial Research Organization)’s Water for a Healthy
Country Research Flagship.
References
[1] Fewkes, A. “Modelling the performance of rainwater
collection systems: towards a generalised approach”; Urban
Water 1, 323-333, 1999.
[2] Fewkes, A. and Butler, D. “Simulating the performance
of rainwater collection and reuse systems using behavioural
models”; Build. Serv. Eng. Res. & Tech., 21, 99-106, 2000.
[3] Mitchell V.G., Siriwardene N., Duncan H. and Rahilly
M. “Impact of temporal and spatial lumping on rainwater
tank system modelling”; Conf. Proc. of Wat. Down Under
2008, 15-17 April 2008, Adelaide, South Australia, 2008.
[4] Maheepala, S., Loonat, N., Mirza, F. and Coultas, E.
“Quantifying potable water savings of rainwater tanks at a
city scale by considering the effect of spatial lumping”;
OZWater 2011, 09-11 May 2011, Adelaide, Australia,
2011.
[5] Kallenberg, O. “Foundations of Modern Probability”;
Springer, 1997.
[6] Givens, G. H. and Hoeting, J. A. “Computational
Statistics”; Wiley, 2005.
[7] Ripley, B. D. “Stochastic Simulation”; Wiley, 1987.
[8] Xu, H., Rahilly, M. and Maheepala, S. “Assessing the
impact of spatial lumping on rainwater tank performance
using daily modelling”; Submitted to the 9th International
Conference on Hydro-informatics, September 7-11, 2010,
Chinese Academy of Sciences, China, 2010.
[9] Neumann L., Coultas E., Moglia M., and Mashford J.
“Errors in yield and overflow estimation in rainwater tank
cluster modelling”; to appear in the Proc. of the 12th
International Conference on Urban Drainage, September 11
to 16, 2011, Porto Alegre, Brazil, 2011.