Intern. Conf. July 19-21, Yalta, Ukraine
Text of presentation:
ON METHODOLOGICAL PECULIARITIES OF APPLIED
OPTIMIZATION PROBLEMS
Vitaly A. Perepelitsa
Department of Economic Cybernetics,
Zaporozhye National University, Ukraine,
[email protected]
An applied optimization problem is a two-level problem. At the bottom level, we
have the structuring problem, i.e., the problem of modeling the input data. At
the upper level, traditionally, we consider the problem of finding an "optimal"
solution.
Let a concrete applied optimization problem be formulated for the system
being modeled. A question always arises: how adequate is this problem to the
real system under consideration? We preface the question of adequacy with the
following reasoning.
Mathematical modeling usually deals with evolving systems. The modeled
parameters of such a system change with time. Consequently, the input data of a
real applied optimization problem are predicted values of its parameters. One
usually obtains these predicted values by forecasting the time series of the
parameter under consideration. Let us examine concrete time series (TS) of
parameter values for a range of real applied optimization problems.
Figure 1 represents the time series of the "yield" parameter. This parameter is
related to a well-known optimization problem, namely the "assignment problem".
Figure 1: Time series of annual yield of grain crops in Zaporizhya region
(1930-2001). Vertical axis: centners; horizontal axis: years.
Figure 2 represents the time series of stock quotations of an electric power
company. This parameter is related to the "investor problem".
Figure 2: Time series of daily stock quotations of "Russian Joint-stock Company
United Power Grids" (RJSC UPG) (01.04.2002-03.31.2005). Vertical axis: RUR;
horizontal axis: dates.
Figure 3 represents the time series of the number of tourists arriving at a
mountain-ski base. This parameter is relevant to the problem of optimal
allocation of the resources of this base.
Figure 3: Time series of daily tourist flow into the mountain-ski settlement
Dombai (01.05.2003-01.11.2005). Vertical axis: number of tourists; horizontal
axis: days.
Figure 4 represents the time series of a mountain river flow. This parameter is
related to the problem of finding an optimal flood control strategy.
Figure 4: Time series of monthly flow of the Kuban river (01.1926-12.2003).
Vertical axis: volume of monthly flow; horizontal axis: months.
Figure 5: Time series of average monthly solar activity (January 1981-December
2005). Vertical axis: monthly number of sunspots; horizontal axis: months.
Figure 6: Time series of average annual solar activity (1700-2005). Vertical
axis: mean monthly number of sunspots during the year; horizontal axis: years.
Figures 5 and 6 represent time series of solar activity.
Figure 7: Time series of the average monthly number of children from 0 to 15
years old that got acute respiratory disease (ARD) (January 1993-December
2005). Vertical axis: number of diseased $z_k$; horizontal axis: months $k$.
Figure 8: Time series of daily wholesale trading volume (25.07.05-05.02.2006).
Vertical axis: daily volume of wholesale trading; horizontal axis: days.
Figure 7 represents the time series of the number of diseased children; figure 8
represents the time series of wholesale trade volumes. Figures 7 and 8 reflect
the parameters of the well-known "optimal inventory control problem".
Figure 9: Time series of monthly electricity consumption in one district
(January 2000-December 2005). Vertical axis: volume of monthly electricity
consumption; horizontal axis: months.
Figure 9 represents the time series of electric power consumption volumes in
one district. This parameter is relevant to the risk control problem in a power
network.
Thus, we have to solve the applied optimization problems mentioned above: the
assignment problem, the investor problem, the inventory control problem, etc.
There are algorithms that solve these problems when the parameters are precise
numbers. But, in reality, each of these parameters is the result of
forecasting. The visualization of the time series in figures 1-9 shows that a
reliable forecast for such a time series cannot be a precise number. A
predicted parameter value in the form of a precise number is inadequate. Let us
mention some reasons for this inadequacy.
First, these time series possess memory. Consequently, the independence
condition does not hold for their levels.
Secondly, the ranges of these time series are comparable to the mean values
of their levels.
Thus, the question of adequacy emerges at the bottom level of applied
optimization. As for the upper level, its problems are determined by the
results of the bottom level. If the forecast values are intervals, then we have
an optimization problem with interval data at the upper level. If the forecast
values are fuzzy sets, then we have an optimization problem with fuzzy data at
the upper level. Classical algorithms of mathematical programming are
inapplicable in both cases.
Let us get back to the bottom level of applied optimization. This is the level
where we obtain forecast values for the parameters of the problem under
consideration. The bottom level consists of two sublevels:
1) preforecasting analysis;
2) forecasting itself.
Forecasting of time series is soft computing.
L. Zadeh formulated the following definition:
Soft computing = fuzzy systems + neural network + genetic algorithm   (1)
Here, the "genetic algorithm" performs optimizing tuning of the membership
functions of the fuzzy sets. The "neural network" computes the base values of
the membership function.
However, this "Zadeh formula", applied to the time series in figures 1-9, did
not justify itself. We offer a cardinal correction of the "Zadeh formula" (1):
instead of the "neural network" we use a "cellular automaton", and instead of
the "genetic algorithm" we use "phase analysis". Then we get the following
formula of soft computing:
Soft computing = fuzzy systems + cellular automaton + phase analysis   (2)
Forecasting on the basis of formula (2) is performed after the preforecasting
analysis of the time series:
Preforecasting analysis of time series = fractal analysis of time series
+ phase analysis of time series + fuzzy system   (3)
The idea of fractal analysis is presented in the book [1]. The R/S-analysis
algorithm is the basic one for fractal analysis. H. Hurst proposed this
algorithm half a century ago [2]. Here R is the range of the given time series,
S is the standard deviation of the time series, and R/S is the normalized
range. The modified R/S-analysis algorithm [3] constructs two trajectories for
the given time series: the R/S-trajectory and the H-trajectory. Such
trajectories for three different time series are shown in figures 10, 11, and
12.
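The construction of the two trajectories can be sketched as follows. This is only a minimal illustration of classical R/S analysis, not the modified algorithm of [3], whose details are not given here. It assumes that R is the range of the cumulative deviations from the segment mean, that the trajectories are built over initial segments z_1..z_k, and that H is estimated from the relation R/S = (k/2)^H. The function names are hypothetical.

```python
import math

def rs_statistic(series):
    """Rescaled range R/S of one segment (classical Hurst definition)."""
    n = len(series)
    mean = sum(series) / n
    # Cumulative deviations from the segment mean.
    cumulative, total = [], 0.0
    for z in series:
        total += z - mean
        cumulative.append(total)
    r = max(cumulative) - min(cumulative)                     # range R
    s = math.sqrt(sum((z - mean) ** 2 for z in series) / n)   # std deviation S
    return r / s if s > 0 else float("nan")

def rs_and_h_trajectories(series, start=3):
    """Return [(k, log10(R/S), H)] for the initial segments z_1..z_k."""
    points = []
    for k in range(max(start, 3), len(series) + 1):
        rs = rs_statistic(series[:k])
        if not rs > 0:          # skip degenerate (constant) segments
            continue
        # Assumed relation R/S = (k/2)^H, solved for H.
        h = math.log(rs) / math.log(k / 2)
        points.append((k, math.log10(rs), h))
    return points
```

Plotting the second and third components against log10(k) gives curves of the kind shown in figures 10-12; a break in the trend of both curves would then be read as the memory depth.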
Figure 10: Typical R/S- and H-trajectories for the time series of stock
quotations of "RJSC UPG" (see figure 2). Horizontal axis: log(number of
observations); point 3 is marked on both trajectories.
Figure 11: Typical R/S- and H-trajectories for the time series of average
monthly solar activity (see figure 5). Horizontal axis: log(number of
observations); point 4 is marked on both trajectories.
Figure 12: Typical R/S- and H-trajectories for the time series of average
annual solar activity (see figure 6). Horizontal axis: log(number of
observations); point 11 is marked on both trajectories.
In figure 12, the H-trajectory demonstrates that the Hurst exponent H stably
keeps its value close to one. The R/S-trajectory shows a stable trend from
point one to point eleven. The trend changes at point eleven, so the depth of
memory about the start of the time series is eleven. As a whole, the
H-trajectory and the R/S-trajectory demonstrate good trend stability of the
initial segment of the time series considered in figure 12.
In figure 10, the H-trajectory and the R/S-trajectory show the absence of trend
stability.
In figure 11, the H-trajectory and the R/S-trajectory show an intermediate
situation.
We successively apply the R/S-analysis algorithm to different segments of the
given time series. Thus we obtain an estimate of the memory depth for the whole
time series. This estimate has the form of a fuzzy set
$M(Z) = \{(l, \mu(l))\}$,   (4)
where $\mu(l)$ is the value of the membership function for the depth value $l$;
$Z = \{z_i\}$, $i = 1, 2, \ldots, n$, is the given time series. For clarity, we
represent the fuzzy set (4) graphically.
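As a rough sketch of how a fuzzy set of the form (4) might be assembled, suppose each segment of the series has already produced one memory depth estimate. The membership rule below (frequency of a depth divided by the frequency of the most common depth) is an assumption made for illustration only; the rule actually used in [3] may differ. The function name is hypothetical.

```python
from collections import Counter

def memory_depth_fuzzy_set(depths):
    """Build {l: mu(l)} from per-segment memory depth estimates.

    Assumed rule: mu(l) = count(l) / count(most frequent depth).
    """
    counts = Counter(depths)
    peak = max(counts.values())
    return {l: counts[l] / peak for l in sorted(counts)}
```

For example, the depth estimates [3, 3, 3, 3, 4, 4, 5] give the fuzzy set {3: 1.0, 4: 0.5, 5: 0.25}.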
Figure 13: Memory depth fuzzy set of the time series of stock quotations of
"RJSC UPG" (see figure 2). Horizontal axis: depth $l$; vertical axis:
membership $\mu(l)$.
Figure 13 shows the fuzzy set (4) of the memory depth of the time series
presented in figure 2. The membership function $\mu(l)$ attains its maximal
value at the minimal possible depth l=3, i.e., there is almost no memory in
this time series. One can assert that this time series possesses very bad
preforecasting characteristics. It exhibits an alternation of positive and
negative increments.
Figure 14: Memory depth fuzzy set of the time series of the daily number of
people that got ARD (see figure 7). Horizontal axis: depth $l$; vertical axis:
membership $\mu(l)$.
Figure 15: Memory depth fuzzy set of the time series of average annual solar
activity (see figure 6). Horizontal axis: depth $l$; vertical axis: membership
$\mu(l)$.
Figure 15 shows good preforecasting characteristics; figure 14 shows average
preforecasting characteristics.
The parameters of the cyclic component of a time series constitute an important
preforecasting characteristic of the time series. Phase analysis is applied to
reveal this component and to estimate its parameters. For a time series
$Z = \{z_i\}$, $i = 1, \ldots, n$, the phase trajectory
$\Phi(Z) = \{(z_i, z_{i+1})\}$, $i = 1, \ldots, n-1$, is the basis of this
analysis [3]. Figure 16 gives a graphical representation of the phase
trajectory of the time series presented in figure 5.
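The phase trajectory defined above is simply the sequence of pairs of neighbouring levels. A minimal sketch (the function name is hypothetical):

```python
def phase_trajectory(series):
    """Return the points (z_i, z_{i+1}), i = 1..n-1, of the phase trajectory."""
    return list(zip(series, series[1:]))
```

For the levels [5, 7, 6, 9] this gives the points (5, 7), (7, 6), (6, 9); joining them in order produces a plot of the kind described for figure 16.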
Figure 16: Phase trajectory of the time series of average monthly solar
activity (see figure 5) (first level of the hierarchy of the cyclic
component). Horizontal axis: $z_i$; vertical axis: $z_{i+1}$.
Figure 17: (a) a typical quasi-cycle in the decomposition of the phase
trajectory of the time series of average monthly solar activity (axes $z_i$
and $z_{i+1}$); (b) distribution of the relative frequency of quasi-cycle
lengths L.
Figure 18 gives the phase trajectory of the time series shown in figure 6.
Phase trajectories are decomposed into quasi-cycles. Examples of typical
quasi-cycles are given in figures 17 and 19. Here, the main parameter is the
frequency distribution of the quasi-cycle lengths.
Figure 18: Phase trajectory of the time series of average annual solar activity
(see figure 6) (second level of the hierarchy of the cyclic component).
Horizontal axis: $z_i$; vertical axis: $z_{i+1}$.
Figure 19: (a) a typical quasi-cycle in the decomposition of the phase
trajectory of the time series of average annual solar activity (see figure 18;
axes $z_i$ and $z_{i+1}$); (b) distribution of the relative frequency h(l) of
quasi-cycle lengths l.
The results of phase analysis comprise the following data:
• presence or absence of a cyclic component in the time series;
• structure of the set of all quasi-cycles, i.e., the values of their lengths
and the frequencies of these lengths;
• trajectory of evolution of the quasi-cycle centers;
• trajectory of evolution of the sizes of the bounding rectangles of the
quasi-cycles [3].
We especially note that phase analysis of a time series can be carried out
together with a procedure of aggregation of the time series levels. Then we are
able to reveal the hierarchical structure of the cyclic component of the given
time series. Consider, for instance, the time series of solar activity from
1700 A.D. till 2005 A.D. (figure 6). The cyclic component of this time series
has a three-level hierarchy [3]. The first level, i.e., the bottom one of the
hierarchy, is constituted by the quasi-cycles of the phase trajectory of the
average monthly time series (figures 16 and 17). The average length of these
quasi-cycles is $l^{1}_{mean} \approx 5$ months. The second level, i.e., the
middle one of the hierarchy, is formed by the quasi-cycles of the phase
trajectory of the average annual time series (figures 18 and 19, where
$l^{2}_{mean} \approx 11$ years). The third level of the hierarchy is
constituted by centennial quasi-cycles, i.e., quasi-cycles with average length
$l^{3}_{mean} \approx 100$ years (the phase trajectory of the third level is
represented in figure 20).
Figure 20: Phase trajectory of the local maxima of the time series of average
annual solar activity (third level of the hierarchy of the cyclic component).
We note that the cyclic component of the child morbidity time series (figure 7)
has a three-level hierarchy. Revealing the hierarchy of the cyclic component of
a given time series is important for forecasting this time series. This
hierarchy allows the forecasting horizon to be extended many times. For
example, the solar activity time series may have a forecasting horizon of tens
of years.
Let us get back to the second sublevel of the bottom level of applied
optimization. According to (1)-(3), the apparatus of cellular automata is the
mathematical basis of this sublevel. We especially note the well-known result
of von Neumann: every Turing machine can be realized by a cellular automaton
(see the joint issue of the journal [4]). It implies that every algorithm can
be realized by a cellular automaton.
The cellular automaton is a basic model for forecasting time series with
memory. A cellular automaton handles not numbers but states. In other words,
the automaton's cells remember not numbers but linguistic values, for instance,
"low, medium, high". Before being entered into the memory of the cellular
automaton, a numeric time series is transformed into a linguistic time series.
One of the stages of such a transformation is depicted in figure 21.
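One possible reading of this transformation can be sketched as follows. The sketch assumes that a lower and an upper envelope value are already known for every time point and that the band between them is split into equal parts, one per linguistic term. Both the equal split and the function name are assumptions made for illustration, not the procedure of [3].

```python
def to_linguistic(series, lower, upper, terms=("low", "medium", "high")):
    """Map each numeric level to a linguistic term by its position
    between the lower and upper envelope values at the same time point."""
    result = []
    for z, lo, hi in zip(series, lower, upper):
        if hi <= lo:
            raise ValueError("upper envelope must lie above lower envelope")
        # Relative position inside the envelope band, clipped to [0, 1].
        pos = min(max((z - lo) / (hi - lo), 0.0), 1.0)
        index = min(int(pos * len(terms)), len(terms) - 1)
        result.append(terms[index])
    return result
```

With constant envelopes 10 and 40, the levels 12, 25, 38 are mapped to "low", "medium", "high" respectively.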
Figure 21: Transformation of a numerical time series into a linguistic time
series. UEL - upper envelope line, BEL - bottom envelope line; the band between
the envelope lines is labeled low, medium, high. Horizontal axis: years
(1952-2002).
A cellular-automaton forecast is a fuzzy set. Figure 22 shows a result of
forecasting the yield time series (see figure 1). Here we can see three forms
of representation of the obtained yield forecast:
• first, in the form of a linguistic fuzzy set;
• second, in the form of a numeric fuzzy set;
• third, in the form of a number, which is obtained by means of the
defuzzification procedure.
Figure 22: An illustrative example of forecasting results
Linguistic fuzzy set:
$U_{n+1} = \{(\text{low}; 0.03), (\text{medium}; 0.49), (\text{high}; 0.48)\}$
Numerical fuzzy set:
$Y_{n+1} = \{(18.9; 0.03), (26.8; 0.49), (32.0; 0.48)\}$
Forecast in the form of a number:
$y_{n+1} = \sum_{t=1}^{3} \mu_t \cdot y'_t = 0.03 \cdot 18.9 + 0.49 \cdot 26.8 + 0.48 \cdot 32.0 = 28.6$ centners per hectare
Forecasting error ~ 9.8 %
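The defuzzification step in figure 22 is a membership-weighted sum of the numeric values. A minimal sketch, assuming the memberships already sum to one as they do in the figure (the function name is hypothetical):

```python
def defuzzify(fuzzy_set):
    """Weighted sum of the values by their memberships: sum(mu_t * y_t)."""
    return sum(mu * y for y, mu in fuzzy_set)

# Numerical fuzzy set from figure 22: (value, membership) pairs.
forecast = [(18.9, 0.03), (26.8, 0.49), (32.0, 0.48)]
print(round(defuzzify(forecast), 2))
```

With the rounded values printed in figure 22, this weighted sum evaluates to about 29.06 rather than the 28.6 shown in the figure, so the printed memberships or yields are presumably rounded.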
Figure 23: Instead of
cellular-automaton forecasting = fuzzy systems + cellular automaton
we use hybrid time series forecasting:
triple hybrid = fuzzy systems + cellular automaton + phase analysis
We should pay special attention to hybrid time series forecasting methods, for
example, the triple hybrid "cellular automaton + phase analysis + fuzzy system"
(figure 23). Figures 24 and 25 demonstrate the idea of this hybrid (see figure
24, where the two-month quasi-cycles on intervals B and C show the property of
similarity). The quasi-cycles of the phase trajectory "optimize" the values of
the membership function of a cellular-automaton forecast such as the one
presented in figure 22. The error of hybrid forecasting is 20-30 per cent lower
than the error of the cellular-automaton forecast.
Figure 24: Time series of the weekly number of people that got ARD. Vertical
axis: number of diseased; horizontal axis: weeks. B - base interval of the time
series; C - forecasting interval of the time series.
Figure 25: Two-month quasi-cycle of the phase trajectory of the weekly time
series (see the base interval B in figure 24, January 6, 2002 - March 3,
2002). Axes: $y_j$ and $y_{j+1}$, each divided into low, medium, and high
bands; the points of the quasi-cycle are numbered 469 (1) to 477 (9).
In several cases, it is possible to replace the fuzzy set with an interval.
Then, at the upper level, we consider an optimization problem with interval
data.
Suppose that, at the bottom level, we have obtained forecast values of the
parameters for the optimization problem at the upper level. Let us formulate
important statements relevant to the upper level of an applied optimization
problem. As an illustration, we consider the well-known perfect matching
problem on a graph $G = (V, E)$, $|V| = n$. The set of all feasible solutions
$X = \{x\}$ is the set of subgraphs $x = (V_1, V_2, E_x)$, $E_x \subseteq E$,
such that the edge set $E_x$ of every subgraph is a perfect matching in $G$,
$|E_x| = n$ $\forall x \in X$.
A weight $\omega(e)$ is assigned to every edge $e \in E$; this weight is either
a fuzzy number or an interval $\omega(e) = [\omega_1(e), \omega_2(e)]$. The
objective function
$F(x) = \sum_{e \in E_x} \omega(e) \to \mathrm{extr}$, $\mathrm{extr} \in \{\min, \max\}$   (5)
correspondingly takes fuzzy-number or interval values. In this case, pairs of
incomparable solutions $x', x'' \in X$ appear.
Then, in the general case, the problem with objective function (5) does not
have an optimum, but it has a Pareto set (PS) $\tilde{X} \subseteq X$. Then the
algorithmic problem arises of finding the Pareto set $\tilde{X}$ or a complete
set of alternatives (CSA) $X^0 \subseteq \tilde{X}$. A subset
$X^0 \subseteq \tilde{X}$ is called a CSA if its cardinality $|X^0|$ is minimal
and simultaneously the equality $F(X^0) = F(\tilde{X})$ holds. Let us formulate
a number of statements for optimization problems on graphs with fuzzy or
interval weights.
The following statements have been proved:
1. In the classical formulation, the following problems are polynomially
solvable: perfect matching problems, the spanning tree problem, the shortest
path problem, and others. These problems become intractable in the case of
fuzzy or interval weights of graph edges [4].
2. The class of such intractable problems contains polynomially solvable
subclasses. For instance, the subclass defined by graphs whose edges are
weighted by intervals of equal length, $\omega_2(e) - \omega_1(e) = \mathrm{const}$.
3. Every problem with objective function (5) on graphs $G = (V, E)$ with
interval weights $\omega(e) = [\omega_1(e), \omega_2(e)]$ is equivalent to a
two-criterion problem with vector objective function
$F(x) = (F_1(x), F_2(x))$, where
$F_v(x) = \sum_{e \in E_x} \omega_v(e) \to \mathrm{extr}$, $v = 1, 2$ [4, 5].
4. The theory of algorithms with estimates is constructive for discrete
optimization problems with fuzzy or interval data. In particular, statistically
effective algorithms [6] and polynomial asymptotically exact algorithms [7]
exist for these problems, as do sufficient conditions of polynomial
solvability, and so on.
5. All known operations of "fuzzy number summation" are based on operations of
set theory. In other words, fuzzy systems theory has no arithmetic that
satisfies the informal interpretation of applied optimization problems.
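Statements 2 and 3 can be illustrated on a very small instance. The sketch below assumes a complete bipartite graph with n vertices on each side, minimization, and componentwise comparison of the sums of lower and upper interval bounds, i.e., the two-criterion form of statement 3. It enumerates all matchings by brute force, so it is meant only as an illustration and not as one of the algorithms cited above; the function name is hypothetical.

```python
from itertools import permutations

def pareto_matchings(weights):
    """weights[i][j] = (w1, w2): interval weight of edge (i, j).

    Returns the Pareto-optimal perfect matchings with their (F1, F2) values,
    where F1 sums the lower bounds and F2 sums the upper bounds.
    """
    n = len(weights)
    solutions = []
    for perm in permutations(range(n)):   # each permutation is a perfect matching
        f1 = sum(weights[i][perm[i]][0] for i in range(n))
        f2 = sum(weights[i][perm[i]][1] for i in range(n))
        solutions.append((perm, (f1, f2)))

    def dominates(a, b):
        # a is no worse in both criteria and differs from b (minimization).
        return a[0] <= b[0] and a[1] <= b[1] and a != b

    return [(p, f) for p, f in solutions
            if not any(dominates(g, f) for _, g in solutions)]

# Intervals of different length: two incomparable matchings remain.
mixed = [[(1, 5), (2, 3)], [(2, 3), (1, 5)]]
# Intervals of equal length: a single optimum remains.
equal = [[(1, 2), (3, 4)], [(3, 4), (1, 2)]]
```

For `mixed` the two matchings have objective values (2, 10) and (4, 6), neither of which dominates the other; for `equal` the values are (2, 4) and (6, 8), and only the first survives.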
References
1. Peters E.E. Chaos and Order in the Capital Markets. A New View of Cycles,
Prices and Market Volatility. John Wiley & Sons, Inc., New York, Toronto,
1996.
2. Hurst H.E. Long-term Storage Capacity of Reservoirs. Transactions of the
American Society of Civil Engineers, 116, 1951.
3. Perepelitsa V.A., Tebueva F.B., Temirova L.G. Data Structuring by Methods of
Nonlinear Dynamics under Process of Two-level Modelling. Stavropol Book
Publisher, Stavropol, 2006 (in Russian).
4. Perepelitsa V.A., Kozina G.L. Interval Discrete Models and Multiobjectivity.
Complexity Estimates. Interval Computations, No. 3 (1993), pp. 51-59.
Institute for New Technologies, St. Petersburg-Moscow.
5. Emelichev V.A., Perepelitsa V.A. On Cardinality of the Set of Alternatives
in Discrete Many-criterion Problems. Discrete Mathematics and Applications,
Vol. 2, No. 5 (1992), pp. 461-471. VSP, Utrecht, the Netherlands; Tokyo,
Japan.
6. Perepelitsa V.A. On Two Problems from the Theory of Graphs. American
Mathematical Society, "Soviet Math. Dokl.", Vol. 11 (1970), No. 5, 1971,
pp. 1376-1379.
7. Perepelitsa V.A. An Asymptotic Approach to the Solution of Certain Extremal
Problems on Graphs. Problems of Cybernetics, No. 26 (1973), pp. 291-314.
Nauka, Moscow (in Russian).