k-Nearest Neighbor Neural Network Models for Very Short

energies
Article
k-Nearest Neighbor Neural Network Models for Very
Short-Term Global Solar Irradiance Forecasting Based
on Meteorological Data
Chao-Rong Chen and Unit Three Kartini *
Department of Electrical Engineering, National Taipei University of Technology, 1, Section 3,
Zhong-Xiao (Chung-Hsiao) E. Rd., Da’an Dist., Taipei 106, Taiwan; [email protected]
* Correspondence: [email protected] or [email protected]; Tel.: +886-2-27712171 (ext. 2112)
Academic Editor: Silvio Simani
Received: 24 November 2016; Accepted: 1 February 2017; Published: 8 February 2017
Abstract: This paper proposes a novel methodology for very short term forecasting of hourly global
solar irradiance (GSI). The proposed methodology is based on meteorology data, especially for
optimizing the operation of power generating electricity from photovoltaic (PV) energy. This
methodology is a combination of k-nearest neighbor (k-NN) algorithm modelling and artificial
neural network (ANN) model. The k-NN-ANN method is designed to forecast GSI for 60 min ahead
based on meteorology data for the target PV station which position is surrounded by eight other
adjacent PV stations. The novelty of this method is taking into account the meteorology data. A set of
GSI measurement samples was available from the PV station in Taiwan which is used as test data.
The first method implements k-NN as a preprocessing technique prior to ANN method. The error
statistical indicators of k-NN-ANN model the mean absolute bias error (MABE) is 42 W/m2 and the
root-mean-square error (RMSE) is 242 W/m2 . The models forecasts are then compared to measured
data and simulation results indicate that the k-NN-ANN-based model presented in this research can
calculate hourly GSI with satisfactory accuracy.
Keywords: global solar irradiance (GSI); photovoltaic (PV); very short term; forecasting; k-nearest
neighbor (k-NN); artificial neural network (ANN)
1. Introduction
Nowadays, forecasting global solar irradiance (GSI) is an essential task, particularly related to the
increased use of photovoltaic (PV) solar energy as a power source. Forecasting GSI can be executed
in different terms: long-term, medium-term, and short-term. Since solar power is categorized as an
intermittent energy source, forecasting is paramount to regulate electricity loads in power networks.
It also functions to optimize power delivery and unit commitment and by extension, it helps minimize
the operating costs of power systems [1]. With a forecast, it is expected that plant operation control
systems can be improved, so as to balance power generation and load. Moreover, distribution of load,
electric energy storage, and energy supply will be maximized and more reliable.
The performance of PV systems is heavily influenced by meteorological conditions such as
temperature, global irradiation, humidity, wind speed and wind direction [2]. The relation is clear:
electrical energy generated by the PV solar depends on the amount of the GSI received by the PV
panels. Solar irradiance absorbed in each PV panel varies depending on geographic location, time, and
the absorption capacity of the PV panels. Previous studies have presented a variety of mathematical
models for GSI forecasting in relation to meteorological variables. GSI forecasting with k-nearest
neighbor (k-NN) statistical methods has been described [3,4]. Hocaoğlu [3] and Pedro and Coimbra [4]
presented modeling of solar irradiation with stochastic methods using a k-NN artificial neural network
Energies 2017, 10, 186; doi:10.3390/en10020186
www.mdpi.com/journal/energies
Energies 2017, 10, 186
2 of 18
(ANN) at a PV station. Solar irradiation prediction is an important problem in geosciences, with direct
applications in renewable energy, where the data in the form of time series can also be analyzed using
a regression model as described in reference [5]. Salcedo-Sanz et al. [5] worked on the prediction of
daily global irradiation using a temporal Gaussian process, in which the study explains the suitability
of Gaussian regression (GPR) for the estimation of solar irradiation compared to other machine
learning regression algorithms. The GSI forecasting is not only used in stochastic modeling, but in
other studies [6,7] attempted forecasting was analyzed using exponential smoothing combined with
decomposition methods and least absolute shrinkage and selection operator model. Yang et al. [6,7]
studied the forecasting of global horizontal irradiance by exponential smoothing using decompositions,
while on another study, they developed the least absolute shrinkage and selection operator model
using irradiance very short-term forecasting. Combining a forecasting model with GSI is important
to get a better result, and forecasting GSI by the spatiotemporal pattern recognition method, ANN
method, parametric models and decomposition models, has been described in [8–10]. Spatiotemporal
pattern recognition and nonlinear principal component analysis (PCA) for global horizontal irradiance
forecasting has been proposed as well by Licciardi et al. [8]. Amrouche and Le Pivert [9] have presented
an ANN based on daily local forecasting for global solar radiation, describing a novel methodology
for local forecasting of daily global horizontal irradiance (GHI). The methodology is a combination of
spatial modelling and ANNs algorithm. Wong et al. [10], have presented solar radiation models based
on parametric models and decomposition models for predicting the average daily and hourly global
radiation, beam radiation and diffuse radiation.
Obtaining GSI forecasting for a PV station is also the subject of several studies. In the previous
references the GSI forecasting was carried out without being influenced by the location of the PV
station, but in [11,12] the GSI forecasting was very influenced by the Mediterranean location studied.
To achieve multi-horizon irradiation forecasting for Mediterranean locations, time series models have
been proposed by Paolia et al. [11]. Lorenz et al. [12] presented irradiance forecasting for the power
prediction of grid-connected PV systems. In addition to GSI forecasting using grid-connected systems
there are also other studies that similarly use the grid-connected method but different locations, as
explained in [13,14]. Wang et al. [13] studied a short term solar irradiance forecasting model based on
an ANN using statistical feature parameters. Another study on GSI forecasting using an ANN was
performed by Mellit and Pavan [14], which they applied the prediction to a grid-connected PV plant
located at Trieste, Italy. The application of a statistical method to detect the motion of cloud structures
for surface irradiance is widely used for forecasting GSI as in the previous reference, and therefore
in [15] optimization and operational validation of the GSI forecasting, i.e., short term forecasting of
solar radiation using statistical methods to determine cloud motion vector fields have been proposed
by Hammer et al. Xiao and Chaovalitwongse [16] have presented optimization models for decomposed
nearest neighbor feature selection. Validation of short and medium term operational solar radiation
forecasts in the US was studied by Perez et al. [17]. Many of the development models done by other
researchers for forecasting GSI, tried GSI forecasting with statistical methods of stochastic learning
and the development of analytical models with time series as in [18,19], i.e., forecasting of global
and direct solar irradiance using stochastic learning methods, ground experiments and the national
weather service’s (NWS) database have been proposed by Marquez et al. [18]. Martin et al. [19]
have presented GSI forecasting methods based on time series analysis that have been used to predict
half daily values of solar irradiance for the next three days. GSI forecasting models developed by
performing functional and fuzzy approach spatiotemporal model development have been used too, as
described in [20,21]. Boata and Gravila [20] have presented a functional fuzzy approach for forecasting
daily global solar irradiation and very short term forecasting of the global horizontal irradiance using
a spatiotemporal autoregressive model has been proposed by Dambreville et al. [21]. Of the various
modeling approaches for forecasting GSI in the previous references, namely developing and combining
forecasting models for short-term forecasts, [22–24] also describe several methods for forecasting GSI,
namely using support vector machines and space exponential smoothing models. A review of solar
Energies 2017, 10, 186
3 of 18
irradiance forecasting methods and a proposition for small-scale insular grids has been provided
by Diagne et al. [22]. Short term solar power prediction using a support vector machine has been
proposed Zeng et al. [23]. Dong et al. [24] have presented a short term solar irradiance forecasting
method using an exponential smoothing state space model. A comparative empirical study based on a
short term wind speed forecating model has been presented by Ren et al. [25]. Mellit et al. [26] and
Farhad et al. [27] have presented ANN models for the prediction of solar radiation and a new bloggers
Energies 2017, 10, 186
17
classification
approach with a hybrid k-NN and ANN model. Probabilistic solar power3 offorecasting
approaches
k-NN kernel
and method
selection
of input
parameters
to model
direct
irradiance
by
termbased
solar on
irradiance
forecasting
using
an exponential
smoothing
state
spacesolar
model.
A
using ANN
have been
proposed
by Zhang
andterm
Wang
[28]
andforecating
López etmodel
al. [29].
Thepresented
aforementioned
comparative
empirical
study based
on a short
wind
speed
has been
by Ren et al.[25].
et al. [26] and Farhad
al. [27] PV
havestation,
presentedbut
ANN
models
prediction
studies elaborated
on Mellit
GSI forecasting
at oneettarget
have
yetfor
tothe
forecast
GSI at PV
of
solar
radiation
and
a
new
bloggers
classification
approach
with
a
hybrid
k-NN
and
ANN
model.
stations surrounded by other PV stations.
Probabilistic solar power forecasting approaches based on k-NN kernel and selection of input
This paper presents part of a study in process seeking to estimate and predict one hour or 60 min
parameters to model direct solar irradiance by using ANN have been proposed by Zhang and Wang
ahead global
solar
irradiation
PVaforementioned
stations for energy
production.
This
part focuses
howPVto predict
[28] and
López
et al. [29].at
The
studies elaborated
on GSI
forecasting
at oneon
target
hourly GSI
for the
station GSI
with
availability
of a local
database
and based on meteorology
station,
but target
have yetPV
to forecast
at the
PV stations
surrounded
by other
PV stations.
This paper
presents
partmethodology
of a study in process
seeking to estimate
and predict and
one hour
or modelling
60
data. In this study,
a new
hybrid
that combines
k-NN modelling
ANN
min ahead global solar irradiation at PV stations for energy production. This part focuses on how to
algorithm has been developed. A k-NN-ANN method is used to forecast GSI at the target PV station by
predict hourly GSI for the target PV station with the availability of a local database and based on
means ofmeteorology
calculatingdata.
k-NNs
based on the Euclidean distance and then do the testing and training data.
In this study, a new hybrid methodology that combines k-NN modelling and ANN
Themodelling
remainder
of
the
paper
organized
as follows:method
Section
2 describes
theatmodelling
algorithm has beenisdeveloped.
A k-NN-ANN
is used
to forecast GSI
the target PVand data
from a PVstation
station,
Section
describesk-NNs
the methodology
used, i.e.,
the k-NN
anddo
neural
network
by means
of 3calculating
based on the Euclidean
distance
and then
the testing
and models,
training4 data.
while Section
presents modelling study cases for very short-term forecasting and their measured
The remainder of the paper is organized as follows: Section 2 describes the modelling and data
errors. Finally,
Section 5 presents some concluding remarks.
from a PV station, Section 3 describes the methodology used, i.e., the k-NN and neural network
models, while Section 4 presents modelling study cases for very short-term forecasting and their
2. Modelling and Data Description
measured errors. Finally, Section 5 presents some concluding remarks.
The GSI measurements were performed continuously every 5 min for four hours. The dataset
2. Modelling and Data Description
thus contains four hours of data (from 5:20 a.m. to 8:00 a.m.) collected on 8 June 2012 at nine PV station
The GSI
performed
5 min
for ◦four
hours.
TheStation
dataset D (140◦ ,
locations: Station
Ameasurements
(0◦ , 8.3 km),were
Station
B (36◦continuously
, 10.5 km), every
Station
C (93
, 10.8
km),
thus contains four◦ hours of data (from 5:20 a.m.
to 8:00 a.m.) collected on 8 ◦June 2012 at nine PV
◦
10.5 km), Station E (180 , 10 km), Station F (250 , 4.3 km), Station G (280 , 5.4 km), Station H (310◦ ,
station locations: Station A (0°, 8.3 km), Station B (36°, 10.5 km), Station C (93°, 10.8 km), Station D
◦ , 0 km), located in Taipei, Taiwan. For this study, the very short-term forecasts
9.1 km) and
Station
S (0Station
(140°,
10.5 km),
E (180°, 10 km), Station F (250°, 4.3 km), Station G (280°, 5.4 km), Station H
of GSI were
only
provided
by PV
Station
which
located
inFor
thethis
center.
nearest
neighbouring
(310°,
9.1 km)
and Station
S (0°,
0 km), S
located
in was
Taipei,
Taiwan.
study,The
the very
short-term
of GSI were
onlySprovided
by PV Station
S which
was located
in the center.
The nearest
locationsforecasts
surrounding
Station
were Station
A, Station
B, Station
C, Station
D, Station
E, Station F,
neighbouring
locations
surrounding
Station
S
were
Station
A,
Station
B,
Station
C,
Station
D,
Station
Station G and Station H (Figure 1).
E, Station F, Station G and Station H (Figure 1).
Figure 1. Modelling of Station S which is surrounded by eight other photovoltaic (PV) stations. ST: Station.
Figure 1. Modelling of Station S which is surrounded by eight other photovoltaic (PV) stations.
ST: Station.
Figure 2 illustrates the GSI values as measured at Station S. These monitored data have been
used to evaluate the methodology algorithm using k-NN-ANN as the proposed model for GSI
forecasting.
Energies 2017, 10, 186
4 of 18
Figure 2 illustrates the GSI values as measured at Station S. These monitored data have been used
to evaluate
the methodology algorithm using k-NN-ANN as the proposed model for GSI 4forecasting.
Energies 2017, 10, 186
of 17
700
GSI actual
600
GSI (W/m2)
500
400
300
200
100
0
00.00
05.35
06.00
06.25
06.50
07.10
07.40
Time
Figure 2. Measured global solar irradiance (GSI) at Station S from 5:20 a.m. to 8:00 a.m. on 8 June 2012.
Figure 2. Measured global solar irradiance (GSI) at Station S from 5:20 a.m. to 8:00 a.m. on 8 June 2012.
3. k-Nearest Neighbor and Artificial Neural Network Model for Forecasting Global Solar
3. k-Nearest
Neighbor
and
Artificial Neural
Irradiance
Based on
Meteorological
Data Network Model for Forecasting Global Solar
Irradiance Based on Meteorological Data
This section explains the basic idea of the construct methodology for GSI prediction, namely the
k-NN-ANN
model. Inthe
thisbasic
study,idea
the subject
the centralmethodology
station, which for
it is GSI
surrounded
by several
This section explains
of the isconstruct
prediction,
namely the
other PV stations. The first purpose of this study is the improvement of forecasting results using the
k-NN-ANN model. In this study, the subject is the central station, which it is surrounded by several
k-NN method combined with an ANN model method, and the process is then used to predict GSI output
other PV stations. The first purpose of this study is the improvement of forecasting results using the
result of a PV station one hour or 60 min ahead based on meteorological data.
k-NN method
combined
ANN
model
and the
is then
used
to predict
GSI
Simulation
of thewith
k-NNan
neural
networks
canmethod,
be programmed
in aprocess
few minutes
after the
recording
of
output result
of
a
PV
station
one
hour
or
60
min
ahead
based
on
meteorological
data.
the first measurements. For the GSI forecasting, k-NN-ANN method employs past meteorological data
(GSI, temperature,
humidity,
wind
speed and
direction).
The forecastinhorizon
four hours
5 min
Simulation
of the k-NN
neural
networks
can
be programmed
a few is
minutes
afterinthe
recording
increments.
The first step
developing
a k-NN-ANN
method is method
to developemploys
the database
features
of the first
measurements.
Forinthe
GSI forecasting,
k-NN-ANN
pastofmeteorological
that will be used for comparison with the current conditions and the forecast GSI a few ahead.
data (GSI, temperature, humidity, wind speed and direction). The forecast horizon is four hours in
5 min increments.
The first step in developing a k-NN-ANN method is to develop the database of
3.1. k-Nearest-Neighbors
features that will be used for comparison with the current conditions and the forecast GSI a few ahead.
The k-NN method is one of the simplest machine learning algorithm methods. The k-NN
algorithm is a non-parametric method used for classification and regression. The output depends on
3.1. k-Nearest-Neighbors
whether k-NN is used for classification or regression: in used k-NN model classification is the value
a class membership.
object validation
is classified
by a majority
vote ofThe
its neighbors,
Theoutput
k-NNismethod
is one of theAn
simplest
machine learning
algorithm
methods.
k-NN algorithm
with
the
object
being
assigned
to
the
class
most
common
among
its
k-NNs.
If
k
=
1,
then
the object
is a non-parametric method used for classification and regression. The output depends
on is
whether
simply assigned to the class of that single nearest neighbor. In a k-NN regression model, the value
k-NN is used for classification or regression: in used k-NN model classification is the value output
output is the property value for the object. This value result output training is the average of the
is a classvalues
membership.
An
object
is classified
byclassification
a majorityofvote
of its
neighbors,
with the
of its k-NNs.
The
k-NN validation
model is applied
to perform
objects
based
on learning
object being
assigned
to
the
class
most
common
among
its
k-NNs.
If
k
=
1,
then
the
object
is
data that were located closest to the object, and the method is considered the simplest among other simply
[2]. The
mainsingle
idea isnearest
that the neighbor.
k-NN algorithm
uses a regression
training set for
data modelling.
Then, is the
assignedmethods
to the class
of that
In a k-NN
model,
the value output
prediction
new points
be the
average
of the
values of
Theof
variables
employed
propertythe
value
for theofobject.
Thiscan
value
result
output
training
is its
thek-NNs.
average
the values
of its k-NNs.
for modelling very short term forecasting have been described in the previous section. The k-NN
The k-NN model is applied to perform classification of objects based on learning data that were located
predict is computed using the features assembled in the matrices in a two-step process. In the first
closest to
the object, and the method is considered the simplest among other methods [2]. The main
step, we have been calculating the pre-defined distance between the variables in the new dataset (the
idea is that
the k-NNoralgorithm
a training
modelling.
thedataset.
prediction
optimalization
the testing uses
sets and
training set
sets)for
anddata
the features
in theThen,
previous
For a of new
1
n
points can
be set
theofaverage
of the
its k-NNs.
Thewith
variables
for distances
modelling
given
featuresof
S =the
{p values
, ..., p } in
new dataset
lengthsemployed
N1, ..., Nn, the
of very
the short
previous data
arebeen
calculated.
In theinsecond
step, choosing
k-NNs
havepredict
k smallest
from
term forecasting
have
described
the previous
section.
Theand
k-NN
is distances
computed
using the
test in[26].
distance
is sorted
in ascending
order,
and
first
k elements
featurestraining
assembled
the The
matrices
in a D
two-step
process.
In the first
step,
wethe
have
been
calculating the
(
)
their associated
time stamps
[2]
D D
≤ D between
≤ ... ≤ D the and
{τ1 ,..., τ k } are extracted
pre-defined
variables
in the newk dataset
(the optimalization
or the testing
sets
S distance
S ,1
S ,2
S ,k
1
n } in
and training
sets)
and
the
features
in
the
previous
dataset.
For
a
given
set
of
features
S
=
{p
,
...,
p
Equation (1). To find the k-NN based on the Euclidean distance, this mathematical equation is used:
the new dataset with lengths N1 , ..., Nn , the distances
of the previous data are calculated. In the second
N
2
d ( x, y ) = distances
w2j ( x j − yfrom
1, 2, 3,.... test [26]. The distance D
(1)is sorted

j ) , j =training
step, choosing k-NNs and have k smallest
j =1
in ascending order, and the first k elements DS ( DS,1 ≤ DS,2 ≤ ... ≤ DS,k ) and their associated k time
Energies 2017, 10, 186
5 of 18
stamps {τ1 , ..., τk } are extracted [2] Equation (1). To find the k-NN based on the Euclidean distance,
this mathematical equation is used:
v
uN
u
2
d( x, y) = t ∑ w2j x j − y j ,j = 1, 2, 3, . . . .
(1)
j =1
where d is the number of forecast instances in the optimization set. We can calculate the distance
between two scenarios using some distance function d(x, y), where x, y are the matrix scenarios
composed of N features x = { x1 , ..., x N }, y = {y1 , ..., y N }, N is the length of data, and the distance
between the current performance and previous condition, wj is the weight value of the dependent
variable members of k-NN (kernel function) and j is the order of the k-NN based on their distance from
the current performance condition and which the nearest with used the lowest order (j = 1, ..., K).
k-Nearest Neighbor Modelling
In this research, to obtain the k-NN model forecast using the algorithm model proposed described
above, several parameters need to be specified and the k-NN modelling process can be divided into
five calculation stages. The procedure of k-NN for regression is as follows:
(1)
(2)
The matrix scenarios composed modelling stage which includes form d-dimensional feature
vectors C or nxy from the historical data x: c = c1 , c2 , ..., c p and nxy = nxy1 , nxy2 , ..., nxy p or
x: C = nxy = [ xt , xt−1 , ..., xt−d+1 ]; Their corresponding successors are denoted as xh . They are
given two pieces point c and nxy in a space vector of n-dimensional c (c1 , c2 , ..., cn ) and nxy (nxy1 ,
nxy2 , ..., nxyn ).
The distance calculate vector stage which includes form n-dimensional distance vector
dsij (c, nxy) = Di for each testing vector C or nxy by calculating the Euclidean distance between
dsij (c, nxy) = Di and the remaining:
C : dsij (c, nxy) = k Di − D j k , j 6= i
(2)
where Di is the value of the dependent variable historical data set and Dj is the value of the
dependent the nearest neighbor based on distance:
v
u k 2
u
dsij (c, nxy) = t ∑ cip − nxyipj , p = 1, 2, 3, ..., n
(3)
q =1
where dsij (c, nxy) is value of the dependent variable GSI, nxyipj is the position coordinat PV
(3)
(4)
station (magnitude of nearest neighbors) cip is the d-dimensional feature vectors, The index j is the
current condition, i is the historical dataset, k = K the number of elements in the nearest neighbors,
q is the historical dataset and where x, y are scenarios composed of k features and p is the number
of the k-NN based on their distance from the current condition (j) in which the nearest have the
lowest order (p = 1, ..., k).
Select the value distance of the k-NN stage, which includes the sort dsij (c, nxy) in ascending order
and select the first K entries as the nearest neighbors Dk , k ∈ {1, ..., K}.
Select the best value of k used in modelling the k-NN stage, because a high k value will reduce the
effect of noise on the classification, but it will make the boundaries between each classification
becomes increasingly blurred. Form a kernel function:
K ( j) =
1
j
K
∑
j =1
or k i =
1
j
1
i = 1, 2, 3, 4, 5
dis(i )
(4)
Energies 2017, 10, 186
(5)
6 of 18
where K(j) = ki is the value k-NN (kernel function), i is the historical data set and the index j is the
number of the k-NN based on their distance from the current condition (i) in which the nearest
have the lowest order (j = 1, ..., K), and K is the length data sets, and the distance between the
current and previous condition [25].
Calculate the final estimation stage, using Equation (5):
k
sumd =
∑ K j xhj
j =1
Energies 2017, 10, 186
(5)
6 of 17
h
where x hj is the magnitude of nearestsumd
neighbor
the order of the nearest neighbors
=  j =1j,K jjxis
(5) based
j
on their distance from the current condition (h) in which the nearest have the lowest order
h
nearest neighbor
j, j is the
of theand
nearest
based
where
(j = 1,
..., k),xsumd
is magnitude
the karnel of
function,
k is the length
of order
data sets,
Kj isneighbors
the kernel
function.
j is the
k
on their distance from the current condition (h) in which the nearest have the lowest order (j = 1,
3.2. Artificial
..., k),Neural
sumd Network
is the karnel function, k is the length of data sets, and Kj is the kernel function.
ANN is a mathematical method inspired by the structure and information processing of biological
3.2. Artificial Neural Network
neural networks. ANNs are intelligent systems that have the capacity to learn, memorize and create
ANN is a mathematical method inspired by the structure and information processing of
relationships
among data [30]. ANN model is a combination of pattern recognition, deductive
biological neural networks. ANNs are intelligent systems that have the capacity to learn, memorize
reasoning and numerical computations to simulate learning in the human brain. ANN consists
and create relationships among data [30]. ANN model is a combination of pattern recognition,
of andeductive
interconnected
many groups namely neurons, and its main task is processes information using
reasoning and numerical computations to simulate learning in the human brain. ANN
a connection
approach
to computation.
ANN consists
of an interconnected
namely
consists of
an interconnected
many groups
namely neurons,
and its main many
task isgroups
processes
neurons,
and
its
main
task
is
processes
information
using
a
connection
approach
to
computation.
information using a connection approach to computation. ANN consists of an interconnected manyANN
models
havenamely
been used
to predict
data [24].
The methodology
is a promising
alternative
groups
neurons,
and itssolar
mainradiation
task is processes
information
using a connection
approach
to
computation.
ANN models
have been used
predict solar
dataradiation
[24]. The methodology
is are
to traditional
approaches
for forecasting
GSI,toespecially
in radiation
cases where
measurements
a promising
alternative
traditional
approaches comprise
for forecasting
GSI,connected
especially neurons
in cases where
not readily
available.
ANN to
models
fundamentally
multiple
and nodes.
radiationnetworks
measurements
are not readily
available.
ANN
fundamentally techniques
comprise multiple
The neural
are considered
as the
member
of models
the non-parametric
which are
connected neurons and nodes. The neural networks are considered as the member of the nonusually used for estimation and classification [25]. The neurons have five basic components, i.e., input,
parametric techniques which are usually used for estimation and classification [25]. The neurons have
weight-bias, threshold, summing junction and output, as illustrated in Figure 3. Neurons are arranged
five basic components, i.e., input, weight-bias, threshold, summing junction and output, as illustrated
in three
layers3.which
consist
of input,
andwhich
output.
in Figure
Neurons
are arranged
in hidden
three layers
consist of input, hidden and output.
Figure 3. Basic structure architecture of a simple artificial neuron model [27].
Figure 3. Basic structure architecture of a simple artificial neuron model [27].
Artificial Neural Network Modelling
Artificial Neural Network Modelling
The ANN algorithm model can be divided into four step: (1) the design model and input pattern
step
which algorithm
includes the
choicecan
of be
thedivided
ANN model,
the number
its design
layers ANN,
number
of
The ANN
model
into four
step: (1)ofthe
modelthe
and
input pattern
in the
eachchoice
layer ANN,
inputsmodel,
and outputs,
the choice
and the
validation
step neuron
which groups
includes
of theitsANN
the number
of of
itstraining
layers set
ANN,
number of
set groups
samples in
which
k-NN
design;
the training
set andthe
testing
set ANN
forecasting
based
on
neuron
eachuse
layer
ANN,
its (2)
inputs
and outputs,
choice
of training
set and
validation
meteorology
data
are
presented
to
the
ANN
and
the
weights
layer
are
adjusted
accordingly
till
a on
set samples which use k-NN design; (2) the training set and testing set ANN forecasting based
predetermined condition is more better; (3) the simulation test passed step, in which the result ANN
forecasting model is tested using measurement data at nine station PV which are not treated during
the training step; and finally (4) the performance evaluation stage.
Layered perceptron ANN model with different topology designs were considered in order to
obtain the best mapping between the ANN algorithm inputs and outputs. The proposed model in
this research is used to predict the GSI values for the next hours or one hour ahead, forecasting based
Energies 2017, 10, 186
7 of 18
meteorology data are presented to the ANN and the weights layer are adjusted accordingly till a
predetermined condition is more better; (3) the simulation test passed step, in which the result ANN
forecasting model is tested using measurement data at nine station PV which are not treated during
the training step; and finally (4) the performance evaluation stage.
Layered perceptron ANN model with different topology designs were considered in order to
obtain the best mapping between the ANN algorithm inputs and outputs. The proposed model in this
research is used to predict the GSI values for the next hours or one hour ahead, forecasting based on
meteorology data and the GSI data from other PV stations. The GSI forecasting data are normalized
10, 186
7 ofnumber
17
to theEnergies
limit2017,
of [0,
1] to avoid neuron groups saturation during the learning progress. The
of neuron groups in the first hidden layer is 10 and we have to consider an ANN with 1475 inputs
normalized to the limit of [0, 1] to avoid neuron groups saturation during the learning progress. The
for GSI
forecasting.
neuron
active
function
of we
hidden
and
output
number
of neuronThe
groups
in thegroups
first hidden
layer
is 10 and
have to
consider
anlayer
ANNare
withtansig
1475 and
purelin,
respectively.
Performance
test
data
set
is
used
to
evaluate
GSI
forecast
accuracy
of
the ANN
inputs for GSI forecasting. The neuron groups active function of hidden and output layer are tansig
algorithm.
The
ANN
models
inputs
acts
directly
on
the
training
period
duration
for
GSI
forecasting
and purelin, respectively. Performance test data set is used to evaluate GSI forecast accuracy of the [9].
ANN the
algorithm.
The ANN
models
acts directly
on the training
for GSI
To choose
best design
modeling
forinputs
forecasting,
we investigated
the period
role of duration
meteorology
data in
forecasting
To choose
bestwe
design
modeling
forecasting,
we investigated
role of good
the ANN
model[9].
effectivity
andthe
then
researched
the for
impact
of the data
mode on thethe
training
meteorology
in the
ANN model
effectivity and then we researched the impact of the data mode
performance
anddata
on the
forecasting
accuracy.
on
the
training
good
performance
and
on will
the forecasting
accuracy.
In the first reason, the ANN models
learn about
the global solar irradiance profile of the
In the first reason, the ANN models will learn about the global solar irradiance profile of the
target location. At the end of the learning process, the k-NN-ANN model will be able to give the GSI
target location. At the end of the learning process, the k-NN-ANN model will be able to give the GSI
forecast results at the target PV station based on meteorology data. When compared to actual data,
forecast results at the target PV station based on meteorology data. When compared to actual data,
ANNANN
also also
presents
thethe
whole
GSI
four-hourwindow,
window,
giving
ANN
possibility
to
presents
whole
GSIvalues
valuesfor
for the
the four-hour
giving
ANN
the the
possibility
to
learnlearn
the existing
relationship
in between
them
and
GSI evolution
evolutionduring
during the
the existing
relationship
in between
them
andtotodevelop
developan
anidea
idea about
about GSI
time the
window.
In the second
reason,reason,
the entire
GSI data
are used
for for
very
short
term
forecasting
time window.
In the second
the entire
GSI data
are used
very
short
term
forecastingusing
using the k-NN-ANN
model
based on meteorological
at PV
the station,
target PV
station,
which is by
the k-NN-ANN
model based
on meteorological
data at thedata
target
which
is surrounded
surrounded
by
eight
other
PV
stations.
The
proposed
k-NN-ANN
model
has
to
forecast
GSI
eight other PV stations. The proposed k-NN-ANN model has to forecast GSI values for thevalues
next hour
theahead
next hour
or 60 min
by taking
into account
the based
forecasting
data based ondata.
meteorology
or 60for
min
by taking
intoahead
account
the forecasting
data
on meteorology
data.
The specific feature of this k-NN-ANN model is that measured data and hourly weather
The specific feature of this k-NN-ANN model is that measured data and hourly weather
meteorology data (temperature, humidity, global irradiance wind speed and direction) forecasts
meteorology data (temperature, humidity, global irradiance wind speed and direction) forecasts for
for the nearest eight surrounding stations are used as input data information, as explain by Figure 4.
the nearest eight surrounding stations are used as input data information, as explain by Figure 4. To
To model
betweenk-NN
k-NNmodel
model
prediction
actual
values
ofatGSI
modelthe
theavailable
availablerelationship
relationship between
prediction
andand
the the
actual
values
of GSI
theat the
pivotpivot
location,
an ANN
based
model
is is
used.
ANNmodels
modelscan
canbebeprogrammed
programmed in
location,
an ANN
based
model
used.The
Thetraining
training of
of the
the ANN
one hours
after the
of first
measurements.
in one ahead
hours ahead
afterrecording
the recording
of first
measurements.
Figure
4. The
proposedforecasting
forecasting model
model using
neighbor
and artificial
neural neural
networknetwork
(kFigure
4. The
proposed
usinga k-nearest
a k-nearest
neighbor
and artificial
NN-ANN) model.
(k-NN-ANN) model.
3.3. k-Nearest Neighbor and Artificial Neural Network Modelling
The forecasting problem in this research was forecasting based on meteorological data study in
progress to forecast GSI at a PV station for energy production by one hour or 60 min ahead by using
the hybrid k-NN-ANN model. The procedure of k-NN-ANN model for GSI forecasting based on
meteorological data has the following steps:
Energies 2017, 10, 186
8 of 18
3.3. k-Nearest Neighbor and Artificial Neural Network Modelling
The forecasting problem in this research was forecasting based on meteorological data study in
progress to forecast GSI at a PV station for energy production by one hour or 60 min ahead by using
the hybrid k-NN-ANN model. The procedure of k-NN-ANN model for GSI forecasting based on
meteorological data has the following steps:
Step 1: Calculate the distance of each data parameter:
r
dsGI
j ( c, nxy )
=
k
∑ q =1
k
Ho
c Ho
p − nxy pj
∑ q =1
k
Ws
cWs
p − nxy pj
k
r
ds Ta
j ( c, nxy )
=
r
ds jHo (c, nxy)
=
r
dsWs
j ( c, nxy )
=
r
dsWd
j ( c, nxy )
=
2
k
∑ q =1
∑ q =1
∑ q =1
GI
cGI
p − nxy pj
Ta
c Ta
p − nxy pj
2
, p = 1, 2, ..., n j = 1, 2, ..., n
(6)
, p = 1, 2, ..., n j = 1, 2, ..., n
(7)
2
, p = 1, 2, ..., n j = 1, 2, ..., n
2
Wd
cWd
p − nxy pj
(8)
, p = 1, 2, ..., n j = 1, 2, ..., n
2
(9)
, p = 1, 2, ..., n j = 1, 2, ..., n
(10)
and the weight distance is:
1
d21
GIx =
sumd
Ta x =
Ho x =
Ws x =
Wd x =
1
d21
sumd
1
d21
sumd
1
d21
sumd
1
d21
sumd
1
d22
· GIi +
sumd
· Tai +
· Hoi +
· Wsi +
· Wdi +
· GIi +
1
d22
sumd
1
d22
sumd
1
d22
sumd
1
d22
sumd
· Tai +
· Hoi +
· Wsi +
· Wdi +
1
d23
sumd
1
d23
sumd
1
d23
sumd
1
d23
sumd
1
d23
sumd
where dsGI
j ( c, nxy ) is the Euclidan distance GSI
ds Ta
j ( c, nxy ) is the Euclidan distance temperature
ds jHo (c, nxy) is the Euclidan distance humidity
dsWs
j ( c, nxy ) is the Euclidan distance wind speed
dsWd
j ( c, nxy ) is the Euclidan distance wind direct
GIx is the weight distance at GSI
Ta x is the weight distance at temperature
Ho x is the weight distance at humidity
Ws x is the weight distance at wind speed
Wd x is the weight distance at wind direct
GI is the global irradiance
Ta is the temperature
Ho is the humidity
1
d24
· GIi +
· Tai +
· GIi +
sumd
1
d24
sumd
· Hoi +
· Wsi +
· Wdi +
· Tai +
1
d24
sumd
1
d24
sumd
1
d24
sumd
1
d25
sumd
1
d25
sumd
· Hoi +
· GIi
(11)
· Tai
(12)
1
d25
sumd
· Wsi +
· Wdi +
· Hoi
(13)
· Wsi
(14)
· Wdi
(15)
1
d25
sumd
1
d25
sumd
Energies 2017, 10, 186
9 of 18
Ws is the wind speed
Wd is the wind direct
C pGI is the d dimensional feature vectors GSI
C pTa is the d dimensional feature vectors temperature
C pHo is the d dimensional feature vectors humidity
CWs
p is the d dimensional feature vectors wind speed
CWd
p is the d dimensional feature vectors wind direct
nxyipj is the position coordinat PV station (magnitude of nearest neighbors)
cip is the d-dimensional feature vectors
sumd is the kernel function:
i is the historical data set
j is the current condition, i is the historical dataset, k = K the number of elements in the
nearest neighbors
q is the historical dataset and where x, y are scenarios composed of k features and p is the number
of the k-NN based on their distance from the current condition (j) in which the nearest have the
lowest order (p = 1, ..., k).
Step 2: Calculate number of nearest neighbors:
k i +n =
1
i = 1, 2, 3, ..., n = 130
dis(i + n)
(16)
where k is the distance nearest neighbor, i is the historical dataset and n is the total number of
features (i = 1, 2, ..., n)
Step 3: Calculate the final estimation as:
k
sumd =
∑ K( j)xhj
(17)
j =1
where x hj is the value magnitude of the k-NN j, j is the order of the nearest neighbors based on
distance (h) in which the nearest have the lowest order (j = 1, ..., k), sumd is the karnel function, k
is the length of data sets, and the distance between the current and previous condition and Kj is
the kernel function.
Step 4: Training data and testing data using ANN method are obtained from the following steps:
a.
b.
c.
d.
e.
f.
g.
The final estimation of GSI from k-NN method is into training sets, validation sets, and test
sets for ANN model.
Architecture model and training sets parameters (create and configure the neural network,
initialize the weights layer and biases layer) are selected
Model GSI forecast using the training set are run
Model forecasting is validated using the validation set
Step b–e are repeated using different architectures model forecasting and training
set parameters
The optimum model is chosen and inserted into training process using data from the
training set and validation set
The ultimate forecasting model is assessed using the test set
Step 5: Perform predictions for future GSI data at the Station S with k-NN-ANN
Energies 2017, 10, 186
10 of 18
The forecast model designed for k-NN is described in Figure 5a. If the validation test for forecasting
the GSI is successful and get more better the result, the forecast model could perform its designed
function, otherwise one or more changes should be made during the previous process. The procedures
are shown in Figure 5b. GSI forecasting process using hybrid k-NN-ANN algorithm can also be divided
Energies 2017, 10, 186
10 of 17
into two process: (1) k-NN modelling to determine the d-dimensional feature and n-dimensional
distance
forANN
input process;
data at the(2)
ANN
(2) the ANNfor
modelling
for GSI forecasting.
distance dimensional
for input data
at the
theprocess;
ANN modelling
GSI forecasting.
The procedures
The procedures
in Figure 5c.
In this
research,
and ANN
structure construction
are displayed
in Figureare
5c.displayed
In this research,
k-NN
and
ANNk-NN
structure
construction
was programmed
was programmed using MATLAB (R2013a) programming. MATLAB is been provided with some
using MATLAB
(R2013a) programming. MATLAB is been provided with some ANN tools.
ANN tools.
Figure 5. Flowchart k-NN-ANN method very short term forecasting: (a) k-NN process; (b) ANN process;
Figure 5.and
Flowchart
k-NN-ANN method very short term forecasting: (a) k-NN process; (b) ANN
(c) hybrid k-NN-ANN model.
process; and (c) hybrid k-NN-ANN model.
3.4. Normalization
3.4. Normalization
The normalization of data input is very important to obtain good results in the ANN method
[9]. In this research for analysis, it needs normalization data for training process the GSI forecasting
The60
normalization
of data[31],
input
is very
toEquation
obtain good
min ahead, as defined
which
can beimportant
calculated by
(18): results in the ANN method [9].
In this research for analysis, it needs normalization data
for training process the GSI forecasting 60 min
GSI dn − GSI min
n
ahead, as defined [31], which can be calculated
by
Equation
(18):
GSI dnorm =
(18)
GSI max − GSI min
GSIdn − GSImin
n
GSIdnorm
=
(18)
GSImax − GSImin
Energies 2017, 10, 186
11 of 18
n
Let us denote GSIdnorm
and GSIdn be normalized target feature at frame index n and the d-GSI
output, respectively. Let us also denote GSImax and GSImin be the maximum value and minimum
values of the GSI, respectively.
4. Results and Discussion
This section discusses the result of k-NN-ANN model used to forecast future GSI by using a
one hour ahead forecasting procedure. The procedure is described in Figure 5. We tested our model
using previously described databases based on meteorology data, i.e., wind direction, wind speed, GI,
temperature, and humidity, during the 1 h or 60 min ahead process.
Data
Using the k-NN-ANN model, it is expected that a valid GSI forecast result will be produced.
The design forecast is divided into two stages:
(a)
(b)
The first stage is calculating d-dimensional feature and n-dimensional distance based on the
Euclidean (k-NN model) every hour for all of the PV stations. Data shown in Table 1 are the
order parameters of each PV station, i.e., angle, distance and position coordinates. The table
summarizes all possible combinations of variables to be considered as an input for k-NN method.
The second stage uses the ANN based on the assumption that the existing data input is a
combination of the results obtained from the k-NN model. The research model proposed in this
study seeks to estimate and predict a PV station production 60 min ahead, which position is
located at the center and surrounded by eight other PV stations.
Table 1. Data position and coordinates of the PV stations.
No.
Station
Angle (◦ )
Distance (d)
Coordinate (xi , yi )
1
2
3
4
5
6
7
8
9
A
B
C
D
E
F
G
H
S
0
36
93
140
180
250
280
310
0
8.3
10.5
10.8
10.5
10
4.3
5.4
9.1
0
(8.3, 0)
(8.5, 6.2)
(−0.6, 10)
(−8, 6.7)
(−10, 0.1)
(−5.5, −5)
(1, −5.3)
(5.8, −7)
(0.1, 0.1)
The d-dimensional feature and n-dimensional distances based on the Euclidean (k-NN model)
every hour for all of PV station are shown in Figure 6. From the simulation results using the k-NN
method based on meteorological data consisting of global irradiance, temperature, humidity, wind
speed and wind direction values, respectively. All of them are used as pre-processing data in the ANN
method, which can be calculated by the polynomial Equation (19):
f ( x, y) = p00 + p10 x + p01 y + p20 x2 + p11 xy + p02 y2
(19)
where f (x, y) is the GSI value of the k-NN method and variable x, y is the coordinate value’s position of
the PV station.
Example Figure 6a can be produced by polynomial Equation (20):
f ( x, y) = 10.74 + 0.05105x + 0.01365y + (−0.0014) x2 + 0.0015xy + 0.002923y2
(20)
Example Figure 6b can be produced by polynomial Equation (21):
f ( x, y) = 520.4 − 1.678x − 0.7583y + (−0.3799) x2 + 0.2108xy + 0.07159y2
(21)
Energies 2017, 10, 186
12 of 18
Energies 2017,
186
17
In which
the10,
polynomial
equation above is the result of the simulation on 5:20 a.m. 12
forof Figure
6a
and 8:00 a.m.
hours
for
Figure
6b
using
the
k-NN
method
for
Station
S.
Moreover
that
polynomial
In which the polynomial equation above is the result of the simulation on 5:20 a.m. for Figure 6a and
equation
also
canhours
be implemented
thethe
other
PV
stations.
8:00
a.m.
for Figure 6b for
using
k-NN
method
for Station S. Moreover that polynomial
To equation
validatealso
thecan
proposed
method,
data PV
of 60
min ahead of a PV station has been calculated
be implemented
forGSI
the other
stations.
To validate
the proposed
method,
GSI results
data of 60
ahead
of a PV station
hasactual
been calculated
using the process
described
in Section
3. The
aremin
then
compared
with the
data of the GSI
using
the
process
described
in
Section
3.
The
results
are
then
compared
with
the
actual
data
the the GSI
at target PV station as described in Section 4. Table 2 shows the optimal k-NN parametersoffor
GSI at target PV station as described in Section 4. Table 2 shows the optimal k-NN parameters for the
forecast with meteorology data every hours from 5:20 a.m. to 8:00 a.m. on 8 June 2012.
GSI forecast with meteorology data every hours from 5:20 a.m. to 8:00 a.m. on 8 June 2012.
08.00
05.20
800
G S I (W /m 2)
G S I (W /m 2 )
14
12
10
8
10
0
-10 -10
d-distance y
(a)
-5
0
5
d-distance x
10
600
400
200
10
0
d-distance y
-10
-10
-5
5
0
10
d-distance x
(b)
Figure 6. (a) d-Distance PV station based on the Euclidean (k-NN model) in time 5:20 a.m. on 8 June
Figure 6.
(a) d-Distance PV station based on the Euclidean (k-NN model) in time 5:20 a.m. on 8 June 2012;
2012; and (b) d-Distance PV station based on the Euclidean (k-NN model) in time 8:00 a.m. on 8 June 2012.
and (b) d-Distance PV station based on the Euclidean (k-NN model) in time 8:00 a.m. on 8 June 2012.
Table 2. Optimal k-NN parameters for the GSI forecast with meteorology data.
Table 2. Optimal k-NN parameters for the
GSI(x,forecast
with meteorology data.
Station
y)*
Time
Time5:20
5:25
5:30
5:205:35
5:255:40
5:305:45
5:355:50
5:405:55
5:456:00
5:506:05
5:556:10
6:15
6:00
6:20
6:05
6:25
6:10
6:30
6:15
6:35
6:20
6:40
6:256:45
6:306:50
6:356:55
6:407:00
6:457:05
6:507:10
6:557:15
7:007:20
7:057:25
7:107:35
7:157:40
7:207:45
7:257:50
7:357:55
7:408:00
7:45
7:50
7:55
8:00
ST-S
(0.1, 0.1)
11
ST-S15
(0.1, 0.1)
22
11 27
15 37
22 42
27 33
37 32
42 40
33 59
32 77
101
40
139
59
169
77
179
101
197
139
216
169230
179240
197252
216269
230292
240308
252324
269345
292357
308370
324362
345360
357344
370427
362475
360
344
427
475
ST-A
(8.3, 0)
11
ST-A
15
(8.3,
22 0)
29
11
39
15
44
22
35
29
32
39
41
44
61
35
82
32
106
41
132
61
152
82
158
106
186
132
200
152
217
158
229
186
238
200
256
217
275
229
290
238
302
256
322
275
334
290
351
302
331
322
323
334
316
414
351
457
331
ST-B
ST-C
ST-D
ST-E
ST-F
ST-G (1,
ST-H
(8.5, 6.2)
(−0.6, 10) Station
(−8, 6.7)
(−5.5, −5)
−5.3)
(5.8, −7)
(x, y)(−10,
* 0.1)
11
11
10
10
10
11
11
ST-B
ST-C
ST-D
ST-G
ST-H
15
15
15
15ST-E
15ST-F
15
15
(8.5,
−10, 0.1) (22
−5.5, −5) (1,22−5.3) (5.8,
22 6.2) (−
220.6, 10) (−
228, 6.7) (22
22 −7)
2711
2511
24 10
25 10
27 10
2811
3011
3715
3315
33 15
35 15
38 15
3915
4115
4422
4222
41 22
40 22
41 22
4222
4322
3827
3825
35 24
31 25
30 27
3128
3130
3137
3133
31 33
32 35
32 38
3239
3241
3644
3342
35 41
40 40
44 41
4442
4643
53
45
48
57
65
67
70
38
38
35
31
30
31
31
76
67
66
72
79
83
88
31
31
31
32
32
32
32
100
91
90
95
103
107
111
36
33
35
40
44
44
46
113
110
127
148
160
155
155
53
45
48
57
65
67
70
133
140
165
189
195
183
178
76
67
66
72
79
83
88
153
172
194
204
197
181
170
100
91
90
95
103
107
111
184
196
207
211
206
197
191
113
110
127
148
160
155
155
202
221
234
235
225
213
203
133
140
165
189
195
183
178
219
236
246
246
237
226
218
153
172
181
170
233
248
256194
254204
245197
235
228
184
196
197
191
243
263
272207
269211
256206
245
235
202
221
213
203
260
278
287234
285235
274225
263
254
219
236
226
218
275
294
310246
314246
304237
290
280
233
248
235
228
290
310
326256
330254
320245
305
295
243
263
245
235
302
327
347272
351269
338256
321
307
260
278
263
254
320
345
366287
373285
361274
343
330
275
294
290
280
334
360
381310
385314
372304
354
340
290
310
305
295
354
379
394326
395330
380320
365
351
302
327
321
307
334
371
397347
400351
380338
355
336
320
345
343
330
327
372
402366
405373
380361
351
327
334
360
354
340
328
367
385381
378385
352372
329
310
414
429
440394
443395
436380
426
418
354
379
365
351
450
467
486397
497400
492380
478
469
334
371
355
336
323* (x, y) are327
372 position
402values of405
380
Note:
the coordinate
the PV stations.
316
328
367
385
378
352
414
414
429
440
443
436
457
450
467
486
497
492
Note: * (x, y) are the coordinate position values of the PV stations.
351
329
426
478
327
310
418
469
Energies 2017, 10, 186
13 of 18
Energies 2017, 10, 186
13 of 17
The k-NN-ANN modelling approach is unique since the neural network algorithm is firstly
The k-NN-ANN modelling approach is unique since the neural network algorithm is firstly
developed using the k-NN model based on meteorology data. Results from the k-NN model are then
developed using the k-NN model based on meteorology data. Results from the k-NN model are then
divided it into two sets of data: the training and the validation set. After the simulation test, the ANN
divided it into two sets of data: the training and the validation set. After the simulation test, the ANN
model based on the k-NN results will be ready to use in 60 min ahead GSI forecasting. As explained
model based on the k-NN results will be ready to use in 60 min ahead GSI forecasting. As explained
previously, the optimization parameters for GSI forecasting were obtained from a dataset based on
previously, the optimization parameters for GSI forecasting were obtained from a dataset based on
meteorology data. The calculated values of GSI were then compared with measured values ( GSI ) of
meteorology data. The calculated values of GSI were then compared with measured values (GSI )
each station: Station S, Station A, Station B, Station C, Station D, Station E, Station F, Station G, Station
of
Station
S, Station
A,and
Station
B, Station C, Station
D, Station
Station
F, Station
G,
H. each
Meanstation:
absolute
bias error
(MABE)
root-mean-square
error (RMSE)
wereE,
used
as error
statistical
Station
H. Mean
absolute bias
(MABE) andmodel
root-mean-square
error (RMSE)
were
used
as error
indicators.
The estimation
fromerror
the k-NN-ANN
was then compared
with the
k-NN
model
and
statistical
indicators.
The
estimation
from
the
k-NN-ANN
model
was
then
compared
with
the
k-NN
mean average of meteorological data prediction.
model
and
average k-NN-ANN
of meteorological
prediction.
For
themean
comparison,
modeldata
performed
only 500 iterations for each learning period.
For
the
comparison,
k-NN-ANN
model
performed
500 dataset;
iterations
each learning
period.
Figure 7 shows the all the data for the three subsets: (a) only
training
(b)for
validation
dataset;
and
Figure
7
shows
the
all
the
data
for
the
three
subsets:
(a)
training
dataset;
(b)
validation
dataset;
and
(c) test dataset. In this figure the plots show for GSI versus the time. For the ANN algorithm is initially
(c)
test dataset.
figurebased
the plots
show
for model
GSI versus
theitsANN
algorithm
is initially
constructed
for In
thethis
training
on the
k-NN
data,the
andtime.
afterFor
this,
training
is periodically
as
constructed
for
the
training
based
on
the
k-NN
model
data,
and
after
this,
its
training
is
periodically
the database expands over time.
as the database expands over time.
0.8
0.8
0.6
GSI
G S I in p u t
0.6
0.4
1
0.8
0.6
G S I te s t
1
0.4
0.4
0.2
0.2
0
0
500
1000
0
0
1500
0.2
100
200
0
0
300
100
200
Periode
Periode
Periode
(a)
(b)
(c)
300
Figure
Figure 7.
7. (a)
(a) GSI
GSI for
for the
the training
training set;
set; (b)
(b) GSI
GSI for
for the
the validation
validation set;
set; and
and (c)
(c) GSI
GSI for
for the
the testing
testing set.
set.
The k-NN model database provides pretraining data, and then the database divided it into two
The k-NN model database provides pretraining data, and then the database divided it into two
sets of
data; the training data set and the validations data set, after the validation test, the k-NN-ANN
sets of data; the training data set and the validations data set, after the validation test, the k-NN-ANN
based model will be ready to use for forecasting the GSI, and the the accuracy of the model can be
based model will be ready to use for forecasting the GSI, and the the accuracy of the model can be
determined
using the testing set data that has been gained from the training data process. For the
determined
using
the testing
set datathe
that
has model
been gained
the training
data
process.
Fordata
the
present
forecast
application
program,
ANN
has to from
start learning
based
on the
training
present
forecast
application
program,
the
ANN
model
has
to
start
learning
based
on
the
training
set, and subsequently construct and sharpen the knowledge while ensuring the continuity of its task.
datathe
set,process
and subsequently
construct
sharpen
the knowledge
ensuring
thefrom
continuity
of its
For
of testing the
accuracyand
of the
forecasting
model thatwhile
has been
gained
the training
task. Forusing
the process
of testing the
accuracy
the forecasting
thatwas
has 90%
beenof
gained
from
the
process
backpropagation
method.
The of
amount
of testingmodel
data used
the total.
Note
training
process
using
backpropagation
method.
The
amount
of
testing
data
used
was
90%
of
the
total.
that for this validation, the k-NN-ANNs were trained only while data processing is done on the
Note thatdata
for this
validation,
k-NN-ANNs
werelearning
trained rate
onlyof
while
dataa processing
is done
on the
training
patterns
with a the
target
error of 0.001,
0.1 and
maximum of
50 epochs.
training
data
patterns
with
a
target
error
of
0.001,
learning
rate
of
0.1
and
a
maximum
of
50
epochs.
Figure 8 shows a comparison of the two methods for the GSI data between actual data and kFigure
8 shows
a comparison
two
methods
for and
the GSI
data between
dataterm
and
NN-ANN method,
where:
(a) there isofa the
better
between
actual
forecasting
data foractual
very short
k-NN-ANN
method,
where:
(a)S;there
is ashows
betterthe
between
actualGSI
and
forecasting
data
for very
GSI
forecasting
at Station
target
and (b)
normalized
curve
using the
k-NN-ANN
short
term
GSI
forecasting
at
Station
target
S;
and
(b)
shows
the
normalized
GSI
curve
using
the
method.
k-NN-ANN method.
700
0.8
Ac
k-NN-ANN
0.7
500
Normalized value
G lobal S olar Irradianc e (W /m 2)
600
0.9
Actual
k-NN-ANN
400
300
200
0.6
0.5
0.4
0.3
0.2
100
0
00.00
0.1
05.35
06.00
06.25
06.50
Time
(a)
07.10
07.40
0
00.00
05.35
06.00
06.25
06.50
Time
(b)
07.10
07.40
that for this validation, the k-NN-ANNs were trained only while data processing is done on the
training data patterns with a target error of 0.001, learning rate of 0.1 and a maximum of 50 epochs.
Figure 8 shows a comparison of the two methods for the GSI data between actual data and kNN-ANN method, where: (a) there is a better between actual and forecasting data for very short term
Energies
2017, 10, 186at Station target S; and (b) shows the normalized GSI curve using the k-NN-ANN
14 of 18
GSI
forecasting
method.
700
0.9
Actual
k-NN-ANN
0.8
Ac
k-NN-ANN
0.7
500
Normalized value
G lobal S olar Irradianc e (W /m 2)
600
400
300
200
0.6
0.5
0.4
0.3
0.2
100
0.1
0
00.00
05.35
06.00
06.25
06.50
07.10
0
00.00
07.40
Time
Energies 2017, 10, 186
05.35
06.00
(a)
06.25
06.50
Time
07.10
(b)
07.40
14 of 17
Figure 8. Comparison of the two methods for the GSI data: (a) very short term GSI forecasting using
Figure 8. Comparison of the two methods for the GSI data: (a) very short term GSI forecasting using
the k-NN-ANN model versus actual data for station S; and (b) the normalized GSI curve using the
the k-NN-ANN model versus actual data for station S; and (b) the normalized GSI curve using the kk-NN-ANN method.
NN-ANN method.
Figure
Figure 99 illustrates
illustrates the
the comparison
comparison of
of GSI
GSI forecasting
forecasting in
in a four hour window (5:20 a.m.–8:00 a.m.)
on
basedon
onthe
thek-NN-ANN
k-NN-ANNmodel,
model,actual
actual
data,
and
k-NN
method.
result
shows
on 88 June
June 2012, based
data,
and
thethe
k-NN
method.
TheThe
result
shows
that
that
very
short
term
forecasting
simulation
using
k-NN-ANN
method
during
a
four
hour
window
very short term forecasting simulation using k-NN-ANN method during a four hour window gives better
gives
results
to the k-NN method.
resultsbetter
compared
tocompared
the k-NN method.
700
Global Solar Irradiance (W/m2)
600
Actual
k-NN
k-NN-ANN
500
400
300
200
100
0
00.00
05.35
06.00
06.25
06.50
07.10
07.40
Time
Figure 9.
9. Very
Veryshort
shortterm
termGSI
GSIforecasting
forecasting in
in5-h
5-hwindow
windowbased
based k-NN-ANN
k-NN-ANN model,
model, actual
actual data,
data, and
and
Figure
k-NN
method.
k-NN method.
Global Solar Irradiance (W/m2)
Figure 10 illustrates a very short-term (60 min ahead) GSI forecast using k-NN-ANN and its
Figure 10 illustrates a very short-term (60 min ahead) GSI forecast using k-NN-ANN and its
comparison with actual data. It is evident that the k-NN-ANN model is in a good agreement with the
comparison with actual data. It is evident that the k-NN-ANN model is in a good agreement with the
measured data at the object station.
measured data at the object station.
700 the performance of the models, a statistical error measurement was used in the
To evaluate
experiment, namely
theActual
MABE, and RMSE. To evaluate the accuracyTheoff orecasting
eachGSImethod
to forecast the GSI
f or
600
k-NN-ANN
1 hourly or 60 minute ahead
values, MABE, and RMSE coefficient between results of k-NN-ANN and actual ground measurements
500
were calculated.
These statistical error indicators validation forecasting are calculated according to
Equations (22)400and (23) and the results statistical error are shown in Table 3.
300
200
100
0
00.00
05.35
06.00
06.25
06.50
07.10
07.40
09.00
Time
Figure 10. Very short term (60 min ahead) GSI forecast using k-NN-ANN versus actual data.
To evaluate the performance of the models, a statistical error measurement was used in the
experiment, namely the MABE, and RMSE. To evaluate the accuracy of each method to forecast the GSI
Figure 9. Very short term GSI forecasting in 5-h window based k-NN-ANN model, actual data, and
k-NN method.
Figure 10 illustrates a very short-term (60 min ahead) GSI forecast using k-NN-ANN and its
comparison
with
Energies
2017, 10,
186 actual data. It is evident that the k-NN-ANN model is in a good agreement with
15 ofthe
18
measured data at the object station.
700
Global Solar Irradiance (W/m2)
600
Actual
k-NN-ANN
The f orecasting GSI f or
1 hourly or 60 minute ahead
500
400
300
200
100
0
00.00
05.35
06.00
06.25
06.50
07.10
07.40
09.00
Time
Figure
10. Very
short term
term (60
(60 min
min ahead)
ahead) GSI
GSI forecast
forecast using
using k-NN-ANN
k-NN-ANN versus
versus actual
actual data.
data.
Figure 10.
Very short
To evaluate the performance of the models, a statistical error measurement was used in the
experiment,
the MABE,
and RMSE.
themodels.
accuracy
of each
method
to forecast
the GSI
Error statistical
indicators
of the To
GSIevaluate
forecasting
MABE:
mean
absolute
bias error;
Table 3. namely
values,
MABE,
and RMSE coefficient
between results of k-NN-ANN and actual ground measurements
RMSE:
root-mean-square
error.
were calculated. These statistical error indicators validation forecasting are calculated according to
Model
Equations (22) and (23) and
theError
results statistical
error are shownk-NN-ANN
in Table 3.
k-NN
Indicators
44
42 mean absolute bias error;
Table 3. Error statistical
indicators
models. MABE:
MABE
(W/m2 ) of the GSI forecasting
251
242
RMSEerror.
(W/m2 )
RMSE: root-mean-square
Model Error Indicators
2)
MABE is calculated according
to Equation
(21):
MABE (W/m
RMSE (W/m2)
MABE =
k-NN
44
251
k-NN-ANN
42
242
√
−b ± b2 − 4ac
1 N G f ,i − Gm,i ∑
N i =1
2a
(22)
RMSE is calculated according to Equation (22):
"
RMSE(k ) =
1 N 2
e (t + k|t)
N i∑
=1
#1/2
(23)
where e(t) is the forecasting data and k(t) is the measured (observed) data:
p(t + k|t) = P(t + k) − P(t + k|t)
where G f ,i is forecasted value GSI and Gm,i is measured value GSI, (i = 1, 2, ..., N), N is the number of
the GSI data, i is the number index variations.
Figure 11 illustrates the MABE and RMSE coefficients for the actual data and the very short term
(60 min ahead) forecasting performance using k-NN and the k-NN-ANN model. The error statistical
indicators of the k-NN model are MABE 44 W/m2 and RMSE 251 W/m2 . On the other hand, the
error statistical indicators for the proposed model (k-NN-ANN model) are MABE 42 W/m2 and RMSE
242 W/m2 . Note that the highest RMSE was 191 (W/m2 ) during the thirty-two period, while the
lowest a value was found to be 9 (W/m2 ) for the same location. It is evident that k-NN-ANN model
displays better predictions than the k-NN model.
(60 min ahead) forecasting performance using k-NN and the k-NN-ANN model. The error statistical
indicators of the k-NN model are MABE 44 W/m2 and RMSE 251 W/m2. On the other hand, the error
statistical indicators for the proposed model (k-NN-ANN model) are MABE 42 W/m2 and RMSE 242
W/m2. Note that the highest RMSE was 191 (W/m2) during the thirty-two period, while the lowest a
2
value was
Energies
2017,found
10, 186 to be 9 (W/m ) for the same location. It is evident that k-NN-ANN model displays
16 of 18
better predictions than the k-NN model.
45
250
k-NN model
k-NN-ANN model
k-NN-ANN model
k-NN model
40
200
35
R M S E (W / m 2 )
M A B E (W / m 2 )
30
150
100
25
20
15
10
50
5
0
0
5
10
15
20
Periode
(a)
25
30
35
0
0
5
10
15
20
25
30
35
Periode
(b)
Figure
11. (a)
(a) MABE
MABE coefficients
coefficientsbetween
betweenactual
actualdata
dataand
andGSI
GSI
forecasts
using
k-NN-ANN
model
Figure 11.
forecasts
using
thethe
k-NN-ANN
model
on
on
the
test
data
set;
and
(b)
RMSE
coefficients
between
actual
data
and
GSI
forecasts
using
the
k-NNthe test data set; and (b) RMSE coefficients between actual data and GSI forecasts using the k-NN-ANN
ANN
testset.
data set.
modelmodel
on theon
testthe
data
5. Conclusions
5. Conclusions
A new methodology for very short term (60 min ahead) GSI forecasting of a target PVstation has
A new methodology for very short term (60 min ahead) GSI forecasting of a target PVstation
been introduced. In this work we propose a novel methodology for GSI forecasting using a
has been introduced. In this work we propose a novel methodology for GSI forecasting using a
combination of k-NN modelling and an ANN. The model estimates and predicts the GSI profiles of
combination of k-NN modelling and an ANN. The model estimates and predicts the GSI profiles of PV
PV stations in very short term (60 min ahead) based on hourly meteorology data from eight
stations in very short term (60 min ahead) based on hourly meteorology data from eight surrounding
surrounding PV stations. The following conclusions can be drawn from this research:
PV stations. The following conclusions can be drawn from this research:

A different formulation for very short term GSI forecasting using k-NN-ANN modelling based
•
A
formulation
for very The
short
term GSI
forecasting
using
k-NN-ANN
modelling
on different
meteorology
data is proposed.
proposed
model
attempting
to shape
the patterns
of a
based
on
meteorology
data
is
proposed.
The
proposed
model
attempting
to
shape
the
polynomial equation shows that the proposed model forecasting is more better. The
variable
patterns
of a data
polynomial
shows
the proposed
model
is more
better.
meteorology
weatherequation
is of great
very that
importance
and affects
theforecasting
resulting GSI
forecasting
The
variable
meteorology
data
weather
is
of
great
very
importance
and
affects
the
resulting
GSI
output.
forecasting
output.

The new model
proposed in this study is a combination of k-NN modelling and an ANN model.
•
The
new
model
proposed
this study
a combination
of term
k-NNperiod
modelling
andahead)
an ANN
model.
The model is employed
toin
forecast
GSI is
data
for very short
(60 min
based
on
The model is employed to forecast GSI data for very short term period (60 min ahead) based
on meteorology data. This research concerns how to predict GSI data at a target PV station,
which is surrounded by eight other PV stations. The study also considers the availability of a
local measured database. It clearly shows that the GSI forecasting using a different k-NN-ANN
model for every hour based on meteorological data giving a better result output, which means
the GSI forecasting largely depends on variable meteorological data, where the meteorology data
variables consist of GI, wind speed, wind direct, humidity and temperatures.
•
This paper utilises k-NN-ANN modelling to determine the d-dimensional features and
n-dimensional distances. The results demonstrate that the results of the k-NN-ANN model
are closely matched with the actual data, and are better than data obtained from k-NN models.
The novelty of this article is to predict GSI at a PV station which position is at the center
surrounded by eight other adjacent PV stations. The proposed model is able to learn the characteristics
of meteorology weather data for the past four hours and use the data as model input. In this paper,
the proposed k-NN-ANN model has better approximation compared to k-NN model. The very short
term forecast evaluations of the GSI using the k-NN-ANN model are performed for only four hours
and the results show that the k-NN-ANN method is better than the k-NN method. The error statistical
indicators of the k-NN model are 44 W/m2 for the MABE and 251 W/m2 for the RMSE. On the other
hand, the error statistical indicators for the proposed model (k-NN-ANN model) are 42 W/m2 (MABE)
and 242 W/m2 (RMSE). We noted that the highest RMSE was 191 (W/m2 ) during the thirty-two hour
Energies 2017, 10, 186
17 of 18
period, while the lowest value was found to be 9 (W/m2 ) for the same location. The performance
of the proposed k-NN-ANN method is more better compared with the k-NN model. The proposed
k-NN-ANN model can therefore be used effectively to forecast very short term GSI data while giving
closer result outputs and better matches with actual measured data.
Author Contributions: Chao-Rong Chen and Unit Three Kartini conceived and designed the experiments;
Chao-Rong Chen performed the experiments; Chao-Rong Chen and Unit Three Kartini analyzed the data;
Chao-Rong Chen and Unit Three Kartini contributed reagents/materials/analysis tools; Unit Three Kartini wrote
the paper.
Conflicts of Interest: The authors declare no conflict of interest.
References
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
Yang, H.T.; Yang, P.C.; Huang, C.L. Optimization of Unit Commitment Using Parallel Structures of Genetic
Algorithm. In Proceedings of the 1995 International Conference on Energy Management and Power Delivery,
Singapore, 21–23 November 1995.
Chiang, C.-T.; Lee, Y.-S.; Li, X.R.; Liao, C.-C. A RSCMAC Based Forecasting for Solar Irradiance from Local
Weather Information. In Proceedings of the WCCI 2012 IEEE World Congress on Computational Intelligence,
Brisbane, Australia, 10–15 June 2012.
Hocaoğlu, F.O. Stochastic approach for daily solar radiation modeling. Sol. Energy 2011, 85, 278–287.
[CrossRef]
Pedro, H.T.C.; Coimbra, C.F.M. Nearest-neighbor methodology for prediction of intra-hour global horizontal
and direct normal irradiances. Renew. Energy 2015, 80, 770–782. [CrossRef]
Salcedo-Sanz, S.; Casanova-Mateo, C.; Munoz-Marí, J.; Camps-Valls, G. Prediction of Daily Global Solar
Irradiation Using Temporal Gaussian Processes. IEEE Geosci. Remote Sens. Lett. 2014, 11, 1936–1940.
[CrossRef]
Yang, D.; Sharmaa, V.; Ye, Z.; Lim, L.I.; Zhao, L.; Aryaputera, A.W. Forecasting of global horizontal irradiance
by exponential smoothing using decompositions. Energy 2015, 81, 111–119. [CrossRef]
Yang, D.; Ye, Z.; Lim, L.H.I.; Dong, Z. Very short term irradiance forecasting using the lasso. Sol. Energy 2015,
114, 314–326. [CrossRef]
Licciardi, G.A.; Dambreville, R.; Chanussot, J.; Dubost, S. Spatiotemporal Pattern Recognition and Nonlinear
PCA for Global Horizontal Irradiance Forecasting. IEEE Geosci. Remote Sens. Lett. 2015, 12, 284–288.
[CrossRef]
Amrouche, B.; Le Pivert, X. Artificial neural network based daily local forecasting for global solar radiation.
Appl. Energy 2014, 130, 333–341. [CrossRef]
Wong, L.T.; Chow, W.K. Solar radiation model. Appl. Energy 2001, 69, 191–224. [CrossRef]
Voyant, C.; Paolia, C.; Musellia, M.; Niveta, M.-L. Multi-horizon Irradiation Forecasting for Mediterranean
Locations Using Time Series Models. Energy Procedia 2014, 57, 1354–1363.
Lorenz, E.; Hurka, J.; Heinemann, D.; Beyer, H.G. Irradiance Forecasting for the Power Prediction of
Grid-Connected Photovoltaic Systems. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2009, 2, 2–10.
[CrossRef]
Wang, F.; Mi, Z.; Su, S.; Zhao, H. Short-Term Solar Irradiance Forecasting Model Based on Artificial Neural
Network Using Statistical Feature Parameters. Energies 2012, 5, 1355–1370. [CrossRef]
Mellit, A.; Pavan, A.M. A 24-h forecast of solar irradiance using artificial neural network: Application for
performance prediction of a grid-connected PV plant at Trieste Italy. Sol. Energy 2010, 84, 807–821. [CrossRef]
Hammer, A.; Heinemann, D.; Lorenz, E.; Luckehe, B. Short-Term Forecasting of Solar Radiation: A Statistical
Approach Using Satellite Data. Sol. Energy 1999, 67, 139–150. [CrossRef]
Xiao, C.; Chaovalitwongse, W.A. Optimization Models for Feature Selection of Decomposed Nearest
Neighbor. IEEE Trans. Syst. Man Cybern. Syst. 2016, 46, 2168–2216. [CrossRef]
Perez, R.; Kivalov, S.; Schlemmer, J.; Hemker, K., Jr.; Renné, D.; Hoff, T.E. Validation of short and medium
term operational solar radiation forecasts in the US. Sol. Energy 2010, 84, 2161–2172. [CrossRef]
Marquez, R.; Coimbra, C.F.M. Forecasting of global and direct solar irradiance using stochastic learning
methods, ground experiments and the NWS database. Sol. Energy 2011, 85, 746–756. [CrossRef]
Energies 2017, 10, 186
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
18 of 18
Martin, L.; Zarzalejo, L.F.; Polo, J.; Navarro, A.; Marchante, R.; Cony, M. Prediction of global solar irradiance
based on time series analysis: Application to solar thermal power plants energy production planning.
Sol. Energy 2010, 84, 1772–1781. [CrossRef]
Boata, R.S.; Gravila, P. Functional fuzzy approach for forecasting daily global solar irradiation. Atmos. Res.
2012, 112, 79–88. [CrossRef]
Dambreville, R.; Blanc, P.; Chanussot, J.; Boldo, D. Very short term forecasting of the Global Horizontal
Irradiance using a spatio-temporal autoregressive model. Renew. Energy 2014, 72, 291–300. [CrossRef]
Diagne, M.; David, M.; Lauret, P.; Boland, J.; Schmutz, N. Review of solar irradiance forecasting methods
and a proposition for small-scale insular grids. Renew. Sustain. Energy Rev. 2013, 27, 65–76. [CrossRef]
Zeng, J.; Qiao, W. Short-term solar power prediction using a support vector machine. Renew. Energy 2013, 52,
118–127. [CrossRef]
Dong, Z.; Yang, D.; Reindl, T.; Walsh, W.M. Short-term solar irradiance forecasting using exponential
smoothing state space model. Energy 2013, 55, 1104–1113. [CrossRef]
Ren, Y.; Suganthan, P.N.; Srikanth, N. A Comparative Study of Empirical Mode Decomposition-Based
Short-Term Wind Speed Forecasting Methods. IEEE Trans. Sustain. Energy 2015, 6, 236–244. [CrossRef]
Mellit, A.; Benghanem, M.; Bendekhis, M. Artificial Neural Network Model for Prediction Solar Radiation
Data: Application for Sizing Stand-Alone Photovoltaic Power System. In Proceedings of the 2005 IEEE
Power Engineering Society General Meeting, San Francisco, CA, USA, 12–16 June 2005.
Farhad, S.G.; Khaze, S.R.; Malekl, I. A new Approach in Bloggers Classification with Hybrid of K-Nearest
Neighbor and Artificial Neural Network Algorithms. Indian J. Sci. Technol. 2015, 8, 237–246.
Zhang, Y.; Wang, J. GEFCom2014 Probabilistic Solar Power Forecasting based on k-Nearest Neighbor and
Kernel Density Estimator. In Proceedings of the 2015 IEEE Power & Energy Society General Meeting, Denver,
CO, USA, 26–30 July 2015.
López, G.; Batlles, F.J.; Tovar-Pescador, J. Selection ofinput parameters to model direct solar irradiance by
using artificial neural networks. Energy 2005, 30, 1675–1684. [CrossRef]
Anil, K.; Jain, J.; Mao, K. Mohiuddin, Artificial Neural Networks: A Tutorial. Computer 1996, 29, 31–44.
Wu, B.; Li, K.; Yang, M.; Lee, C.-H. A Reverberation-Time-Aware Approach to Speech Dereverberation Based
on Deep Neural Networks. IEEE/ACM Trans. Audio Speech Lang. Process. 2017, 25, 102–111. [CrossRef]
© 2017 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).