
###========================================================================================###
###                      BEGIN FUNCTION AND VARIABLE DEFINITIONS                           ###
###========================================================================================###
####################################
### SEGMENT: PLOTTING FUNCTIONS ###
####################################
###
### 1. setColors: Generates a color map scale using a supplied set of anchor
###      colors. Called by plotMap().
###
###      cols: A set of anchor colors.
###      divs: The number of different colors between each anchor color. Allowed
###            range is 0 to 255.
###
#########################################################################################
###
### 2. mapParams: Sets up projection parameters for plotMap().
###
###      database: Database to use in the maps package
###      boundary: (Logical) Draw boundaries?
###      orientation: Vector of lat, lon, and rotation for centering the map
###      fill: (Logical) Fill polygon interiors?
###      col: Color for polygon outlines/fill; can be a vector
###      xlim: Vector of longitude limits
###      ylim: Vector of latitude limits
###      wrap: (Logical) Wrap polygons that stretch across the screen?
###      bg: Background color
###      border: Percent of plot area to use as white space
###      lat.lines/lon.lines: (Logical) Draw latitude/longitude lines?
###      lat.start/lon.start: Starting latitude/longitude
###      lat.end/lon.end: Ending latitude/longitude
###      lat.col/lon.col: Color for latitude/longitude lines
###      lat.res/lon.res: Resolution for lines
###      type: Draw points or rectangles
###      pch: Plot character to use
###      cex: Size for plot characters
###
#########################################################################################
###
### 3. makeOverlay: Formats a set of points to use as an overlay in plotMap().
###
###      X: Data set
###      coord: Lat, lon, spatial weights, latitude span, and longitude span
###             (latter 3 columns are not used for the Antarctica plots)
###      norm: Normalize data to a maximum of -1/+1?
###      convert: (Logical) Set to TRUE if color scale needs to be assigned
###               to the data
###      pch: Plotting character
###      cex: Plotting character size
###      type: "rect" for rectangles, "points" for points
###      density: Lines per inch for shaded rectangles
###      angle: Angle of fill lines for shaded rectangles
###
#########################################################################################
###
### 4. plotMap: Plots points on a map.
###
###      dat: The data to be plotted
###      coord: Matrix containing latitudes, longitudes, spatial weights (not
###             used), latitude span (not used), and longitude span (not used)
###      rng: Color scale maximum (scale will be made from -rng to +rng)
###      divs: Number of divisions between anchor colors (0 to 255 allowable)
###      cols: Vector of anchor colors
###      na.col: Color to use for NA points
###      m.title: Main plot title
###      units: Text for units to be displayed on the color scale
###      font.size: Factor to expand font size
###      project: Projection type
###      map.params: Output of mapParams()
###      norm: (Logical) Normalize data to a maximum of -1/+1?
###      overlays: (Logical) Are there supplied overlays?
###      Y: Output of makeOverlay()
###
#########################################################################################
###
### 5. mapRectPlot/mapPointPlot/assignColors: Subroutines for plotMap().
###
#########################################################################################
###
### 6. plotSat: Overplots satellite coverage periods.
###
###      max: y position coordinate for satellite labels
###      s: cex setting for text size
###      rt: Text rotation (degrees)
###      type: 1 = numeric, 2 = date
###
#########################################################################################
###
### 7. plotEst: Plots results of paired Wilcoxon/t-tests.
###
###      est.w: Vector of Wilcoxon results
###      est.t: Vector of t-test results
###      rng: Range the test was performed over
###
#########################################################################################
###
### 8. plotTrend: Plots continental averages with trends annotated.
###
###      dat: Time series of anomalies
###      st / en: Start and stop times
###      x.pos / y.pos: X and Y positions for the text annotation
###      main.t: Text for the main plot label
###
#########################################################################################
###
### 9. plotBar: Generic bar plot to sort stations by location and color.
###
###      dat: Input data matrix
###      stns: Vector of station names
###      cols: Vector of colors
###      stat: Row number containing the statistic to be plotted in "dat"
###      y.lb: Y-axis label
###      x.lb: X-axis label
###      main.t: Main title text
###      sub.t: Subtitle text
###      b.mar: Bottom margin width in lines
###      x.cex: Font size for text station labels
###      shadow: Data set to be displayed in gray behind the input data matrix
###      p.ch: Plotting character for line end points
###
#########################################################################################
###
### 10. plotWts: Plots the output of getWts(), which is used to compare
###      imputation spatial structure vs. AVHRR spatial structure.
###
###      dat: The output of getWts()
###      method: "S09" or "E-W"
###      type: Type of plot
###            = 1: Plot the spatial structure of the selected imputation EOF
###            = 2: Plot the spatial structure used to impute a selected PC vs.
###                 the spatial structure from the AVHRR data
###      eof: The imputation EOF to plot
###      pc: The AVHRR PC to plot
###      main.t: Text for a plot title
###
setColors=function (cols, divs) {
### Ensure divs within limits
if (divs < 0) {divs = 0}
if (divs > 255) {divs = 255}
### Get number of anchor points
num.pts = length(cols)
### Start the color map
len = length(cols) - 1
color.map = cols[1]
### Interpolate between anchors
for (i in 1:len) {
### Convert to RGB
col.st = col2rgb(cols[i])
col.en = col2rgb(cols[i + 1])
### Find steps
r = (col.en[1, 1] - col.st[1, 1]) / (divs + 1)
g = (col.en[2, 1] - col.st[2, 1]) / (divs + 1)
b = (col.en[3, 1] - col.st[3, 1]) / (divs + 1)
### Define # of steps
ramp = seq(1:(divs + 1))
### Interpolate
r.ramp = col.st[1, 1] + ramp * r
g.ramp = col.st[2, 1] + ramp * g
b.ramp = col.st[3, 1] + ramp * b
### Concatenate
color.map = c(color.map, rgb(red = r.ramp, green = g.ramp,
blue = b.ramp, maxColorValue = 255))
}
### Return map
color.map
}
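### Example (a sketch, not part of the original script): a blue-white-red
### ramp with 20 interpolated colors per segment. With 3 anchors and
### divs = 20, the map holds 1 + 2 * (20 + 1) = 43 colors.
if (FALSE) {
ramp = setColors(c("blue", "white", "red"), divs = 20)
length(ramp) # 43
barplot(rep(1, length(ramp)), col = ramp, border = NA, space = 0)
}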
mapParams = function (database = "world", boundary = TRUE, parameters =
NULL, orientation = c(-90, 0, 0), fill = FALSE, col = 1, xlim = c(-180,
180), ylim = c(-90, -60), wrap = TRUE, bg = "white", border = c(0.2,
0.2), lat.lines = FALSE, lat.start = -90, lat.end = 90, lat.inc = 10,
lat.col = "darkgray", lat.lty = 3, lat.res = 1, lon.lines = FALSE,
lon.start = -180, lon.end = 180, lon.inc = 45, lon.col = "darkgray",
lon.lty = 3, lon.res = 1, type = "points", pch = 15, cex = 1, lwd = 1) {
### Assemble into list for plotMap()
map.params = list()
map.params[[1]] = database
map.params[[2]] = boundary
map.params[[3]] = parameters
map.params[[4]] = orientation
map.params[[5]] = fill
map.params[[6]] = col
map.params[[7]] = xlim
map.params[[8]] = ylim
map.params[[9]] = wrap
map.params[[10]] = bg
map.params[[11]] = border
map.params[[12]] = lat.lines
map.params[[13]] = c(lat.start, lat.end, lat.inc, lat.col, lat.lty,
lat.res)
map.params[[14]] = lon.lines
map.params[[15]] = c(lon.start, lon.end, lon.inc, lon.col, lon.lty,
lon.res)
map.params[[16]] = type
map.params[[17]] = pch
map.params[[18]] = cex
map.params[[19]] = lwd
map.params
}
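### Example (a sketch): a parameter list for the south-polar view used by
### plotMap(), with grid lines switched on. Element 12 stores lat.lines.
if (FALSE) {
params = mapParams(ylim = c(-90, -60), lat.lines = TRUE, lon.lines = TRUE)
params[[12]] # TRUE: plotMap() will draw latitude lines
}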
makeOverlay = function (X, coord = coords, norm = FALSE, convert = FALSE,
pch = 15, cex = 0.3, lwd = 1, type = "points", density = NULL, angle =
45) {
### Format for read by plotMap()
overlay = list()
overlay$data = X
overlay$char = pch
overlay$size = cex
overlay$lwd = lwd
overlay$lat = coord[, 1]
overlay$lon = coord[, 2]
overlay$lat.span = coord[, 4]
overlay$lon.span = coord[, 5]
overlay$norm = norm
overlay$convert = convert
overlay$type = type
overlay$density = density
overlay$angle = angle
overlay
}
plotMap=function (X, coord = coords, rng = 1, divs=255, cols =
anchor.colors, na.col = "gray", m.title = NA, units = NA, font.size =
0.8, project = "azequidistant", map.params = NA, norm = FALSE, overlays =
FALSE, Y = NA, outline = TRUE) {
library(mapproj)
### Read coordinates
lat = coord[, 1]
lon = coord[, 2]
lat.span = coord[, 4]
lon.span = coord[, 5]
### If no user defined projection parameters, load defaults
if (is.na(map.params[[1]])) {map.params = mapParams()}
### Assign projection parameters
dbase = map.params[[1]]
bound = map.params[[2]]
p.params = map.params[[3]]
orientate = map.params[[4]]
poly.fill = map.params[[5]]
poly.cols = map.params[[6]]
xlims = map.params[[7]]
ylims = map.params[[8]]
wrapper = map.params[[9]]
backgnd = map.params[[10]]
borders = map.params[[11]]
lat.lines = map.params[[12]]
lat.params = map.params[[13]]
lon.lines = map.params[[14]]
lon.params = map.params[[15]]
type = map.params[[16]]
char = map.params[[17]]
char.size = map.params[[18]]
line.w = map.params[[19]]
### Split screen
if (screen() != FALSE) {close.screen(all.screens = TRUE)}
fig = matrix(nrow = 3, ncol = 4)
fig[1, ] = c(0, 1, 0.85, 1)
fig[2, ] = c(0, .85, 0, 0.85)
fig[3, ] = c(.85, 1, 0, 0.85)
split.screen(figs = fig)
screen(2)
### Plot projection
par(mar = c(0, 0, 0, 0) + 0.1)
map(database = dbase, boundary = bound, projection = project,
parameters = p.params, orientation = orientate, fill = poly.fill, col =
poly.cols, xlim = xlims, ylim = ylims, wrap = wrapper, bg = backgnd,
border = borders)
### Generate color scale
map.cols = setColors(cols, divs)
### Convert data to color scale
lims = c(-rng, rng)
temp = assignColors(X, lims, map.cols, norm)
cols = temp[[1]]
na.map = temp[[2]]
### Plot
if (type == "rect") {
temp = mapRectPlot(map.cols[cols], lat, lon, lat.span,
lon.span, dens = NULL, ang = 45)
### Plot NAs
if (sum(na.map) > 0) {
temp = mapRectPlot(rep(na.col, length(lat))[na.map],
lat[na.map], lon[na.map], lat.span[na.map], lon.span[na.map], dens =
NULL, ang = 45)
}
}
### Plot
if (type == "points") {
temp = mapPointPlot(map.cols[cols], lat, lon, char,
char.size, line.w)
### Plot NAs
if (sum(na.map) > 0) {
temp = mapPointPlot(rep(na.col, length(lat))[na.map],
lat[na.map], lon[na.map], char, char.size, line.w)
}
}
### Overlays?
if (overlays == TRUE) {
### Find number of overlays
num.overlays = length(Y)
### Iterate through each overlay
for (i in 1:num.overlays) {
### Extract data
overlay = Y[[i]]
dat = overlay$data
char = overlay$char
size = overlay$size
line.w = overlay$lwd
lat = overlay$lat
lon = overlay$lon
lat.span = overlay$lat.span
lon.span = overlay$lon.span
ovr.norm = overlay$norm = norm
convert = overlay$convert
type = overlay$type
dens = overlay$density
ang = overlay$angle
over.cols=dat
### Convert?
if (convert == TRUE) {
converted = assignColors(dat, lims, map.cols,
norm)
cols = converted[[1]]
over.cols=map.cols[cols]
na.map = converted[[2]]
}
### Plot overlay
if (type == "rect") {
temp = mapRectPlot(over.cols, lat, lon, lat.span,
lon.span, dens, ang)
}
if (type == "points" ) {
temp = mapPointPlot(over.cols, lat, lon, char,
size, line.w)
}
}
}
### Overlay map
if (outline == TRUE) {map(database = dbase, boundary = bound,
projection = project, parameters = p.params, orientation = orientate,
fill = poly.fill, col = poly.cols, xlim = xlims, ylim = ylims, wrap =
wrapper, bg = backgnd, border = borders, add = TRUE)}
### Lat lines?
if (lat.lines == TRUE) {
### Get parameters
start.lat = as.numeric(lat.params[1])
end.lat = as.numeric(lat.params[2])
inc.lat = as.numeric(lat.params[3])
col.lat = lat.params[4]
lty.lat = as.numeric(lat.params[5])
res.lat = as.numeric(lat.params[6])
### Generate line segments
lats = seq(start.lat, end.lat, inc.lat)
x.seq = seq(-180, 180, res.lat)
x.len = length(x.seq)
### Plot
for (i in 1:length(lats)) {
lines(mapproject(list(x = x.seq, y = rep(lats[i],
x.len))), col = col.lat, lty = lty.lat)
}
}
### Lon lines?
if (lon.lines == TRUE) {
### Get parameters
start.lon = as.numeric(lon.params[1])
end.lon = as.numeric(lon.params[2])
inc.lon = as.numeric(lon.params[3])
col.lon = lon.params[4]
lty.lon = as.numeric(lon.params[5])
res.lon = as.numeric(lon.params[6])
### Generate line segments
lons = seq(start.lon, end.lon, inc.lon)
y.seq = seq(-90, 90, res.lon)
y.len = length(y.seq)
### Plot
for (i in 1:length(lons)) {
lines(mapproject(list(x = rep(lons[i], y.len), y =
y.seq)), col = col.lon, lty = lty.lon)
}
}
### Plot color scale
screen(3)
par(mar = c(0, 0, 0, 0) + 0.1)
rect(xleft = 0, xright = 1, ybottom = 0, ytop = 1, col = "white",
border = 0)
lines(x = c(0.45, 0.65), y = c(0.9, 0.9), col = 1)
lines(x = c(0.45, 0.65), y = c(0.1, 0.1), col = 1)
lines(x = c(0.4, 0.65), y = c(0.5, 0.5), col = 1)
yinc = 0.8 / length(map.cols)
yvals = 0.1 + seq(1:length(map.cols)) * yinc
for (i in 1:length(map.cols)) {
rect(xleft = 0.45, xright = 0.6, ytop = yvals[i], ybottom =
yvals[i] - yinc, col = map.cols[i], border = NA)
}
text(x = 0.66, y = 0.5, round(0.5 * (lims[2] + lims[1]), 1), pos =
4, cex = font.size)
text(x = 0.66, y = 0.9, round(lims[2], 1), pos = 4, cex =
font.size)
text(x = 0.66, y = 0.1, round(lims[1], 1), pos = 4, cex =
font.size)
par(srt = 90)
text(x = 0.2, y = 0.5, units, cex=font.size)
### Plot title
if (!is.na(m.title)) {
par(mar = c(0, 0, 0, 0) + 0.1, cex.main = font.size)
screen(1)
title(main = m.title)
}
### Remove split screens for subsequent plotting
close.screen(all = TRUE)
}
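### Example usage (a sketch, not from the original script; requires the maps
### and mapproj packages). "coords" is the grid coordinate matrix used
### throughout this script; "avhrr.anom" is a hypothetical months x 5509
### anomaly matrix standing in for data built elsewhere.
if (FALSE) {
plotMap(avhrr.anom[1, ], coord = coords, rng = 5, units = "Deg C", m.title = "AVHRR Anomalies, Jan 1982", map.params = mapParams(type = "rect"))
}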
mapRectPlot = function (dat, lat, lon, lat.span, lon.span, dens, ang) {
idx = sign(lat * lon) == -1
xy1 = mapproject(lon - (0.5 * lon.span), lat + (0.5 * lat.span))
xy2 = mapproject(lon + (0.5 * lon.span), lat - (0.5 * lat.span))
xy3 = mapproject(lon - (0.5 * lon.span), lat - (0.5 * lat.span))
xy4 = mapproject(lon + (0.5 * lon.span), lat + (0.5 * lat.span))
xy1$x[idx] = xy3$x[idx]
xy1$y[idx] = xy3$y[idx]
xy2$x[idx] = xy4$x[idx]
xy2$y[idx] = xy4$y[idx]
rect(xleft = xy1$x, xright = xy2$x, ybottom = xy1$y, ytop = xy2$y,
col = dat, border = NA, density = dens, angle = ang)
}
mapPointPlot = function (dat, lat, lon, char, size, line.w) {
xy = mapproject(lon, lat)
points(x = xy$x, y = xy$y, col = dat, cex = size, pch = char, lwd =
line.w)
}
assignColors = function (X, lims, map.cols, norm) {
### Determine scale limits
steps = (1 + length(map.cols)) / 2
inc = lims[2] / steps
### Scale data by maximum absolute value?
dat = as.vector(X)
na.map = is.na(dat)
if (norm == TRUE ) {dat = dat / max(abs(dat))}
### Assign colors to data points
cols = trunc(dat / inc)
cols = cols + steps
hi = cols > length(map.cols)
cols[hi] = length(map.cols)
lo = cols < 1
cols[lo] = 1
### Return
temp = list(cols, na.map)
}
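### Example (a sketch): with a 5-color map and lims = c(-1, 1), steps = 3
### and inc = 1/3, so each value maps to trunc(dat / inc) + steps, clamped
### to [1, 5].
if (FALSE) {
assignColors(c(-1, -0.2, 0, 0.5, 2), lims = c(-1, 1), map.cols = setColors(c("blue", "white", "red"), 1), norm = FALSE)[[1]] # 1 3 3 4 5
}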
plotSat = function (max, s, rt, type) {
### Which labels?
if (type == 1) {lbls = noaa} else {lbls = noaa.date}
### Plot lines & legend
for (i in 4:length(noaa)) {
abline(v = lbls[[i]][1], lwd = 1, col = "gray")
text(x = lbls[[i]][1], y = max, labels = noaa.lbls[i], cex =
s, pos = 4, srt = rt)
}
}
plotEst = function (est.w, est.t, rng) {
### Initialize plotting area
plot(est.w ~ I(1:300), ylim = c(-3, 3), xlab = "Months from 1982", ylab = "Normalized to 95% CI", main = paste("Estimate of Difference in Means\nAVHRR vs. Ground Data (Common Points)\n", rng, " Month Range", sep = ""), type = "l", cex.main = .9)
### Define +/- 95% CI range
rect(-100, -1, 400, 1, col = "gray95")
### Plot satellite labels
plotSat(-3, .55, 0, 1)
box()
### Add pretty lines
abline(h = 1, col = 4)
abline(h = -1, col = 4)
abline(h = 0, lwd = 2)
### Replot the points & add the legend
points(est.w ~ I(1:300), type = "l", col = 2, lwd=2)
points(est.t ~ I(1:300), type = "l", lwd = 2)
legend("topleft", legend = c("Paired Wilcoxon Test", "Paired TTest"), col = c(1, 2), fill = c(1, 2), cex = .8, bg = "white")
}
plotTrend = function (x, st = 1957, en = c(2006, 12), y.pos = NA, x.pos =
NA, main.t = "Untitled") {
### Get trend
trend = lm(window(x, st, en) ~ I(time(window(x, st, en))))
### Initialize variables
N = length(window(x, st, en))
I = seq(1:N) / frequency(x)
rmI = ssq(I - mean(I))
### Calculate sum of squared errors
SSE = ssq(trend[[2]]) / (N - 2)
### Calculate OLS standard errors
seOLS = sqrt(SSE / rmI) * 10
### Calculate two-tailed 95% CI
ci95 = seOLS * 1.964
### Get lag-1 ACF
resids = residuals(trend)
r1 = acf(resids, lag.max=1, plot=FALSE)$acf[[2]]
### Calculate CIs with Quenouille (Santer) adjustment for
autocorrelation
Q = (1 - r1) / (1 + r1)
ciQ = ci95 / sqrt(Q)
### Plot data
plot(window(x, st, en), main=main.t, ylab="Anomaly (Deg C)")
### Add trendline and x-axis
abline(h = 0, col = 2)
abline(trend, col = 4, lwd = 2)
### Annotate text
text(x = ifelse(is.na(x.pos), st, x.pos), y = ifelse(is.na(y.pos), max(x, na.rm = TRUE) - 0.2 * max(x, na.rm = TRUE), y.pos), paste("Slope: ", round(trend$coef[2] * 10, 4), "+/-", round(ciQ, 4), "Deg C/Decade\n(corrected for AR(1) serial correlation)\nLag-1 value:", round(r1, 3)), cex = 0.8, col = 4, pos = 4)
### Return
r = list(trend$coef[2] * 10, ciQ)
}
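### Example (a sketch, synthetic data; ssq() is defined in the GENERAL
### FUNCTIONS segment below). A 0.002 Deg C/month drift gives a decadal
### trend near 0.24 Deg C.
if (FALSE) {
set.seed(42)
x = ts(0.002 * (1:600) + rnorm(600), start = 1957, freq = 12)
plotTrend(x, main.t = "Synthetic Series")
}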
plotBar = function (dat, stns, cols, stat, y.lb = NA, x.lb = NA, main.t =
NA, sub.t = NA, b.mar = 8, x.cex = 0.8, sup.line = NA, sup.txt = NA,
shadow = NA, shadow.col = "lightgray", p.ch = 20,
ver.line = FALSE) {
shad = 0
plt.dat = dat
cols = cbind(matrix(stns, ncol = 1), matrix(cols, ncol = 1))
### Check if a shadow present
if (!is.na(shadow[[1]])) {
shad = 1
shadow = sortStats(shadow, cols)
}
### Sort by geographical location
plt.dat = sortStats(plt.dat, cols)
### Initialize the plotting area
par(mar = c(b.mar, 4, 7, 2))
plot(c(1:length(stns)), xaxt = "n", col = "white", ylim = c(-0.3,
0.25 + as.numeric(max(plt.dat[stat + 1, ], 1.1))), xlab = x.lb, main =
main.t, sub = sub.t, ylab = y.lb, type = "h")
abline(h = 0)
abline(h = 0.1 + as.numeric(max(plt.dat[stat + 1, ], 1.1)))
if (ver.line == TRUE) {
abline(h = 0.5, col = "lightgray", lty = 2)
abline(h = 1, col = "lightgreen")
}
axis(side = 1, at = c(1:length(stns)), labels = colnames(plt.dat),
las=2, cex.axis = 0.9, tck = 0.01)
row.s = stat + 1
### Plot the shadow
if (shad == 1) {
points(shadow[stat + 1, ] ~ I(seq(1:ncol(shadow)) - 0.35),
type = "h", lend = 2, col = shadow.col, lwd = 2)
abline(h = 0)
points(shadow[stat + 1, ] ~ I(seq(1:ncol(shadow)) - 0.35),
type = "p", cex = 1, pch = p.ch, lwd = 1, col = shadow.col)
}
### Plot the data
points((plt.dat[stat + 1, ]), type = "h", lend = 2, col =
plt.dat[1, ], main = main.t, lwd = 2)
abline(h=0)
legend("top", legend = c("Peninsula", "West Antarctica", "Ross Ice
Shelf", "East Antarctica"), cex = 1, col = c(1, 2, 3, 4), bg = "white",
pch = 15, ncol = 4, bty = "n")
points((plt.dat[stat + 1, ]), type = "p", cex = 1, pch = p.ch, lwd
= 1)
if (stat != 1) {abline(h = 1, col = "gray")}
plt.dat
}
plotWts = function(dat, method = "S09", type = 1, eof = 1, pc, main.t =
"") {
### Find number of ground station columns
gnd.l = ncol(dat[[2]][[1]]) - 1
### Define pc length
pc.l = 1
### Check for allowed combinations
if (method == "S09") {
idx = 1
pc.l = ncol(dat[[1]][[1]]) - gnd.l
}
if (method == "E-W") {idx = 2}
if (type == 1) {pc = 1}
### Select proper matrix
plt.dat = dat[[idx]][[pc]]
### Find neigs
neigs = nrow(plt.dat) - 5
### Find the y-axis limits for plotting type=2
ylm = c(min(plt.dat[(neigs + 2):(neigs + 3), ], na.rm=TRUE),
max(plt.dat[(neigs + 2):(neigs + 3), ], na.rm = TRUE))
ylm = c(ylm[1], 0.1 * (ylm[2] - ylm[1]) + ylm[2])
### Set up plotting
par(mar = c(8, 4, 4, 2))
### Plot imputation spatial EOF
if (type == 1) {
plot(plt.dat[eof + 1, ], xaxt = "n", col = "white", xlab =
"", ylab = "Imputation Spatial EOF Coefficients (Unitless)", main =
paste(main.t, "\nImputation EOF #", eof, sep = ""))
points(plt.dat[eof + 1, ], col = c(plt.dat[1, 1:gnd.l],
rep("goldenrod", pc.l)), type = "h")
points(plt.dat[eof + 1, ])
abline(h = 0, lwd = 2)
}
### Plot comparative relative contributions for a particular PC
if (type == 2) {
plot(plt.dat[neigs + 2, ], xaxt = "n", col = "white", ylim =
ylm, xlab = "", ylab = "Normalized Spatial EOF Coefficients (Unitless)",
main = paste(main.t, "\nAVHRR PC #", pc, sep = ""))
legend("top", legend=c("RegEM Spatial EOF Coefficient",
"AVHRR Spatial EOF Coefficient"), cex=.8, pch=c(1, 8), bg="white",
ncol=2, bty="n")
abline(h=0, lwd = 2)
abline(h = ylm[2] - 0.05 * (ylm[2] - ylm[1]))
for (i in 1:gnd.l) {lines(x = c(i, i), y = c(plt.dat[neigs +
2, i], plt.dat[neigs + 3, i]), col = plt.dat[1, i])}
points(plt.dat[neigs + 2, 1:gnd.l], col = 1);
points(plt.dat[neigs + 3, 1:gnd.l], pch = 8, lwd = 1.5)
if (idx == 1) {
for (i in (gnd.l + 1):(ncol(plt.dat))) {
y.dat = ifelse((i-gnd.l) == pc, NA, plt.dat[neigs
+ 2, i])
points(x = i, y = y.dat, col = "goldenrod", lwd =
1.5, type = "h")
points(x = i, y = y.dat)
}
}
}
### Label axis
axis(side = 1, at = c(1:ncol(plt.dat)), labels = colnames(plt.dat),
las = 2, cex.axis = .8, tck = .01)
attr(plt.dat, "Info") = paste("Total Contribution and AVHRR
Eigenvector stats are for PC #", pc, sep = "")
plt.dat
}
##########################################################################################
######################           END PLOTTING FUNCTIONS             ######################
##########################################################################################
###################################
### SEGMENT: GENERAL FUNCTIONS ###
###################################
###
### 1. ssq: Calculates the sum of the squares of the input data.
###
#########################################################################################
###
### 2. mScale: Scale/center/add function for matrices.
###
###      X: Input matrix
###      d: Scaling vector / adding vector
###      scale: (Logical) Multiply columns/rows of X by d?
###      center: (Logical) Column/row center X?
###      add: (Logical) Add vector d to columns/rows of X?
###      type: Column or row operations
###            = 1: Column operations
###            = 2: Row operations
###      max.size: x1000, maximum number of elements to process at one time
###
###      NOTE: The order of operations is always centering -> scaling -> adding.
###
#########################################################################################
###
### 3. makeAnom: Convert a matrix or vector to anomalies.
###
###      X: Input matrix or vector
###      st: Baseline start time / row number
###      en: Baseline end time / row number
###      idx: Only use values where idx = TRUE for anomaly calculations
###      max.size: See mScale
###
#########################################################################################
###
### 4. makeSeason: Parse a vector or matrix for seasons.
###
###      X: Input matrix or vector
###      st: Truncate before this start time / row number
###      en: Truncate after this end time / row number
###      season: Which season to parse
###            = "w": Winter
###            = "sp": Spring
###            = "su": Summer
###            = "f": Fall
###      south: (Logical) Southern hemisphere seasons?
###      year.type: Data set is in calendar "cal" or meteorological "met" years
###
#########################################################################################
###
### 5. makeAnnual: Convert a monthly series into annual.
###
###      X: Input matrix or vector
###      cutoff: Least number of monthly data points that must be present to
###              calculate an annual average
###
#########################################################################################
###
### 6. parseAll: Parse data. Returns a list of:
###
###      [[1]]: Cloudmasked AVHRR data
###      [[2]]: Ground station data
###      [[3]]: Metadata for ground data
###      [[4]]: AVHRR grid cell coordinates
###
###      dat: A list generated from "loadData"
###
#########################################################################################
###
### 7. parseSat: Parses an input AVHRR data set for ground station locations.
###
###      dat: Input matrix
###      idx: Metadata to select appropriate grid cells corresponding to ground data
###      form: Select the ground station set; i.e. "form.S09". The set designates
###            stations to be REMOVED.
###
#########################################################################################
###
### 8. parseRandom: Parses a READER set for random withholding.
###
###      dat: Input matrix
###      form: Select the ground station set; i.e. "form.S09". The set designates
###            stations to be REMOVED.
###      withhold: Fraction of actual data points to withhold
###
#########################################################################################
###
### 9. parseStn: Parses a ground data set. Returns a list of:
###
###      [[1]]: Parsed ground data
###      [[2]]: Early calibration set
###      [[3]]: Late calibration set
###
###      dat: Input matrix
###      form: Select the ground station set; i.e. "form.steig". The set designates
###            stations to be REMOVED.
###      form.ver: Select the ground station set; i.e. "form.no.ocean.aws". The set
###                designates those stations for which 1/2 of the data will be
###                removed for split calibration/verification experiments.
###
#########################################################################################
###
### 10. sortStats: Sorts ground station statistics into geographical order for
###      cleaner plotting. Geographical order is column 10 in all.idx, where "1"
###      is Peninsula, "2" is West Antarctica, "3" is Ross Ice Shelf, and "4" is
###      East Antarctica.
###
###      dat: Input matrix
###      idx: Index matrix with the station names in Row 1 and location numbers
###           in Row 2
###
#########################################################################################
###
### 11. loadData: Loads the ground station data, index metafile, AVHRR data,
###      and AVHRR coordinates. Returns a list of:
###
###      [[1]]: Manned station data
###      [[2]]: AWS station data
###      [[3]]: Manned station metafile
###      [[4]]: AWS station metafile
###      [[5]]: Cloudmasked AVHRR data
###      [[6]]: Grid coordinates for the AVHRR data
###
#########################################################################################
###
### 12. downData: Downloads the AVHRR data, the S09 Antarctic reconstruction,
###      AVHRR longitudes, and AVHRR latitudes from Steig's site. Also downloads
###      the MET READER data for all manned and AWS stations. Saves files in the
###      working directory. NOTE: "loadData" must be subsequently used for
###      loading the files into R.
###
#########################################################################################
###
### 13. getReader: Script to scrape BAS' READER site.
###
###      x: Vector of station names to read
###      type: Set to "aws" or "surface"
###
ssq = function (x) {sum(x ^ 2)}
mScale = function (X, d = 1, scale = TRUE, center = FALSE, add = FALSE,
type = 1, na.rm = TRUE, max.size = 500) {
### Convert max.size (command line argument is in thousands)
max.size = max.size * 1000
### Get matrix data
n = nrow(X)
p = ncol(X)
pts = n * p
ts.test = FALSE
if (is.ts(X)) {
ts.st = start(X)
ts.fr = frequency(X)
tsp(X) = NULL
ts.test = TRUE
}
### Check for proper length of d & correct ADD setting
if (length(d) == 1 & type == 1 & (scale == TRUE | add == TRUE)) {d = rep(d, p)}
if (length(d) == 1 & type == 2 & (scale == TRUE | add == TRUE)) {d = rep(d, n)}
if (length(d) != p & type == 1 & (scale == TRUE | add == TRUE)) {warning("Scaling vector length different than number of columns.\nScaling vector recycled.", call. = FALSE, immediate. = TRUE)}
if (length(d) < n & type == 2 & (scale == TRUE | add == TRUE)) {warning("Scaling vector length less than number of rows.\nNAs produced.", call. = FALSE, immediate. = TRUE)}
if (length(d) > n & type == 2 & (scale == TRUE | add == TRUE)) {warning("Scaling vector length greater than number of rows.\nScaling vector truncated.", call. = FALSE, immediate. = TRUE)}
### If matrix too big, break into chunks
if (pts > max.size) {
blocks = ceiling(pts / max.size)
rows = ceiling(n / blocks)
} else {
blocks = 1
rows = n
}
### Do scaling / centering
for (i in 1:blocks) {
### Get start and end points
st = 1 + (i - 1) * rows
en = min(i * rows, n)
### Column centering
if (type == 1 & center == TRUE & en > st) {
cd = colMeans(X, na.rm = na.rm)
s = matrix(cd, ncol = p, nrow = en - st + 1, byrow =
TRUE)
X[st:en, ] = X[st:en, ] - s
rm(cd, s)
}
### Row centering
if (type == 2 & center == TRUE & en > st) {
cd = rowMeans(X, na.rm = na.rm)
s = matrix(cd[st:en], ncol = p, nrow = en - st + 1)
X[st:en, ] = X[st:en, ] - s
rm(cd, s)
}
### Column scaling / adding
if (type == 1 & (scale == TRUE | add == TRUE) & en > st) {
s = matrix(d, ncol = p, nrow = en - st + 1, byrow =
TRUE)
if (scale == TRUE) {X[st:en, ] = X[st:en, ] * s}
if (add == TRUE) {X[st:en, ] = X[st:en, ] + s}
rm(s)
}
### Row scaling / adding
if (type == 2 & (scale == TRUE | add == TRUE) & en > st) {
s = matrix(d[st:en], ncol = p, nrow = en - st + 1)
if (scale == TRUE) {X[st:en, ] = X[st:en, ] * s}
if (add == TRUE) {X[st:en, ] = X[st:en, ] + s}
rm(s)
}
}
### Cleanup & return
if (ts.test == TRUE) {X = ts(X, start = ts.st, freq = ts.fr)}
X
}
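### Example (a sketch): column-center a small matrix, then scale each column
### by d. Operations follow the documented order: center -> scale -> add.
if (FALSE) {
X = matrix(1:6, ncol = 2)
mScale(X, d = c(1, 10), scale = TRUE, center = TRUE, type = 1)
# Columns are centered to zero mean, then multiplied by 1 and 10.
}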
makeAnom = function (X, st = NA, en = NA, idx = NA, max.size = 500) {
### Initialize
vec = FALSE
ts.conv = FALSE
use.idx = FALSE
def.st = st
def.en = en
### If a time series, make sure to restore after conversion to matrix
if (is.ts(X)) {
### If no start/stop dates, use whole series
if (is.na(st)) {st = start(X)}
if (is.na(en)) {en = end(X)}
### Store characteristics
ts.conv = TRUE
def.st = st
def.en = en
fr = frequency(X)
### Convert start/ends to row numbers
if (length(st) == 1) {st = (st[1] * fr) - (time(X)[1] * fr) +
1}
if (length(st) == 2) {st = (st[1] * fr + st[2]) - (time(X)[1]
* fr)}
if (length(en) == 1) {en = (en[1] * fr) - (time(X)[1] * fr) +
1}
if (length(en) == 2) {en = (en[1] * fr + en[2]) - (time(X)[1]
* fr)}
### Remove time series attribute
tsp(X) = NULL
}
### Convert vector to matrix
if (is.null(dim(X))) {
X = matrix(X, ncol = 1)
vec = TRUE
}
### Start/end row numbers?
if (is.na(st)) {st = def.st = 1}
if (is.na(en)) {en = def.en = nrow(X)}
### Index provided?
temp = X
if (!identical(idx, NA)) {
temp[!idx] = NA
use.idx = TRUE
}
### Define mean function
if (ncol(X) == 1) {meanFn = mean} else {meanFn = colMeans}
### Iterate through each month
for (i in 1:12) {
### Find month and subset for baseline
all.mo = seq(from = i, to = nrow(X), by = 12)
baseline = all.mo[all.mo >= st & all.mo <= en]
### Calculate and apply monthly means
mo.mean = meanFn(temp[baseline, ], na.rm = TRUE)
X[all.mo, ] = mScale(as.matrix(X[all.mo, ], ncol = 1), -mo.mean, type = 1, add = TRUE, scale = FALSE, center = FALSE)
}
### Convert data back to input characteristics and return
if (vec == TRUE) {X = as.vector(X)}
if (ts.conv == TRUE) {X = ts(X, start = def.st, freq = fr)}
attr(X, "Baseline start: ") = def.st
attr(X, "Baseline end: ") = def.en
attr(X, "Index used: ") = use.idx
X
}
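### Example (a sketch): remove a seasonal cycle from a synthetic monthly
### series; with st/en omitted, the baseline defaults to the full series.
if (FALSE) {
x = ts(rep(sin(2 * pi * (1:12) / 12), 50) + rnorm(600), start = 1957, freq = 12)
x.anom = makeAnom(x)
attr(x.anom, "Baseline start: ") # 1957 1
}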
makeSeason = function(X, st = NA, en = NA, season = "summer", south =
TRUE, year.type = "cal") {
vec = FALSE; ts.conv = FALSE
### If a time series, make sure to restore after conversion to matrix
if (is.ts(X)) {ts.conv = TRUE}
### Allows anomalies to be calculated for vectors
if (is.null(dim(X))) {
vec = TRUE
if (!is.ts(X)) {X = matrix(X, ncol = 1)}
}
### Set window
if (ts.conv == TRUE) {
if (identical(st, NA)) {st = start(X)}
if (identical(en, NA)) {en = end(X)}
tsp(X) = NULL
} else {
if (is.na(st)) {st = 1}
if (is.na(en)) {en = nrow(X)}
X = matrix(X[st:en, ], ncol = ncol(X))
}
### Define seasons
if (year.type == "met") {
win = c(10:12)
spr = c(1:3)
sum = c(4:6)
fal = c(7:9)
y.text = "Meteorological"
} else {
win = c(1:2, 12)
spr = c(3:5)
sum = c(6:8)
fal = c(9:11)
y.text = "Calendar"
}
if (south == FALSE) {
if (season == "winter" | season == "w" | season == 1) {seas =
win; s.text = "NH Winter"}
if (season == "spring" | season == "sp" | season == 2) {seas
= spr; s.text = "NH Spring"}
if (season == "summer" | season == "su" | season == 3) {seas
= sum; s.text = "NH Summer"}
if (season == "fall" | season == "autumn" | season == "f" |
season == "a" | season == 4) {seas = fal; s.text = "NH Fall"}
} else {
if (season == "winter" | season == "w" | season == 1) {seas =
sum; s.text = "SH Winter"}
if (season == "spring" | season == "sp" | season == 2) {seas
= fal; s.text = "SH Spring"}
if (season == "summer" | season == "su" | season == 3) {seas
= win; s.text = "SH Summer"}
if (season == "fall" | season == "autumn" | season == "f" |
season == "a" | season == 4) {seas = spr; s.text = "SH Fall"}
}
### Parse for season
idx = rep(TRUE, 12)
idx[seas] = FALSE
idx = rep(idx, ceiling(nrow(X) / 12))
length(idx) = nrow(X)
X[idx, ] = NA
### Restore characteristics of input data
if (vec == TRUE) {X = as.vector(X)}
if (ts.conv == TRUE) {X = ts(X, start = st, freq = 12)}
### Return season
attr(X, "Season: ") = s.text
attr(X, "Year Type: ") = y.text
X
}
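### Example (a sketch, using a matrix time series): keep only SH summer
### (Dec-Feb) values; all other months are set to NA.
if (FALSE) {
x = ts(matrix(rnorm(240), ncol = 2), start = 1990, freq = 12)
x.sum = makeSeason(x, season = "su", south = TRUE, year.type = "cal")
attr(x.sum, "Season: ") # "SH Summer"
}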
makeAnnual = function (X, cutoff = 10) {
vec=FALSE; ts.conv=FALSE; mean.f = colMeans; sum.f = colSums
### If a time series, make sure to restore after conversion to matrix
if (is.ts(X)) {st = start(X); ts.conv = TRUE}
### Allows anomalies to be calculated for vectors
if (is.null(dim(X))) {X = as.matrix(X, ncol = 1); vec = TRUE;
mean.f = mean; sum.f = sum}
### Check if input data was a 1-column matrix
if (ncol(X) == 1) {mean.f = mean; sum.f = sum}
### Run if data has complete years
len = nrow(X)
if (len / 12 == trunc(len / 12)) {
### Set up new storage
Y = matrix(ncol = ncol(X), nrow = len / 12)
### Get each year's mean; replace with NAs if months < cutoff
for (i in 1:(len / 12)) {
idx = seq(1:12) + (12 * (i - 1))
complete = sum.f(!is.na(X[idx, ])) < cutoff
Y[i, ] = mean.f(X[idx, ], na.rm = TRUE)
Y[i, complete] = NA
}
### Replace NaNs with NAs
idx = is.nan(Y)
Y[idx] = NA
### Restore original characteristics of input data
if (vec == TRUE) {Y = as.vector(Y)}
if (ts.conv == TRUE) {Y = ts(Y, start = st, freq = 1)}
} else {
### Oops! Whole years not supplied
Y=NA
message("Input matrix not in whole years.
performed.")
}
###
Y
No calculation
Return
}
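### Example (a sketch): annual means from a 20-year monthly series; years
### with fewer than "cutoff" months of data return NA.
if (FALSE) {
x = ts(rnorm(240), start = 1990, freq = 12)
x[5:20] = NA # leaves 1990 and 1991 with fewer than 10 months
makeAnnual(x, cutoff = 10) # first two years are NA
}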
parseAll=function(dat = l.data) {
### Initialize list
parsed = list()
### Combine and convert ground to
### Kelvin
ground = ts.union(dat[[1]], dat[[2]]) + 273.15
### Return list
parse = list(dat[[3]], ground, dat[[4]])
}
parseSat=function(dat, idx, form = 0) {
if (sum(form) != 0) {
idx = idx[-form, ]
dat = dat[, idx[, 4]]
}
dat
}
parseRandom = function(dat, form, withhold = 0.05) {
### Get form
dat = dat[, -form.list[[form]]]
### Random withholding for imputation experiments
keep = 1 - withhold
rand = matrix(rnorm(dat), ncol = ncol(dat), nrow = nrow(dat))
ver = dat
ver[rand > qnorm(keep)] = NA
### Return sets
sets = list(dat, ver)
}
parseStn=function(dat, form = 0, ver.form = 0) {
### Initialize variables
early = late = base = dat
all.stns = seq(1:nrow(all.idx))
idx = matrix(FALSE, ncol = nrow(all.idx))
idx[, ver.form] = TRUE
### Ensure stations are selected
if (sum(ver.form) != 0) {
### Iterate station-by-station in ver.form
for (i in 1:length(ver.form)) {
### Find length of data and total record length
temp = dat[, ver.form[i]]
l1 = l2 = trunc(0.5 *
(length(na.omit(as.vector(temp)))))
l3 = length(temp)
### Iterate by point. For late calibration, the
### earliest 1/2 of points are removed. Vice-versa
### for early calibration. Record removed points
### in "idx".
for (j in 1:l3) {
if (!is.na(temp[j]) & l1 > 0) {late[j,
ver.form[i]] = NA; l1 = l1 - 1}
if (!is.na(temp[l3 + 1 - j]) & l2 > 0) {early[l3 +
1 - j, ver.form[i]] = NA; l2 = l2 - 1}
}
}
}
### Parse data for desired stations and record indices
if (sum(form) != 0) {
base = dat[, -form]
early = early[, -form]
late = late[, -form]
idx = matrix(idx[, -form], nrow = 1)
idx = col(idx)[idx == TRUE]
stns = intersect(all.stns[-form], ver.form)
}
### Return list
ver = list(base, early, late, stns, idx)
}
sortStats = function(dat, idx) {
### Set up temporary matrix
temp = matrix(ncol =ncol(dat), nrow = nrow(dat) + 1)
temp[2:(nrow(dat) + 1), ] = dat
### Add color codes for geographical location as row 1
temp[1, ] = as.numeric(idx[, 2])
### Assign names
colnames(temp) = idx[, 1]
rownames(temp) = c("Geographical Group", rownames(dat))
### Remove any stations with NAs
rmv.nas = !is.na(temp[2, ])
temp = temp[, rmv.nas]
### Sort by geographical location
sort.vec = order(temp[1, ])
sorted = temp[, sort.vec]
sorted
}
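### Example (a sketch): statistics for three hypothetical stations, sorted
### by the numeric group code in the second column of idx.
if (FALSE) {
dat = matrix(c(1.2, 0.3, 0.8), nrow = 1, dimnames = list("Trend", NULL))
idx = cbind(c("StnA", "StnB", "StnC"), c(4, 1, 2))
sortStats(dat, idx) # columns reordered to StnB, StnC, StnA
}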
loadData = function() {
### Load ground data
load("manned.RData")
load("aws.RData")
### Load S09 ground data
grid = scan("READER_groundstations.txt", n = -1)
gnd.S09 = ts(t(array(grid, dim = c(42, 600))), start = 1957, freq =
12)
rm(grid)
### Load AVHRR data
grid = scan("cloudmaskedavhrr.txt", n = -1)
avhrr = ts(t(array(grid, dim = c(5509, 300))), start = 1982,
freq=12)
rm(grid)
### Load longitude/latitude coordinates
lon = read.table("tir_lons.txt")
lat = read.table("tir_lats.txt")
sat.coord = cbind(lon,lat)
### Load S09 recon
grid = scan("ant_recon.txt", n = -1)
recon.S09 = ts(t(array(grid, dim = c(5509, 600))), start = 1957,
freq = 12)
rm(grid)
l.data = list(all.manned, all.aws, avhrr, sat.coord, recon.S09,
gnd.S09)
l.data
}
downData = function() {
### Download and save S09 data
download.file("http://faculty.washington.edu/steig/nature09data/dat
a/cloudmaskedAVHRR.txt", "cloudmaskedavhrr.txt", mode = "wb")
download.file("http://faculty.washington.edu/steig/nature09data/dat
a/READER_groundstations.txt", "READER_groundstations.txt", mode = "wb")
download.file("http://faculty.washington.edu/steig/nature09data/dat
a/ant_recon.txt", "ant_recon.txt", mode = "wb")
download.file("http://faculty.washington.edu/steig/nature09data/dat
a/Tir_lons.txt", "Tir_lons.txt", mode = "wb")
download.file("http://faculty.washington.edu/steig/nature09data/dat
a/Tir_lats.txt", "Tir_lats.txt", mode = "wb")
### Get manned station data from MET READER
reader.names = all.idx[1:44, 1]
all.manned = getReader(x=reader.names, type = "surface")
all.manned = window(all.manned, start = 1957, end = c(2009, 12))
save(all.manned, file = "manned.RData")
### Get AWS data from MET READER
reader.names = all.idx[45:109, 1]
all.aws = getReader(x=reader.names, type = "aws")
all.aws = window(all.aws, start=1957, end = c(2009, 12))
save(all.aws, file = "aws.RData")
}
getReader = function(x = reader.names, type = "surface") {
### Set up dummy time series
reader.data = ts(matrix(NA, ncol = 1, nrow = 800), start = 1957,
freq = 12)
for (i in 1:length(x)) {
### Download READER data
data.url = paste("http://www.antarctica.ac.uk/met/READER/",
type, "/", x[i], ".All.temperature.txt", sep = "")
temp = read.table(data.url, sep = "", skip=1, na.strings = "", stringsAsFactors = FALSE)
### Make a logical array to throw away pre-1957
datemap = (temp[, 1] > 1956)
temp = temp[datemap, ]
### Get earliest date
startdate = temp[1,1]
### Stack to a single column and make a time series
temp = ts(matrix(t(temp[,-1]), ncol = 1), start = startdate,
freq = 12)
### Place data into reader.data
reader.data = ts.union(reader.data, temp)
}
### Remove dummy column and return
reader.data = reader.data[, -1]
colnames(reader.data) = x
reader.data
}
##########################################################################################
######################            END GENERAL FUNCTIONS             ######################
##########################################################################################
########################################
### SEGMENT: VERIFICATION FUNCTIONS ###
########################################
###
### 1. getStats: Calculates calibration period rms error, average explained
###      variance, and correlation coefficient. Calculates verification period
###      rms error, RE, CE, and correlation coefficient. If no calibration
###      period map is supplied, it assumes all values are verification, and
###      does not calculate RE or calibration statistics.
###
###      orig: Vector/matrix of original data
###      est: Vector/matrix of estimated data
###      cal.idx: Logical map of calibration points
###
#########################################################################################
###
### 2. corFn: Calculates correlation coefficient between ordered pairs of
###      variables in a matrix - much faster than diag(cor(X, Y)).
###
###      X: First set of variables
###      Y: Second set of variables
###      cols: (Logical) If TRUE, variables are arranged in columns; otherwise,
###            variables are arranged in rows
###
#########################################################################################
###
### 3. lmFn: Performs linear regression on each variable in a matrix - much
###      faster than lm().
###
###      Calculates: Regression coefficient, intercept, residual SE, residual
###                  lag-1 ACF, DoF adjustment for serial correlation, corrected
###                  SE, 2-tailed significance values, and any desired set of
###                  confidence intervals
###
###      Also outputs the fit line and the residuals
###
###      X: Independent variable (if a vector, will be recycled against each
###         variable in Y)
###      Y: Dependent variables
###      cols: (Logical) If TRUE, variables are arranged in columns
###      full: (Logical) If FALSE, return only coefficient and intercept
###      ci: Vector of confidence intervals - i.e., c(0.95, 0.99, 0.995) - to
###          calculate
###      sig: Returns significance calculations
###      adj.DoF: (Logical) If TRUE, adjusts degrees-of-freedom for serial
###               correlation of the residuals
###
#########################################################################################
###
### 4. getVerStns: Obtains the on-grid stations not used for a particular form.
###
###      Y: Ground station set
###      form: Ground station clipping vector
###      form.list: List containing all ground station clipping vectors
###      ver: List containing all verification station clipping vectors
###
#########################################################################################
###
### 5. getOffsets: Performs moving Wilcoxon and t-tests between the averages of
###      the sat and ground anomalies. Calculates offsets based on definitions
###      supplied in "sat.cal.matrix". Offsets are based on satellite coverage
###      periods.
###
###      dat: Satellite data at ground locations. MUST BE A TIME SERIES - RAW DATA.
###      base: Ground data. MUST BE A TIME SERIES - RAW DATA (not anomalies).
###      rng: Range for the moving Wilcoxon and t-tests.
###      st: Start date.
###      en: End date.
###
#########################################################################################
###
### 6. getWts: Gets the spatial structure used for imputation vs. the AVHRR
###      spatial structure for reconstructions done using the infilling
###      algorithm. Calculates three different scenarios (the user must select
###      the scenario that was actually used when the user goes to plot; the
###      script has no way of knowing how the PCs were infilled).
###
###      X: Ground station matrix
###      Y: PC matrix
###      form: Ground station clipping vector used
###      neigs: Number of imputation EOFs used to generate Y
###      alpha: Scaling factor used to generate Y
###      DoFL: Lost degrees of freedom assumed when generating Y
###
#########################################################################################
###
### 7. gCirc: Computes great-circle distance between provided lat/lons using
###      the Vincenty formula for spheres.
###
###      lat.1: Origin latitude
###      lon.1: Origin longitude
###      lat.2: Destination latitude (may be a vector)
###      lon.2: Destination longitude (may be a vector)
###      r: Radius of the sphere
###
#########################################################################################
getStats = function(orig, est, cal.idx = NA) {
### Only use overlapping points
na.idx = is.na(orig) | is.na(est)
orig[na.idx] = NA
est[na.idx] = NA
### Matrix or vectors?
if (is.null(dim(orig))) {orig = matrix(orig, ncol = 1)}
if (is.null(dim(est))) {est = matrix(est, ncol = 1)}
if (!identical(cal.idx, NA) & is.null(dim(cal.idx))) {cal.idx =
matrix(cal.idx, ncol = 1)}
if (!identical(dim(orig), dim(est)) | (!identical(dim(orig), dim(cal.idx)) & !identical(cal.idx, NA))) {stop("Nonconformable arguments.")}
if (is.null(colnames(orig))) {colnames(orig) = c(1:ncol(orig))}
### Dummy for cal stats (in case no calibration period)
cal.stats = NA
### # of columns
nc = ncol(orig)
### Calibration period?
if (!identical(cal.idx, NA)) {
### Ensure only overlapping points used
cal.idx[na.idx] = FALSE
### Center baselines on calibration period
orig.temp = orig
orig.temp[!cal.idx] = NA
orig = mScale(orig, -apply(orig, 2, mean, na.rm = TRUE), type
= 1, add = TRUE, scale = FALSE, center = FALSE)
est.temp = est
est.temp[!cal.idx] = NA
est = mScale(est, -apply(est, 2, mean, na.rm = TRUE), type =
1, add = TRUE, scale = FALSE, center = FALSE)
### Store calibration data
orig.cal = orig
est.cal = est
orig.cal[!cal.idx] = NA
est.cal[!cal.idx] = NA
### Set up storage
cal.stats = matrix(nrow = 3, ncol = ncol(orig) + 2)
colnames(cal.stats) = c(colnames(orig), "Station Mean",
"Pointwise Mean")
rownames(cal.stats) = c("RMS Error", "Explained Variance",
"Cor. Coef.")
attr(cal.stats, "Type: ") = "Calibration Period"
### Calculate rms error
resids = (orig.cal - est.cal) ^ 2
n.eff = colSums(!is.na(resids)) - 1
rms = sqrt(colSums(resids, na.rm = TRUE) / n.eff)
stn.rms = mean(rms, na.rm = TRUE)
point.rms = sqrt(sum(resids, na.rm = TRUE) /
(sum(!is.na(resids)) - 1))
cal.stats[1, ] = c(rms, stn.rms, point.rms)
### Calculate average explained variance
den = orig.cal ^ 2
R2c = 1 - colSums(resids, na.rm = TRUE) / colSums(den, na.rm
= TRUE)
stn.R2c = mean(R2c, na.rm = TRUE)
point.R2c = 1 - sum(resids, na.rm = TRUE) / sum(den, na.rm =
TRUE)
cal.stats[2, ] = c(R2c, stn.R2c, point.R2c)
### Calculate correlation coefficient
r = corFn(orig.cal, est.cal)
stn.r = mean(r, na.rm = TRUE)
o = matrix(as.vector(orig.cal), nrow = 1)
e = matrix(as.vector(est.cal), nrow = 1)
point.r = corFn(o, e, cols = FALSE)
cal.stats[3, ] = c(r, stn.r, point.r)
### Set up for verification period
orig[cal.idx] = NA
est[cal.idx] = NA
}
### Verification period
if (identical(cal.idx, NA)) {
### Center
orig = mScale(orig, -apply(orig, 2, mean, na.rm = TRUE), type
= 1, add = TRUE, scale = FALSE, center = FALSE)
est = mScale(est, -apply(est, 2, mean, na.rm = TRUE), type =
1, add = TRUE, scale = FALSE, center = FALSE)
}
### Set up storage
ver.stats = matrix(nrow = 4, ncol = ncol(orig) + 2)
colnames(ver.stats) = c(colnames(orig), "Station Mean", "Pointwise Mean")
rownames(ver.stats) = c("RMS Error", "RE", "CE", "Cor. Coef.")
attr(ver.stats, "Type: ") = "Verification Period"
### Calculate rms error
resids = matrix((orig - est) ^ 2, ncol = nc)
n.eff = colSums(!is.na(resids)) - 1
n.eff[n.eff < 1] = NA
rms = sqrt(colSums(resids, na.rm = TRUE) / n.eff)
stn.rms = mean(rms, na.rm = TRUE)
point.rms = sqrt(sum(resids, na.rm = TRUE) / (sum(!is.na(resids)) - 1))
ver.stats[1, ] = c(rms, stn.rms, point.rms)
### Calculate RE
if (!identical(cal.idx, NA)) {
den = orig ^ 2
RE = 1 - colSums(resids, na.rm = TRUE) / colSums(den, na.rm =
TRUE)
stn.RE = mean(RE, na.rm = TRUE)
point.RE = 1 - sum(resids, na.rm = TRUE) / sum(den, na.rm =
TRUE)
ver.stats[2, ] = c(RE, stn.RE, point.RE)
} else {
ver.stats[2, ] = NA
}
### Calculate CE
den = mScale(orig, -apply(orig, 2, mean, na.rm = TRUE), type = 1,
add = TRUE, scale = FALSE, center = FALSE) ^ 2
den = colSums(den, na.rm = TRUE)
den[den == 0] = NA
CE = 1 - colSums(resids, na.rm = TRUE) / den
stn.CE = mean(CE, na.rm = TRUE)
point.CE = 1 - sum(resids, na.rm = TRUE) / sum(den, na.rm = TRUE)
ver.stats[3, ] = c(CE, stn.CE, point.CE)
### Calculate correlation coefficient
r = corFn(orig, est)
stn.r = mean(r, na.rm = TRUE)
o = matrix(as.vector(orig), nrow = 1)
e = matrix(as.vector(est), nrow = 1)
point.r = corFn(o, e, cols = FALSE)
ver.stats[4, ] = c(r, stn.r, point.r)
### Return
all.stats = list(cal.stats, ver.stats)
all.stats
}
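### Example (a sketch): verification-only statistics. With no calibration
### index supplied, cal stats are skipped and the RE row is NA.
if (FALSE) {
set.seed(3)
orig = matrix(rnorm(200), ncol = 2)
est = orig + 0.5 * matrix(rnorm(200), ncol = 2)
getStats(orig, est)[[2]] # RMS error, RE (NA), CE, Cor. Coef.
}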
corFn = function (X, Y, cols = TRUE) {
### Compare existing points only
map = is.na(X) | is.na(Y)
X[map] = NA
Y[map] = NA
### Variables in columns
if (cols == TRUE) {
X = t(X) - colMeans(X, na.rm = TRUE)
Y = t(Y) - colMeans(Y, na.rm = TRUE)
### Variables in rows
} else {
X = X - rowMeans(X, na.rm = TRUE)
Y = Y - rowMeans(Y, na.rm = TRUE)
}
### Calculate
r = rowSums(X * Y, na.rm = TRUE) / sqrt(rowSums(X ^ 2, na.rm =
TRUE) * rowSums(Y ^ 2, na.rm = TRUE))
### Return
r
}
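### Example (a sketch): corFn() matches diag(cor(X, Y)) on complete data.
if (FALSE) {
set.seed(1)
X = matrix(rnorm(300), ncol = 3)
Y = matrix(rnorm(300), ncol = 3)
all.equal(as.vector(corFn(X, Y)), diag(cor(X, Y))) # TRUE
}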
lmFn = function (X, Y, cols = TRUE, full = FALSE, ci = NA, sig = FALSE,
adj.DoF = TRUE) {
### Time series?
### Check for problems
ts.prob = 0
if (is.ts(X) & is.ts(Y)) {
ts.prob = -1
if (!identical(start(X), start(Y)) | !identical(end(X),
end(Y)) | !identical(frequency(X), frequency(Y))) {ts.prob = 1}
}
if (is.ts(X) & !is.ts(Y)) {ts.prob = 1}
if (ts.prob == 1) {stop("Time series problems. Start/end times or frequencies do not match (or Y is not a time series).")}
### Save ts data and get rid of ts attribute
if (ts.prob == -1) {
st = start(Y)
en = end(Y)
fr = frequency(Y)
tsp(X) = NULL
tsp(Y) = NULL
}
### Check for dimensions
Y.rows = X.rows = 1
if (is.matrix(Y) & cols == TRUE) {
Y = t(Y)
Y.rows = nrow(Y)
}
if (is.matrix(Y) & cols == FALSE) {
Y.rows = nrow(Y)
}
if (is.matrix(X) & cols == TRUE) {
X = t(X)
X.rows = nrow(X)
}
if (is.matrix(X) & cols == FALSE) {
X.rows = nrow(X)
}
### Find max dimension
rows = max(Y.rows, X.rows)
### Vectors?
if (is.vector(Y)) {Y = matrix(rep(Y, rows), nrow = rows, byrow =
TRUE)}
if (is.vector(X)) {X = matrix(rep(X, rows), nrow = rows, byrow =
TRUE)}
### Compare existing points only
map = is.na(X) | is.na(Y)
X[map] = NA
Y[map] = NA
### Find means & center
Y.mean = rowMeans(Y, na.rm = TRUE)
X.mean = rowMeans(X, na.rm = TRUE)
Y = Y - Y.mean
X = X - X.mean
### Calculate coefficients
n = ncol(X)
b1 = ((1 / n) * rowSums(X * Y, na.rm = TRUE)) / ((1 / n) *
rowSums(X ^ 2, na.rm = TRUE))
b2 = Y.mean - b1 * X.mean
### All standard calculations?
if (full == TRUE | !identical(ci, NA) | sig == TRUE) {
### Find fitted values
fitted = b1 * X + Y.mean
### Find residuals
resids = Y - (X * b1)
### Find SE of the residuals
resid.SE = sqrt(rowSums(resids ^ 2, na.rm = TRUE) / (ncol(Y)
- 2))
### Find SE of the fit
sd.X = sqrt(rowSums(X ^ 2, na.rm = TRUE) / (ncol(Y) - 1))
SE = resid.SE / (sd.X * sqrt(ncol(Y) - 1))
### Adjust DoF for serial correlation?
if (adj.DoF == TRUE) {
### Find the lag-1 ACF
r = corFn(resids[, -1], resids[, -ncol(Y)], cols =
FALSE)
### Apply DoF correction
Q = sqrt((1 - r) / (1 + r))
SE = SE / Q
} else {
r = Q = NA
}
### Time series?
if (ts.prob == -1) {
fitted = ts(t(fitted), start = st, freq = fr)
resids = ts(t(resids), start = st, freq = fr)
}
} else {
fitted = resids = resid.SE = SE = r = Q = NA
}
### Confidence intervals?
if (!identical(ci, NA)) {
SE.mat = matrix(rep(SE, length(ci)), nrow = rows, byrow =
TRUE)
ci = SE.mat * -qnorm(0.5 * (1 - ci))
} else {
ci = NA
}
### Significance calculations (TWO-TAILED)
if (sig == TRUE) {
p = 2 * (1 - pnorm(abs(b1) / SE))
} else {
p = NA
}
### Return
lm = list()
lm$coef = b1
lm$int = b2
lm$resid.SE = resid.SE
lm$resid.acf = r
lm$dof.adj = Q
lm$SE = SE
lm$resids = resids
lm$fitted = fitted
lm$ci = ci
lm$p = p
lm
}
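### Example (a sketch): regress two columns of Y on the same x at once;
### coefficients match coef(lm(Y ~ x)).
if (FALSE) {
set.seed(2)
x = 1:100
Y = cbind(0.5 * x + rnorm(100), -0.2 * x + rnorm(100))
fit = lmFn(x, Y, cols = TRUE, full = TRUE, ci = 0.95, sig = TRUE)
fit$coef # near 0.5 and -0.2
}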
getVerStns = function(Y, form = 11, forms = form.list, ver = form.ver) {
### Obtain verification experiment station complement
### (all stations in vector "idx" withheld)
idx = rep(FALSE, nrow(all.idx))
idx[ver[[form]]] = TRUE
idx = matrix(idx[-forms[[form]]], nrow=1)
idx = col(idx)[idx == TRUE]
Y.ver = ts(Y[, -idx], start = start(Y), freq = frequency(Y))
### Find stations that were entirely withheld from the ground station imputation
y = x = rep(TRUE, nrow(all.idx))
y[form.grid] = FALSE
x[forms[[form]]] = FALSE
z = as.matrix(x!=y)
form.withheld = row(z)[z != TRUE]
### Return list
ver.list = list(Y.ver, idx, form.withheld)
}
getOffsets = function(dat, base, rng, st=1982, en=c(2006,12)) {
### Calculate anomalies using only common points
base = window(base, st, en)
dat = window(dat, st, en)
idx = !is.na(base) & !is.na(dat)
base.anom = makeAnom(base, idx = idx)
dat.anom = makeAnom(dat, idx = idx)
### Get averages
dat.m = rowMeans(dat.anom, na.rm=TRUE)
base.m = rowMeans(base.anom, na.rm=TRUE)
### Initialize variables
w.error = vector()
t.error = vector()
w.sat = list()
t.sat = list()
w.fac = vector()
t.fac = vector()
for (i in 1:length(dat.m)) {
### Get the start and stop points for the test
st = i - trunc(0.5 * rng)
en = i + trunc(0.5 * rng)
if (st > 0 & en < length(dat.m)) {
### If start and stop do not exceed series length, calculate
est.means = wilcox.test(dat.m[st:en], base.m[st:en],
conf.int = TRUE, paired = TRUE)
w.error[i] = as.vector(est.means$est) /
(as.vector(est.means$est) - as.vector(est.means$conf.int[1]))
est.means = t.test(dat.m[st:en], base.m[st:en], paired =
TRUE)
t.error[i] = as.vector(est.means$est) /
(as.vector(est.means$est) - as.vector(est.means$conf.int[1]))
} else {
### If not, fill in NAs
w.error[i] = NA; t.error[i] = NA
}
}
offsets = rep(0, length(dat.m))
for (i in 1:nrow(sat.cal.matrix)) {
type = sat.cal.matrix[i, 1]
st = sat.cal.matrix[i, 2]
en = sat.cal.matrix[i, 3]
st2 = sat.cal.matrix[i, 4]
en2 = sat.cal.matrix[i, 5]
### Check to ensure valid regression model entered. Default
### to t-test if not.
if (type != "lm" & type != "rlm" & type != "loess" & type != "t-test") {
print(paste("Invalid type entered (", type, "). Using t-test.", sep = ""), quote = FALSE)
type = "t-test"
}
### Linear regression
if(type == "lm") {
fit = lm(base.m[st:en] - dat.m[st:en] ~ I(st:en))
temp.pred = fit$fitted.values + dat.m[st:en]
}
### Robust linear regression
if(type == "rlm") {
fit = rlm(base.m[st:en] - dat.m[st:en] ~ I(st:en))
temp.pred = fit$fitted.values + dat.m[st:en]
}
### Loess (only allows polynomial degree to be 1 and uses full
### span because the fit is extrapolated)
if(type == "loess") {
fit = loess(base.m[st:en] ~ dat.m[st:en], degree = 1,
span = 1, model = TRUE, surface = "direct")
temp.pred = predict(fit, dat.m[st2:en2])
}
### Simple offset
if(type == "t-test") {
fit = t.test(base.m[st:en], dat.m[st:en], paired =
TRUE)$est
temp.pred = as.numeric(fit) + dat.m[st:en]
}
offsets[st:en] = dat.m[st:en ]- temp.pred
}
### Return the list
err = list(w.error, t.error, offsets)
}
getWts = function (X, Y, form, neigs = 3, alpha = 10, DoFL = 1) {
### Set up index
idx = all.idx[-form.list[[form]], ]
### Extract AVHRR eigenvector coefficients
svd.avhrr = getPcs(all.avhrr, 1982, c(2006, 12), form, FALSE, DoFL)
coefs = svd.avhrr[[9]]
wts = svd.avhrr[[9]] * alpha
PCs = svd.avhrr[[1]]
rm(svd.avhrr)
### Prepare results matrix
wts.matrix = matrix(nrow = neigs + 5, ncol = ncol(X) + ncol(Y))
wts.matrix[1, ] = c(as.numeric(idx[, 10]), rep(9999, ncol(Y)))
colnames(wts.matrix) = c(as.character(as.vector(idx[, 1])),
paste("AVHRR PC ", seq(1:ncol(Y)), sep = ""))
rownames(wts.matrix) = c("Geographic Location", paste("Imputation EOF ", seq(1:neigs), sep = ""), "Imputation Coefs", "AVHRR Coefs", "Imputation Signs", "AVHRR Signs")
wts.matrix.individual = wts.matrix[, -((ncol(X) + 2):(ncol(X) + ncol(Y)))]
### Prepare results list
wts.list = list()
### S09 Method:
wts.list[[1]] = list()
for (i in 1:ncol(Y)) {
wts.list[[1]][[i]] = wts.matrix
attr(wts.list[[1]][[i]], "Method") = "S09"
attr(wts.list[[1]][[i]], "PC") = i
}
attr(wts.list[[1]], "Method") = "S09"
### Eigenvector-weighted method:
wts.list[[2]] = list()
for(i in 1:ncol(Y)) {
wts.list[[2]][[i]] = wts.matrix.individual
colnames(wts.list[[2]][[i]]) = c(as.character(as.vector(idx[,
1])), paste("AVHRR PC ", i, sep = ""))
attr(wts.list[[2]][[i]], "Method") = "Individual, Weighted"
attr(wts.list[[2]][[i]], "PC") = i
}
attr(wts.list[[2]], "Method") = "Eigenvector-Weighted"
### Prepare comparison PC matrix (for sign detection)
pc.matrix = matrix(nrow = nrow(Y), ncol = ncol(Y))
colnames(pc.matrix) = paste("PC ", seq(1:ncol(Y)), sep = "")
### Scale PCs to unit variance
DY = sqrt((nrow(Y) - 1) / colSums(Y ^ 2))
Yo = mScale(Y, DY, type = 1, scale = TRUE, add = FALSE, center =
FALSE)
### Scale ground data
DX = sqrt((nrow(X) - 1) / colSums(X ^ 2))
Xo = mScale(X, DX, type = 1, scale = TRUE, add = FALSE, center =
FALSE)
### S09: Simultaneous
for (i in 1:ncol(Y)) {
### Collate ground data and PCs
dat = cbind(Xo, Yo)
dat = mScale(dat, -colMeans(dat, na.rm = TRUE), type = 1, add
= TRUE, scale = FALSE, center = FALSE)
### Take SVD
svd.dat = svd(t(dat), nu = neigs, nv = neigs)
### Store resulting spatial structure
for (j in 1:neigs) {
for (k in 1:ncol(Y)) {
wts.list[[1]][[k]][j + 1, ] = svd.dat$u[, j] *
svd.dat$d[j]
}
}
### Reconstitute
new.X = t(svd.dat$u %*% diag(svd.dat$d[1:neigs]) %*%
t(svd.dat$v))
### Store for sign comparison
pc.matrix = new.X[, (ncol(X) + 1):(ncol(X) + ncol(Y))]
### Calculate signs
s = vector()
for (j in 1:ncol(Y)) {
s[j] = as.numeric(sign(cor.test(Y[1:300, j],
pc.matrix[1:300, j])$estimate))
}
### Save signs
wts.list[[1]][[i]][neigs + 4, ] = c(rep(NA, ncol(X)), s)
### Get AVHRR signs & store coefficients
wts.list[[1]][[i]][neigs + 3, ] = c(coefs[, i], rep(NA,
ncol(Y)))
stdev = sd(wts.list[[1]][[i]][neigs + 3, 1:ncol(X)])
wts.list[[1]][[i]][neigs + 3, ] = wts.list[[1]][[i]][neigs +
3, ] / (stdev)
s = as.numeric(sign(cor.test(Y[301:600, i], PCs[1:300,
i])$estimate))
wts.list[[1]][[i]][neigs + 5, ncol(X) + i] = s
}
### Eigenvector-weighted
for (i in 1:ncol(Y)) {
### Collate
dat = cbind(wts[, i] * sign(wts[, i]) * Xo, Yo[, i])
### Take SVD
svd.dat = svd(t(dat), nu = neigs, nv = neigs)
### Store resulting spatial structure
for (j in 1:neigs) {wts.list[[2]][[i]][j + 1, ] = svd.dat$u[,
j] * svd.dat$d[j]}
### Reconstitute
new.X = t(svd.dat$u %*% diag(svd.dat$d[1:neigs]) %*%
t(svd.dat$v))
### Calculate sign
s = as.numeric(sign(cor.test(Y[1:300, i], new.X[1:300,
ncol(X)+1])$estimate))
### Save sign
wts.list[[2]][[i]][neigs + 4, ncol(X) + 1] = s
### Get AVHRR signs & store coefficients
wts.list[[2]][[i]][neigs + 3, ] = c(coefs[, i], NA)
stdev = sd(wts.list[[2]][[i]][neigs + 3, 1:ncol(X)])
wts.list[[2]][[i]][neigs + 3, ] = wts.list[[2]][[i]][neigs +
3, ] / (stdev)
s = as.numeric(sign(cor.test(Y[301:600, i], PCs[1:300,
i])$estimate))
wts.list[[2]][[i]][neigs + 5, ncol(X) + 1] = s
}
### Calculate imputation coefficients
for (i in 1:ncol(Y)) {
### Get relative contribution of each imputation EOF for selected PC
pc.wts1 = wts.list[[1]][[i]][2:(neigs + 1), ncol(X) + i]
pc.wts2 = wts.list[[2]][[i]][2:(neigs + 1), ncol(X) + 1]
### Find relative contribution of each series
for (j in 1:ncol(X)) {
wts.list[[1]][[i]][neigs + 2, j] =
sum(wts.list[[1]][[i]][2:(neigs + 1), j] * pc.wts1)
wts.list[[2]][[i]][neigs + 2, j] =
sum(wts.list[[2]][[i]][2:(neigs + 1), j] * pc.wts2)
}
### Find PC contribution for simultaneous
for (j in 1:(ncol(wts.list[[1]][[i]]) - ncol(X))) {
wts.list[[1]][[i]][neigs + 2, ncol(X) + j] =
sum(wts.list[[1]][[i]][2:(neigs + 1), ncol(X) + j] * pc.wts1)
}
### Normalize such that the variance is 1
wts.list[[1]][[i]][neigs + 2, ] = wts.list[[1]][[i]][neigs +
4, ncol(X) + i] * wts.list[[1]][[i]][neigs + 5, ncol(X) + i] *
wts.list[[1]][[i]][neigs + 2, ] / sd(wts.list[[1]][[i]][neigs + 2,
1:ncol(X)], na.rm = TRUE)
wts.list[[2]][[i]][neigs + 2, ] = wts.list[[2]][[i]][neigs +
4, ncol(X) + 1] * wts.list[[2]][[i]][neigs + 5, ncol(X) + 1] *
wts.list[[2]][[i]][neigs + 2, ] / sd(wts.list[[2]][[i]][neigs + 2,
1:ncol(X)], na.rm = TRUE)
}
### Sort for geographic location
for (i in 1:ncol(Y)) {
sort.vec = order(wts.list[[1]][[i]][1, ])
wts.list[[1]][[i]] = wts.list[[1]][[i]][, sort.vec]
sort.vec = order(wts.list[[2]][[i]][1, ])
wts.list[[2]][[i]] = wts.list[[2]][[i]][, sort.vec]
}
### Return
wts.list
}
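###
### A hedged usage sketch (illustrative only): "gnd" and "pcs" are hypothetical
### stand-ins for a ground station anomaly matrix and the matching AVHRR PCs;
### all.idx, form.list, all.avhrr, and getPcs() are defined elsewhere in this
### script and must be loaded first.
if (FALSE) {
    wts.out = getWts(X = gnd, Y = pcs, form = 1, neigs = 3, alpha = 10)
    ### Eigenvector-weighted table for AVHRR PC 1
    wts.out[[2]][[1]]
}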
gCirc = function(lat.1, lon.1, lat.2, lon.2, r) {
### Convert to radians
lat.1 = lat.1 * pi / 180
lat.2 = lat.2 * pi / 180
lon.1 = lon.1 * pi / 180
lon.2 = lon.2 * pi / 180
### Get delta longitude
D.lon = lon.2 - lon.1
### First numerator value
D.rho.num.1 = (cos(lat.2) * sin(D.lon)) ^ 2
### Second numerator value
D.rho.num.2 = (cos(lat.1) * sin(lat.2) - sin(lat.1) * cos(lat.2) *
cos(D.lon)) ^ 2
### Calculate numerator
D.rho.num = sqrt(D.rho.num.1 + D.rho.num.2)
### Calculate denominator
D.rho.den = sin(lat.1) * sin(lat.2) + cos(lat.1) * cos(lat.2) *
cos(D.lon)
### Calculate angular distance
D.rho = atan2(D.rho.num, D.rho.den)
### Calculate linear distance
d = r * D.rho
d
}
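###
### Example (illustrative; coordinates are approximate): great-circle distance
### in km between McMurdo (77.85 S, 166.67 E) and Amundsen-Scott (90 S),
### assuming a mean Earth radius of 6371 km for r.
if (FALSE) {
    d.km = gCirc(-77.85, 166.67, -90, 0, r = 6371)
}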
##########################################################################################
######################          END VERIFICATION FUNCTIONS          ######################
##########################################################################################
##############################
### SEGMENT: EM ALGORITHM ###
##############################
###
### The EM algorithm performs estimation of missing values in sparse matrices with the
### following methodological options:
###
###      Truncated total least squares (TTLS)
###      Ridge regression (IRidge and MRidge)
###      Truncated SVD (TSVD)
###
### TTLS and TSVD may be performed by directly proceeding to the fixed truncation
### parameter (default behavior) or may progressively approach the fixed truncation
### parameter by progressively increasing the parameter from 1, iterating to
### convergence, and using the converged covariance/correlation matrix as input for
### the next truncation value (recommended).
###
### Ridge regression may be performed by using a common regularization parameter for
### each time step that is selected by a GCV function (MRidge), using an individual
### regularization parameter for each variable and each time step (IRidge), or by
### using a user-supplied vector of regularization parameters (either method).
###
### Main arguments:
###
###      X: Input data matrix, with variables arranged in columns. If a time series,
###         the solution will be returned as a time series.
###
###      C: Optional covariance/correlation matrix. If using the EM algorithm in a
###         correlation setting, C will be scaled to have the diagonal equal unity.
###
###      type: The method of regression to be used:
###
###           "ttls"   - TTLS (default)
###           "mridge" - Multiple ridge regression
###           "iridge" - Individual ridge regression
###           "tsvd"   - TSVD
###
###      n.eof: Maximum number of retained EOFs. Must be set to a numeric value for
###             TTLS and TSVD. If set to NA for ridge regression and RTTLS, all EOFs
###             with non-zero eigenvalues will be used.
###
###      maxiter: Maximum number of iterations (default value of 100).
###
###      tol: Stagnation tolerance, which is calculated as the RMS change in missing
###           values from the previous iteration, scaled by the relative distance
###           from the initial mean of zero (default value of 0.001).
###
###      sub.tol: Convergence tolerance to use for progressively raising retained
###               EOFs by 1 until "n.eof" is reached. This tolerance will be used
###               instead of "tol" for estimation prior to reaching "n.eof" for the
###               truncation parameter (default value of 0.005).
###
###      covr: (Logical) If FALSE (default), perform regression using standardized
###            variables. If TRUE, the algorithm does not automatically standardize,
###            which allows the user to supply a set of weights for standardization.
###
###      DoFL: Degrees of freedom lost (default value of 1).
###
###      inflate: Parameter to inflate the residual scaled covariance matrices to
###               account for underestimation of residual variance (default value of 1).
###
###      min.relvar: Minimum relative variance of the residuals for setting the lower
###                  bound for determination of the regularization parameter (ridge
###                  regression, default value of 0.05). If set below the sum of the
###                  trace of the negligible eigenvalues, machine precision is used to
###                  set the lower bound. For RTTLS, machine precision is always used
###                  for the lower bound.
###
###      min.var: Minimum fraction of variance required to be retained by
###               regularization (ridge regression only; defaults to zero). If set
###               > 0, min.var is used to set the upper bound of the regularization
###               parameter. Not used for RTTLS.
###
###      penalty: The penalty power used for the GCV function (ridge regression only;
###               default value of 2.0).
###
###      h.default: Fixed regularization parameters for ridge regression (default
###                 value of NA). This must be a vector of length 1 for "mridge" and
###                 a vector of length equal to the number of variables for "iridge".
###
###      wts: A vector of weighting factors to be used (default value of NA). Length
###           must be equal to the number of variables.
###
###      B.series: A vector containing the variable numbers for which retention of
###                the regression coefficients is desired (default value of NA). If
###                any element is set to 0, the raw B matrices containing all
###                regression coefficients are returned for each unique combination
###                of available and missing values.
###
###      progressive: A vector describing whether the regression will be done
###                   progressively (ignored for ridge regression) and which
###                   solutions for the incremented values of the truncation
###                   parameter are to be stored in memory.
###
###           [1] - (Logical) Use progressive method (default of FALSE).
###           [2] - Starting EOF (default of 1).
###           [3] - (Logical) Store intermediate converged solutions (default of FALSE).
###           [4] - Starting EOF for storage (default of 1).
###
###      plt: A vector describing whether to produce the standard plots for each
###           iteration.
###
###           [1] - (Logical) Plot? (default of TRUE)
###           [2] - Variable column number to plot (default of 1)
###
###      gap: Defines the number of iterations between printing iteration results and
###           for saving iteration results to disk (default value of 10).
###
###      mem: (Logical) If TRUE, print memory usage information (default of FALSE).
###
###      save.file: (Logical) If TRUE, the algorithm will save according to the
###                 save.method settings (default of FALSE).
###
###                 NOTE: To prevent duplicate saves, setting this parameter does NOT
###                 result in saving the overall list of outputs. Saving the overall
###                 output must be done externally to the algorithm.
###
###      save.method: Determines the type of saving to disk to be performed:
###
###           [1] - (Logical) Save each gap iteration (default of FALSE).
###           [2] - (Logical) Save upon convergence for each value of the truncation
###                 parameter (progressive TTLS and TSVD only; default of FALSE).
###
###      dat.path: Path for saving. Format is "X:/path". Do not include "/" after the
###                path. If the specified path does not exist, it is created. To save
###                in the working directory, set dat.path to NA (default value of NA).
###
###      file.name: Filename for saving. Do not include a leading "/".
###
###      max.size: Number of elements in 1,000s to be simultaneously calculated
###                during matrix scaling/adding operations to prevent allocation and
###                memory errors. Lower values of max.size allow the algorithm to
###                handle bigger matrices but result in slower performance. Larger
###                values speed up scaling operations but can result in memory
###                allocation errors. Default value of 1,000, which allows matrices
###                up to 1 million elements to be calculated in one step.
###
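###
### A minimal usage sketch (illustrative only; "X.demo" is a hypothetical
### 100 x 10 matrix with scattered missing values, not an object defined in
### this script). Progressive TTLS with 3 retained EOFs and plotting off:
if (FALSE) {
    set.seed(42)
    X.demo = matrix(rnorm(1000), nrow = 100, ncol = 10)
    X.demo[sample(length(X.demo), 100)] = NA
    em.out = emFn(X.demo, type = "ttls", n.eof = 3, maxiter = 500,
        progressive = c(TRUE, 1, FALSE, 1), plt = c(FALSE, 1))
    ### Infilled data and model estimates at the final truncation parameter
    X.filled = em.out[[3]]$X
    X.hat = em.out[[3]]$Xhat
}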
emFn = function (X, C = NA, type = "ttls", n.eof = NA, maxiter = 100, tol
= 0.001, sub.tol = NA, covr = FALSE, DoFL = 1, inflate = 1,
min.relvar = 0.05, min.var = 0, penalty = 2, h.default = NA,
wts = NA, progressive = c(FALSE, 1, FALSE, 1),
plt = c(TRUE, 1), gap = 10, mem = FALSE, save.file = FALSE,
save.method = c(FALSE, FALSE), B.series = NA, dat.path = NA,
file.name = NA, max.size = 1000) {
###############################
### Section 1: Initialization
###############################
##############################
### DEFINE INTERNAL FUNCTIONS
##############################
### Calculates sum-of-squares
ssq = function (x) {sum(x ^ 2)}
### Finds identical patterns
combo = function (x, m, s) {identical(x, m)}
### Collation for unique patterns of missing/available data
collate1 = function (x) {c(1:p)[missing[x[1], ]]}
collate2 = function (x) {c(1:p)[!missing[x[1], ]]}
### Function for calculating elements of the scaled residual
### covariance matrix
residFn = function (x, h, Fo, d, j) {
### Calculate
diag.S = (h[j] ^ 2 * h[x] ^ 2) / ((d + h[j] ^ 2) * (d +
h[x] ^ 2))
S = sum(Fo[, j] * diag.S * Fo[ ,x])
S
}
### Function for performing generalized cross validation
### (ridge regression)
gcvRidge = function (x, a) {
### Define function to be optimized for h
gcvFn = function(h, x) {
filt = h ^ 2 / (x$d + h ^ 2)
G = (sum(filt ^ 2 * x$F2) + x$trS0) / (x$n.eff - length(x$d) + sum(filt)) ^ x$penalty
G
}
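### NOTE (added for clarity): gcvFn is the generalized cross-validation
### objective for the ridge parameter h,
###
###      G(h) = [sum(filt^2 * F2) + trS0] / (n.eff - p.eff)^penalty,
###
### where filt = h^2 / (d + h^2) and p.eff = sum(d / (d + h^2)) is the
### effective number of parameters, so that the denominator term
### n.eff - length(d) + sum(filt) above equals n.eff - p.eff.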
### Parse parameters
d = a$d
dv = a$d.var
len = length(a$d)
mv = a$min.var
mr = a$min.relvar
### Trace of residuals not dependent on h
a$trS0 = sum(a$d.S0[x])
### Estimated minimum trace of missing scaled covariance matrix
min.trS = mr * sum(a$d.C[x])
### Define regularization parameter discrimination
### Default value
h.tol = 0.2 / sqrt(a$n.eff)
### Set upper limit on regularization parameter
if (mv > min(dv)) {
q = max(which(dv < mv))
max.h = (d[q] + (d[q + 1] - d[q]) * (mv - dv[q]) /
(dv[q + 1] - dv[q]))
} else {
max.h = sqrt(max(d)) / h.tol
}
### Set lower limit on regularization parameter
### If residual trace not dependent on h exceeds the lower bound of the
### trace of the scaled covariance matrix, use machine precision.
### Otherwise, find eigenvalue closest to min.trS using residual norms
### of truncated SVD solutions
if (a$trS0 > min.trS) {
min.h = sqrt(.Machine$double.eps)
} else {
r = vector()
r[len] = a$trS0
for (j in 1:(len - 1)) {r[len - j] = r[len - j +
1] + a$F2[len - j + 1]}
min.r = which.min(abs(r - min.trS))
min.h = sqrt(max(d[min.r], min(d) / a$n.eff))
}
### Check for sensible limits & optimize
if (min.h > max.h) {
warning("Upper bound less than lower bound on
regularization parameter; using lower bound.")
opt.h = min.h
} else {
opt.h = optimize(gcvFn, interval = c(min.h,
max.h), tol = h.tol, x = a)$minimum
}
### Return
opt.h
}
### Memory smart & faster matrix scaling & adding
mScale = function (X, d = 1, scale = TRUE, center = FALSE,
add = FALSE, type = 1, na.rm = TRUE, max.size = 500) {
### Convert max.size (command line argument is in thousands)
max.size = max.size * 1000
### Get matrix data
n = nrow(X)
p = ncol(X)
pts = n * p
ts.test = FALSE
if (is.ts(X)) {
ts.st = start(X)
ts.fr = frequency(X)
tsp(X) = NULL
ts.test = TRUE
}
### Check for proper length of d & correct ADD setting
if (length(d) == 1 & type == 1 & (scale == TRUE | add ==
TRUE)) {d = rep(d, p)}
if (length(d) == 1 & type == 2 & (scale == TRUE | add ==
TRUE)) {d = rep(d, n)}
if (length(d) != p & type == 1 & (scale == TRUE | add ==
TRUE)) {warning("Scaling vector length different than number of
columns.\nScaling vector recycled.", call. = FALSE, immediate. = TRUE)}
if (length(d) < n & type == 2 & (scale == TRUE | add ==
TRUE)) {warning("Scaling vector length less than number of rows.\nNAs
produced.", call. = FALSE, immediate. = TRUE)}
if (length(d) > n & type == 2 & (scale == TRUE | add ==
TRUE)) {warning("Scaling vector length greater than number of
rows.\nScaling vector truncated.", call. = FALSE, immediate. = TRUE)}
### If matrix too big, break into chunks
if (pts > max.size) {
blocks = ceiling(pts / max.size)
rows = ceiling(n / blocks)
} else {
blocks = 1
rows = n
}
### Do scaling / centering
for (i in 1:blocks) {
### Get start and end points
st = 1 + (i - 1) * rows
en = min(i * rows, n)
### Column centering
if (type == 1 & center == TRUE & en > st) {
cd = colMeans(X, na.rm = na.rm)
s = matrix(cd, ncol = p, nrow = en - st + 1,
byrow = TRUE)
X[st:en, ] = X[st:en, ] - s
rm(cd, s)
}
### Row centering
if (type == 2 & center == TRUE & en > st) {
cd = rowMeans(X, na.rm = na.rm)
s = matrix(cd[st:en], ncol = p, nrow = en - st + 1)
X[st:en, ] = X[st:en, ] - s
rm(cd, s)
}
### Column scaling / adding
if (type == 1 & (scale == TRUE | add == TRUE) & en
> st) {
s = matrix(d, ncol = p, nrow = en - st + 1,
byrow = TRUE)
if (scale == TRUE) {X[st:en, ] = X[st:en, ] * s}
if (add == TRUE) {X[st:en, ] = X[st:en, ] + s}
rm(s)
}
### Row scaling / adding
if (type == 2 & (scale == TRUE | add == TRUE) & en
> st) {
s = matrix(d[st:en], ncol = p, nrow = en - st + 1)
if (scale == TRUE) {X[st:en, ] = X[st:en, ] * s}
if (add == TRUE) {X[st:en, ] = X[st:en, ] + s}
rm(s)
}
}
### Cleanup & return
if (ts.test == TRUE) {X = ts(X, start = ts.st, freq =
ts.fr)}
X
}
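### NOTE (added for clarity): mScale applies the vector d across columns
### (type = 1) or rows (type = 2), multiplying when scale = TRUE and adding
### when add = TRUE, working in row blocks of at most max.size * 1000
### elements. For example, mScale(X, -colMeans(X), type = 1, add = TRUE,
### scale = FALSE) centers each column of X, the idiom used throughout
### this script.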
#####################################
### TS TO MATRIX (FASTER PROCESSING)
#####################################
if (is.ts(X)) {
### Record characteristics
x.ts = TRUE
st = start(X)
en = end(X)
fr = frequency(X)
r.names = rownames(X)
c.names = colnames(X)
### Convert to non-ts (allows faster computations)
X = matrix(X, nrow = nrow(X), ncol = ncol(X))
} else {
x.ts = FALSE
st = en = fr = NA
r.names = rownames(X)
c.names = colnames(X)
}
###########################################
### FIND UNIQUE AVAILABLE/MISSING PATTERNS
###########################################
n = nrow(X)
p = ncol(X)
n.eff = max(1, n - DoFL)
missing = is.na(X)
miss = list()
avail = list()
### Perform if not TSVD
if (type != "tsvd") {
master = seq(1:n)
common = matrix(nrow = n, ncol = n)
### Step through all rows
for (i in 1:n) {
temp = master[apply(missing, 1, combo, m = missing[i,
])]
common[i, ] = c(temp, rep(0, n - length(temp)))
}
### Remove empty columns and duplicated rows
empty = master[colSums(common) == 0]
dupe = master[master != common[, 1]]
common = common[-dupe, -empty]
### Store missing/available variables
miss = apply(common, 1, collate1)
avail = apply(common, 1, collate2)
} else {temp = common = empty = dupe = master = miss = avail = NA}
####################
### CHECK ARGUMENTS
####################
if (maxiter < 1) {
warning("Supplied iteration limit too low (allowed range >=
1).\nSetting to 1.\n")
maxiter = 1
}
if (tol < 1e-9) {
warning("Supplied convergence tolerance too low (allowed
range > 1e-9).\nSetting to 1e-9.\n")
tol = 1e-9
}
if (inflate < 0) {
warning("Supplied inflation parameter less than limit
(0).\nSetting to 0.\n")
inflate = 0
}
if (is.na(n.eof) & type != "mridge" & type != "iridge") {
warning("Retained EOFs not specified.\nSetting retained EOFs to one less than effective full rank.\n")
n.eof = min(n - DoFL - 1, p - 1)
}
if (n.eof > min(n - DoFL - 1, p - 1) & type != "mridge" & type !=
"iridge") {
warning("Retained EOFs greater than or equal to effective
full rank.\nResetting retained EOFs to one less than effective full
rank.\n")
n.eof = min(n - DoFL - 1, p - 1)
}
if (!identical(h.default, NA) & length(h.default) != p & type ==
"iridge") {
warning("Inappropriate length of regularization parameter
vector.\nDiscarding supplied vector and calculating theoretical
parameters using on-board GCV function.\n")
h.default = NA
}
if (!identical(h.default, NA) & length(h.default) != 1 & type ==
"mridge") {
warning("Inappropriate length of regularization parameter
vector.\nDiscarding supplied vector and calculating theoretical
parameters using on-board GCV function.\n")
h.default = NA
}
if (is.na(sub.tol)) {sub.tol = tol}
if (plt[1] == TRUE) {
if (is.na(plt[2]) | plt[2] < 1 | plt[2] > p | length(plt[2]) > 1) {plt[2] = 1}
}
B.series = sort(unique(B.series))
if (is.numeric(B.series) == FALSE) {B.series = NA}
if (!identical(B.series, NA) & (min(B.series) < 0 | max(B.series) >
p | length(B.series) > p)) {
warning("Invalid variables for storing B matrices.\nStorage
of B matrices will not be performed.\n")
B.series = NA
}
#################
### Prep weights
#################
### No weights?
if (identical(wts, NA)) {
wts = rep(1, p)
wts.flag = FALSE
} else {wts.flag = TRUE}
### Check for correct length of weights
if (length(wts) != p) {stop("Length of weighting vector does not
equal number of variables")}
#################
### PREP OPTIONS
#################
if (type != "tsvd" & type != "ttls" & type != "iridge" & type !=
"mridge") {stop(paste("Unknown regression type ", type, ".", sep = ""))}
if (type == "ttls") {
regFn = ttlsRegFn
reg.type = 3
conv.message = "eof"
ridge.trunc = n.eof
}
if (type == "iridge") {
regFn = ridgeRegFn
reg.type = 1
conv.message = "ridge"
ridge.trunc = n.eof
if (is.na(ridge.trunc)) {ridge.trunc = max(1, min(n - DoFL,
p))}
n.eof = 1
}
if (type == "mridge") {
regFn = ridgeRegFn
reg.type = 2
conv.message = "ridge"
ridge.trunc = n.eof
if (is.na(ridge.trunc)) {ridge.trunc = max(1, min(n - DoFL,
p))}
n.eof = 1
}
if (type == "tsvd") {
regFn = tsvdRegFn
reg.type = 3
B.series = NA
conv.message = "eof"
ridge.trunc = n.eof
}
### Used if progressively iterating (applicable to TTLS and TSVD only)
start.eof = end.eof = store.eof = n.eof
if (progressive[1] == TRUE & (reg.type == 3 | reg.type == 0)) {
start.eof = progressive[2]
end.eof = n.eof
if (start.eof > n.eof) {
warning("Invalid starting EOF for progressive (greater
than ending EOF).\nProgressive not used.\n")
start.eof = end.eof = store.eof = n.eof
}
if (progressive[3] == TRUE) {store.eof = progressive[4]}
if (store.eof > end.eof) {
warning("Invalid starting EOF for retaining infilled
estimates.\nOnly storing final estimate.\n")
store.eof = n.eof
}
}
### Save options
if (save.file == TRUE) {
### Parse names and ensure directory exists
len = nchar(dat.path)
if (!is.na(dat.path) & substr(dat.path, len, len) == "/")
{substr(dat.path, len, len) = ""}
if (!is.na(file.name) & substr(file.name, 1, 1) == "/")
{substr(file.name, 1, 1) = ""}
if (is.na(dat.path)) {dat.path = getwd()}
if (!file.exists(dat.path)) {
dir.create(dat.path)
message(paste("Directory ", dat.path, " does not exist. Creating.", sep = ""))
}
if (is.na(file.name)) {
a = 1
while (file.exists(paste("EM-", type, "-", Sys.Date(),
"-", a, ".RData", sep = ""))) {a = a + 1}
file.name = paste("EM-", type, "-", Sys.Date(), "-", a,
".RData", sep = "")
}
}
###########
### PREP X
###########
### Check for sufficient # of points
n.orig = colSums(!missing)
n.low = n.orig < DoFL + 1
if (sum(n.low) > 0) {
stop.cols = paste(" ", c(1:p)[n.low], sep = "")
stop(paste("Insufficient number of points (less than ", DoFL
+ 1, ") in variables", stop.cols, sep = ""))
}
### Center
center = colMeans(X, na.rm = TRUE)
center[is.na(center)] = 0
X = mScale(X, -center, add = TRUE, scale = FALSE, max.size =
max.size)
### Estimate variance
scales = sqrt(colSums(X ^ 2, na.rm = TRUE) / (n.orig - DoFL))
### Initial infill
X[missing] = 0
Xhat = matrix(0, nrow = n, ncol = p)
###########
### Prep C
###########
### Supplied C?
if (identical(C, NA)) {C = (t(X) %*% X) / (n - DoFL)}
### Check for proper # of rows
if (nrow(C) != p) {
warning("Incorrect dimensions for covariance
matrix.\nGenerating new matrix.\n")
C = (t(X) %*% X) / (n - DoFL)
}
#############################
### Establish argument lists
#############################
### Option/info list
.em.arg = list()
.em.arg$n = nrow(X)
.em.arg$p = ncol(X)
.em.arg$n.eff = max(1, n - DoFL)
.em.arg$type = type
.em.arg$reg.type = reg.type
.em.arg$n.eof = n.eof
.em.arg$trunc = ridge.trunc
.em.arg$store.eof = store.eof
.em.arg$maxiter = maxiter
.em.arg$tol = tol
.em.arg$sub.tol = sub.tol
.em.arg$covr = covr
.em.arg$DoFL = DoFL
.em.arg$inflate = inflate
.em.arg$min.relvar = min.relvar
.em.arg$min.var = min.var
.em.arg$penalty = penalty
.em.arg$h.default = h.default
.em.arg$wts = wts
.em.arg$wts.flag = wts.flag
.em.arg$B.series = B.series
.em.arg$gap = gap
.em.arg$mem = mem
.em.arg$common = common
.em.arg$missing = missing
.em.arg$miss = miss
.em.arg$avail = avail
.em.arg$nmis = sum(missing)
.em.arg$max.size = max.size
.em.arg$plot = plt[1]
.em.arg$plot.series = plt[2]
.em.arg$start.eof = start.eof
.em.arg$end.eof = end.eof
### Establish data list
.em.results = list()
.em.dat = list()
.em.dat$X = X
.em.dat$Xhat = Xhat
.em.dat$C = C
.em.dat$C.res = matrix(0, nrow = p, ncol = p)
.em.dat$B = list()
.em.dat$dXhat = 1e6
.em.dat$nXhat = 1e6
.em.dat$rXhat = 1e6
.em.dat$p.eff.ave = vector()
.em.dat$v.ret.ave = vector()
.em.dat$d = vector()
.em.dat$M = center
### Establish characteristics list
.em.char = list()
.em.char$st = st
.em.char$en = en
.em.char$fr = fr
.em.char$ts = x.ts
.em.char$r.names = r.names
.em.char$c.names = c.names
.em.char$scales = scales
### Establish function list
.em.fn = list()
.em.fn$ssq = ssq
.em.fn$residFn = residFn
.em.fn$gcvRidge = gcvRidge
.em.fn$mScale = mScale
### Clean up
rm(X, C, type, reg.type, n.eof, ridge.trunc, maxiter, tol, sub.tol,
covr, DoFL, inflate, min.relvar, min.var, penalty, h.default, wts,
wts.flag, B.series, gap, mem, common, miss, avail, empty, dupe, temp,
master, scales, center, x.ts, st, en, fr)
temp = gc()
###########################
### Section 2: Regression
###########################
for (i in start.eof:end.eof) {
### Print estimated memory usage?
if (.em.arg$mem == TRUE) {
size.X = round(object.size(.em.dat$X) / 1048576, 5)
size.C = round(object.size(.em.dat$C) / 1048576, 5)
if (is.na(.em.arg$B.series[1])) {B.factor = 0} else {
if (.em.arg$B.series[1] == 0) {B.factor =
.em.arg$p} else {B.factor = length(.em.arg$B.series)}
}
message("Memory usage:")
message("-------------")
message(paste("Size of X: ", size.X, " Mb", sep = ""))
if (.em.arg$type != "tsvd") {
message(paste("Size of C: ", size.C, " Mb", sep =
""))
message(paste("Estimated required memory for
storage of X, Xhat, C, C.res, and B: ", (2 + (end.eof - start.eof) *
progressive[3]) * (((3 + B.factor)) * size.X + 2 * size.C), " Mb", sep =
""))
} else {message(paste("Estimated required memory for
storage of X, Xhat, and B: ", (2 + (end.eof - start.eof) *
progressive[3]) * ((3 + B.factor)) * size.X, " Mb", sep = ""))}
message("-------------")
message("")
}
### Initialize
.em.results[[i]] = list()
iter = 1
stag = 1e6
### Check to see if using secondary tolerance
if (i == end.eof) {tol = .em.arg$tol} else {tol =
.em.arg$sub.tol}
### Iterate
while(iter <= .em.arg$maxiter & stag > tol) {
### Regression
.em.results[[i]] = regFn(.em.arg, .em.dat, .em.char,
.em.fn, i, iter)
### Find changes in estimated values
rel = round(.em.results[[i]]$nXhat[iter], 5)
stag = round(.em.results[[i]]$rXhat[iter], 5)
rms = round(.em.results[[i]]$dXhat[iter], 5)
### Update
.em.dat = .em.results[[i]]
### Print iteration/memory usage or save iteration?
if (iter / .em.arg$gap == trunc(iter / .em.arg$gap)) {
### Iteration info
message(paste(iter, ": Relative: ", rel, "; RMS:
", rms, "; Stagnation: ", stag, ".", sep = ""))
### Memory info
if (.em.arg$mem == TRUE) {
size.X = round(object.size(.em.dat$X) /
1048576, 5)
size.C = round(object.size(.em.dat$C) /
1048576, 5)
if (is.na(.em.arg$B.series[1])) {B.factor =
0} else {
if (.em.arg$B.series[1] == 0) {B.factor
= .em.arg$p} else {B.factor = length(.em.arg$B.series)}
}
message(paste("Total memory usage: ",
sum(gc()[, 2]), " Mb", sep = ""))
message(paste("Estimated current usage to
retain B matrices: ", (i - start.eof + 1) * B.factor * size.X, " Mb", sep
= ""))
}
### Save iteration
if (save.file == TRUE & save.method[1] == TRUE) {
### Recenter to original centering
temp = .em.results[[i]]$X
temp[.em.arg$missing] = NA
center.vec = colMeans(temp, na.rm = TRUE)
.em.results[[i]]$X =
mScale(.em.results[[i]]$X, -center.vec + .em.dat$M, type = 1, scale =
FALSE, add = TRUE, center = FALSE, max.size = .em.arg$max.size)
.em.results[[i]]$Xhat =
mScale(.em.results[[i]]$Xhat, -center.vec + .em.dat$M, type = 1, scale =
FALSE, add = TRUE, center = FALSE, max.size = .em.arg$max.size)
### Restore row/column names
rownames(.em.results[[i]]$X) =
rownames(.em.results[[i]]$Xhat) = .em.char$r.names
colnames(.em.results[[i]]$X) =
colnames(.em.results[[i]]$Xhat) = .em.char$c.names
### Restore ts
if (.em.char$ts == TRUE) {
.em.results[[i]]$X =
ts(.em.results[[i]]$X, start = .em.char$st, freq = .em.char$fr)
.em.results[[i]]$Xhat =
ts(.em.results[[i]]$Xhat, start = .em.char$st, freq = .em.char$fr)
}
if (.em.char$ts == TRUE &
!is.na(.em.arg$B.series[1]) & .em.arg$B.series[1] != 0) {
for (j in .em.arg$B.series) {
.em.results[[i]]$B[[j]] =
ts(.em.results[[i]]$B[[j]], start = .em.char$st, freq = .em.char$fr)
rownames(.em.results[[i]]$B[[j]]
) = .em.char$r.names
colnames(.em.results[[i]]$B[[j]]
) = .em.char$c.names
}
}
### Save
save.name = paste(dat.path, "/eof-", i, "-iter-", iter, "-", file.name, sep = "")
em = list()
em[[iter]] = .em.results[[i]]
em[[iter]]$arg = .em.arg
save(em, file = save.name)
rm(em)
}
}
### Increment
iter = iter + 1
}
### Inform about convergence
if (stag > .em.arg$tol & conv.message == "eof") {
message(paste(.em.arg$maxiter, " iterations without
convergence at ", i, " retained EOFs.\nRelative change: ", rel, "\nRMS
change: ", rms, "\nStagnation value: ", stag, "\n", sep = ""))
.em.results[[i]]$conv = FALSE
}
if (stag <= .em.arg$tol & conv.message == "eof") {
message(paste(iter - 1, " iterations to convergence at
", i, " retained EOFs.\nRelative change: ", rel, "\nRMS change: ", rms,
"\nStagnation value: ", stag, "\n", sep = ""))
.em.results[[i]]$conv = TRUE
}
if (stag > .em.arg$tol & conv.message == "ridge") {
message(paste(.em.arg$maxiter, " iterations without
convergence with ", round(.em.dat$p.eff.ave[iter - 1], 3), " average
number of parameters.\nRelative change: ", rel, "\nRMS change: ", rms,
"\nStagnation value: ", stag, sep = ""))
message(paste("Truncation point: ", .em.arg$trunc, "
(maximum available parameters).\n", sep = ""))
.em.results[[i]]$conv = FALSE
}
if (stag <= .em.arg$tol & conv.message == "ridge") {
message(paste(iter - 1, " iterations to convergence with
", round(.em.dat$p.eff.ave[iter - 1], 3), " average number of
parameters.\nRelative change: ", rel, "\nRMS change: ", rms,
"\nStagnation value: ", stag, sep = ""))
message(paste("Truncation point: ", .em.arg$trunc, "
(maximum available parameters).\n", sep = ""))
.em.results[[i]]$conv = TRUE
}
### Cleanup
if (store.eof <= i | (save.file == TRUE & save.method[2] ==
TRUE)) {
### Recenter to original centering
temp = .em.results[[i]]$X
temp[.em.arg$missing] = NA
center.vec = colMeans(temp, na.rm = TRUE)
.em.results[[i]]$X = mScale(.em.results[[i]]$X, -center.vec + .em.dat$M, type = 1, scale = FALSE, add = TRUE, center = FALSE, max.size = .em.arg$max.size)
.em.results[[i]]$Xhat = mScale(.em.results[[i]]$Xhat, -center.vec + .em.dat$M, type = 1, scale = FALSE, add = TRUE, center = FALSE, max.size = .em.arg$max.size)
### Restore row/column names
rownames(.em.results[[i]]$X) =
rownames(.em.results[[i]]$Xhat) = .em.char$r.names
colnames(.em.results[[i]]$X) =
colnames(.em.results[[i]]$Xhat) = .em.char$c.names
### Restore ts
if (.em.char$ts == TRUE) {
.em.results[[i]]$X = ts(.em.results[[i]]$X, start
= .em.char$st, freq = .em.char$fr)
.em.results[[i]]$Xhat = ts(.em.results[[i]]$Xhat,
start = .em.char$st, freq = .em.char$fr)
}
if (.em.char$ts == TRUE & !is.na(.em.arg$B.series[1]) &
.em.arg$B.series[1] != 0) {
for (j in .em.arg$B.series) {
.em.results[[i]]$B[[j]] =
ts(.em.results[[i]]$B[[j]], start = .em.char$st, freq = .em.char$fr)
rownames(.em.results[[i]]$B[[j]] ) =
.em.char$r.names
colnames(.em.results[[i]]$B[[j]] ) =
.em.char$c.names
}
}
### Save
if (save.file == TRUE & save.method[2] == TRUE) {
save.name = paste(dat.path, "/eof-", i, "-",
file.name, sep = "")
converged = list()
converged[[i]] = .em.results[[i]]
converged[[i]]$arg = .em.arg
save(converged, file = save.name)
rm(converged)
}
}
### Setup for next EOF (progressive)
.em.dat$dXhat = .em.results[[i]]$dXhat[iter - 1]
.em.dat$nXhat = .em.results[[i]]$nXhat[iter - 1]
.em.dat$rXhat = .em.results[[i]]$rXhat[iter - 1]
.em.dat$dXhat.act = .em.results[[i]]$dXhat.act[iter - 1]
.em.dat$rXhat.act = .em.results[[i]]$rXhat.act[iter - 1]
.em.dat$rho = list()
if (store.eof > i) {.em.results[[i]] = list()}
}
### Return
.em.results[[i]]$arg = .em.arg
.em.results
}
### Master routine for TTLS
ttlsRegFn = function (arg, dat, char, fn, n.eof, iter) {
### Initialize B storage
if (!is.na(arg$B.series[1]) & arg$B.series[1] != 0) {
for (j in arg$B.series) {
dat$B[[j]] = matrix(0, nrow = arg$n, ncol = arg$p)
attr(dat$B[[j]], "name") = char$r.names[j]
attr(dat$B[[j]], "variable number") = j
}
}
### New Xhat, C.res, S0, and S
Xhat = matrix(0, nrow = arg$n, ncol = arg$p)
S = S0 = C.res = matrix(0, nrow = arg$p, ncol = arg$p)
### Find scales and weighting factors
if (arg$covr == TRUE) {
D = arg$wts / rep(1, arg$p)
} else {
D = sqrt(diag(dat$C))
D = arg$wts / D
}
### Scale
if (arg$covr == FALSE | arg$wts.flag == TRUE) {
dat$X = fn$mScale(dat$X, D, type = 1, add = FALSE, center =
FALSE, max.size = arg$max.size)
dat$Xhat = fn$mScale(dat$Xhat, D, type = 1, add = FALSE,
center = FALSE, max.size = arg$max.size)
dat$C = fn$mScale(D * dat$C, D, type = 1, add = FALSE, center
= FALSE, max.size = arg$max.size)
}
### Perform eigendecomposition
C.eig = eigen(dat$C, symmetric = TRUE)
### Test for positive eigenvalues; change sign & re-sort if not positive
sign.vec = sign(C.eig$values)
if (sum(sign.vec) == length(C.eig$values)) {
### All positive, so save
dat$d = C.eig$values
dat$V = C.eig$vectors
} else {
### First, eigenvalues
d = abs(C.eig$values)
sort.vec = rev(order(d))
d = d[sort.vec]
dat$d = d
### Now, eigenvectors
V = fn$mScale(C.eig$vectors, sign.vec, max.size =
arg$max.size)
V = V[, sort.vec]
dat$V = V
### Cleanup
rm(d, V)
}
### Cleanup
rm(C.eig)
### Setup
dat$rho[[iter]] = vector()
### Set the effective rank, adjusted for zero eigenvalues and truncation
rank.eff = max(1, min(sum(dat$d > (dat$d[1] *
sqrt(.Machine$double.eps))), arg$n.eff, n.eof))
### Iterate through unique combinations of available/missing values
for (i in 1:nrow(arg$common)) {
### Get the row indices
idx = arg$common[i, ]
idx = idx[idx != 0]
### Parse for rank.eff
V = matrix(dat$V[, 1:rank.eff], ncol = rank.eff)
V11 = matrix(dat$V[arg$avail[[i]], 1:rank.eff], ncol =
rank.eff)
### Calculate estimated covariance of the truncated modes
if (rank.eff < arg$p) {
### Get missing residual estimate
V22 = matrix(dat$V[arg$miss[[i]], (rank.eff + 1):arg$p],
ncol = (arg$p - rank.eff))
### Estimate residual scaled covariance matrix
resids = V22 %*% diag(dat$d[(rank.eff + 1):arg$p], ncol
= (arg$p - rank.eff)) %*% t(V22)
S0[arg$miss[[i]], arg$miss[[i]]] = S0[arg$miss[[i]],
arg$miss[[i]]] + length(idx) * arg$inflate * resids
### If full rank, all residuals dependent on regression
} else {S0 = 0}
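### NOTE (added for clarity): the TTLS operator below regresses every
### variable on the available ones within the retained eigenspace. With V
### the leading rank.eff eigenvectors and V11 their rows for the available
### variables, B = V11 (V11' V11)^-1 V' maps the available values onto
### estimates for all variables, missing and available alike.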
### Find the matrix of coefficients
B = V11 %*% solve(t(V11) %*% V11) %*% t(V)
### Find the solution
Xhat[idx, ] = dat$X[idx, arg$avail[[i]]] %*% B
### Store B
if (!is.na(arg$B.series[1]) & arg$B.series[1] == 0) {
attr(B, "miss") = arg$miss[[i]]
attr(B, "rows") = idx
dat$B[[i]] = B
}
if (!is.na(arg$B.series[1]) & arg$B.series[1] != 0) {
for (j in arg$B.series) {
dat$B[[j]][idx, arg$avail[[i]]] = matrix(B[, j],
ncol = length(B[, j]), nrow = length(idx), byrow = TRUE)
}
}
}
### Update C.res
C.res = S0 + S
dat$C.res = C.res
### Unscale
if (arg$covr == FALSE | arg$wts.flag == TRUE) {
dat$X = fn$mScale(dat$X, 1 / D, type = 1, add = FALSE, center
= FALSE, max.size = arg$max.size)
Xhat = fn$mScale(Xhat, 1 / D, type = 1, add = FALSE, center =
FALSE, max.size = arg$max.size)
dat$C.res = fn$mScale((1 / D) * dat$C.res, 1 / D, type = 1,
add = FALSE, center = FALSE, max.size = arg$max.size)
}
### Compute solution rms change in Xhat
dat$dXhat[iter] = sqrt(fn$ssq(Xhat[arg$missing] - dat$X[arg$missing]) / arg$nmis)
### Compute relative change
dat$nXhat[iter] = sqrt(fn$ssq(dat$X[arg$missing]) / arg$nmis)
### Compute stagnation value
dat$rXhat[iter] = dat$dXhat[iter] / dat$nXhat[iter]
### Update X & Xhat
dat$X[arg$missing] = Xhat[arg$missing]
dat$Xhat = Xhat
### Recenter
center.vec = colMeans(dat$X)
dat$X = fn$mScale(dat$X, -center.vec, type = 1, add = TRUE, scale =
FALSE, center = FALSE, max.size = arg$max.size)
dat$Xhat = fn$mScale(dat$Xhat, -center.vec, type = 1, add = TRUE,
scale = FALSE, center = FALSE, max.size = arg$max.size)
colnames(dat$X) = colnames(dat$Xhat) = char$c.names
rownames(dat$X) = rownames(dat$Xhat) = char$r.names
### Generate C from X plus C.res
dat$C = ((t(dat$X) %*% dat$X) + dat$C.res) / arg$n.eff
### Plotting?
if (arg$plot == TRUE) {
### Margins
par(mar = c(0.5, 4, 2, .15) + 0.1)
### Split screen
close.screen(all = TRUE)
split.screen(c(3, 1))
### Plot selected series
screen(1)
temp.X = dat$X[, arg$plot.series]
temp.X[arg$missing[, arg$plot.series]] = NA
temp.Xhat = dat$Xhat[, arg$plot.series]
plot(temp.Xhat, ylim = c(min(temp.X, temp.Xhat, na.rm =
TRUE), max(temp.X, temp.Xhat, na.rm = TRUE)), type = "l", col = 2, xaxt =
"n", ylab = "Deg C", main = paste("Series ", arg$plot.series, " Actual
(Black) & Estimated (Red)"))
points(temp.X, type = "l", col = 1)
### Plot effective parameters (regularized TLS) or series
variance (TTLS)
screen(2)
temp.D = diag(dat$C)
plot(char$scales ^ 2, type = "h", ylim = c(0, max(temp.D,
char$scales ^ 2)), xaxt = "n", ylab = "Variance", main = "Variance
(Black: Original; Red: Cov. Matrix Diagonal)")
rect(arg$plot.series - 0.3, 0, arg$plot.series + 0.6,
max(temp.D, char$scales ^ 2), border = NA, col = 7)
points(char$scales ^ 2, type = "h")
points(temp.D ~ I(c(1:arg$p) + 0.3), type = "h", col = 2)
### Plot eigenvalues
screen(3)
plot(dat$d, type = "h", xaxt = "n", ylab = "Eigenvalue", main
= paste("Eigenvalue Spectrum (Truncation Parameter: ", rank.eff, ")", sep
= ""))
### Close & reset to defaults
close.screen(all = TRUE)
par(mar = c(5, 4, 4, 2) + 0.1)
}
### Cleanup & return
dat
}
### Master routine for ridge regression
ridgeRegFn = function (arg, dat, char, fn, n.eof, iter) {
### Initialize B storage
if (!is.na(arg$B.series[1]) & arg$B.series[1] != 0) {
for (j in arg$B.series) {
dat$B[[j]] = matrix(0, nrow = arg$n, ncol = arg$p)
attr(dat$B[[j]], "name") = char$r.names[j]
attr(dat$B[[j]], "variable number") = j
}
}
### New C.res and Xhat
C.res = matrix(0, nrow = arg$p, ncol = arg$p)
Xhat = matrix(0, nrow = arg$n, ncol = arg$p)
### Find scales and weighting factors
if (arg$covr == TRUE) {
D = arg$wts / rep(1, arg$p)
} else {
D = sqrt(diag(dat$C))
D = arg$wts / D
}
### Scale
if (arg$covr == FALSE | arg$wts.flag == TRUE) {
dat$X = fn$mScale(dat$X, D, type = 1, add = FALSE, center =
FALSE, max.size = arg$max.size)
dat$Xhat = fn$mScale(dat$Xhat, D, type = 1, add = FALSE,
center = FALSE, max.size = arg$max.size)
dat$C = fn$mScale(D * dat$C, D, type = 1, add = FALSE, center
= FALSE, max.size = arg$max.size)
}
### Setup storage variables
if (arg$reg.type == 1) {
dat$h = matrix(0, nrow = arg$n, ncol = arg$p)
dat$p.eff = matrix(0, nrow = arg$n, ncol = arg$p)
dat$v.ret = matrix(0, nrow = arg$n, ncol = arg$p)
} else {
dat$h = rep(0, arg$n)
dat$p.eff = rep(0, arg$n)
dat$v.ret = rep(0, arg$n)
}
### Iterate through unique combinations of available/missing values
for (i in 1:nrow(arg$common)) {
### Setup B
B = matrix(diag(rep(1, arg$p))[arg$avail[[i]], ], nrow =
length(arg$avail[[i]]))
### Find number of missing values
n.miss = length(arg$miss[[i]])
### Perform eigendecomposition
C.eig = eigen(dat$C[arg$avail[[i]], arg$avail[[i]]],
symmetric = TRUE)
### Test for positive eigenvalues; change sign & re-sort if not positive
sign.vec = sign(C.eig$values)
if (sum(sign.vec) == length(C.eig$values)) {
### All positive, so save
dat$d = C.eig$values
dat$V = C.eig$vectors
} else {
### First, eigenvalues
d = abs(C.eig$values)
sort.vec = rev(order(d))
d = d[sort.vec]
dat$d = d
### Now, eigenvectors
V = fn$mScale(C.eig$vectors, sign.vec, max.size =
arg$max.size)
V = V[, sort.vec]
dat$V = V
### Cleanup
rm(d, V)
}
### Cleanup
rm(C.eig)
### Find number of non-zero eigenvalues, adjusted for effective DoF and truncation
rank.eff = min(sum(dat$d > (dat$d[1] * 1e-12)), arg$n.eff,
arg$trunc, length(arg$avail[[i]]))
dat$d = dat$d[1:rank.eff]
dat$V = matrix(dat$V[, 1:rank.eff], ncol = rank.eff)
dat$d.var = cumsum(dat$d) / sum(dat$d)
### Get the row indices
idx = arg$common[i, ]
idx = idx[idx != 0]
### Compute B, scaled residual covariance matrix, and estimated values
if (n.miss > 0) {
### Setup list for GCV
gcv.arg = list()
gcv.arg$d = dat$d
gcv.arg$d.var = dat$d.var
gcv.arg$n.eff = arg$n.eff
gcv.arg$min.var = arg$min.var
gcv.arg$min.relvar = arg$min.relvar
gcv.arg$penalty = arg$penalty
gcv.arg$d.C = diag(diag(dat$C)[arg$miss[[i]]], nrow =
n.miss)
### Fourier coefficients
Fo = diag(1 / sqrt(dat$d), nrow = rank.eff) %*% t(dat$V) %*%
matrix(dat$C[arg$avail[[i]], arg$miss[[i]]], ncol = n.miss)
gcv.arg$F2 = rowSums(Fo * Fo)
### Scaled residual covariance matrix that does not depend on h
if (arg$n.eff > rank.eff) {
S0 = dat$C[arg$miss[[i]], arg$miss[[i]]] - t(Fo) %*% Fo
} else {
S0 = matrix(0, nrow = n.miss, ncol = n.miss)
}
### Diagonal of S0
gcv.arg$d.S0 = diag(S0)
### Find ridge parameters
### Individual
if (arg$reg.type == 1) {
h = vector()
S = matrix(0, nrow = n.miss, ncol = n.miss)
for (j in 1:n.miss) {
### Ridge parameters
if (is.na(arg$h.default[arg$miss[[i]][j]]))
{
gcv.arg$F2 = matrix(Fo[, j] ^ 2, ncol =
1)
dat$h[idx, arg$miss[[i]][j]] =
rep(fn$gcvRidge(x = j, a = gcv.arg), length(idx))
h[j] = dat$h[idx[1], arg$miss[[i]][j]]
} else {
h[j] = dat$h[idx[1], arg$miss[[i]][j]]
= arg$h.default[arg$miss[[i]][j]]
}
### Effective number of parameters
dat$p.eff[idx, arg$miss[[i]][j]] =
matrix(sum(dat$d / (dat$d + h[j] ^ 2)), nrow = length(idx), byrow = TRUE)
### Retained variance
dat$v.ret[idx, arg$miss[[i]][j]] =
matrix(sum(dat$d ^ 2 / (dat$d + h[j] ^ 2)) / sum(dat$d), nrow =
length(idx), byrow = TRUE)
### Get B
filt = 1 / (dat$d + h[j] ^ 2)
if (rank.eff == 1) {temp = matrix(dat$V *
filt, ncol = 1)} else {
temp = fn$mScale(dat$V, filt, type = 1,
scale = TRUE, add = FALSE, center = FALSE, max.size = arg$max.size)
}
B.temp = temp %*% t(dat$V) %*%
matrix(dat$C[arg$avail[[i]], arg$miss[[i]][j]], ncol=1)
rm(temp, filt)
### Assemble residual scaled covariance matrix
temp = apply(matrix(c(1:j), nrow = 1), 2,
fn$residFn, h = h[1:j], Fo = matrix(Fo[, 1:j], ncol = j), d = dat$d, j =
j)
S[j, 1:j] = S[1:j, j] = length(idx) *
arg$inflate * (S0[1:j, j] + temp)
rm(temp)
### Get solution
B[, arg$miss[[i]][j]] = B.temp
Xhat[idx, arg$miss[[i]][j]] = dat$X[idx,
arg$avail[[i]]] %*% B.temp
}
}
### Multiple
if (arg$reg.type == 2) {
### Ridge parameter
if (is.na(arg$h.default)) {h = dat$h[i] =
fn$gcvRidge(x = c(1:n.miss), a = gcv.arg)} else {h = dat$h[i] =
arg$h.default}
### Effective number of parameters
dat$p.eff[idx] = sum(dat$d / (dat$d + h ^ 2))
### Retained variance
dat$v.ret[idx] = sum(dat$d ^ 2 / (dat$d + h ^ 2)) / sum(dat$d)
### Get B
filt = 1 / (dat$d + h ^ 2)
if (rank.eff == 1) {temp = matrix(dat$V * filt,
ncol = 1)} else {
temp = fn$mScale(dat$V, filt, type = 1,
scale = TRUE, add = FALSE, center = FALSE, max.size = arg$max.size)
}
B = temp %*% t(dat$V) %*%
matrix(dat$C[arg$avail[[i]], ], nrow = length(arg$avail[[i]]))
rm(temp, filt)
### Estimated scaled covariance matrix of residuals
S = length(idx) * arg$inflate * (S0 + t(Fo) %*%
(Fo * h ^ 4 / (dat$d + h ^ 2) ^ 2))
### Get solution
Xhat[idx, ] = dat$X[idx, arg$avail[[i]]] %*% B
}
} else {
Xhat[idx, ] = dat$X[idx, ]
B = 0
S = 0
}
### Update C.res
if (!identical(S, 0)) {C.res[arg$miss[[i]], arg$miss[[i]]] =
C.res[arg$miss[[i]], arg$miss[[i]]] + S}
### Store B
if (!is.na(arg$B.series[1]) & arg$B.series[1] == 0) {
attr(B, "miss") = arg$miss[[i]]
attr(B, "rows") = idx
dat$B[[i]] = B
}
if (!is.na(arg$B.series[1]) & arg$B.series[1] != 0) {
for (j in arg$B.series) {
dat$B[[j]][idx, arg$avail[[i]]] = matrix(B[, j],
ncol = length(B[, j]), nrow = length(idx), byrow = TRUE)
}
}
}
### Compute average effective number of parameters, retained variance, and store C.res
if (arg$reg.type == 1) {
temp.p = dat$p.eff
temp.p[!arg$missing] = NA
dat$p.eff.ave[iter] = sum(temp.p, na.rm = TRUE) /
sum(arg$missing)
} else {
dat$p.eff.ave[iter] = sum(dat$p.eff) / arg$n
}
dat$v.ret.ave[iter] = sum(dat$v.ret) / arg$nmis
dat$C.res = C.res
### Unscale
if (arg$covr == FALSE | arg$wts.flag == TRUE) {
dat$X = fn$mScale(dat$X, 1 / D, type = 1, add = FALSE, center
= FALSE, max.size = arg$max.size)
Xhat = fn$mScale(Xhat, 1 / D, type = 1, add = FALSE, center =
FALSE, max.size = arg$max.size)
dat$C.res = fn$mScale((1 / D) * dat$C.res, 1 / D, type = 1,
add = FALSE, center = FALSE, max.size = arg$max.size)
}
### Compute solution rms change in Xhat
dat$dXhat[iter] = sqrt(fn$ssq(Xhat[arg$missing] - dat$X[arg$missing]) / arg$nmis)
### Compute relative change
dat$nXhat[iter] = sqrt(fn$ssq(dat$X[arg$missing]) / arg$nmis)
### Compute stagnation value
dat$rXhat[iter] = dat$dXhat[iter] / dat$nXhat[iter]
### Update X & Xhat
dat$X[arg$missing] = Xhat[arg$missing]
dat$Xhat = Xhat
### Recenter
center.vec = colMeans(dat$X)
dat$X = fn$mScale(dat$X, -center.vec, type = 1, add = TRUE, scale =
FALSE, center = FALSE, max.size = arg$max.size)
dat$Xhat = fn$mScale(dat$Xhat, -center.vec, type = 1, add = TRUE,
scale = FALSE, center = FALSE, max.size = arg$max.size)
colnames(dat$X) = colnames(dat$Xhat) = char$c.names
rownames(dat$X) = rownames(dat$Xhat) = char$r.names
### Generate C from X plus C.res
dat$C = ((t(dat$X) %*% dat$X) + dat$C.res) / arg$n.eff
### Plotting?
if (arg$plot == TRUE) {
### Margins
par(mar = c(0.5, 4, 2, .15) + 0.1)
### Split screen
close.screen(all = TRUE)
split.screen(c(3, 1))
### Plot selected series
screen(1)
temp.X = dat$X[, arg$plot.series]
temp.X[arg$missing[, arg$plot.series]] = NA
temp.Xhat = dat$Xhat[, arg$plot.series]
plot(temp.Xhat, ylim = c(min(temp.X, temp.Xhat, na.rm =
TRUE), max(temp.X, temp.Xhat, na.rm = TRUE)), type = "l", col = 2, xaxt =
"n", ylab = "Deg C", main = paste("Series ", arg$plot.series, " Actual
(Black) & Estimated (Red)"))
points(temp.X, type = "l", col = 1)
### Plot series variance
screen(2)
temp.D = diag(dat$C)
plot(char$scales ^ 2, type = "h", ylim = c(0, max(temp.D,
char$scales ^ 2)), xaxt = "n", ylab = "Variance", main = "Variance
(Black: Original; Red: Cov. Matrix Diagonal)")
rect(arg$plot.series - 0.3, 0, arg$plot.series + 0.6,
max(temp.D, char$scales ^ 2), border = NA, col = 7)
points(char$scales ^ 2, type = "h")
points(temp.D ~ I(c(1:arg$p) + 0.3), type = "h", col = 2)
### Plot effective parameters
screen(3)
if (arg$reg.type == 1) {
temp.p = dat$p.eff
temp.p[!arg$missing] = NA
temp.p = rowMeans(temp.p, na.rm = TRUE)
temp.p[is.na(temp.p)] = 0
} else {
temp.p = dat$p.eff
}
plot(temp.p, ylim = c(0, max(na.omit(temp.p))), pch = 15, col
= "white", xaxt = "n", ylab = "# Eff. Params", main = paste("Effective
Parameters (Time Step Average)", sep = ""))
points(temp.p, pch = 15, cex = 0.5)
### Close & reset to defaults
close.screen(all = TRUE)
par(mar = c(5, 4, 4, 2) + 0.1)
}
### Cleanup & return
dat
}
### Master routine for TSVD
tsvdRegFn = function (arg, dat, char, fn, n.eof, iter) {
### Remove unnecessary matrices
dat$C = dat$C.res = NA
### Setup Xhat
Xhat = matrix(0, nrow = arg$n, ncol = arg$p)
### Find scales and weighting factors
if (arg$covr == TRUE) {
D = arg$wts / rep(1, arg$p)
} else {
D = sqrt(colSums(dat$X ^ 2) / arg$n.eff)
D = arg$wts / D
}
### Scale
if (arg$covr == FALSE | arg$wts.flag == TRUE) {
dat$X = fn$mScale(dat$X, D, type = 1, add = FALSE, center =
FALSE, max.size = arg$max.size)
dat$Xhat = fn$mScale(dat$Xhat, D, type = 1, add = FALSE,
center = FALSE, max.size = arg$max.size)
}
### Approximate X
svd.X=svd(dat$X, nu = n.eof, nv = n.eof)
if(n.eof > 1) {
Xhat = svd.X$u[, 1:n.eof] %*% diag(svd.X$d[1:n.eof]) %*%
t(svd.X$v[, 1:n.eof])
}
if(n.eof == 1) {
Xhat = as.matrix(svd.X$u[, 1]) %*% as.matrix(svd.X$d[1]) %*%
t(svd.X$v[, 1])
}
### Unscale
if (arg$covr == FALSE | arg$wts.flag == TRUE) {
dat$X = fn$mScale(dat$X, 1 / D, type = 1, add = FALSE, center
= FALSE, max.size = arg$max.size)
Xhat = fn$mScale(Xhat, 1 / D, type = 1, add = FALSE, center =
FALSE, max.size = arg$max.size)
}
### Compute solution rms change in Xhat
dat$dXhat[iter] = sqrt(fn$ssq(Xhat[arg$missing] - dat$X[arg$missing]) / arg$nmis)
### Compute relative change
dat$nXhat[iter] = sqrt(fn$ssq(dat$X[arg$missing]) / arg$nmis)
### Compute stagnation value
dat$rXhat[iter] = dat$dXhat[iter] / dat$nXhat[iter]
### Update X & Xhat
dat$X[arg$missing] = Xhat[arg$missing]
dat$Xhat = Xhat
### Recenter
center.vec = colMeans(dat$X)
dat$X = fn$mScale(dat$X, -center.vec, type = 1, add = TRUE, scale =
FALSE, center = FALSE, max.size = arg$max.size)
dat$Xhat = fn$mScale(dat$Xhat, -center.vec, type = 1, add = TRUE,
scale = FALSE, center = FALSE, max.size = arg$max.size)
colnames(dat$X) = colnames(dat$Xhat) = char$c.names
rownames(dat$X) = rownames(dat$Xhat) = char$r.names
### Plotting?
if (arg$plot == TRUE) {
### Margins
par(mar = c(0.5, 4, 2, .15) + 0.1)
### Split screen
close.screen(all = TRUE)
split.screen(c(3, 1))
### Plot selected series
screen(1)
temp.X = dat$X[, arg$plot.series]
temp.X[arg$missing[, arg$plot.series]] = NA
temp.Xhat = dat$Xhat[, arg$plot.series]
plot(temp.Xhat, ylim = c(min(temp.X, temp.Xhat, na.rm =
TRUE), max(temp.X, temp.Xhat, na.rm = TRUE)), type = "l", col = 2, xaxt =
"n", ylab = "Deg C", main = paste("Series ", arg$plot.series, " Actual
(Black) & Estimated (Red)"))
points(temp.X, type = "l", col = 1)
### Plot series variance
screen(2)
temp.D = colSums(dat$X ^ 2) / arg$n.eff
plot(char$scales ^ 2, type = "h", ylim = c(0, max(temp.D,
char$scales ^ 2)), xaxt = "n", ylab = "Variance", main = "Variance
(Black: Original; Red: Cov. Matrix Diagonal)")
rect(arg$plot.series - 0.3, 0, arg$plot.series + 0.6,
max(temp.D, char$scales ^ 2), border = NA, col = 7)
points(char$scales ^ 2, type = "h")
points(temp.D ~ I(c(1:arg$p) + 0.3), type = "h", col = 2)
### Plot eigenvalues
screen(3)
plot(svd.X$d, type="h", xaxt = "n", ylab = "Eigenvalue", main
= paste("Eigenvalue Spectrum (Truncation Parameter: ", n.eof, ")", sep =
""))
### Close & reset to defaults
close.screen(all = TRUE)
par(mar = c(5, 4, 4, 2) + 0.1)
}
### Cleanup & return
dat
}
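###
### A standalone sketch (illustrative; synthetic data) of the rank-k
### truncation used in the "Approximate X" step above: a matrix is
### approximated by its first two singular triplets.
if (FALSE) {
    set.seed(1)
    M = matrix(rnorm(60), nrow = 10)
    svd.M = svd(M, nu = 2, nv = 2)
    M.rank2 = svd.M$u %*% diag(svd.M$d[1:2]) %*% t(svd.M$v)
}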
##########################################################################################
######################             END EM ALGORITHM             ######################
##########################################################################################
########################################
### SEGMENT: IMPUTATION EXPERIMENTS ###
########################################
###
### 1. doGndRandom: Performs an infilling experiment with each of the station sets.
###                 Verification statistics are calculated using random withholding.
###
###      Y: The full time series set of READER data
###
###      n.max: The maximum number of EOFs to use for the ground station infilling
###
###      tol: Convergence tolerance
###
###      withhold: Proportion of data points to withhold for cross-validation
###
###      st: Start date
###
###      en: End date
###
###      form.st: The ground station clipping vector to start with
###
###      form.en: The ground station clipping vector to end with
###
###      form: List of all ground station clipping vectors
###
###      form.names: Vector of descriptive names for the station sets
###
###      plt: See emFn()
###
#########################################################################################
###
### 2. doGndStn: Performs an infilling experiment with each of the station sets.
###              Verification statistics are calculated using early/late withholding
###              for designated stations.
###
###              For early calibration, the last half of the data for stations in
###              vector "form.ver" is withheld. For late calibration, the first half
###              of the same data is withheld.
###
###      dat: The full time series set of READER data
###
###      n.max: The maximum number of EOFs to use for the ground station imputation
###
###      tol: Convergence tolerance
###
###      path: Path to use for saving both the imputation results and the statistics
###            file. Relative paths allowed.
###
###      st: Start date for the imputation
###
###      en: End date for the imputation
###
###      form.st: The ground station clipping vector to start with
###
###      form.en: The ground station clipping vector to end with
###
###      form: List of all ground station clipping vectors
###
###      ver: List of ground station verification clipping vectors
###
###      form.names: Vector of descriptive names for the station sets
###
###      plt: See emFn()
###
#########################################################################################
###
### 3. doRls: Performs a series of reconstruction experiments using a regularized
###           least squares fit of ground data to the AVHRR spatial EOFs.
###
###      v.min: Starting number of spatial eigenvectors
###
###      v.max: Maximum number of spatial eigenvectors. If NA, doRls() will set
###             v.max equal to the effective rank of the ground station data.
###
###      other options: See getRls() in the RECONSTRUCTION functions segment
###
#########################################################################################
###
### 4. cvGnd: Generates separate infilled ground station sets by withholding one
###           station at a time.
###
###      Yo: Full ground station set
###
###      type: Regression type ("iridge", "mridge", "tsvd", or "ttls"). See emFn().
###
###      tol / maxiter / DoFL / n.eof / plt / progressive: See emFn()
###
#########################################################################################
###
### 5. verEigen: Calculates estimates to withheld stations using the output of cvGnd().
###
###      cv.gnd: Output of cvGnd()
###
###      dat: AVHRR data (must be a time series)
###
###      n.max: Maximum number of retained EOFs to use in the regression
###
###      alpha: Weighting factor for ground stations (alphas > 1 --> OLS solution
###             with ground stations considered independent; alphas < 1 --> OLS
###             solution with PCs considered independent)
###
###      st: Start date for the reconstruction
###
###      en: End date for the reconstruction
###
###      forms: List of ground station clipping vectors
###
###      type: Regression type ("iridge", "mridge", "tsvd", or "ttls"). See emFn().
###
###      tol / maxiter / DoFL / n.eof / plt / progressive: See emFn()
###
#########################################################################################
###
### 6. verRLS: Calculates estimates to withheld stations using the output of cvGnd().
###
###      cv.gnd: Output of cvGnd()
###
###      X: AVHRR data (must be a time series)
###
###      form: Ground station clipping vector to use
###
###      v.min: Minimum number of AVHRR spatial eigenvectors to use
###
###      v.max: Maximum number of AVHRR spatial eigenvectors to use
###
###      h.min: Minimum bound on regularization parameter
###
###      h.max: Upper bound on regularization parameter
###
###      DoFL: Degrees of freedom lost
###
###      tol: Tolerance for the regularization parameter optimize function; see
###           optimize() in the R help
###
###      st.y: Start date for the ground data
###
###      en.y: End date for the ground data
###
###      st.x: Start date for the satellite data
###
###      en.x: End date for the satellite data
###
###      external: Not implemented. Leave as NA.
###
#########################################################################################
###
### 7. verS09: Performs the S09 reconstruction by withholding one station at a time.
###
###      Y: The 42 S09 stations in the first 42 columns, the first 3 AVHRR PCs in
###         the last 3 columns. Must be a time series.
###
###      default.pcs: The output of getPcs() for form.S09
###
doGndRandom = function (Y = Y.all, n.max = 10, tol = 0.0005, withhold =
0.05, plt = c(T, 1), st = 1957, en = c(2006, 12), form.st = 1,
form.en = length(form.list), form = form.list,
form.names = form.name) {
### Set time window
Y = window(Y, st, en)
Y = mScale(Y, -colMeans(Y, na.rm = TRUE), type = 1, add = TRUE,
scale = FALSE, center = FALSE)
### Set up storage variables
method = c("TSVD", "TTLS")
gnd.stats = list()
for (i in 1:2) {
gnd.stats[[i]] = list()
attr(gnd.stats[[i]], "Method") = method[i]
for (j in form.st:form.en) {
gnd.stats[[i]][[j]] = list()
attr(gnd.stats[[i]][[j]], "Method") = method[i]
attr(gnd.stats[[i]][[j]], "Ground Set") = form.names[j]
}
}
### Iterate through each ground set
for (j in form.st:form.en) {
### Obtain the station complement and verification complement
stn.set = parseRandom(Y, form = j, withhold = withhold)
base = stn.set[[1]]
ver = stn.set[[2]]
cal.idx = !is.na(ver)
### TTLS cannot run with regpar > avail, so limit maximum
### regpar to minimum available stations at any time
ttls.lim = min(c(rowSums(ver / ver, na.rm = TRUE))) - 1
###################
### Mode TSVD
###################
### Set mode variable
i = 1
### Set up storage
gnd.stats[[i]][[j]] = matrix(NA, nrow = 14, ncol = n.max)
colnames(gnd.stats[[i]][[j]]) = paste(seq(1:n.max), "Retained
EOFs")
rownames(gnd.stats[[i]][[j]]) = c("Cal. RMS Error", "Cal.
Explained Variance", "Cal. Cor. Coef.", "Ver. RMS Error", "Ver. RE",
"Ver. CE", "Ver. Cor. Coef.", "Cal. RMS Error", "Cal. Explained
Variance", "Cal. Cor. Coef.", "Ver. RMS Error", "Ver. RE", "Ver. CE",
"Ver. Cor. Coef.")
attr(gnd.stats[[i]][[j]], "Info: ") = c("Rows 1 - 7 are
station means; rows 8 - 14 are pointwise statistics.")
infill = emFn(ver, maxiter = 5000, tol = tol, n.eof = n.max,
type = "tsvd", covr = FALSE, progressive = c(T, 1, T, 1), DoFL= 12, plt =
plt)
### Calculate and collate statistics
for(k in 1:n.max) {
stats = getStats(base, infill[[k]]$Xhat, cal.idx)
gnd.stats[[i]][[j]][1:3, k] = stats[[1]][, ncol(ver) +
1]
gnd.stats[[i]][[j]][4:7, k] = stats[[2]][, ncol(ver) +
1]
gnd.stats[[i]][[j]][8:10, k] = stats[[1]][, ncol(ver) +
2]
gnd.stats[[i]][[j]][11:14, k] = stats[[2]][, ncol(ver) +
2]
attr(gnd.stats[[i]][[j]], "Method") = method[i]
attr(gnd.stats[[i]][[j]], "Ground Set") = form.names[j]
}
###################
### Mode TTLS
###################
### Set mode variable
i = 2
### Set up storage
gnd.stats[[i]][[j]] = matrix(NA, nrow = 14, ncol = n.max)
colnames(gnd.stats[[i]][[j]]) = paste(seq(1:n.max), "Retained
EOFs")
rownames(gnd.stats[[i]][[j]]) = c("Cal. RMS Error", "Cal.
Explained Variance", "Cal. Cor. Coef.", "Ver. RMS Error", "Ver. RE",
"Ver. CE", "Ver. Cor. Coef.", "Cal. RMS Error", "Cal. Explained
Variance", "Cal. Cor. Coef.", "Ver. RMS Error", "Ver. RE", "Ver. CE",
"Ver. Cor. Coef.")
attr(gnd.stats[[i]][[j]], "Info: ") = c("Rows 1 - 7 are
station means; rows 8 - 14 are pointwise statistics.")
infill = emFn(ver, maxiter = 1500, tol = tol, n.eof =
min(n.max, ttls.lim), type = "ttls", covr = FALSE, progressive = c(T, 1,
T, 1), DoFL= 12, plt = plt)
### Calculate and collate statistics
for(k in 1:min(n.max, ttls.lim)) {
stats = getStats(base, infill[[k]]$Xhat, cal.idx)
gnd.stats[[i]][[j]][1:3, k] = stats[[1]][, ncol(ver) +
1]
gnd.stats[[i]][[j]][4:7, k] = stats[[2]][, ncol(ver) +
1]
gnd.stats[[i]][[j]][8:10, k] = stats[[1]][, ncol(ver) +
2]
gnd.stats[[i]][[j]][11:14, k] = stats[[2]][, ncol(ver) +
2]
attr(gnd.stats[[i]][[j]], "Method") = method[i]
attr(gnd.stats[[i]][[j]], "Ground Set") = form.names[j]
}
}
### Return
gnd.stats
}
doGndStn = function(Y = Y.all, n.max = 10, tol = .0005, plt = c(T, 1), st
= 1957, en = c(2006, 12), form.st = 1, form.en = length(form.list),
form = form.list, ver = form.ver, form.names = form.name)
{
### Set time window
Y = window(Y, st, en)
Y = mScale(Y, -colMeans(Y, na.rm = TRUE), type = 1, add = TRUE,
scale = FALSE, center = FALSE)
### Set up storage variables
method = c("TSVD", "TTLS")
gnd.stats = list()
for (i in 1:2) {
gnd.stats[[i]] = list()
attr(gnd.stats[[i]], "Method") = method[i]
for (j in form.st:form.en) {
gnd.stats[[i]][[j]] = list()
attr(gnd.stats[[i]][[j]], "Method") = method[i]
attr(gnd.stats[[i]][[j]], "Ground Set") = form.names[j]
}
}
### Iterate through each ground set
for (j in form.st:form.en) {
### Obtain the station complement and verification complement
stn.set = parseStn(Y, form[[j]], ver[[j]])
base = stn.set[[1]]
early = stn.set[[2]]
late = stn.set[[3]]
stns = stn.set[[4]]
idx = stn.set[[5]]
### Map calibration data points
cal.early = (early == base & !is.na(early))
cal.late = (late == base & !is.na(late))
### TTLS cannot run with regpar > kavlr, so limit maximum
### regpar to minimum available stations at any time
ttls.lim = min(c(rowSums(early / early, na.rm = TRUE),
rowSums(late / late, na.rm = TRUE))) - 1
###################
### Mode TSVD
###################
### Set mode variable
i = 1
### Set up storage
gnd.stats[[i]][[j]] = matrix(NA, nrow = 28, ncol = n.max)
colnames(gnd.stats[[i]][[j]]) = paste(seq(1:n.max), "Retained
EOFs")
row.list = c("Cal. RMS Error", "Cal. Explained Variance",
"Cal. Cor. Coef.", "Ver. RMS Error", "Ver. RE", "Ver. CE", "Ver. Cor.
Coef.", "Cal. RMS Error", "Cal. Explained Variance", "Cal. Cor. Coef.",
"Ver. RMS Error", "Ver. RE", "Ver. CE", "Ver. Cor. Coef.")
rownames(gnd.stats[[i]][[j]]) = c(paste("Early Cal.",
row.list), paste("Late Cal.", row.list))
attr(gnd.stats[[i]][[j]], "Info: ") = c("Rows 1 - 7 and 15 - 21 are station means; rows 8 - 14 and 22 - 28 are pointwise statistics.")
### Infill
infill.early = emFn(early, maxiter = 5000, tol = tol, n.eof =
n.max, type = "tsvd", covr = FALSE, DoFL = 12, progressive = c(T, 1, T,
1), plt = plt)
infill.late = emFn(late, maxiter = 5000, tol = tol, n.eof =
n.max, type = "tsvd", covr = FALSE, DoFL = 12, progressive = c(T, 1, T,
1), plt = plt)
### Calculate and collate statistics
for(k in 1:n.max) {
early.stats = getStats(base, infill.early[[k]]$Xhat,
cal.early)
late.stats = getStats(base, infill.late[[k]]$Xhat,
cal.late)
gnd.stats[[i]][[j]][1:3, k] = early.stats[[1]][,
ncol(early) + 1]
gnd.stats[[i]][[j]][4:7, k] = early.stats[[2]][,
ncol(early) + 1]
gnd.stats[[i]][[j]][8:10, k] = early.stats[[1]][,
ncol(early) + 2]
gnd.stats[[i]][[j]][11:14, k] = early.stats[[2]][,
ncol(early) + 2]
gnd.stats[[i]][[j]][15:17, k] = late.stats[[1]][,
ncol(late) + 1]
gnd.stats[[i]][[j]][18:21, k] = late.stats[[2]][,
ncol(late) + 1]
gnd.stats[[i]][[j]][22:24, k] = late.stats[[1]][,
ncol(late) + 2]
gnd.stats[[i]][[j]][25:28, k] = late.stats[[2]][,
ncol(late) + 2]
attr(gnd.stats[[i]][[j]], "Method") = method[i]
attr(gnd.stats[[i]][[j]], "Ground Set") = form.names[j]
}
###################
### Mode TTLS
###################
### Set mode variable
i = 2
### Set up storage
gnd.stats[[i]][[j]] = matrix(NA, nrow = 28, ncol = n.max)
colnames(gnd.stats[[i]][[j]]) = paste(seq(1:n.max), "Retained
EOFs")
row.list = c("Cal. RMS Error", "Cal. Explained Variance",
"Cal. Cor. Coef.", "Ver. RMS Error", "Ver. RE", "Ver. CE", "Ver. Cor.
Coef.", "Cal. RMS Error", "Cal. Explained Variance", "Cal. Cor. Coef.",
"Ver. RMS Error", "Ver. RE", "Ver. CE", "Ver. Cor. Coef.")
rownames(gnd.stats[[i]][[j]]) = c(paste("Early Cal.",
row.list), paste("Late Cal.", row.list))
attr(gnd.stats[[i]][[j]], "Info: ") = c("Rows 1 - 7 and 15 - 21 are station means; rows 8 - 14 and 22 - 28 are pointwise statistics.")
### Infill
infill.early = emFn(early, maxiter = 5000, tol = tol, n.eof =
n.max, type = "ttls", covr = FALSE, DoFL = 12, progressive = c(T, 1, T,
1), plt = plt)
infill.late = emFn(late, maxiter = 5000, tol = tol, n.eof =
n.max, type = "ttls", covr = FALSE, DoFL = 12, progressive = c(T, 1, T,
1), plt = plt)
### Calculate and collate statistics
for(k in 1:min(n.max, ttls.lim)) {
early.stats = getStats(base, infill.early[[k]]$Xhat,
cal.early)
late.stats = getStats(base, infill.late[[k]]$Xhat,
cal.late)
gnd.stats[[i]][[j]][1:3, k] = early.stats[[1]][,
ncol(early) + 1]
gnd.stats[[i]][[j]][4:7, k] = early.stats[[2]][,
ncol(early) + 1]
gnd.stats[[i]][[j]][8:10, k] = early.stats[[1]][,
ncol(early) + 2]
gnd.stats[[i]][[j]][11:14, k] = early.stats[[2]][,
ncol(early) + 2]
gnd.stats[[i]][[j]][15:17, k] = late.stats[[1]][,
ncol(late) + 1]
gnd.stats[[i]][[j]][18:21, k] = late.stats[[2]][,
ncol(late) + 1]
gnd.stats[[i]][[j]][22:24, k] = late.stats[[1]][,
ncol(late) + 2]
gnd.stats[[i]][[j]][25:28, k] = late.stats[[2]][,
ncol(late) + 2]
attr(gnd.stats[[i]][[j]], "Method") = method[i]
attr(gnd.stats[[i]][[j]], "Ground Set") = form.names[j]
}
}
### Return
gnd.stats
}
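###
### Example (hypothetical): run the split early/late calibration experiments
### for the first three ground sets only:
###
###   gnd.stn = doGndStn(form.st = 1, form.en = 3)
###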
doRls = function (Y, X = all.avhrr, form = form.grid.96, v.min = 1, v.max
= 300, h.min = 0.05, h.max = 10, tol = 0.001, DoFL = 12,
st.y = 1957, en.y = c(2006,12), st.x = 1982, en.x =
c(2006,12)) {
### Setup storage
err = vector()
h = vector()
external = list()
### Get predictor station grid cells
external$Y.ver = window(Y.all[, -form], st.y, en.y)
external$Y.ver = mScale(external$Y.ver, -colMeans(external$Y.ver,
na.rm = TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE)
na.ver = is.na(external$Y.ver)
Y.idx = all.idx[-form, 4]
external$Y.idx = Y.idx
Y = window(Y, st.y, en.y)
Y = mScale(Y, -colMeans(Y, na.rm = TRUE), type = 1, add = TRUE,
scale = FALSE, center = FALSE)
tsp(Y) = NULL
external$Y = matrix(Y, ncol = ncol(Y))
### Get verification stations for optimization
form.ver = form[!is.element(form, form.grid)]
Z = Y.all[, form.ver]
Z = window(Z, st.y, en.y)
Z = mScale(Z, -colMeans(Z, na.rm = TRUE), type = 1, add = TRUE,
scale = FALSE, center = FALSE)
Z = matrix(Z, ncol = ncol(Z))
colnames(Z) = all.idx[form.ver, 1]
external$Z = Z
external$na.ver = cbind(is.na(Z), na.ver)
external$Z.idx = all.idx[form.ver, 4]
### Get spatial structure
X = makeAnom(window(X, st.x, en.x))
X.svd = svd(X)
### Find rank
rank = max(1, min(v.max, sum(X.svd$d > sqrt(.Machine$double.eps) *
10)))
### Extract components
external$V = matrix(X.svd$v[, 1:rank], ncol = rank)
external$d = X.svd$d[1:rank]
external$U = matrix(X.svd$u[, 1:rank], ncol = rank)
external$n.eff = nrow(X) - DoFL
### Define subcomponents
external$VY = matrix(external$V[external$Y.idx, ], ncol = rank)
external$LY = mScale(external$VY, external$d /
sqrt(external$n.eff), type = 1, scale = TRUE, add = FALSE, center =
FALSE)
external$VZt = t(matrix(external$V[external$Z.idx, ], ncol = rank))
### Prepare spatial EOFs
external$LYt = external$LY2 = list()
external$LYt[[1]] = t(external$LY)
external$LY2[[1]] = external$LYt[[1]] %*% external$LY
### Prepare censored EOFs for internal cross-validation
for (j in 1:ncol(Y)) {
external$LYt[[j + 1]] = external$LYt[[1]][, -j]
external$LY2[[j + 1]] = external$LYt[[j + 1]] %*%
t(external$LYt[[j + 1]])
}
### Iterate through v.min to v.max
for (i in v.min:v.max) {
### Find rank
external$r = r = min(i, rank)
### Perform regression
temp = getRls(Y, X, form, i, h.min, h.max, tol, DoFL, st.y,
en.y, st.x, en.x, external)
err[i] = temp$err
h[i] = temp$h
}
### Return
rls.stats = list(err, h)
rls.stats
}
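###
### Example (hypothetical): scan 1 to 150 retained AVHRR EOFs and locate the
### truncation with the smallest cross-validation error:
###
###   rls.scan = doRls(Y.all[, -form.grid.96], v.min = 1, v.max = 150)
###   which.min(rls.scan[[1]])   ### EOF count minimizing the CV RMS error
###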
cvGnd = function (Yo, type = "iridge", tol = 0.0005, maxiter = 500, DoFL
= 12, n.eof = NA, plt = c(T, 1),
progressive = c(F, 1, F, 1), form = form.grid.96) {
### Set up storage
cv.gnd = list()
if (is.na(n.eof)) {idx = 1} else {idx = n.eof}
### Infill the supplied set with all series
message(paste("Infilling ground set 1 of", ncol(Yo) + 1))
if (type != "tsvd") {
temp = suppressMessages(emFn(Yo, type =
maxiter = maxiter, n.eof = n.eof, DoFL = DoFL, plt
progressive))
} else {
temp = suppressMessages(emFn(Yo, type =
maxiter = maxiter, n.eof = n.eof, DoFL = DoFL, plt
c(T, 1, T, 1)))
}
type, tol = tol,
= plt, progressive =
type, tol = tol,
= plt, progressive =
### Save results & covariance matrix (to speed up infilling when
### stations are withheld)
cv.gnd[[1]] = list()
cv.gnd[[1]]$X = temp[[idx]]$X
cv.gnd[[1]]$C = temp[[idx]]$C
grid.idx = all.idx[-form, 4]
cv.gnd[[1]]$idx = grid.idx
### Repeat infilling with stations in unique grid cells withheld,
### one at a time
for (i in 1:ncol(Yo)) {
message(paste("Infilling ground set", i + 1, "of", ncol(Yo) +
1))
cv.gnd[[i + 1]] = list()
g.idx = c(1:ncol(Yo))[grid.idx[i] == grid.idx]
Y2 = Yo[, -g.idx]
if (type != "tsvd") {
temp = suppressMessages(emFn(Y2, C = cv.gnd[[1]]$C[-g.idx,
-g.idx], type = type, tol = tol, maxiter = maxiter, n.eof = n.eof,
DoFL = DoFL, plt = plt, progressive = progressive))
} else {
temp = suppressMessages(emFn(Y2, type = type, tol = tol,
maxiter = maxiter, n.eof = n.eof, DoFL = DoFL, plt = plt, progressive =
c(T, 1, T, 1)))
}
cv.gnd[[i + 1]]$X = temp[[idx]]$X
cv.gnd[[i + 1]]$idx = grid.idx[-g.idx]
cv.gnd[[i + 1]]$omit = g.idx
}
### Return
cv.gnd
}
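###
### Example (hypothetical): infill the base station set once, then reuse the
### saved covariance matrix for the leave-one-out runs; Y.base is assumed to
### be the centered station matrix with the form.grid.96 stations removed:
###
###   cv.gnd = cvGnd(Y.base, type = "iridge", form = form.grid.96)
###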
verEigen = function (cv.gnd, dat = all.avhrr, type = "tsvd", n.max = 8,
tol = 0.005, form = 11, DoFL = 12, alpha = 10, pc.max = 150,
maxiter = 50, st = 1957, en = c(2006,12), forms =
form.list) {
### Get satellite anomalies
dat = makeAnom(dat)
### Center Y.ver
Y.ver = window(Y.all[, -forms[[form]]], st, en)
Y.ver = mScale(Y.ver, -colMeans(Y.ver, na.rm = TRUE), type = 1, add
= TRUE, scale = FALSE, center = FALSE)
na.ver = is.na(Y.ver)
Y.idx = all.idx[-forms[[form]], 4]
### Get original data for unused stations
form.ver = forms[[form]][!is.element(forms[[form]], form.grid)]
Z = Y.all[, form.ver]
Z = window(Z, st, en)
Z = mScale(Z, -colMeans(Z, na.rm = TRUE), type = 1, add = TRUE,
scale = FALSE, center = FALSE)
Z = matrix(Z, ncol = ncol(Z))
Z.len = ncol(Z)
colnames(Z) = all.idx[form.ver, 1]
na.ver = cbind(is.na(Z), na.ver)
Z.idx = all.idx[form.ver, 4]
Z = cbind(Z, Y.ver)
### Get correlation AVHRR PCs & weights
pc.data = getPcs(dat, st = start(dat), en = en, form = form, covr =
FALSE, DoFL = DoFL)
U = ts(pc.data[[4]], st = start(dat), freq = 12)
d = pc.data[[3]]
D = pc.data[[5]]
DY = D[Y.idx]
DZ = D[Z.idx]
D = c(DZ, DY)
V = pc.data[[2]]
VY = V[Y.idx, ]
VZ = V[Z.idx, ]
V = rbind(VZ, VY)
Y.wts = pc.data[[9]]
rownames(Y.wts) = colnames(Y.ver)
### Set up storage variables
ver.estimates = matrix(0, nrow = nrow(cv.gnd[[1]]$X), ncol =
ncol(Z))
colnames(ver.estimates) = c(all.idx[form.ver, 1], colnames(Y.ver))
### Iterate through each station in cv.gnd
for (s in 1:length(cv.gnd)) {
### Get ground data
Y = cv.gnd[[s]]$X
Y = mScale(Y, -colMeans(Y), type = 1, add = TRUE, scale =
FALSE, center = FALSE)
rank = min(pc.max, length(d))
message(paste("\n\nReconstruction", s, "of", length(cv.gnd)))
### Set up storage variables
sat.stats = matrix(nrow = n.max + 2, ncol = rank)
colnames(sat.stats) = paste("Recon PCs", c(1:rank))
rownames(sat.stats) = c(paste("RMS Error, k =", c(1:n.max)),
"Min Error at k =", "Recon RMS Error:")
attr(sat.stats, "Method") = type
infilled.Xhat = infilled.X = list()
for(i in 1:n.max) {
### Storage for full recon PCs - unspliced
infilled.Xhat[[i]] = matrix(nrow = nrow(Y), ncol = rank)
colnames(infilled.Xhat[[i]]) = paste("AVHRR PC #",
c(1:rank), sep = "")
attr(infilled.Xhat[[i]], "Method") = type
attr(infilled.Xhat[[i]], "Truncation Parameter") = i
### Storage for full recon PCs - spliced
infilled.X[[i]] = matrix(nrow = nrow(Y), ncol = rank)
colnames(infilled.X[[i]]) = paste("AVHRR PC #",
c(1:rank), sep = "")
attr(infilled.X[[i]], "Method") = type
attr(infilled.X[[i]], "Truncation Parameter") = i
}
### Setup
optimal.pcs.Xhat = matrix(0, nrow = nrow(Y), ncol = rank)
optimal.pcs.X = matrix(0, nrow = nrow(Y), ncol = rank)
recon = matrix(0, nrow = nrow(Y), ncol = ncol(Z))
if (s > 1) {recon = recon[, -1]}
Z.temp = Z
V.temp = V
D.temp = D
na.temp = na.ver
if (s > 1) {
Z.temp = Z.temp[, -(s - 1 + Z.len)]
V.temp = V.temp[-(s - 1 + Z.len), ]
D.temp = D.temp[-(s - 1 + Z.len)]
na.temp = na.temp[, -(s - 1 + Z.len)]
}
### Iterate through each PC, up to pc.max
for (j in 1:rank) {
wts = Y.wts
if (s > 1) {wts = Y.wts[-(s - 1), ]}
### Set up
X = cbind(Y, U[, j])
wts = as.vector(c(alpha * wts[, j], 1))
### Infill
message(paste("Infilling AVHRR PC #", j))
infill.Y = suppressMessages(emFn(X, n.eof = n.max,
maxiter = maxiter, tol = tol, type = type, wts = wts, progressive = c(T,
1, T, 1), plt = c(T, ncol(X)), DoFL = DoFL))
### Save PCs
for (k in 1:n.max) {
infilled.Xhat[[k]][, j] = infill.Y[[k]]$Xhat[,
ncol(X)] * d[j]
infilled.X[[k]][, j] = infill.Y[[k]]$X[, ncol(X)]
* d[j]
}
}
message("\nOptimizing . . . ")
### Get verification stats
for (j in 1:rank) {
for (k in 1:n.max) {
recon.eof = (matrix(infilled.Xhat[[k]][, j], ncol
= 1) %*% t(V.temp[, j])) %*% diag(D.temp)
recon.eof[na.temp] = NA
recon.eof = mScale(recon.eof, -colMeans(recon.eof,
na.rm = TRUE), type = 1, add = TRUE, center = FALSE, scale = FALSE)
sat.stats[k, j] = sqrt(sum((Z.temp - recon.eof) ^
2, na.rm = TRUE) / sum(na.ver))
}
### Record truncation value for minimum error
sat.stats[n.max + 1, j] =
c(1:n.max)[min(sat.stats[1:n.max, j]) == sat.stats[1:n.max, j]]
### Save the PC
optimal.pcs.Xhat[, j] = infilled.Xhat[[sat.stats[n.max +
1, j]]][, j]
optimal.pcs.X[, j] = infilled.X[[sat.stats[n.max + 1,
j]]][, j]
### Calculate recon verification stats
recon = recon + (matrix(optimal.pcs.Xhat[, j], ncol = 1)
%*% t(V.temp[, j])) %*% diag(D.temp)
recon[na.temp] = NA
recon = mScale(recon, -colMeans(recon, na.rm = TRUE),
type = 1, add = TRUE, center = FALSE, scale = FALSE)
sat.stats[n.max + 2, j] = sqrt(sum((Z.temp - recon) ^ 2,
na.rm = TRUE) / sum(na.temp))
}
### Find minimum recon rms
min.rms = min(sat.stats[n.max + 2, ])
rank = c(1:rank)[sat.stats[n.max + 2, ] ==
min(sat.stats[n.max + 2, ])]
message(paste(". . . complete.\nMinimum RMS error of",
round(min.rms, 5), "at", rank, "retained satellite PCs."))
### Store estimates
if (s == 1) {
ver.estimates[, 1:Z.len] = ((optimal.pcs.Xhat[, 1:rank]
%*% t(V[, 1:rank])) %*% diag(D))[, 1:Z.len]
} else {
ver.estimates[, Z.len + s - 1] = ((optimal.pcs.Xhat[,
1:rank] %*% t(V[, 1:rank])) %*% diag(D))[, Z.len + s - 1]
}
}
### Return
ver.eigen = list()
ver.eigen$ver = ts(Z, start = st, freq = 12)
ver.eigen$estimates = ts(ver.estimates, start = st, freq = 12)
ver.eigen
}
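###
### Example (hypothetical): verify the eigenvector-weighted reconstruction
### against the withheld station data:
###
###   eigen.ver = verEigen(cv.gnd, type = "tsvd", n.max = 8)
###   sqrt(mean((eigen.ver$ver - eigen.ver$estimates) ^ 2, na.rm = TRUE))
###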
verRls = function (cv.gnd, X = all.avhrr, form = form.grid.96, v.min = 1,
v.max = 300, h.min = 0.05, h.max = 10, DoFL = 12,
tol = 0.001, st.y = 1957, en.y = c(2006,12), st.x = 1982,
en.x = c(2006,12), external = NA) {
### Setup storage
external = list()
external[[1]] = list()
### Get location vectors and original station data
### Get predictor station grid cells
Y.ver = window(Y.all[, -form], st.y, en.y)
tsp(Y.ver) = NULL
external[[1]]$Y.ver = Y.ver = mScale(Y.ver, -colMeans(Y.ver, na.rm
= TRUE), type = 1, add = TRUE, scale = FALSE, center = FALSE)
na.ver = is.na(Y.ver)
external[[1]]$Y.idx = Y.idx = all.idx[-form, 4]
### Get verification stations for optimization
form.ver = form[!is.element(form, form.grid)]
Z = Y.all[, form.ver]
Z = window(Z, st.y, en.y)
tsp(Z) = NULL
external[[1]]$Z = Z = mScale(Z, -colMeans(Z, na.rm = TRUE), type =
1, add = TRUE, scale = FALSE, center = FALSE)
external[[1]]$na.ver = na.ver = cbind(is.na(Z), na.ver)
colnames(Z) = all.idx[form.ver, 1]
external[[1]]$Z.idx = Z.idx = all.idx[form.ver, 4]
### Setup storage for estimates
estimates = matrix(nrow = nrow(Y.ver), ncol = ncol(Y.ver) +
ncol(Z))
estimates = ts(estimates, start = st.y, freq = 12)
### Get spatial structure
X = makeAnom(window(X, st.x, en.x))
X.svd = svd(X)
### Find rank
if (is.na(v.max)) {v.max = length(X.svd$d)}
rank.eff = max(1, min(v.max, sum(X.svd$d >
sqrt(.Machine$double.eps) * 10)))
### Extract components
external[[1]]$V = V = matrix(X.svd$v[, 1:rank.eff], ncol =
rank.eff)
external[[1]]$d = d = X.svd$d[1:rank.eff]
external[[1]]$U = U = matrix(X.svd$u[, 1:rank.eff], ncol =
rank.eff)
external[[1]]$n.eff = n.eff = nrow(X) - DoFL
### Define subcomponents
external[[1]]$Y = Y = cv.gnd[[1]]$X
external[[1]]$VY = VY = matrix(V[Y.idx, ], ncol = rank.eff)
external[[1]]$LY = LY = mScale(VY, d / sqrt(n.eff), type = 1, scale
= TRUE, add = FALSE, center = FALSE)
external[[1]]$VZt = VZt = t(matrix(V[Z.idx, ], ncol = rank.eff))
### Prepare spatial EOFs
external[[1]]$LYt = LYt = external[[1]]$LY2 = LY2 = list()
external[[1]]$LYt[[1]] = LYt[[1]] = t(LY)
external[[1]]$LY2[[1]] = LY2[[1]] = LYt[[1]] %*% LY
### Prepare censored EOFs for internal cross-validation
for (i in 1:ncol(Y)) {
external[[1]]$LYt[[i + 1]] = LYt[[i + 1]] = LYt[[1]][, -i]
external[[1]]$LY2[[i + 1]] = LY2[[i + 1]] = LYt[[i + 1]] %*%
t(LYt[[i + 1]])
}
### Find optimal regularization parameter for full set and store
### estimates to unused stations
temp.err = rep(1e12, v.max)
min.err = rep(1e12, ncol(Y.ver) + 1)
message("Finding optimal parameters for full set.")
for (i in v.min:v.max) {
message(paste("AVHRR EOFs 1 -", i))
external[[1]]$r = i
temp = suppressMessages(getRls(cv.gnd[[1]]$X, X, form, i,
h.min, h.max, tol, DoFL, st.y, en.y, st.x, en.x, external[[1]]))
temp.err[i] = temp$err
}
min.err[1] = external[[1]]$r = c(1:v.max)[temp.err ==
min(temp.err)]
full = suppressMessages(getRls(cv.gnd[[1]]$X, X, form, min.err[1],
h.min, h.max, tol, DoFL, st.y, en.y, st.x, en.x, external[[1]])$Xhat)
estimates[, 1:ncol(Z)] = full %*% matrix(VZt[1:min.err[1], ], nrow
= min.err[1])
message(paste("Complete. Optimal AVHRR EOFs:", min.err[1], "\n"))
### Set up for withholding by each predictor
for (i in 1:ncol(Y)) {
### Extract the appropriate components
grid.idx = cv.gnd[[i + 1]]$idx
omit = cv.gnd[[i + 1]]$omit
external[[2]] = list()
external[[2]]$Y = cv.gnd[[i + 1]]$X
external[[2]]$Y.ver = Y.ver[, -omit]
external[[2]]$Y.idx = grid.idx
external[[2]]$Z = Z
external[[2]]$na.ver = cbind(is.na(external[[2]]$Z),
is.na(external[[2]]$Y.ver))
external[[2]]$Z.idx = Z.idx
external[[2]]$d = d
external[[2]]$n.eff = n.eff
external[[2]]$V = V
external[[2]]$U = U
external[[2]]$VY = matrix(VY[-omit, ], ncol = rank.eff)
external[[2]]$VZt = VZt
external[[2]]$LY = matrix(LY[-omit, ], ncol = rank.eff)
external[[2]]$LYt = external[[2]]$LY2 = list()
external[[2]]$LYt[[1]] = t(external[[2]]$LY)
external[[2]]$LY2[[1]] = external[[2]]$LYt[[1]] %*%
external[[2]]$LY
for (j in 1:ncol(external[[2]]$Y)) {
external[[2]]$LYt[[j + 1]] = external[[2]]$LYt[[1]][, -j]
external[[2]]$LY2[[j + 1]] = external[[2]]$LYt[[j + 1]] %*%
t(external[[2]]$LYt[[j + 1]])
}
### Find optimal regularization parameter for each withheld set and
### store estimates to unused stations
temp.err = rep(1e12, v.max)
message(paste("Finding optimal parameters based on station
#", i, "being withheld."))
for (j in v.min:v.max) {
message(paste("AVHRR EOFs 1 -", j))
external[[2]]$r = j
temp = suppressMessages(getRls(external[[2]]$Y, X, form,
j, h.min, h.max, tol, DoFL, st.y, en.y, st.x, en.x, external[[2]]))
temp.err[j] = temp$err
}
### Record minimum error
min.err[i + 1] = c(1:v.max)[temp.err == min(temp.err)]
external[[2]]$r = min.err[i + 1]
### Get the PC estimates
est = suppressMessages(getRls(external[[2]]$Y, X, form,
min.err[i + 1], h.min, h.max, tol, DoFL, st.y, en.y, st.x, en.x,
external[[2]])$Xhat)
### Get the station estimate
estimates[, ncol(Z) + i] = est %*%
t(matrix(external[[1]]$VY[i, 1:min.err[i + 1]], ncol = min.err[i + 1]))
message(paste("Complete. Optimal AVHRR EOFs:", min.err[i +
1], "\n"))
}
### Return
full.ver = list()
full.ver$ver = ts(cbind(Z, Y.ver), start = st.y, freq = 12)
full.ver$estimates = ts(estimates, start = st.y, freq = 12)
full.ver$err = min.err
full.ver
}
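###
### Example (hypothetical): cross-validate the RLS reconstruction using the
### cvGnd() output; $err holds the optimal EOF counts (full set first, then
### one entry per withheld station):
###
###   rls.ver = verRls(cv.gnd, v.min = 1, v.max = 150)
###   rls.ver$err
###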
verS09 = function (Y = S09, default.pcs = S09.pcs) {
### Get S09 stations
Y.ver = window(Y.all, 1957, c(2006, 12))
Y.ver = Y.ver[, -form.S09]
Y.ver = mScale(Y.ver, -colMeans(Y.ver, na.rm = TRUE), type = 1, add
= TRUE, scale = FALSE, center = FALSE)
na.ver = is.na(Y.ver)
tsp(Y.ver) = NULL
Y.idx = all.idx[-form.S09, 4]
### Get unused stations
form.ver = form.S09[!is.element(form.S09, form.grid)]
Z = Y.all[, form.ver]
Z = window(Z, 1957, c(2006,12))
tsp(Z) = NULL
Z = mScale(Z, -colMeans(Z, na.rm = TRUE), type = 1, add = TRUE,
scale = FALSE, center = FALSE)
na.ver = cbind(is.na(Z), na.ver)
colnames(Z) = all.idx[form.ver, 1]
Z.idx = all.idx[form.ver, 4]
### Setup storage
estimates = matrix(0, nrow = 600, ncol = 98)
### Get unused station estimates
estimates[, 1:ncol(Z)] = (default.pcs$X %*% t(default.pcs$V[Z.idx,
])) %*% diag(default.pcs$D[Z.idx])
colnames(estimates) = c(colnames(Z), colnames(Y.ver))
### Leave out one station at a time & get estimates
for (i in 1:42) {
message(paste("Infilling PCs station", i, "of", 42,
"withheld."))
infilled.S09 = suppressMessages(emFn(Y[, -i], type = "ttls",
DoFL = 1, maxiter = 50, tol = 0.001, n.eof = 3, plt = c(T, 42),
progressive = c(F, 1, F, 1)))
pcs = infilled.S09[[3]]$X[, 42:44]
estimates[, ncol(Z) + i] = (pcs %*%
t(matrix(default.pcs$V[Y.idx[i], ], nrow = 1))) %*%
diag(default.pcs$D[Y.idx[i]], ncol = 1)
}
### Remove off-grid stations
off.grid = all.idx[form.S09.no.ocean, 1]
off.grid = c(1:42)[is.element(colnames(Y.ver), off.grid)]
Y.ver = Y.ver[, -off.grid]
estimates = estimates[, -(ncol(Z) + off.grid)]
ver = cbind(Z, Y.ver)
### Return
S09.ver = list()
S09.ver$ver = ts(ver, start = 1957, freq = 12)
S09.ver$estimates = ts(estimates, start = 1957, freq = 12)
S09.ver
}
##########################################################################################
######################          END IMPUTATION EXPERIMENTS          ######################
##########################################################################################
##########################################
### SEGMENT: RECONSTRUCTION FUNCTIONS ###
##########################################
###
### 1. getPcs: Returns a list of the following:
###
###       [[1]] Principal components as a time series
###       [[2]] Spatial singular vectors
###       [[3]] Singular values
###       [[4]] Temporal singular vectors
###       [[5]] Standardization vector
###       [[6]] Spatial EOFs
###       [[7]] Spatial EOFs at ground station locations
###       [[8]] Spatial eigenvectors at ground station locations
###       [[9]] Normalized (sum-of-squares = 1.0) spatial singular
###             vectors at ground station locations
###
###       NOTE: Data set does not need to be anomalies, but must be a
###       time series.
###
###       dat: Input data set
###
###       st: Start date for processing
###
###       en: End date for processing
###
###       form: Ground station index vector
###
###       covr: (Logical) Covariance (TRUE) or standardized (FALSE)?
###
###       DoFL: Degrees of freedom lost
###
#########################################################################################
###
### 2. getEigen: Wrapper function for emFn() for infilling satellite components
###       using a WLS concept for the ground stations
###
###       Y: Ground data (must be a time series)
###
###       dat: AVHRR data (must be a time series)
###
###       type: Regression method ("tsvd" or "ttls")
###
###       n.max: Maximum number of retained EOFs for each regression
###
###       tol: Convergence tolerance to use
###
###       form: Ground station selection vector
###
###       DoFL: Degrees of freedom lost (12 for monthly anomalies)
###
###       alpha: Scaling factor when using eigenvector weighting
###
###       pc.max: Maximum number of AVHRR PCs to include
###
###       maxiter: Maximum number of iterations to convergence
###
###       st: Start period for the reconstruction
###
###       en: End period for the reconstruction
###
###       forms: The list of station clipping vectors
###
#########################################################################################
###
### 3. getRls: Function to generate principal components by using
###       ground stations as predictors and AVHRR spatial data
###       to provide the spatial coefficients.
###
###       Y: Ground data (must be a time series)
###
###       X: Data set to provide the spatial weights (AVHRR data)
###
###       form: Ground station selection vector
###
###       idx: Index for removing verification stations (vector of
###            stations to REMOVE)
###
###       v.max: Maximum number of spatial eigenvectors to use from X
###
###       h.min: Minimum allowable regularization parameter (set as small
###              as possible)
###
###       h.max: Maximum allowable regularization parameter
###
###       tol: Tolerance to use in the optimization function
###
###       DoFL: Degrees of freedom lost (12 for monthly anomalies)
###
###       st.y: Reconstruction start year
###
###       en.y: Reconstruction end year
###
###       st.x: Satellite data start year
###
###       en.x: Satellite data end year
###
###       external: Externally parsed data to be fed to getRls (used by doRls)
###
#########################################################################################
###
### 4. getRecon: Function to generate a reconstruction using a fully infilled
###       set of principal components.
###
###       est.pcs: The estimated PCs
###
###       verbose: (Logical) If true, calculates maps of p-values for all trends
###
###       compare.S09: (Logical) If true, compares all trend and significance
###                    maps to S09
###
#########################################################################################
###
### 5. offsetArea: Function to calculate anomaly offsets for the nearest-
###       station reconstruction.
###
###       Yo: Station data, presented as a time series of anomalies
###
###       maxoverlap: Maximum number of overlapping data points to use
###                   for offset calculations
getPcs = function (dat, st = 1982, en = c(2006, 12), form = form.grid.96,
covr = FALSE, DoFL = 12) {
### Define period
dat = makeAnom(window(dat, st, en))
### Initialize scaling
D = rep(1, ncol(dat))
n.eff = nrow(dat) - DoFL
### If covr=FALSE, scale to correlation
if (covr == FALSE) {
D = sqrt(colSums(dat ^ 2) / n.eff)
dat = mScale(dat, 1 / D, type = 1, scale = TRUE, add = FALSE,
center = FALSE)
dat = ts(dat, st, freq = 12)
}
### Calculate the singular value decomposition of the input matrix
dat.svd = svd(t(dat))
### Find rank
rank.eff = max(1, sum(dat.svd$d > sqrt(.Machine$double.eps) * 10))
### Store
U = matrix(dat.svd$v[, 1:rank.eff], ncol = rank.eff)
d = dat.svd$d[1:rank.eff]
V = matrix(dat.svd$u[, 1:rank.eff], ncol = rank.eff)
### Generate PCs
pcs = ts(mScale(U, d, scale = TRUE, add = FALSE, center = FALSE),
start = st, freq = 12)
### Calculate LY (Spatial EOFs at ground stations)
L = mScale(V, d / sqrt(n.eff), scale = TRUE, add = FALSE, center =
FALSE)
LY = matrix(L[all.idx[-form.list[[form]], 4], ], ncol = rank.eff)
### Calculate VY (eigenvector coefficients at ground stations)
VY = matrix(V[all.idx[-form.list[[form]], 4], ], ncol = rank.eff)
### Calculate V.wts (VY normalized to have a sum-of-squares of 1)
V.wts = VY
V.scale = sqrt(1 / colSums(VY ^ 2))
V.wts = mScale(VY, V.scale, type = 1, scale = TRUE, add = FALSE,
center = FALSE)
### Return the desired number of PCs
pc.data = list(pcs, V, d, U, D, L, LY, VY, V.wts)
### Return
pc.data
}
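###
### Sanity check (hypothetical): since pcs = U %*% diag(d) and the scaled
### data decompose as U %*% diag(d) %*% t(V), recombining the PCs with the
### spatial singular vectors should reproduce the scaled anomalies. Note
### that getPcs() indexes form.list with form, so pass a list index (e.g.,
### 11 for Grid.1C), as the calling functions do:
###
###   pc.data = getPcs(all.avhrr, form = 11)
###   recon.test = pc.data[[1]] %*% t(pc.data[[2]])
###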
getEigen = function(Y, dat = all.avhrr, type = "tsvd", n.max = 8, tol =
0.005, form = 11, DoFL = 12, alpha = 10, pc.max = 150,
maxiter = 50, st = 1957, en = c(2006,12), forms =
form.list) {
### Get satellite anomalies
dat = makeAnom(dat)
### Center Y
Y.ver = window(Y.all[, -forms[[form]]], st, en)
Y.ver = mScale(Y.ver, -colMeans(Y.ver, na.rm = TRUE), type = 1, add
= TRUE, scale = FALSE, center = FALSE)
na.ver = is.na(Y.ver)
Y.idx = all.idx[-forms[[form]], 4]
Y = window(Y, st, en)
Y = mScale(Y, -colMeans(Y, na.rm = TRUE), type = 1, add = TRUE,
scale = FALSE, center = FALSE)
### Get original data for unused stations
form.ver = forms[[form]][!is.element(forms[[form]], form.grid)]
Z = Y.all[, form.ver]
Z = window(Z, st, en)
Z = mScale(Z, -colMeans(Z, na.rm = TRUE), type = 1, add = TRUE,
scale = FALSE, center = FALSE)
Z = matrix(Z, ncol = ncol(Z))
colnames(Z) = all.idx[form.ver, 1]
na.ver = cbind(is.na(Z), na.ver)
Z.idx = all.idx[form.ver, 4]
Z = cbind(Z, Y.ver)
### Get correlation AVHRR PCs & weights
pc.data = getPcs(dat, st = start(dat), en = en, form = form, covr =
FALSE, DoFL = DoFL)
U = ts(pc.data[[4]], st = start(dat), freq = 12)
d = pc.data[[3]]
D = pc.data[[5]]
DY = D[Y.idx]
DZ = D[Z.idx]
D = c(DZ, DY)
V = pc.data[[2]]
VY = V[Y.idx, ]
VZ = V[Z.idx, ]
V = rbind(VZ, VY)
Y.wts = pc.data[[9]]
rownames(Y.wts) = colnames(Y)
rank = min(pc.max, length(d))
### Set up storage variables
sat.stats = matrix(nrow = n.max + 2, ncol = rank)
colnames(sat.stats) = paste("Recon PCs", c(1:rank))
rownames(sat.stats) = c(paste("RMS Error, k =", c(1:n.max)), "Min
Error at k =", "Recon RMS Error:")
attr(sat.stats, "Method") = type
infilled.Xhat = infilled.X = list()
for(i in 1:n.max) {
### Storage for full recon PCs - unspliced
infilled.Xhat[[i]] = matrix(nrow = nrow(Y), ncol = rank)
colnames(infilled.Xhat[[i]]) = paste("AVHRR PC #", c(1:rank),
sep = "")
attr(infilled.Xhat[[i]], "Method") = type
attr(infilled.Xhat[[i]], "Truncation Parameter") = i
### Storage for full recon PCs - spliced
infilled.X[[i]] = matrix(nrow = nrow(Y), ncol = rank)
colnames(infilled.X[[i]]) = paste("AVHRR PC #", c(1:rank),
sep = "")
attr(infilled.X[[i]], "Method") = type
attr(infilled.X[[i]], "Truncation Parameter") = i
}
optimal.pcs.Xhat = matrix(0, nrow = nrow(Y), ncol = rank)
optimal.pcs.X = matrix(0, nrow = nrow(Y), ncol = rank)
recon = matrix(0, nrow = nrow(Y), ncol = ncol(Z))
### Iterate through each PC, up to pc.max
for (j in 1:rank) {
### Set up
X = cbind(Y, U[, j])
wts = as.vector(c(alpha * Y.wts[, j], 1))
### Infill
message(paste("Infilling AVHRR PC #", j))
infill.Y = suppressMessages(emFn(X, n.eof = n.max, maxiter =
maxiter, tol = tol, type = type, wts = wts, progressive = c(T, 1, T, 1),
plt = c(T, ncol(X)), DoFL = DoFL))
### Save PCs
for (k in 1:n.max) {
infilled.Xhat[[k]][, j] = infill.Y[[k]]$Xhat[, ncol(X)]
* d[j]
infilled.X[[k]][, j] = infill.Y[[k]]$X[, ncol(X)] * d[j]
}
}
message("\nOptimizing . . . ")
### Get verification stats
for (j in 1:rank) {
for (k in 1:n.max) {
recon.eof = (matrix(infilled.Xhat[[k]][, j], ncol = 1)
%*% t(V[, j])) %*% diag(D)
recon.eof[na.ver] = NA
recon.eof = mScale(recon.eof, -colMeans(recon.eof, na.rm
= TRUE), type = 1, add = TRUE, center = FALSE, scale = FALSE)
sat.stats[k, j] = sqrt(sum((Z - recon.eof) ^ 2, na.rm =
TRUE) / sum(na.ver))
}
### Record truncation value for minimum error
sat.stats[n.max + 1, j] = c(1:n.max)[min(sat.stats[1:n.max,
j]) == sat.stats[1:n.max, j]]
### Save the PC
optimal.pcs.Xhat[, j] = infilled.Xhat[[sat.stats[n.max + 1,
j]]][, j]
optimal.pcs.X[, j] = infilled.X[[sat.stats[n.max + 1, j]]][,
j]
### Calculate recon verification stats
recon = recon + (matrix(optimal.pcs.Xhat[, j], ncol = 1) %*%
t(V[, j])) %*% diag(D)
recon[na.ver] = NA
recon = mScale(recon, -colMeans(recon, na.rm = TRUE), type =
1, add = TRUE, center = FALSE, scale = FALSE)
sat.stats[n.max + 2, j] = sqrt(sum((Z - recon) ^ 2, na.rm =
TRUE) / sum(na.ver))
}
### Find minimum recon rms
min.rms = min(sat.stats[n.max + 2, ])
rank = c(1:pc.max)[sat.stats[n.max + 2, ] == min(sat.stats[n.max +
2, ])]
message(paste(". . . complete.\nMinimum RMS error of",
round(min.rms, 5), "at", rank, "retained satellite PCs."))
### Return
eigen = list()
eigen$stats = sat.stats
eigen$Xhat = ts(optimal.pcs.Xhat[, 1:rank], start = st, frequency =
12)
eigen$X = ts(optimal.pcs.X[, 1:rank], start = st, frequency = 12)
eigen$D = pc.data[[5]]
eigen$V = pc.data[[2]][, 1:rank]
eigen
}
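###
### Example (hypothetical): generate eigenvector-weighted PCs for the
### Grid.1C station set and feed them to getRecon():
###
###   eigen.pcs = getEigen(Y.base, type = "tsvd", form = 11)
###   recon = getRecon(eigen.pcs)
###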
getRls = function (Y, X = all.avhrr, form = form.grid.96, v.max = NA,
h.min = 0.05, h.max = 10, tol = 0.001, DoFL = 12,
st.y = 1957, en.y = c(2006,12), st.x = 1982, en.x =
c(2006,12), external = NA) {
message(paste("RLS using ", v.max, " satellite spatial
eigenvectors.\n==============================================", sep =
""))
### Externally supplied data?
if (identical(external, NA)) {
### Get predictor station grid cells
Y.ver = window(Y.all[, -form], st.y, en.y)
Y.ver = mScale(Y.ver, -colMeans(Y.ver, na.rm = TRUE), type =
1, add = TRUE, scale = FALSE, center = FALSE)
na.ver = is.na(Y.ver)
Y.idx = all.idx[-form, 4]
Y = window(Y, st.y, en.y)
Y = mScale(Y, -colMeans(Y, na.rm = TRUE), type = 1, add =
TRUE, scale = FALSE, center = FALSE)
tsp(Y) = NULL
tsp(Y.ver) = NULL
### Get verification stations for optimization
form.ver = form[!is.element(form, form.grid)]
Z = Y.all[, form.ver]
Z = window(Z, st.y, en.y)
Z = mScale(Z, -colMeans(Z, na.rm = TRUE), type = 1, add =
TRUE, scale = FALSE, center = FALSE)
tsp(Z) = NULL
na.ver = cbind(is.na(Z), na.ver)
colnames(Z) = all.idx[form.ver, 1]
Z.idx = all.idx[form.ver, 4]
### Get spatial structure
X = makeAnom(window(X, st.x, en.x))
X.svd = svd(X)
### Find rank
if (is.na(v.max)) {v.max = length(X.svd$d)}
rank.eff = max(1, min(v.max, sum(X.svd$d >
sqrt(.Machine$double.eps) * 10)))
### Extract components
V = matrix(X.svd$v[, 1:rank.eff], ncol = rank.eff)
d = X.svd$d[1:rank.eff]
U = matrix(X.svd$u[, 1:rank.eff], ncol = rank.eff)
n.eff = nrow(X) - DoFL
### Define subcomponents
VY = matrix(V[Y.idx, ], ncol = rank.eff)
LY = mScale(VY, d / sqrt(n.eff), type = 1, scale = TRUE, add
= FALSE, center = FALSE)
VZt = t(matrix(V[Z.idx, ], ncol = rank.eff))
### Prepare spatial EOFs
LYt = LY2 = list()
LYt[[1]] = t(LY)
LY2[[1]] = LYt[[1]] %*% LY
### Prepare censored EOFs for internal cross-validation
for (i in 1:ncol(Y)) {
LYt[[i + 1]] = LYt[[1]][, -i]
LY2[[i + 1]] = LYt[[i + 1]] %*% t(LYt[[i + 1]])
}
} else {
### Load the external data
rank.eff = external$r
Y = external$Y
Y.ver = external$Y.ver
Y.idx = external$Y.idx
Z = external$Z
na.ver = external$na.ver
Z.idx = external$Z.idx
V = matrix(external$V[, 1:rank.eff], ncol = rank.eff)
d = external$d[1:rank.eff]
U = matrix(external$U[, 1:rank.eff], ncol = rank.eff)
n.eff = external$n.eff
VY = matrix(external$VY[, 1:rank.eff], ncol = rank.eff)
LY = matrix(external$LY[, 1:rank.eff], ncol = rank.eff)
VZt = matrix(external$VZt[1:rank.eff, ], nrow = rank.eff)
LY2 = LYt = list()
LY2[[1]] = matrix(external$LY2[[1]][1:rank.eff, 1:rank.eff],
ncol = rank.eff)
LYt[[1]] = matrix(external$LYt[[1]][1:rank.eff, ], nrow =
rank.eff)
for (i in 1:ncol(Y)) {
LYt[[i + 1]] = matrix(external$LYt[[1]][1:rank.eff, -i],
nrow = rank.eff)
LY2[[i + 1]] = LYt[[i + 1]] %*% t(LYt[[i + 1]])
}
rm(external)
}
### Define optimization function
rlsOpt = function(h, LY2, LYt, Y, Y.ver, Z, n.eff, r, d, VY, VZt,
na.ver) {
### Estimate PCs based on all predictor stations
inv = solve(LY2[[1]] + diag(rep(h ^ 2, r), ncol = r)) %*%
LYt[[1]]
sol.pcs = mScale(Y %*% t(inv), d / sqrt(n.eff), type = 1,
scale = TRUE, add = FALSE, center = FALSE)
### Find estimates for withheld stations (Z)
sol = matrix(nrow = nrow(Z), ncol = ncol(Z) + ncol(Y))
sol[, 1:ncol(Z)] = sol.pcs %*% VZt
### Perform leave-one-out verification for predictor stations
for (i in 1:ncol(Y)) {
VZt = matrix(VY[i, ], ncol = 1)
inv = solve(LY2[[i + 1]] + diag(rep(h ^ 2, r), ncol =
r)) %*% LYt[[i + 1]]
sol.pcs = mScale(Y[, -i] %*% t(inv), d / sqrt(n.eff),
type = 1, scale = TRUE, add = FALSE, center = FALSE)
sol[, ncol(Z) + i] = sol.pcs %*% VZt
}
### Get error
sol[na.ver] = NA
sol = mScale(sol, -colMeans(sol, na.rm = TRUE), type = 1, add
= TRUE, scale = FALSE, center = FALSE)
err = sqrt(sum((sol - cbind(Z, Y.ver)) ^ 2, na.rm = TRUE) /
sum(!na.ver))
err
}
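### (rlsOpt() above implements ridge regression on the scaled spatial EOFs:
### for a given parameter h it forms solve(t(LY) %*% LY + h^2 * I) %*% t(LY),
### applies it to the station data to estimate the PCs, and scores the
### estimates against the withheld (Z) and left-one-out predictor stations.)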
### Define cross-validation extraction function
rlsCv = function(h, LY2, LYt, Y, Y.ver, Z, n.eff, r, d, VY, VZt,
na.ver) {
### Estimate PCs based on all predictor stations
inv = solve(LY2[[1]] + diag(rep(h ^ 2, r), ncol = r)) %*%
LYt[[1]]
sol.pcs = mScale(Y %*% t(inv), d / sqrt(n.eff), type = 1,
scale = TRUE, add = FALSE, center = FALSE)
### Find estimates for withheld stations (Z)
sol = matrix(nrow = nrow(Z), ncol = ncol(Z) + ncol(Y))
sol[, 1:ncol(Z)] = sol.pcs %*% VZt
### Perform leave-one-out verification for predictor stations
for (i in 1:ncol(Y)) {
VZt = matrix(VY[i, ], ncol = 1)
inv = solve(LY2[[i + 1]] + diag(rep(h ^ 2, r), ncol =
r)) %*% LYt[[i + 1]]
sol.pcs = mScale(Y[, -i] %*% t(inv), d / sqrt(n.eff),
type = 1, scale = TRUE, add = FALSE, center = FALSE)
sol[, ncol(Z) + i] = sol.pcs %*% VZt
}
### Get error
sol[na.ver] = NA
sol = mScale(sol, -colMeans(sol, na.rm = TRUE), type = 1, add
= TRUE, scale = FALSE, center = FALSE)
sol
}
### Find optimal regularization parameter
opt = optimize(rlsOpt, tol = tol, interval = c(h.min, h.max), LY2,
LYt, Y, Y.ver, Z, n.eff, rank.eff, d, VY, VZt, na.ver)
h = opt$minimum
err = opt$objective
message(paste("Regularization parameter: ", round(h, 5), sep = ""))
message(paste("Minimum cross-validation RMS error: ", round(err,
5), "\n", sep = ""))
### Perform RLS regression
inv = solve(LY2[[1]] + diag(rep(h ^ 2, rank.eff), ncol = rank.eff)) %*%
LYt[[1]]
sol.pcs = ts(mScale(Y %*% t(inv), d / sqrt(n.eff), type = 1, scale
= TRUE, add = FALSE, center = FALSE), st.y, freq = 12)
### Get cross-validation estimates
sol = rlsCv(h, LY2, LYt, Y, Y.ver, Z, n.eff, rank.eff, d, VY, VZt,
na.ver)
### Store
est.pcs = list()
est.pcs$X = sol.pcs
est.pcs$Xhat = sol.pcs
est.pcs$V = V
est.pcs$D = rep(1, ncol(X))
est.pcs$h = h
est.pcs$err = err
est.pcs$sol = sol
est.pcs$ver = cbind(Z, Y.ver)
### Return
est.pcs
}
getRecon = function (est.pcs, verbose = FALSE, compare.S09 = FALSE) {
### Get data
Xhat = est.pcs$Xhat
X = est.pcs$X
V = est.pcs$V
D = est.pcs$D
st = start(Xhat)
fr = frequency(Xhat)
recon = list()
splice = TRUE
if (identical(Xhat, X)) {splice = FALSE}
if (compare.S09 == TRUE) {verbose = TRUE}
### Get recons
recon[[1]] = ts(mScale(Xhat %*% t(V), D, type = 1, scale = TRUE,
add = FALSE, center = FALSE), start = st, freq = fr)
if (splice == TRUE) {
recon[[2]] = ts(mScale(X %*% t(V), D, type = 1, scale = TRUE,
add = FALSE, center = FALSE), start = st, freq = fr)
} else {
recon[[2]] = recon[[1]]
}
message("
Elements in Returned List")
message("==========================================================
=======")
message("[[1]]: Unspliced reconstruction
[[2]]: Spliced
reconstruction")
message("==========================================================
=======")
### Get regions
recon[[3]] = recon[[4]] = list()
recon[[3]][[1]] = ts(rowMeans(recon[[1]][, mask.east]), start = st,
freq = fr)
recon[[3]][[2]] = ts(rowMeans(recon[[1]][, mask.west]), start = st,
freq = fr)
recon[[3]][[3]] = ts(rowMeans(recon[[1]][, mask.west.a]), start = st,
freq = fr)
recon[[3]][[4]] = ts(rowMeans(recon[[1]][, mask.ross]), start = st,
freq = fr)
recon[[3]][[5]] = ts(rowMeans(recon[[1]][, mask.pen]), start = st,
freq = fr)
recon[[3]][[6]] = ts(rowMeans(recon[[1]]), start = st, freq = fr)
if (splice == TRUE) {
recon[[4]][[1]] = ts(rowMeans(recon[[2]][, mask.east]), start = st,
freq = fr)
recon[[4]][[2]] = ts(rowMeans(recon[[2]][, mask.west]), start = st,
freq = fr)
recon[[4]][[3]] = ts(rowMeans(recon[[2]][, mask.west.a]), start = st,
freq = fr)
recon[[4]][[4]] = ts(rowMeans(recon[[2]][, mask.ross]), start = st,
freq = fr)
recon[[4]][[5]] = ts(rowMeans(recon[[2]][, mask.pen]), start = st,
freq = fr)
recon[[4]][[6]] = ts(rowMeans(recon[[2]]), start = st, freq = fr)
} else {
recon[[4]] = recon[[3]]
}
message("
Time Series of Regional Averages")
message("----------------------------------------------------------------")
message("[[3]]: Unspliced reconstruction
[[4]]: Spliced
reconstruction")
message("
[[1]] <----- East Antarctic Avg -----> [[1]]")
message("
[[2]] <----- West Antarctic Avg -----> [[2]]")
message("
[[3]] <----- West (No Ross) Avg -----> [[3]]")
message("
[[4]] <----- Ross Ice Shelf Avg -----> [[4]]")
message("
[[5]] <----Peninsula Avg
-----> [[5]]")
message("
[[6]] <----- Continental Avg -----> [[6]]")
message("==========================================================
=======")
### Get time-period trend maps & significance calculations
recon[[5]] = recon[[6]] = recon[[7]] = recon[[8]] = list()
X = c(1:600) / 120
recon[[5]][[1]] = lmFn(X[1:300], recon[[1]][1:300, ], sig =
verbose, adj.DoF = TRUE)
recon[[5]][[2]] = lmFn(X, recon[[1]], sig = verbose, adj.DoF =
TRUE)
recon[[5]][[3]] = lmFn(X[265:564], recon[[1]][265:564, ], sig =
verbose, adj.DoF = TRUE)
recon[[5]][[4]] = lmFn(X[301:600], recon[[1]][301:600, ], sig =
verbose, adj.DoF = TRUE)
if (splice == TRUE) {
recon[[6]][[1]] = lmFn(X[1:300], recon[[2]][1:300, ], sig =
verbose, adj.DoF = TRUE)
recon[[6]][[2]] = lmFn(X, recon[[2]], sig = verbose, adj.DoF
= TRUE)
recon[[6]][[3]] = lmFn(X[265:564], recon[[2]][265:564, ], sig
= verbose, adj.DoF = TRUE)
recon[[6]][[4]] = lmFn(X[301:600], recon[[2]][301:600, ], sig
= verbose, adj.DoF = TRUE)
} else {
recon[[6]] = recon[[5]]
}
message("
Spatial Trend Maps by Period (Deg C / Decade)")
message("----------------------------------------------------------------")
message("[[5]]: Unspliced reconstruction
[[6]]: Spliced
reconstruction")
message("
[[1]] <----1957 - 1982
-----> [[1]]")
message("
[[2]] <----1957 - 2006
-----> [[2]]")
message("
[[3]] <----1979 - 2003
-----> [[3]]")
message("
[[4]] <----1982 - 2006
-----> [[4]]")
if (verbose == TRUE) {
SE = list()
SE[[1]] = SE[[3]] = list()
SE[[1]][[1]] = recon[[5]][[1]]$SE
recon[[7]][[1]] = recon[[5]][[1]]$p
recon[[7]][[2]] = recon[[5]][[2]]$p
recon[[7]][[3]] = recon[[5]][[3]]$p
recon[[7]][[4]] = recon[[5]][[4]]$p
recon[[5]][[1]] = recon[[5]][[1]]$coef
recon[[5]][[2]] = recon[[5]][[2]]$coef
recon[[5]][[3]] = recon[[5]][[3]]$coef
recon[[5]][[4]] = recon[[5]][[4]]$coef
recon[[8]][[1]] = recon[[6]][[1]]$p
recon[[8]][[2]] = recon[[6]][[2]]$p
recon[[8]][[3]] = recon[[6]][[3]]$p
recon[[8]][[4]] = recon[[6]][[4]]$p
recon[[6]][[1]] = recon[[6]][[1]]$coef
recon[[6]][[2]] = recon[[6]][[2]]$coef
recon[[6]][[3]] = recon[[6]][[3]]$coef
recon[[6]][[4]] = recon[[6]][[4]]$coef
message("----------------------------------------------------------------")
message("
Two-Tailed Significance Tests (p >
|t|)")
message("----------------------------------------------------------------")
message("[[7]]: Unspliced reconstruction
[[8]]: Spliced
reconstruction")
message("
[[1]] <----1957 - 1982
----->
[[1]]")
message("
[[2]] <----1957 - 2006
----->
[[2]]")
message("
[[3]] <----1979 - 2003
----->
[[3]]")
message("
[[4]] <----1982 - 2006
----->
[[4]]")
message("==========================================================
=======")
} else {
recon[[5]][[1]] = recon[[5]][[1]]$coef
recon[[5]][[2]] = recon[[5]][[2]]$coef
recon[[5]][[3]] = recon[[5]][[3]]$coef
recon[[5]][[4]] = recon[[5]][[4]]$coef
recon[[6]][[1]] = recon[[6]][[1]]$coef
recon[[6]][[2]] = recon[[6]][[2]]$coef
recon[[6]][[3]] = recon[[6]][[3]]$coef
recon[[6]][[4]] = recon[[6]][[4]]$coef
message("==========================================================
=======")
}
### Get seasonal trend maps & significance calculations
recon[[9]] = recon[[10]] = recon[[11]] = recon[[12]] = list()
X = c(1:600) / 120
wint1 = makeSeason(recon[[1]], season = "w", south = TRUE)
spri1 = makeSeason(recon[[1]], season = "sp", south = TRUE)
summ1 = makeSeason(recon[[1]], season = "su", south = TRUE)
fall1 = makeSeason(recon[[1]], season = "f", south = TRUE)
recon[[9]][[1]] = lmFn(X, wint1, sig = verbose, adj.DoF = TRUE)
recon[[9]][[2]] = lmFn(X, spri1, sig = verbose, adj.DoF = TRUE)
recon[[9]][[3]] = lmFn(X, summ1, sig = verbose, adj.DoF = TRUE)
recon[[9]][[4]] = lmFn(X, fall1, sig = verbose, adj.DoF = TRUE)
if (splice == TRUE) {
wint2 = makeSeason(recon[[2]], season = "w", south = TRUE)
spri2 = makeSeason(recon[[2]], season = "sp", south = TRUE)
summ2 = makeSeason(recon[[2]], season = "su", south = TRUE)
fall2 = makeSeason(recon[[2]], season = "f", south = TRUE)
recon[[10]][[1]] = lmFn(X, wint2, sig = verbose, adj.DoF = TRUE)
recon[[10]][[2]] = lmFn(X, spri2, sig = verbose, adj.DoF = TRUE)
recon[[10]][[3]] = lmFn(X, summ2, sig = verbose, adj.DoF = TRUE)
recon[[10]][[4]] = lmFn(X, fall2, sig = verbose, adj.DoF = TRUE)
} else {
recon[[10]] = recon[[9]]
}
message("
Seasonal Trend Maps (Deg C / Decade)")
message("----------------------------------------------------------------")
message("[[9]]: Unspliced reconstruction [[10]]: Spliced
reconstruction")
message("
[[1]] <----Winter
-----> [[1]]")
message("
[[2]] <----Spring
-----> [[2]]")
message("
[[3]] <----Summer
-----> [[3]]")
message("
[[4]] <----Autumn
-----> [[4]]")
if (verbose == TRUE) {
SE[[1]][[2]] = recon[[9]][[1]]$SE
SE[[1]][[3]] = recon[[9]][[2]]$SE
SE[[1]][[4]] = recon[[9]][[3]]$SE
SE[[1]][[5]] = recon[[9]][[4]]$SE
recon[[11]][[1]] = recon[[9]][[1]]$p
recon[[11]][[2]] = recon[[9]][[2]]$p
recon[[11]][[3]] = recon[[9]][[3]]$p
recon[[11]][[4]] = recon[[9]][[4]]$p
recon[[9]][[1]] = recon[[9]][[1]]$coef
recon[[9]][[2]] = recon[[9]][[2]]$coef
recon[[9]][[3]] = recon[[9]][[3]]$coef
recon[[9]][[4]] = recon[[9]][[4]]$coef
recon[[12]][[1]] = recon[[10]][[1]]$p
recon[[12]][[2]] = recon[[10]][[2]]$p
recon[[12]][[3]] = recon[[10]][[3]]$p
recon[[12]][[4]] = recon[[10]][[4]]$p
recon[[10]][[1]] = recon[[10]][[1]]$coef
recon[[10]][[2]] = recon[[10]][[2]]$coef
recon[[10]][[3]] = recon[[10]][[3]]$coef
recon[[10]][[4]] = recon[[10]][[4]]$coef
message("----------------------------------------------------------------")
message("
Two-Tailed Significance Tests (p >
|t|)")
message("----------------------------------------------------------------")
message("[[11]]: Unspliced reconstruction [[12]]: Spliced
reconstruction")
message("
[[1]] <----Winter
----->
[[1]]")
message("
[[2]] <-----
Spring
----->
message("
[[3]] <-----
Summer
----->
message("
[[4]] <-----
Autumn
----->
[[2]]")
[[3]]")
[[4]]")
message("==========================================================
=======")
} else {
recon[[9]][[1]] = recon[[9]][[1]]$coef
recon[[9]][[2]] = recon[[9]][[2]]$coef
recon[[9]][[3]] = recon[[9]][[3]]$coef
recon[[9]][[4]] = recon[[9]][[4]]$coef
recon[[10]][[1]] = recon[[10]][[1]]$coef
recon[[10]][[2]] = recon[[10]][[2]]$coef
recon[[10]][[3]] = recon[[10]][[3]]$coef
recon[[10]][[4]] = recon[[10]][[4]]$coef
message("==========================================================
=======")
}
### S09 comparisons
if (compare.S09 == TRUE) {
recon[[13]] = recon[[14]] = list()
wints = makeSeason(saved.S09.recon, season = "w", south =
TRUE)
spris = makeSeason(saved.S09.recon, season = "sp", south =
TRUE)
summs = makeSeason(saved.S09.recon, season = "su", south =
TRUE)
falls = makeSeason(saved.S09.recon, season = "f", south =
TRUE)
recon[[13]][[1]] = lmFn(X, saved.S09.recon, sig = verbose,
adj.DoF = TRUE)
recon[[13]][[2]] = lmFn(X, wints, sig = verbose, adj.DoF = TRUE)
recon[[13]][[3]] = lmFn(X, spris, sig = verbose, adj.DoF = TRUE)
recon[[13]][[4]] = lmFn(X, summs, sig = verbose, adj.DoF = TRUE)
recon[[13]][[5]] = lmFn(X, falls, sig = verbose, adj.DoF = TRUE)
SE[[3]][[1]] = recon[[13]][[1]]$SE
SE[[3]][[2]] = recon[[13]][[2]]$SE
SE[[3]][[3]] = recon[[13]][[3]]$SE
SE[[3]][[4]] = recon[[13]][[4]]$SE
SE[[3]][[5]] = recon[[13]][[5]]$SE
recon[[14]][[1]] = recon[[13]][[1]]$p
recon[[14]][[2]] = recon[[13]][[2]]$p
recon[[14]][[3]] = recon[[13]][[3]]$p
recon[[14]][[4]] = recon[[13]][[4]]$p
recon[[14]][[5]] = recon[[13]][[5]]$p
recon[[13]][[1]] = recon[[13]][[1]]$coef
recon[[13]][[2]] = recon[[13]][[2]]$coef
recon[[13]][[3]] = recon[[13]][[3]]$coef
recon[[13]][[4]] = recon[[13]][[4]]$coef
recon[[13]][[5]] = recon[[13]][[5]]$coef
message("
S09 Maps (1957 - 2006)")
message("----------------------------------------------------------------")
message("[[13]]: Trend (Deg C / Decade)
[[14]]:
Significance (p > |t|)")
message("
[[1]] <----All Seasons
----->
[[1]]")
message("
[[2]] <----Winter
----->
[[2]]")
message("
[[3]] <----Spring
----->
[[3]]")
message("
[[4]] <----Summer
----->
[[4]]")
message("
[[5]] <----Autumn
----->
[[5]]")
message("==========================================================
=======")
### Generate significance maps (residual trend p-value)
recon[[15]] = list()
recon[[15]][[1]] = lmFn(X, saved.S09.recon - recon[[1]], sig =
verbose, adj.DoF = TRUE)$p
recon[[15]][[2]] = lmFn(X, wints - wint1, sig = verbose,
adj.DoF = TRUE)$p
recon[[15]][[3]] = lmFn(X, spris - spri1, sig = verbose,
adj.DoF = TRUE)$p
recon[[15]][[4]] = lmFn(X, summs - summ1, sig = verbose,
adj.DoF = TRUE)$p
recon[[15]][[5]] = lmFn(X, falls - fall1, sig = verbose,
adj.DoF = TRUE)$p
message("
Maps of p-values (1957 - 2006)")
message("----------------------------------------------------------------")
message("[[15]]: Residual Trend p-value (2-Tailed)")
message("
[[1]] <----All Seasons")
message("
[[2]] <----Winter")
message("
[[3]] <----Spring")
message("
[[4]] <----Summer")
message("
[[5]] <----Autumn")
message("==========================================================
=======")
### Generate tabular summary
recon[[16]] = matrix(nrow = 20, ncol = 8)
comp.season = list()
comp.season[[1]] = comp.season[[2]] = comp.season[[3]] =
matrix(nrow = 600, ncol = 20)
comp.season[[1]][, 1] = recon[[3]][[6]]
comp.season[[1]][, 2] = recon[[3]][[1]]
comp.season[[1]][, 3] = recon[[3]][[2]]
comp.season[[1]][, 4] = recon[[3]][[5]]
comp.season[[1]][, 5] = ts(rowMeans(wint1), start = st, freq
= fr)
comp.season[[1]][, 6] = ts(rowMeans(wint1[, mask.east]),
start = st, freq = fr)
comp.season[[1]][, 7] = ts(rowMeans(wint1[, mask.west]),
start = st, freq = fr)
comp.season[[1]][, 8] = ts(rowMeans(wint1[, mask.pen]), start
= st, freq = fr)
comp.season[[1]][, 9] = ts(rowMeans(spri1), start = st, freq
= fr)
comp.season[[1]][, 10] = ts(rowMeans(spri1[, mask.east]),
start = st, freq = fr)
comp.season[[1]][, 11] = ts(rowMeans(spri1[, mask.west]),
start = st, freq = fr)
comp.season[[1]][, 12] = ts(rowMeans(spri1[, mask.pen]),
start = st, freq = fr)
comp.season[[1]][, 13] = ts(rowMeans(summ1), start = st, freq
= fr)
comp.season[[1]][, 14] = ts(rowMeans(summ1[, mask.east]),
start = st, freq = fr)
comp.season[[1]][, 15] = ts(rowMeans(summ1[, mask.west]),
start = st, freq = fr)
comp.season[[1]][, 16] = ts(rowMeans(summ1[, mask.pen]),
start = st, freq = fr)
comp.season[[1]][, 17] = ts(rowMeans(fall1), start = st, freq
= fr)
comp.season[[1]][, 18] = ts(rowMeans(fall1[, mask.east]),
start = st, freq = fr)
comp.season[[1]][, 19] = ts(rowMeans(fall1[, mask.west]),
start = st, freq = fr)
comp.season[[1]][, 20] = ts(rowMeans(fall1[, mask.pen]),
start = st, freq = fr)
comp.season[[3]][, 1] = ts(rowMeans(saved.S09.recon), start =
st, freq = fr)
comp.season[[3]][, 2] = ts(rowMeans(saved.S09.recon[,
mask.east]), start = st, freq = fr)
comp.season[[3]][, 3] = ts(rowMeans(saved.S09.recon[,
mask.west]), start = st, freq = fr)
comp.season[[3]][, 4] = ts(rowMeans(saved.S09.recon[,
mask.pen]), start = st, freq = fr)
comp.season[[3]][, 5] = ts(rowMeans(wints), start = st, freq
= fr)
comp.season[[3]][, 6] = ts(rowMeans(wints[, mask.east]),
start = st, freq = fr)
comp.season[[3]][, 7] = ts(rowMeans(wints[, mask.west]),
start = st, freq = fr)
comp.season[[3]][, 8] = ts(rowMeans(wints[, mask.pen]), start
= st, freq = fr)
comp.season[[3]][, 9] = ts(rowMeans(spris), start = st, freq
= fr)
comp.season[[3]][, 10] = ts(rowMeans(spris[, mask.east]),
start = st, freq = fr)
comp.season[[3]][, 11] = ts(rowMeans(spris[, mask.west]),
start = st, freq = fr)
comp.season[[3]][, 12] = ts(rowMeans(spris[, mask.pen]),
start = st, freq = fr)
comp.season[[3]][, 13] = ts(rowMeans(summs), start = st, freq
= fr)
comp.season[[3]][, 14] = ts(rowMeans(summs[, mask.east]),
start = st, freq = fr)
comp.season[[3]][, 15] = ts(rowMeans(summs[, mask.west]),
start = st, freq = fr)
comp.season[[3]][, 16] = ts(rowMeans(summs[, mask.pen]),
start = st, freq = fr)
comp.season[[3]][, 17] = ts(rowMeans(falls), start = st, freq
= fr)
comp.season[[3]][, 18] = ts(rowMeans(falls[, mask.east]),
start = st, freq = fr)
comp.season[[3]][, 19] = ts(rowMeans(falls[, mask.west]),
start = st, freq = fr)
comp.season[[3]][, 20] = ts(rowMeans(falls[, mask.pen]),
start = st, freq = fr)
RO10 = lmFn(X, comp.season[[1]], sig = TRUE, ci = 0.95,
adj.DoF = TRUE)
S09 = lmFn(X, comp.season[[3]], sig = TRUE, ci = 0.95,
adj.DoF = TRUE)
resids = lmFn(X, comp.season[[3]] - comp.season[[1]], sig =
TRUE, ci = 0.95, adj.DoF = TRUE)
recon[[16]][, 1] = RO10$coef
recon[[16]][, 2] = RO10$ci
recon[[16]][, 3] = RO10$SE
recon[[16]][, 4] = S09$coef
recon[[16]][, 5] = S09$ci
recon[[16]][, 6] = S09$SE
recon[[16]][, 7] = resids$coef
recon[[16]][, 8] = resids$ci
colnames(recon[[16]]) = c("RO10 Trend", "RO10 95% CI", "RO10
SE", "S09 Trend", "S09 95% CI", "S09 SE", "Residual Trend", "Residual 95%
CI")
r.names = c("Cont", "East", "West", "Pen")
rownames(recon[[16]]) = c(r.names, paste("Winter", r.names),
paste("Spring", r.names), paste("Summer", r.names), paste("Fall",
r.names))
message("
Tabulated Trends")
message("----------------------------------------------------------------")
message("[[16]]: Matrix of tabulated trends, CIs, SEs, and
Z-test results")
message("==========================================================
=======")
}
### Return
recon
}
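###
### Example (hypothetical): build a reconstruction from RLS-estimated PCs and
### plot the unspliced continental average (element [[3]][[6]] per the
### messages above):
###
###   est.pcs = getRls(Y.base, v.max = 126)
###   recon = getRecon(est.pcs)
###   plot(recon[[3]][[6]])
###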
offsetArea = function(Yo, maxoverlap = 60) {
### Initialize variables
offset = rep(NA, ncol(Yo))
offset[!is.na(Yo[1, ])] = 0
### Cycle through each time step (month)
for (i in 1:600) {
### Create station mask for data with offsets already computed
mask = is.na(offset)
newmask = ((!is.na(Yo[i, ])) - !mask) > 0
### Generate offsets if new stations have been introduced
if (sum(newmask) > 0) {
### Iterate for each new station
for (j in which(newmask == TRUE)) {
### Calculate distance
dist = g.circ(surf.coord[j, 1], surf.coord[j, 2],
surf.coord[, 1], surf.coord[, 2], 6371)
### Sort stations, smallest distance to largest
hh = order(dist[!mask], decreasing = FALSE)
### Remove stations with no data
namask = is.na(Yo[i, hh])
hh = hh[!namask][1]
### Get the station information for the selected station
stationindexb = which(dist == dist[!mask][hh])[1]
### Define raw station as a new object (easier to track)
newstation = Yo[, j]
### Define offset station as new object
oldstation = Yo[, stationindexb] + offset[stationindexb]
maskb = (!is.na(newstation) & !is.na(oldstation))
### Define mask for maximum overlap
if (sum(maskb) > maxoverlap) {
### Truncate if greater than maxoverlap
oldvals = oldstation[maskb][1:maxoverlap]
newvals = newstation[maskb][1:maxoverlap]
} else {
### Use all if less than or equal to maxoverlap
oldvals = oldstation[maskb]
newvals = newstation[maskb]
}
### Calculate offset value
offset[j] = mean(oldvals) - mean(newvals)
}
}
}
### Return offsets
offset
}
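###
### Example (hypothetical): compute offsets for an anomaly matrix and apply
### them column-wise with mScale(), following the usage pattern elsewhere in
### this script:
###
###   offs = offsetArea(Y.anom, maxoverlap = 60)
###   Y.adj = mScale(Y.anom, offs, type = 1, add = TRUE, scale = FALSE,
###       center = FALSE)
###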
##########################################################################################
######################         END RECONSTRUCTION FUNCTIONS         ######################
##########################################################################################
######################################
### SEGMENT: VARIABLE ASSIGNMENTS ###
######################################
###
### The purpose of this section is to create turnkey code by removing
### the need to download separate indexing files for the different
### station configurations, closest grid cell annotations, geographical
### locations, satellite dates, calibration offset periods, and trend
### calculation periods.
###
### Form variables (for parsing stations):
###
### S09: removes all stations not used by S09
### no.ocean: removes all southern ocean stations
### man.no.ocean: removes all AWS and southern ocean stations
### aws.no.ocean: removes all manned and southern ocean stations
### grid: removes all stations not on the AVHRR data grid
### man.grid: removes all AWS stations and stations not on the AVHRR data grid
### aws.grid: removes all manned stations and stations not on the AVHRR data grid
###
### Suffixes:
###
### .b: Adelaide and Deception removed
###
### .96: Stations with fewer than 8 years (96 months) of data, and Adelaide
###      and Deception, removed
###
### .ver: Stations designated for partial data removal for split calibration/
###       verification experiments. For verification with sets that do not
###       include AWS stations, stations with fewer than 16 years of data are
###       selected. For verification with sets that include AWS stations, AWS
###       stations are selected.
###
### Casey_New_Airstrip is removed in all sets. It does not have enough points
### for anomaly calculations.
### Casey Airstrip is removed in all sets. It does not have enough points for
### ground station verification testing.
### Dome_F is removed in all sets. It has only 1 data point.
### Terra_Nova_Bay is removed in all sets. It is a duplicate of Mario_Zuchelli.
### Hi Priestley Gl (Modesta) is removed in all sets. It displays a large
### difference in measured temperature following maintenance in ~1992.
###
form.all = c(52, 53, 60, 74, 105)
form.S09 = c(17, 26, 45:109)
form.S09.no.ocean = c(7, 9, 12, 16, 18, 19, 21, 22, 24, 28, 36, 41)
form.no.ocean = c(9, 17, 19, 24, 26, 36, 41, 52, 53, 60, 74, 102, 105)
form.no.ocean.b = c(1, 9, 12, 17, 19, 24, 26, 36, 41, 52, 53, 60, 74,
102, 105)
form.no.ocean.96 = c(form.no.ocean.b, 4, 45, 48, 52, 56, 57, 58, 59, 61,
64, 69, 72, 75, 78, 80, 82, 84, 94, 95, 98, 100, 103, 104, 106)
form.man.no.ocean = c(form.S09, 9, 19, 24, 36, 41)
form.man.no.ocean.b = c(form.man.no.ocean, 1, 12)
form.man.no.ocean.96 = c(form.man.no.ocean.b, 4)
form.grid = c(7, 9, 16, 17, 18, 19, 21, 22, 24, 26, 28, 36, 41, 52, 53,
60, 74, 96, 98, 102, 105, 109)
form.grid.a = c(7, 9, 16, 17, 18, 19, 21, 22, 24, 26, 28, 36, 41, 52,
53, 60, 74, 96, 98, 102, 105, 109)
form.grid.b = c(form.grid, 1, 12)
form.grid.96 = c(form.grid.b, 4, 45, 48, 52, 56, 57, 58, 59, 61, 64, 69,
72, 75, 78, 80, 82, 84, 94, 95, 100, 103, 104, 106)
form.man.grid = c(7, 9, 12, 16, 17, 18, 19, 21, 22, 24, 26, 28, 36, 41,
45:109)
form.man.grid.b = c(form.man.grid, 1)
form.man.grid.96 = c(form.man.grid.b, 4)
form.aws.no.ocean = c(1:44, 52, 53, 60, 74, 102, 105)
form.aws.no.ocean.96 = c(form.aws.no.ocean, 45, 48, 52, 56, 57, 58, 59,
61, 64, 69, 72, 75, 78, 80, 82, 84, 94, 95, 98, 100, 103, 104, 106)
form.aws.grid = c(1:44, 52, 53, 60, 74, 96, 102, 105, 109)
form.aws.grid.96 = c(form.aws.grid, 45, 48, 52, 56, 57, 58, 59, 61, 64,
69, 72, 75, 78, 80, 82, 84, 94, 95, 98, 100, 103, 104, 106)
form.noaws.ver = c(1, 4, 6, 8, 12, 16, 17, 21, 22, 26, 27, 38, 44)
form.aws.ver = c(45:104, 106:109)
form.long = c(1:109)[-c(2, 8, 10, 11, 13, 15, 20, 29, 30, 31, 34, 40)]
form.long.ver = c(1:109)[-c(2, 8, 10, 11, 13, 15, 20, 29, 30, 31, 34, 40,
52, 53, 60, 74, 96, 102, 105, 109)]
form.list = list(form.all, form.S09, form.no.ocean, form.no.ocean.b,
form.no.ocean.96, form.man.no.ocean, form.man.no.ocean.b,
form.man.no.ocean.96, form.grid, form.grid.b, form.grid.96,
form.man.grid, form.man.grid.b, form.man.grid.96, form.long)
form.name = c("Full", "S09", "No.Ocean.1A", "No.Ocean.1B", "No.Ocean.1C",
"No.Ocean.2A", "No.Ocean.2B", "No.Ocean.2C", "Grid.1A", "Grid.1B",
"Grid.1C", "Grid.2A", "Grid.2B", "Grid.2C", "Restricted Verification")
form.ver = list(form.aws.ver, form.noaws.ver, form.aws.ver, form.aws.ver,
form.aws.ver, form.noaws.ver, form.noaws.ver, form.noaws.ver,
form.aws.ver, form.aws.ver, form.aws.ver, form.noaws.ver, form.noaws.ver,
form.noaws.ver, form.long.ver, 1:109)
form.ver.name = c("AWS", "No.AWS", "AWS", "AWS", "AWS", "No.AWS",
"No.AWS", "No.AWS", "AWS", "AWS", "AWS", "No.AWS", "No.AWS", "No.AWS",
"Restricted Verification", "All")
### Satellite periods
noaa = list(1:300, 1:150, 151:300, 1:36, 37:83, 84:151, 157:228, 229:282,
283:300)
noaa.date = list(1982, 1982, 1982, 1982, 1985, 1988.917, 1995, 2001,
2005.5)
noaa.lbls = c("1982-2006", "1982-1994", "1995-2006", "NOAA-7", "NOAA-9",
"NOAA-11", "NOAA-14", "NOAA-16", "NOAA-17")
### Dummy label for geographical plotting
map.lbls = c("Deg C/Decade", "", "")
### Satellite offsets matrix
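### Row format (inferred from the values below): comparison method ("t-test" or
### "lm") followed by two start/end month-index pairs for the periods compared.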
sat.cal.matrix = matrix(ncol = 5, nrow = 6)
sat.cal.matrix[1, ] = c("t-test", 143, 156, 143, 156)
sat.cal.matrix[2, ] = c("t-test", 84, 142, 84, 142)
sat.cal.matrix[3, ] = c("t-test", 37, 83, 37, 83)
sat.cal.matrix[4, ] = c("t-test", 157, 214, 157, 214)
sat.cal.matrix[5, ] = c("lm", 1, 36, 1, 36)
sat.cal.matrix[6, ] = c("t-test", 229, 282, 229, 282)
### Ground station names, lat/lon, nearest grid cells, distances, geographic group
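### Column layout (as used downstream): 1 = name, 2 = latitude, 3 = longitude,
### 4:8 = five nearest AVHRR grid cells, 9 = distance (km) to the nearest cell,
### 10 = region code (1 = Peninsula, 2 = West Antarctica, 3 = Ross, 4 = East).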
temp.idx = matrix(nrow = 109, ncol = 10)
temp.idx[1, ] = c("Adelaide", -67.8, 292.1, 48, 41, 40, 55, 54,
41.68639807, 1)
temp.idx[2, ] = c( "Amundsen_Scott", -90, 0, 1970, 1971, 2040, 1900,
1969, 29.9732532, 4)
temp.idx[3, ] = c( "Arturo_Prat", -62.5, 300.3, 3, 2, 4, 8, 9,
114.1041535, 1)
temp.idx[4, ] = c( "Asuka", -71.5, 24.1, 3243, 3162, 3244, 3163, 3161,
44.99882239, 4)
temp.idx[5, ] = c( "Belgrano_I", -78, 321.2, 926, 925, 879, 977, 976,
70.63341805, 4)
temp.idx[6, ] = c( "Belgrano_II", -77.9, 325.4, 974, 1029, 973, 1028,
975, 27.52472898, 4)
temp.idx[7, ] = c( "Bellingshausen", -62.2, 301.1, 2, 1, 3, 7, 8,
142.4397892, 1)
temp.idx[8, ] = c( "Byrd", -80, 240, 817, 818, 864, 863, 816,
13.24221794, 2)
temp.idx[9, ] = c( "Campbell", -52, 169, 4935, 4934, 4865, 4793, 4864,
1639.851284, 1)
temp.idx[10, ] = c( "Casey", -66.3, 110.5, 5423, 5422, 5393, 5394, 5450,
68.60901916, 4)
temp.idx[11, ] = c( "Davis", -68.6, 78, 5334, 5299, 5260, 5300, 5261,
32.453277, 4)
temp.idx[12, ] = c( "Deception", -63, 299.3, 4, 3, 10, 9, 11,
108.4405257, 1)
temp.idx[13, ] = c( "Dumont_Durville", -66.7, 140, 4576, 4500, 4501,
4499, 4575, 55.12623375, 4)
temp.idx[14, ] = c( "Esperanza", -63.4, 303, 6, 1, 15, 7, 2, 31.74784973,
1)
temp.idx[15, ] = c( "Faraday", -65.4, 295.6, 21, 22, 13, 29, 28,
56.14826582, 1)
temp.idx[16, ] = c( "Ferraz", -62.1, 301.6, 1, 2, 6, 7, 3, 135.1830542,
1)
temp.idx[17, ] = c( "Gough", -40.4, 350.1, 1855, 1450, 1583, 1856, 1451,
3385.574039, 1)
temp.idx[18, ] = c( "Great_Wall", -62.2, 301, 2, 1, 3, 7, 8, 144.3266525,
1)
temp.idx[19, ] = c( "Grytviken", -54.3, 323.5, 6, 1, 15, 7, 2,
1795.237216, 1)
temp.idx[20, ] = c( "Halley", -75.5, 333.6, 1021, 1022, 1080, 1079, 1081,
28.00646963, 4)
temp.idx[21, ] = c( "Jubany", -62.2, 301.4, 1, 2, 3, 7, 6, 135.3601642,
1)
temp.idx[22, ] = c( "King_Sejong", -62.2, 301.3, 2, 1, 3, 7, 6,
140.335687, 1)
temp.idx[23, ] = c( "Leningradskaja", -69.5, 159.4, 3240, 3157, 3156,
3239, 3155, 53.99238049, 4)
temp.idx[24, ] = c( "Macquarie", -54.5, 158.9, 4273, 4196, 3647, 3566,
3967, 1632.703337, 1)
temp.idx[25, ] = c( "Marambio", -64.2, 303.3, 15, 8, 7, 9, 6,
19.63652572, 1)
temp.idx[26, ] = c( "Marion", -46.8, 37.8, 4502, 4426, 4350, 4650, 4577,
2455.134345, 1)
temp.idx[27, ] = c( "Mario_Zucchelli", -74.7, 164.1, 2646, 2647, 2645,
2648, 2732, 30.02328627, 3)
temp.idx[28, ] = c( "Marsh", -62.2, 301.1, 2, 1, 3, 7, 8, 120.8924296, 1)
temp.idx[29, ] = c( "Mawson", -67.6, 62.9, 5196, 5139, 5197, 5140, 5250,
11.74122039, 4)
temp.idx[30, ] = c( "McMurdo", -77.92, 166.67, 2418, 2419, 2416, 2417,
2345, 7.885579706, 3)
temp.idx[31, ] = c( "Mirny", -66.5, 93, 5483, 5462, 5484, 5463, 5436,
33.4106842, 4)
temp.idx[32, ] = c( "Molodeznaja", -67.7, 45.9, 4650, 4651, 4577, 4723,
4795, 14.61550264, 4)
temp.idx[33, ] = c( "Neumayer", -70.7, 351.6, 1451, 1517, 1858, 1787,
1584, 6.016633582, 4)
temp.idx[34, ] = c( "Novolazarevskaya", -70.8, 11.8, 2496, 2573, 2574,
2422, 2658, 18.54203705, 4)
temp.idx[35, ] = c( "O_Higgins", -63.3, 302.1, 1, 7, 6, 2, 15,
15.54485958, 1)
temp.idx[36, ] = c( "Orcadas", -60.7, 315.3, 6, 1, 15, 7, 2, 814.7070993,
1)
temp.idx[37, ] = c( "Rothera", -67.5, 291.9, 41, 40, 47, 48, 51,
8.923597155, 1)
temp.idx[38, ] = c( "Russkaya", -74.8, 223.1, 699, 700, 741, 657, 698,
47.7955837, 2)
temp.idx[39, ] = c( "San_Martin", -68.1, 292.9, 55, 54, 48, 62, 61,
15.22295047, 1)
temp.idx[40, ] = c( "Scott_Base", -77.85, 166.77, 2418, 2419, 2417, 2416,
2345, 7.885579706, 3)
temp.idx[41, ] = c( "Signy", -60.7, 314.4, 6, 1, 15, 7, 2, 766.003401, 1)
temp.idx[42, ] = c( "Syowa", -69, 39.6, 4197, 4198, 4274, 4121, 4122,
40.10138004, 4)
temp.idx[43, ] = c( "Vostok", -78.5, 106.9, 3855, 3934, 3694, 3776, 3775,
51.25190992, 4)
temp.idx[44, ] = c( "Zhongshan", -69.4, 76.4, 5259, 5209, 5154, 5210,
5155, 20.31622125, 4)
temp.idx[45, ] = c( "Bonaparte_Point", -64.8, 295.9, 13, 5, 12, 20, 21,
44.3733877, 1)
temp.idx[46, ] = c( "Butler_Island", -72.2, 299.8, 223, 250, 195, 278,
166, 17.33735847, 1)
temp.idx[47, ] = c( "Byrd", -80, 240.6, 817, 818, 816, 864, 863,
5.856875557, 2)
temp.idx[48, ] = c( "Cape_Denison", -67, 142.7, 4349, 4425, 4348, 4272,
4501, 44.79745815, 4)
temp.idx[49, ] = c( "Cape_King", -73.6, 166.6, 2566, 2649, 2567, 2650,
2490, 19.99906213, 3)
temp.idx[50, ] = c( "Cape_Philips", -73.1, 169.6, 2491, 2492, 2490, 2493,
2569, 37.08498924, 3)
temp.idx[51, ] = c( "Cape_Ross", -76.7, 163, 2642, 2643, 2564, 2641,
2726, 33.29787583, 3)
temp.idx[52, ] = c( "Casey_Airstrip", -66.3, 110.8, 5423, 5422, 5450,
5393, 5394, 42.86898755, 4)
temp.idx[53, ] = c( "Casey_New_Airstrip", -66.3, 110.8, 5423, 5422, 5450,
5393, 5394, 42.86898755, 4)
temp.idx[54, ] = c( "Clean_Air", -89.99, 179.99, 1970, 1971, 2040, 1900,
1969, 29.9732532, 4)
temp.idx[55, ] = c( "D_10", -66.7, 139.8, 4576, 4500, 4499, 4501, 4575,
40.52006846, 4)
temp.idx[56, ] = c( "D_47", -67.4, 138.7, 4575, 4499, 4574, 4648, 4500,
18.62263574, 4)
temp.idx[57, ] = c( "D_57", -68.1, 137.5, 4497, 4573, 4496, 4574, 4572,
29.68688679, 4)
temp.idx[58, ] = c( "D_80", -70, 134.9, 4416, 4417, 4493, 4340, 4339,
29.29911398, 4)
temp.idx[59, ] = c( "Dome_C_II", -75.1, 123.4, 4098, 4174, 4023, 4099,
3943, 50.26200551, 4)
temp.idx[60, ] = c( "Dome_F", -77.3, 39.7, 3341, 3260, 3259, 3342, 3340,
23.22316898, 4)
temp.idx[61, ] = c( "Doug", -82.3, 246.8, 1002, 952, 907, 1112, 860,
27.93340239, 2)
temp.idx[62, ] = c( "Drescher", -72.87, 340.97, 1132, 1133, 1134, 1197,
1196, 32.60891964, 4)
temp.idx[63, ] = c( "Elaine", -83.1, 174.2, 2055, 2125, 2126, 2056, 2054,
38.61377024, 3)
temp.idx[64, ] = c( "Elizabeth", -82.6, 222.9, 1242, 1180, 1305, 1243,
1181, 56.51167125, 2)
temp.idx[65, ] = c( "Enigma_Lake", -74.7, 164, 2646, 2647, 2645, 2732,
2731, 30.13542191, 3)
temp.idx[66, ] = c( "Erin", -84.9, 231.2, 1431, 1366, 1498, 1239, 1303,
50.91496271, 2)
temp.idx[67, ] = c( "Ferrell", -77.9, 170.8, 2276, 2277, 2275, 2274,
2348, 36.26682375, 3)
temp.idx[68, ] = c( "GC41", -71.6, 111.3, 4842, 4912, 4913, 4770, 4984,
24.97980932, 4)
temp.idx[69, ] = c( "GEO3", -68.7, 61.1, 5079, 5014, 5141, 5142, 5015,
15.50248633, 4)
temp.idx[70, ] = c( "GF08", -68.5, 102.1, 5353, 5318, 5386, 5354, 5385,
39.98831567, 4)
temp.idx[71, ] = c( "Gill", -80, 181.4, 1922, 1992, 1923, 1993, 1851,
18.55797104, 3)
temp.idx[72, ] = c( "Harry", -83, 238.6, 1114, 1115, 1175, 1058, 1057,
40.55293055, 2)
temp.idx[73, ] = c( "Henry", -89, 359, 1892, 1893, 1891, 1894, 1890,
466.9032731, 4)
temp.idx[74, ] = c( "Hi_Priestley_Gl", -73.6, 160.7, 2901, 2817, 2902,
2818, 2984, 13.87841726, 3)
temp.idx[75, ] = c( "LGB10", -71.3, 59.2, 4664, 4663, 4734, 4591, 4735,
32.34699498, 4)
temp.idx[76, ] = c( "LGB20", -73.8, 55.7, 4291, 4134, 4212, 4213, 4058,
9.036639558, 4)
temp.idx[77, ] = c( "LGB35", -76, 65, 4068, 4142, 4219, 4143, 3993,
40.11379661, 4)
temp.idx[78, ] = c( "LGB59", -73.5, 76.78, 4676, 4677, 4747, 4603, 4604,
15.53572843, 4)
temp.idx[79, ] = c( "Larsen_Ice_Shelf", -66.9, 299.1, 49, 43, 42, 50, 57,
17.55978504, 1)
temp.idx[80, ] = c( "Law_Dome_Summit", -66.7, 112.7, 5395, 5424, 5425,
5363, 5396, 19.65541292, 4)
temp.idx[81, ] = c( "Lettau", -82.5, 185.6, 1847, 1846, 1848, 1845, 1916,
99.19493816, 3)
temp.idx[82, ] = c( "Limbert", -75.4, 300.1, 458, 497, 420, 538, 498,
43.56113626, 1)
temp.idx[83, ] = c( "Linda", -78.5, 168.4, 2346, 2345, 2347, 2348, 2344,
7.312975998, 3)
temp.idx[84, ] = c( "Lynn", -74.2, 160.4, 2900, 2816, 2815, 2901, 2899,
29.01424379, 3)
temp.idx[85, ] = c( "Manuela", -74.9, 163.7, 2646, 2645, 2647, 2731,
2732, 35.49586369, 3)
temp.idx[86, ] = c( "Marble_Point", -77.4, 163.7, 2564, 2563, 2489, 2488,
2642, 27.32305531, 3)
temp.idx[87, ] = c( "Marilyn", -80, 165.1, 2413, 2343, 2414, 2342, 2484,
25.48352667, 3)
temp.idx[88, ] = c( "Minna_Bluff", -78.6, 166.7, 2417, 2418, 2416, 2344,
2419, 56.36846685, 3)
temp.idx[89, ] = c( "Mount_Siple", -73.2, 232.9, 456, 419, 494, 495, 382,
115.8677037, 2)
temp.idx[90, ] = c( "Nansen_Ice_Sheet", -74.8, 163.3, 2731, 2645, 2646,
2730, 2732, 58.82892652, 3)
temp.idx[91, ] = c( "Nico", -89, 89.7, 2392, 2463, 2321, 2393, 2464,
324.8575404, 4)
temp.idx[92, ] = c( "Pegasus_North", -77.9, 166.5, 2418, 2417, 2419,
2416, 2489, 23.96057281, 3)
temp.idx[93, ] = c( "Pegasus_South", -78, 166.6, 2418, 2417, 2419, 2416,
2345, 16.24984004, 3)
temp.idx[94, ] = c( "Penguin_Point", -67.6, 146.2, 4196, 4272, 4348,
4349, 4424, 7.449501047, 4)
temp.idx[95, ] = c( "Port_Martin", -66.8, 141.4, 4501, 4425, 4424, 4500,
4348, 31.55325773, 4)
temp.idx[96, ] = c( "Possession_Island", -71.9, 171.2, 2494, 2493, 2492,
2491, 2571, 134.1092842, 3)
temp.idx[97, ] = c( "Priestley_Gl", -74.3, 163.2, 2731, 2732, 2647, 2730,
2646, 19.00585189, 3)
temp.idx[98, ] = c( "Racer_Rock", -64.1, 298.4, 4, 10, 9, 5, 11,
49.47196117, 1)
temp.idx[99, ] = c( "Relay_Station", -74, 43.1, 3822, 3743, 3823, 3742,
3821, 7.636661371, 4)
temp.idx[100, ] = c( "Santa_Claus_Island", -65, 294.3, 14, 22, 21, 13,
29, 84.15697374, 1)
temp.idx[101, ] = c( "Schwerdtfeger", -79.9, 170, 2272, 2273, 2202, 2271,
2203, 13.05798907, 3)
temp.idx[102, ] = c( "Scott_Island", -67.4, 180, 3157, 3240, 3322, 3404,
3485, 422.0673688, 4)
temp.idx[103, ] = c( "Siple", -75.9, 276, 324, 358, 293, 395, 359,
50.35567713, 2)
temp.idx[104, ] = c( "Sutton", -67.1, 141.4, 4425, 4501, 4424, 4500,
4349, 22.90986224, 4)
temp.idx[105, ] = c( "Terra_Nova_Bay", -74.7, 164.1, 2646, 2647, 2645,
2648, 2732, 30.02328627, 3)
temp.idx[106, ] = c( "Theresa", -84.6, 244.2, 1236, 1363, 1173, 1113,
1299, 88.8988494, 2)
temp.idx[107, ] = c( "Tourmaline_Plateau", -74.1, 163.4, 2732, 2731,
2648, 2817, 2647, 15.38304621, 3)
temp.idx[108, ] = c( "Uranus_Glacier", -71.4, 291.1, 124, 146, 110, 109,
123, 15.74802782, 1)
temp.idx[109, ] = c( "Whitlock", -76.2, 168.4, 2419, 2348, 2347, 2418,
2346, 173.4223562, 3)
all.idx = matrix(nrow = 109, ncol = 10)
all.idx[, 1] = as.vector(as.character(temp.idx[, 1]))
all.idx = data.frame(all.idx)
all.idx[, 2:10] = as.vector(as.numeric(temp.idx[, 2:10]))
rm(temp.idx)
### Grid cell masks for geographic locations
mask.pen = c(1:114, 117:128, 140:152, 166:181, 195:208, 223:235, 250:260,
278:287, 310:317, 345:350, 383:386, 420:422, 457:459)
mask.west.a = c(115, 116, 129:139, 153:165, 182:194, 209:222, 236:249,
261:277, 288:309, 318:344, 351:382, 387:419, 423:456, 460:495, 497:535,
538:575, 578:615, 620:657, 663:700, 707:743, 752:787, 797:832, 843:878,
891:924, 938:970, 988:1020, 1043:1074, 1101:1131, 1163:1184, 1187:1192,
1227:1246, 1252:1256, 1292:1310, 1357:1374, 1424:1442, 1493:1505,
1560:1567, 1628:1634, 1696:1701, 1765:1770, 1835:1840, 1907:1912,
1978:1982, 2049:2053, 2120:2123, 2191:2194, 2262:2265, 2335:2337,
2407:2410, 2482:2483)
mask.ross = c(1185:1186, 1247:1251, 1311:1319, 1375:1383, 1443:1449,
1506:1516, 1568:1582, 1635:1649, 1702:1717, 1771:1785, 1841:1854,
1913:1925, 1983:1996, 2054:2066, 2124:2136, 2195:2206, 2266:2277,
2338:2348, 2411:2416, 2418:2419, 2484:2486)
mask.west = c(mask.west.a, mask.ross)
mask.east = seq(1:5509)
mask.east[c(mask.pen, mask.west)] = NA
mask.east = as.vector(na.omit(mask.east))
mask.cols = rep(4, 5509)
mask.cols[mask.pen] = 1
mask.cols[mask.west.a] = 2
mask.cols[mask.ross] = 3
mask.cols.noross = mask.cols
mask.cols.noross[mask.ross] = 2
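### Quick sanity check (illustrative): the four region masks partition the grid,
### so the counts returned here should sum to 5509.
# table(mask.cols)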
### Set anchor colors
cold.end = rgb(red = 50, green = 0, blue = 70, maxColorValue = 255)
cold.2 = rgb(red = 0, green = 80, blue = 220, maxColorValue = 255)
cold.1 = rgb(red = 0, green = 210, blue = 255, maxColorValue = 255)
mid = rgb(red = 248, green = 248, blue = 248, maxColorValue = 255)
hot.1 = rgb(red = 255, green = 225, blue = 0, maxColorValue = 255)
hot.2 = rgb(red = 255, green = 80, blue = 0, maxColorValue = 255)
hot.end = rgb(red = 60, green = 0, blue = 0, maxColorValue = 255)
anchor.colors = c(cold.end, cold.2, cold.1, mid, hot.1, hot.2, hot.end)
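### Example (illustrative): preview the color ramp implied by these anchors.
### setColors() builds the actual scale used by plotMap(); base R's
### colorRampPalette() gives a similar quick visual check.
# ramp.preview = colorRampPalette(anchor.colors)(255)
# barplot(rep(1, 255), col = ramp.preview, border = NA, space = 0, axes = FALSE)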
##########################################################################################
###################### END VARIABLE ASSIGNMENTS ######################
##########################################################################################
###======================================================================================###
###                        END FUNCTION AND VARIABLE DEFINITIONS                         ###
###                             BEGIN COMPUTATIONAL SECTION                              ###
###======================================================================================###
#####################################
### SEGMENT: LOAD AND PARSE DATA ###
#####################################
###
### If desired, un-comment "downData()" to obtain the S09 and READER data
### directly from online sources. The data will be saved in the working
### directory and subsequently loaded into R with "loadData()".
###
### Download data from online sources
# downData()
### Load data into R
l.data = loadData()
saved.S09.recon = l.data[[5]]
gnd.S09 = l.data[[6]]
### Get parsed data
all = parseAll(l.data)
all.avhrr = all[[1]]
all.gnd = all[[2]]
colnames(all.gnd) = all.idx[, 1]
sat.coord = all[[3]]
coords = matrix(nrow = nrow(sat.coord), ncol = 5)
coords[, 1] = sat.coord[, 2]
coords[, 2] = sat.coord[, 1]
coords[, 3] = coords[, 4] = coords[, 5] = 1
allsurf.coord = all.idx[2:3]
### Grid cell masks used by S09
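### (Cells in the western hemisphere north of 72S among the first 200 grid cells
### are taken as Peninsula; one Peninsula element is dropped, and cells 116, 223,
### and 278 are folded into West Antarctica.)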
mask.S09.pen = (seq(1:200))[(sat.coord[1:200, 1] < 0 | sat.coord[1:200,
1] > 180) & (sat.coord[1:200, 2] > -72)]
mask.S09.pen = mask.S09.pen[-115]
mask.S09.west = sort(c((seq(1:5509))[(sat.coord[, 1] < 299.5 &
sat.coord[, 1] > 179) & !((sat.coord[, 1] < 0 | sat.coord[, 1] > 180) &
(sat.coord[, 2] > -72))], 116, 223, 278))
noteast = sort(c(mask.S09.pen, mask.S09.west))
mask.S09.east = (seq(1:5509))[-noteast]
##########################################################################################
###################### END LOAD AND PARSE DATA ######################
##########################################################################################
###################################
### SEGMENT: SATELLITE OFFSETS ###
###################################
###
### This section computes gross satellite offsets based
### on Wilcoxon and t-tests for difference in means.
###
### NOAA-7: Correction for linear drop in temperature.
###
### NOAA-9: Correction for small negative offset in temperature.
###
### NOAA-11: Correction for small negative offset through 1993
###          and major negative offset post-1993.
###
### NOAA-14: Correction for major positive offset.
###
### NOAA-16: Correction for small negative offset.
###
### NOAA-17: No correction.
###
### NOTE: OFFSETS DETERMINED FOR INFORMATIONAL PURPOSES ONLY.
###       SATELLITE DATA IS ***NOT*** CORRECTED BY THESE OFFSETS.
###
###
### Get only stations on the AVHRR grid and the associated AVHRR cells
gnd.grid = all.gnd[, -form.grid]
av.grid = parseSat(all.avhrr, all.idx, form = form.grid)
### Plot results of Wilcoxon/t-tests and determine offsets
rng = 24
### Get the offsets
offsets = getOffsets(av.grid, gnd.grid, rng = rng)
### Plot the offsets
plotEst(offsets[[2]], offsets[[1]], rng)
### Get trends
sat.trends = list()
X = c(1:300) / 120
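### (Time axis in decades -- 120 months per decade -- so fitted slopes are deg C/decade)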
sat.trends[[2]] = lmFn(X, makeAnom(all.avhrr))$coef
##########################################################################################
###################### END SATELLITE OFFSETS ######################
##########################################################################################
########################################
### SEGMENT: IMPUTATION EXPERIMENTS ###
########################################
###
### This segment performs the cross-validation experiments used to select the
### optimal number of retained EOFs and to compare infilling methods.
###
### doGndRandom() performs TTLS/TSVD infilling of the pre-set ground station
### sets from the VARIABLE ASSIGNMENTS segment utilizing random withholding.
###
### doGndStn() performs TTLS/TSVD infilling of the pre-set ground station
### sets from the VARIABLE ASSIGNMENTS segment utilizing early/late withholding.
###
### doSat() requires a fully imputed ground station set for input and
### performs infilling of the AVHRR principal components.
###
### doRls() utilizes the internal cross-validation function in the RLS
### algorithm to determine the optimal number of retained satellite EOFs
### for any given infilled ground station matrix.
###
### The output of these functions may be plotted using "plt.gnd.stats.iter()"
### and "plt.gnd.stats.avg()" for ground station imputations, and
### "plt.sat.stats.iter()" and "plt.sat.stats.avg()" for satellite
### imputations; an illustrative call follows the experiment list below.
###
### Ground station experiment using random withholding (NOT RUN)
# gnd.stats.random = doGndRandom()
###
### Ground station experiment using early/late withholding
### on designated stations (NOT RUN)
# gnd.stats.stn = doGndStn()
###
### Infill satellite sets using fully infilled ground station set (NOT RUN)
# sat.stats = doSat()
###
### Impute RLS sets using fully infilled ground station set (NOT RUN)
# rls.stats = doRls()
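###
### Example (illustrative, NOT RUN): plotting saved experiment output; it is
### assumed here that the plotting helpers take the result objects directly.
# plt.gnd.stats.avg(gnd.stats.stn)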
##########################################################################################
###################### END IMPUTATION EXPERIMENTS ######################
##########################################################################################
##########################################################
### SEGMENT: NEAREST-STATION RECONSTRUCTION (NOT RUN) ###
##########################################################
###
### Parse for ground stations (NOT RUN)
# Yo = makeAnom(window(all.gnd[, -form.grid.96], 1957, c(2006, 12)))
# surf.coord = allsurf.coord[-form.grid.96, ]
# statnumber = length(surf.coord[, 1])
# colnames(Yo) = all.idx[-form.grid.96, 1]
### Find nearest stations (NOT RUN)
# stationindex = array(0, dim = c(5509, 600))
### Cycle through each time
# for (j in 1:600) {
###    Find stations with data
#    grndmask = !is.na(Yo[j, ])
###    Cycle through each grid point
#    for (i in 1:5509) {
###       Get distances
#       dist = gCirc(surf.coord[, 1], surf.coord[, 2], sat.coord[i, 2], sat.coord[i, 1], 6371)
###       Store nearest station
#       stationindex[i, j] = which(min(dist[grndmask]) == dist)[1]
#    }
# }
### Calculate offsets (NOT RUN)
# offset = offsetArea(Yo = Yo, maxoverlap = 60)
# Yoc = t(t(Yo) + offset)
# Yoc = ts(Yoc, start = 1957, freq = 12)
### Assemble into reconstruction (NOT RUN)
# rec.area = array(NA, dim = c(600, 5509))
### Cycle through all values
# for (j in 1:600) {
#    rec.area[j, ] = Yoc[j, stationindex[, j]]
# }
# rec.area = ts(rec.area, start = 1957, freq = 12)
##########################################################################################
###################### END NEAREST-STATION RECONSTRUCTION ######################
##########################################################################################
################################
### SEGMENT: RECONSTRUCTION ###
################################
###
###========================================================
### Infill ground stations: Set Grid.1C (form.list[[11]])
###========================================================
### Infill the ground set (optimal settings)
### Get the station set
Y.all = makeAnom(window(all.gnd, 1957, c(2006,12)))
Yo = Y.all[, -form.grid.96]
### Infill (with neat plotting)
main.gnd = emFn(Yo, maxiter = 500, tol = 0.0005, type = "ttls",
n.eof = 7, plt = c(T, 1), DoFL = 12)
### Extract final iteration
Y = main.gnd[[7]]$X
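### Re-center each infilled series on zero (mScale adds the negated column means)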
Y = mScale(Y, -colMeans(Y, na.rm = TRUE), type = 1, add = TRUE,
scale = FALSE, center = FALSE)
###====================================
### Set up ground station verification
###====================================
### Obtain unused stations
ver.list = getVerStns(Y, 2, form.list, form.ver)
form.withheld = ver.list[[3]]
withheld.names = all.idx[-form.withheld, 1]
withheld.locs = all.idx[-form.withheld, 10]
grid.names = all.idx[-form.grid, 1]
grid.locs = all.idx[-form.grid, 10]
###======================================================================
### Infill satellite PCs: Eigenvector-weighted method, optimal settings
###======================================================================
### Infill satellite PCs
eigen.pcs = getEigen(Y, pc.max = 150, n.max = 8)
### Get reconstruction
recon.eigen = getRecon(eigen.pcs, verbose = TRUE, compare.S09 = TRUE)
###=============================================
### Regularized Least Squares estimation of PCs
###=============================================
### Calculate estimates
### Optimal v.max for iridge from doRls(): 126
rls.pcs = getRls(Y, v.max = 126)
### Get reconstruction
recon.rls = getRecon(rls.pcs)
###=================
### S09 REPLICATION
###=================
### Replicates S09 procedure with PCs imputed simultaneously with ground
### stations, TTLS, n.eof = 3, and satellite verification reconstructions
### Get the AVHRR data
### Temporal information - S09, correlation
S09.data = getPcs(all.avhrr, 1982, c(2006, 12), 2, FALSE, 1)
S09.early.data = getPcs(all.avhrr, 1982, c(1994, 6), 2, FALSE, 1)
S09.late.data = getPcs(all.avhrr, c(1994, 7), c(2006, 12), 2,
FALSE, 1)
### Infill
### Get the station set
S09.Y = gnd.S09
colnames(S09.Y) = colnames(all.gnd)[-form.S09]
### Get first 3 satellite PCs
S09.X = ts(S09.data[[1]][, 1:3], start = 1982, freq = 12)
S09.X.early = ts(S09.early.data[[1]][, 1:3], start = 1982,
freq = 12)
S09.X.late = ts(S09.late.data[[1]][, 1:3], start = c(1994,
7), freq = 12)
### Merge
S09 = ts.union(S09.Y, S09.X)
S09.early = ts.union(S09.Y, S09.X.early)
S09.late = ts.union(S09.Y, S09.X.late)
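### (In the merged sets, columns 1:42 hold the S09 ground stations and columns
### 43:45 hold the three satellite PCs, hence the 43:45 extractions below.)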
### Infill and extract
infilled.S09 = emFn(S09, type = "ttls", DoFL = 1, maxiter =
50, tol = 0.001, n.eof = 3, plt = c(T, 43), progressive = c(F, 1, F, 1))
infilled.S09.early = emFn(S09.early, type = "ttls", DoFL = 1,
maxiter = 50, tol = 0.001, n.eof = 3, plt = c(T, 43), progressive = c(F,
1, F, 1))
infilled.S09.late = emFn(S09.late, type = "ttls", DoFL = 1,
maxiter = 50, tol = 0.001, n.eof = 3, plt = c(T, 43), progressive = c(F,
1, F, 1))
S09.pcs = S09.pcs.early = S09.pcs.late = list()
S09.pcs$V = S09.data[[2]][, 1:3]
S09.pcs.early$V = S09.early.data[[2]][, 1:3]
S09.pcs.late$V = S09.late.data[[2]][, 1:3]
S09.pcs$D = S09.data[[5]]
S09.pcs.early$D = S09.early.data[[5]]
S09.pcs.late$D = S09.late.data[[5]]
rm(S09.data, S09.early.data, S09.late.data)
S09.pcs$X = ts(infilled.S09[[3]]$X[, 43:45], 1957, freq = 12)
S09.pcs$Xhat = ts(infilled.S09[[3]]$Xhat[, 43:45], 1957, freq
= 12)
S09.pcs.early$X = ts(infilled.S09.early[[3]]$X[, 43:45],
1957, freq = 12)
S09.pcs.early$Xhat = ts(infilled.S09.early[[3]]$Xhat[,
43:45], 1957, freq = 12)
S09.pcs.late$X = ts(infilled.S09.late[[3]]$X[, 43:45], 1957,
freq = 12)
S09.pcs.late$Xhat = ts(infilled.S09.late[[3]]$Xhat[, 43:45],
1957, freq = 12)
### Get reconstructions
recon.S09 = getRecon(S09.pcs)
recon.S09.early = getRecon(S09.pcs.early)
recon.S09.late = getRecon(S09.pcs.late)
### Ground station verification test
##########################################################################################
###################### END RECONSTRUCTIONS ######################
##########################################################################################
##############################
### SEGMENT: VERIFICATION ###
##############################
###
### This section compares ground station verification statistics between
### the eigen and RLS reconstructions and the S09 reconstruction.
### It also replicates the S09 satellite verification statistics.
###
###=================
### S09 REPLICATION
###=================
### Define proper time windows
orig = makeAnom(all.avhrr)
est.early = window(recon.S09.early[[1]], start = 1982)
est.late = window(recon.S09.late[[1]], start = 1982)
early.idx = late.idx = est.early == est.early
window(early.idx, start = c(1994, 7)) = FALSE
window(late.idx, end = c(1994, 6)) = FALSE
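### (early.idx is TRUE through June 1994; late.idx is TRUE from July 1994 onward)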
### Get satellite verification statistics
S09.ver.early = getStats(orig, est.early, early.idx)
S09.ver.late = getStats(orig, est.late, late.idx)
###===================================
### GROUND STATION VERIFICATION STATS
###===================================
###
### Generate the ground station sets (NOT RUN)
# cv.gnd = cvGnd(Yo)
### Generate E-W reconstructions (NOT RUN)
# eigen.ver.estimates = verEigen(cv.gnd, pc.max = 150, n.max = 8)
### Get E-W verification stats (NOT RUN)
# eigen.ver.stats = getStats(eigen.ver.estimates$ver, eigen.ver.estimates$estimates)
###
### Generate RLS reconstructions (NOT RUN)
# rls.ver.estimates = verRls(cv.gnd, v.min = 1, v.max = 150)
### Get RLS verification stats (NOT RUN)
# rls.ver.stats = getStats(rls.ver.estimates$ver, rls.ver.estimates$estimates)
###
### Get S09 estimates (NOT RUN)
# S09.ver.estimates = verS09(S09, S09.pcs)
### Get S09 verification stats (NOT RUN)
# S09.ver.stats = getStats(S09.ver.estimates$ver, S09.ver.estimates$estimates)
###==========================================================
### GROUND STATION EXPLAINED VARIANCE - FULL RECONSTRUCTIONS
###==========================================================
### Obtain the original data for all on-grid stations
grid.locs = all.idx[-form.grid, 4]
grid.names = all.idx[-form.grid, 1]
grid.orig = all.gnd[, -form.grid]
grid.orig = makeAnom(window(grid.orig, 1957, c(2006, 12)))
colnames(grid.orig) = grid.names
idx = !is.na(grid.orig)
### Get the data from the E-W reconstruction
eigen.grid = recon.eigen[[1]][, grid.locs]
colnames(eigen.grid) = grid.names
### Get the data from the RLS reconstruction
rls.grid = recon.rls[[1]][, grid.locs]
colnames(rls.grid) = grid.names
### Get the data from the S09 reconstruction
S09.grid = saved.S09.recon[, grid.locs]
colnames(S09.grid) = grid.names
### Calculate stats
eigen.stats = getStats(grid.orig, eigen.grid, idx)[[1]]
rls.stats = getStats(grid.orig, rls.grid, idx)[[1]]
S09.stats = getStats(grid.orig, S09.grid, idx)[[1]]
### Calculate stats for just 1957 - 1981
eigen.stats.57.81 = getStats(grid.orig[1:300, ], eigen.grid[1:300,
], idx[1:300, ])[[1]]
rls.stats.57.81 = getStats(grid.orig[1:300, ], rls.grid[1:300, ],
idx[1:300, ])[[1]]
S09.stats.57.81 = getStats(grid.orig[1:300, ], S09.grid[1:300, ],
idx[1:300, ])[[1]]
### Calculate stats for just 1982 - 2006
eigen.stats.82.06 = getStats(grid.orig[301:600, ],
eigen.grid[301:600, ], idx[301:600, ])[[1]]
rls.stats.82.06 = getStats(grid.orig[301:600, ], rls.grid[301:600,
], idx[301:600, ])[[1]]
S09.stats.82.06 = getStats(grid.orig[301:600, ], S09.grid[301:600,
], idx[301:600, ])[[1]]
###==============================================
### EIGEN AND RLS SATELLITE STATS FOR COMPARISON
###==============================================
### Compare EIGEN and RLS vs. satellite
eigen.sat.stats = getStats(makeAnom(all.avhrr),
window(recon.eigen[[1]], start = 1982))
rls.sat.stats = getStats(makeAnom(all.avhrr),
window(recon.rls[[1]], start = 1982))
### Compare S09 unspliced solution vs. satellite
S09.sat.stats = getStats(makeAnom(all.avhrr),
window(recon.S09[[1]], start = 1982))
##########################################################################################
###################### END VERIFICATION ######################
##########################################################################################