Package ‘ordinalForest’
April 13, 2017
Type Package
Title Ordinal Forests: Prediction and Class Width Inference with
Ordinal Target Variables
Version 1.0
Date 2017-04-13
Author Roman Hornung
Maintainer Roman Hornung <[email protected]>
Imports ranger, combinat, ggplot2
Description Ordinal forests (OF) are a method for ordinal regression with high-dimensional
and low-dimensional data that is able to predict the values of the ordinal target variable
for new observations and at the same time estimate the relative widths of the classes of
the ordinal target variable. Using a (permutation-based) variable importance measure it
is moreover possible to rank the importances of the covariates.
OF will be presented in an upcoming technical report by Hornung et al.
The main functions of the package are: ordfor() (construction of OF), predict.ordfor()
(prediction of the target variable values of new observations), and plot.ordfor()
(visualization of the estimated relative widths of the classes of the ordinal target
variable).
License GPL-2
Encoding UTF-8
RoxygenNote 5.0.1
NeedsCompilation no
Repository CRAN
Date/Publication 2017-04-13 21:00:25 UTC
R topics documented:
ordinalForest-package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  2
hearth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  3
ordfor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  5
perfm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  9
plot.ordfor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
predict.ordfor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
ordinalForest-package Ordinal Forests: Prediction, Class Width Inference and Variable
Ranking with Ordinal Target Variables
Description
Ordinal forests (OF) are a method for ordinal regression with high-dimensional and low-dimensional
data that is able to predict the values of the ordinal target variable for new observations and at
the same time estimate the relative widths of the classes of the ordinal target variable. Using a
(permutation-based) variable importance measure it is moreover possible to rank the importances
of the covariates.
OF will be presented in an upcoming technical report by Hornung et al.
Details
For a detailed description of OF see: ordfor
The main functions are: ordfor (construction of OF), predict.ordfor (prediction of the target variable values of new observations), and plot.ordfor (visualization of the estimated relative
widths of the classes of the ordinal target variable).
Examples
## Not run:
# Illustration of the key functionalities of the package:
##########################################################
# Load example dataset:
data(hearth)
# Inspect the data:
table(hearth$Class)
dim(hearth)
head(hearth)
# Split into training dataset and test dataset:
set.seed(123)
trainind <- sort(sample(1:nrow(hearth), size=floor(nrow(hearth)*(2/3))))
testind <- setdiff(1:nrow(hearth), trainind)
datatrain <- hearth[trainind,]
datatest <- hearth[testind,]
# Construct OF using the training dataset:
ordforres <- ordfor(depvar="Class", data=datatrain, ndiv=1000, ntreeperdiv=100,
ntreefinal=5000, perfmeasure = "equal")
ordforres
# Study variable importance values:
sort(ordforres$varimp, decreasing=TRUE)
# Take a closer look at the top variables:
boxplot(datatrain$oldpeak ~ datatrain$Class, horizontal=TRUE)
fisher.test(table(datatrain$exang, datatrain$Class))
# Plot the optimized partition:
# using ggplot2:
plot(ordforres)
# without ggplot2:
plot(ordforres, useggplot=FALSE)
# Predict values of the ordinal target variable in the test dataset:
preds <- predict(ordforres, newdata=datatest)
preds
# Compare predicted values with true values:
table(data.frame(true_values=datatest$Class, predictions=preds$ypred))
## End(Not run)
hearth
Data on Coronary Artery Disease
Description
This dataset includes 294 patients undergoing angiography at the Hungarian Institute of Cardiology in
Budapest between 1983 and 1987.
Format
A data frame with 294 observations, ten covariates and one ordinal target variable
Details
The variables are as follows:
• age. numeric. Age in years
• sex. factor. Sex (1 = male; 0 = female)
• chest_pain. factor. Chest pain type (1 = typical angina; 2 = atypical angina; 3 = non-anginal
pain; 4 = asymptomatic)
• trestbps. numeric. Resting blood pressure (in mm Hg on admission to the hospital)
• chol. numeric. Serum cholesterol in mg/dl
• fbs. factor. Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
• restecg. factor. Resting electrocardiographic results (1 = having ST-T wave abnormality (T
wave inversions and/or ST elevation or depression of > 0.05 mV); 0 = normal)
• thalach. numeric. Maximum heart rate achieved
• exang. factor. Exercise induced angina (1 = yes; 0 = no)
• oldpeak. numeric. ST depression induced by exercise relative to rest
• Class. factor. Ordinal target variable - severity of coronary artery disease (determined using
angiograms) (1 = no disease; 2 = degree 1; 3 = degree 2; 4 = degree 3; 5 = degree 4)
The original OpenML dataset was pre-processed in the following way:
1. The variables were re-named according to the description given on OpenML.
2. The missing values which were coded as "-9" were replaced by NA values.
3. The variables slope, ca, and thal were excluded, because these featured too many missing
values.
4. The categorical covariates were transformed into factors.
5. There were 6 restecg values of "2" which were replaced by "1".
6. The missing values were imputed: The missing values of the numerical covariates were replaced
by the means of the corresponding non-missing values. The missing values of the categorical covariates were replaced by the modes of the corresponding non-missing values.
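The imputation in step 6 can be sketched on a toy data frame (illustrative base R; this is not the actual pre-processing script used for hearth, and the toy columns num and cat are made up):

```r
# Mean imputation for a numeric covariate, mode imputation for a categorical one.
df <- data.frame(num = c(1, NA, 3), cat = factor(c("a", "a", NA)))

# Numeric: replace NA by the mean of the non-missing values.
df$num[is.na(df$num)] <- mean(df$num, na.rm = TRUE)

# Categorical: replace NA by the mode (most frequent level) of the non-missing values.
mode_level <- names(which.max(table(df$cat)))
df$cat[is.na(df$cat)] <- mode_level

df
```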
Source
OpenML: data.name: heart-h, data.id: 1565, link: https://www.openml.org/d/1565/
References
Detrano, R., Janosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J.-J., Sandhu, S., Guppy, K. H., Lee,
S., Froelicher, V. (1989) International application of a new probability algorithm for the diagnosis
of coronary artery disease. The American Journal Of Cardiology, 64, 304–310.
Examples
data(hearth)
table(hearth$Class)
dim(hearth)
head(hearth)
ordfor
Ordinal forests
Description
Constructs ordinal forests (OF) as presented in Hornung et al. (in prep).
The following tasks can be performed using OF: 1) Predicting the values of an ordinal target variable
for new observations based on covariate values (see predict.ordfor); 2) Estimation of the relative
widths of the classes of the ordinal target variable; 3) Ranking the importances of the covariates with
respect to predicting the values of the ordinal target variable.
The default values for the following hyperparameters are appropriate for most applications: ndiv,
ntreeperdiv, ntreefinal, npermtrial, and nbest. Thus, these arguments generally do not
have to be specified by hand.
For details on OF see the ’Details’ section below.
Usage
ordfor(depvar, data, ndiv = 1000, ntreeperdiv = 100, ntreefinal = 5000,
perfmeasure = c("equal", "irrespective", "onecateg", "custom"), categimp,
categweights, npermtrial = 500, nbest = 10, importance = "permutation")
Arguments
depvar
character. Name of the dependent variable in data.
data
data.frame. Data frame containing the covariates and a factor-valued ordinal
target variable.
ndiv
integer. Number of random partitions of [0,1] to consider in the optimization.
ntreeperdiv
integer. Number of trees in the smaller regression forests constructed for the
ndiv different partitions.
ntreefinal
integer. Number of trees in the larger regression forest constructed using the
optimized partition (i.e., the OF).
perfmeasure
character. Performance measure. The default is "equal". See ’Details’ subsection ’Performance measures’ below and perfm.
categimp
character. Class to prioritize if perfmeasure="onecateg".
categweights
numeric. Needed if perfmeasure="custom": vector of length equal to the number of classes. Class weights - classes with higher weights are to be predicted
more precisely than those with smaller weights.
npermtrial
integer. Number of permutations to try for each partition considered during the
optimization.
nbest
integer. Number of forests to consider when averaging the partitions corresponding to the nbest forests with the highest levels of prediction performance.
importance
character. Variable importance mode, one of ’none’, ’impurity’, ’permutation’.
The ’impurity’ measure is the variance of the responses for regression. Passed
to the function ranger from the package of the same name.
Details
Introduction: Ordinal forests (OF) are a method for ordinal regression with high-dimensional
and low-dimensional data that is able to predict the values of the ordinal target variable for new
observations and at the same time estimate the relative widths of the classes of the ordinal target
variable. Using a (permutation-based) variable importance measure it is moreover possible to
rank the importances of the covariates.
OF will be presented in an upcoming technical report by Hornung et al.
Methods: The concept of OF is based on the following assumption: There exists a (possibly latent) refined continuous variable y* underlying the observed ordinal target variable y (y in {1,...,J},
J number of classes), where y* determines the values of y. The functional relationship between
y* and y takes the form of a monotonically increasing step function. Depending on which of J
intervals ]a_1, a_2], ]a_2, a_3], ..., ]a_J, a_{J+1}[ contains the value of y*, the ordinal target
variable y takes a different value.
In the context of conditional inference trees, for situations in which the intervals ]a_1, a_2], ...,
]a_J, a_{J+1}[ are available, Hothorn et al. (2006) suggest to use as a metric target variable
the midpoints of these intervals. OF are however concerned with settings in which the intervals
]a_1, a_2], ..., ]a_J, a_{J+1}[ are unknown. The main idea of OF is to estimate the unknown
interval borders by maximizing the out-of-bag prediction performance (see section "Performance
measures") of standard regression forests that use as a target variable a quantity very similar to the
midpoints of the intervals. More precisely, we assume y* to have a standard normal distribution
and consider the following score values as the values of the target variable: phi^(-1)((phi(a_j) +
phi(a_{j+1}))/2) if y = j, with a_1 = -infinity and a_{J+1} = +infinity, where "phi" denotes the
cumulative distribution function of the standard normal distribution. The assumption of normality
is simply made, because normally distributed target variables can be expected to be well suited
for standard regression forests.
The estimation of the optimal partition is based on constructing many regression forests (b in
1,...,ndiv) of smaller size, where each of these uses as the values of the target variable a score set
obtained using random sampling: {phi^(-1)((phi(a_{b,j}) + phi(a_{b,j+1}))/2) : j in 1,...,J}, where
phi(a_{b,1})=0 < phi(a_{b,2}) <... < phi(a_{b,J}) < phi(a_{b,J+1})=1 is a random partition of
[0,1].
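The sampling of one such partition and its score set can be sketched in a few lines of base R (an illustrative re-implementation using pnorm/qnorm, not code from the package):

```r
# Illustration: one random partition of [0,1] and the corresponding score set.
# phi is the standard normal CDF (pnorm), phi^(-1) its inverse (qnorm).
set.seed(1)
J <- 5                                        # number of classes
d <- c(0, sort(runif(J - 1)), 1)              # phi(a_{b,1})=0 < ... < phi(a_{b,J+1})=1
scores <- qnorm((d[1:J] + d[2:(J + 1)]) / 2)  # phi^(-1)((phi(a_{b,j}) + phi(a_{b,j+1}))/2)
scores                                        # one score value per class j = 1,...,J
```

The scores are the midpoints of the intervals on the probability scale, mapped back to the normal scale, so they are strictly increasing in j.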
After estimating the intervals, a larger regression forest is built using as values of the target variable the score values phi^(-1)((phi(a_{opt,j}) + phi(a_{opt,j+1}))/2), where the a_{opt,j} values
are the optimized borders.
An important by-product of the construction of OFs are the estimated interval borders
a_{opt,1},...,a_{opt,J+1}. The optimized intervals ]phi(a_{opt,1}), phi(a_{opt,2})], ...,
]phi(a_{opt,J}), phi(a_{opt,J+1})[ can be interpreted vividly as the estimated relative widths of
the classes 1,...,J.
Further details are given in the sections "Hyperparameters" and "Performance measures".
Hyperparameters: There are several hyperparameters, which do, however, not have to be optimized by the user in general. Except in the case of nbest, the larger these values are, the better,
but the default values should be large enough in most applications.
These hyperparameters are described in the following:
• ndiv Default value: 1000. The default value of the number of considered partitions in the estimation of the optimal partition is quite large. Such a large number of considered partitions is necessary to attain a high chance that some of the partitions are close enough to the optimal partition, that is, the partition that leads to the optimal performance with respect to the considered performance measure (provided with perfmeasure).
• ntreeperdiv Default value: 100. A smaller number of trees considered per tried partition might lead to too strong a variability in the assessments of the performances achieved for the individual partitions.
• ntreefinal Default value: 5000. The number of trees ntreefinal plays the same role as in conventional regression forests. For very high-dimensional datasets the default number 5000 might be too small.
• npermtrial Default value: 500. As stated above, it is necessary to consider a large number of tried partitions ndiv in the optimization in order to increase the chance that the best of the considered partitions are close to the optimal partition. However, for a larger number of classes it is in addition necessary to use an efficient algorithm for the sampling of the considered partitions. This algorithm should ensure that the partitions are heterogeneous enough that the best of the tried partitions are close enough to the optimal partition. In OF we consider an algorithm that has the goal of making the rankings of the widths of the intervals corresponding to the respective classes as dissimilar as possible between successive partitions.
Denote d_{b,j} := phi(a_{b,j}). Then the b-th partition is generated as follows:
If b = 1, then J-1 instances of a U(0,1)-distributed random variable are drawn and the values are sorted. The sorted values are designated as d_{1,2},...,d_{1,J}. Moreover, set d_{1,1} := 0 and d_{1,J+1} := 1. The first considered partition is now {d_{1,1},...,d_{1,J+1}}.
If b > 1, the following procedure is performed:
1. Let dist_{b-1,j} = d_{b-1,j+1} - d_{b-1,j} (j in {1,...,J}) denote the width of the j-th interval of the (b-1)-th partition and r_{b-1,j} the rank of dist_{b-1,j} among dist_{b-1,1},...,dist_{b-1,J}. First, npermtrial random permutations of 1,...,J are drawn, where r*_{l,1},...,r*_{l,J} denotes the l-th drawn permutation. Subsequently, that permutation r*_{l*,1},...,r*_{l*,J} (l* in {1,...,npermtrial}) is determined that features the greatest quadratic distance to r_{b-1,1},...,r_{b-1,J}, that is, r*_{l*,1},...,r*_{l*,J} = argmax_{r*_{l,1},...,r*_{l,J}} sum_{j=1}^J (r*_{l,j} - r_{b-1,j})^2.
2. A partition is generated in the same way as in the case of b = 1.
3. The intervals of the partition generated in 2. are re-ordered in such a way that the width of the j-th interval (for j in {1,...,J}) has rank r*_{l*,j} among the widths of the intervals ]d_{b,1}, d_{b,2}], ..., ]d_{b,J}, d_{b,J+1}]. The b-th considered partition is now {d_{b,1},...,d_{b,J+1}}.
The default value 500 for npermtrial should be large enough to ensure sufficiently heterogeneous partitions - independent of the number of classes J.
• nbest Default value: 10. After having constructed ndiv regression forests, each using a particular partition, the nbest partitions associated with the highest levels of prediction performance are considered: For each j in {1,...,J} the averages of the phi(a_j) values across these nbest partitions are taken. Then phi^(-1)(.) is applied to the obtained averages, resulting in the estimated optimal borders a_{opt,1},...,a_{opt,J+1}. The latter are then used to calculate the scores for the OF using the formula phi^(-1)((phi(a_{opt,j}) + phi(a_{opt,j+1}))/2) (see also above).
It is probably important that the number nbest of best partitions considered is not strongly misspecified. A too large value of nbest could lead to averaging over a too heterogeneous set of partitions. Conversely, a too small value of nbest could lead to a too large variance in the estimation of the optimal partition. For larger numbers of tried partitions ndiv the results are less sensitive to the choice of nbest, because the number of tried partitions that are close to the optimal partition becomes larger the more partitions are considered. When setting the number of considered partitions ndiv to 1000, the value 10 for nbest specified as the default value in ordfor was seen to lead to a good trade-off between the heterogeneity of the averaged partitions and the variance in the estimation.
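The partition-sampling scheme for b > 1 can be sketched in base R (an illustrative re-implementation of the three steps above, not the package's internal code; all variable names are made up):

```r
set.seed(1)
J <- 4
npermtrial <- 50
d_prev <- c(0, sort(runif(J - 1)), 1)   # (b-1)-th partition: d_{b-1,1},...,d_{b-1,J+1}
r_prev <- rank(diff(d_prev))            # ranks r_{b-1,j} of the interval widths

# Step 1: draw npermtrial random permutations of 1,...,J and keep the one with
# the greatest quadratic distance to r_prev.
perms <- replicate(npermtrial, sample(J), simplify = FALSE)
sqdist <- vapply(perms, function(r) sum((r - r_prev)^2), numeric(1))
r_star <- perms[[which.max(sqdist)]]

# Step 2: generate a fresh random partition as in the case b = 1.
d_new <- c(0, sort(runif(J - 1)), 1)
w <- diff(d_new)                        # its interval widths

# Step 3: re-order the widths so that the j-th width has rank r_star[j].
w_reordered <- sort(w)[r_star]
d_b <- c(0, cumsum(w_reordered))        # b-th considered partition
```

By construction, the ranking of the interval widths of the new partition is maximally dissimilar (among the tried permutations) from that of the previous one.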
Performance measures: As noted above, the different partitions tried during the estimation of
the optimal partition are assessed with respect to their (out-of-bag) prediction performance. The
choice of the specific performance measure used here determines the specific kind of performance
the ordinal forest should feature:
• perfmeasure="equal" This choice should be made if it is of interest to predict each class with the same precision irrespective of the numbers of observations representing the individual classes. Youden's J statistic is calculated with respect to each class ("observation/prediction in class j" vs. "observation/prediction NOT in class j" (j=1,...,J)) and the simple average of the J results is taken.
• perfmeasure="irrespective" This choice should be made if it is of interest to predict all observations with the same precision independent of their categories. Youden's J statistic is calculated with respect to each class and subsequently a weighted average of these values is taken - with weights proportional to the number of observations representing the respective classes in the training data.
• perfmeasure="onecateg" This choice should be made if it is merely relevant that one particular class of the ordinal target variable is predicted as precisely as possible. This particular class must be provided via the argument categimp. Youden's J statistic is calculated with respect to class categimp ("observation/prediction in class categimp" vs. "observation/prediction NOT in class categimp").
• perfmeasure="custom" This choice should be made if there is a particular ranking of the classes with respect to their importance. Youden's J statistic is calculated with respect to each class. Subsequently, a weighted average with user-specified weights (provided via the argument categweights) is taken. Classes with higher weights are to be predicted more precisely than those with smaller weights.
Value
ordfor returns an object of class ordfor. An object of class "ordfor" is a list containing the
following components:
forestfinal
object of class "ranger". Regression forest constructed using the optimized
partition of [0,1] (i.e., the OF). Used by predict.ordfor.
bordersbest
vector of length J+1. Average over the nbest best partitions of [0,1]. Used by
predict.ordfor.
forests
list of length ndiv. The smaller forests constructed during optimizing the partition of [0,1].
perfmeasurevalues
vector of length ndiv. Performance measure values of the ndiv smaller forests
stored in forests.
bordersb
matrix of dimension ndiv x (J+1). All ndiv partitions considered.
classes
character vector of length J. Classes of the target variable.
ndiv
integer. Number of random partitions considered.
ntreeperdiv
integer. Number of trees per partition.
ntreefinal
integer. Number of trees of the OF (constructed after optimizing the partition).
perfmeasure
character. Performance measure.
categimp
character. If perfmeasure="onecateg": class to prioritize; NA otherwise.
nbest
integer. Number of forests considered in the averaging of the partitions corresponding to the nbest forests with the highest levels of prediction performance.
classfreq
table. Class frequencies.
varimp
vector of length p. Variable importance for each covariate.
References
Hothorn, T., Hornik, K., Zeileis, A. (2006) Unbiased recursive partitioning: a conditional inference
framework. Journal of Computational and Graphical Statistics, 15, 651–674.
Examples
data(hearth)
set.seed(123)
hearthsubset <- hearth[sort(sample(1:nrow(hearth), size=floor(nrow(hearth)*(1/2)))),]
ordforres <- ordfor(depvar="Class", data=hearthsubset, ndiv=80, nbest=5, ntreeperdiv=100,
ntreefinal=1000, perfmeasure = "equal")
# NOTE: ndiv=80 is not enough!! In practice, ndiv=1000 (default value) or a higher
# number should be used.
ordforres
sort(ordforres$varimp, decreasing=TRUE)
perfm
Performance measures based on Youden’s J statistic
Description
These performance measures based on Youden’s J statistic are used in ordfor to measure the performance of the smaller regression forests constructed for the estimation of the optimal partition.
They may however also be used to measure the precision of predictions on new data or the precision
of OOB predictions.
Usage
perfm_equal(ytest, ytestpred, categ, categweights)
perfm_irrespective(ytest, ytestpred, categ, categweights)
perfm_onecateg(ytest, ytestpred, categ, categweights)
perfm_custom(ytest, ytestpred, categ, categweights)
Arguments
ytest
factor. True values of the target variable.
ytestpred
factor. Predicted values of the target variable.
categ
character. Needed in the case of perfm_onecateg: Category to predict as precisely as possible.
categweights
numeric. Needed in the case of perfm_custom: Vector of length equal to the
number of classes. Class weights - classes with higher weights are to be predicted more precisely than those with smaller weights.
Details
perfm_equal should be used if it is of interest to predict each class with the same precision irrespective of the numbers of observations representing the individual classes. Youden’s J statistic is
calculated with respect to each class ("class j yes" vs. "class j no" (j=1,...,J)) and the simple average
of the J results taken.
perfm_irrespective should be used if it is of interest to predict all observations with the same
precision independent of their categories. Youden’s J statistic is calculated with respect to each
class and subsequently a weighted average of these values is taken - with weights proportional to
the number of observations representing the respective classes in the training data.
perfm_onecateg should be used if it is merely relevant that a particular class of the classes of the
ordinal target variable is predicted as accurately as possible. This particular class must be provided
via the argument categ. Youden’s J statistic is calculated with respect to class categ ("class categ
yes" vs. "class categ no").
perfm_custom should be used if there is a particular ranking of the classes with respect to their
importance. Youden’s J statistic is calculated with respect to each class. Subsequently a weighted
average is taken using user-specified weights (provided via the argument categweights). Classes
with higher weights are to be predicted more precisely than those with smaller weights.
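The weighted-average computation can be sketched in base R (youden_classwise is a hypothetical helper written for illustration, not part of the package; in practice the perfm_* functions should be used):

```r
# Class-wise Youden's J (sensitivity + specificity - 1 for "class j yes" vs.
# "class j no") and a weighted average as in perfm_custom().
youden_classwise <- function(ytest, ytestpred) {
  sapply(levels(ytest), function(cl) {
    obs  <- ytest == cl
    pred <- ytestpred == cl
    sens <- sum(pred & obs) / sum(obs)      # sensitivity for class cl
    spec <- sum(!pred & !obs) / sum(!obs)   # specificity for class cl
    sens + spec - 1
  })
}

# Toy data with three classes:
ytest     <- factor(c("1", "1", "2", "2", "3", "3"), levels = c("1", "2", "3"))
ytestpred <- factor(c("1", "2", "2", "2", "3", "1"), levels = c("1", "2", "3"))
j <- youden_classwise(ytest, ytestpred)

w <- c(1, 2, 1)          # user-specified class weights (class "2" counts double)
weighted.mean(j, w)      # perfm_custom-style performance value
```

With equal weights this reduces to the simple average used by perfm_equal, and with weights proportional to the class frequencies to perfm_irrespective.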
Examples
data(hearth)
set.seed(123)
trainind <- sort(sample(1:nrow(hearth), size=floor(nrow(hearth)*(1/2))))
testind <- setdiff(1:nrow(hearth), trainind)
datatrain <- hearth[trainind,]
datatest <- hearth[testind,]
ordforres <- ordfor(depvar="Class", data=datatrain, ndiv=80, nbest=5)
# NOTE: ndiv=80 is not enough!! In practice, ndiv=1000 (default value) or a higher
# number should be used.
preds <- predict(ordforres, newdata=datatest)
perfm_equal(ytest=datatest$Class, ytestpred=preds$ypred)
perfm_irrespective(ytest=datatest$Class, ytestpred=preds$ypred)
perfm_onecateg(ytest=datatest$Class, ytestpred=preds$ypred, categ="1")
perfm_onecateg(ytest=datatest$Class, ytestpred=preds$ypred, categ="2")
perfm_onecateg(ytest=datatest$Class, ytestpred=preds$ypred, categ="3")
perfm_onecateg(ytest=datatest$Class, ytestpred=preds$ypred, categ="4")
perfm_onecateg(ytest=datatest$Class, ytestpred=preds$ypred, categ="5")
perfm_custom(ytest=datatest$Class, ytestpred=preds$ypred, categweights=c(1,2,1,1,1))
# perfm_equal(), perfm_irrespective(), and perfm_onecateg() are special cases of perfm_custom():
perfm_custom(ytest=datatest$Class, ytestpred=preds$ypred, categweights=c(1,1,1,1,1))
perfm_equal(ytest=datatest$Class, ytestpred=preds$ypred)
perfm_custom(ytest=datatest$Class, ytestpred=preds$ypred, categweights=table(datatest$Class))
perfm_irrespective(ytest=datatest$Class, ytestpred=preds$ypred)
perfm_custom(ytest=datatest$Class, ytestpred=preds$ypred, categweights=c(0,2,0,0,0))
perfm_onecateg(ytest=datatest$Class, ytestpred=preds$ypred, categ="2")
plot.ordfor
Visualization of optimized partition
Description
Visualization of the optimized partition of [0,1] interpretable as the relative widths of the classes of
the ordinal target variable.
Usage
## S3 method for class 'ordfor'
plot(x, useggplot = TRUE, ...)
Arguments
x
object of class ordfor. See function ordfor.
useggplot
logical. Should the R package ggplot2 be used to produce a nicer-looking plot?
...
other parameters to be passed through to plotting functions if useggplot=FALSE.
Value
If useggplot = TRUE (default), a ggplot object is returned and the plot is also shown on the
current device. See ?ggplot for saving ggplot2 plots. If useggplot = FALSE, NULL is returned.
Examples
data(hearth)
set.seed(123)
hearthsubset <- hearth[sort(sample(1:nrow(hearth), size=floor(nrow(hearth)*(1/2)))),]
ordforres <- ordfor(depvar="Class", data=hearthsubset, ndiv=80, nbest=5)
# NOTE: ndiv=80 is not enough!! In practice, ndiv=1000 (default value) or a higher
# number should be used.
plot(ordforres)
plot(ordforres, useggplot=FALSE)
predict.ordfor
Prediction using ordinal forest objects
Description
Prediction of test data using an ordinal forest.
Usage
## S3 method for class 'ordfor'
predict(object, newdata, ...)
Arguments
object
object of class ordfor. See function ordfor.
newdata
data.frame. Data frame containing new data.
...
further arguments passed to or from other methods.
Value
predict.ordfor returns an object of class ordforpred. An object of class "ordforpred" is a list
containing the following components:
ypred
vector of length nrow(newdata). Factor-valued test data predictions.
yforestpredmetric
vector of length nrow(newdata). Numeric test data predictions: Result of applying the regression forest forestfinal returned by ordfor.
Examples
data(hearth)
set.seed(123)
trainind <- sort(sample(1:nrow(hearth), size=floor(nrow(hearth)*(1/2))))
testind <- setdiff(1:nrow(hearth), trainind)
datatrain <- hearth[trainind,]
datatest <- hearth[testind,]
ordforres <- ordfor(depvar="Class", data=datatrain, ndiv=80, nbest=5)
# NOTE: ndiv=80 is not enough!! In practice, ndiv=1000 (default value) or a higher
# number should be used.
preds <- predict(ordforres, newdata=datatest)
preds
table(data.frame(true_values=datatest$Class, predictions=preds$ypred))
par(mfrow=c(1,2))
plot(preds$yforestpredmetric, as.numeric(preds$ypred))
plot(pnorm(preds$yforestpredmetric), as.numeric(preds$ypred), xlim=c(0,1))
par(mfrow=c(1,1))
Index
hearth, 3
ordfor, 2, 5, 9, 11, 12
ordinalForest (ordinalForest-package), 2
ordinalForest-package, 2
perfm, 5, 9
perfm_custom (perfm), 9
perfm_equal (perfm), 9
perfm_irrespective (perfm), 9
perfm_onecateg (perfm), 9
plot.ordfor, 2, 11
predict.ordfor, 2, 5, 8, 12