CART and MARS
As an example, take the STAT 100 students' data (N = 136), and try to predict BMI
from Height and Weight. The exact formula is
BMI = 703 × pounds / (inches)²,
so the interest is in seeing how well the methods can approximate this nonlinear function. To
start, using regular linear regression and quadratic regression, we have the following:
Model       df  RSS/N     CV Error
Linear      3   0.2461    0.2858
Quadratic   6   0.001887  0.007548
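As a reference point, here is a sketch of how the exact formula and these two fits can be reproduced (assuming survey98 stores Height in inches and Weight in pounds; the exact quadratic terms used for the table are a guess at the full second-order model):

# Sanity check of the exact formula (units assumed: pounds, inches):
max(abs(703 * survey98$Weight / survey98$Height^2 - survey98$BMI))
# The two regression fits summarized in the table:
bmilin <- lm(BMI ~ Height + Weight, data = survey98)
bmiquad <- lm(BMI ~ Height + Weight + I(Height^2) + I(Weight^2) +
                I(Height * Weight), data = survey98)
sum(residuals(bmilin)^2) / 136    # RSS/N, linear
sum(residuals(bmiquad)^2) / 136   # RSS/N, quadratic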
Regression Trees
The function we'll use, also called tree, is in the package tree. The data set is
survey98. The following will find a tree, doing both the growing and the pruning:
library(tree)
bmitree <- tree(BMI ~ Height+Weight,data=survey98)
bmitree
node), split, n, deviance, yval
      * denotes terminal node

 1) root 136 1515.000 22.39
   2) Weight < 166 117 595.000 21.45
     4) Weight < 130.5 61 103.400 20.05
       8) Weight < 106.5 7 3.108 18.33 *
       9) Weight > 106.5 54 76.830 20.27
        18) Height < 67.5 48 45.650 20.53 *
        19) Height > 67.5 6 4.083 18.27 *
     5) Weight > 130.5 56 240.800 22.98
      10) Height < 65.5 14 62.550 25.16
        20) Height < 63.5 5 23.630 26.91 *
        21) Height > 63.5 9 14.960 24.18 *
      11) Height > 65.5 42 90.170 22.26
        22) Weight < 142.5 15 16.810 20.99 *
        23) Weight > 142.5 27 35.620 22.97 *
   3) Weight > 166 19 192.000 28.13
     6) Height < 68.5 6 54.100 30.57 *
     7) Height > 68.5 13 85.640 27.00
      14) Weight < 187.5 6 2.395 24.99 *
      15) Weight > 187.5 7 38.050 28.73 *
To get the residual sum of squares:
summary(bmitree)
Regression tree:
tree(formula = BMI ~ Height + Weight, data = survey98)
Number of terminal nodes: 10
Residual mean deviance: 1.892 = 238.4 / 126
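If you want to prune explicitly, one possibility (a sketch, not from the notes) is cost-complexity pruning, with the subtree size chosen by cross-validation using cv.tree and prune.tree from the same package:

set.seed(1)                      # the cross-validation folds are random
bmicv <- cv.tree(bmitree)        # CV deviance for each subtree size
bmitree.pruned <- prune.tree(bmitree, best = bmicv$size[which.min(bmicv$dev)])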
You can plot the tree:
plot(bmitree)
text(bmitree)
[Figure: the plotted tree, with each split labeled (Weight < 166 at the root) and the fitted BMI value at each leaf]
Or, with only two variables, a map is easy to look at:
[Figure: the partition of the (Height, Weight) plane induced by the tree, with the fitted BMI value printed in each rectangle]
Model       df   RSS/N     CV Error
Linear      3    0.2461    0.2858
Quadratic   6    0.001887  0.007548
Tree        10?  1.7529    3.0657
Or looking at the actual function and its estimate:
[Figures: perspective plots of the true BMI function and of the tree estimate]
MARS
The MARS function is in the package mda. It uses the (x, y) form for the arguments,
rather than the formula (y ~ x) form. That is one of those things about R and S-PLUS that
can drive you crazy. We'll start with degree=1, which means there are no Height×Weight
terms, i.e., it is an additive model.
library(mda)
bmimars1 <- mars(survey98[,c("Height","Weight")],survey98[,"BMI"],degree=1)
sum(bmimars1$res^2)
[1] 9.895714
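The fitted object records which basis functions survived the pruning pass; a sketch of inspecting them (component names as documented for mda's mars object):

bmimars1$selected.terms                    # indices of the retained basis functions
bmimars1$cuts[bmimars1$selected.terms,]    # knot location of each hinge
bmimars1$factor[bmimars1$selected.terms,]  # hinge direction (+1/-1; 0 = variable unused)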
[Figure: perspective plot of the additive (degree=1) MARS fit]
Now degree=2 will allow Height×Weight terms. Higher degrees allow multiplying more
variables together, but since we only have 2 variables, upping the degree past 2 doesn't do
anything (a quick check of this follows below).
bmimars2 <- mars(survey98[,c("Height","Weight")],survey98[,"BMI"],degree=2)
sum(bmimars2$res^2)
[1] 0.9107158
[Figure: perspective plot of the degree=2 MARS fit]
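To verify that higher degrees change nothing here (a quick check, not in the original notes):

bmimars3 <- mars(survey98[,c("Height","Weight")],survey98[,"BMI"],degree=3)
sum(bmimars3$res^2)   # same as the degree=2 residual sum of squares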
The errors:
Model          df          RSS/N     CV Error
Linear         3           0.2461    0.2858
Quadratic      6           0.001887  0.007548
Tree           10? 20? 30? 1.7529    3.0657
MARS degree=1  22          0.07276   0.1452
MARS degree=2  33          0.006696  0.1009
There are 8 basis functions chosen for the degree=1 fit, and each gets penalized an extra
"2", except for the constant term, so df = 8 + 2 × 7 = 22. For the degree=2 fit, there
are 9 basis functions, and the penalty is 3, hence df = 9 + 3 × 8 = 33.
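A sketch of that bookkeeping in code (assuming selected.terms includes the constant term):

nterms1 <- length(bmimars1$selected.terms)   # 8 basis functions for degree=1
nterms2 <- length(bmimars2$selected.terms)   # 9 basis functions for degree=2
nterms1 + 2 * (nterms1 - 1)                  # 8 + 2*7 = 22
nterms2 + 3 * (nterms2 - 1)                  # 9 + 3*8 = 33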
The regular quadratic model did best, but this problem does have a single global functional
form, so it should do well. It is impressive how well MARS did. MARS would be expected to
do much better than a polynomial if the relationship between x and y were different for
different regions of x.
Iris. Trying the classification trees on the iris data:
iristree <- tree(Species~.,data=iris)
iristree
node), split, n, deviance, yval, (yprob)
      * denotes terminal node

 1) root 150 329.600 setosa ( 0.33333 0.33333 0.33333 )
   2) Petal.Length < 2.45 50 0.000 setosa ( 1.00000 0.00000 0.00000 ) *
   3) Petal.Length > 2.45 100 138.600 versicolor ( 0.00000 0.50000 0.50000 )
     6) Petal.Width < 1.75 54 33.320 versicolor ( 0.00000 0.90741 0.09259 )
      12) Petal.Length < 4.95 48 9.721 versicolor ( 0.00000 0.97917 0.02083 )
        24) Sepal.Length < 5.15 5 5.004 versicolor ( 0.00000 0.80000 0.20000 ) *
        25) Sepal.Length > 5.15 43 0.000 versicolor ( 0.00000 1.00000 0.00000 ) *
      13) Petal.Length > 4.95 6 7.638 virginica ( 0.00000 0.33333 0.66667 ) *
     7) Petal.Width > 1.75 46 9.635 virginica ( 0.00000 0.02174 0.97826 )
      14) Petal.Length < 4.95 6 5.407 virginica ( 0.00000 0.16667 0.83333 ) *
      15) Petal.Length > 4.95 40 0.000 virginica ( 0.00000 0.00000 1.00000 ) *
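As a quick check of the fit (not in the original notes), the confusion matrix of the in-sample predictions:

table(predict(iristree, type = "class"), iris$Species)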
plot(iristree)
text(iristree)
[Figure: the plotted classification tree, with splits labeled and the predicted species at each leaf]
Note that there are two sets of leaves that have the same species. That is ok, but it looks
funny. You can snip those off using snip.tree. You have to look at the numbers to decide
where to snip: at nodes 12 and 7, i.e., the nodes just above where the same-species leaf
pairs are.
iristree2 <- snip.tree(iristree,c(7,12))
plot(iristree2)
text(iristree2)
[Figure: the snipped tree, with splits Petal.Length < 2.45, Petal.Width < 1.75, and Petal.Length < 4.95, and leaves setosa, versicolor, virginica, virginica]
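Since the snipped nodes had leaves predicting the same species, the predicted classes are unchanged; a quick check (not in the original notes):

all(predict(iristree2, type="class") == predict(iristree, type="class"))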
That’s better. To see the map:
plot(iris[,3],iris[,4],col=rep(c("red","blue","green"),c(50,50,50)),
     xlab="Petal Length",ylab="Petal Width")
abline(v=2.45)
segments(2.45,1.75,2.45,9)
segments(2.45,1.75,9,1.75)
segments(4.95,1.75,4.95,0)
[Figure: the iris data plotted as Petal Length vs. Petal Width, colored by species, with the tree's partition drawn in]