Reliability of Rating Visible Landscape Qualities
James F. Palmer
James F. Palmer is Associate Professor and Undergraduate Curriculum Director of the Landscape Architecture Program at the State University of New York's College of Environmental Science and Forestry in Syracuse, New York. He holds an M.L.A. degree and Ph.D. degree from the University of Massachusetts in Amherst. His research focuses on perceptions of visible landscape qualities. He is particularly interested in developing GIS models to predict landscape perceptions and the interaction between landscape change and changes in perceptions. He also hosts LArch-L, the electronic discussion group for landscape architecture. He may be reached by email at [email protected].
Abstract: Reliability is measured by whether an investigation will obtain similar results when it is repeated by another party. It is argued that reliability is important at the level of individuals. While all landscape assessments are based on individual judgments, they are frequently aggregated to form composite judgments. The use of inter-group, intra-group and inter-rater measures of reliability in the landscape perception literature is reviewed. This paper investigates the reliability of assessing various visible landscape qualities using data primarily from previously published studies. The results indicate that there is reason for concern about the reliability of rating scales used in this field, and suggest actions for both research and practice.
Landscape studies have joined the shifting sands of scholarly inquiry. The empirical positivist tradition continues to hold that there is an objective landscape that can be studied
through the careful application of scientific principles. Post-positivists, among whom I include
myself, stand back from the stranglehold that science recently held on legitimate knowledge.
They acknowledge other ways of knowing, but favor empirical methods of discovery. In response
to positivism, post-modern landscape scholars with diverse viewpoints are investigating new
ways to explore the personal meaning that landscapes hold for us. For instance, Norberg-Schulz
(1979) has contributed to the rising importance of genius loci, Potteiger and Purinton (1998)
are exploring landscape narratives, and Brook (1998) is adapting Goethe’s approach to scientific inquiry through direct experience. In this flurry of activity some foundational attributes of
applied everyday experience are lost or ignored. One such attribute is reliability, which is the
subject of this paper.
Reliability and the Human Condition
To illustrate the tension between our desire to capture the uniqueness of the moment and the need for stability in our lives, I call upon one of the greatest chroniclers of the human condition. Quoting from the balcony scene in Shakespeare's Romeo and Juliet, act 2, scene 2:
Romeo. Lady, by yonder blessed
moon I swear
That tips with silver all these fruit-tree tops ....
In the middle of Romeo's poetic attempt to sway Juliet's heart, she cuts him off:
Juliet. O! swear not by the moon,
the inconstant moon,
That monthly changes in her circled orb,
Lest that thy love prove likewise variable.
She will have none of it (this night at least). She makes it clear that she wants a constant and dependable love. The poor boy hopes to recover from his blunder:
Romeo. What shall I swear by?
Juliet. Do not swear at all;
Or, if thou wilt, swear by thy gracious self,
Which is the god of my idolatry,
And I’ll believe thee.
She embraces the substance of
his desire (their love); it is only the
metaphor for his method that is in
question (the unique moment).
Romeo. If my heart’s dear love ....
He has not learned, and begins
another uniquely poetic expression,
only to be cut off once more:
Juliet. Well, do not swear. Although
I joy in thee,
I have no joy of this contract tonight:
It is too rash, too unadvised, too
sudden;
Too like the lightning, which doth
cease to be
Ere one can say it lightens. Sweet,
good night!
This bud of love, by summer’s
ripening breath,
May prove a beauteous flower when
next we meet.
Good night, good night! as sweet
repose and rest
Come to thy heart as that within
my breast!
He has twice failed, and she
grows weary. However, she leaves him
with an indication of what she
seeks--reliability. What she wants is
not rash declarations, but steadfast
devotion; not unadvised promises, but
considered pronouncements; not sudden like a flash of lightning, but a
constant and unfailing partnership in
love. Juliet's response represents the
need for reliability in our most important experiences as a fundamental
condition in our lives.
Reliability in the (Post-)Positivist Paradigm
The National Environmental
Policy Act of 1969 has particular
importance for those of us who conduct landscape assessments. It is
NEPA that declared it national policy
and the "continuing responsibility of
the Federal Government to use all
practicable means to ... assure for all Americans ... aesthetically ... pleasing surroundings." In particular,
there is a responsibility to "identify
and develop methods and procedures
... which will insure that presently unquantified environmental amenities and values may be given appropriate consideration in decision making." As one example, the goals of the
Forest Service’s new Scenery Management System are to (1) inventory and
analyze scenery; (2) assist in establishing overall resource goals and
objectives; (3) monitor the scenic
resource; and (4) ensure high-quality
scenery for future generations
(USDA 1995). Reliability is the
implicit cornerstone upon which
these goals can be achieved.
Though speaking about unquantified values and amenities, the language of NEPA clearly takes a decidedly positivist tone. Nunnally (1978,
p. 191) describes the meaning and
importance of reliability to science:
Reliability concerns the extent to
which measurements are repeatable--when different persons make
the measurements, on different
occasions, with supposedly alternative instruments for measuring the
same thing and when there are
small variations in circumstances
for making measurements that are
not intended to influence results.
... Measurement reliability represents a classic issue in scientific
generalization.
The measurement of reliability
is relatively straightforward.¹ "The
average correlation among the items
can be used to obtain an accurate
estimate of reliability" for independently made assessments (Nunnally
1978, p. 227). There are even generally accepted standards for reliability
among psychometricians (Nunnally
1978, p. 245).
What a satisfactory level of reliability is depends on how a measure is
being used. In the early stages of
research ... one saves time and
energy by working with instruments
that have only modest reliability,
for which purposes reliabilities of
.70 or higher will suffice .... For
basic research, it can be argued
that increasing reliabilities much
beyond .80 is often wasteful of time
and funds .... In many applied
problems, a great deal hinges on
the exact score made by a
person .... In those applied settings
where important decisions are
made with respect to specific test
scores, a reliability of .90 is the
minimum that should be tolerated,
and a reliability of .95 should be
considered the desirable standard.
Since there are few problems
that impact us as directly as decisions
about our common landscape, the
fundamental importance of using
reliable landscape assessment methods should be readily apparent.
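To make Nunnally's point concrete, the reliability of a single rater can be estimated as the average correlation among independently made ratings of the same scenes. The sketch below (Python) is purely illustrative, using a hypothetical ratings matrix rather than any dataset discussed in this paper.

```python
import numpy as np

def mean_interrater_correlation(ratings):
    """Average Pearson correlation among all pairs of raters.

    ratings: array of shape (n_scenes, n_raters); each column holds one
    rater's scores for the same set of scenes.
    """
    corr = np.corrcoef(ratings, rowvar=False)   # rater-by-rater correlation matrix
    upper = np.triu_indices_from(corr, k=1)     # count each rater pair once
    return corr[upper].mean()

# Hypothetical example: 16 scenes rated by 5 raters on a 9-point scale.
rng = np.random.default_rng(0)
scores = rng.integers(1, 10, size=(16, 5)).astype(float)
print(round(mean_interrater_correlation(scores), 3))
```

By the standards quoted above, a value near .70 would suffice for exploratory research, while applied decisions that hinge on specific scores call for .90 or better.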
Reliability in Visual Landscape
Assessments
It is surprising that relatively
little attention is paid by most
researchers and practitioners to the
reliability of landscape assessment
methods--and there is a lot on which
to focus. For instance, how many photos are needed to reliably represent
different landscapes (Daniel et al.
1977; Hoffman 1997)? When a long
series of landscape scenes is being
evaluated, are the same standards
being reliably applied throughout the
sequence (Palmer 1998)? Are landscape evaluations stable after a year
or two (Hull and Buhyoff 1984); how
about after ten years (Palmer 1997)?
While these and other questions
of reliability are all important, this
paper focuses on the reliability of
raters. Three levels of rater reliability
are distinguished: inter-group, intra-group, and inter-rater. Inter-group
reliability compares the mean ratings
assigned by different groups. The
intent is to establish whether different groups give similar ratings. Intra-group reliability establishes the reliability of a composite or mean rating
from a particular group. Inter-rater
reliability establishes the expected
reliability for a single rater’s assessment, based on the ratings from a
group. The landscape assessment literature concerning inter-group, intra-group, and inter-rater reliability is
reviewed. The analyses of raw
datasets, primarily from previously
published studies, are presented to
illustrate the difference between
inter- and intra-group reliabilities for
scenic value ratings. Then three new
analyses are presented investigating
the reliability of three types of landscape descriptors: landscape dimensions (development, naturalism, preference, and spaciousness), informational content (coherence,
complexity, legibility, and mystery),
and compositional elements (color,
form, line, and texture). The concluding discussion makes recommendations to those in research and practice concerning the reliability of landscape assessment.
Inter-group reliability. At the Our
National Landscape conference,
Craik and Feimer (1979) made a plea
to establish technical standards for
the reliability, validity, generality, and
utility of observer-based landscape
assessments. At this time, most studies that include reliability estimates
use inter-group measures that compare the mean assessments made by
groups of students, lay public, or professionals. Interpretation of inter-group reliabilities may easily fall prey
to the ecological fallacy (Robinson
1950) of making generalizations
about individuals based on correlations among groups. Establishing
similar rating patterns between
divergent groups, for example professionals and the public, does not indicate that there is wide agreement
among individuals within or across
these groups. The following examples
illustrate the use of inter-group reliability.
In a study of Southern Connecticut River Valley landscapes,
Zube et al. (1974) reported the correlations among mean ratings for thirteen groups on eighteen scales.
Eighty-five percent of the correlations among these groups were above
.83. All of the correlations below .83
involved a single group of center-city
residents, suggesting that they "may
in fact have different perceptions of
scenic quality in the rural landscape." The need for further study is
indicated.
Buhyoff and his colleagues
(1979) compared the perception of
insect damage to forest scenes by
foresters, environmentalists, and the
public. There was an overall similarity among the groups, but ratings
were affected by awareness of the disease. Groat (1982) compared perceptions of modern and post-modern
architecture by accountants and
architects. She found that the
accountants were not sensitive to
post-modern principles.
Some of these and other studies
find evidence that there is a strong
similarity among ratings by the public, students, and professionals in
many instances. In other studies,
expert knowledge appears to change
the ratings by professionals and students, sometimes quite significantly.
However, none of these group comparisons sheds light on the reliability
of ratings by individuals, whether
they are environmental professionals
or members of the public.
Intra-group reliability. Most
reports of rater reliability are for
group-mean ratings rather than for
the average reliability for an individual rater within a group. For instance,
Gobster and Chenoweth (1989) identified thirty-four of the most commonly used landscape descriptors
representing physical, psychological
and artistic attributes. In two experiments, using groups of thirty and
twenty-two raters, they report reliabilities of .64 to .99 for group-mean
ratings.
Extensive research on forest
landscape aesthetics has developed
during the past thirty years. Much of
this research uses the scenic beauty
estimation (SBE) method (Daniel
and Boster 1976), a statistical technique that standardizes each rater’s
scores in order to control the variation in how a rater "anchors" the rating scale. The reliability of group-mean ratings is frequently reported,
but not the average reliability for
individuals (Brown et al. 1988; Brown
and Daniel 1987; Daniel et al. 1989;
Hetherington et al. 1993; Rudis et al.
1988). Typically these reliabilities are
above .90 for scenic beauty.
Reports of the intra-group reliability of ratings other than scenic
beauty or preference are less common. Herzog (1987) used ratings of
identifiability, coherence, spaciousness, complexity, mystery, texture,
and preference to characterize seventy natural mountainous, canyon,
and desert scenes. Using groups of
from thirteen to twenty-six students,
he obtained intra-group reliabilities
for mean scores that ranged from .69
for coherence to .97 for preference.
Intra-group reliabilities are
generally high, and the more raters
there are in a group, the greater will
be the group-mean reliability. The
reliability of group-means is important if they are going to be used as
variables in predictive research models or for making environmental decisions. However, they do not reflect
the reliability of individual raters.
Inter-rater reliability. In practice,
landscape assessments conducted in
the field rarely involve judgments by
more than one or perhaps two trained
professionals. Assessing slides or
other representations makes it more
convenient to use a panel of evaluators. However, it is still unusual for
more than a few professionals to
apply any of the agency evaluation
systems to a series of slides (Smardon
et al. 1988; USDA 1995; USDI 1980).
The only common occurrence of a
large group evaluating landscape
scenes is to create mean ratings for a
single attribute (e.g., scenic beauty or
preference), normally for research
purposes.
Schroeder (1984) reports the
inter-rater reliability of attractiveness or scenic beauty from ten studies, as well as one study of visual air
quality, two studies of enjoyableness,
and another of safety for urban parks.
The attractiveness studies have a
mean reliability of .530. Patsfall and
his colleagues (1984) report single-rater reliabilities of .23 for SBE ratings of vistas along the Blue Ridge
Parkway. Anderson (1976) extended
Zube’s (1974) study and found an
inter-rater reliability for individuals of
.69 for scenic value judgments of 212
scenes. He also used the Spearman-Brown prophecy formula to determine that a group of four raters
would achieve a composite reliability
for scenic value of approximately .90.
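For reference, the Spearman-Brown prophecy formula behind this kind of projection can be written (in LaTeX notation) as

$$ r_k = \frac{k\,\bar{r}}{1 + (k - 1)\,\bar{r}}, $$

where $\bar{r}$ is the mean single-rater reliability and $r_k$ is the expected reliability of a composite of $k$ raters. With $\bar{r} = .69$ and $k = 4$, this gives $2.76 / 3.07 \approx .90$, consistent with the composite reliability Anderson reports.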
Craik’s research group used
four scenes to evaluate inter-rater
reliabilities for fifteen attribute ratings, primarily those used in the
Bureau of Land Management’s visual
resource assessment procedure
(Feimer et al. 1979). Even after
extending the analysis to 19 scenes,
they found that "the reliability of ratings tends to be low for single
observers and hence it is advisable to
use composite judgments of panels of
independent observers" (Feimer et al.
1981, p. 16). In a later report on this
same work, Craik (1983, p. 72) recommends that "given present rating
systems, panels of at least five members, rendering independent judgments, are required to achieve adequate levels of composite reliability."
The basis of this recommendation
comes from the application of the
Spearman-Brown prophecy formula
to determine the composite reliability
for a group of any size based on the
mean intra-group reliability. Kopka
and Ross (1984) also evaluated the
inter-rater reliability of BLM’s "level
of influence" procedure applied to
existing scenes. The findings from
these two independent studies concerning the assessment reliability of
formalistic design qualities in the
landscape are summarized in Table 1.
They fall substantially short of acceptable levels, even for exploratory research, let alone for professional assessments.

Table 1. Inter-rater reliability for BLM's level of influence variables.

Variable | Kopka & Ross 1984 | Feimer et al. 1981
Form     | .54               | .45
Line     | .63               | .19
Color    | .25               | .13
Texture  | .53               | .41
Craik (1972; Brush 1976) suggests that a distinction be made
between evaluative appraisals and
preferential judgments. Appraisals
are based on a widely-accepted and
culturally-determined standard.
"Esthetic appraisals reflect the layman’s attempt to employ a commonly
but imperfectly understood external
standard" (Craik 1972, p. 257). In
contrast, personal judgments are
more specialized assessments from a
particular perspective and for a particular purpose. "Preferential ratings
[judgments] reflect all sorts of individual and subgroup tastes, inclinations, and dispositions" (Craik 1972,
p. 257). High inter-rater reliability
may be an indication that an attribute assessment is understood to be an
appraisal using common culturally
accepted criteria, while low reliability
may indicate a more personal judgment (Brush 1976). For instance,
Coughlin and Goldstein (1970)
obtained higher inter-rater reliabilities for attractiveness than for residential or sightseeing preference ratings of a series of scenes.
Inter-Group and Inter-Rater Reliability of
Scenic Quality
Most landscape assessment
research has sought to explain why
people like certain landscapes more
than others (Zube, Sell and Taylor
1982). The term used to describe this
liking has varied among authors.
Zube (1974) refers to scenic value,
the Kaplans (1989, 1998) use preference, and Daniel and Boster (1976)
choose scenic beauty. These terms
have been found to be essentially the
same (Zube et al. 1974). Because scenic quality, preference or beauty is so
widely measured, it is possible to use
this measure to compare the relative
size of inter- and intra-group reliabilities. It also creates a standard for
comparison to other attributes:
Methods. The inter-group and
inter-rater reliability of landscape
preference is evaluated based on data
from four previous research efforts.
Palmer and Smardon (1989) surveyed
Table 2. Study datasets used to compare inter-group and inter-rater reliability.

Location | Media | Photo year | Scenes (n) | Rating format | Participant type | Selection | Survey year | Participants (n) | Study citation
Juneau, AK | B/W offset press | '86 | 16 | Rating-9 | Residents | Random | '86 | 406 | Palmer & Smardon 1989
Juneau, AK | B/W offset press | '86 | 16 | Rating-9 | Public meeting | Random | '86 | 41 | ibid.
Dennis, MA | Color photos | '76 | 56 | Q-sort-7 | Registered voter | Random | '76 | 68 | Palmer 1983
Dennis, MA | B/W offset press | '76 | 56 | Q-sort-7 | Town list | Random | '87 | 34 | Palmer 1997
Dennis, MA | Color slides | '76 | 56 | Rating-10 | Resident | Available | '96 | 31 | unpublished
Dennis, MA | B/W offset press | '76 | 56 | Q-sort-7 | Env. Prof. | Selected | '83 | 51 | Palmer 1985
Dennis, MA | B/W offset press | '76 | 56 | Q-sort-7 | Env. Prof. | Selected | '85 | 67 | Palmer 1985
White Mtn, NH | Color web press | '92 | 64 | Rating-10 | Residents | Random | '95 | 77 | Palmer 1998
White Mtn, NH | Color web press | '92 | 64 | Rating-10 | USFS prof. | Random | '95 | 205 | ibid.
White Mtn, NH | Color web press | '92 | 64 | Rating-10 | Opinion leaders | Census | '94/5 | 97 | ibid.
US impact pairs | Color web press | n/a | 32 | Compare-100 | Austrian students | Available | '90 | 59 | Palmer et al. 1990
US impact pairs | Color web press | n/a | 32 | Compare-100 | French students | Available | '87 | 99 | ibid.
US impact pairs | Color web press | n/a | 32 | Compare-100 | German students | Available | '88/9 | 47 | ibid.
US impact pairs | Color web press | n/a | 32 | Compare-100 | Hong Kong stds. | Available | '90 | 53 | ibid.
US impact pairs | Color web press | n/a | 32 | Compare-100 | Italian students | Available | '87 | 26 | ibid.
US impact pairs | Color web press | n/a | 32 | Compare-100 | Japanese students | Available | '87 | 52 | ibid.
US impact pairs | Color web press | n/a | 32 | Compare-100 | Korean students | Available | '87 | 128 | ibid.
US impact pairs | Color web press | n/a | 32 | Compare-100 | Puerto Rican stds. | Available | '87 | 14 | ibid.
US impact pairs | Color web press | n/a | 32 | Compare-100 | Spanish students | Available | '87 | 100 | ibid.
US impact pairs | Color web press | n/a | 32 | Compare-100 | Utah students | Available | '88 | 40 | ibid.
US impact pairs | Color web press | n/a | 32 | Compare-100 | Yugoslav students | Available | '87 | 47 | ibid.
US impact pairs | Color web press | n/a | 32 | Compare-100 | Central NY stds. | Available | '87 | 59 | ibid.
Table 3. Inter-group reliability of scenic ratings for Dennis, Massachusetts.

               | Citizens '76 | Citizens '87 | Citizens '96 | Env. Prof. '83 | Env. Prof. '85
Citizens '76   | --    | 0.947 | 0.904 | 0.955 | 0.952
Citizens '87   | 0.947 | --    | 0.941 | 0.940 | 0.942
Citizens '96   | 0.904 | 0.941 | --    | 0.899 | 0.895
Env. Prof. '83 | 0.955 | 0.940 | 0.899 | --    | 0.992
Env. Prof. '85 | 0.952 | 0.942 | 0.895 | 0.992 | --
a random sample of residents and
attendees at a public workshop to
study the human-use values of wetlands in Juneau, Alaska. The survey
included sixteen photos representing
the range of local wetland types and
conditions. The second study began
as part of a community effort to
develop a comprehensive plan for
Dennis, Massachusetts. In 1976, a
random sample of registered voters
evaluated fifty-six photographs representing the town (Palmer 1983). Residents evaluated the same scenes in
1987 and 1996 (Palmer 1997). These
ratings are compared to those from
employees of the U.S. Army Corps of
Engineers which were gathered in
preparation for a training course in
landscape aesthetics (Palmer 1985).
The third study evaluated simulations of different harvesting intensities, patterns, and patch sizes of
clearcuts in the White Mountain
National Forest (Palmer 1998).
Respondents included a random sample of regional residents, opinion leaders in the management of the area's
forests, and environmental professionals stationed in National Forests
in the northeastern quarter of the
United States. The final study
involves twelve groups of college students from around the world (Palmer
et al. 1990). They evaluated sixteen
matched simulations from the northeastern and southwestern United
States portraying pre- and post-impact conditions. Citations for these
data-sets and the general characteristics of the respondents and simulation
media are summarized in Table 2.
Results. The correlation between
the mean scenic ratings of Juneau
residents and workshop attendees is
.971 for scenic ratings of sixteen
diverse wetlands. Table 3 shows the
correlations among groups evaluating
the Dennis scenes. The average correlation among the five groups is
.937. The highest correlation is .992
between the two groups of environmental professionals, and the lowest
is .895 between 1996 citizens and
1985 professionals.
In the White Mountain
clearcutting study, the inter-group
correlation of citizens with opinion
leaders is .980, and with Forest Service professionals it is .978. The correlation between opinion leaders and
Forest Service employees is .974. The
correlations among the four groups of
opinion leaders are shown in Table 4,
with an average correlation of .894.
Table 5 shows the correlations among
the seven groups of Forest Service
environmental professionals. Their
average correlation is .971.
In Table 6 are the correlations
between twelve student groups from
around the world. Even with such
diverse respondent groups, the average inter-group correlation is .804.
The lowest correlation is .496
between students from Japan and
Germany, while the highest is .969
between the Austrian and German
students. The inter-group correlations from these four studies indicate
why the quality of landscape assessments enjoys such a high reputation--most measures of reliability
meet the highest standards, and all
but a very few meet standards of
acceptability.
Table 4. Inter-group reliability among opinion leaders' scenic ratings for clearcutting alternatives in the White Mountains, New Hampshire.

                                    | Appalachian Trail Council | Forest Resources Steering Com. | North Country Council | Roundtable on Forest Law
Appalachian Trail Council           | --    | 0.934 | 0.890 | 0.881
Forest Resources Steering Committee | 0.934 | --    | 0.902 | 0.890
North Country Council               | 0.890 | 0.902 | --    | 0.868
Roundtable on Forest Law            | 0.881 | 0.890 | 0.868 | --
Table 5. Inter-group reliability among USFS employees' scenic ratings for clearcutting alternatives in the White Mountains, New Hampshire.

                      | Archaeol. | Engineer | Forester | Land. Arch. | Manager | Rec. Spec. | Wildl. Bio.
Archaeologist         | --    | 0.985 | 0.968 | 0.937 | 0.948 | 0.973 | 0.977
Engineer              | 0.985 | --    | 0.973 | 0.947 | 0.950 | 0.975 | 0.980
Forester              | 0.968 | 0.973 | --    | 0.970 | 0.984 | 0.995 | 0.989
Landscape Architect   | 0.937 | 0.947 | 0.970 | --    | 0.986 | 0.975 | 0.945
Management            | 0.948 | 0.950 | 0.984 | 0.986 | --    | 0.987 | 0.960
Recreation Specialist | 0.973 | 0.975 | 0.995 | 0.975 | 0.987 | --    | 0.984
Wildlife Biologist    | 0.977 | 0.980 | 0.989 | 0.945 | 0.960 | 0.984 | --
However, the inter-rater reliabilities in Table 7 are much less
encouraging. The average inter-rater
correlation is .307 for Juneau residents, and .355 for workshop attendees. That is approximately one-third the inter-group correlation between the two groups. The average of the
intra-group correlations for the five
Dennis study groups is .608. Again,
this is a substantial drop in reliability
from the average inter-group correlation of .937. The average inter-rater
correlation is .554 among the three
major groups in the White Mountain
study. This is down from an average
inter-group correlation of .977. The
average of the twelve inter-rater correlations from the international study
is .427, down from an average inter-group correlation of .804.
If landscape assessments are
made primarily by individuals and not
by large panels of evaluators, then
these results indicate that the reliability of scenic assessments is unlikely
to be reaching acceptable levels. The
next sections will consider the reliability of other visual qualities.
Inter-group and Inter-rater Reliability of
Landscape Dimensions
Zube (1974) defines landscape
dimensions "as physical characteristics or attributes of the landscape
which can be measured using either
normal ratio scales or psychometric
scaling." Examples of such dimensions include: percent tree or water
cover, length or area of the view, relative elevation change, and various
edge and contrast indices. These
dimensions bear a remarkable resemblance to those employed by quantitative landscape ecologists today
(Turner and Gardner 1991). Zube
investigated the relationship between
twenty-three landscape dimensions
and scenic quality. His landscape dimensions were all measured from USGS 1:24,000 topography or Massachusetts MapDown land use maps. Palmer (1996) used a similar approach to validate a GIS model of spaciousness. A regression analysis found landscape dimensions explained approximately half of the variation in Zube's perceived scenic value and Palmer's perceived spaciousness.
Shafer (1969) employed a different approach to measure landscape dimensions. He divided an eye-level photograph into foreground, middle ground and background. Then the area and perimeter of content areas were measured in each zone. Examples of content include water, trees, buildings, ground cover, and pavement. This approach to measuring landscape dimensions also accounts for approximately half the variation in visual preference.

Table 6. Inter-group reliability in a multi-national study of visual impact perceptions.

(Au = Austrian, CNY = Central NY, Fr = French, Gr = German, HK = Hong Kong, It = Italian, Ja = Japanese, Ko = Korean, PR = Puerto Rican, Sp = Spanish, Ut = Utah, Yu = Yugoslav)

             | Au    | CNY   | Fr    | Gr    | HK    | It    | Ja    | Ko    | PR    | Sp    | Ut    | Yu
Austrian     | --    | 0.958 | 0.952 | 0.969 | 0.681 | 0.863 | 0.604 | 0.723 | 0.802 | 0.934 | 0.891 | 0.906
Central NY   | 0.958 | --    | 0.928 | 0.952 | 0.709 | 0.881 | 0.641 | 0.750 | 0.813 | 0.945 | 0.914 | 0.923
French       | 0.952 | 0.928 | --    | 0.958 | 0.631 | 0.784 | 0.599 | 0.712 | 0.712 | 0.902 | 0.870 | 0.887
German       | 0.969 | 0.952 | 0.958 | --    | 0.587 | 0.813 | 0.496 | 0.661 | 0.731 | 0.880 | 0.832 | 0.853
Hong Kong    | 0.681 | 0.709 | 0.631 | 0.587 | --    | 0.800 | 0.791 | 0.704 | 0.765 | 0.790 | 0.855 | 0.821
Italian      | 0.863 | 0.881 | 0.784 | 0.813 | 0.800 | --    | 0.714 | 0.731 | 0.853 | 0.733 | 0.847 | 0.902
Japanese     | 0.604 | 0.641 | 0.599 | 0.496 | 0.791 | 0.714 | --    | 0.843 | 0.619 | 0.733 | 0.783 | 0.796
Korean       | 0.723 | 0.750 | 0.712 | 0.661 | 0.704 | 0.731 | 0.843 | --    | 0.663 | 0.821 | 0.800 | 0.844
Puerto Rican | 0.802 | 0.813 | 0.712 | 0.731 | 0.765 | 0.853 | 0.619 | 0.663 | --    | 0.782 | 0.767 | 0.855
Spanish      | 0.934 | 0.945 | 0.902 | 0.880 | 0.790 | 0.733 | 0.733 | 0.821 | 0.782 | --    | 0.963 | 0.933
Utah         | 0.891 | 0.914 | 0.870 | 0.832 | 0.855 | 0.847 | 0.783 | 0.800 | 0.767 | 0.963 | --    | 0.916
Yugoslav     | 0.906 | 0.923 | 0.887 | 0.853 | 0.821 | 0.902 | 0.796 | 0.844 | 0.855 | 0.933 | 0.916 | --
Table 7. Inter-Rater Reliability of Scenic Ratings from Four Studies.

Location | Respondents | n | Mean | 95%-ile | Median | 5%-ile
Juneau, AK | Residents | 406 | .307 | .793 | .424 | -.249
Juneau, AK | Public meeting attendees | 41 | .355 | .824 | .490 | -.400
Dennis, MA | Registered voter | 68 | .603 | .824 | .636 | .272
Dennis, MA | Town list | 34 | .563 | .837 | .606 | .076
Dennis, MA | Residents | 31 | .539 | .853 | .655 | -.198
Dennis, MA | Env. Prof. in Corps of Engineers | 51 | .635 | .818 | .674 | .315
Dennis, MA | Env. Prof. in Corps of Engineers | 67 | .701 | .971 | .807 | .089
White Mtn, NH | Citizens | 69 | .512 | .800 | .593 | -.067
White Mtn, NH | Opinion leaders | 95 | .532 | .812 | .630 | -.147
White Mtn, NH |   Appalachian Trail Council | 24 | .574 | .840 | .709 | -.645
White Mtn, NH |   Forest Res. Steering Committee | 18 | .344 | .737 | .441 | -.321
White Mtn, NH |   North Country Council | 43 | .653 | .833 | .701 | .275
White Mtn, NH |   Roundtable on Forest Law | 10 | .554 | .815 | .676 | -.095
White Mtn, NH | USFS employees | 205 | .619 | .837 | .672 | .215
White Mtn, NH |   Archaeologist | 9 | .690 | .820 | .697 | .543
White Mtn, NH |   Engineer | 13 | .537 | .814 | .598 | .122
White Mtn, NH |   Forester | 87 | .601 | .830 | .658 | .798
White Mtn, NH |   Landscape Architect | 10 | .684 | .882 | .735 | .434
White Mtn, NH |   Management | 30 | .584 | .838 | .651 | .002
White Mtn, NH |   Recreation Specialist | 20 | .687 | .848 | .706 | .457
White Mtn, NH |   Wildlife Biologist | 36 | .666 | .839 | .717 | .216
US impact pairs | Austrian students | 59 | .602 | .827 | .646 | .176
US impact pairs | French students | 29 | .451 | .774 | .461 | .090
US impact pairs | German students | 47 | .543 | .844 | .598 | .002
US impact pairs | Hong Kong students | 53 | .344 | .675 | .383 | -.154
US impact pairs | Italian students | 26 | .314 | .699 | .322 | -.147
US impact pairs | Japanese students | 52 | .347 | .704 | .390 | -.180
US impact pairs | Korean students | 128 | .248 | .606 | .251 | -.121
US impact pairs | Puerto Rican students | 14 | .273 | .685 | .364 | -.274
US impact pairs | Spanish students | 100 | .505 | .785 | .528 | .149
US impact pairs | Utah State students | 40 | .469 | .805 | .497 | .049
US impact pairs | Yugoslav students | 47 | .450 | .741 | .479 | .061
US impact pairs | SUNY ESF students | 59 | .583 | .812 | .624 | .226
The approaches developed by
Zube and Shafer use physical tools to
measure the landscape’s dimensions.
Human judgment can also be used to
estimate these measurements, for
instance the relative area of a view
covered by forest. However, when
people are used as the measuring
device, more complicated constructs
can also be measured, such as naturalism or spaciousness. It is the reliability of using people to measure
landscape dimensions that is tested in
this section.
Methods. Respondents are thirty
advanced landscape architecture or
environmental science students at
State University of New York’s College of Environmental Science and
Forestry in 1997 and 1998. They evaluated offset printed photographs of
Dennis, Massachusetts taken in 1976.
They were instructed to identify the
highest, lowest, and intermediate
quality scenes and describe the criteria for their decisions. Using these
scenes as anchor points on a seven-point scale, they sorted the remaining
scenes among the seven rating levels.
Each quality was evaluated on a different day. The four landscape dimensions were described as follows:
Naturalism refers to aspects of the
landscape that could exist without
human care. Nature is an expression of how much vegetation is in a
view, how organic are its elements
and patterns, and how uncontrolled
are the natural processes.
Development refers to aspects of
the landscape that are human creations. Development is an expression of human control over natural
processes or patterns, and the dominance of structures, such as buildings, roads, or dams.
Spaciousness is the landscape’s
enclosure or expansiveness. It
describes how much room there is
to wander in the view, or how far
you could go before you reach the
boundaries.
Preference is how much you like or
dislike a landscape. It is also called
scenic quality, attractiveness, or
beauty. People’s preferences are
descriptions of their personal experience.
Results. The inter-rater correlations reported in Table 8 show that
acceptable levels of reliability are
achieved for perceived naturalism
(.796), development (.762), and spaciousness (.715). The response patterns for naturalism and development
are nearly mirror images of each
other. The reliability of development
is brought down somewhat by one
student whose responses correlate
negatively with the other evaluators.
The reliability of the preference
ratings (.582) in Table 8 is comparable to that for scenic preference from
the other studies presented in this
paper. However, it is substantially
lower than for the three landscape
dimensions.
Table 8. Inter-rater Reliability for Perceived Landscape Dimensions in Dennis, MA.

Attribute    | n  | Mean | 95%-ile | Median | 5%-ile
Naturalism   | 30 | .796 | .916    | .826   | .466
Development  | 30 | .762 | .910    | .828   | -.046
Spaciousness | 30 | .715 | .859    | .728   | .514
Preference   | 30 | .582 | .823    | .613   | .226
Inter-group and Inter-rater Reliability of
Informational Content
The Kaplans (1982, 1989, 1998)
have proposed an elegant theory that
relates the informational content of a
scene to its preference. It is related to
Appleton’s (1975) prospect-refuge
theory of environmental preference.
They both refer to adaptive pressures
during human evolution to lend
authority to their theories. Their
hypothesis is that humans have evolved to seek and understand information in a particular type of landscape, namely the savanna (Kaplan and Kaplan 1982, p. 75-77). As such, the preference for visual conditions that enhance the acquisition of information in savanna-like landscapes is in the genes, so to speak.
The Kaplans posit a framework that characterizes information from two perspectives. First, information contributes to either understanding or exploration. "Understanding refers to the desire people have to make sense of their world, to comprehend what goes on around them. Understanding provides a sense of security. ... People want to explore, to expand their horizons and find out what lies ahead. They seek more information and look for new challenges" (Kaplan, Kaplan & Ryan 1998, p. 10). Second, visual information is presented in two forms: two-dimensional and three-dimensional. Two-dimensional information "involves the direct perception of the elements in the scene in terms of their number, grouping, and placement. ... When viewing scenes, people not only infer a third dimension, but imagine themselves in the scene. ... involve the inference of what being in the pictured space would entail" (Kaplan et al. 1998, p. 13).
Four information concepts are
derived from this framework. Two-dimensional understanding is called
coherence, and two-dimensional
exploration is complexity. Three-dimensional understanding is legibility, and three-dimensional exploration is mystery. The Kaplans call
this two-by-two classification grid the
Preference Matrix. It is thought that
people prefer landscapes that are
coherent, complex, legible, and mysterious.
It is suggested that the landscape dimensions are descriptions of physical condition. However, the four Preference Matrix variables describe our experience and interpretation of information in the landscape. In a sense they are more removed from an objective physical condition and closer to a psychological outcome.
Methods. The respondents, photographs, and procedures are the same as those used for the landscape dimensions ratings. The students were given a reading assignment (Kaplan and Kaplan 1989, p. 49-58) in preparation for a thirty-minute lecture about the Kaplans' information framework. As before, students rated the attributes on different days. The four informational attributes were described in the instructions as follows:
Coherence is the landscape's orderliness or confusion. It describes how well a view "hangs together," or how easy it is to understand what you see. It is enhanced by anything that helps organize the patterns of light, size, texture, or other elements into a few major units.
Complexity is the landscape's intricacy or simplicity. It describes how much is going on in a view;
how many elements of different
kinds it contains. It is the promise
of further information, if only there
were more time to look at it from
the present vantage point.
Legibility is whether a landscape
is memorable or indistinguishable.
It describes how easy it would be to
find one’s way around the view; how
easy it would be to figure out where
one is at any given moment or to
find one’s way back to any given
place.
Mystery in the landscape is the
result of incomplete perception. It
describes the extent to which further information is promised to the
observer if she were to walk deeper
into the scene. This is not a promise of surprise, but of information
that has continuity with what is
already available.
These definitions are based on
descriptions given by the Kaplans and
their graduate students (Kaplan and
Kaplan 1989; Herzog 1989; Herzog,
Kaplan and Kaplan 1982).
Results. Table 9 lists the reliabilities obtained for the informational
attributes. These reliabilities are
highest for the exploratory variables
complexity (.315) and mystery (.262).
The three-dimensional variables, legibility (.214) and mystery (.262), are higher than the two-dimensional variables. The lowest reliability was found for the two-dimensional understanding variable, coherence (.186). However, all of these intra-group reliabilities are unacceptably low.

Table 9. Inter-rater reliability for informational attributes of the Dennis, MA landscape.

Attribute  | n  | Mean | 95%-ile | Median | 5%-ile
Coherence  | 28 | .186 | .518    | .227   | -.205
Complexity | 28 | .315 | .727    | .350   | -.358
Legibility | 28 | .214 | .630    | .224   | -.246
Mystery    | 28 | .262 | .581    | .278   | -.081
Inter-Group and Inter-Rater Reliability of
Compositional Elements
A common approach to assessing visual impacts involves evaluating
the amount of change in the scene’s
visual composition. Litton (1968;
USDA 1973) initiated what became the common use of form, line, color, and texture as the attributes most commonly used to describe landscape character and change. In particular, procedures have been developed that evaluate changes in contrast associated with form, line, color, and texture, as well as scale and spatial dominance (USDI 1980; Smardon et al. 1988).
More recent manuals developed in
Great Britain build on the work of
Dame Sylvia Crowe and demonstrate
how a more complete palette of aesthetic factors can be used to describe
and evaluate landscapes (Lucas 1990;
Bell 1993).
Methods. Respondents were
twenty-five professionals in a two-day
visual assessment continuing education course taught in early-December
1997 in Albany, New York. The individual contrast ratings are the maximum contrast from the land/water,
vegetation, or structures components
of the landscape. The ratings involve
two visual principles, contrast and
dominance. Visual elements are the
source of visual contrast in the landscape, creating the patterns that we
see. An object may differ from its setting or other objects in one or more
element. When there is significant
contrast in one or more of the elements, one object may dominate
other parts of the landscape. The contrast or dominance of the following
six visual elements are evaluated:
Color is the major visual property
of surfaces attributed to reflected
light of a particular intensity and
wavelength. Described by its hue
(tint or wavelength), value (light or darkness), and chroma (saturation or brilliance). Lighter, warmer, brighter colors tend to "advance", while darker, cooler, duller colors tend to "retreat" in a scene. Dark next to light tends to attract the eye and this contrast becomes a visual focal point.
Form is the mass or shape of an object or objects which appear unified. Forms are described by their geometry, complexity, and orientation in the landscape. Forms that are bold, regular, solid, or vertical tend to be dominant in the landscape.
Line is a path, real or imagined,
that the eye follows when perceiving abrupt differences in color or
texture, or when objects are
aligned in a one-dimensional
sequence, described by its boldness,
complexity, and orientation. It is
usually evident as the edge of forms
in the landscape. Bold vertical lines
which interrupt the skyline tend to
dominate weak horizontal lines.
Texture is small forms or color
mixtures aggregated into a continuous surface pattern. The aggregation is sufficient that the parts do
not appear as discrete objects in
the composition of the scene. Textures are described by their grain,
density, regularity, and internal
contrast. Coarse and high-contrast
textures tend to dominate fine-grained textures of low internal
contrast.
Scale is the relative size of an
object in relation to its surrounding
landscape. The scale may be in
relation to the landscape setting as
a whole, the proportion of field-of-view, or other distinct objects.
Large, heavy, massive objects
within a confined space dominate
small, light, delicate objects in
more expansive settings.
Space is the three-dimensional
arrangement of objects and voids.
Compositions are described as
panoramic, enclosed, feature, focal,
or canopied. Position of objects or
view in the landscape is relative to
topography. Backdrop is the sky,
water, or land background against
which objects are seen. Objects
which occupy vulnerable positions
within spatial compositions, which
are high in the landscape, and/or
which are seen against the sky dominate in the scene. The sum of the
contrast and dominance ratings is
used to create an index of visual
impact severity.
The evaluation forms are
adapted from those prepared by
Smardon and his colleagues (1981)
for the Bureau of Land Management.
These judgments were made for five
sets of pre- and post-impact slide
pairs. The impacts evaluated were
eleven wind turbines installed in Vermont as seen from 1.25 and 4.0 miles,
a forest view 15 percent of which is
harvested in four or five acre
clearcuts and another in twenty-five
to thirty acre clearcuts in the White
Mountains, and a new on-ramp to a
limited access highway near Binghamton, New York.
Results. The reliability of the
compositional attributes is shown in
Table 10. The average inter-rater
reliability of scenic value ratings for
the nine slides was .539, which is
comparable, though slightly lower
than the scenic value ratings reported
above. There is a range of reliability
for the contrast ratings. Unacceptably low contrast reliabilities are
found for scale (.280) and line (.423).
The contrast reliabilities for color
(.503), and form (.563) are minimal. Only texture contrast (.620) has a minimally acceptable reliability. These results are comparable to those from the studies summarized in Table 1. The reliability of scale dominance (.561) and spatial dominance (.376) is also low.
The sum of these five contrast ratings forms a contrast index that is more reliable (.630) than any of its individual components. The reliability of the index of visual impact severity is even more reliable (.664). Reliability of an index is normally higher than the reliability of its components (Nunnally 1978). However, these levels still fall well below professionally acceptable levels.

Table 10. Inter-rater Reliability of Landscape Composition Attributes.

Rating                 | n  | Mean | 95%-ile | Median | 5%-ile
Scenic Value           | 25 | .539 | .912    | .592   | -.005
Contrast (Sum)         | 25 | .630 | .950    | .681   | .193
Color Contrast         | 25 | .503 | .952    | .583   | -.272
Form Contrast          | 25 | .563 | .923    | .664   | -.144
Line Contrast          | 25 | .423 | .922    | .497   | -.392
Scale Contrast         | 25 | .280 | .885    | .395   | -.608
Texture Contrast       | 25 | .620 | .948    | .716   | -.154
Scale Dominance        | 25 | .561 | .923    | .620   | .000
Spatial Dominance      | 25 | .376 | .896    | .463   | -.453
Visual Impact Severity | 25 | .664 | .960    | .742   | .006
Discussion
In general, landscape attributes
that have a more denotative character seem to have greater inter-rater
reliability than those with a more
connotative character. Denotative
attributes provide clear designation
or referential meaning. These are
generally agreed upon "objective"
facts. They are particularly appropriate for evaluative appraisals. Connotative attributes provide emotive or
metaphorical meaning. They include
any suggestion or implication beyond
denotative meaning. This more personal attribution suggests that connotative meaning is largely a matter of
preferential judgments. Naturalism,
development, and spaciousness seem
to be examples of denotative attrib-
utes. Landscape preference and the compositional elements form, line, color, and texture may be in a gray area between connotative and denotative meaning. The reliability of coherence, complexity, mystery, and legibility is sufficiently low to suggest that they have highly connotative meaning.
The relatively poor reliability of the information variables may contribute to the discussion of nature versus nurture, whether there is a genetically based or human evolutionary significance in their meaning. Perhaps their relevance to our everyday survival has changed as our cultural and environmental conditions have changed. Another possibility is that static photographs are inadequate to effectively trigger our information-seeking instincts. Perhaps more attention needs to be paid to how movement through the landscape influences both landscape preference and information-seeking behavior.
At least with regard to scenic
quality, these results also indicate
that evaluations from landscape professionals are more reliable than
from public individuals. Table 11
compares the average individual (i.e.,
inter-rater) reliability for professional and public respondents to the
Dennis, Massachusetts and White
Mountains studies. This is one aspect
of Carlson’s (1977) argument that
landscape evaluations should be left
to specially trained environmental
professionals. However, the difference
is not so great as to nullify the usefulness of evaluations by random samples of the public, nor can it be used to justify evaluations by only one or two professionals.

Table 11. Inter-rater reliability of scenic ratings by professionals and the public.

              | Dennis, MA | White Mtns.
Professionals | .668       | .619
Public        | .568       | .512
There are several possible reasons that occur to the author for the
generally poor results reported here.
One possibility is that photographs
provide insufficient information or that different people fill in missing information differently. This is a question of the validity of photographic representations. Many studies have reported that photographs appear to be valid representations, but recent work by Hoffman (1997; Hoffman and Palmer 1994) suggests that there are serious validity concerns under some circumstances. Another possible explanation is that the instructions describing the landscape attributes to be evaluated were not understood or were inadequate in other ways. For instance, only written or oral instructions are typically given to respondents. It seems reasonable that some form of visual instructions may be necessary to provide more reliable evaluations of visible attributes. Finally, low reliabilities may be an indication of low salience. Some of these landscape attributes may be fuzzy concepts or constructs that need further development to reach professional standards of reliable assessment.
For some time, the practice of
landscape assessment has been dominated by methods that use rating
scales or checklists. The mixed findings reported here suggest that it
may be appropriate to investigate the
reliability of other landscape assessment methods. Those who used rating scales did so because they sought
reliable results; they never made
claims to uncover deep meaning. In
contrast, other researchers chose to
develop qualitative methods to search
for deeper meaning and to gain a
richer understanding from the landscape. Perhaps it is time to also investigate and refine the reliability of
these methods. They may be just as
reliable as some of the rating scales!
Conclusions
The results presented here give reason for some concern in the way landscape assessments are conducted in both research and practice. Landscape assessments are most commonly made by a single professional. Published evaluations of the reliability of a single evaluator (i.e., intra-group reliability) for various landscape qualities never meet professional standards (i.e., greater than .9) and normally fall below minimally acceptable levels (i.e., .7). Table 12 shows the number of evaluators needed for each of the landscape qualities considered in this paper. The size of these evaluation panels is determined by applying the Spearman-Brown Prophecy Formula (Nunnally 1978; Feimer et al. 1979; Anderson et al. 1976).
Several recommendations seem appropriate given these findings. Research involving landscape assessments by the public or professionals must include: (1) field validations of photographic representations when possible; (2) validation of the evaluation instructions; and (3) use of photographs or images to help explain the landscape attributes being evaluated. The professional application of landscape assessments must include: (1) multiple trained evaluators; (2) a reliability assessment of the evaluations; and (3) field validation of photographic representations when possible.

Table 12. The number of assessors needed to obtain minimally and professionally reliable ratings of landscape attributes.

Attribute         | Tested reliability | Assessors to reach .9 reliability | Assessors to reach .7 reliability
Scenic value      | .582 | 7  | 2
Naturalism        | .796 | 3  | 1
Development       | .762 | 3  | 1
Spaciousness      | .715 | 4  | 1
Complexity        | .315 | 20 | 6
Mystery           | .262 | 25 | 7
Legibility        | .214 | 33 | 9
Coherence         | .186 | 40 | 11
Color Contrast    | .503 | 9  | 3
Form Contrast     | .563 | 7  | 2
Line Contrast     | .423 | 13 | 4
Scale Contrast    | .280 | 24 | 6
Texture Contrast  | .620 | 6  | 2
Scale Dominance   | .561 | 7  | 2
Spatial Dominance | .376 | 15 | 4
Acknowledgments
Previously unpublished data used for this
research were funded by the North Central
Forest Research Station, Chicago, Illinois.
Slides for the landscape compositional assessment were provided by Vermont Environmental Research Associates, Waterbury Center,
Vermont and Integrated Site, Syracuse, New
York.
Note
1. The intraclass correlation (ICC) is an alternative approach to estimating reliability that
normally gives roughly comparable values
(Ebel 1951; Jones et al. 1983). The ICC is calculated using variance components from an
ANOVA table. There is an extensive literature
on the appropriate variance components to
include in an ICC (e.g., Shrout & Fleiss 1979).
Burry-Stock et al. (1996) considers ICC calculations "beyond the statistical expertise of the
average observer using observation or performance data" in their research.
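For readers who wish to try the ICC alternative, the sketch below (Python) computes one common form, the two-way random effects single-rater coefficient that Shrout and Fleiss (1979) label ICC(2,1), directly from ANOVA mean squares. The ratings matrix is hypothetical and the code is illustrative rather than a reproduction of any analysis in this paper.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: array of shape (n_targets, k_raters).
    """
    n, k = ratings.shape
    grand = ratings.mean()
    ms_rows = k * ((ratings.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # between targets
    ms_cols = n * ((ratings.mean(axis=0) - grand) ** 2).sum() / (k - 1)  # between raters
    ss_err = ((ratings - grand) ** 2).sum() - ms_rows * (n - 1) - ms_cols * (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))                                # residual
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Hypothetical example: 10 scenes rated by 4 raters.
rng = np.random.default_rng(1)
scenes = rng.normal(5, 2, size=(10, 1)) + rng.normal(0, 1, size=(10, 4))
print(round(icc_2_1(scenes), 3))
```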
References
Anderson, T. W., E. H. Zube and W. P. MacConnell. 1976. "Predicting Scenic Resource Values." In Studies in Landscape
Perception. Amherst: Institute for Man
and Environment, University of Massachusetts.
Appleton, J. 1975. The Experience of Landscape.
London: Wiley.
--. 1984. "Prospects and Refuges Revisited." Landscape Journal 3(2): 91-103.
Bell, S. 1993. Elements of Visual Design in the
Landscape. London: E & FN Spon.
Brook, I. 1998. "Goethean Science as a Way to
Read Landscape." Landscape Research
23(1): 51-69.
Brown, T. C., and T. C. Daniel. 1987. "Context
Effects in Perceived Environmental
Quality Assessment: Scene Selection
and Landscape Ratings." Journal of Environmental Psychology 7(3): 233-250.
Brown, T. C., T. C. Daniel, M. T. Richards and
D. A. King. 1988. "Recreation Participation and the Validity of Photo-based
Preference Judgements." Journal of
Leisure Research 20(4): 40-60.
Brush, R. O. 1976. "Perceived Quality of Scenic
and Recreational Environments." In Perceiving Environmental Quality: Research and
Applications. New York: Plenum Press.
Buhyoff, G. J., W. A. Leuschner and J. D. Wellman. 1979. "Aesthetic Impacts of Southern Pine Beetle Damage." Journal of
Environmental Management 8(3): 261-267.
Burry-Stock, J. A., D. G. Shaw, C. Laurie and
B. S. Chissom. 1996. "Rater Agreement
Indexes for Performance Assessment."
Educational and Psychological Measurement
56(2): 251-262.
Carlson, A. A. 1977. "On the possibility of
quantifying scenic beauty." Landscape
Planning 4(2): 131-172.
Coughlin, R. E. and K. A. Goldstein. 1970.
"The Extent of Agreement Among
Observers on Environmental Attractiveness." Discussion Paper 37. Philadelphia: Regional Science Research Institute.
Craik, K. H. 1972. "Psychological Factors in
Landscape Appraisal." Environment and
Behavior 4(3): 255-266.
. 1983. "The psychology of large scale
environments. In Environmental Psychology: Directions and Perspectives. New York:
Praeger Publications.
-. and N. R. Feimer. 1979. "Setting Technical Standards for Visual Assessment
Procedures." In Our National Landscape.
General Technical Report. PSW-35.
Berkeley, CA: USDA Forest Service,
Pacific Southwest Forest and Range
Experiment Station.
Daniel, T. C., L. M. Anderson, H. W. Schroeder
and L. Wheeler III. 1977. "Mapping the
Scenic Beauty of Forest Landscapes."
Leisure Sciences 1 (1): 335-52.
., and R. S. Boster. 1976. "Measuring
Landscape Esthetics: the Scenic Beauty
Estimation Method." Research Paper
RM-167. Fort Collins, CO: USDA Forest
Service, Rocky Mountain Forest and
Range Experiment Station.
., T. C. Brown, D. A. King, M. T.
Richards, and W. P. Stewart. 1989. "Perceived Scenic Beauty and Contingent
Valuation of Forest Campgrounds." Forest Science, 35(1): 76-90.
Ebel, R. L. 1951. "Estimation of the reliability
of ratings." Psychometrica 16 (4): 407-424.
Feimer, N. R., K. H. Craik, R. C. Smardon, and
S. R. J. Sheppard. 1979. "Evaluating the Effectiveness of Observer Based Visual Resource and Impact Assessment
Methods." In Our National Landscape.
General Technical Report. PSW-35.
Berkeley, CA.: USDA Forest Service,
Pacific Southwest Forest and Range
Experiment Station.
, R. C. Smardon, and K. H. Craik. 1981.
"Evaluating the effectiveness of
observer based visual resource and
impact assessment methods." Landscape
Research 6(1): 12-16.
Gobster, P. H., and R. E. Chenoweth. 1989.
"The dimensions of aesthetic preference: a quantitative analysis."Journal of
Environmental Management 29 (1): 47-72.
Great Literature: Personal Library Series. 1992.
(CD-ROM) Parsippany, NJ: Bureau
Development, Inc.
Groat, L. 1982. "Meaning in post-modern
architecture: an examination using the
multiple sorting task." Journal of Environmental Psychology 2(1): 3-22.
Guilford, J. P. 1954. Psychometric Methods. New
York: McGraw-Hill.
Herzog, T. R. 1987. "A cognitive analysis of
preference for natural environments:
mountains, canyons, and deserts." Landscape Journal 6(2): 140-152.
¯ 1989. "A cognitive analysis of preference for urban nature."Journal of EnvironmentalPsychology 9(1): 27-43.
, S. Kaplan and R. Kaplan 1982. "The
prediction of preference for unfamiliar
urban places." Population and Environment
5(1): 43-59.
Hetherington, J., T. C. Daniel and T. C. Brown.
1993. "Is motion more important than it
sounds?: the medium of presentation in
environmental perception research."
Journal of Environmental Psychology 13 (4):
283-291.
Hoffman, R. E. 1997. Testing the validity and
reliability of slides as representations of
northern hardwood forest conditions.
(Ph.D. dissertation). SUNY College of
Environmental Science and Forestry.
., and J. F. Palmer. 1994. "Validity of
using photographs to represent visible
qualities of forest environments." In
History and Culture: Proceedings of the Council of Educators in Landscape Architecture
1994 Conference. Washington, D. C.:
Landscape Architecture
Foundation/Council of Educators in
Landscape Architecture.
Hull, R. B., IV and G. J. Buhyoff. 1984. "Individual and group reliability of landscape assessments." Landscape Planning 11(1):
67-71.
Jones, A. P., L. A. Johnson, M. C. Butler and
D. S. Main. 1983. "Apples and oranges:
an empirical comparison of commonly
used indices of interrater agreement."
Academy of Management Journal 26(3):
507-519.
Kaplan, R., and S. Kaplan. 1989. The Experience of
Nature: A Psychological Perspective. New
York: Cambridge University Press.
., S. Kaplan and R. L. Ryan. 1998. With
People in Mind. Washington, D. C: Island
Press.
Kaplan, S., and R. Kaplan. 1982. Cognition and
Environment. New York: Cambridge University Press.
Kopka, S., and M. Ross. 1984. "A study of the
reliability of the Bureau of Land Management visual resource assessment
scheme." Landscape Planning 11 (2):
161-166.
Litton, R. B., Jr. 1968. "Forest landscape
description and inventories--a basis for
land planning and design." Research
Paper PSW-49. Berkeley, CA: USDA
Forest Service, Pacific Southwest Forest
and Range Experiment Station.
Lucas, O. W. R. 1991. The Design of Forest Landscapes. New York: Oxford University
Press.
Norberg-Schulz, C. 1979. Genius Loci: Towards
a Phenomenology of Architecture. New York:
Rizzoli International Publications.
Nunnally, J. C. 1978. Psychometric Theory. New
York: McGraw-Hill.
Palmer, J. F. 1983. "Assessment of coastal wetlands in Dennis, Massachusetts." In The
Future of Wetlands: Assessing Visual-Cultural Values of Wetlands. Montclair, New
Jersey: Allanheld, Osmun Co.
¯ 1985. "The perception of landscape
visual quality by environmental professionals and local citizens." Syracuse:
Faculty of Landscape Architecture,
SUNY College of Environmental Science and Forestry.
--. 1996. Modeling Spaciousness in the Dutch
Landscape. Report 119. Wageningen, The
Netherlands: Agricultural Research
Department, Winand Staring Centre.
--. 1997. "Stability of landscape perceptions in the face of landscape change."
Landscape and Urban Planning 37(1/2):
109-113.
--. 1998. Clearcutting in the White Mountains:
Perceptions of Citizens, Opinion Leaders and
U.S. Forest Service Employees. [NYCFRD
98-01] Syracuse, NY: New York Center
for Forestry Research and Development, SUNY College of Environmental
Science and Forestry.
--., S. Alonso, K. Dong-hee, J. Gury, Y.
Hernandez, R. Ohno, G. Oneto, A.
Pogacnik, and R. Smardon. 1990. "A
multi-national study assessing perceived
visual impacts." Impact Assessment Bulletin
8(4): 31-48.
--, and R.C. Smardon. 1989. "Measuring
human values associated with wetlands." In Intractable Conflicts and their
Transformation. Syracuse, NY: Syracuse
University Press.
Patsfall, M. R., N. R. Feimer, G. J. Buhyoff, and J. D. Wellman. 1984. "The prediction of scenic beauty from landscape content and composition." Journal of Environmental Psychology 4(1): 7-26.
Potteiger, M. and J. Purinton. 1998. Landscape Narratives. New York: John Wiley &
Sons.
Robinson, W. S. 1950. "Ecological correlations
and the behavior of individuals." American Sociological Review 15(3): 351-357.
Rudis, V. A., J. H. Gramann, E. J. Ruddell, and
J. M. Westphal. 1988. "Forest inventory
and management-based visual preference models of southern pine stands."
Forest Science 34(4): 846-863.
Schroeder, H.W. 1984. "Environmental perception rating scales: a case for simple
methods of analysis." Environment and
Behavior 16(5): 573-598.
Shafer, E. L., J. E. Hamilton and E. A. Schmidt.
1969. "Natural landscape preferences: a
predictive model." Journal of Leisure
Research 1(1): 1-19.
Shakespeare, W. 1992. Romeo and Juliet. In Great
Literature: Personal Library Series. 1992.
(CD-ROM) Parsippany, NJ: Bureau
Development, Inc.
Shrout, P. E. and J. L. Fleiss. 1979. "Intraclass
correlations: uses in assessing rater reliability." Psychological Bulletin 86 (2):
420-428.
Smardon, R. C., S. R. J. Sheppard and S. Newman. 1981. Prototype Visual Impact Assessment Manual. Syracuse: SUNY College of
Environmental Science and Forestry.
--, J. F. Palmer, A. Knopf, K. Grinde, J. E.
Henderson and L. D. Peyman-Dove.
1988. Visual Resources Assessment Procedure
for US Army Corps of Engineers. Instruction
Report EL-88-1. Vicksburg, Mississippi:
U.S. Army Engineer Waterways Experiment Station.
Turner, M. G., and R. H. Gardner (eds.). 1991.
Quantitative methods in landscape ecology: the
analysis and interpretation of landscape heterogeneity. New York: Springer-Verlag.
U.S. Department of Agriculture, Forest Service. 1973. National forest management. Vol.
1. Agriculture Handbook No. 434.
Washington, D. C.: U.S. Government
Printing Office.
--. 1995. Landscape Aesthetics: A Handbook for
Scenery Management. Agriculture Handbook No. 701. Washington, D. C.: USDA
Forest Service.
U.S. Department of Interior, Bureau of Land
Management. 1980. Visual Resource
Management Program. Washington, D. C.:
U.S. Government Printing Office.
Zube, E. H., D. G. Pitt and T. W. Anderson.
1974. Perception and Measurement of Scenic
Resources in the Southern Connecticut River
Valley. Pub. No. R-74-1. Amherst: Institute for Man and His Environment,
University of Massachusetts.
, J. L. Sell and J. G. Taylor. 1982. "Landscape perception, research, application
and theory." Landscape Planning 9(1):
1-35.