Part 6.
Milton & Wiseman's meta-analysis (1999) and Bem, Palmer &
Broughton's meta-analysis (2001)
Following on from Bem and Honorton's 1994 paper "Does Psi Exist?
Replicable Evidence for an Anomalous Process of Information
Transfer", which described the PRL successes, Julie Milton
(University of Edinburgh) and Richard Wiseman (University of
Hertfordshire) set out to see whether other ganzfeld experiments
conducted over the same period had replicated those results.
The inclusion criteria were that the ganzfeld experiment should
have begun in 1987 or later and have been published before
February 1997, the cut-off point for inclusion in the survey.
The reason for setting the earliest point at 1987 was as follows:
the PRL experiments were ostensibly based on "The Joint
Communique" from Hyman and Honorton (which described the flaws
that future ganzfeld experiments should avoid), and since that
paper was published in 1986, Milton and Wiseman allowed a year for
its advice to be adhered to in the parapsychological experiments
of the time.
This, of course, does not mean that every experiment which allowed
for the findings of Hyman and Honorton will be included, nor does
it exclude those experiments that ignored their findings but fell
within the time-frame. Milton and Wiseman answer the question "Why
not just include experiments that followed PRL's protocol?" with
this statement:
Milton, Wiseman, “Does Psi Exist? Lack of Replication of an Anomalous Process of
Information Transfer”, Psychological Bulletin, 1999
“As with the earlier studies before them, the 11 [PRL] autoganzfeld studies
varied considerably in procedure, so it would not have been possible to restrict
the meta-analysis to examining exact replication attempts of the autoganzfeld
work, nor is it known which, if any, procedures that might have been common to
all of the autoganzfeld studies might have been crucial to success, ruling out
the possibility of seeking a database of studies that replicated the
autoganzfeld studies in their essentials. We therefore decided in advance to
follow Honorton's approach to ganzfeld meta-analysis of both the early studies
and his own autoganzfeld studies (Honorton et al., 1990; Hyman, 1985) of
including in our database all psi studies that used the ganzfeld technique.”
The results in the original meta-analysis were as follows:

Study                               Laboratory     Year     n  Hits  Hit rate
Bierman, series 3                   Amsterdam      1995    40    12     30.0%
Bierman, series 4a only             Amsterdam      1995    36    13     36.1%
Bierman, Utrecht 1                  Amsterdam      1993    50    13     26.0%
Bierman, Utrecht 2                  Amsterdam      1993    50    12     24.0%
Broughton, Alexander                Rhine/Durham   1997    50    12     24.0%
Broughton, Alexander                Rhine/Durham   1997    50     9     18.0%
Broughton, Alexander                Rhine/Durham   1997    51    19     37.3%
Broughton, Alexander                Rhine/Durham   1997    50    11     22.0%
Broughton, Alexander                Rhine/Durham   1997     8     3     37.5%
Dalton                              Koestler       1994    29    12     41.4%
Johansson and Parker, series 2      Gothenburg     1997    30    11     36.7%
Johansson and Parker, series 3      Gothenburg     1997    30    11     36.7%
Johansson and Parker, series 1      Gothenburg     1997    30     6     20.0%
Kanthamani, Broughton, series 3     Rhine/Durham   1994    40     8     20.0%
Kanthamani et al, series 5a         Rhine/Durham   1994     4     2     50.0%
Kanthamani et al, series 5b         Rhine/Durham   1994    10     1     10.0%
Kanthamani et al, series 6a         Rhine/Durham   1994    20     5     25.0%
Kanthamani et al, series 6b         Rhine/Durham   1994    40    12     30.0%
Kanthamani, Broughton, series 4     Rhine/Durham   1994    65    24     36.9%
Kanthamani, Broughton               Rhine/Durham   1994    46    12     26.1%
Kanthamani, Broughton               Rhine/Durham   1994    50    13     26.0%
Kanthamani & Palmer                 Rhine/Durham   1993    22     2      9.1%
McDonough et al                     Koestler       1994    20     6     30.0%
Morris et al                        Koestler       1993    32    13     40.6%
Morris et al                        Koestler       1993    32     8     25.0%
Morris et al                        Koestler       1995    97    32     33.0%
Stanford, Frank                                    1991    58    11     19.0%
Williams, Roe, Upchurch, Lawrence                  1994    42     5     11.9%
Willin                                             1996    16     4     25.0%
Willin                                             1996   100    24     24.0%
Total                                                    1198   326     27.2%
With an overall hit rate of 27.2% and a mean effect size (z/√n) of
0.013, the authors concluded that the PRL autoganzfeld findings
had not been replicated by other laboratories.
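These headline figures can be checked against the totals in the table above (1198 trials, 326 hits, 25% chance expectation). A minimal Python sketch; note that the paper's 0.013 is the mean of the per-study z/√n values, whereas the pooled calculation below gives a single overall figure:

```python
import math

# Totals from Milton & Wiseman's database (four-choice design, 25% chance)
trials, hits = 1198, 326
p_chance = 0.25

hit_rate = hits / trials  # -> 0.2721, the reported 27.2%

# Normal approximation to the binomial for the pooled data
expected = trials * p_chance
sd = math.sqrt(trials * p_chance * (1 - p_chance))
z = (hits - expected) / sd

# Milton & Wiseman's per-study effect size measure is z / sqrt(n);
# applied to the pooled totals it gives a single overall figure
es = z / math.sqrt(trials)

print(round(hit_rate, 4), round(z, 2), round(es, 3))
```

The pooled z of about 1.77 falls short of the conventional 1.645 one-tailed threshold only when taken two-tailed, which is part of why the interpretation of this database was argued over so fiercely.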
Along with the overall hit rate, the paper examined other
conditions previously considered psi-conducive: dynamic vs.
static; mental disciplines; previous paranormal experiences.
These results were presented in the paper as:
Variable and studies                          r or phi      N      z   Stouffer z
Dynamic vs static targets                                                   -0.67
  Broughton and Alexander, 1996                  -0.09    151  -0.95
  Morris et al 1993 (study 1)                     0.00     32   0.00
Mental discipline (novices only)                                             2.24
  Bierman et al (1993)                            0.17     91   1.34
  Broughton & Alexander (1996)                   -0.02    151  -0.07
  Kanthamani & Broughton (1994)                   0.08    182   0.87
  Morris et al (1993) study 1                     0.44     32   2.33
Previous psychic experiences (novices only)                                  0.59
  Broughton and Alexander (1996)                  0.02    151   0.04
  Kanthamani & Broughton (1994)                   0.06    182   0.60
  Morris et al (1993) study 1                     0.07     32   0.38
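A Stouffer z combines the individual study z-scores by summing them and dividing by the square root of the number of studies. A quick Python check against the values above:

```python
import math

def stouffer_z(zs):
    """Combine independent study z-scores: sum(z) / sqrt(number of studies)."""
    return sum(zs) / math.sqrt(len(zs))

# Study z-scores from the moderator table above
print(round(stouffer_z([-0.95, 0.00]), 2))              # dynamic vs static: -0.67
print(round(stouffer_z([1.34, -0.07, 0.87, 2.33]), 2))  # mental discipline
print(round(stouffer_z([0.04, 0.60, 0.38]), 2))         # previous experiences: 0.59
```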
This meta-analysis soon came in for criticism. The first
objection, raised in an electronic mail debate amongst
parapsychologists, was that the meta-analysis should have excluded
the three most extreme results as "outliers"; i.e., results which
could be due to a freak of chance and whose absence would give a
truer picture of the effect.
Anonymous [Radin?], “Should Ganzfeld Research Continue To Be Crucial In The
Search For A Replicable Psi Effect? Part II”, Journal of Parapsychology, 1999
“Three studies are significantly negative (those labeled Kanthamani & Broughton,
1994, Series 5b; Kanthamani & Palmer, 1993; Williams et al., 1994). When these
three studies are removed, the remaining 27 studies are now homogeneous
([[chi].sup.2] = 32.4, p = 0.35), and the resulting Stouffer z of these 27
studies is z = 1.99, p = .02 (one-tail). Thus, upon removing three outlier
studies from this meta-analysis, the overall result is a statistically
significant replication.”
It was argued that trimming only the three most negative results
was unsystematic, and that both the most negative and the most
positive should be trimmed. In reply it was said that the
direction of a result was unimportant: only how far it deviated
from chance was of concern. It was then demonstrated that trimming
the three most positive results also left the meta-analysis
sufficiently homogeneous.
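The effect of one-sided trimming is easy to demonstrate with Stouffer's method. The sketch below uses hypothetical study z-scores (not the actual database) to show why removing only the most negative results inflates the combined z, while removing the most positive ones deflates it:

```python
import math

def stouffer_z(zs):
    # Combined z for independent studies: sum(z) / sqrt(k)
    return sum(zs) / math.sqrt(len(zs))

# Hypothetical study z-scores, for illustration only (not the actual database)
zs = [2.1, 1.8, 1.5, 1.2, 0.9, 0.4, 0.1, -0.2, -0.6, -2.0, -2.1, -2.3]

full = stouffer_z(zs)                       # near zero: no overall effect
no_negatives = stouffer_z(sorted(zs)[3:])   # trim 3 most negative: z rises
no_positives = stouffer_z(sorted(zs)[:-3])  # trim 3 most positive: z falls

print(round(full, 2), round(no_negatives, 2), round(no_positives, 2))
```

The same dozen studies can thus be made to look like a significant replication or a significant failure, depending solely on which tail is trimmed, which is exactly the point at issue in the debate.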
Schecter, source unknown, quoted on Bierman's website
“Radin's analysis indicates that the distribution of ganzfeld study results in
Milton & Wiseman's analysis is heterogeneous and that the overall nonsignificant
Z depends largely on the three most negative of the thirty results. The
implications for interpretation are quite different than they would be if the
nonsignificant Z reflected a fairly even balance of positive and negative
results.
Similarly, removing the three most positive results instead of the three most
negative also leaves a homogeneous distribution.
When the results are listed in order of size, it's easy to see that the highest
10 Z-scores are all greater than +1.0, and 9 of the 10 associated effect sizes
are greater than +.20, while only the 4 most negative Z-scores are more extreme
than -1.0, and only the 3 most negative effect sizes are more extreme than -.20.
I.e., looking at the distribution's shape helps flesh out what the summary
statistics -- the overall Z and the chi-square heterogeneity measure -- tell
us.”
No clear decision was ever reached on this subject, but after Bem,
Palmer and Broughton's 2001 paper (see below) updated the post-PRL
database to include experiments up to 1999, the extreme results
lay in the positive direction, and the "outlier" argument has not
been raised again.
An argument that stuck much more firmly was that Milton and
Wiseman’s inclusion criteria did not properly specify which
experiments really followed the same procedures as the PRL trials
that the meta-analysis was supposed to reproduce.
If the aim truly was to see whether other laboratories had
replicated PRL's results with a protocol based on Hyman and
Honorton's Joint Communiqué, then why was there no attempt to
include only those experiments that most closely adhered to that
protocol? As mentioned before, Milton and Wiseman commented that
there was no single experimental protocol across the PRL trials,
and so no opportunity to evaluate an exact replication.
Nevertheless, it was argued quite persuasively that by including
certain experiments that deviated so widely from the PRL norm, any
sense of "replication" had been lost. In the ganzfeld debate
published in the Journal of Parapsychology, two parapsychologists
put forward the case against Willin’s work using musical targets.
Anonymous, “Should Ganzfeld Research Continue To Be Crucial In The Search For A
Replicable Psi Effect? Part II”, Journal of Parapsychology, 1999
“[the inclusion criteria] seem at best arbitrary. Milton will exclude Symmons
and Morris because they used drumming. On the same basis, the large study of
Willin -- which used music targets, is a radical departure from standard
ganzfeld -- should be excluded.”
(The above-mentioned Symmons and Morris experiment, "Drumming at
seven Hz and automated ganzfeld performance", is briefly discussed
in part 5.)
We should bear in mind that Willin himself began his paper by
admitting that the history of psi experiments offered little
evidence to suggest that music could be communicated by telepathy.
Melvyn Willin, “A ganzfeld experiment using musical targets”, Journal of the
Society of Psychical Research, 1996
"Experiments using actual music as the target have not been conducted very
often. Brief reports by R. Shulman appeared in the Journal of Parapsychology in
1938 (Shulman, 1938), and some by R. W. George in the Parapsychology Bulletin in
1948 (George, 1948). The ganzfeld procedure was not used and only simple tunes
played on various instruments were employed. This precluded the possibility of
an emotional response from the sender. The results were at chance level.
H. J. Keil conducted tests at Duke University—reported in 1965—using music as
the target (Keil, 1965). However, as before, the ganzfeld procedure was not
used, and the music was chosen by the subjects themselves. The purpose of the
experiment was to see whether the order of music being listened to could be
ascertained by a receiver. The results were positive but this was because one
particular subject scored very highly."
However, in 1981, Roney-Dougal had completed a successful ganzfeld
experiment using audio tapes of the spoken word, so the idea that
an audio stimulus can influence a sender had some experimental
basis. But with little previous evidence on which to base a theory
of music in the ganzfeld, Willin's work could at least be
considered exploratory.
Other experiments were also called into question. Richard
Broughton pointed out that Kanthamani's work with dreams used a
non-standard judging technique and that the ganzfeld state was
interrupted (although the paper "Experiment in Ganzfeld and
Dreams: A Confirmatory Study", Kanthamani, Khilji, 1988, makes no
mention of this procedure: it says the subject came out of the
ganzfeld after the session had ended). Parker said his own "series
1" experiment should not be considered, since auditory monitoring
(in which the sender is able to hear the receiver's mentation) is
standard, and series 1 lacked this aspect.
The dates set for inclusion in the Milton/Wiseman database have
also been criticised as arbitrary. This usually refers to the
final deadline, which fell months before the publication of a
particularly successful large-scale ganzfeld experiment (Dalton,
1997), but it could equally apply to the date set for the earliest
experiments.
To recap: experiments had to have begun in 1987 or later, in order
for the recommendations of Hyman and Honorton's Joint Communique
to have become standard practice (Milton & Wiseman felt a year was
a suitable period of time). Effectively this loses all ganzfeld
experiments published before 1991, and so misses many papers
which, judging by their level of detail, were written with Hyman
and Honorton's conclusions in mind. As an indicator of how many
experiments may have been left out even though they followed the
new guidelines, in the spring of 1987 Bierman published "A test on
Possible Implications of the Observational Theories for Ganzfeld
Research". Despite appearing barely one year after the Joint
Communique was released, it clearly states "All recommendations by
Hyman and Honorton (JoP 50-1, 1987, in Press) are taken into
account." If papers were being published in 1987 with knowledge of
Hyman and Honorton's work, then Milton & Wiseman's meta-analysis
could be missing four years' worth of data. Further papers from
the late eighties include extensive details on exactly those areas
in which Hyman & Honorton expressed an interest, namely the
randomisation process and the statistical analyses, indicating a
certain adherence to the guidelines.
With regard to the final deadline, certain parapsychologists
detected an unseemly haste on the part of Milton & Wiseman in
their rush to be published. Writing some years later, Nancy
Zingrone maintained that the meta-analysis seemed to be "conducted
and published more from self-interest than from a sincere wish to
test the hypothesis at hand" (Journal of Parapsychology, 66,
2002). She criticized the speed with which Milton & Wiseman
submitted their paper for publication, and Adrian Parker made a
similar point.
Nancy Zingrone, correspondence to the editor, Journal of Parapsychology, 2002
“It is usual in the parapsychological community for people to "try out" papers
that will eventually be published by presenting them at the annual
Parapsychological Association conventions. An extra layer of pre-publication
protection from errors of fact or method is provided to authors first by the
convention refereeing process and, second, by the experience of presenting at
the convention and fielding questions and criticisms both on the convention
floor and in informal encounters. It seemed to me to be odd at the time that
Milton and Wiseman chose to submit their convention version to Psychological
Bulletin after it had been accepted for the Proceedings of Presented Papers but
before the actual presentation at the convention. That is, they submitted "Does
Psi Exist?" to the Psychological Bulletin slightly more than six weeks prior to
the PA Convention. The submission was received by Psychological Bulletin on June
23rd, 1997 (Milton & Wiseman, 1999, p.391), and the convention took place from
August 7th - 10th, 1997.”
Adrian Parker, “Parapsychology: the good, the bad, the ugly”, April 2001
“Apparently the Milton-Wiseman paper was accepted [to the journal “Psychological
Bulletin”] on condition that it was to be accompanied by a reply. It was finally
decided this would be from Professor Morris. Wiseman has given varying
accounts of what then happened during the delay caused by Professor Morris's
subsequent period of illness but one account is that the authors pressed for
publication. In any event, the outcome was a publication without a reply from
Professor Morris. Worse, Milton and Wiseman were by then in possession of
unpublished data which should have tempered their conclusions.”
It is indeed unfortunate that Milton and Wiseman didn't pause
before submitting their paper for publication, since it was at
this convention that Kathy Dalton presented her highly successful
results with creative receivers in the ganzfeld. This experiment
alone would have been enough to push the overall hit rate of the
meta-analysis from 27% to 29%.
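That arithmetic can be checked directly. Bem, Palmer and Broughton's table gives Dalton (1997) as 128 trials at a 46.9% hit rate, i.e. 60 hits; adding these to Milton and Wiseman's totals of 326 hits in 1198 trials:

```python
# Milton & Wiseman totals, plus Dalton (1997): 128 trials at 46.9% = 60 hits
hits, trials = 326, 1198
dalton_hits, dalton_trials = 60, 128

before = 100 * hits / trials
after = 100 * (hits + dalton_hits) / (trials + dalton_trials)

print(round(before, 1), round(after, 1))  # 27.2 -> 29.1
```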
In light of these new results, certain parapsychologists decided
to redo the Milton and Wiseman meta-analysis, but this time making
an attempt to weigh the scores according to how much they adhered
to a typical standard set-up.
Broughton, source not known, cited on Bierman's website, dated 20th July 1999
"I think somewhere we should reply with simply a blocking analysis of the M&W
meta-analysis blocking on "standard" vs. "non-standard" ganzfeld procedure. I
don’t know how many other studies might fall out on that basis, but at least on
first glance, the non-standard ganzfeld studies seem to have the edge on poor
results."
The first issue for this new analysis was to order the experiments
according to "standardness" in an unbiased manner. The second was
to extend the deadline of the meta-analysis to include more
experiments.
As to the ordering, the parapsychologists were very particular
about finding unbiased judges and then delivering the scientific
papers to them in an unbiased way. It had long been established
that scientific referees can allow their pre-existing ideas about
science to colour the way they assess other people's work.
Robert J. MacCoun, “Biases in the interpretation and use of research results”,
Annual Review of Psychology, 1998
“Mahoney (1977) conducted the earliest rigorous demonstration of biased evidence
processing using the experimental approach. Behavioral modification experts
evaluated one of five randomly assigned versions of a research manuscript on the
“effects of extrinsic reward on intrinsic motivation,” a hypothesis in potential
conflict with the experts’ own paradigm. The five versions described an
identical methodology but varied with respect to the study’s results and
discussion section. Mahoney found that the methodology and findings were
evaluated more favorably, and were more likely to be accepted for publication,
when they supported the experts’ views.”
So Bem, Palmer and Broughton decided to remove any experimental
details that may sway the judges in their assessment.
Bem, Palmer, Broughton, “Updating the Ganzfeld Database: A Victim of Its Own
Success?”, Journal of Parapsychology, 2001
“The method sections for the 40 studies to be rated were first edited to
eliminate all article titles, authors, hypotheses, references to results of
other experiments in the sample, and descriptions of psychological tests (except
those given during the ganzfeld or used for subject selection). The edited
method sections were then photocopied and assembled into judging packets.”
These edited papers were submitted to three judges (advanced
social psychology students chosen by Bem from his university, who
had no knowledge of the ganzfeld database beyond what they had
heard in Bem's lectures about PRL). The judges were also given two
previously published descriptions of the typical ganzfeld
procedure: the "Ganzfeld Procedure" section of Bem and Honorton's
1994 paper "Does Psi Exist?", and the more detailed description of
the PRL set-up in Honorton et al, "Psi communication in the
ganzfeld", Journal of Parapsychology, 1990.
Bem, Palmer, Broughton, "Updating the Ganzfeld Database: A Victim of Its Own
Success?", Journal of Parapsychology 65, 2001
"As hypothesized, the degree to which a replication adheres to the standard
ganzfeld protocol is positively and significantly correlated with ES, rs(38) =
.31, p = .024, one-tailed.
This same outcome can be observed by defining as standard the 29 studies whose
ratings fell above the midpoint of the scale (4) and defining as non-standard
the 9 studies that fell below the midpoint (2 studies fell at the midpoint): The
standard studies obtain an overall hit rate of 31.2%, ES = .096, Stouffer Z =
3.49, p = .0002, one-tailed. In contrast, the non-standard studies obtain an
overall hit rate of only 24.0%, ES = -.10, Stouffer Z = -1.30, ns. The
difference between the standard and non-standard studies is itself significant,
U = 190.5, p = .020, one-tailed. Most importantly, the mean effect size of the
standard studies falls within the 95% confidence intervals of both the 39
pre-autoganzfeld studies and the 10 autoganzfeld studies summarized by Bem and
Honorton (1994). In other words, ganzfeld studies that adhere to the standard
ganzfeld protocol continue to replicate with effect sizes comparable to those of
previous studies."
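The key statistic in that passage, rs(38) = .31, is a Spearman rank correlation between the judges' standardness ratings and the study effect sizes. The stdlib-only sketch below uses hypothetical ratings and effect sizes for illustration, and for simplicity ignores tied ranks (the real ratings contain ties, which require averaged ranks):

```python
def spearman_rho(x, y):
    """Spearman rank correlation; assumes no tied values for simplicity."""
    def rank(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for position, i in enumerate(order):
            r[i] = position + 1
        return r
    rx, ry = rank(x), rank(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical standardness ratings and effect sizes, for illustration only
standardness = [7.0, 6.6, 6.3, 5.7, 5.3, 4.6, 3.7, 3.3, 2.7, 1.3]
effect_size = [0.25, 0.23, 0.17, -0.16, 0.21, 0.38, 0.08, -0.09, -0.65, -0.03]

print(round(spearman_rho(standardness, effect_size), 2))  # positive correlation
```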
Bem, Palmer and Broughton's meta-analysis, listed according to standardness:

Study                                                      Trials  z score     ES  Hit rate %  Standardness
Bierman et al. (1993) (Series I)                               50     0.03   0.00        26.0          7.00
Bierman et al. (1993) (Series II)                              50    -0.30  -0.04        24.0          7.00
Broughton & Alexander (1997) (First Timers Series 1) a         50    -0.30  -0.04        24.0          7.00
Broughton & Alexander (1997) (First Timers Series 2) a         50    -1.33  -0.19        18.0          7.00
Broughton & Alexander (1997) (Emotionally Close Series) a      51     1.81   0.25        37.3          7.00
Dalton (1994)                                                  29     1.76   0.33        41.4          7.00
*Dalton (1997)                                                128     5.20   0.46        46.9          7.00
Morris et al. (1993) (Cunningham Study)                        32     1.78   0.31        40.6          7.00
*Alexander & Broughton (1999)                                  50     1.60   0.23        36.0          6.67
Broughton & Alexander (1997) (Clairvoyance Series) a           50    -0.64  -0.09        22.0          6.67
Broughton & Alexander (1997) (General Series) a                 8     0.46   0.16        37.5          6.67
Kanthamani & Broughton (1994) (Series 3)                       40    -0.91  -0.14        20.0          6.67
Kanthamani & Broughton (1994) (Series 4)                       65     2.01   0.25        36.9          6.67
Parker et al. (1997) (Study 2) b                               30     1.25   0.23        36.7          6.67
Parker et al. (1997) (Study 3) b                               30     1.25   0.23        36.7          6.67
*Parker & Westerlund (1998) (Study 4)                          30     2.40   0.44        46.7          6.67
*Parker & Westerlund (1998) (Study 5)                          30     1.25   0.23        36.7          6.67
Kanthamani & Palmer (1993)                                     22    -2.17  -0.46         9.1          6.33
Morris et al. (1995)                                           97     1.67   0.17        33.0          6.33
Kanthamani & Broughton (1994) (Series 8)                       50     0.03   0.00        26.0          6.00
Morris et al. (1993) (McAlpine Study)                          32    -0.17  -0.03        25.0          6.00
Stanford & Frank (1991)                                        58    -1.24  -0.16      19.0 d          5.67
Kanthamani & Broughton (1994) (Series 7)                       46     0.03   0.00        26.1          5.33
McDonough et al. (1994)                                        20     1.02   0.23        30.0          5.33
Parker et al. (1997) (Study 1) b                               30    -0.83  -0.15        20.0          5.33
Williams et al. (1994)                                         42    -2.30  -0.35        11.9          5.33
*Wezelman et al. (1997)                                        32     2.15   0.38        43.8          4.67
Bierman (1995) (Series III)                                    40     1.94   0.31        40.0          4.33
Bierman (1995) (Series IV)                                     36     1.33   0.22        36.1          4.33
*Symmons & Morris (1997)                                       51     2.97   0.42        45.1          4.00
*Wezelman & Bierman (1997) (Series IV)                         32    -1.45  -0.26        15.6          4.00
Kanthamani & Khilji (1992) (Series 6b) c                       40     0.52   0.08      30.0 d          3.67
Kanthamani & Broughton (1992) (Series 6a) c                    20    -0.46  -0.10      25.0 d          3.33
*Parker & Westerlund (1998) (Serial Study)                     30    -0.49  -0.09      23.0 d          3.33
*Wezelman & Bierman (1997) (Series V)                          40    -0.91  -0.14        20.0          3.00
*Wezelman & Bierman (1997) (Series VI)                         40    -0.15  -0.02        25.0          3.00
Kanthamani et al. (1988) (Series 5a) c                          4     0.22   0.11        50.0          2.67
Kanthamani et al. (1988) (Series 5b) c                         10    -2.06  -0.65      10.0 d          2.67
Willin (1996a)                                                100    -0.33  -0.03        24.0          1.33
Willin (1996b)                                                 16    -0.24  -0.06        25.0          1.33

Note. *Asterisks denote studies added to Milton and Wiseman (1999).
a Cited as Broughton and Alexander (1996) in Milton and Wiseman (1999).
b Cited as Johansson and Parker (1995) in Milton and Wiseman (1999).
c Series summarized and numbered in Kanthamani and Broughton (1994).
d Hit rate not reported. Estimated from z score.
But the "standardness" criterion is open to one criticism: it was
used only this once in the ganzfeld debate, and no attempt has
been made to see whether the same effect occurs with the earlier
experiments. As the authors themselves say:
Bem, Palmer, Broughton, "Updating the Ganzfeld Database: A Victim of Its Own
Success?", Journal of Parapsychology 65, 2001
"It is true, of course, that the pre-autoganzfeld studies were themselves
methodologically diverse and may have included some studies that would have been
rated as non-standard by our raters. If such studies were to be excluded from
the pre-autoganzfeld database, it is conceivable that the new replications would
not fall inside the pre-autoganzfeld confidence limits. This possibility can
only be assessed by a separate standardness analysis of the pre-autoganzfeld
database."
Despite this clear discrepancy between how the experiments of
1974-82 and of 1991-99 were treated, the parapsychological
literature continues to treat the two as if they were replications
of each other. The "standardness" analysis, to my mind, looks like
a second roll of the dice: an attempt to squeeze some positive
conclusions from negative data. One should also remember that
when, in 1986, Hyman and Honorton recommended a new standard way
to conduct and report ganzfeld experiments, some parapsychologists
had grave doubts that this would help matters, saying that novelty
had an important role to play in psi-conducive experiments.
Stanford, "Commentary on the Hyman-Honorton Joint Communique", Journal of
Parapsychology 50, 1986
"I am made quite uneasy by the readiness of Hyman and Honorton to make a cause
celebre out of the issue of anomalous communication during ganzfeld by proposing
a large-scale systematic replication series under the auspices of the National
Science Foundation (NSF). Such a suggestion is incredibly premature and could
prove both wasteful and hurtful. My own considered judgment is that success with
the ganzfeld-ESP paradigm depends very heavily on a number of variables that are
implicit in much of the work done, but not explicit in the written reports.
Indeed, those variables might be difficult or impossible to identify or
verbalize at present, and they may represent a complex combination of factors.
Such factors might include subject population differences, objective
laboratory-specific circumstances, and differences in the treatment of subjects,
especially in aspects of social interaction (Stanford, 1985). The reasons for
success or failure with the ganzfeld-ESP paradigm simply have not been
pinpointed, meta-analysis notwithstanding, and there is need for much systematic
research here."
Furthermore, Bem, Palmer and Broughton were helped greatly by the
release of new data from Dalton (1997) and Parker et al (1999),
and also by the lack of data from the period 1987-91: this period,
despite showing adherence to Hyman and Honorton's work, is pretty
bleak in terms of positive results. By my reckoning, including all
experiments from 1987-1999 gives 28.7% (not including the PRL
autoganzfeld) or 29.3% (including the PRL autoganzfeld).