Contribution of Prof. R.A. Fisher to Statistics

VIDYA SAGAR (Roll No.: 1875)
Department of Statistics
5 May 2017
While writing this dissertation, a few people became an inseparable part of
it. All the people I thank below were my strength during its writing.
Prof. Dr. Rameshwar Nath Mishra (Head, Department of Statistics, Patna
University) and Prof. Dr. Srikant Singh (Department of Statistics, Patna
University) gave me the golden opportunity to work on this wonderful
dissertation on the topic ‘Some Contributions of Prof. Ronald Aylmer Fisher
to Statistics’, which also led me to do a great deal of research and come to
know many new things. Prof. Amendra Mishra, Prof. Anchala Kumari and
other professors have helped and guided me from the first day of my college
life and made me able to write this dissertation.
The founders of Google, Larry Page and Sergey Brin, whose search engine
provided me with so much information and so many photographs of R.A. Fisher.
My parents, who have supported me since birth, and my friends, who stood by
me in all the phases I went through while thinking about, writing and improving
this dissertation, and who helped me greatly in finalising it within the
limited time frame.
-----------------------------------------------------------------------------------------------------
Signature
CONTENTS
1. Introduction
2. Beginnings
3. Academic Career
4. R.A. Fisher’s Timeline and Scientific Achievements
5. Contributions
6. Tobacco, Lung Cancer and Fisher’s Biggest Mistake
7. Books
8. Some Personal Details and the End
Introduction
Lived 1890 – 1962.
Sir Ronald Fisher F.R.S. (1890–1962) was one of the leading scientists of the
20th century, and he laid the foundations of modern statistics. As a statistician
working at the Rothamsted Experimental Station, the oldest agricultural
research institute in the United Kingdom, he also made major contributions to
evolutionary biology and genetics. The concept of randomization and the
analysis of variance procedures that he introduced are now used throughout the
world. In 1922 he gave a new definition of statistics.
In addition to being probably the greatest statistician ever, he also invented
experimental design and was one of the principal founders of population
genetics. He unified the disconnected concepts of natural selection and
Mendel’s rules of inheritance. In quantitative biology the importance of his
book Statistical Methods for Research Workers has been likened to that of Isaac
Newton’s Principia in physics.
According to geneticist and author Richard Dawkins, Fisher was the greatest
biologist since Charles Darwin.
Beginnings
Ronald Aylmer Fisher was born into a wealthy family in London, England,
UK on February 17, 1890. He was the second born of twins; his elder twin
was still-born.
His father, George Fisher, was an enormously successful fine arts dealer,
who ran an auction company ranking in importance with Sotheby’s or
Christie’s. His mother, Katie Heath, was a lawyer’s daughter.
Ronald’s parents could afford the best private schooling for him, but his life
of abundance was temporary. His mother died of peritonitis when he was
14, and, when he was 15, his father’s business folded. The family moved
from a luxurious mansion in one of the richest parts of London –
Hampstead – to a small house in one of the poorer parts – Streatham.
Ronald continued to be educated at Harrow School; not because his father
could afford the very high fees, but because Ronald was a brilliant student
and was awarded scholarships. One of his masters later commented that, of
all the students he had taught, Ronald was uniquely brilliant.
In addition to his family’s ill-fortunes, Ronald was hampered by a personal
disability – his appalling short-sightedness. His eyesight was so bad that he
was not allowed to read under electric light because it strained his eyes too
much. This particular cloud, however, seems to have had a silver lining,
shifting his perspective on mathematics. He learned to visualize problems
in his mind’s eye and solve them in his head rather than on paper.
Academic Career of Prof. R.A. Fisher
Fisher viewed himself as a scientist, especially interested in biology.
Despite this, he did not enjoy learning the intricacies and names of
biological structures. He decided to study mathematics, believing it was
through mathematics he could make the greatest contributions to biology.
In 1909, at the age of 19, he won a scholarship to the University of
Cambridge. Three years later he graduated with first class honors in
mathematics.
Although he was clearly a brilliant mathematician, his tutors were dubious
about his future. They were worried that in mathematics he tended to ‘see’ the
correct answer and write it down, rather than go through the usual
processes of calculation and proof.
After graduating, Fisher spent a further year at Cambridge studying
postgraduate level physics, including the theory of errors, a topic which
heightened his interest in statistics.
In 1911, still an undergraduate, Fisher formed Cambridge University’s
Eugenics Society, which attracted a number of prominent members.
Charles Darwin’s son Leonard lectured to the society in 1912. He and
Fisher became firm, lifelong friends.
It was Fisher’s interest in eugenics that first prompted him to look at the
genetics of a population, leading him to found – along with J. B. S. Haldane
and Sewall Wright – the new science of population genetics. In 1919 he was
offered a position at the Galton Laboratory in University College
London led by Karl Pearson, but instead accepted a temporary job
at Rothamsted Research in Harpenden to investigate the possibility of
analysing the vast amount of data accumulated since 1842 from the
"Classical Field Experiments" where he analysed the data recorded over
many years and published Studies in Crop Variation in 1921.
Ronald Fisher’s Timeline and Main Scientific Achievements
1912 – Published his first paper, in which he created the method of
maximum likelihood. He continued refining this method for 10 years.
1912 – established the principle that the mean of a sample is distinct from
the mean of the population from which it is drawn.
1913 – worked as a statistician for an insurance company and trained in
Britain’s Territorial Army.
1914 – volunteered for the British Army at the start of World War 1 and
was rejected because of his poor eyesight. This may have saved him from a
similar fate to Henry Moseley, who volunteered successfully.
1914 – became a high school mathematics and physics teacher. He soon
found he did not enjoy teaching, but had to stick with it in the absence of
any other way of earning a living.
1918 – with financial help from Leonard Darwin he published a landmark
paper that founded quantitative genetics: The correlation between relatives
on the supposition of Mendelian inheritance. Fisher also introduced the
concept of variance for the first time. The paper had been delayed since
1916 because referees had difficulty understanding it. Geneticist James
Crow likened this paper, written when Fisher was a high school teacher,
to Albert Einstein’s great papers published when he was working in a
patent office.
Fisher’s new ideas and his mathematical approach to biological questions
were often met with incomprehension and sometimes downright
resistance. Fisher had a quick temper and became involved in some rather
bitter feuds. His ability to ‘see’ the answers to complex mathematics
problems was both a blessing – he made outstanding progress – and a
curse – people could often not follow the logic of his arguments. The books
he would later write were landmarks in biology and statistics, but often had
to be explained by more ‘user friendly’ scientists before they became
widely understood.
1919 – became a statistician at Rothamsted Experimental Station in Harpenden,
England, working in agricultural research. Here he had access to a huge
amount of biological data collected since 1842. He applied his
mathematical genius to the data, enabling him to invent the tools of modern
experimental design.
1921 – created the statistical method of analysis of variance (ANOVA) and
introduced the concept of likelihood.
1924 – created the F distribution.
1925 – released his book Statistical Methods for Research Workers. At the
time of its publication it received no positive reviews, yet it was soon to
revolutionize statistics and biology. Geneticist and mathematician Alan
Owen wrote that this book occupies a position in quantitative biology
similar to Isaac Newton’s Principia in physics.
1929 – elected to the Royal Society, joining the United Kingdom’s scientific
elite.
1930 – released his book The Genetical Theory of Natural Selection, unifying
the theory of natural selection with Mendel’s laws of inheritance, defining
the new field of population genetics and revitalizing the concept of sexual
selection. Fisher introduced a large number of vital new concepts in this
book including: the inverse relationship between the magnitude of a
mutation and the likelihood of the mutation increasing an organism’s
fitness; parental investment; Fisher’s principle; the sexy son hypothesis;
and the heterozygote advantage. Geneticist James Crow described this book
as “the natural successor to The Origin of Species.”
1933 – appointed Professor of Eugenics at University College London.
While working in this role he studied the genetics of human blood groups,
explained the Rhesus system and established the Fisher-Race notation, still
used today, for Rhesus phenotypes and genotypes. Fisher’s reputation as a
lecturer was reminiscent of Willard Gibbs’s: his lecture courses usually lost
most of the students who started them. Only a small band of the smartest
students could stick with his brilliant but challenging ideas.
1935 – released his book The Design of Experiments introducing the
concept of a null hypothesis.
1936 – although his work was decisive in unifying natural selection with
Mendel’s laws of inheritance, Fisher’s careful statistical analysis of Mendel’s
data suggested all was not well.
Mendel’s results showed too few random errors to have come from real
experiments. Nearly all of Mendel’s data showed an unnatural bias. Fisher
wrote:
“Although no explanation can be expected to be satisfactory, it
remains a possibility among others that Mendel was deceived by some
assistant who knew too well what was expected. This possibility is
supported by independent evidence that the data of most, if not all, of the
experiments have been falsified so as to agree closely with Mendel’s
expectations.”
RONALD FISHER, 1890 TO 1962
Has Mendel’s work been rediscovered? Annals of Science 1: 115 – 117, 1936
Fisher’s analysis said there was only a 1-in-2000 chance that Mendel’s
results were the fully reported results of real experiments. Mendel’s results
and conclusions, however, were correct.
1939 – University College London closed down the Eugenics Department
and Fisher returned to the Rothamsted Experimental Station.
1943 – appointed to the Balfour Chair of Genetics at the University of
Cambridge.
1952 – knighted by Queen Elizabeth, becoming Sir Ronald Aylmer Fisher.
1955 – awarded the Royal Society’s Copley Medal, one of the greatest
prizes in science. Previous recipients included Benjamin
Franklin, Alessandro Volta, Michael Faraday, Robert Bunsen, Charles
Darwin, Willard Gibbs, Dmitri Mendeleev, Alfred Russel Wallace, J. J.
Thomson, Ernest Rutherford, Albert Einstein, and James Chadwick.
1957 – retired from his chair at Cambridge, but continued working there
for two years.
1959 – moved to Adelaide, Australia to do research work with E. A. Cornish
at CSIRO. Now 69 years old, one of the main reasons he moved was that he
enjoyed the warm, sunny climate of South Australia.
By the end of his career, Fisher had written 7 books and almost 400
academic papers devoted to statistics.
Some Contributions of R.A. Fisher to Statistics
The contributions of Sir Ronald Aylmer Fisher to the discipline of statistics
are multifarious, profound and long-lasting. In fact, he can be regarded as
having laid the foundations of statistics as a science. He is often dubbed the
'father of statistics'. He contributed both to the mathematical theory of
statistics and to its applications, especially to agriculture and the design of
experiments therein. His contributions to statistics are so many that it is
not even possible to mention them all in this short article. We, therefore,
confine our attention to discussing what we regard as the more important
among them. Fisher provided a unified and general theory for analysis of
data and laid the logical foundations for inductive inference. Fisher
regarded statistical methods from the point of view of applications. Since
he was always involved in solving biological problems which needed
statistical methods, he himself developed a large body of methods. Many of
them have become standard tools in a statistician's repertoire. Some of
these developments required rather deep mathematical work, and it was
characteristic of Fisher to use elegant geometrical arguments in the
derivation of his results. An excellent example of this type is his derivation
of the sampling distribution of the correlation coefficient. Fisher had poor
eyesight even at a young age, which prevented private reading, and he
relied largely on being read to, which in turn involved doing mathematics
without pencil, paper and other such visual aids. It is believed that this
situation helped him develop a keen geometrical sense. Fisher's
contributions to statistics have also given rise to a number of bitter
controversies, due to the nature of the ideas and his personality and
idiosyncrasies. Some of the controversies only go to show that it is not
possible to build the whole gamut of statistical theory and methodology on
a single paradigm and that no single system is quite solid, as Fisher himself
realised.
Sampling distribution
Fisher derived mathematically the sampling distribution of the
Student's t statistic which Gosset (pen name: Student) had
derived earlier by 'simulation'. Fisher also derived
mathematically the sampling distributions of the F statistic, the
correlation coefficient and the multiple correlation coefficient and
the sampling distributions associated with the general linear
model. Fisher's derivation of the sampling distribution of the
correlation coefficient from a bivariate normal distribution was
the starting point of the modern theory of exact sampling
distributions. Another useful and important contribution was the
tanh⁻¹ transformation he found for the correlation coefficient, which makes
its sampling distribution close to the normal distribution, so
that tables of the standard normal distribution can be used in
testing the significance of the correlation coefficient. Fisher also made a
modification to the degrees of freedom of Pearson’s χ² statistic when
parameters are to be estimated from the data.
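As an illustration of how the tanh⁻¹ (Fisher z) transformation is used in practice, here is a minimal Python sketch; the function name and the sample values are my own, and the approximation assumes a sample from a bivariate normal population:

```python
import math

def fisher_z_test(r, n, rho0=0.0):
    """Test H0: rho = rho0 using Fisher's z = tanh^-1(r) transformation.

    For a sample of size n from a bivariate normal population, z is
    approximately normal with mean atanh(rho0) and standard error
    1/sqrt(n - 3), so standard normal tables can be used.
    """
    z = math.atanh(r)                        # Fisher's z transform
    se = 1.0 / math.sqrt(n - 3)              # approximate standard error
    stat = (z - math.atanh(rho0)) / se       # standardised test statistic
    p = math.erfc(abs(stat) / math.sqrt(2))  # two-sided normal p-value
    return stat, p

# e.g. an observed correlation r = 0.5 in a sample of n = 28 pairs
stat, p = fisher_z_test(0.5, 28)
```

For r = 0.5 and n = 28 this gives a statistic of about 2.75, significant at the 1% level, without needing tables of the exact distribution of r.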
Maximum Likelihood Estimation
Fisher's very first paper, published in 1912 (at the age of 22), was on the
method of maximum likelihood (although he did not call it that at the time).
He developed this in view of his lack of satisfaction with the methods of
moment estimators and least squares estimators. At that time the term
'likelihood' as opposed to probability or inverse probability caused some
controversies. Although the basic idea of likelihood dates back to Lambert
and Bernoulli and the method of estimation can be found in the works of
Gauss, Laplace and Edgeworth, it was Fisher to whom the idea is credited,
since he developed it and advocated its use. Fisher studied the maximum
likelihood estimation in some detail establishing its efficiency. Fisher's
mathematics was not always rigorous, certainly not by modern day
standards, but even then, his mathematical work, like in the case of his
work on maximum likelihood estimation, provides a great deal of insight.
Some of his claims on the properties of maximum likelihood estimators
were proved to be false in their generality by Bahadur, Basu and Savage.
Subsequent authors developed strong theories based on the likelihood
function. Fisher advocated maximum likelihood estimation as a standard
procedure and since then it has become the foremost estimation method
and has been developed for innumerable problems in many different
sciences and contexts. It also has seen enormous ramifications and plays a
central role in statistical theory, methodology and applications. Following
his work on the likelihood, Fisher did a lot of work on the theory of
estimation and developed the notions of sufficiency, information,
consistency, efficiency and ancillary statistic and integrated them into a
well-knit theory of estimation. His pioneering work on this is contained in
two papers he wrote in 1922 and 1925.
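To make the idea concrete, here is a small Python sketch (my own example, not one from Fisher's papers) of maximum likelihood estimation for the rate of an exponential distribution, where the estimator has the closed form n / Σxᵢ; a crude grid search over the log-likelihood confirms the maximum:

```python
import math

def exp_log_likelihood(lam, data):
    """Log-likelihood of rate lam for i.i.d. exponential observations."""
    n = len(data)
    return n * math.log(lam) - lam * sum(data)

def mle_exponential(data):
    """Closed-form maximum likelihood estimate: lam_hat = n / sum(x_i)."""
    return len(data) / sum(data)

data = [0.5, 1.2, 0.3, 2.0, 0.8]     # illustrative observations
lam_hat = mle_exponential(data)

# a coarse grid search over candidate rates confirms lam_hat is the maximiser
grid = [0.01 * k for k in range(1, 500)]
best = max(grid, key=lambda lam: exp_log_likelihood(lam, data))
```

The grid search stands in for the calculus: differentiating the log-likelihood and solving gives the same closed form.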
Analysis of Variance
In 1919, Fisher joined the Rothamsted Experimental Station,
where one of his tasks was to analyse data from current field
trials. It is in this context that he formulated and developed the
technique of analysis of variance. The analysis of variance is really
a convenient way of organising the computation for analysing
data in certain situations. Fisher developed the analysis of
variance initially for orthogonal designs such as randomised
block designs and Latin square designs. Later, Frank Yates
extended the technique to non-orthogonal designs such as
balanced incomplete block designs, designs with a factorial
structure of treatments, etc. The technique of analysis of variance
developed rapidly and has come to be used in a wide variety of
problems formulated in the set-up of the linear model. Although
initially developed as a convenient means of testing hypotheses, it
also throws light on sources of experimental error and helps set
up confidence intervals for means, contrasts, etc.
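The computation that the analysis of variance organises can be sketched in a few lines of Python; this is an illustrative one-way ANOVA (the function name and data are my own), partitioning the total variation into between-group and within-group sums of squares and forming the F ratio:

```python
def one_way_anova(groups):
    """One-way analysis of variance for a list of groups of observations.

    Partitions variation into between-group and within-group sums of
    squares and returns (ss_between, ss_within, F).
    """
    k = len(groups)                                  # number of treatments
    n = sum(len(g) for g in groups)                  # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f

# e.g. yields under three hypothetical treatments
ss_b, ss_w, f = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F indicates that the between-group variation is too large to be explained by within-group (error) variation alone.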
Design of Experiments
Fisher's studies on the analysis of variance brought to light certain
inadequacies in the schemes being used then for experiments, especially
agricultural experiments. It is in an attempt to sort out these inadequacies
that Fisher evolved design of experiments as a science and enunciated
clearly and carefully the basic principles of experiments as randomisation,
replication and local control (blocking, confounding, etc.). The theory of
design of experiments he formulated was intended to provide adequate
techniques for collecting primary data and for drawing valid inferences
from them and extracting efficiently the maximum amount of information
from the data collected. Randomisation guarantees validity of estimates
and their unbiasedness. Replication helps provide a source of an estimate of
error, which can be used to compare treatments and other effects, test
hypotheses and set up confidence limits. Local control helps to reduce
sampling variations in the comparisons by eliminating some sources of
such variations. Fisher formulated randomised block designs, Latin square
designs, factorial arrangements of treatments and other efficient designs
and worked out the analysis of variance structures for them. The subject of
design of experiments then developed rapidly both in the direction of
formulation and use of efficient designs, especially in agricultural
experiments, in the direction of statistical theory formulating useful and
efficient designs and working out their analyses, and in the direction of
interesting and difficult combinatorial mathematics investigating the
existence of designs of certain types and their construction. Surely, the
formulation of the basics of experimental design should be regarded as
Fisher's most important contribution to statistics and science.
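The principles of replication, local control and randomisation can be illustrated with a toy randomised block layout in Python (a sketch of the idea, not a procedure taken from Fisher's own writings): every treatment appears once in every block, and the order of treatments within each block is randomised independently:

```python
import random

def randomised_block_design(treatments, n_blocks, seed=None):
    """Randomised block design: each treatment occurs once per block
    (replication and local control), with the arrangement within each
    block chosen at random (randomisation)."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)        # randomise within the block
        layout.append(block)
    return layout

# e.g. three fertiliser treatments replicated over four field blocks
layout = randomised_block_design(["A", "B", "C"], n_blocks=4, seed=0)
```

Blocking removes field-to-field variation from treatment comparisons, while randomisation within blocks justifies the unbiasedness of the resulting estimates.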
Discriminant Analysis
From the time Fisher derived the sampling distributions of
correlation coefficient and the multiple correlation coefficient, he
was interested in the study of relationships between different
measurements on the same individual and the use of multiple
measurements for the purposes of classification and other
problems. Fisher formulated the problem of discriminant analysis
(what might be called a statistical pattern recognition problem
today) in statistical terms and arrived at what is called the linear
discriminant function for classifying an object into one of two
classes on the basis of measurements on multiple variables. He
derived the linear discriminant function as the linear combination
of the variables that maximises the ratio of between-group to
within-group squared distance. Since then the same function has been
derived from other considerations such as a Bayes decision rule
and has been applied in many fields like biological taxonomy,
medical diagnosis, engineering pattern recognition and other
classification problems. Statistical and other pattern recognition
methods and image processing techniques have made
considerable progress in the last two or three decades, in theory
and in applications, but Fisher's linear discriminant function still
has a place in the pattern recognition repertoire.
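A minimal Python sketch of Fisher's linear discriminant for two classes of two-dimensional points (my own illustration): the discriminant direction is w = S_W⁻¹(m₁ − m₂), where S_W is the pooled within-class scatter matrix, and a point is classified by the sign of w · (x − midpoint):

```python
def fisher_lda_2d(class1, class2):
    """Fisher's linear discriminant for two classes of 2-D points.

    Returns the discriminant direction w = S_W^{-1} (m1 - m2) together
    with the two class means; classification is by the sign of
    w . (x - midpoint of the means).
    """
    def mean(pts):
        n = len(pts)
        return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

    def scatter(pts, m):
        sxx = sum((p[0] - m[0]) ** 2 for p in pts)
        syy = sum((p[1] - m[1]) ** 2 for p in pts)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in pts)
        return sxx, sxy, syy

    m1, m2 = mean(class1), mean(class2)
    a1, a2 = scatter(class1, m1), scatter(class2, m2)
    sxx, sxy, syy = a1[0] + a2[0], a1[1] + a2[1], a1[2] + a2[2]
    det = sxx * syy - sxy * sxy          # determinant of S_W
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    # w = S_W^{-1} (m1 - m2) via the explicit 2x2 inverse
    w = ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)
    return w, m1, m2
```

Projecting onto w collapses each observation to a single score, on which the two classes are maximally separated relative to their internal spread.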
Fisher’s Exact test
Fisher's exact test is a statistical significance test used in the analysis
of contingency tables. Although in practice it is employed when sample sizes
are small, it is valid for all sample sizes. It is named after
its inventor, Ronald Fisher, and is one of a class of exact tests, so called
because the significance of the deviation from a null hypothesis (e.g., the
P-value) can be calculated exactly, rather than relying on an approximation
that becomes exact in the limit as the sample size grows to infinity, as with
many statistical tests.
The test is useful for categorical data that result from classifying objects in
two different ways; it is used to examine the significance of the association
(contingency) between the two kinds of classification. With large samples,
a chi-squared test can be used in this situation.
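The exact computation can be sketched directly from the hypergeometric distribution; the following Python function (my own illustration, using the common two-sided convention of summing all tables no more probable than the observed one) operates on a 2×2 table [[a, b], [c, d]] with fixed margins:

```python
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def prob(x):
        # hypergeometric probability that cell (1,1) equals x, margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo = max(0, row1 + col1 - n)          # smallest feasible cell (1,1)
    hi = min(row1, col1)                  # largest feasible cell (1,1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs + 1e-12)
```

For the famous "lady tasting tea" table [[4, 0], [0, 4]] (all eight cups classified correctly), this gives p = 2/70 ≈ 0.029 two-sided.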
Sufficient statistics
In statistics, a statistic is sufficient with respect to a statistical
model and its associated unknown parameter if "no other statistic
that can be calculated from the same sample provides any
additional information as to the value of the parameter". In
particular, a statistic is sufficient for a family of probability
distributions if the sample from which it is calculated gives no
additional information than does the statistic, as to which of those
probability distributions is that of the population from which the
sample was taken. More generally, the "unknown parameter" may
represent a vector of unknown quantities or may represent
everything about the model that is unknown or not fully specified.
In such a case, the sufficient statistic may be a set of functions,
called a jointly sufficient statistic.
The concept, due to Ronald Fisher, is equivalent to the statement
that, conditional on the value of a sufficient statistic for a
parameter, the joint probability distribution of the data does not
depend on that parameter. Both the statistic and the underlying
parameter can be vectors.
Fisher–Neyman factorization theorem
Fisher's factorization theorem or factorization criterion provides a
convenient characterization of a sufficient statistic. If
the probability density function is f_θ(x), then T is sufficient for θ if
and only if nonnegative functions g and h can be found such that

    f_θ(x) = h(x) g_θ(T(x)),

i.e. the density f can be factored into a product such that one
factor, h, does not depend on θ and the other factor, which
does depend on θ, depends on x only through T(x).
It is easy to see that if F(t) is a one-to-one function and T is a
sufficient statistic, then F(T) is a sufficient statistic. In
particular, we can multiply a sufficient statistic by a nonzero
constant and get another sufficient statistic.
This concept of Fisher's finds use in the application of the Rao–Blackwell
theorem and in the theory of exponential families.
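As a standard textbook illustration of the factorization criterion (not an example from Fisher's own papers), consider n independent Bernoulli(θ) observations:

```latex
f_\theta(x_1,\dots,x_n)
  = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i}
  = \underbrace{\theta^{T(x)}(1-\theta)^{\,n-T(x)}}_{g_\theta(T(x))}
    \cdot \underbrace{1}_{h(x)},
\qquad T(x) = \sum_{i=1}^{n} x_i .
```

Hence the number of successes T(x) is sufficient for θ: once T(x) is known, the individual observations carry no further information about θ.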
Tobacco, Lung Cancer, and Fisher’s Biggest Mistake
In 1954 Richard Doll and Bradford Hill published evidence in the
British Medical Journal showing a strong link between smoking and
lung cancer. They published further evidence in 1956.
Fisher was a paid tobacco industry consultant and a devoted pipe
smoker. He did not think the statistical evidence for a link was
convincing. He accepted that smoking seemed to be correlated with
lung cancer, but declared that ‘correlation is not causation.’ He said a
good case had been made for further research, but not for suggesting
to people that they should stop smoking.
Reviewing Fisher’s arguments today is interesting. He made many
valid scientific points against the research that linked lung cancer to
smoking. History and further research, however, proved him wrong.
Books by R.A. Fisher
Fisher wrote several books on statistics, many of them containing his
original ideas. The most important among them are: Statistical Methods for
Research Workers, Edinburgh: Oliver & Boyd, first published in 1925 and
which has seen several editions. This unusual book is full of original ideas,
written from the point of view of applications; each technique is
explained starting with an actual scientific problem and a live data set
collected to answer certain questions, together with an enunciation of an
appropriate statistical method illustrated on these data. The Design of
Experiments, first published in 1935, has also seen several editions. Besides
these, with F Yates, he compiled and published Statistical Tables for
Biological, Agricultural and Medical Research in 1938 (with several
subsequent editions), also by Edinburgh: Oliver & Boyd. These tables,
together with those by Pearson and Hartley, were essential tools of a
statistician's trade in those days when a statistical laboratory consisted of
manually or electrically operated calculating machines and even in the days
of electronic desk calculators.
Students may not find Fisher's books quite readable and until one has
mastered the material from some other source or with the help of a good
teacher, his books may not help. However, they make very useful and
enjoyable reading for an expert and for a teacher!
Suggested Reading
• June 1964 issue of Biometrics, Vol. 20, No. 2, in memoriam Ronald Aylmer
Fisher. Dedicated to the memory of Fisher soon after his death; contains
many articles on his life and work.
• Box J F. R. A. Fisher: The Life of a Scientist. John Wiley & Sons, New York,
1978.
• Savage L J. On rereading R. A. Fisher. In The Writings of Leonard Jimmie
Savage: A Memorial Selection. American Statistical Association and the
Institute of Mathematical Statistics, Hayward, Calif., pp. 678–720, 1981.
• Fisher-Box J. Fisher, Ronald Aylmer. In Kotz S, Johnson N L and Read C B
(Eds.), Encyclopedia of Statistical Sciences, Vol. 3. Wiley Interscience,
New York, pp. 103–111, 1988.
• December 1990 issue of Biometrics, Vol. 46, No. 4, published in the year of
Fisher's birth centenary; contains a few articles on his life and work.
• Rao C R. R. A. Fisher: The founder of modern statistics. Statistical Science,
Vol. 7, pp. 34–48, 1992.
Some Personal Details and the End
Fisher married Ruth Guinness, a physician’s daughter, in 1917. He was 27,
she was 17. Together they had seven daughters and two sons. Their eldest
son George was killed in action flying his fighter plane in 1943, during
World War 2. Fisher’s marriage then fell apart.
Fisher was known to have a quick temper: he got involved in scientific
feuds and could behave rudely to people he had strong disagreements with.
On the other hand, his many friends reported that he was warm, likable,
friendly, had a sharp and appealing sense of humour, was engagingly
eccentric at times, and was an intellectually stimulating companion. He was
generous with his ideas: many people who talked to him were able to
publish work as their own in which Fisher’s informal, unaccredited
contributions had been vital.
Ronald Fisher died aged 72 on July 29, 1962, in Adelaide, Australia
following an operation for colon cancer. With bitter irony, we now know
that the likelihood of getting this disease increases in smokers. Ronald
Fisher was cremated and his ashes interred in St. Peter’s Cathedral,
Adelaide.
“I am genuinely sorry for scientists of the younger generation
who never knew Fisher personally. So long as you avoided a handful of
subjects like inverse probability that would turn Fisher in the briefest
possible moment from extreme urbanity into a boiling cauldron of wrath,
you got by with little worse than a thick head from the port which he, like
the Cambridge mathematician J. E. Littlewood, loved to drink in the
evening. And on the credit side you gained a cherished memory of English
spoken in a Shakespearean style and delivered in the manner of a Spanish
grandee.”