a recollection of the ESRC-CNRS funded research

Twenty five years later
Plan
Some whys
Immediate Results
Further Results
Examples
Toulouse-Lancaster: a recollection of the
ESRC-CNRS funded research project in statistics
(1984-1986)
Antoine de Falguerolles
Institut de Mathématiques de Toulouse, Statistique et Probabilités
14 avril 2010
Twenty five years later

$2010 - \frac{1984 + 1986}{2} = 25$

Google for Twenty five years after
Twenty five years later, a new Musketeer book between The Round
Table of the Musketeers and The Man in the Iron Mask.
Two decades and a half have passed since the famous swordsmen
triumphed over Cardinal Analyse des Données and Milady Modelling
in The Round Table of the Musketeers. Time has not weakened
their resolve, nor dispersed their loyalties. But treasons and
stratagems still cry out for justice: European regulation on the
Rosé wine endangers the throne of France, while in England,
Brown promises to rebuild the economy, renew society and restore
faith in politics. Today, the Royal Statistical Society brings its
immortal Companies of Musketeers out of world-wide dispersion to
cross swords with time, the malevolence of non-statisticians, and
the forces of history. But their greatest test is the titanic struggle
with the son of Milady who wears the face of evil.
Who are the Musketeers?
Who is the devilish son of Milady?
Who is Count d’Artagnan?
- The Musketeers: the aficionados of statistical modelling!
- The devilish son of Milady: some dear colleague!
- Count d’Artagnan: see next.
ESRC/CNRS FRANCO-BRITISH PROGRAMME
From left to right: Musketeers Francis, Baccini (?) or Saint Pierre (?), Hinde, and Carlier. Notice the different tunics.
“The research project has the general aim of comparing and evaluating the distinct French and British approaches to data analysis through the analysis of a number of complex data sets, and determining to what extent they are complementary rather than competitive.”
- Analyse des données / Modélisation statistique
- Exploratory analysis / Statistical modelling
- E.g. SVD of data matrices / Statistical models for multidimensional arrays
Plan
- Twenty five years later
- Some whys
  - Why this theme?
  - Why Toulouse?
  - The influential Edmond Lisle
- Immediate Results
  - The round table
- Further Results
  - Alain Baccini
  - Henri Caussinus
  - Jean-René Mathieu
- Examples
  - CA and link functions
  - Retinal convergence
- End
Why this theme?
- It seems widely accepted that there was such a thing as a French approach in the Eighties.
- To what extent did this “School” dominate statistical data analysis in France?
- To what extent was this “School” seen outside France as dominating statistical data analysis in France?
These questions (and many more) are discussed in:
- A special issue of the Journal Electronique d’Histoire des Probabilités et de la Statistique, http://www.jehps.net/decembre2008.html. Some Matériaux pour l’histoire de l’analyse des données have been organised by Ludovic Lebart. They include an introduction by Ludovic Lebart, a series of articles (John Gower, Fionn Murtagh, Michel Armatte, Alain Desrosieres, Willem J. Heiser, Antoine de Falguerolles, Alfredo Rizzi, Hans-Hermann Bock, Boris Mirkin and Ilya Muchnik), and some texts and documents from Jean-Paul Benzécri, Henry Rouanet and Dominique Lepine, and Noboru Ohsumi.
- Henri Caussinus (2002): Some concluding observations, Annales de la Faculté des Sciences de Toulouse, Vol. XI, no. 4, 2002, pp. 587-591. www.stat.cmu.edu/~fienberg/ToulouseAnnales-4-2002/Conclusion.pdf
The debate on whether to put the emphasis on DATA or on MODEL is recurrent.
- Ernest Fournier de Flaix, commenting on some of the talks given at the Jubilee meeting (1885) of the Royal (London) Statistical Association:
  “[...] It is the translation into graphical curves of probability calculations, but probability calculations are one of the dangers of statistics. [...]”
  Ernest Fournier de Flaix: Le jubilee-volume de la Société de Statistique de Londres, Journal de la Société de Statistique de Paris, vol. 27, 1886, p. 222-223.
- Same Journal in 1897 and 1898: IWLS (Vilfredo Pareto); Gamma distribution (Lucien March)!
Fournier de Flaix:
“Should not the same reservations be made about the application of mathematical formulas, accessible to so few people, to the results of statistics? Proof of this can be found in a memoir by Mr. Galton on the application of the graphical method to the measurement of error.
It is the translation into graphical curves of probability calculations, but probability calculations are one of the dangers of statistics. These calculations have seduced more than one economist, more than one statistician, such as Stanley Jevons, exposing them more than once to being contradicted by the facts. This is what happened with the question of money.”
Why Toulouse?
The Laboratoire de Statistique et Probabilités, Faculté des Sciences, Université de Toulouse:
- founded in the Fifties by Roger Huron (1913-1997), a mathematician and a medical doctor
  - Roger Huron (1958): Méthode générale d’estimation de la fréquence des gènes. Application aux groupes sanguins. Annales de la faculté des sciences de Toulouse, Sér. 4, 22, p. 159-173.
- then chaired by Henri Caussinus (1972-1988)
  - Henri Caussinus, Contribution à l’analyse statistique des tableaux de corrélation. Annales de la faculté des sciences de Toulouse, Sér. 4, 29 (1965), p. 77-183.
- Jean-René Mathieu (1988-1996)
- Gérard Letac (1997-1998), . . . until 2007.
In the Fifties, Roger Huron used the EM algorithm (before it was known under this name) to model the mixing of genes in various populations sampled in various countries:
- Observed frequencies of phenotypes
- Estimated frequencies of genotypes given the data on phenotypes and current estimates of model parameters
- Revised estimates of model parameters given estimated frequencies of genotypes
See also: Huron (Roger) et Ruffié (Jacques), Les méthodes en génétique générale et en génétique humaine, Paris: Masson et Cie, 1958.
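The three steps above can be sketched with the classical gene-counting scheme for ABO blood groups. This is a minimal illustration under stated assumptions, not Huron's own formulation: the Hardy-Weinberg model, the function names `em_abo` and `abo_log_likelihood`, and the phenotype counts are all hypothetical choices made for the example.

```python
import math


def abo_log_likelihood(counts, p, q, r):
    """Log-likelihood (up to a constant) of phenotype counts under Hardy-Weinberg."""
    n_a, n_b, n_ab, n_o = counts
    return (n_a * math.log(p * p + 2 * p * r)
            + n_b * math.log(q * q + 2 * q * r)
            + n_ab * math.log(2 * p * q)
            + n_o * math.log(r * r))


def em_abo(counts, n_iter=100):
    """Estimate allele frequencies (p, q, r) for A, B, O from phenotype counts."""
    n_a, n_b, n_ab, n_o = counts
    n = n_a + n_b + n_ab + n_o
    p = q = r = 1.0 / 3.0  # arbitrary starting values
    history = [abo_log_likelihood(counts, p, q, r)]
    for _ in range(n_iter):
        # E-step: expected genotype counts given phenotypes and current estimates.
        # P(AA | phenotype A) = p^2 / (p^2 + 2pr) = p / (p + 2r)
        n_aa = n_a * p / (p + 2 * r)
        n_ao = n_a - n_aa
        n_bb = n_b * q / (q + 2 * r)
        n_bo = n_b - n_bb
        # M-step: revised allele frequencies by counting genes among 2n alleles.
        p = (2 * n_aa + n_ao + n_ab) / (2 * n)
        q = (2 * n_bb + n_bo + n_ab) / (2 * n)
        r = (n_ao + n_bo + 2 * n_o) / (2 * n)
        history.append(abo_log_likelihood(counts, p, q, r))
    return p, q, r, history


# Hypothetical phenotype counts (A, B, AB, O), for illustration only.
counts = (186, 38, 13, 284)
p, q, r, history = em_abo(counts)
print(f"p = {p:.4f}, q = {q:.4f}, r = {r:.4f}")
```

Each pass alternates between the second and third bullets; the recorded log-likelihood should never decrease from one pass to the next, which is the usual sanity check for an EM iteration.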
- COMPSTAT’1982 held in Toulouse.
  - Murray Aitkin (1982): Logit Models for the analysis of a very Large Survey of Unemployment in France, COMPSTAT’82, Part II, p. 9-10, Wien: Physica-Verlag.
- 11th Biometric Conference also held in Toulouse in 1982.
  - Murray Aitkin (1982): ?
Edmond Lisle
A super international connector!
Born on 23 March 1928 in Marseille.
Educated at the Lycée français de Londres, Kingsbridge Grammar School, Merchant Taylors School in London, Magdalen College in Oxford, and the Facultés de droit et des lettres de Paris.
Degrees: Master of Arts, Docteur ès sciences économiques, Licencié ès lettres.
An influential member of the CNRS, he was instrumental in maintaining the Social Sciences within the CNRS, thus helping to preserve their “scientific” status. See the interesting interview (27 June 2001) of Edmond Lisle by Olivier Martin in La revue pour l’histoire du CNRS, 2002 (http://histoire-cnrs.revues.org/documrnt543.html).
The round table, 9-10 December 1985
The co-supervised PhD dissertation of Nathalie Raynal (defended 1987)
Alain Baccini
The first of these studies led to the thesis of Abdelhaq Khoudraji (“Analyse des Correspondances et mise en oeuvre du modèle de Goodman”, 1988) and to two publications: one on least-squares estimation of the parameters of the association model (Baccini and Khoudraji, 1992); the other on the use of this model in the analysis of a table of rates (Baccini and Khoudraji, 1992). Subsequently, the asymptotic properties of the generalised least-squares estimators of the parameters of association and correlation models were established (Baccini, Fekri and Fine, 2000). Finally, following the thesis of Lahcen At-Sidi-Allal (“Contributions à l’étude des modèles d’association dans l’analyse des tables de contingence”, 1996), an algorithm for computing maximum likelihood estimates of the parameters of association and correlation models was developed, together with criteria for choosing the dimension of such a model (At-Sidi-Allal, Baccini and Mondot, 2004).
Henri Caussinus
“. . . It quickly became clear that the two approaches should be regarded as complementary far more than as competing (see the special issue of the RSA, 1987). And it is from this precise perspective that several lines of research were then developed in Toulouse, largely thanks to the impetus given by the collaboration between our team and the one led by Murray Aitkin. . . .”
Besse, Ph., Caussinus, H., Ferré, L., Fine, J. (1986). Principal component analysis and optimisation of graphical displays. Statistics, 19, 2, pp. 301-312.
Caussinus, H. (1986). Quelques réflexions sur la part des modèles probabilistes en analyse des données. In E. Diday et al. (eds.), Data Analysis and Informatics, IV, pp. 151-165, North-Holland, Amsterdam.
Caussinus, H., Fekri, M., Hakam, S., Ruiz-Gazen, A. (2003). A monitoring display of multivariate outliers. Computational Statistics and Data Analysis, 44, 1-2, pp. 237-252.
Caussinus, H. and Ruiz-Gazen, A. (2006). Projection-Pursuit approach for categorical data. In M. Greenacre and J. Blasius (eds.), Multiple Correspondence Analysis and Related Methods, pp. 405-418, Chapman & Hall.
Caussinus, H. and Ruiz-Gazen, A. (2007). Classification and generalized principal component analysis. In Brito et al. (eds.), Selected Contributions in Data Analysis and Classification, pp. 539-548, Springer.
Jean-René Mathieu
Jean-René Mathieu organises the Fifth International Statistical Modelling Workshop in Toulouse (1990)
A two-way table
\[
\begin{array}{c|ccccc|c}
A \backslash B & 1 & \cdots & b & \cdots & \#B & \\
\hline
1 & y_{11}^{AB} & \cdots & y_{1b}^{AB} & \cdots & y_{1\#B}^{AB} & y_{1}^{A} \\
\vdots & \vdots & & \vdots & & \vdots & \vdots \\
a & y_{a1}^{AB} & \cdots & y_{ab}^{AB} & \cdots & y_{a\#B}^{AB} & y_{a}^{A} \\
\vdots & \vdots & & \vdots & & \vdots & \vdots \\
\#A & y_{\#A1}^{AB} & \cdots & y_{\#Ab}^{AB} & \cdots & y_{\#A\#B}^{AB} & y_{\#A}^{A} \\
\hline
 & y_{1}^{B} & \cdots & y_{b}^{B} & \cdots & y_{\#B}^{B} & y_{\emptyset}
\end{array}
\]
CA and link functions
- Empirical probabilities: $p_{ab}^{AB} = \frac{y_{ab}^{AB}}{y_{\emptyset}}$
- Bilinear predictor and identification constraints: $\eta_{ab}^{AB} = \beta^{\emptyset} + \beta_{a}^{A} + \beta_{b}^{B} + \sum_{k=1}^{K} \sigma_k \beta_{k,a}^{A} \beta_{k,b}^{B}$
- Prior weights: $w_{ab}^{AB} = \frac{1}{p_{a}^{A} p_{b}^{B}}$
- A link function: $\eta_{ab}^{AB} = g(\mu_{ab}^{AB})$
- Least squares (constant variance)
A useful machinery I learnt from the colleagues in Lancaster.
In standard CA, the link function is the scaling by $\frac{1}{p_{a}^{A} p_{b}^{B}}$, i.e. $g(\mu_{ab}^{AB}) = \frac{\mu_{ab}^{AB}}{p_{a}^{A} p_{b}^{B}}$. The predictor $\eta_{ab}^{AB}$ then reduces to $1 + \sum_{k=1}^{K} \sigma_k \beta_{k,a}^{A} \beta_{k,b}^{B}$.
Other links can be considered, e.g. a log link in the spirit of Goodman’s R × C association model.
Link misspecification has an important impact on dimensionality, as can be seen in the following simulated example. (See Baccini, Caussinus and Falguerolles (1994): Diabolic horseshoes, IWSM 9, Exeter.)
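The statement about standard CA can be checked numerically. The sketch below assumes NumPy and a small made-up contingency table; the names `correspondence_analysis` and `predictor` are hypothetical. It computes the CA solution through an SVD of the standardised residuals and verifies that $p_{ab}^{AB} / (p_{a}^{A} p_{b}^{B})$ equals one plus the bilinear term when all dimensions are retained.

```python
import numpy as np


def correspondence_analysis(y):
    """Standard CA of a two-way table through the SVD of standardised residuals."""
    y = np.asarray(y, dtype=float)
    p = y / y.sum()                      # empirical probabilities p_ab
    r = p.sum(axis=1)                    # row margins p_a
    c = p.sum(axis=0)                    # column margins p_b
    expected = np.outer(r, c)            # independence fit p_a * p_b
    s = (p - expected) / np.sqrt(expected)
    u, sigma, vt = np.linalg.svd(s, full_matrices=False)
    beta_a = u / np.sqrt(r)[:, None]     # row scores
    beta_b = vt.T / np.sqrt(c)[:, None]  # column scores
    return p, r, c, sigma, beta_a, beta_b


def predictor(sigma, beta_a, beta_b, k):
    """Rank-k bilinear predictor: 1 + sum_k sigma_k * beta_a[:, k] * beta_b[:, k]."""
    return 1.0 + (beta_a[:, :k] * sigma[:k]) @ beta_b[:, :k].T


# Hypothetical 3 x 4 table of counts, for illustration only.
table = [[20, 10, 5, 3],
         [8, 15, 12, 6],
         [2, 7, 14, 18]]
p, r, c, sigma, beta_a, beta_b = correspondence_analysis(table)
eta_full = predictor(sigma, beta_a, beta_b, k=len(sigma))
print(np.allclose(p / np.outer(r, c), eta_full))
```

Truncating `k` gives the weighted least-squares fit of lower dimension, with weights $1 / (p_{a}^{A} p_{b}^{B})$ on the scale of $p_{ab}^{AB}$; a different link would require an iterative fit rather than a single SVD.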
[Figure: simulated example for CA and link functions. Panels plot axis 2 and axis 3 against axis 1; points are labelled 1 to 17.]
Retinal convergence
The suicide data: frequencies of suicide by Age, Gender, and Method (Heuer, 1979) (see van der Heijden and de Leeuw, 1985; van der Heijden and Worsley, 1988; . . . )
About 50000 cases (20000 males and 30000 females), 9 methods of suicide, 17 age classes.
Many exploratory approaches can be considered here:
- Multiple correspondence analysis
- Correspondence analysis of some two-way table:
  - (age, sex) by methods
  - sex odds classified by methods and ages
  - ...
- Various Biplots
Exploratory:
- Multiple correspondence analysis
- Correspondence analysis of some two-way table:
  - (age, sex) by methods
  - sex odds classified by methods and ages
  - ...
- Various Biplots
Modelling:
- Poisson, all two-way interactions: A∗M + A∗S + M∗S
- some standard models derived from the “Poisson trick”:
  - multinomial logit (response is method)
  - binomial (response is gender)
  - ...
- others
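As a reminder of why the binomial model follows from the Poisson one, here is a short derivation. It is a sketch in notation that is assumed for the example (the $\lambda$ parameters are not in the slides), with $s = 1$ for males and $s = 2$ for females.

```latex
% Poisson log-linear model with all two-way interactions (assumed notation):
\log \mu_{ams} = \lambda + \lambda_a^{A} + \lambda_m^{M} + \lambda_s^{S}
               + \lambda_{am}^{AM} + \lambda_{as}^{AS} + \lambda_{ms}^{MS}

% Conditioning on the totals y_{am1} + y_{am2}, the count y_{am1} is binomial
% with probability \mu_{am1} / (\mu_{am1} + \mu_{am2}), so that
\operatorname{logit} P(S = 1 \mid a, m)
  = \log \mu_{am1} - \log \mu_{am2}
  = (\lambda_1^{S} - \lambda_2^{S})
  + (\lambda_{a1}^{AS} - \lambda_{a2}^{AS})
  + (\lambda_{m1}^{MS} - \lambda_{m2}^{MS})
```

All terms not involving S cancel, leaving a logit model for gender with additive effects of age and method.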
[Figure: Retinal convergence. Scatter plot of points marked “+”, axis 1 (horizontal) against axis 2 (vertical), both ranging from −1.5 to 1.5.]
Similar or Different? Dimension 1 in the plot separates males and females. Dimension 2 is mostly ordered by age.
Note also that the two “clouds” of age groups for males and females have approximately the same “shape”. The difference in scale between the two clouds, due to the chi-squared metric, reflects the imbalance between the frequencies of males and females.
The location of the two clouds reflects the fact that a positive (resp. negative) age-method association for one group corresponds to a negative (resp. positive) age-method association for the other group.
An exploratory approach: for given A = a and M = m,
- $p_{am}^{AM} = \frac{y_{am1}^{AMS}}{y_{am1}^{AMS} + y_{am2}^{AMS}}$ is the empirical proportion of males;
- $q_{am}^{AM} = \frac{y_{am2}^{AMS}}{y_{am1}^{AMS} + y_{am2}^{AMS}}$ is the empirical proportion of females.
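- select either the $p_{am}^{AM}$ or the $q_{am}^{AM}$ ($q_{am}^{AM} = 1 - p_{am}^{AM}$)
- choose prior weights $\frac{1}{\sqrt{p_{a}^{A} p_{m}^{M} q_{a}^{A} q_{m}^{M}}}$ (other choices are possible)
- a bilinear predictor for $p_{am}^{AM}$: $\eta_{am}^{AM} = \beta^{\emptyset} + \beta_{a}^{A} + \beta_{m}^{M} + \sum_{k=1}^{K} \sigma_k \beta_{k,a}^{A} \beta_{k,m}^{M}$
- a logit link (the $\eta_{am}^{AM}$ for $p_{am}^{AM}$ corresponds to $-\eta_{am}^{AM}$ for $q_{am}^{AM}$)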
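The symmetry noted for the logit link is easy to verify on a toy example. The sketch below uses made-up counts (not the suicide data) and hypothetical helper names; it computes the empirical proportions for each (age, method) cell and checks that the logit of the female proportion is the negative of the logit of the male proportion.

```python
import math


def empirical_proportions(y_male, y_female):
    """Return (p_am, q_am): proportions of males and females in one (a, m) cell."""
    total = y_male + y_female
    return y_male / total, y_female / total


def logit(p):
    """Logit link: log odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


# Hypothetical counts y[a][m] = (males, females), for illustration only.
y = [[(40, 10), (25, 25)],
     [(30, 60), (12, 36)]]

logits = []
for a, row in enumerate(y):
    for m, (y_male, y_female) in enumerate(row):
        p_am, q_am = empirical_proportions(y_male, y_female)
        logits.append((a, m, logit(p_am), logit(q_am)))
        print(f"a={a} m={m} logit(p)={logit(p_am):+.4f} logit(q)={logit(q_am):+.4f}")
```

Because the two logits differ only in sign, modelling either the $p_{am}^{AM}$ or the $q_{am}^{AM}$ leads to the same fitted structure with the predictor negated, which is why the choice in the first bullet is immaterial under this link.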
Thank you for your attention!