Wolf, P.H. (1979). "Stochastic Modeling of Complete Work Histories with Applications in Occupation Health."

STOCHASTIC MODELING OF COMPLETE WORK HISTORIES
WITH APPLICATIONS IN OCCUPATION HEALTH
by
Pamela Hope Wolf
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1209
FEBRUARY 1979
STOCHASTIC MODELING OF COMPLETE WORK
HISTORIES WITH APPLICATIONS IN
OCCUPATIONAL HEALTH
by
Pamela Hope Wolf
A Dissertation submitted to the faculty of
the University of North Carolina at Chapel
Hill in partial fulfillment of the requirements
for the degree of Doctor of Public Health in
the Department of Biostatistics.
Chapel Hill
1978
Approved by:
Advisor
Reader
Copyright by
PAMELA HOPE WOLF
1978
PAMELA HOPE WOLF.
Stochastic Modeling of Complete Work Histories
with Applications in Occupational Health.
(Under the direction of
RICHARD H. SHACHTMAN.)
The main objective of this research effort is the development of
new techniques for analysis of work histories, as they currently
provide the basic link between environmental data and health outcome.
Markov models provide a convenient framework for describing the
dynamics of movement in an industrial setting, as they are based upon
probabilistic predictions which permit the incorporation of job-to-job
transfers and job order in the detection of potential hazards.
Initially, the work histories of a cohort of retired rubber workers are examined to develop a simple Markov chain. Five criteria for chain construction are considered: state selection and definitions, time stationarity, time interval for transition, geometricity of holding times, and the Markov property.
Upon finding the Markov chain too restrictive, a semi-Markov process (sMP) is developed, wherein the time spent on each job visit is incorporated into the model. From the sMP, parameters are derived which enable the comparison of different health outcome groups within the cohort: normal and disability retirees.
Using the techniques developed above, another application is
explored:
a case-control study of leukemia in the rubber industry.
First passage probabilities are used to describe the distribution of
the latent period between exposure to the potential agent and death.
ACKNOWLEDGEMENTS
The author wishes to express her sincere appreciation to her
advisor, Dr. Richard H. Shachtman for his guidance and encouragement
throughout the course of this research. Special thanks are due to
Dr. Michael Symons for his helpful criticisms and suggestions and
especially for his continual moral support.
In addition she would like
to thank Dr. Dragana Andjelkovich, Dr. Carl Shy and Dr. Elizabeth Coulter
for their time and effort on this study.
The author thanks her family and friends for their patience and
encouragement.
Finally, she wishes to express her gratitude to Betty Owens for
her typing of this manuscript.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
GLOSSARY OF TERMS
Chapter
I INTRODUCTION AND REVIEW OF THE LITERATURE
  1.1 Introduction
  1.2 The Problem in its Occupational Health Context
  1.3 Previous Statistical Modeling in the Area
  1.4 Markov Chain and semi-Markov Process Models for Describing Health Outcomes
  1.5 General Description of Markov Processes
  1.6 Semi-Markov Processes in Discrete Time
  1.7 Statistical Estimation for Markov and semi-Markov Processes
  1.8 Health Applications of Markov Processes
  1.9 Outline of Subsequent Chapters
II EXAMINATION AND TESTING OF THE DISCRETE MARKOV CHAIN
  2.1 Introduction
  2.2 Background and Data Source
  2.3 Choice of States and Preliminary Analysis
  2.4 Selection of Time Interval
  2.5 Estimation of the Transition Matrices
  2.6 Holding Time Distributions
  2.7 Time Stationarity
    2.7.1 Calendar Trend
    2.7.2 Temporal-Career Trend
  2.8 The Markov Property
  2.9 Summary and Conclusions
III SEMI-MARKOV PROCESSES: MODELING AND FITTING
  3.1 Introduction
  3.2 Estimates of the sMP Matrices
  3.3 Holding Time Distributions
  3.4 Estimating a Mixture Distribution
  3.5 First Passage Probabilities
IV APPLICATION OF THE MODEL TO A CASE-CONTROL STUDY OF LEUKEMIA
  4.1 Introduction
  4.2 Background and Data Source
    4.2.1 The Population
    4.2.2 The Exposure Information
  4.3 Appropriateness of a Markov Chain
    4.3.1 Time Considerations
    4.3.2 Estimation of the Discrete Markov Chain Transition Matrices
    4.3.3 Geometricity of the Holding Times
  4.4 Developing and Constructing the semi-Markov Process
    4.4.1 Estimation of the sMP Transition Matrices
    4.4.2 Holding Time Distributions
    4.4.3 Using First Passage Probabilities to Describe the Distribution of Latency Periods
    4.4.4 First Passage Probabilities for Aggregates of States
  4.5 Conclusions and Comments
V SUMMARY AND SUGGESTIONS FOR FUTURE RESEARCH
  5.1 Summary
  5.2 Extensions and Suggestions for Future Research
REFERENCES
APPENDIX I.1
APPENDIX II.1
APPENDIX II.2
APPENDIX IV.1
APPENDIX IV.2
LIST OF TABLES
Table
2.1. Total Number of Visits Distributed Across the OTG's: Disability Retirees
2.2. Total Number of Visits Distributed Over Time Categories: Disability Retirees
2.3. Error Analysis of Missed Transitions for One Month Interval
2.4. Error Analysis of Missed Transitions for Three Month Interval
2.5. Transition Matrix (Markov Chain) for Normal Retirees with Three Month Intervals
2.6. Transition Matrix (Markov Chain) for Disability Retirees with Three Month Intervals
2.7. Goodness-of-Fit Test for Geometric Holding Time: Disability
2.8. Number of States Which Have Geometric Holding Times by OTG Scheme, Interval Length and Retirement Type
2.9. Tests for First Order versus Second Order Dependence
3.1. Transition Matrix (semi-Markov) for Normal Retirees with Three Month Intervals
3.2. Transition Matrix (semi-Markov) for Disability Retirees with Three Month Intervals
3.3. Holding Time Distributions
3.4. Holding Time Distributions for the Normal Retirees
3.5. Normal Retirees: Model-Based and Data-Based Mean Holding Times
3.6. Disability Retirees: Model-Based and Data-Based Mean Holding Times
3.7. First Passage Probabilities Truncated Means
4.1. Analysis of Missed Transitions for Three Month Intervals
4.2. Transition Matrix for Simple Markov Chain: Cases
4.3. Transition Matrix for Simple Markov Chain: Controls
4.4. Test for Geometricity of Holding Times
4.5. Transition Matrix (semi-Markov): Leukemia Cases
4.6. Transition Matrix (semi-Markov): Leukemia Controls
4.7. Holding Time Distributions
4.8. Model-Based and Data-Based Estimates of Holding Times
4.9. Measures Used to Describe Distribution of Latency Period by Exposure Code
4.10. Latency Measures for Combined States
LIST OF FIGURES
Figure
1.1. Coded Work History
1.2. Conceptual Framework of Experience Transformation Algorithm
2.1. Sampling Scheme for Retiree Data Base
2.2.a. Sample 'Raw' Work History
2.2.b. Path-Coded Data Resulting from Work History Shown in Figure 2.2.a in One Month Periods
2.3. Definition of State Space for Preliminary Analysis
3.1. Graphical Representation of Method of Generating Expected Length-of-Stay Distribution Under Hypothetical Exponential Distribution
3.2. Holding Time Distributions (Fitted and Empirical) $h_{7,10}$ - Normals
3.3. Holding Time Distributions (Fitted and Empirical) $h_{5,9}$ - Disabilities
3.4. First Passage Probability $f_{29}(n)$
3.5. First Passage Probability $f_{39}(n)$
3.6. First Passage Probability $f_{59}(n)$
3.7. First Passage Probability $f_{79}(n)$
3.8. First Passage Probability $f_{10,9}(n)$
3.9. Cumulative First Passage Probabilities - Normals
3.10. Cumulative First Passage Probabilities - Disabilities
4.1. Distribution of Latent Period: Primary Benzene Exposure
4.2. Distribution of Latent Period: Secondary Benzene Exposure
4.3. Distribution of Latent Period: Primary Other Solvent Exposure
4.4. Distribution of Latent Period: Secondary Other Solvent Exposure
4.5. Distribution of Latent Period for Aggregate States
GLOSSARY OF TERMS
OCCUPATIONAL HEALTH TERMINOLOGY
Disability Retirees: workers who retire prior to age 65 as a result of a disabling illness or condition.
Latent Period: time between exposure to an agent and death due to the malignancy under study.
Normal Retirees: workers who retire at the mandatory age of 65.
Occupational Title: specific job in the industry.
Occupational Title Group: aggregate of occupational titles based upon common materials or processes.
Population-at-risk (PAR): source population that is exposed to a particular risk.
Work History: a chronological list of the jobs held by a worker throughout his career in the industry; this list usually includes pertinent information about each job, such as dates of entry and exit, department and job description.
STATISTICAL TERMINOLOGY
First Passage Probability ($f_{ij}(n)$): probability that the first passage from state i to state j will take n time units.
Geometric Property: a property that holds when the length-of-stay in each state follows the geometric distribution.
Holding Time: waiting time that is conditional on the next state visited.
Length-of-Stay: number of time units that the process is in a particular state after a transition to that state.
Markov Chain: a process for which the Markov property holds, the transition probabilities are stationary, and the lengths-of-stay are geometric.
Markov Property: this condition asserts that, given the present state, no additional data concerning states of the system in the past can alter the probability of the state at a future time.
Missed Transition: the process moves from state a to state b to state c, but we only observe the system when it is in state a and state c, and thus miss the transition to state b [$a \to b$, $b \to c$].
Real Transition: a transition from state i to state j, where $i \ne j$.
semi-Markov Process: the time between transitions of the process is random in length and can depend upon the state the process is entering.
States: mutually exclusive and exhaustive categories which are represented by X(t) in a stochastic process.
Stochastic Process: an indexed collection of random variables $\{X(t): t \in T\}$, where T is often interpreted as time.
Time Stationarity: a condition which exists if the probabilities of transferring from state to state are not time dependent.
Transition: a change from one state to another in a given time period.
Virtual Transition: a transition from state i to state i [the process remains in state i from one time period to the next].
Visit: transition to a state.
Waiting Time: time spent in a state before a transition takes place.
CHAPTER I
INTRODUCTION AND REVIEW OF THE LITERATURE
1.1 Introduction
The main objective of this research effort is to explore new
ways of utilizing and analyzing work history information in order
to detect associations between health outcomes and work experiences.
We wish to develop techniques that will eventually be applicable to
work histories of individuals representing a variety of health problems.
Initially, we are asking the question "what statistical models
enable us to better exploit the information available in a complete
work history, with respect to certain health outcomes?"
We will review some stochastic models, estimate their parameters and test their
fit to data bases which represent populations with particular health
problems of interest.
Our initial modeling effort will focus on the
comparison of cohorts of disabled and normal retirees from the rubber
industry.
The literature review will encompass topics pertinent to both substantive health issues and statistical modeling. In Sections 1.2 and 1.3 we discuss the problems of characterizing work experience through job classification schemes and review the previous studies which utilize work history data. The reason for investigating stochastic models is covered in Section 1.4. The basic background information on Markov and semi-Markov models, as well as techniques for statistical estimation for these processes, is presented in the sections beginning with Section 1.5. Health applications of Markov processes are reviewed in Section 1.8. The final section outlines the material presented in subsequent chapters.
1.2 The Problem in its Occupational Health Context
One of the major goals in occupational health is to prevent disease and injury through control of hazardous or toxic agents. This requires effective measurement of the etiologic (causative) agents of interest. Because precise measurements of exposure are difficult to obtain, it is necessary to create an objective classification or categorization scheme based on the available environmental data. These exposure categories are valuable when there is interest in a single agent suspected in the production of a specific outcome. However, in many industries, particularly the rubber industry from which the data for this study derive, characterization of the exposure variables is difficult in that hundreds of chemicals are used in all phases of the manufacturing process. Further difficulties arise from the mobility of the work force: it is not unusual for a worker to hold as many as 30 or 40 distinct jobs within the industry. Hence, the use of exposure categories can be a tenuous matter (1).
However, exposure categories can be used indirectly by classifying jobs (Occupational Titles or OT's) into categories that relate to different types and levels of exposure. Since the specific OT's are process-oriented and determined by the materials used in an operation, OT groupings permit determination of health risks in a small subset of the entire plant population. Grouping also makes analysis simpler, in that the number of jobs and the range of potential etiologic agents are reduced to homogeneous categories.
Since classification of jobs provides the only basic link between environmental exposure and health outcome (as measured by mortality and morbidity), the worker's occupational history is an essential component in characterizing his environmental exposures. The extent to which the work history is described therefore becomes important. For instance, a complete description of the jobs held for the entire period of employment is an improvement over an account of only the last job held, for the following reasons:
(1) natural selection - the type of work done in a specific area might require a "resistant" individual; others will tend to select out
(2) seniority - there may exist a hierarchy of less hazardous or better jobs
(3) latency period - a job held early in a worker's career may have more impact on a particular health outcome, which may appear twenty years later
(4) disability or diagnosis - a disabled worker, or a worker exhibiting symptoms of ill health, may be moved to a less hazardous area at his own request or that of his physician (2).
Once complete work histories are obtained for an industrial population or subpopulation, the issue becomes one of method of analysis. Although some attempts have been made in industrial settings to analyze health outcome by work area or job classification, very few investigators use the whole work history. Mancuso et al. (3), in their mortality study of rubber workers, identified a cohort of workers who were members of the working population in 1938-9 and subsequently died between 1938 and 1964. The employment histories were available on a quarterly basis from 1938 through 1964, but the information prior to 1938 was unavailable. Work history information obtained for the years of entry into the cohort was then used to subdivide the population in a cross-sectional manner into five groups: (a) office, (b) compounding, milling and calendering, (c) curing and tirebuilding, (d) miscellaneous, and (e) unknown. Evidently, much information that could be pertinent to health outcome was missing. Job changes occurring more frequently than quarterly would not appear on the history, nor would any of the jobs held by the worker prior to his identification as part of the cohort.
Lloyd et al. (4) approached the issue in a more sophisticated manner in their study of steelworkers employed in 1953. Each worker was classified into one of 53 work areas at initial observation and by the area in which a minimum of five years of employment was contributed. This breakdown of work areas is a refinement of the type of general categories used by Mancuso and also aids in confining any health problem to specific areas. The use of five-year exposure begins to examine the issue of duration of exposure, yet the question of whether this five-year period was cumulative or consecutive is ignored in the study.
The rubber industry in England was studied by Parkes (5), who subdivided the entire industry into industrial groups determined by the product under manufacture, as well as into 15 occupational groups which are process-oriented. Sickness absence and certain diseases were examined for variation among the 15 groups. The purpose was not only to reveal the true extent of sickness in the industry, but also to provide enough information about these groups to enable internal comparisons and external comparisons with other industries. Occupational and industrial groups were recorded for the time of illness, but it was impossible to relate the illness to previous jobs held. Length of service within a particular occupation was also recorded, but because of its association with age, it was not included in the study. Fox, Lindars and Owen (6) later reported high proportions of respiratory cancer in two of these areas (curing and warehouse).
A more recent use of occupational histories is seen in the chemical industry, as described by Ott (7). Work histories for all decedents were coded into department or job classification categories. Because of limited funds, however, only entries of at least one year were recorded. Exposure codes were then constructed for each category, combining environmental sampling results for various chemicals of interest. Over 130 job classifications were identified, but the lack of complete work histories would preclude identification of an agent that was hazardous upon short-term exposure (in this case, less than one year), particularly when interest lies in an acute illness.
Spirtas (8) actually utilized complete work histories in examining the relationship between solvent exposure and leukemia in rubber workers. A computational procedure (the Experience Transformation Algorithm) was developed whereby summations of time spent in specific occupational areas (OT's) are generated according to criteria left to the discretion of the investigator. Appendix I.1 lists and describes the occupational titles. Sample output from this technique might be, for example, 12.12 years in the mixing OT, 7.32 years in the curing OT, and 5.9 years in the inspection OT. Hence, the worker's entire career of over 25 years has been reduced and summarized into surrogate exposure groups. We will elaborate on the Spirtas study in Section 1.3.
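The summation step described above can be illustrated with a short sketch. It is a hypothetical stand-in, not the ETA itself (which applies investigator-chosen criteria): jobs are assumed to be already coded as (OT, start, end) records with times in decimal years, and both the function name `time_per_ot` and the work history are invented so as to reproduce the sample output quoted above. The sketch also makes visible that job order is lost once time is summed by OT.

```python
from collections import defaultdict

def time_per_ot(history):
    """Total years spent in each OT, given (ot, start_year, end_year) job records.

    Hypothetical stand-in for the summation step only; job order is discarded.
    """
    totals = defaultdict(float)
    for ot, start, end in history:
        if end < start:
            raise ValueError(f"job in {ot!r} ends before it starts")
        totals[ot] += end - start
    return dict(totals)

# Invented work history (decimal years) chosen to reproduce the sample output.
history = [
    ("mixing", 1940.00, 1950.00),
    ("curing", 1950.00, 1957.32),
    ("mixing", 1957.32, 1959.44),
    ("inspection", 1959.44, 1965.34),
]
print({ot: round(years, 2) for ot, years in time_per_ot(history).items()})
# {'mixing': 12.12, 'curing': 7.32, 'inspection': 5.9}
```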
There have not been many attempts in the literature to incorporate work area as a proxy measure of environmental exposure. Those that exist all suffer from one of two drawbacks: not utilizing all the available information, or the inability to take these data and sufficiently analyze them. Most studies disregard the sequence of jobs held, since they involve cross-sectional categorization or total time spent in suspected hazardous exposures.
This research effort will be concerned with the development of new techniques for analysis of work histories through stochastic modeling. This type of model, based upon relationships derived through probabilistic predictions, permits the incorporation of job-to-job transfers and job order in the detection of potential hazards and allows projection of health outcome in the industry.
1.3 Previous Statistical Modeling in the Area
Spirtas (8) used a linear models approach to test the hypothesis
that time spent in solvent exposed areas is greater for leukemia cases
than for their corresponding matched controls.
The job descriptions from the raw work histories were coded into Occupational Titles (OT's) in accordance with the dictionary, written by industrial hygienists, that is given in Appendix I.1. This dictionary reduces over one thousand jobs to seventy-seven OT's. These OT's were then grouped into Occupational Title Groups (OTG's), which were dependent on level of solvent exposure and other process-oriented criteria. Figures 1.1 and 1.2 show an example of a coded work history and the resulting output of the Experience Transformation Algorithm (ETA), which yields time spent in the OTG's as continuous variables.
The general linear model used was:
$$\mathbf{y}_{ij} = \mathbf{X}_{ij}\,\boldsymbol{\beta}_j + \boldsymbol{\epsilon}_{ij},$$
where $\mathbf{y}_{ij}$ is the vector of total time spent in solvent-exposed jobs as computed by the ETA, $\mathbf{X}_{ij}$ is the design matrix incorporating the case-control code as well as demographic covariates, $\boldsymbol{\beta}_j$ is the matrix of parameters to be estimated and $\boldsymbol{\epsilon}_{ij}$ is the error vector.
The only assumptions made were:
(1) $\boldsymbol{\epsilon}_{ij} \sim N(0, 1)$
(2) there are no interaction effects
(3) the covariate effects are linear
Note that the dependent (outcome) variable is exposure, while the case-control code is the independent variable.
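When the design matrix contains only an intercept and the binary case-control code, least squares reduces to a comparison of group means: the coefficient on the code equals the mean exposure time of cases minus that of controls. The sketch below illustrates this simplified case only; it ignores the matched design and the demographic covariates, and the function name and data are hypothetical.

```python
def ols_intercept_and_code(y, code):
    """Least-squares fit of y = b0 + b1 * code via the 2x2 normal equations X'X b = X'y."""
    n = len(y)
    s_x = sum(code)
    s_xx = sum(c * c for c in code)
    s_y = sum(y)
    s_xy = sum(c * v for c, v in zip(code, y))
    det = n * s_xx - s_x * s_x          # determinant of X'X; zero if all codes are equal
    if det == 0:
        raise ValueError("need both cases and controls")
    b0 = (s_xx * s_y - s_x * s_xy) / det
    b1 = (n * s_xy - s_x * s_y) / det
    return b0, b1

# Hypothetical solvent-exposed time (years): three cases, then three controls.
y = [12.0, 9.0, 6.0, 4.0, 5.0, 3.0]
code = [1, 1, 1, 0, 0, 0]              # 1 = leukemia case, 0 = control
b0, b1 = ols_intercept_and_code(y, code)
print(b0, b1)  # 4.0 5.0: control mean, and case mean minus control mean
```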
Several versions of this model were examined, incorporating
various constraints and restrictions such as analysis of male cases only
or changes in the matching design. The dimensions of the matrices and
vectors varied to reflect the changes mentioned above, but the overall
approach to analyzing the work histories remained the same.
1.4 Markov Chain and semi-Markov Process Models for Describing Health Outcomes
Although previous research in occupational health has implicated
certain agents as hazardous, it may be that a particular sequence of
exposure is necessary to initiate one or more adverse health outcomes.
In addition, workers in a hazardous area may have a tendency to move
to less dangerous locations in the plant.
These facts, coupled with some evidence that workers do follow particular job patterns, lead one to look for a statistical technique capable of incorporating the probability of job transfer with an element of memory. By memory, it is meant that a worker with experience and skill in one area may be more likely to return to that area later in his/her career.
Intuitively, this leads to investigation of Markov models.
Several research questions of interest can be explored through
the use of Markov models:
(1) Is the interval of time between exposure in some OT and retirement different for two cohorts representing different health outcomes?
(2) Does the interval between exposure and eventual sickness (sick
leave) differ in the two groups?
(3) Is the length of stay in some jobs affected by intervening
periods of sickness?
(4) Would different job histories and lengths of stay produce
changes in the health outcome variables?
Since the measures in the above questions can be represented
probabilistically in Markov models, a convenient framework would be
provided for describing the dynamics of movement in the rubber industry.
Changes in job mobility over time and differences with respect to jobs
held could be explored among varying health outcome groups.
If significant differences result, a first step toward linking that health outcome with the environment will have been achieved and future research will be warranted.
The ability to propose alternative models of movement and perhaps
maximum lengths of stay in certain jobs as suggested by (4) above will
be useful in preventing undesirable health consequences. This will permit predictions of sickness in the industry and be instrumental in estimating disability award needs for management.
1.5 General Description of Markov Processes
A fundamental method for observing and analyzing physical systems is deducing the state at one time, $t_2$, from the state the system was in at an earlier time, $t_1$. A stochastic process, in general, is an indexed collection of random variables $\{X(t): t \in T\}$, where the index $t$ ranges through $T$, which is often the set of non-negative integers. Very frequently $T$ may be interpreted as time. The development of the process can be thought of as transitions between mutually exclusive and exhaustive categories called states, which in this study are Occupational Title Groups or OTG's.
If, in addition to being able to make deductions about the process at time $t_2$ from information gained at time $t_1$, the deduction does not depend upon occurrences prior to $t_1$, the process or stochastic system is considered Markovian. We can say that the Markov property holds if the conditional probability of a future event, given a past event and the present state, depends only on the present state of the process. The Markov property can be expressed as:
$$P\{X_{t+1} = j \mid X_0 = k_0, X_1 = k_1, \ldots, X_t = i\} = P\{X_{t+1} = j \mid X_t = i\}$$
for $t = 0, 1, 2, \ldots$.
The conditional probabilities $P\{X_{t+1} = j \mid X_t = i\}$ are called transition probabilities; they give the chance of transferring to OTG $j$ from OTG $i$. If these probabilities are not time dependent, that is,
$$P\{X_{t+1} = j \mid X_t = i\} = P\{X_{r+1} = j \mid X_r = i\} \quad \text{for any } t, r \in \{0, 1, 2, \ldots\},$$
then we consider the transition probabilities to be stationary, or time homogeneous, and denote them by $p_{ij}$ (one-step transition probability).
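The $p_{ij}$'s are usually represented in matrix form:
$$P = \begin{pmatrix} p_{00} & \cdots & p_{0M} \\ \vdots & & \vdots \\ p_{M0} & \cdots & p_{MM} \end{pmatrix}$$
for an $(M + 1)$-state system. These probabilities must satisfy the equations below:
(1) $p_{ij} \ge 0$, $\quad i, j = 0, 1, \ldots, M$;
(2) $\sum_{j} p_{ij} = 1$, $\quad i = 0, 1, \ldots, M$, i.e., the states are exhaustive.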
A finite Markov chain can now be defined as a process that has the following properties:
(a) there are a finite number of states
(b) the Markov property holds
(c) the transition probabilities are stationary and satisfy (1) and (2) above
(d) a set of initial probabilities $P[X_0 = i]$ exists for all $i$.
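Conditions (1) and (2), together with the initial probabilities in (d), are enough to propagate the state distribution forward, since $P\{X_{t+1} = j\} = \sum_i P\{X_t = i\}\, p_{ij}$ with the stationary $p_{ij}$. A minimal sketch, assuming a hypothetical three-OTG transition matrix with states indexed from 0; the function names are illustrative only.

```python
def is_stochastic(P, tol=1e-9):
    """Conditions (1) and (2): square, p_ij >= 0, and each row sums to 1."""
    m = len(P)
    return all(
        len(row) == m and all(p >= 0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in P
    )

def distribution_after(pi0, P, n):
    """P{X_n = j} for all j, starting from the initial probabilities pi0[i] = P[X_0 = i]."""
    if not is_stochastic(P):
        raise ValueError("P violates conditions (1)-(2)")
    if any(p < 0 for p in pi0) or abs(sum(pi0) - 1.0) > 1e-9:
        raise ValueError("initial probabilities must be non-negative and sum to 1")
    pi = list(pi0)
    for _ in range(n):
        # P{X_{t+1} = j} = sum_i P{X_t = i} * p_ij
        pi = [sum(pi[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]
    return pi

# Hypothetical three-OTG matrix; the third state is absorbing.
P = [[0.8, 0.1, 0.1],
     [0.2, 0.7, 0.1],
     [0.0, 0.0, 1.0]]
print(distribution_after([1.0, 0.0, 0.0], P, 1))   # [0.8, 0.1, 0.1]
```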
Assuming a stochastic process as the model of worker movement, it must be made clear that the sets of OTG's visited, or paths, are determined by laws of probability. That is, these laws specify the probability of certain visits but do not govern the visits themselves.
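The distinction between probabilities and individual paths can be seen by simulation: each seed may yield a different sequence of OTG's drawn from the same $P$, while the transition probabilities govern how often each path arises. A sketch using only the standard library, with a hypothetical `simulate_path` helper and states indexed from 0:

```python
import random

def simulate_path(pi0, P, n_steps, seed=None):
    """Draw one path X_0, ..., X_n of a stationary Markov chain (states 0..M)."""
    rng = random.Random(seed)
    states = list(range(len(P)))
    path = [rng.choices(states, weights=pi0)[0]]                   # X_0 from initial probabilities
    for _ in range(n_steps):
        path.append(rng.choices(states, weights=P[path[-1]])[0])  # row of the current state
    return path

# Hypothetical three-OTG chain; different seeds may give different paths.
P = [[0.8, 0.1, 0.1],
     [0.2, 0.7, 0.1],
     [0.0, 0.0, 1.0]]
for seed in (1, 2, 3):
    print(simulate_path([1.0, 0.0, 0.0], P, n_steps=8, seed=seed))
```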
1.6 Semi-Markov Processes in Discrete Time
The major difference between a semi-Markov process (sMP) and a Markov chain is that in an sMP the time between transitions is random in length, and this time can depend upon the state the process is entering. An advantage of the discrete-time sMP is that the modeling is relatively simple computationally, as recursive formulae exist which permit fairly easy calculation of the expressions that we will derive in this section.
An sMP is a process in which successive OTG occupancies are governed by the transitions of a Markov process, but the length of stay in any area is governed by a random variable that depends upon the current OTG and the OTG to which the transition is being made. Thus, there are two random variables operating: the state selection and the length of stay in that state.
We assume that once a worker enters OTG $i$, the next work area $j$ is selected according to the transition matrix $P$. However, before making the transition to $j$ (with $j$ already known), the process "holds" in state $i$ for some time $\tau_{ij}$, called the holding time. These holding times are all positive integer valued and are governed by a probability mass function $h_{ij}(\cdot)$. Thus,
$$P\{\tau_{ij} = n\} = h_{ij}(n), \quad n = 1, 2, \ldots; \quad i, j = 1, 2, \ldots, M.$$
It is also assumed that the holding times are at least one unit in length, so that $h_{ij}(0) = P\{\tau_{ij} = 0\} = 0$. Summarizing, once the process has selected both $j$ and $\tau_{ij}$, it moves to state $j$ and selects the next state, say $k$, from the probabilities $p_{j1}, p_{j2}, \ldots, p_{jM}$, and a holding time $\tau_{jk}$, $k = 1, \ldots, M$.
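The two-stage mechanism just summarized (first choose $j$ from row $i$ of $P$, then draw $\tau_{ij}$ from $h_{ij}(\cdot)$) can be sketched as a simulation. This is a minimal illustration, assuming states are indexed from 0 and each $h_{ij}$ is stored as a list whose entry $n$ is $h_{ij}(n)$, with entry 0 equal to zero; `simulate_smp` and its arguments are hypothetical names.

```python
import random

def simulate_smp(start, P, H, horizon, seed=None):
    """Simulate a discrete-time semi-Markov process from `start` until time `horizon`.

    P[i][j]    : probability that the next state after i is j
    H[i][j][n] : holding-time pmf h_ij(n) = P{tau_ij = n}, with H[i][j][0] == 0
    Returns a list of visits as (state, entry_time, holding_time).
    """
    rng = random.Random(seed)
    states = list(range(len(P)))
    visits, state, t = [], start, 0
    while t < horizon:
        nxt = rng.choices(states, weights=P[state])[0]            # 1) choose j
        pmf = H[state][nxt]
        tau = rng.choices(range(len(pmf)), weights=pmf)[0]        # 2) draw tau_ij given j
        visits.append((state, t, tau))
        t, state = t + tau, nxt
    return visits

# Hypothetical two-OTG example with alternating moves.
P = [[0.0, 1.0], [1.0, 0.0]]
H = [[[0.0, 1.0], [0.0, 0.0, 1.0]],    # tau_01 is always 2 units
     [[0.0, 1.0], [0.0, 1.0]]]         # tau_10 is always 1 unit
print(simulate_smp(0, P, H, horizon=6))  # [(0, 0, 2), (1, 2, 1), (0, 3, 2), (1, 5, 1)]
```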
A real transition is a transition from state $i$ to some state $j$, $i \ne j$. A virtual transition is a move from state $i$ to state $i$. In some processes it is impossible to have a virtual transition, but here we will permit $p_{ii} \ge 0$, so that $p_{ii}$ may be positive.
Now, as observers we know that the process has entered state $i$, but we have no way of knowing the next state. We therefore introduce the notion of waiting time. The time spent in $i$ is called $\tau_i$, and the corresponding mass function is $w_i(\cdot)$. The following relationship is obtained:
$$w_i(m) = P\{\tau_i = m\} = \sum_{j} p_{ij}\, h_{ij}(m).$$
That is, the probability that the system will spend $m$ time units in $i$ if it is going to $j$, $h_{ij}(m)$, is multiplied by the probability of going from $i$ to $j$, $p_{ij}$, and then summed over all possible destinations. Thus, a waiting time is a holding time that is unconditional on the next state visited.
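The waiting-time mass function is a probability-weighted mixture of the holding-time mass functions out of $i$. A sketch, assuming each $h_{ij}$ is stored as a list whose entry $n$ is $h_{ij}(n)$ and states are indexed from 0; the numbers and function names are hypothetical.

```python
def waiting_time_pmf(i, P, H):
    """w_i(m) = sum_j p_ij h_ij(m), returned as a list indexed by m."""
    size = max(len(H[i][j]) for j in range(len(P)))
    w = [0.0] * size
    for j, p_ij in enumerate(P[i]):
        for m, h in enumerate(H[i][j]):
            w[m] += p_ij * h
    return w

def mean_of_pmf(pmf):
    """Mean of a pmf stored as a list indexed by its support 0, 1, 2, ..."""
    return sum(m * p for m, p in enumerate(pmf))

# Hypothetical: from state 0, go to 0 after 1 unit or to 1 after 3 units, each w.p. 1/2.
P = [[0.5, 0.5], [1.0, 0.0]]
H = [[[0.0, 1.0], [0.0, 0.0, 0.0, 1.0]],
     [[0.0, 1.0], [0.0, 1.0]]]
w0 = waiting_time_pmf(0, P, H)
print(w0, mean_of_pmf(w0))   # [0.0, 0.5, 0.0, 0.5] 2.0
```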
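Other parameters that will be useful in our later work are defined below, along with the formulae for their computation.
Interval transition probability ($\phi_{ij}(n)$) - probability that the process will be in state $j$ at time $n$, given that it entered state $i$ at time zero:
$$\phi_{ij}(n) = \delta_{ij}\,{}^{>}w_i(n) + \sum_{k=1}^{M} p_{ik} \sum_{m=0}^{n} h_{ik}(m)\, \phi_{kj}(n - m),$$
where $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \ne j$, and ${}^{>}w_i(n) = P\{\tau_i > n\}$ is the probability that the waiting time in $i$ exceeds $n$.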
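The recursion can be evaluated layer by layer in $n$, because every term on the right uses $\phi_{kj}$ at an earlier time (the $m = 0$ term vanishes since $h_{ik}(0) = 0$). A pure-Python sketch, assuming each $h_{ij}$ is stored as a list whose entry $n$ is $h_{ij}(n)$ (entry 0 equal to zero) and states indexed from 0; the function name and example numbers are hypothetical, and the survivor term is computed as ${}^{>}w_i(n) = 1 - \sum_{m \le n} w_i(m)$.

```python
def interval_transition_probs(P, H, N):
    """Return phi with phi[n][i][j] = phi_ij(n) for n = 0, ..., N.

    P[i][k]    : embedded transition probabilities p_ik
    H[i][k][m] : holding-time pmf h_ik(m), with H[i][k][0] == 0
    """
    M = len(P)

    def h(i, k, m):
        pmf = H[i][k]
        return pmf[m] if m < len(pmf) else 0.0

    # Survivor function >w_i(n) = P{tau_i > n} = 1 - sum_{m<=n} sum_k p_ik h_ik(m).
    surv = []
    for i in range(M):
        cum, row = 0.0, []
        for n in range(N + 1):
            cum += sum(P[i][k] * h(i, k, n) for k in range(M))
            row.append(1.0 - cum)
        surv.append(row)

    phi = []
    for n in range(N + 1):
        layer = [[0.0] * M for _ in range(M)]
        for i in range(M):
            for j in range(M):
                total = surv[i][n] if i == j else 0.0      # delta_ij * >w_i(n)
                for k in range(M):
                    for m in range(1, n + 1):              # h_ik(0) = 0, so skip m = 0
                        total += P[i][k] * h(i, k, m) * phi[n - m][k][j]
                layer[i][j] = total
        phi.append(layer)
    return phi

# Hypothetical two-OTG example, states indexed from 0.
P = [[0.5, 0.5], [1.0, 0.0]]
H = [[[0.0, 1.0], [0.0, 0.5, 0.5]],   # tau_00 = 1; tau_01 is 1 or 2
     [[0.0, 0.0, 1.0], [0.0, 1.0]]]   # tau_10 = 2; tau_11 unused (p_11 = 0)
phi = interval_transition_probs(P, H, 5)
print(phi[1])  # [[0.75, 0.25], [0.0, 1.0]]
```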
First passage time probabilities ($f_{ij}(n)$) - probability that the first passage from state $i$ to state $j$ will require $n$ time units:
$$f_{ij}(n) = \sum_{\substack{r=1 \\ r \ne j}}^{M} \sum_{m=0}^{n} p_{ir}\, h_{ir}(m)\, f_{rj}(n - m) + p_{ij}\, h_{ij}(n)$$
mean recurrence time ($\theta_{jj}$) - the mean time it takes the process to return to the same state:
$$\theta_{jj} = \sum_{n=0}^{\infty} n\, f_{jj}(n)$$
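mean first passage time ($\theta_{ij}$) - mean time required to go from state i to state j for the first time:
$$\theta_{ij} = \bar{\tau}_i + \sum_{\substack{r=1 \\ r \neq j}}^{M} p_{ir}\, \theta_{rj},$$
where $\bar{\tau}_i$ is the mean waiting time in state i (9, 10, 11, 12, 13).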
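For a fixed target state j, the equations for the mean first passage times (written $\theta_{ij}$ here) form a linear system in the M unknowns $\theta_{1j}, \ldots, \theta_{Mj}$, and the mean recurrence time $\theta_{jj}$ comes out of the same system. The minimal Python sketch below assumes the embedded transition matrix and the mean waiting times $\bar{\tau}_i$ are already known and solves the system by Gaussian elimination; the function name is hypothetical and states are indexed from 0. For an ordinary Markov chain ($\bar{\tau}_i = 1$) the recurrence time of state j equals $1/\pi_j$, which the example checks ($\pi_0 = 5/13$).

```python
def mean_first_passage(P, tau_bar, j):
    """Return [theta_0j, ..., theta_(M-1)j] for target state j (0-based).

    Solves theta_ij = tau_bar_i + sum_{r != j} p_ir theta_rj, i.e.
    (I - P with column j zeroed) theta = tau_bar, by Gaussian elimination.
    theta_jj is the mean recurrence time of state j.
    """
    M = len(P)
    A = [[(1.0 if i == r else 0.0) - (P[i][r] if r != j else 0.0)
          for r in range(M)] for i in range(M)]
    b = list(tau_bar)
    for c in range(M):                       # forward elimination
        piv = max(range(c, M), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, M):
            f = A[r][c] / A[c][c]
            for cc in range(c, M):
                A[r][cc] -= f * A[c][cc]
            b[r] -= f * b[c]
    theta = [0.0] * M
    for r in range(M - 1, -1, -1):           # back substitution
        s = sum(A[r][cc] * theta[cc] for cc in range(r + 1, M))
        theta[r] = (b[r] - s) / A[r][r]
    return theta


# Markov chain special case: unit mean waiting times in both states.
P = [[0.2, 0.8],
     [0.5, 0.5]]
theta = mean_first_passage(P, [1.0, 1.0], j=0)   # theta_00 close to 13/5
```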
1.7 Statistical Estimation for Markov and semi-Markov Processes
A basic paper that considers statistical estimation of the parameters of a Markov process is Zahl (14). A particular problem he considers is estimation of these parameters when only a sample of path-coded data is available, viz., the observations on each individual are taken periodically and these subsets of the complete paths are used to estimate the parameters of the model. The model considered is described below.
During the course of a follow-up study, patients are classified into k states, $S_1, S_2, \ldots, S_k$. The study is of length T, and the total time of observation is from t = 0 to t = T. At time $t_1$ a patient is in $S_i$, and at time $t_2 > t_1$ he is in state $S_j$, where i may or may not equal j. The first assumption made is that of stationarity; the second is that of the Markov property.
Assuming the independent behavior of each individual, the maximum likelihood estimate of the transition probability $p_{ij}(t)$ is
$$\hat{p}_{ij}(t) = \frac{n_{ij}}{\sum_{j} n_{ij}}, \qquad i, j = 1, 2, \ldots, k,$$
where $n_{ij}$ is the total number of transitions from state $S_i$ to $S_j$ in the time interval (0, t). It is noted that the estimating procedure is independent of the scheme of entry.
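The estimate above only requires counting transitions in the path-coded sequences. A minimal Python sketch, assuming each path is a list of states observed at successive equally spaced times (the function name `estimate_transition_matrix` and the example paths are hypothetical), might look like this; rows with no observed transitions are left as zeros, which is an arbitrary choice.

```python
from collections import Counter


def estimate_transition_matrix(paths, states):
    """Maximum likelihood estimate p_hat_ij = n_ij / sum_j n_ij.

    paths: path-coded sequences, each a list of states observed at equally
    spaced times. Returns (P_hat as a dict of dicts, transition counts).
    """
    counts = Counter()
    for path in paths:
        for a, b in zip(path, path[1:]):
            counts[(a, b)] += 1
    P_hat = {}
    for i in states:
        row_total = sum(counts[(i, j)] for j in states)
        P_hat[i] = {j: counts[(i, j)] / row_total if row_total else 0.0
                    for j in states}
    return P_hat, counts


# Two hypothetical quarterly OTG paths.
P_hat, n = estimate_transition_matrix([[1, 1, 3, 2], [1, 2, 2]], [1, 2, 3])
# P_hat[1] == {1: 1/3, 2: 1/3, 3: 1/3}; P_hat[2][2] == 1.0; P_hat[3][2] == 1.0
```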
Now, Doob (15) showed that there exists a unique matrix Q such that
$$P(t) = I + Qt + Q^2 t^2/2! + \cdots = \exp\{Qt\}.$$
The elements of Q take the form
$$q_{ij} = \lim_{t \to 0} \frac{p_{ij}(t) - p_{ij}(0)}{t}$$
and are thus called the instantaneous transition rates, which are independent of t.
Q can be estimated by letting $\hat{P}^*(t) = \hat{P}(t) - I$, and thus we obtain
$$\hat{Q} = \frac{1}{t}\left[\hat{P}^*(t) - \tfrac{1}{2}\big(\hat{P}^*(t)\big)^2 + \tfrac{1}{3}\big(\hat{P}^*(t)\big)^3 - \cdots\right],$$
provided the series converges. It is noted that the $p_{ij}(t)$'s are determined uniquely by the $q_{ij}$'s, and Zahl suggests estimating Q and then estimating the transition matrix P from the relationships above.
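The series for $\hat{Q}$ is the power series of the matrix logarithm of $\hat{P}(t)$, divided by t. The Python sketch below truncates it after a fixed number of terms; the truncation length, the function names and the test rates are assumptions for illustration, and the result is only meaningful when the series converges (roughly, when $\hat{P}(t)$ is close enough to I). The example builds an exact two-state $P(t) = I + Qc$ with $c = (1 - e^{-(a+b)t})/(a+b)$ and checks that the series recovers Q.

```python
import math


def matmul(A, B):
    """Plain matrix product of two square nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def estimate_rate_matrix(P_hat, t, n_terms=60):
    """Q_hat = (1/t)[P* - P*^2/2 + P*^3/3 - ...] with P* = P_hat - I.

    The series is truncated after n_terms terms; it converges only when
    the eigenvalues of P* lie inside the unit disk.
    """
    M = len(P_hat)
    P_star = [[P_hat[i][j] - (1.0 if i == j else 0.0) for j in range(M)]
              for i in range(M)]
    Q = [[0.0] * M for _ in range(M)]
    power = [row[:] for row in P_star]          # current power P*^r
    for r in range(1, n_terms + 1):
        sign = 1.0 if r % 2 == 1 else -1.0
        for i in range(M):
            for j in range(M):
                Q[i][j] += sign * power[i][j] / r
        power = matmul(power, P_star)
    return [[q / t for q in row] for row in Q]


a, b, t = 0.3, 0.2, 1.0                # hypothetical rates and interval
c = (1.0 - math.exp(-(a + b) * t)) / (a + b)
P_t = [[1 - a * c, a * c],             # exact P(t) for a two-state chain
       [b * c, 1 - b * c]]
Q_hat = estimate_rate_matrix(P_t, t)   # close to [[-0.3, 0.3], [0.2, -0.2]]
```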
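The asymptotic mean of $\hat{p}_{ij}(t)$ is $p_{ij}(t)$, and the asymptotic variance is given by Zahl in terms of $\frac{1}{n} c_{\alpha\beta\gamma\delta}(n)$, the asymptotic covariance of $\hat{q}_{\alpha\beta}$ and $\hat{q}_{\gamma\delta}$.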
Important questions arise when comparing two processes that have
the same state space, as will be done on several occasions in this
study.
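Zahl proposes a test for determining whether the rate matrices Q (and hence the transition matrices) for two periods of time are the same. It is suggested that this test would be useful in deciding whether to combine two processes from adjacent periods of time. More formally, the null hypothesis is stated as $H_0: q^{(1)}_{\alpha\beta} = q^{(2)}_{\alpha\beta}$, $\alpha \neq \beta$. The test statistic is
$$U_{\alpha\beta} = \frac{\hat{q}^{(1)}_{\alpha\beta} - \hat{q}^{(2)}_{\alpha\beta}}{\sqrt{\operatorname{Var}\big(\hat{q}^{(1)}_{\alpha\beta}\big) + \operatorname{Var}\big(\hat{q}^{(2)}_{\alpha\beta}\big)}} \sim N(0, 1),$$
where the superscripts represent the two processes.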
The variance of $\hat{q}_{\alpha\beta}$ is given (for a small period of time between observations) as
$$\operatorname{Var}(\hat{q}_{\alpha\beta}) = \frac{q_{\alpha\beta}}{\sum_{b} n_b\, e_{b\alpha}(T)},$$
with $e_{b\alpha}(T)$ being the expected time spent in state α during the period (0, T) for an individual who is initially in state b.
The test statistic for a simultaneous test for several α, β is given by
$$\sum_{i=1}^{p} U^2_{\alpha_i \beta_i} \sim \chi^2_p, \qquad \alpha_i \neq \beta_i, \quad i = 1, 2, \ldots, p.$$
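Analogously, the test for the null hypothesis $H_0: p^{(1)}_{\alpha\beta}(t) = p^{(2)}_{\alpha\beta}(t)$ is given as
$$V_{\alpha\beta} = \frac{\hat{p}^{(1)}_{\alpha\beta}(T) - \hat{p}^{(2)}_{\alpha\beta}(T)}{\sqrt{\operatorname{Var}\big(\hat{p}^{(1)}_{\alpha\beta}(T)\big) + \operatorname{Var}\big(\hat{p}^{(2)}_{\alpha\beta}(T)\big)}} \sim N(0, 1) \quad \text{for large } n.$$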
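The $V_{\alpha\beta}$ statistic can be computed directly from transition counts once a variance for $\hat{p}_{\alpha\beta}$ is chosen. The text does not state that variance, so the Python sketch below assumes the binomial form $\hat{p}(1 - \hat{p})/n_\alpha$, with $n_\alpha$ the number of observed transitions out of state α in each process, and reports a two-sided normal p-value via `math.erfc`. The function name and the counts are hypothetical; the sketch fails if an estimate is exactly 0 or 1 in both processes (zero variance).

```python
import math


def v_statistic(n1_ab, n1_a, n2_ab, n2_a):
    """Two-sample normal statistic for H0: p(1)_ab = p(2)_ab.

    n1_ab, n2_ab: observed a -> b transitions in processes 1 and 2.
    n1_a,  n2_a:  all observed transitions out of state a.
    Assumption: Var(p_hat) = p_hat (1 - p_hat) / n_a (binomial variance).
    """
    p1, p2 = n1_ab / n1_a, n2_ab / n2_a
    var = p1 * (1.0 - p1) / n1_a + p2 * (1.0 - p2) / n2_a
    v = (p1 - p2) / math.sqrt(var)
    p_value = math.erfc(abs(v) / math.sqrt(2.0))  # two-sided, N(0, 1)
    return v, p_value


# Hypothetical counts: 30 of 100 moves out of state a go to b in cohort 1,
# 20 of 100 in cohort 2.
v, p = v_statistic(30, 100, 20, 100)   # v is about 1.64
```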
An extremely important condition that is often overlooked or simply assumed is that of the Markov property. Zahl gives the following test for the hypothesis that $\hat{p}_{hij}(t)$ will depend upon h, the state the system is in before state i. Let $n_{hij}$ represent the number of transitions from state h to state i to state j; $\sum_h n_{hij} = n_{ij}$. If the Markov property holds, then for each (i, j) pair,
$$\hat{p}_{hij} = \frac{n_{hij}}{\sum_j n_{hij}}$$
will be equal for all h. If the condition does not hold, then these probabilities will be different for at least one h within an (i, j) pair. The chi-square test for homogeneity is given as
$$\chi^2 = \sum_{h \in E} \sum_{i \in E} \sum_{j=1}^{k} \frac{\left(n_{hij} - E n_{hij}\right)^2}{E n_{hij}},$$
where
$$E n_{hij} = \frac{\left(\sum_{j=1}^{k} n_{hij}\right)\left(\sum_{h \in E} n_{hij}\right)}{\sum_{h \in E} \sum_{j=1}^{k} n_{hij}}.$$
The statistic is distributed as $\chi^2$ with $m[(k-1)m - (k-1)]$ degrees of freedom, with m the number of elements in E, the set of indices i for which the transition rate $q_{ii}$ is not zero.
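The homogeneity test above is a set of ordinary contingency-table chi-squares, one table per conditioning state i, with rows h and columns j. A minimal Python sketch, assuming path-coded sequences as input and treating E as a given list of states (the function name is hypothetical; only the statistic and its degrees of freedom are returned, not a p-value), could be written as follows. In the example the next state is fully determined by the previous two, so each of the two 2 × 2 tables is perfectly associated and the statistic equals the number of triples used, 38.

```python
from collections import Counter


def markov_property_chi2(paths, E, states):
    """Chi-square homogeneity test of the Markov property (Zahl-style).

    For each i in E the table has rows h in E and columns j in states;
    expected counts are E n_hij = (row total)(column total)/(table total).
    Returns (statistic, degrees of freedom m[(k - 1)m - (k - 1)]).
    """
    n = Counter()
    for path in paths:
        for h, i, j in zip(path, path[1:], path[2:]):
            n[(h, i, j)] += 1
    chi2 = 0.0
    for i in E:
        total = sum(n[(h, i, j)] for h in E for j in states)
        if total == 0:
            continue
        for h in E:
            row = sum(n[(h, i, j)] for j in states)
            for j in states:
                col = sum(n[(g, i, j)] for g in E)
                expected = row * col / total
                if expected > 0:
                    chi2 += (n[(h, i, j)] - expected) ** 2 / expected
    m, k = len(E), len(states)
    return chi2, m * ((k - 1) * m - (k - 1))


# Hypothetical path where (1,1)->2, (1,2)->2, (2,2)->1, (2,1)->1.
chi2, df = markov_property_chi2([[1, 1, 2, 2] * 10], E=[1, 2], states=[1, 2])
```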
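The last test given by Zahl concerning the fit of the model is a test for time stationarity, or more simply, time trend in the probabilities. The test statistic for the null hypothesis of no trend versus change in transition probability with time is
$$\chi^2 = \sum_{h=1}^{2} \sum_{i \in E} \sum_{j=1}^{k} \frac{\left(n^{(h)}_{ij} - E n^{(h)}_{ij}\right)^2}{E n^{(h)}_{ij}},$$
with $m[2(k-1) - (k-1)]$ degrees of freedom, where the superscript (h), h = 1, 2, indexes the two time periods being compared and
$$E n^{(h)}_{ij} = \frac{\left(\sum_{j=1}^{k} n^{(h)}_{ij}\right)\left(\sum_{h=1}^{2} n^{(h)}_{ij}\right)}{\sum_{h=1}^{2} \sum_{j=1}^{k} n^{(h)}_{ij}}.$$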
Anderson and Goodman (16), aside from giving the likelihood estimates and their asymptotic distribution for transition probabilities, offer some basic tests of hypotheses not previously considered by Zahl.
(a) The transition probabilities of a first order chain are constant; that is, they are independent of time t. The null hypothesis is stated $H_0: p_{ij}(t) = p_{ij}$ for all t = 1, ..., T. The joint estimates $\hat{p}_{ij}(t)$ can be represented in an m × T table for a given i and j = 1, 2, ..., m:
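$$\begin{array}{c|cccc}
t \backslash j & 1 & 2 & \cdots & m \\ \hline
1 & \hat{p}_{i1}(1) & \hat{p}_{i2}(1) & \cdots & \hat{p}_{im}(1) \\
2 & \hat{p}_{i1}(2) & \hat{p}_{i2}(2) & \cdots & \hat{p}_{im}(2) \\
\vdots & \vdots & \vdots & & \vdots \\
T & \hat{p}_{i1}(T) & \hat{p}_{i2}(T) & \cdots & \hat{p}_{im}(T)
\end{array}$$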
To test the hypothesis of stationarity we obtain
$$\chi^2_i = \sum_{t,j} n_i(t-1)\,\frac{\left[\hat{p}_{ij}(t) - \hat{p}_{ij}\right]^2}{\hat{p}_{ij}}$$
with (m − 1)(T − 1) degrees of freedom. Note that we are actually testing whether, for any given column, all the elements are equal.
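The Anderson and Goodman statistic can be computed directly from the counts $n_{ij}(t)$ of transitions from i at time t − 1 to j at time t. In the Python sketch below (function name and counts hypothetical), $n_i(t-1) = \sum_j n_{ij}(t)$ and the pooled estimate is $\hat{p}_{ij} = \sum_t n_{ij}(t) / \sum_t n_i(t-1)$, following the usual Anderson and Goodman definitions; cells with a zero pooled estimate are skipped, which in practice should also lower the degrees of freedom.

```python
def stationarity_chi2(counts_by_t, i):
    """Anderson-Goodman test of H0: p_ij(t) = p_ij for one origin state i.

    counts_by_t[t][i][j] = n_ij(t), the number of i -> j transitions between
    observation t - 1 and observation t (t = 1..T, stored 0-based).
    Returns (chi-square statistic, (m - 1)(T - 1) degrees of freedom).
    """
    T = len(counts_by_t)
    m = len(counts_by_t[0])
    n_prev = [sum(counts_by_t[t][i]) for t in range(T)]        # n_i(t - 1)
    total = sum(n_prev)
    p_pool = [sum(counts_by_t[t][i][j] for t in range(T)) / total
              for j in range(m)]                                # pooled p_ij
    chi2 = 0.0
    for t in range(T):
        if n_prev[t] == 0:
            continue
        for j in range(m):
            if p_pool[j] == 0:
                continue
            p_t = counts_by_t[t][i][j] / n_prev[t]              # p_ij(t)
            chi2 += n_prev[t] * (p_t - p_pool[j]) ** 2 / p_pool[j]
    return chi2, (m - 1) * (T - 1)


# Hypothetical two-state counts for T = 2 periods; row 0 flips completely.
counts = [[[10, 0], [5, 5]],
          [[0, 10], [5, 5]]]
chi2, df = stationarity_chi2(counts, i=0)  # chi2 = 20.0, df = 1
```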
(b) Assuming stationarity, the chain is of a given order. A first order chain is one in which the dependency or memory is determined by the one preceding state the chain occupies before making a transition. A higher order chain is one in which the transition probabilities are determined by two or more preceding states.
Let
$$\hat{p}_{ijk} = \frac{n_{ijk}}{\sum_{k=1}^{m} n_{ijk}}$$
represent the probability of going from state i to state j to state k at successive time intervals. Note that $n_{ijk}$ has already been defined in the discussion of Zahl's work.
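Consider the hypothesis that the chain is of first order versus second order. Then $H_0$ can be expressed as $H_0: p_{ijk} = p_{jk}$ for i = 1, 2, ..., m. The maximum likelihood estimate of $p_{jk}$ is
$$\hat{p}_{jk} = \frac{\sum_{i=1}^{m} n_{ijk}}{\sum_{i=1}^{m} \sum_{k=1}^{m} n_{ijk}},$$
so $H_0$ can be tested, for each j, using
$$\chi^2_j = \sum_{i,k} n_{ij}^{*}\,\frac{\left(\hat{p}_{ijk} - \hat{p}_{jk}\right)^2}{\hat{p}_{jk}} \sim \chi^2_{(m-1)^2},$$
where
$$n_{ij}^{*} = \sum_{k} n_{ijk} = \sum_{k} \sum_{t=1}^{T} n_{ijk}(t).$$
This test for order can be used jointly for all i, j, k = 1, ..., m by considering
$$\chi^2 = \sum_{j} \chi^2_j \sim \chi^2_{m(m-1)^2}.$$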
Billingsley (17) offers a test for independence of the sequence derived from a test of independence for contingency tables. Letting $(n_{ij})$ be the matrix of the number of transitions from state i to state j, the chi-square statistic used is
$$\sum_{i,j} \frac{\left(n_{ij} - n_{i\cdot}\, n_{\cdot j}/n\right)^2}{n_{i\cdot}\, n_{\cdot j}/n}, \qquad n_{i\cdot} = \sum_{j} n_{ij}, \quad n_{\cdot j} = \sum_{i} n_{ij},$$
which is asymptotically chi-square with $(s-1)^2$ degrees of freedom. (Notation is that of Anderson and Goodman.) This test is also given in Hoel (18).
Billingsley also considers a test for the hypothesis that two independent Markov chains actually have the same transition matrices, $H_0: p_{ij} = q_{ij}$, that differs from the one previously mentioned by Zahl. Letting $f_{ij}$ be the cell count associated with $\hat{p}_{ij}$ (the same as $n_{ij}$ mentioned in Anderson and Goodman) and $g_{ij}$ be the cell count associated with $\hat{q}_{ij}$, then
$$\sum_{i,j} \frac{\left(f_{ij}\, g_{i\cdot} - g_{ij}\, f_{i\cdot}\right)^2}{f_{i\cdot}\, g_{i\cdot}\, (f_{ij} + g_{ij})} \sim \chi^2_{s(s-1)}.$$
The actual statistic is derived using the common estimate of the true probability, $(f_{ij} + g_{ij})/(f_{i\cdot} + g_{i\cdot})$. This test can be very useful when we desire a comparison between two cohorts with respect to movements through certain states of interest and not necessarily through the entire state space.
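A minimal Python sketch of the two-chain comparison, assuming the transition counts of the two cohorts are given as square nested lists `f` and `g` over the same s states (the function name and counts are hypothetical). For each origin state i the summand is algebraically the Pearson chi-square of the 2 × s table formed by row i of the two chains, with expected counts based on the common estimate $(f_{ij} + g_{ij})/(f_{i\cdot} + g_{i\cdot})$.

```python
def two_chain_chi2(f, g):
    """Billingsley-style test of H0: p_ij = q_ij for two independent chains.

    f[i][j], g[i][j]: observed i -> j transition counts in chains 1 and 2.
    Statistic: sum_ij (f_ij g_i. - g_ij f_i.)^2 / (f_i. g_i. (f_ij + g_ij)).
    Empty rows or cells are skipped; the nominal df s(s - 1) should then be
    reduced accordingly.
    """
    s = len(f)
    chi2 = 0.0
    for i in range(s):
        fi, gi = sum(f[i]), sum(g[i])
        if fi == 0 or gi == 0:
            continue
        for j in range(s):
            c = f[i][j] + g[i][j]
            if c == 0:
                continue
            chi2 += (f[i][j] * gi - g[i][j] * fi) ** 2 / (fi * gi * c)
    return chi2, s * (s - 1)


# Hypothetical counts for two cohorts over the same two states.
f = [[10, 0], [0, 10]]
g = [[0, 10], [10, 0]]
chi2, df = two_chain_chi2(f, g)   # chi2 = 40.0, df = 2
```

Because only the rows for the origin states of interest need to be summed, restricting the outer loop to a subset of i gives the partial comparison described in the text.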
The basic paper for estimation of semi-Markov processes when micro data are available (as opposed to aggregate data) is Moore and Pyke (19). Letting X(n) denote the time spent in state $J_{n-1}$ before transferring to the next state $J_n$, we define
$$P(J_n = j \mid J_0, \ldots, J_{n-1}) = p_{J_{n-1},\, j}.$$
The transition function of the process is $Q_{ij}(x)$, equal to
$$Q_{ij}(x) = P\big(J_n = j,\; X(n) \le x \mid J_0, \ldots, J_{n-1} = i,\; X(1), \ldots, X(n-1)\big).$$
From the above definitions it is seen that
$$F_{ij}(t) = \begin{cases} p_{ij}^{-1}\, Q_{ij}(t), & p_{ij} > 0, \\ 0, & p_{ij} = 0, \end{cases}$$
where the authors let $F_{ij} = H_i$. Moore and Pyke give the following estimators:
$$\hat{Q}_{ij}(x; t) = \hat{p}_{ij}(t)\, \hat{H}_i(x; t),$$
where $\hat{p}_{ij}(t)$ is the same maximum likelihood estimate used previously, and
$$\hat{H}_i(x; t) = N_i(t)^{-1} \sum_{k=1}^{N_i(t)} \varepsilon(x - X_{ik}),$$
where $X_{ik}$ is the holding time of the k-th visit to state i,
$$\varepsilon(u) = \begin{cases} 1, & u \ge 0, \\ 0, & u < 0, \end{cases}$$
and $N_i(t)$ is the number of times $J_k = i$. $\hat{H}_i(x; t)$ is the ordinary empirical distribution function determined from $N_i(t)$ and the holding times in state i.
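The Moore and Pyke estimator is the product of a transition-probability estimate and an empirical distribution function of holding times pooled over destinations (the assumption $F_{ij} = H_i$). The Python sketch below assumes each work history is a chronological list of (state, holding time) visits; the input format, the function name and the decision to drop each history's final visit (whose exit is not observed) are assumptions for illustration, not part of the original method description.

```python
from collections import Counter, defaultdict


def moore_pyke_estimators(histories, states):
    """Return a function Q_hat(i, j, x) = p_hat_ij * H_hat_i(x).

    histories: one list per individual of (state, holding_time) visits in
    chronological order. A visit's holding time is the X(n) spent there
    before moving to the next visit's state. The last visit of each history
    has no observed exit and is dropped (an assumption about censoring).
    """
    n = Counter()                 # n[(i, j)] = number of i -> j transitions
    holds = defaultdict(list)     # holding times X_ik of completed visits to i
    for visits in histories:
        for (i, x), (j, _) in zip(visits, visits[1:]):
            n[(i, j)] += 1
            holds[i].append(x)

    def p_hat(i, j):
        row = sum(n[(i, k)] for k in states)
        return n[(i, j)] / row if row else 0.0

    def H_hat(i, x):
        # empirical cdf N_i^{-1} * sum_k eps(x - X_ik), eps(u) = 1 if u >= 0
        xs = holds[i]
        return sum(1 for xk in xs if x - xk >= 0) / len(xs) if xs else 0.0

    def Q_hat(i, j, x):
        return p_hat(i, j) * H_hat(i, x)

    return Q_hat


# Hypothetical history: OTG 1 for 2 units, OTG 2 for 5, OTG 1 for 3, then OTG 3.
Q_hat = moore_pyke_estimators([[(1, 2), (2, 5), (1, 3), (3, 1)]], [1, 2, 3])
```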
1.8 Health Applications of Markov Processes
Although there are numerous fields in which Markov models have
been applied, we will restrict ourselves in this review to applications
in health related fields, i.e., health services, medicine and biology.
The purpose of this section is not only to learn how Markov methodology
has been applied, but how to critique the use of these models when
they have been applied inappropriately or not tested properly. Several
examples cited have formed a basis for the work to be done in this study
and it is important to be fully familiar with these approaches.
An early use of Markov processes in the health field is that of
Marshall and Goldhamer (20) in their study of the epidemiology of
mental disease.
The intended emphasis of this paper is model con-
struction of simple Markov processes, although there is concern with
the substantive issue, viz., characterizing the age distribution of
the mentally ill.
The primary interest is in determining age-specific
incidence rates of mental disease.
The authors have developed several
simple models of the states involved in progressing from sanity to
insanity to institutionalization and eventually death.
In this framework they define incidence as characterizing the frequency with which certain transitions between states occur, and prevalence as the number of persons who at any one time are in particular states.
Death and hos-
pitalization are considered to be absorbing states and the process is
considered Markovian.
The authors recognize two basic assumptions:
the ability of the model to approximate a real situation, and the
structure of the model itself with constant transition probabilities.
They suggest it would have been better to regard the transition probabilities as functions of age, but they attempt to approximate the
process and estimate the parameters for five year age groups.
They do
not stress whether the process is truly Markovian and it is not clear
whether the authors even considered this issue.
Another example of a Markovian study of disease development is
found in Bush, Chen and Zaremba (21).
They use a stationary Markov
chain to describe the movement of 1000 adults with primary active
tuberculosis.
The 22 states are based upon disease form, disability levels, and disease development levels. In addition, a reservoir or initial state is included.
Four basic requirements for a finite Markov
chain are each considered separately:
finite state space, standardized
time interval, Markov property and stationary transition probabilities.
It would seem that the Markov assumption would be difficult to meet,
particularly since future disease states and medical prognoses would be
highly dependent upon past history of disease progress.
The authors deal
with this by creating states in such a way that individuals in the same
state would have a common history with others in that state.
In effect, this is converting an n-th order chain into a first order chain. They then check for aperiodicity and irreducibility to find the convergent matrix of an equilibrium health process:
$$\lim_{t \to \infty} p^{t}_{ij} = \pi_{ij}.$$
Irreducibility is dealt with in an unusual way. Rather than facing the inconvenient and tedious task of determining whether all states communicate with each other, the states are constructed so that all states are accessible from some other state in the process. This is accomplished through use of the reservoir and death states. Death is considered a state in which individuals remain for a specified period of time, e.g., until they would have reached age 100, and then they are transferred to a reservoir state. Thus all states communicate with (are accessible from) each other through the death and reservoir states, yielding an irreducible set of states.
The stationary convergent matrix as defined above can be used to
predict the distribution of the population after a given number of
time periods have elapsed.
By changing the transition probabilities
one could compute the effect by noting the changes in the resulting
equilibrium distribution. This is quite desirable in proposing alternative
models, and efforts along these lines will be attempted in the latter
part of this study.
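For an irreducible, aperiodic chain every row of the limiting matrix equals the same equilibrium vector, so the effect of changing transition probabilities can be explored by recomputing that vector. A minimal power-iteration sketch in Python follows; the function name, tolerance and both example matrices are hypothetical.

```python
def equilibrium(P, tol=1e-12, max_iter=10_000):
    """Equilibrium vector pi with pi = pi P, found by power iteration.

    Assumes P is irreducible and aperiodic, so every row of P**t converges
    to pi as t grows.
    """
    M = len(P)
    pi = [1.0 / M] * M
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(M)) for j in range(M)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")


# Hypothetical baseline chain and a modified chain with a lower p_01.
baseline = [[0.2, 0.8], [0.5, 0.5]]
modified = [[0.5, 0.5], [0.5, 0.5]]
pi_base = equilibrium(baseline)   # close to [5/13, 8/13]
pi_mod = equilibrium(modified)    # close to [0.5, 0.5]
```

Comparing `pi_base` with `pi_mod` shows how the long-run share of time in each state shifts when a single row of the matrix is altered.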
Alling
(22) also studied tuberculosis using a discrete time
stochastic model.
He cites the advantages of working with a discrete
model as opposed to a continuous one, particularly in the case of clinical
disease.
A discrete Markovian process with constant transition prob-
abilities (first order) is discussed.
Three states are originally suggested, but because of a non-stationary relapse function the states are further subdivided, as recommended by Boag (23), to create a more workable stationary function.
Meridith (24) evaluated the cost effectiveness of a special program
in a California mental hospital by analyzing movement of patients into,
through, and out of the program.
One basic assumption that was made
was that the group was homogeneous when, in fact, it might not have
been.
A Markov chain was assumed, and therefore long-range predictions might not be applicable, particularly with regard to death. In order to force the Markov chain, a geometric distribution was used instead of the lognormal, which actually fit the data. First passage times, long-term trends and stay times were derived.
Program modifications were
simulated and the resulting costs analyzed.
Often, investigators will use their data to create states, and
immediately estimate transition probabilities without any regard for
the necessary assumptions.
This was the case with the study of the dental caries process by Lu (25). He develops an irreversible Markov chain where the states represent stages of tooth decay for the five surfaces of the teeth. The transition probabilities are calculated for various paths from a healthy tooth to a tooth with all surfaces carious. Various Markov statistics are derived: the mean number of transient states occupied and the mean number of steps before absorption, among others. No attention was paid to satisfying the Markov property or the assumption of geometric holding times.
Bithell (26) discusses the application of non-homogeneous Markov
chains to the analysis of a hospital admissions system.
All lengths of
stay are considered geometric and the time interval used is one day.
There are two examples in the literature where one investigator proposes a Markov chain model for his data and a semi-Markov process is later suggested by someone else. The first is the study by Zung et al. (27) of genetic and/or environmental influence in determining sleep and dream patterns.
Six sets of twins were studied, four of
which were matched on age and sex with zygosity varying.
The six stages
of sleep were recorded for all subjects for four consecutive nights.
From the collected data, transition matrices for a Markov chain were
tabulated.
Using these probabilities, sleep patterns were regenerated
using the Monte Carlo method and the results were plotted and examined
for differences among the groups.
Yang and Hursch (28) challenged Zung's model particularly because
the holding times were not geometric.
They suggest a way of testing geometricity, and this technique will be employed in the early examination of chains in our study. If $J_t$ is the state the chain is in at time t, and $X_t$ is the number of consecutive units of time the chain is in state i, then the assumption of geometricity is expressed as follows:
$$\Pr(X_t = k \mid J_t = i) = p_{ii}^{\,k-1}(t)\,\big(1 - p_{ii}(t)\big), \qquad k = 1, 2, \ldots$$
After Yang shows that Zung's data are not geometric, he goes on to assume instead that the data are modeled best by a nonhomogeneous semi-Markov process.
The transition matrices as well as the duration distribution
functions are estimated using the method described in Moore and Pyke (19).
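Under the geometric assumption above, the lengths of uninterrupted stays in state i should follow $p_{ii}^{k-1}(1 - p_{ii})$. The Python sketch below tabulates observed stay lengths alongside the counts expected under that geometric law, with $p_{ii}$ estimated from the same paths. The function name and example path are hypothetical; it does not adjust for stays cut off at the start or end of a path, it requires at least one transition out of state i, and a formal goodness-of-fit test would be a further step.

```python
from itertools import groupby


def geometric_check(paths, i, k_max):
    """Observed vs geometric-expected counts of stay lengths in state i.

    Stay lengths are runs of consecutive observations in state i (runs cut
    off at the start or end of a path are not treated specially). p_ii is
    estimated as (i -> i transitions) / (all transitions out of i).
    """
    runs = [len(list(g)) for path in paths for s, g in groupby(path) if s == i]
    pairs = [(a, b) for path in paths for a, b in zip(path, path[1:])]
    out_of_i = sum(1 for a, _ in pairs if a == i)
    p_ii = sum(1 for a, b in pairs if a == i and b == i) / out_of_i
    observed = [runs.count(k) for k in range(1, k_max + 1)]
    expected = [len(runs) * p_ii ** (k - 1) * (1 - p_ii)
                for k in range(1, k_max + 1)]
    return p_ii, observed, expected


# Hypothetical quarterly OTG path: stays of 2 and 3 periods in OTG 1.
p11, obs, exp_counts = geometric_check([[1, 1, 2, 1, 1, 1, 2]], i=1, k_max=3)
```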
A second example of a Markov chain being reworked as a semi-Markov
process is Thomas' (29) study to predict the recovery of coronary care
patients to estimate the needs for various hospital resources.
The objective was to develop a technique that would not depend upon the value judgments of medical personnel, as had been done in past predictions of patient care needs.
Thomas relied instead on information
already existing in hospital information systems.
Four states were defined based on symptoms, medication, diet and
other criteria which would normally be found on hospital records.
Since Thomas sensed that the lack-of-memory property would be violated with these simple states, he defined recovery phases within the states to give rise to a total of 14 Markov states.
State-to-state transitions
as well as time spent in each state were investigated.
The model was tested by comparing length-of-stay distributions for the samples generated by the models with the empirical distributions obtained over a two year period.
Kao (30) further develops Thomas' use of Markov models. He considers new state definitions for Thomas' data and uses a semi-Markov process instead. In so doing, the amount of time spent in each state is allowed to be a random variable. By using an sMP model, it is suggested that there are numerous characteristics of the process that are easily derived and computed. For example, the interval transition probability, $\phi_{ij}(n)$, is the probability that a patient is in state j on day n given that he entered state i on day zero. An analogous question arises with respect to the data for this study, concerning disability retirement or sickness after a certain period of time has elapsed.
Other probabilities and distributions of interest include first
passage probabilities, the time it takes to reach state
j
from state i
for the first time and the overall length of stay distributions.
Kao goes on to apply the derivations above to Thomas' data and concludes that the statistics available are quite useful as well as computationally simpler.
Both Thomas' and Kao's approaches will be examined with
respect to our occupational data to see which model best enables the
desirable predictions to be made.
1.9 Outline of Subsequent Chapters
Chapter II will deal with examination and testing of a discrete
Markov chain model for the retiree data base.
The null hypothesis for a Markov chain (can a Markov chain be fit to these data?) will be explored through the basic criteria for chain construction. Chapter III will be devoted to fitting semi-Markov processes to the data. There are several parameters and functions which are easily computed and derived from an sMP that will enable us to compare the two cohorts.
The results of the methodologies developed in Chapters II and III will be employed in Chapter IV, an applications chapter. Once techniques are established using the cohort data, questions about other health problems can be explored. Since the preliminary analysis deals with a retrospective cohort, we would like to model data stemming from a different epidemiologic design. Thus, we will examine a case-control study of leukemia and compare the occupational histories of the cases with their matched controls.
The study will conclude with a summary of results, conclusions and
suggestions for future research.
[Figure 1.1: CODED WORK HISTORY. Source: Company Personnel Files.]

[Figure 1.2: CONCEPTUAL FRAMEWORK OF EXPERIENCE TRANSFORMATION ALGORITHM. Source: Spirtas (8).]
CHAPTER II
EXAMINATION AND TESTING OF THE
DISCRETE MARKOV CHAIN
2.1 Introduction
In this chapter we examine a data base consisting of normal and
disabled retirees from the rubber industry. After a brief review in
Section 2.2 of the existing data base and its sample design, we consider
the fit of a simple discrete Markov chain to these data.
In Sections 2.3 through 2.8 we explore the individual criteria for chain construction: state selection and definitions, time consideration and choice of interval length, time stationarity, and the Markov property itself.
In addition, the length-of-stay in each OTG will be examined to see if
a geometric distribution is followed, a necessary condition for a Markov
chain.
These analyses suggest that a Markov chain is an inappropriate
model for these data, perhaps as a result of the tendency for workers
to stay in the same work area for long periods of time, indicating the
need for a more general process such as a semi-Markov process in which
the holding times are a part of the variability of the process.
2.2 Background and Data Source
The Occupational Health Studies Group (OHSG) is a research group
within the School of Public Health at the University of North Carolina
that entered into a contractual agreement in 1970 with the United Rubber
Workers and four major tire manufacturing companies to study work-related health problems. The group is multidisciplinary, with scientific support from the Departments of Biostatistics, Environmental Sciences and Engineering, and Epidemiology.
We will use data from a random sample of rubber workers taken at
a facility in Akron, Ohio to estimate parameters for our mathematical
model.
For this industry there has been mandatory retirement at age
65 since contract negotiations of the late 1950's, while disability
retirement covers those individuals who must retire prior to age 65
due to disabling health problems.
In particular, the sample derives from three files at OHSG, which
are utilized for various cohort studies. The first file or Master file
consists of all active or terminated Akron hourly (factory) workers
alive on 1 January 1964. This file permits the reconstruction of the ten year retired and working population at this one plant, and records all entries and exits from the industry. That is, the entire population at risk has been enumerated, thus creating the baseline population for monitoring the incidence of disease among the working population.
The
cohort was identified on the basis of a thorough check of both management
and union records including payroll tapes, pension records and employment
exits and transfers.
The follow-up has been estimated to be 97% complete.
From this data set a second file was constructed containing all males
from the Master file aged 40 or over on 1 January 1964. The third file
is a stratified random sample of the second file for whom detailed work
histories were collected. All the normal and disability retirees from
this last file were selected for the present study. Figure 2.1 illustrates the sampling flow. For each individual in our study, a detailed work
history was readily available, completely describing all the jobs and
job changes comprising that worker's career in the industry.
A sample segment of a work history is displayed in Figure 2.2.a.
Each line of recorded information shows the item number which indicates
the job order chronologically, a brief description of the job held and
the dates of entry and exit.
The column designated OTG shows the
corresponding Occupational Title Group as assigned from a scheme which
aggregates the Occupational Titles (see Section 2.3).
The work histories
were then used to generate paths, which are chronologic orderings of
observations of the OTG's occupied by the worker at successive time
intervals.
Figure 2.2.b shows how the work history for Figure 2.2.a
would appear in path coded form.
2.3 Choice of States and Preliminary Analysis
The Occupational Titles in the rubber industry can be collapsed
to form 19 standard Occupational Title Groups (OTG's).
This is
accomplished very easily in that the 19 OTG's are general categories
which are comprised of many OT's with similar descriptions.
For example, tire batch preparation, tube batch preparation, pigment blending and mill mixing would be four distinct OT's. However, when grouping into 19 OTG's, they would be a part of the Batch Preparation OTG, in which raw ingredients (rubber, fillers, oils, and pigments) are weighed and mixed together. Appendix II.1 gives a listing and description of these process-oriented OTG's.
Preliminary analysis focused on examining the normal and disability
retirees with respect to differences in length-of-stay in the work areas.
Since the intent was to depict differences in the number of visits as
well as in the distributions across length-of-stay categories, time
was categorized into seven groups of progressively longer stay times.
This approach indicates, as shown in Tables 2.1 and 2.2, that the disability retirees had worked in certain areas such as tire building and curing more frequently than the normal retirees, and that these workers experienced several short-term exposures to the environmental agents of these process areas rather than one long stay in the same area.
Two comments regarding Tables 2.1 and 2.2 are necessary here. First, note that there are 18 OTG's used in the preliminary analysis, as 'sick' was not analyzed separately in the initial runs, but was included in OTG 18 with 'all other' jobs not previously grouped in OTG 1-17. Secondly, the expected numbers of visits for a geometric distribution are calculated using the normal retirees as the standard.
That
is, the values appearing in the cells labeled expected are those that one would
expect if the corresponding proportions in the normal group were
operating in the disability group.
At first it seemed that the Markov property would be difficult
to satisfy if "time" was real time and "state" meant OTG.
Thus time-OTG combinations were used to form states in order to construct a chain that would be easy to work with. Figure 2.3 shows how these states were defined. It became apparent, however, that n-step transition
probabilities (probability of going from one state to another in n steps)
would not have useful interpretations because of the modified definition
of time, and this approach was discontinued.
2.4 Selection of Time Interval
The first consideration in working with the time parameter is whether to model with a discrete or a continuous time parameter.
Permitting
time to be continuous suggests that an OTG change can occur at any point
in time.
Although theoretically a job change can occur somewhat randomly
with respect to time, time homogeneity assumptions are more difficult
to satisfy with a continuous time parameter. Since the workers cannot be
continually observed, data collection considerations suggest that a discrete
time parameter be used.
Therefore, a discrete model will be utilized,
and varying time periods of transition are examined.
The choice of
discrete time necessitates careful consideration of the constant time
between transitions.
One should select a time interval that is short
enough to detect job transfers, but is long enough to be meaningful
in the health impact sense in order to be able to meet the Markov chain
assumptions as well as keep the problem computationally simple.
Because the data were collected in path coded form, the OTG is
observed for successive days, months or the interval of choice, and
any state changes occurring in the interim are virtually ignored.
This
identifies the notion of a missed transition, a concept that is extremely
important in determining the time interval to be used.
An example of
a work history with a missed transition is shown in Figure 2.2.b.
The choice of a one month interval yielded relatively few missed
transitions in every OTG for both populations as can be seen in Table
2.3.
In every case except SICK, over 99% of the moves were detected.
Since it is highly likely that sick periods would be less than one
month, it is reasonable to miss a higher proportion of these transitions.
When the three month interval was examined for this type of sensitivity
as seen in Table 2.4, similar findings resulted.
That is, a higher
proportion of transitions to sick was not detected, although over 90%
of the remaining transitions were picked up.
With both one and three month intervals we miss sick with a higher
frequency than any other OTG, because stays in sick would, in general,
tend to be shorter than visits to work areas.
However, as seen in
Tables 2.3 and 2.4, the proportion of sick transitions missed is higher
in the normal retirees, with a striking difference occurring when the
three month interval is used.
We can interpret this to mean that the
disability retirees are spending longer periods of time in sick than
the normal retirees, as would be expected, and with our method of
generating the paths, our ability to observe the visit is increased.
Although the choice of a one month interval proved to be more sensitive
with respect to this type of error, we feel that the Markov property
will be difficult to satisfy with such a short transition interval,
and we will proceed to use a three month interval.
Because of the small number of visits to some OTG's, the 19 OTG's were subsequently reduced to 10 OTG's as described in Appendix II.2.
Note that by the nature of a missed transition as given in the footnotes
to Tables 2.3 and 2.4, reducing the number of OTG's can only result in
a reduction of errors.
Since there will be fewer groups in the 10 OTG
scheme, there would be a greater likelihood for a transition to occur
between jobs which fall in the same OTG, thus there would be less chance
for a true miss, except for the OTG corresponding to sick leave.
A suggested technique for assessing the effect of the "misses"
involves a redefinition of the paths that would be dependent on the
type of transition missed.
For example, if a missed transition by the
previous definition involved a stay in the missed state for longer than
half of the selected time interval, then one could redefine the path to
include a visit to that state.
Suppose, for example, a portion of the worker's path is:
month:  1  2  (3)  4  5  (6)
OTG:    1  1   3   1  1   2
Also, suppose that the interval of choice is three months. That is, we look at the OTG for month 3n (circled months) for all integer n until the worker's work history has been exhausted. In the example above, the worker is in OTG 3 at month 3 and OTG 2 at month 6, and the visits to OTG 1 in the fourth and fifth months are considered misses. Under the technique suggested above, the two months in OTG 1 comprise more than half of the three month period, so the OTG for month 6 would be changed to 1.
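The redefinition rule illustrated by this example can be sketched in Python. This is a hypothetical implementation, not the author's procedure. It assumes the path is a list of monthly OTG codes beginning at month 1, that the chain is observed at months `interval`, 2*`interval`, ..., and that a missed visit lasting more than half the interval replaces the OTG recorded at the next observation. The function names are introduced here for illustration.

```python
from itertools import groupby


def sample_path(monthly_otg, interval=3):
    """OTG observed at months interval, 2*interval, ... (months are 1-indexed)."""
    return [monthly_otg[m - 1] for m in range(interval, len(monthly_otg) + 1, interval)]


def adjust_for_misses(monthly_otg, interval=3):
    """Return (sampled path, sampled path adjusted for long missed visits)."""
    sampled = sample_path(monthly_otg, interval)
    adjusted = list(sampled)
    for n in range(1, len(sampled)):
        prev_state, next_state = sampled[n - 1], sampled[n]
        # Months strictly between the observations at months n*interval and (n+1)*interval.
        interior = monthly_otg[n * interval:(n + 1) * interval - 1]
        for otg, run in groupby(interior):
            length = len(list(run))
            # A true miss visits a state other than both observed neighbours.
            if otg not in (prev_state, next_state) and length > interval / 2:
                adjusted[n] = otg
    return sampled, adjusted


print(adjust_for_misses([1, 1, 3, 1, 1, 2]))  # ([3, 2], [3, 1])
```

With the path of the example, the two interior months in OTG 1 exceed half of the three month interval, so the observation at month 6 is recoded from 2 to 1.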
Another algorithm for detecting job changes that would otherwise be omitted would be to record all sick visits regardless of their length. For the purposes of this section, we will use a three month transition interval, since there is a high proportion of visits in excess of three months, and we will make no adjustments for missed transitions at this point in the analysis.
2.5 Estimation of the Transition Matrices
Once we have determined the state definition and choice of discrete
time interval, it is computationally simple to estimate the matrices
of transition probabilities for the two populations.
Recall the estimation procedure from Section 1.6. Assuming the independent behavior of each individual, the maximum likelihood estimate of the transition probability of transferring from OTG i to OTG j, denoted by $\hat{p}_{ij}$, is:
$$\hat{p}_{ij} = \frac{n_{ij}}{\sum_{k=1}^{10} n_{ik}}, \qquad i, j = 1, 2, \ldots, 10 \tag{2.5.1}$$
The $n_{ij}$ are the total observed number of transitions from OTG i to OTG j for all the individuals in the population.
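Equation (2.5.1) amounts to counting transitions and normalizing each row. A minimal Python sketch, assuming each worker's path is a list of OTG codes 1, ..., m sampled at the chosen interval; the function names are hypothetical.

```python
def count_transitions(paths, m):
    """n[i][j]: number of observed moves from OTG i + 1 to OTG j + 1, pooled over all workers."""
    n = [[0] * m for _ in range(m)]
    for path in paths:
        for a, b in zip(path, path[1:]):
            n[a - 1][b - 1] += 1  # OTGs are coded 1..m, lists are 0-based
    return n


def mle_transition_matrix(n):
    """p_hat[i][j] = n[i][j] / sum_k n[i][k]; a row with no observed transitions stays all zero."""
    p_hat = []
    for row in n:
        total = sum(row)
        p_hat.append([c / total if total else 0.0 for c in row])
    return p_hat


# Two hypothetical workers observed on a two-OTG state space.
counts = count_transitions([[1, 1, 2], [2, 1, 1]], m=2)
print(counts)                         # [[2, 1], [1, 0]]
print(mle_transition_matrix(counts))  # [[0.6666666666666666, 0.3333333333333333], [1.0, 0.0]]
```

Pooling every worker's moves into one count matrix is exactly where the independence assumption enters: each transition is treated as a separate draw from the same chain.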
The resulting estimates, which were generated from the path-coded data, are displayed in Tables 2.5 and 2.6. Note that an extremely high proportion of transitions are returns to the same OTG, as evidenced by the high probabilities along the diagonals of the matrices. This indicates an overall tendency to remain in the same job for periods longer than three months, except in 'sick'.
There are some instances where estimates of the transition probabilities are zero. This does not mean that a transition of that type is theoretically impossible, but rather that there were no observed transitions in that cell of the transition matrix. As a matter of fact, by our state definitions all states communicate with each other, meaning that there is no inherent reason why a transfer from one OTG to any other could not take place.
2.6 Holding Time Distributions
After estimating the two transition matrices for the Markov chain, we are able to test the assumption of geometric holding times, a necessary condition for a Markov chain. For each state or OTG, the frequencies of visits of varying lengths were tabulated and compared with what would be expected if the holding times were in fact geometric. Recall the probability mass function of a geometric distribution:
$$f(x) = p^{x}(1-p), \qquad x = 0, 1, 2, \ldots, \quad 0 < p < 1 \tag{2.6.1}$$
It is permissible to have a distinct geometric distribution corresponding to every state of the chain, each with a different parameter p, but they must all take the form specified above. We are actually testing the null hypothesis:
$$\Pr(\text{staying } k \text{ consecutive time units in OTG } i \mid \text{chain is in OTG } i) = \hat{p}_{ii}^{\,k-1}(1-\hat{p}_{ii}), \qquad k = 1, 2, \ldots \tag{2.6.2}$$
where the $\hat{p}_{ii}$'s come directly from the transition matrices discussed above, and the observed frequencies of length-of-stay are enumerated from the path-coded data.
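Under (2.6.2), the expected number of visits of each length follows from $\hat{p}_{ii}$ and the number of visits to OTG i. The Python sketch below computes those expected counts, with a final cell pooling all stays of at least `max_k` units so that the cells sum to the number of visits, together with the Pearson statistic. It is illustrative only: the counts in the usage lines are hypothetical, and it does not pool cells to the minimum expected value of 5.0 noted under Table 2.8.

```python
def geometric_expected(n_visits, p_stay, max_k):
    """Expected visits lasting k = 1, ..., max_k - 1 time units, plus a tail cell k >= max_k.

    Pr(stay of exactly k units) = p_stay**(k - 1) * (1 - p_stay), as in (2.6.2);
    Pr(stay of at least max_k units) = p_stay**(max_k - 1).
    """
    probs = [p_stay ** (k - 1) * (1 - p_stay) for k in range(1, max_k)]
    probs.append(p_stay ** (max_k - 1))
    return [n_visits * pr for pr in probs]


def pearson_chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical OTG with p_hat_ii = 0.5 and 100 visits of lengths 1, 2 and >= 3 units.
expected = geometric_expected(100, 0.5, 3)
print(expected)                                    # [50.0, 25.0, 25.0]
print(pearson_chi_square([60, 20, 20], expected))  # 4.0
```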
Table 2.7 gives the chi-square goodness-of-fit tests for each of
the 10 OTG's for the disability group.
Table 2.8 summarizes the results of the tests in both populations, as well as for different combinations of OTG's and transition interval lengths, using the values of $\hat{p}_{ii}$ generated by the corresponding chain. Clearly, the assumption of geometric holding times is not supported for any of these schemes.
2.7 Time Stationarity
A criterion for constructing a simple Markov chain is the stability
of the estimated transition probabilities over time.
When making meaningful inferences from the parameters derived from the chain, it is
desirable that the probabilities be stationary as defined in Section 1.5.
The real possibility exists that the movement of workers throughout
the plant is a function of calendar time; perhaps hiring policies or
production needs changed from year to year.
Non-stationarity could also
reflect the instability of worker movement at certain points in his
career such as the periods following entry into the industry and during
the pre-retirement years.
These potential sources of non-homogeneity
with respect to time are examined next.
In testing whether there is a time trend in the transition
probabilities of a chain, it is first necessary to generate transition
matrices for the specific time periods of interest.
We then examine the null hypothesis:
$$H_0: p_{ij}(t) = p_{ij} \quad \text{for all } t = 0, 1, 2, \ldots, T.$$
In so doing, we again employ the chi-square statistic for each of the m states of the chain:
$$\chi_i^2 = \sum_{t}\sum_{j} n_i(t-1)\,\frac{\left[\hat{p}_{ij}(t) - \hat{p}_{ij}\right]^2}{\hat{p}_{ij}}, \tag{2.7.1}$$
which is approximately $\chi^2$ with $(m-1)(T-1)$ degrees of freedom.
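Statistic (2.7.1) can be sketched in Python, assuming the transitions for each time period are stored as an m x m count matrix. As a simplification, $n_i(t-1)$ is taken here to be the number of transitions out of state i observed in period t, and cells with $\hat{p}_{ij} = 0$ are skipped; the degrees of freedom returned are the nominal $(m-1)(T-1)$, with no reduction for such cells. The function name is hypothetical.

```python
def stationarity_chi_square(period_counts):
    """period_counts[t][i][j]: transitions i -> j observed in period t (t = 0..T-1).

    Returns a list of (chi2_i, degrees_of_freedom) pairs, one per state i.
    """
    T = len(period_counts)
    m = len(period_counts[0])
    results = []
    for i in range(m):
        # Pooled estimate p_hat_ij over all periods.
        pooled = [sum(period_counts[t][i][j] for t in range(T)) for j in range(m)]
        total = sum(pooled)
        p_hat = [c / total if total else 0.0 for c in pooled]
        chi2 = 0.0
        for t in range(T):
            n_it = sum(period_counts[t][i])  # transitions leaving state i in period t
            if n_it == 0:
                continue
            for j in range(m):
                if p_hat[j] > 0:
                    p_t = period_counts[t][i][j] / n_it  # period-specific p_hat_ij(t)
                    chi2 += n_it * (p_t - p_hat[j]) ** 2 / p_hat[j]
        results.append((chi2, (m - 1) * (T - 1)))
    return results


# Two hypothetical periods: state 1 changes behaviour between periods, state 2 does not.
print(stationarity_chi_square([[[9, 1], [5, 5]], [[1, 9], [5, 5]]]))
```

Each $\chi_i^2$ would then be referred to a chi-square table with the stated degrees of freedom.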
2.7.a Calendar Trend
Since the work histories of the retirees span the years from the 1910's through the 1970's, six transition matrices were generated, one for each decade. This was achieved by having the worker's transitions during a decade contribute only to the corresponding matrix. For example, a worker with a 40 year career will have contributed to 4, possibly 5, separate matrices in this estimation procedure.
The beginning of the time span (1910's and 1920's) and the end (1960's) contained far fewer workers than the other decades, and thus this concern with stationarity will be restricted to 1930 through 1960.
We began by looking at short (20 year) periods of time to see if there were fluctuations in the probabilities. First, a matrix was generated for the period 1930-1949, and these probabilities were tested against the individual decade estimates for the 1930's and 1940's. A similar test was performed for the period 1940-1959. Results showed that we had to reject the null hypothesis of no calendar trend, as there were some OTG's which exhibited fluctuations with time. However, upon close examination of the estimates in every OTG where there is non-homogeneity, the largest contribution to the chi-square value came from the probability of moving to OTG 10. Recalling the composition of the OTG's (see Appendix 11.2), we see that OTG 10 is comprised of jobs which do not fall into any of the previous categories and those that are unknown. It is not unreasonable to expect more instability in this group because of its heterogeneous nature, in that these jobs might be more likely to reflect changes over time. So, although strictly speaking there is not overall stationarity with respect to every type of transition, we see that there is not much time trend in OTG's 1 through 9, and we will assume stationarity.
2.7.b Temporal-Career Trend
In order to determine whether there are significant fluctuations in a worker's movements as he accumulates more time and experience in the industry, we tested stationarity over three periods in a worker's career. Each path was subdivided into three time periods: the first five years, the five years prior to retirement, and the remaining central portion of the work experience. Separate matrices corresponding to early, middle and late career were generated in a fashion similar to that used for the decade trend in Section 2.7.a. Results showed that the $\hat{p}_{ij}$'s for middle career more closely approximate the overall estimates than those for the other time periods, indicating that the type of transition is affected by the point in an individual's career. For future development of Markov models we may want to consider using separate chains to correspond to these career differences, as there are evidently several underlying processes operating separately from one another.
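The subdivision of each path into early, middle and late career can be sketched in Python. This is a hypothetical helper: it assumes a path sampled every `interval_months` months and assigns each transition to the period in which it starts, so that with three-month sampling five years corresponds to 20 transitions. For careers shorter than ten years the early period takes precedence, an arbitrary choice made here.

```python
def split_by_career_period(path, interval_months=3, edge_years=5):
    """Split the transitions of one path into early, middle and late career lists."""
    edge = edge_years * 12 // interval_months  # transitions in five years (20 at 3 months)
    transitions = list(zip(path, path[1:]))
    early, middle, late = [], [], []
    for t, move in enumerate(transitions):
        if t < edge:
            early.append(move)
        elif t >= len(transitions) - edge:
            late.append(move)
        else:
            middle.append(move)
    return early, middle, late


early, middle, late = split_by_career_period([1] * 50)
print(len(early), len(middle), len(late))  # 20 9 20
```

Each of the three lists can then be tallied into its own count matrix and normalized as in (2.5.1).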
2.8 The Markov Property
When fitting a model to a specific data base, it is essential to test the adequacy of the model with appropriate goodness-of-fit tests. It is often the case in the literature that a Markov chain is fit to the data without regard to the Markov property; that is, the Markov property is assumed to hold and is never tested rigorously. The Markov property asserts that the conditional probability of being in a particular OTG in the future, given the past and present OTG's, depends only on the present OTG the worker is in. That is, the present state carries sufficient knowledge about the past for prediction purposes. This can be expressed mathematically as:
$$P\{X_{t+1} = j \mid X_0 = k_0, X_1 = k_1, \ldots, X_{t-1} = k_{t-1}, X_t = i\} = P\{X_{t+1} = j \mid X_t = i\} = p_{ij} \tag{2.8.1}$$
for all $t = 0, 1, \ldots$, and where $i, j, k_0, k_1, \ldots$ represent OTG's that a worker can occupy at a given observation at time t.
In testing whether the property holds, we actually assume that it does, and test its validity against a more general assumption of higher order dependence. In other words, we check to see if more knowledge than the present state of a worker's occupational history is needed to determine the likelihood of future events. Here we will test first order dependence against second order dependence. If they turn out not to be equivalent, all we are acknowledging is a dependence that is higher than first order. Because of the mathematical difficulty in testing dependence that is higher than second order, we do not go further to find out to what extent this dependency occurs. If the test gives no evidence of second order dependence, then we will be satisfied that the Markov property holds.
If we let $p_{ijk}$ be the probability of transferring from OTG i to OTG j to OTG k on successive observations, and likewise let $p_{jk}$ be the transition probability from OTG j to OTG k, then the null hypothesis becomes:
$$H_0: p_{ijk} = p_{jk} \quad \text{for all } i, j \text{ and } k.$$
Two test statistics will be used to test this hypothesis:
(1) The ordinary chi-square statistic as suggested by Zahl (see Section 1.7):
$$\chi_0^2 = \sum_i\sum_j\sum_k \frac{\left[n_{ijk} - E(n_{ijk})\right]^2}{E(n_{ijk})}, \qquad \text{where} \quad E(n_{ijk}) = \frac{\left(\sum_k n_{ijk}\right)\left(\sum_i n_{ijk}\right)}{\sum_i\sum_k n_{ijk}}.$$
(2) Kullback, Kupperman and Ku give another chi-square statistic that is asymptotically equivalent to (1):
$$\chi_K^2 = 2\sum_i\sum_j\sum_k n_{ijk}\ln\left\{\frac{n_{ijk}}{E(n_{ijk})}\right\},$$
with the computational form given as
$$\tfrac{1}{2}\chi_K^2 = \sum_i\sum_j\sum_k n_{ijk}\ln n_{ijk} + \sum_j n_{\cdot j\cdot}\ln n_{\cdot j\cdot} - \sum_i\sum_j n_{ij\cdot}\ln n_{ij\cdot} - \sum_j\sum_k n_{\cdot jk}\ln n_{\cdot jk}, \tag{2.8.3}$$
where a dot in place of a subscript denotes summation over that subscript.
Both of these statistics are asymptotically distributed as $\chi^2$ with $m(m-1)^2 - C_0$ degrees of freedom, where $C_0$ represents the number of cells with no observed transitions. The number of degrees of freedom would be reduced even further if there were any cells with theoretically impossible transitions.
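Both statistics can be sketched in Python from a table of triple counts $n_{ijk}$. This is a minimal sketch under stated assumptions: counts are held in a dictionary keyed by (i, j, k), cells with zero expected count are skipped, and the degrees of freedom $m(m-1)^2 - C_0$ are left to the caller, since the text defines $C_0$ only as the number of cells with no observed transitions. The function name is hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def second_order_tests(triples):
    """Zahl's chi-square and the Kullback-Kupperman-Ku statistic for H0: p_ijk = p_jk.

    triples maps (i, j, k) to n_ijk, the number of i -> j -> k sequences on successive
    observations. Cells whose expected count is zero are skipped.
    """
    n_ij = defaultdict(int)  # n_ij. (summed over k)
    n_jk = defaultdict(int)  # n_.jk (summed over i)
    n_j = defaultdict(int)   # n_.j. (summed over i and k)
    for (i, j, k), count in triples.items():
        n_ij[i, j] += count
        n_jk[j, k] += count
        n_j[j] += count
    states = sorted({s for key in triples for s in key})
    chi2_0 = 0.0
    chi2_k = 0.0
    for i, j, k in product(states, repeat=3):
        if n_j[j] == 0:
            continue
        expected = n_ij[i, j] * n_jk[j, k] / n_j[j]  # E(n_ijk) under H0
        if expected == 0:
            continue
        observed = triples.get((i, j, k), 0)
        chi2_0 += (observed - expected) ** 2 / expected
        if observed > 0:  # cells with n_ijk = 0 contribute nothing to the log-ratio sum
            chi2_k += 2 * observed * math.log(observed / expected)
    return chi2_0, chi2_k


# Hypothetical counts in which the next OTG depends on the previous one (second order).
print(second_order_tests({(1, 1, 1): 5, (2, 1, 2): 5}))  # chi2_0 = 10.0, chi2_K = 20 ln 2
```

When the counts factor as $n_{ijk} \propto n_{ij\cdot}\,n_{\cdot jk}/n_{\cdot j\cdot}$, both statistics are zero, which is the behaviour expected under first order dependence.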
The Kullback, Kupperman and Ku (KKK) statistic is a minimum discrimination information statistic and is discussed further in Kullback (31). Shachtman, Hogue and Schoenfelder (32) used both of these statistics to test the Markov property in their Markov chain based on a comparison of post-abortum and post-partum deliverers. They note that the KKK statistic, by using the logarithmic transformation, tends to smooth out the chi-square contributions made by expected cell frequencies of less than one. They also noted that the statistic is more robust with respect to infrequent occurrences of improbable transitions.
Table 2.9 gives the results of the computations of both statistics
for a Markov chain based on a transition interval of three months.
It
is obvious that for a simple Markov chain, the Markov property does not
hold in either population, suggesting the need for a more general model.
That is not to say that the Markov property need not hold for a more
general Markov process, but that the testing procedures are different
when time is incorporated into the model.
2.9 Summary and Conclusions
To summarize, in attempting to develop a discrete Markov chain, we first examined several possibilities for the state space and looked for differences between the two populations with regard to visits to and from certain work areas and the lengths-of-stay in these areas. We then assessed in detail the merits and drawbacks of several potential time intervals to be used in the chain development. Different combinations of state definitions, time periods, and methods for handling the concept of missed transitions have been examined. Once decisions regarding the above criteria were arrived at, the chain was constructed and tested for three properties: geometric holding times, time stationarity and the Markov property. Failure to exhibit either the Markov property or geometric holding times indicates that the simple Markov chain is inappropriate for modeling these data, at least given the variety of schemes of time and states that were chosen.
The next chapter models and fits a semi-Markov process to the data, in which the length-of-stay becomes a vital part of the process. These data show a tendency to remain in an OTG for long periods of time, indicating that a model which incorporates time might be more satisfactory.
Figure 2.1
SAMPLING SCHEME FOR RETIREE DATA BASE
[Diagram not reproduced. Recoverable labels: "study", "random sample", "data base for this study", "disability retirees, n = 92".]
Figure 2.2.a
SAMPLE "RAW" WORK HISTORY
ITEM #   JOB DESCRIPTION   DATE-IN    DATE-OUT   OTG
01       Tire Builder      12-24-32   11-24-33   3
02       Clerk             11-24-33   02-02-34   7
03       Sick              02-02-34   02-05-34   9
04       Clerk             02-02-34   11-29-34   7
05       Curing            11-29-34   07-30-35   5
06       Sick              07-30-35   08-30-35   9
Figure 2.2.b
PATH-CODED DATA RESULTING FROM WORK HISTORY SHOWN IN FIGURE 2.2.a IN ONE MONTH PERIODS
3 3 3 3 3 3 3 3 3 3 3 7 7* 7 7 7 7 7 7 7 7 7 7 5 5 5 5 5 5 5 5 9
NOTE: * Item 03 in Figure 2.2.a is considered a missed transition in that the duration of sickness was less than one month and occurred between two examinations of the work history, spaced at one month intervals; thus it does not appear in Figure 2.2.b.
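Figure 2.3
DEFINITION OF THE STATE SPACE FOR PRELIMINARY ANALYSIS
[Grid not reproduced. Rows are length-of-stay categories (up to 3 months, 3-12 months, 12-24 months, 24-60 months, 60-120 months, 120-240 months, > 240 months); columns are OTG's 1, 2, 3, .... States are numbered S1 through S7 down the first column and continue with S8, S9, ... down the second.]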
TABLE 2.1
TOTAL NUMBER OF VISITS DISTRIBUTED ACROSS THE OTG's, DISABILITY RETIREES
OTG     OBSERVED   EXPECTED   χ²       p-Value
1         23         36.5      4.99     .026
2         53         41.6      3.14     .077
3          4          3.2       .22     .64
4         44         54.6      2.05     .153
5         60         61.6       .04     .842
6         34         42.5      1.71     .201
7        150         91.1     38.1      .000
8         66         34.6     28.5      .000
9         15         26.3      4.88     .028
10       139         90.5     26.05     .000
11        59         71.4      2.16     .142
12        28         56.2     14.14     .001
13       109        112.0       .08     .778
14        25         21.3       .65     .421
15         9          9.2       .004    .950
16        73         98.7      6.7      .010
17        68        110.5     16.32     .000
18       456        453.3       .02     .888
TOTAL   1415       1415
χ² = 149.74   p-value = 0.00000
TABLE 2.2
TOTAL NUMBER OF VISITS DISTRIBUTED OVER TIME CATEGORIES, DISABILITY RETIREES
Time       0-3 mos.  3 mos.-1 yr.  1-2 yr.  2-5 yr.  5-10 yr.  10-20 yr.  > 20 yr.  TOTAL
Observed   383       489           170      215      85        50         23        1415
Expected   258.37    377.71        192.35   252.34   141.56    108.55     84.11     1414.99
χ²         60.12     32.79         2.60     5.53     22.60     31.58      44.40     199.61
p-value = 0.0000
TABLE 2.3
ERROR ANALYSIS OF MISSED TRANSITIONS* FOR ONE MONTH INTERVAL
           NORMALS (n = 439)                  DISABILITY (n = 92)
           Number of    Number of             Number of    Number of
OTG        Transitions  Misses     Ratio      Transitions  Misses     Ratio
1          5794         6          .0010      774          1          .0013
2          9535         6          .0006      1483         5          .0034
3          72           0          .0000      114          1          .0088
4          10499        10         .0010      1873         4          .0021
5          9070         10         .0011      2303         1          .0004
6          6621         5          .0008      578          4          .0069
7          23286        14         .0006      8675         9          .0010
8          3144         6          .0019      634          5          .0079
9          2426         7          .0029      146          1          .0068
10         16533        21         .0013      4110         12         .0029
11         10284        10         .0010      2058         1          .0005
12         11505        6          .0005      826          2          .0024
13         20318        13         .0006      3339         2          .0006
14         5094         1          .0002      623          0          .0000
15         1548         0          .0000      259          2          .0077
16         18440        14         .0008      2187         8          .0037
17         13841        14         .0010      1800         4          .0022
18         1347         40         .0297      1371         28         .0204
19         11672        9          .0008      2403         1          .0004
TOTAL      181029       192        .0011      35556        91         .0026
Average over all 439 individuals: .0012       Average over all 92 individuals: .0031
*Criteria for a true miss: A transition that is less than one month and is to a state different from the previous and subsequent states.
TABLE 2.4
ERROR ANALYSIS OF MISSED TRANSITIONS* FOR THREE MONTH INTERVAL
           NORMALS (n = 439)                  DISABILITY (n = 92)
           Number of    Number of             Number of    Number of
OTG        Transitions  Misses     Ratio      Transitions  Misses     Ratio
1          1936         8          .0041      261          1          .0038
2          3184         5          .0016      495          5          .0101
3          23           2          .0870      39           0          .0000
4          3507         11         .0031      626          1          .0016
5          3032         14         .0046      773          3          .0039
6          2210         13         .0059      192          2          .0104
7          7784         5          .0006      2895         11         .0038
8          1055         6          .0057      211          10         .0474
9          810          7          .0086      49           1          .0204
10         5509         16         .0029      1372         17         .0124
11         3443         13         .0038      690          2          .0029
12         3846         5          .0013      277          0          .0000
13         6789         17         .0025      1118         2          .0018
14         1704         6          .0035      211          3          .0142
15         520          2          .0038      84           0          .0000
16         6164         20         .0032      732          5          .0068
17         4640         25         .0054      606          12         .0198
18         453          140        .3090      457          61         .1335
19         3891         1          .0002      798          0          .0000
TOTAL      60500        316        .0052      11886        136        .0114
Average over all 439 individuals: .0059       Average over all 92 individuals: .0134
*Criteria for a true miss: A transition that is less than three months and is to a state different from the previous and subsequent states.
TABLE 2.5
TRANSITION MATRIX (MARKOV CHAIN)* FOR NORMAL RETIREES WITH 3-MONTH INTERVALS
                                        OTG j
OTG i   1      2      3      4      5      6      7      8      9      10
1     .9733  .0085  .0000  .0000  .0000  .0000  .0066  .0016  .0031  .0069
2     .0038  .9714  .0025  .0009  .0013  .0010  .0045  .0002  .0043  .0100
3     .0001  .0030  .9706  .0011  .0023  .0020  .0035  .0000  .0047  .0126
4     .0010  .0076  .0105  .9254  .0306  .0019  .0019  .0000  .0029  .0132
5     .0002  .003?  .0036  .0049  .9?30  .0033  .0073  .0000  .0035  .0111
6     .0000  .0021  .0053  .0012  .0062  .9551  .0076  .0009  .0041  .0176
7     .0012  .0023  .0013  .0002  .0016  .0016  .9733  .0005  .0053  .0126
8     .0030  .0018  .0006  .0000  .0000  .0035  .0047  .9716  .0047  .0100
9     .020?  .0721  .0791  .0116  .034?  .0349  .2116  .0186  .5023  .014?
10    .0028  .0114  .0138  .0022  .0080  .0075  .0279  .0021  .0008  .9235
*Probability of transferring from OTG i to OTG j.
TABLE 2.6
TRANSITION MATRIX (MARKOV CHAIN)* FOR DISABILITY RETIREES WITH 3-MONTH INTERVALS
                                        OTG j
OTG i   1      2      3      4      5      6      7      8      9      10
1     .9254  .0225  .0019  .0019  .0019  .0019  .0056  .0019  .0262  .0112
2     .0057  .9378  .0094  .0025  .0044  .0019  .0063  .0000  .0170  .0151
3     .0000  .0061  .9586  .0010  .0007  .0020  .0041  .0000  .0156  .0119
4     .0000  .0237  .0142  .7962  .0616  .0142  .0190  .0000  .0664  .0047
5     .0015  .0029  .0051  .0117  .9271  .0036  .0058  .0007  .0219  .0197
6     .0015  .0000  .0044  .0058  .0131  .9341  .0131  .0000  .0131  .0145
7     .0050  .0042  .0050  .0013  .0038  .0025  .9317  .0013  .0247  .0205
8     .0047  .0047  .0095  .0000  .0000  .0000  .0190  .9005  .0332  .0284
9     .0246  .0355  .0519  .0301  .0492  .0164  .0956  .0131  .6639  .0191
10    .0034  .0128  .0296  .0000  .0208  .0074  .0336  .0054  .0054  .8811
*Probability of transferring from OTG i to OTG j.
TABLE 2.7
GOODNESS-OF-FIT TEST FOR GEOMETRIC HOLDING TIMES, DISABILITY RETIREES
INTERVAL = 3 MONTHS
Length of visit   Observed #   Expected #   χ² = (Obs - Exp)²/Exp
(months)          visits       visits
OTG 1
3 - 8             15           5.77         14.78
9 - 17             4           7.13          1.37
18 - 26            6           5.65          0.02
27 - 35            4           5.74          0.53
36 - 53            2           5.07          1.86
54 - 80            4           5.36          0.35
> 81               5           5.23          0.02
Total             40                        χ² = 18.92   p = .002
OTG 2
3 - 5             18           6.16         22.77
6 - 8              5           5.77          0.10
9 - 11            10           5.42          3.88
12 - 14            5           5.08          0.00
15 - 20            9           9.23          0.01
21 - 26            4           8.12          2.09
27 - 32            7           7.14          0.00
33 - 38            5           6.28          0.26
39 - 44            3           5.52          1.15
45 - 53            3           7.06          2.33
54 - 62            4           5.82          0.57
63 - 74            4           6.21          0.79
75 - 89            4           5.82          0.57
90 - 110           4           5.57          0.44
> 111             14           9.81          1.79
Total             99                        χ² = 36.76   p = .001
OTG 3
3 - 5             14           5.05         15.86
6 - 11            14           9.48          2.15
12 - 17           11           8.71          0.60
18 - 23           12           8.01          1.99
24 - 29            8           7.36          0.06
30 - 35            4           6.76          1.13
36 - 41            6           6.21          0.01
42 - 47            8           5.71          0.92
48 - 53            4           5.25          0.30
54 - 62            4           7.08          1.34
63 - 71            3           6.24          1.68
72 - 80            1           5.50          3.68
81 - 92            4           6.32          0.85
93 - 104           2           5.34          2.09
105 - 119          5           5.52          0.05
120 - 137          2           5.26          2.02
138 - 161          2           5.22          1.99
162 - 197          4           5.16          0.26
> 198             14           7.81          4.90
Total            122                        χ² = 41.86   p = .001
OTG 4
3 - 5             23           8.76         23.13
6 - 8              7           6.98          0.00
9 - 11             3           5.56          1.18
12 - 17            1           7.94          6.07
18 - 23            1           5.04          3.24
> 24               8           8.72          0.06
Total             43                        χ² = 33.67   p = .0000
OTG 5
3 - 5             28           7.29         58.83
6 - 8              9           6.76          0.74
9 - 11             8           6.27          0.48
12 - 14            3           5.51          1.36
15 - 17            5           5.39          0.03
18 - 23            8           9.62          0.27
24 - 29            3           8.27          3.36
30 - 35            4           7.11          1.36
36 - 41            4           6.11          0.75
42 - 47            4           5.25          0.30
48 - 56            5           6.53          0.36
57 - 65            2           5.20          1.97
66 - 77            2           5.33          2.08
78 - 95            3           5.50          1.14
> 96              12           9.57          0.62
Total            100                        χ² = 73.62   p = .0000
OTG 6
3 - 8             19           5.69         31.18
9 - 17             5           7.21          0.68
18 - 26            5           5.89          0.13
27 - 38            4           6.21          0.78
39 - 53            2           5.73          2.43
54 - 74            3           5.38          1.05
> 75               7           8.90          0.41
Total             45                        χ² = 36.67   p = .0000
OTG 9
3 - 5             73          41.34         24.25
6 - 8             33          27.45          1.12
9 - 11             9          18.22          4.67
12 - 14            4          12.10          5.42
15 - 17            1           8.03          6.16
18 - 20            0           5.33          5.33
> 21               3          10.53          5.39
Total            123                        χ² = 52.33   p = .0000
OTG 10
3 - 5             31          20.82          4.98
6 - 8             27          18.36          4.07
9 - 11            21          16.19          1.43
12 - 14           13          14.27          0.11
15 - 17           11          12.58          0.20
18 - 20            8          11.09          0.86
21 - 23            6           9.78          1.46
24 - 26           10           8.62          0.22
27 - 29            3           7.60          2.79
30 - 32            2           6.70          3.30
33 - 35            5           5.91          0.14
36 - 38            4           5.21          0.28
39 - 44           11           8.65          0.64
45 - 50            7           6.72          0.01
51 - 56            3           5.23          0.95
57 - 65            1           5.74          3.92
66 - 80            3           5.84          1.38
> 81              10           6.67          1.67
Total            176                        χ² = 28.41   p = .029
TABLE 2.8
NUMBER OF STATES WHICH HAVE GEOMETRIC HOLDING TIMES* BY OTG SCHEME, INTERVAL LENGTH AND RETIREMENT TYPE
                                       Number of states
Number of OTG's   Interval length   Disability   Normal
19                1                 7            0
19                3                 8            0
19                6                 12           0
10                3                 1            0
10                6                 3            0
*For χ² goodness-of-fit tests, α = .05 and minimum expected value = 5.0. Degrees of freedom ranged from 2 to 63 depending upon the grouping for minimum expected value.
TABLE 2.9
TESTS FOR FIRST ORDER VS. SECOND ORDER DEPENDENCE
                      χ²_0      χ²_K    C_0    D.F.   p-value
Normal retirees       > 8000    1686    680    130    < .001
Disability retirees   4339      892     734    76     < .001
CHAPTER III
SEMI-MARKOV PROCESSES: MODELING AND FITTING
3.1 Introduction
Because of the restrictive assumptions of a discrete Markov chain, as seen in Sections 2.6 through 2.8, we will employ a more general model, namely a semi-Markov model. A feature of the semi-Markov process (sMP) is the incorporation into the model of the time spent in a state, which was one of the major problems in assuming the Markov property in the earlier modeling efforts. In this chapter we present the estimates of the transition probabilities and holding time distributions for an sMP derived from the retirement data. The remaining sections will be devoted to fitting distributions to the holding times and examining the mixture distributions that fit best. The final section will be concerned with comparing parameters derived from the model, such as the first passage probability, the cumulative first passage probability and the mean time spent in each OTG. These comparisons will be between the normal and disability retirees, and within each group we will look for differences among the OTG's in order to better understand health outcome in relation to work experience.
Consider the following definitions which were introduced in
Section 1.6:
Def. 3.1.1 A semi-Markov process consists of the following two components:
(a) a sequence of random variables $\{X_n : n \ge 0\}$ representing the states which the system occupies over time;
(b) a sequence of random variables $\{Y_n : n \ge 0\}$, where $Y_n = [T_n, T_{n+1})$; each $Y_n$ represents the random length of the sojourn interval in state $X_n$, which may depend on both $X_n$ and $X_{n+1}$.
The successive states $\{X_n, n \ge 0\}$ form a Markov chain and, given that sequence, the sojourn times are independent.
Def. 3.1.2 The transition distribution function $Q_{ij}(t)$ denotes the probability that the process will make a change from state i to state j after spending no more than time t in state i:
$$Q_{ij}(t) = \Pr[X_{n+1} = j,\ Y_n \le t \mid X_n = i],$$
independent of n; that is, we assume time stationarity.
Def. 3.1.3 The holding time distribution $H_{ij}(t)$ is interpreted as the probability that a transition will occur in a period of time less than or equal to t, given the process is in state i and will make a change to state j:
$$H_{ij}(t) = \Pr[T_{n+1} - T_n = Y_n \le t \mid X_n = i,\ X_{n+1} = j].$$
Note that $Q_{ij}(t) = p_{ij} \cdot H_{ij}(t)$, where $p_{ij}$ is the transition probability of transferring from OTG i to OTG j.
Def. 3.1.4 The semi-Markov property asserts
$$P[X_{n+1} = j,\ T_{n+1} - T_n \le t \mid X_0, \ldots, X_{n-1}, X_n = i, T_0, \ldots, T_n] = P[X_{n+1} = j,\ T_{n+1} - T_n \le t \mid X_n = i]$$
for all (i, j), t and n. That is, the length of stay in a state is conditionally dependent only on the current state and the next state to be visited, and is memoryless with respect to the prior sequence of states and holding times (33).
3.2 Estimates of the sMP Transition Matrices
In this section we present the results of the estimation of the semi-Markov transition probabilities. The state space remains the same as that used in the discrete Markov chain, and the interval of transition employed here is three months.
One distinction between the transition probabilities computed from a simple Markov chain and those from our sMP is the restriction on virtual transitions in the latter. That is, all transitions are real, in that the process goes from OTG i to OTG j with i not permitted to equal j. This implies that the off-diagonal transition probabilities are proportional to those from the simple chain, readjusted to allow $p_{ii} = 0$. The fact that the process may stay in the same OTG for several time units is handled by the holding time, as will be seen in Section 3.3. The estimates of the $p_{ij}$'s are obtained in a manner similar to that of the chain using the maximum likelihood method, and are given in Tables 3.1 and 3.2.
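Removing virtual transitions can be sketched in two equivalent ways: collapse each sampled path into visits and count only real moves, or rescale a chain matrix off the diagonal by $1/(1 - p_{ii})$. The Python below is a minimal sketch, assuming paths sampled every three months with OTGs coded 1, ..., m. The function names are hypothetical, and the last visit in a path, which may be cut off by the end of the work history, is not treated specially.

```python
from itertools import groupby


def visits(path, interval=3):
    """Collapse a sampled path into (OTG, length of stay in months) visits."""
    return [(otg, interval * len(list(run))) for otg, run in groupby(path)]


def real_transition_counts(paths, m):
    """n[i][j]: moves between successive visits from OTG i + 1 to OTG j + 1 (never i == j)."""
    n = [[0] * m for _ in range(m)]
    for path in paths:
        v = visits(path)
        for (a, _), (b, _) in zip(v, v[1:]):
            n[a - 1][b - 1] += 1
    return n


def embedded_from_chain(p):
    """Rescale a chain matrix: p_ij / (1 - p_ii) off the diagonal, 0 on it (needs p_ii < 1)."""
    m = len(p)
    return [[0.0 if i == j else p[i][j] / (1 - p[i][i]) for j in range(m)] for i in range(m)]


print(visits([1, 1, 2, 2, 2, 1]))                       # [(1, 6), (2, 9), (1, 3)]
print(real_transition_counts([[1, 1, 2, 2, 2, 1]], 2))  # [[0, 1], [1, 0]]
print(embedded_from_chain([[0.5, 0.5], [0.25, 0.75]]))  # [[0.0, 1.0], [1.0, 0.0]]
```

The lengths of stay produced by `visits` are the raw material for the holding time distributions discussed next.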
Comparing the ninth column in each of these tables, we see that the disability retirees are visiting the OTG for 'sick' with a much greater frequency than the normal retirees, a result which was expected.
We are also concerned about using the $\hat{p}_{ij}$ probabilities to represent all the movements from OTG i to OTG j when this likelihood might change with the point in a career. To test for stationarity, we compared the $\hat{p}_{ij}$'s for the normal retirees from Table 3.2 with the $\hat{p}_{ij}$ estimates for mid-career as described earlier in Section 2.7.b. The result of the chi-square test for time homogeneity was $\chi^2 = 73.93$ with 80 degrees of freedom, which was not significant, indicating that the use of the overall $\hat{p}_{ij}$'s is appropriate.
3.3 Holding Time Distributions
The purpose of this section is to discuss the construction of the
holding time distributions.
Procedures for estimation, the results
of these procedures and a discussion of some of the problems with fitting
distributions to empirical data are given.
It is a rather simple procedure to produce the empirical holding time distributions $\hat{H}_{ij}(t)$, although the number of these distributions for each population is quite sizeable. Assuming that the holding times are dependent on both the current OTG and the OTG to which the transition is being made, we must obtain $M^2 - M$ distributions for each population, where M represents the number of OTG's in the state space. This is done in the following manner. We observe the total number of visits from OTG i to OTG j; call this number $N_{ij}$. We also record the holding time for each of the $N_{ij}$ visits and employ an indicator function as follows:
$$\hat{H}_{ij}(t) = N_{ij}^{-1}\sum_{k=1}^{N_{ij}} \varepsilon(t - T_{ijk}), \tag{3.3.1}$$
where $T_{ijk}$ is the length of time of the k-th visit from OTG i to OTG j, and
$$\varepsilon(u) = \begin{cases} 1, & u \ge 0 \\ 0, & u < 0. \end{cases}$$
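Estimator (3.3.1) is the empirical distribution function of the observed holding times for each pair (i, j): since $\varepsilon(t - T_{ijk}) = 1$ exactly when $T_{ijk} \le t$, the sum counts the holding times not exceeding t. A Python sketch, assuming three-month sampled paths as before; the names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import groupby


def holding_times_by_pair(path, interval=3):
    """Map (i, j) to the holding times (months) of visits to OTG i that end with a move to j."""
    v = [(otg, interval * len(list(run))) for otg, run in groupby(path)]
    times = defaultdict(list)
    for (i, stay), (j, _) in zip(v, v[1:]):
        times[i, j].append(stay)
    return dict(times)


def empirical_holding_cdf(holding_times):
    """Return H_hat(t) = N^-1 * #{k : T_k <= t}, the estimator in (3.3.1)."""
    ordered = sorted(holding_times)
    n = len(ordered)

    def H(t):
        return bisect_right(ordered, t) / n  # number of T_k <= t, divided by N

    return H


times = holding_times_by_pair([1, 1, 2, 2, 2, 1, 1, 1, 1, 2])
print(times)  # {(1, 2): [6, 12], (2, 1): [9]}
H_12 = empirical_holding_cdf(times[1, 2])
print(H_12(5), H_12(6), H_12(12))  # 0.0 0.5 1.0
```

Sorting once and using `bisect_right` gives each evaluation of $\hat{H}_{ij}(t)$ in logarithmic time, which matters when $M^2 - M$ distributions are evaluated at many points.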
Now that we have generated the empirical holding time distributions, we have the option of finding a functional form for these distributions or using those obtained empirically to generate other parameters that might be useful in comparing the two populations. The advantages of having the $h_{ij}$'s in functional form lie in both the mathematical simplicity of deriving other measures and the ability to handle observations which occur with very small probability. By mathematical simplicity, we mean that there are direct computational formulae for deriving the previously mentioned parameters, and those formulae contain expressions for the holding time distributions. Thus, having these distributions in concise form reduces the computational difficulty.
With regard to observations that occur with very small probability and may be due to random error (outliers), a functional form assigns such events a nonzero probability. For example, if we do not empirically observe any holding times greater than 10 years, we might assume that such stays are not possible and attach a probability of zero to a length of stay of that size. However, not observing an event does not mean that it is theoretically impossible, and assuming that it is impossible can affect the resulting passage probabilities, which would otherwise receive an overall increase from several small contributions.
There are some drawbacks to finding a functional form for the distributions, all related to the precision of the fit obtained. In order to find a reasonably good fit, approximations must be made with regard to modeling discrete data with a continuous function and to sample size fluctuations. One technique we use in dealing with both of these problems is aggregation of the length-of-stay variable into yearly intervals, to provide more observations at each point of the fit. However, there still remain some passages that occur with very low frequency, for which the data are not sufficient to create a model.
After reviewing the above choices available to us concerning the holding time distributions, we choose to find theoretical distributions that adequately fit our data. We resolve the sample size problem by choosing a sample size $n_0$ such that for $n_{ij} < n_0$ we will not attempt to fit a distribution. We will employ the chi-square goodness-of-fit test to assess the adequacy of the hypothesized distribution because of the test's simplicity with regard to assumptions, its computational ease, and its ability to accept a decent fit when one exists. Since the chi-square goodness-of-fit test suggests a minimum expected cell size of five for a reasonable approximation, we will choose $n_0 = 20$ so that there will be a minimum of four categories for each test.
Upon examination of the shape and the first two moments of the holding time distributions, we see that the exponential family might be well suited to the data. In several instances, the mean holding time $\bar{X}$ was quite close to the sample standard deviation $s$, a characteristic property of the exponential distribution, whose mean and standard deviation are equal. The general form of the distribution is:
$$f(x) = \theta^{-1} e^{-(x-\mu)/\theta}, \qquad x \ge \mu,\ \theta > 0. \tag{3.3.2}$$
Using the method of moments, we let the sample average holding time $\bar{X} = \hat{\theta}$, generating a theoretical distribution of the form (3.3.2), and restrict the location parameter $\mu$ to be zero.
To perform the chi-square goodness-of-fit test, we generated the expected number of visits under the hypothesized distribution for each interval $[a, b]$, $a < b$. We first calculated the area under the curve between $a$ and $b$, which represents the proportion of visits with lengths in that interval. We then multiplied this proportion by the total number of transitions $N_{ij}$ (see Figure 3.1). If $f(x) = \theta^{-1} e^{-x/\theta}$, then the proportion of visits in $[a, b]$ is
$$\int_a^b \theta^{-1} e^{-x/\theta}\,dx = e^{-a/\theta} - e^{-b/\theta}. \qquad (3.3.3)$$
Thus the expected number of visits from OTG $i$ to OTG $j$ with length in $[a, b]$ is $N_{ij}\left[e^{-a/\theta} - e^{-b/\theta}\right]$.
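The computation behind (3.3.3) can be sketched as follows. Lengths of stay for one $(i, j)$ pair are tallied into yearly intervals, $\theta$ is set to the sample mean, and the Pearson statistic is formed from observed and expected counts. Several details are illustrative assumptions and do not describe the original programs: the function names, the treatment of the last interval as open-ended, and the constant `N0` mirroring $n_0 = 20$. Converting the statistic to a p-value would need a chi-square distribution function (for example SciPy's `scipy.stats.chi2.sf`).

```python
import math

N0 = 20  # minimum number of transitions n_ij before a fit is attempted


def exponential_expected(n_ij, theta, edges):
    """Expected counts N_ij[exp(-a/theta) - exp(-b/theta)] per interval.

    `edges` are interval endpoints in years, e.g. [0, 1, 2]; the last cell is
    treated as open-ended [edges[-1], inf) so the expected counts sum to n_ij.
    """
    bounds = list(zip(edges[:-1], edges[1:])) + [(edges[-1], math.inf)]
    expected = []
    for a, b in bounds:
        upper = 0.0 if math.isinf(b) else math.exp(-b / theta)
        expected.append(n_ij * (math.exp(-a / theta) - upper))
    return expected


def chi_square(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fit_exponential(lengths, n_cells):
    """Method-of-moments exponential fit with yearly cells [0,1), [1,2), ...

    Returns (theta, observed, expected, statistic), or None if n_ij < N0.
    """
    n_ij = len(lengths)
    if n_ij < N0:
        return None  # too few transitions to model
    theta = sum(lengths) / n_ij  # location parameter fixed at zero
    observed = [0] * n_cells
    for x in lengths:
        observed[min(int(x), n_cells - 1)] += 1  # last yearly cell is open-ended
    expected = exponential_expected(n_ij, theta, list(range(n_cells)))
    return theta, observed, expected, chi_square(observed, expected)
```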
When comparing the observed and expected holding times, we noticed that the visits of length 0-1 year were very difficult to fit. We observed many more holding times of that length than the model would predict, yet the fit was close in the other regions of the curve.
There are apparently two types of transitions occurring which have
a length less than one year. Many of the movements of short length may
be a "bouncing" from one job to the next which is not significant and
can be due to a variety of reasons.
For example, temporary reassignments
to other work areas during times of increased production would naturally
result in short term stays.
Job dissatisfaction is another potential
reason for changing jobs before much time has accumulated.
Hence, we may propose two stochastic processes to model the behavior:
U: a function which is a mixture of the exponential distribution and some other function operating on visits of less than one year, or
V: a process which ignores transitions of less than one year.
In choosing between U and V, we need to consider both the substantive and the mathematical implications. Since we believe that initial exposure may be important in terms of health impact, we will proceed with fitting a mixture distribution as described in U above.
3.4 Estimating a Mixture Distribution
A reasonable model for the holding time distributions would be a distribution that is a mixture of the exponential and some other function during the first year, and is strictly exponential beyond a stay of one year. The proposed mixture distribution takes the following form:
$$h(x) = \begin{cases} \alpha f_1(x) + (1-\alpha) f_2(x), & 0 < x < 1 \\ (1-\alpha) f_2(x), & x \ge 1 \end{cases} \qquad (3.4.1)$$
for $f_1(\cdot)$ defined on $[0, 1]$ and $f_2(\cdot)$ defined on $[0, \infty)$, with $\int_0^1 f_1(x)\,dx = 1$ and $\int_0^\infty f_2(x)\,dx = 1$. We also assume $0 \le \alpha \le 1$.
The simplicity of the uniform distribution makes it quite desirable as a mixture component. The probability density function of the uniform distribution is:
$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a < x < b \\ 0, & \text{elsewhere} \end{cases} \qquad (3.4.2)$$
Now, in equation 3.4.1 we let $f_1(x)$ be the probability density function of the uniform distribution on $[0, 1]$. We let $f_2(x)$ be the exponential distribution defined in equation 3.3.2, with location parameter zero. This gives:
$$h(x) = \begin{cases} \alpha + (1-\alpha)\,\theta^{-1} e^{-x/\theta}, & 0 < x < 1 \\ (1-\alpha)\,\theta^{-1} e^{-x/\theta}, & x \ge 1 \end{cases} \qquad (3.4.3)$$
We need to estimate both $\alpha$, the proportion of mixture, and $\theta$, the parameter for the exponential component of $h(x)$.
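For goodness-of-fit testing of the mixture, the expected count in an interval again comes from the area under the density. Integrating (3.4.3) gives the distribution function $H(x) = \alpha\min(x, 1) + (1-\alpha)(1 - e^{-x/\theta})$ for $x \ge 0$. The sketch below uses it to compute interval proportions, with lengths in years. The function names are hypothetical.

```python
import math


def mixture_cdf(x, alpha, theta):
    """H(x) for h(x) in (3.4.3): uniform[0, 1] with weight alpha,
    exponential(mean theta) with weight 1 - alpha."""
    if x <= 0:
        return 0.0
    return alpha * min(x, 1.0) + (1 - alpha) * (1 - math.exp(-x / theta))


def mixture_interval_prob(a, b, alpha, theta):
    """Proportion of visits with length in [a, b] under h(x); b may be math.inf."""
    upper = 1.0 if math.isinf(b) else mixture_cdf(b, alpha, theta)
    return upper - mixture_cdf(a, alpha, theta)
```

With `alpha = 0` this reduces to $e^{-a/\theta} - e^{-b/\theta}$, the exponential case in (3.3.3).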
For this estimation problem, we have chosen the method of moments, as the sample moments are easily calculated. In the method of moments, we take as many sample moments as are necessary to solve for the unknown parameters of the population distribution, and set them equal to the corresponding population moments. To estimate $\alpha$ and $\theta$ we need two equations, and thus we will utilize the first two moments of $h(x)$, namely the mean and variance.
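Let $V \sim U[0, 1]$ and $Z \sim \exp(\theta)$ on $[0, \infty)$, and let $X$ equal $V$ with probability $\alpha$ and $Z$ with probability $1 - \alpha$. Then $E(X) = \alpha E(V) + (1-\alpha) E(Z)$, so
$$E(X) = \frac{\alpha}{2} + (1-\alpha)\,\theta. \qquad (3.4.4)$$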
Similarly,
$$V(X) = \sigma^2 = \frac{\alpha}{3} + (1-\alpha)\,2\theta^2 - \left[\frac{\alpha}{2} + (1-\alpha)\,\theta\right]^2. \qquad (3.4.5)$$
Now we substitute the known sample values $\bar{x}$ and $s^2$ for $E(X)$ and $\sigma^2$, respectively.
From equation 3.4.4,
$$\theta = \frac{\bar{x} - \alpha/2}{1-\alpha}. \qquad (3.4.6)$$
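From equation 3.4.5, rearranging,
$$\begin{aligned} s^2 &= \frac{\alpha}{3} + 2(1-\alpha)\,\theta^2 - \left[\frac{\alpha}{2} + (1-\alpha)\,\theta\right]^2 \\ &= \frac{\alpha}{3} + 2(1-\alpha)\,\theta^2 - \frac{\alpha^2}{4} - \alpha(1-\alpha)\,\theta - (1-\alpha)^2\theta^2, \end{aligned}$$
so that
$$0 = (1-\alpha)(1+\alpha)\,\theta^2 - \alpha(1-\alpha)\,\theta + \frac{\alpha}{3} - \frac{\alpha^2}{4} - s^2.$$
We use the quadratic formula to solve the above equation for $\theta$:
$$\theta = \frac{\alpha(1-\alpha) + \sqrt{\alpha^2(1-\alpha)^2 - 4(1-\alpha)(1+\alpha)\left(\dfrac{\alpha}{3} - \dfrac{\alpha^2}{4} - s^2\right)}}{2(1-\alpha)(1+\alpha)}. \qquad (3.4.7)$$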
The estimates $\hat{\alpha}$ and $\hat{\theta}$ we desire are the solutions to the simultaneous equations 3.4.6 and 3.4.7 above.
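The simultaneous solution of (3.4.6) and (3.4.7) can be found numerically. The sketch below assumes $\bar{x}$ and $s^2$ are in years with $\bar{x} > 1$. It substitutes (3.4.6) into the second-moment condition behind (3.4.5) and bisects on $\alpha$. At that root, (3.4.6) and (3.4.7) give the same $\theta$. For $\bar{x} \ge 1$ the implied second moment increases with $\alpha$, so the root is unique. The bisection and the function names are illustrative choices, not the procedure used in the original computations.

```python
import math


def mixture_moments(alpha, theta):
    """Mean and variance of h(x), equations (3.4.4) and (3.4.5)."""
    mean = alpha / 2 + (1 - alpha) * theta
    var = alpha / 3 + (1 - alpha) * 2 * theta ** 2 - mean ** 2
    return mean, var


def theta_from_mean(alpha, xbar):
    """Equation (3.4.6)."""
    return (xbar - alpha / 2) / (1 - alpha)


def theta_from_variance(alpha, s2):
    """Positive root of the quadratic in theta, equation (3.4.7)."""
    a = (1 - alpha) * (1 + alpha)
    b = -alpha * (1 - alpha)
    c = alpha / 3 - alpha ** 2 / 4 - s2
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


def fit_mixture(xbar, s2, tol=1e-12):
    """Method-of-moments (alpha, theta), or None if no alpha in [0, 1) fits.

    Assumes xbar > 1 (years), so the residual below is increasing in alpha.
    """
    def resid(alpha):
        # second raw moment implied by (alpha, theta(alpha)) minus the sample one
        theta = theta_from_mean(alpha, xbar)
        return alpha / 3 + 2 * (1 - alpha) * theta ** 2 - (s2 + xbar ** 2)

    lo, hi = 0.0, 1.0 - 1e-9
    if resid(lo) > 0:
        return None  # s < xbar: no mixture with alpha >= 0 matches the moments
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if resid(mid) < 0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    return alpha, theta_from_mean(alpha, xbar)
```

The resulting $\hat{\theta}$ is in years. Multiplying it by four gives the scale in transition units used below.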
The above formulations have all been expressed in terms of $x$, the length-of-stay in one-year units. For the estimation of certain measures, it is desirable to use the unit of time equivalent to the transition interval unit, i.e., three months. Using a change-of-variable technique, we obtain the following functional form for the holding time distributions, where $y$ is expressed in transition units:
$$h^*(y) = \begin{cases} \dfrac{\alpha}{4} + (1-\alpha)\,\psi^{-1} e^{-y/\psi}, & 0 < y < 4 \\ (1-\alpha)\,\psi^{-1} e^{-y/\psi}, & y \ge 4 \end{cases} \qquad \text{where } \psi = 4\theta. \qquad (3.4.8)$$
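The change of variable behind (3.4.8) is $y = 4x$, so that $h^*(y) = \tfrac{1}{4}\,h(y/4)$. The uniform weight on $[0, 1)$ year becomes a density of $\alpha/4$ on $[0, 4)$ transition units, and the exponential scale becomes $\psi = 4\theta$. The sketch below codes both densities so they can be checked against each other. The function names are hypothetical. Its parameter `c` also allows the extended form with the uniform component spread over $[0, 8)$ transition units.

```python
import math


def h_years(x, alpha, theta):
    """Mixture density (3.4.3); x is the length of stay in years."""
    expo = (1 - alpha) * math.exp(-x / theta) / theta
    return alpha + expo if 0 < x < 1 else expo


def h_transition_units(y, alpha, theta, c=4):
    """Density in transition units (quarters), with psi = 4 * theta.

    c = 4 gives (3.4.8); c = 8 spreads the uniform weight over [0, 8) instead.
    """
    psi = 4 * theta
    expo = (1 - alpha) * math.exp(-y / psi) / psi
    return alpha / c + expo if 0 < y < c else expo
```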
Table 3.3 gives a description of the theoretical holding time distributions for the normal and disability retirees which resulted from the estimation procedure described above. Only those distributions for which $n_{ij} \ge 20$ were included in the estimation procedure. In the fitted $h^*_{ij}(y)$ distributions the mixing weight favored the exponential component: $\alpha$ is less than $1 - \alpha$ (that is, $\alpha < .5$) for each $(i, j)$ pair.
For some $(i, j)$ pairs, however, a distribution such as that described in (3.4.8) does not accurately describe the observed length-of-stay events. We extended the mixture interval to include 0 to 8 time units (0-2 years), and $h^*_{ij}(y)$ takes the form:
$$h^*_{ij}(y) = \begin{cases} \dfrac{\alpha}{8} + (1-\alpha)\,\psi^{-1} e^{-y/\psi}, & 0 < y < 8 \\ (1-\alpha)\,\psi^{-1} e^{-y/\psi}, & y \ge 8 \end{cases}$$
The hypothesized distributions which take this form are shown in Table 3.4.
Figures 3.2 and 3.3 show two examples of the fit of our
mixture distributions to the empirical data.
Although direct comparison of the distributions within and between the populations is possible, it is easier to do so indirectly by comparing mean holding times and examining other parameters which are computed from the holding time distributions. For the normal retirees, Table 3.5 gives the model-based mean holding time $T_{ij}$ for each $(i, j)$ pair with sufficient data to model. The same table also gives the data-based estimate of the mean holding time, $\hat{T}_{ij}$, its standard deviation, and the corresponding number of transitions of that type. The closeness of the model-based and data-based estimates can be attributed to the method-of-moments estimation of the mixture parameters: the observed mean holding times are used to estimate $\theta$ and $\alpha$. Table 3.6 presents the same information for the disability retirees.
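A uniform component on $[0, c]$ transition units has mean $c/2$. The model-based mean holding time implied by a fitted $h^*_{ij}(y)$ is therefore $\alpha c/2 + (1-\alpha)\psi$, with $c = 4$ for (3.4.8) and $c = 8$ for the extended form. The sketch below checks this formula against parameters read from Tables 3.3 and 3.4. The function name is hypothetical. To rounding, it reproduces model-based values that appear in Table 3.5 (16.69 and 10.81).

```python
def model_mean(alpha, psi, c=4):
    """Mean of h*(y) in transition units: uniform[0, c] with weight alpha,
    exponential(mean psi) with weight 1 - alpha."""
    return alpha * c / 2 + (1 - alpha) * psi


# Normal retirees, OTG 1 -> 2 (Table 3.3): alpha = .21, psi = 20.60
print(round(model_mean(0.21, 20.60), 2))        # 16.69
# OTG 6 -> 7 (Table 3.4, uniform on [0, 8]): alpha = .33, psi = 14.16
print(round(model_mean(0.33, 14.16, c=8), 2))   # 10.81
```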
One of the comparisons we make within and between cohorts is the mean holding time in an OTG prior to transferring to sick. Tables 3.5 and 3.6 show that $\hat{T}_{i9}$, the mean time spent in OTG $i$ before sick, is consistently greater for the normal retirees across all $i$.
This can be attributed to the fact that the disability retirees'
visits are short term and interspersed with repeated illnesses, while
a healthy individual will have a greater opportunity for a longer stay
in one area.
This corroborates the finding in the preliminary analysis
that the disability retirees are experiencing several short stays in
certain areas rather than one long exposure to the same agent.
Comparing the $\hat{T}_{i9}$'s relative to one another, note that the shortest mean holding time prior to illness occurs in OTG's 10 and 4 for the normals and in OTG's 8 and 4 for the disabilities.
This suggests that it takes less time in
these areas before the need to take sick leave arises.
Similarly, we
looked at the areas in which the longest time prior to sick resulted.
OTG 3 was common to both groups, along with very long stays in OTG 6.
One might consider these jobs to be associated with less hazardous agents
as it takes longer to initiate an illness.
3.5 First Passage Probabilities
First passage times in semi-Markov processes measure the length of time it takes to reach a given state from another state for the first time. In our application, they can be useful tools for examining the length of time necessary to move from one OTG to another OTG, e.g., 'sick'. An important feature of the first passage probability is that the transition from one OTG to another need not be via a direct path.
This section will outline the basic meaning of a first passage
probability, along with the results obtained when we generate the first
passage probabilities to and from certain OTG's.
Consider the definition of a first passage probability, which says that $f_{ij}(n)$ is the probability that a first passage from $i$ to $j$ will require $n$ time units:
$$f_{ij}(n) = \sum_{\substack{r=1 \\ r \ne j}}^{N} \sum_{m=0}^{n} p_{ir}\, h_{ir}(m)\, f_{rj}(n-m) + p_{ij}\, h_{ij}(n), \qquad (3.5.1)$$
for $i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, N$; and $n = 1, 2, \ldots$
A closer look at (3.5.1) shows how each term contributes. The first term permits an indirect transition from OTG $i$ to OTG $j$: the process moves to an intermediate state, OTG $r$, at time $m$, and then makes a first passage from OTG $r$ to OTG $j$ in the remaining $(n - m)$ time units. The second term accounts for a direct transfer from OTG $i$ to OTG $j$ after remaining in $i$ for $n$ time units. This formulation allows us to ignore intervening events and focus primarily upon the OTG of origin and the destination OTG.
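Equation (3.5.1) can be evaluated by computing $f_{ij}(n)$ for $n = 1, 2, \ldots$ in order, because each term on the right involves only first passage probabilities at earlier times. The sketch below assumes every visit lasts at least one transition unit, so $h_{ir}(0) = 0$ and the $m = 0$ term vanishes. The input layout is illustrative: $P$ holds the embedded probabilities $p_{ir}$, $H$ holds lists of $h_{ir}(m)$, and states are indexed from 0. The function name is likewise illustrative.

```python
def first_passage(P, H, j, n_max):
    """First passage probabilities into state j, following (3.5.1).

    P[i][r] : embedded transition probabilities p_ir (p_ii = 0).
    H[i][r] : list of holding time probabilities h_ir(m), m = 0, 1, 2, ...
              (entries beyond the end of the list are treated as 0).
    Returns f with f[i][n] = f_ij(n) for n = 0..n_max (f[i][0] = 0).
    Assumes h_ir(0) = 0, i.e. every visit lasts at least one time unit.
    """
    N = len(P)

    def h(i, r, m):
        row = H[i][r]
        return row[m] if m < len(row) else 0.0

    f = [[0.0] * (n_max + 1) for _ in range(N)]
    for n in range(1, n_max + 1):
        for i in range(N):
            # Direct transfer: stay in i for n units, then move straight to j.
            total = P[i][j] * h(i, j, n)
            # Indirect: leave i for r != j after m units, then first passage r -> j.
            for r in range(N):
                if r == j or P[i][r] == 0:
                    continue
                for m in range(1, n):
                    total += P[i][r] * h(i, r, m) * f[r][n - m]
            f[i][n] = total
    return f
```

For example, suppose state 0 moves directly to state 2 with probability .5 after one unit, or to state 1 after two units and then on to state 2 after one more. Then $f_{02}(1) = f_{02}(3) = .5$.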
Figures 3.4 through 3.8 show some of the first passage probabilities as a function of $n$, the transition time unit, for both the normal and disability retirees. The OTG's represented were chosen both because of interest in visits to sick and because enough data were available to model the holding times from which the first passages derive. Note that these figures are plotted on a log scale to account for the difference in magnitude of the probabilities between the normals and disabilities. This large difference indicates that the disability retirees are more likely to make a visit to sick at any point in time. The result is not too surprising, given that the probability of ever visiting 'sick' is .2294 for the disability retirees and .0979 for the normal retirees.
Suppose the normal retirees are representative of typical workers in the plant, and suppose there is no association between the work environment and sickness. Then we would expect the five curves for the normal retirees to be virtually identical, with visits to sick being random events just as likely to occur from one work area as another.
Detecting subtle differences might not be easy to do by inspection of Figures 3.4 through 3.8. To facilitate the comparisons, we have calculated the truncated means of the $f_{ij}(n)$ distributions, $e_{ij}$, for $n$ taking on values through 80 time units:
$$e_{ij} = \sum_{n=1}^{80} n\, f_{ij}(n).$$
These calculations are presented in Table 3.7. It turns out that, for the normal retirees, visits to sick are not equally likely from all areas. The truncated mean first passage times are higher for visits originating in OTG's 2, 3 and 7. This suggests that it takes a longer period of exposure in these areas to produce a visit to sick.
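The truncated mean $e_{ij}$ can be computed directly from a sequence of first passage probabilities. Because $e_{ij}$ weights each $n$ by the unconditional probability $f_{ij}(n)$, it reflects both how likely a passage is within 80 units and how late it tends to occur. The sketch below also shows the mean conditional on the passage occurring by $n = 80$, as an optional companion measure; that variant is an illustration and was not computed in the original analysis. The function names are hypothetical. The input `f` is a list with `f[n]` $= f_{ij}(n)$ and `f[0]` unused.

```python
def truncated_mean(f, n_max=80):
    """e_ij = sum over n = 1..n_max of n * f_ij(n)."""
    return sum(n * f[n] for n in range(1, min(n_max, len(f) - 1) + 1))


def conditional_truncated_mean(f, n_max=80):
    """Mean first passage time given that the passage occurs by n_max."""
    total = sum(f[1:n_max + 1])
    return truncated_mean(f, n_max) / total if total > 0 else float("nan")
```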
The first passage probability functions from these three areas to sick are consistently greater, for all $n$, than those from the other two areas. This is reflected in the cumulative first passage probabilities, which are the probabilities that a first passage has occurred by the $n$th time interval, but not necessarily at the $n$th time interval.
The graphs of the cumulative first passage probabilities (Figures 3.9 and 3.10) show that for the normals, a first passage to sick is more likely by time $n$ from OTG's 3 and 7. (OTG 2 is even higher but was not represented in the graph.) As $n$ increases, the same relationship holds, with OTG's 5 and 10 (curing and other) showing a much lower cumulative probability.
For the disability retirees, however, the cumulative first passage probability functions indicate that the likelihood of a first passage to sick changes over different ranges of $n$. The cumulative first passage probability for a passage from OTG 7 to 9 (sick) remains the highest throughout, while OTG 10 is consistently the lowest. At approximately $n = 40$, the cumulative curves for $f_{39}$ and $f_{29}$ intersect, and at $n = 60$, those for $f_{59}$ and $f_{39}$ intersect. One possible interpretation is that the potentially hazardous agents associated with some OTG's are such that an effect, as represented by a visit to sick, is more likely to occur acutely in the earlier years. For other areas the agent may seem safe in the early years but manifest itself later on, and thus the crossover effect.
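Crossovers like those at $n \approx 40$ and $n \approx 60$ can be located mechanically once the cumulative curves are formed. The sketch below accumulates $f_{ij}(n)$ into $F_{ij}(n) = \sum_{k \le n} f_{ij}(k)$. It then reports the times at which one cumulative curve passes another. The function names and the rule of ignoring exact ties are illustrative assumptions.

```python
from itertools import accumulate


def cumulative(f):
    """F(n) = sum of f(k) for k <= n: probability the first passage
    has occurred by time n."""
    return list(accumulate(f))


def crossings(F_a, F_b):
    """Times n at which the sign of F_a(n) - F_b(n) changes (exact ties skipped)."""
    out, prev = [], 0
    for n, (a, b) in enumerate(zip(F_a, F_b)):
        d = a - b
        sign = (d > 0) - (d < 0)
        if sign != 0:
            if prev != 0 and sign != prev:
                out.append(n)
            prev = sign
    return out
```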
If we could focus more specifically on the materials used on these jobs in conjunction with the reason for sick leave, then we would be better equipped to interpret these probability distributions. In this study, no distinction was made between types of illness, that is, by reason for sick leave. Because we did not separate causes of sick leave, we may be masking a direct relationship between exposure and illness over time, such as a delayed rash following contact with a specific chemical.
In both the normal retirees and disability retirees, we see OTG 7 producing first passages that are somewhat different from the others. This area represents maintenance and general service. One possible explanation for this difference is that it might be a haven for individuals once they develop an illness. They would be shifted to an area where exposures are less hazardous or the work is less strenuous, even though the illness would have been caused by exposures elsewhere in the plant.
This is related to the issue of age confounding the associations we might detect. If, in fact, age is a concomitant variable, it would not be surprising to see those areas where average age is higher associated with more sick leave. The differences noted in the first passages as well as in the holding times could be attributed to this underlying relationship, rather than to the hypothesized agents.
Figure 3.1: GRAPHICAL REPRESENTATION OF METHOD OF GENERATING EXPECTED LENGTH-OF-STAY DISTRIBUTION UNDER HYPOTHETICAL EXPONENTIAL DISTRIBUTION [axes: proportions (observed proportions marked) versus length-of-stay in years; plot not recoverable from the scanned copy]
Figure 3.2: [example of the fit of a mixture holding time distribution to the empirical data; plot not recoverable from the scanned copy]
Figure 3.3: [example of the fit of a mixture holding time distribution to the empirical data; plot not recoverable from the scanned copy]
Figure 3.4: FIRST PASSAGE PROBABILITY [plot against n not recoverable from the scanned copy]
Figure 3.5: FIRST PASSAGE PROBABILITY $f_{39}(n)$ [plot not recoverable from the scanned copy]
Figure 3.6: FIRST PASSAGE PROBABILITY $f_{59}(n)$ [plot not recoverable from the scanned copy]
Figure 3.7: FIRST PASSAGE PROBABILITY [curves for normal retirees and disability retirees; plot not recoverable from the scanned copy]
Figure 3.8: FIRST PASSAGE PROBABILITY $f_{10,9}(n)$ [plot not recoverable from the scanned copy]
Figure 3.9: [cumulative first passage probabilities; plot not recoverable from the scanned copy]
Figure 3.10: [cumulative first passage probabilities; plot not recoverable from the scanned copy]
TABLE 3.1
TRANSITION MATRIX (SEMI-MARKOV)* FOR
NORMAL RETIREES WITH 3-MONTH INTERVALS.

                                       OTG j
OTG i     1      2      3      4      5      6      7      8      9      10
  1     .0000  .3176  .0000  .0000  .0000  .0000  .2471  .0588  .1176  .2588
  2     .1331  .0000  .0887  .0323  .0444  .0363  .1573  .0081  .1492  .3508
  3     .0040  .1036  .0000  .0359  .0797  .0677  .1195  .0000  .1594  .4303
  4     .0128  .1026  .1410  .0000  .4103  .0256  .0256  .0000  .0385  .2436
  5     .0049  .0837  .0985  .1330  .0000  .0887  .1970  .0000  .0936  .3005
  6     .0000  .0458  .1176  .0261  .1373  .0000  .1699  .0196  .0915  .3922
  7     .0444  .0867  .0504  .0081  .0605  .0585  .0000  .0202  .1996  .4718
  8     .1042  .0625  .0208  .0000  .0000  .1250  .1667  .0000  .1667  .3542
  9     .0421  .1449  .1589  .0234  .0701  .0701  .4252  .0374  .0000  .0280
 10     .0362  .1491  .1809  .0289  .1042  .0984  .3647  .0274  .0101  .0000
TOTAL   .0401  .1095  .1057  .0318  .0830  .0487  .2102  .0194  .0979  .2536

*Probability of transferring from OTG i to OTG j, where i ≠ j.
TABLE 3.2
TRANSITION MATRIX (SEMI-MARKOV)* FOR
DISABILITY RETIREES WITH 3-MONTH INTERVALS.

                                       OTG j
OTG i     1      2      3      4      5      6      7      8      9      10
  1     .0000  .3000  .0250  .0250  .0250  .0250  .0750  .0250  .3500  .1500
  2     .0909  .0000  .1515  .0404  .0707  .0303  .1010  .0000  .2727  .2424
  3     .0000  .1475  .0000  .0246  .0164  .0492  .0984  .0000  .3770  .2869
  4     .0000  .1163  .0698  .0000  .3023  .0698  .0930  .0000  .3256  .0233
  5     .0200  .0400  .0700  .1600  .0000  .0500  .0800  .0100  .3000  .2700
  6     .0222  .0000  .0667  .0889  .2000  .0000  .2000  .0000  .2000  .2222
  7     .0736  .0613  .0736  .0184  .0552  .0368  .0000  .0184  .3620  .3006
  8     .0476  .0476  .0952  .0000  .0000  .0000  .1905  .0000  .3333  .2857
  9     .0732  .1057  .1545  .0894  .1463  .0488  .2846  .0407  .0000  .0569
 10     .0284  .1080  .2500  .0000  .1761  .0625  .2841  .0455  .0455  .0000
TOTAL   .0418  .0878  .1147  .0450  .0964  .0439  .1447  .0193  .2294  .1768

*Probability of transferring from OTG i to OTG j, where i ≠ j.
TABLE 3.3
HOLDING TIME DISTRIBUTIONS OF THE FORM
* (y) {1/4 ex +
h..
lJ
State State
j
i
1
2
1
7
10
1
7
9
10
2
5
7
9
10
1
2
2
2
2
3
3
3
3
3
4
5
5
5
5
• 7
7
7
7
7
7
5
3
4
9
10
1
2
3
6
9
10
n
-
IjJ
NORMAL
27 .21
21 .00
22 .10
33 .33
39 .17
37 .12
87 .08
26 . 18
20 .00
30 .29
20.60
17.56
14.80
42.40
10.08
44.52
13.72
14.68
8.64
26.64
goodnessgoodnessmixof-fit
ture ... of-fit
2
2
ex IjJ
p-value
p-va lue n..
X
X
lJ
DISABILITy
3.89
2.69
0.08
1.40
6.09
1. 37
6.65
2.27
1.23
1.43
.143
.111
.878
.497
.108
.850
.248
.152
.270
.490
**
108
32
20
27
.18
.48
.20
.31
17.92
18.00
19.24
25.24
11.~2
1.32
0.45
3.44
.126
.251
.503
.180
**
61
22
43
25
29
.13
.13
.30
.25
.29
15.04
18.84
13.28
14.52
20.92
6.21
0.72
3.97
0.25
2.92
.184
.397
. 138
.618
.233
**
234
.31
y ~ 4
(1 _ a) ljJ-l e-y/1jJ
mixture
nij
(l - ex) IjJ -1 e-y/1jJ ; 0 < y < 4
12.40
4.12
** Insufficient data to construct a model
.766
**
**
**
**
**
27
24
. 19 28.48 0.67
.00 13.16 1.06
.414
.304
**
**
**
46
35
.00 42.84 7.32
.12 14.20 1.90
.198
.387
**
**
**
30 .30 31.00 0.77
27 .00 7.48 1.89
.943
.170
**
**
**
**
59
49
.22 30.88 5.58
.27 13.04 1.12
.233
.873
TABLE 3.3
HOLDING TIME
-,_{
*
J
9
9
9
10
10
10
10
10
10
2
3
7
1
3
4
5
6
7
mixV,re
n..
lJ
31
34
91
25
n
,~
1}J
'1.
goodnessof -fit
X2
p-va1up.
72
68
n..
y ~ 4
mixture
ex
~
1}J
goodnessof-fit
X2
p-va1ue
NORMAL-----'---~..!..L- DISABILITY
.96
0
.33 1.32
1.08
0
.44 14.56
.49
0.48
.63
0.24
0.02 >.99
2.06
.152
**
20
or THE FORM
) ,-1 -v/'!!
+ (1 - ( '."
e
-1 .
(l-rl)ljl
e Y!'
1/4
(y)
hi·
State State
;
j
DISTRIBUTIONS
.30 13.76
.31 15.68
.27 13.92
**
**
**
**
44
0.42
6.34
3.07
.517
. 176
.547
**
** Insufficient data to construct a model
.17
9.92
0.66
.819
.10
.852
50 .35 11. 08 1.35
.510
**
31
. 16
7.24
**
TABLE 3.4
HOLDING TIME DISTRIBUTIONS OF THE FORM
$$h^*_{ij}(y) = \begin{cases} \dfrac{\alpha}{8} + (1-\alpha)\,\psi^{-1} e^{-y/\psi}, & 0 < y < 8 \\ (1-\alpha)\,\psi^{-1} e^{-y/\psi}, & y \ge 8 \end{cases}$$

State i   State j   n_ij   mixture α     ψ      goodness-of-fit χ²   p-value
   5         7        40      .29      20.36          5.04            .169
   6         7        26      .33      14.16          0.68            .410
   6        10        60      .40      14.28          4.11            .250
   7         5        30      .33      13.24          0.97            .325
  10         3       125      .30      13.36          7.04            .318
TABLE 3.5
NORMAL RETIREES: MODEL BASED AND DATA BASED
MEAN HOLDING TIMES IN TRANSITION UNITS
State
i
1
1
1
1
1
2
2
2
2
2
2
2
2
2
3
3
3
3
3
3
3
3
4
4
4
4
4
4
State Model-based
j
T..
-
2
7
8
9
10
1
3
4
5
6
7
8
9
10
1
2
4
5
6
7
9
10
1
2
3
5
6
7
Data-based
A
lJ
Tij
16.69
17.56
16.72
17.56
7.08
49.6
13.52
29.08
11.36
14.00
16.40
2.05
14.52
11.48
39.44
12.80
0.50
12.40
55.24
8.64
5.20
19.52
50.44
15.08
.50
2.88
13.28
10.64
4.00
27.48
13.52
29.07
8.71
39.41
12.78
13.87
8.64
19.49
15.05
10.32
St. dey. of data
based estimate
S.D. (To.)
A
lJ
21.60
20.20
6.84
36.32
14.80
49.48
17.12
21.20
28.72
2.28
17.64
14.12
45.20
13.72
15.16
47.44
11.76
5.12
29.72
46.28
18.48
2.88
15.12
25.32
4.96
33.96
Transitions
n..
lJ
27
21
5
10
22
33
22
8
11
9
39
2
37
87
1
26
9
20
17
30
5
108
1
8
11
32
2
2
TABLE 3.5 (continued)
NORMAL RETIREES: MODEL BASED AND DATA BASED
MEAN HOLDING TIMES IN TRANSITION UNITS
State
i
_.
.__.
4
4
State Model-based
j'"
T..
.
9
10
5
5
1
5
3
5
4
6
5
5
5
.. _.'1...__.
2
15.79
18.03
7
9
15.61
5
10
13.34
6
2
. 6
3
6
4
6
5
6
7
8
6
6
10.81
9
6
10
7
7
7
7
1
2
3
10.17
16.65
9.90
11.39
4
5
7
7
6
7
8
7
9
7
8
8
8
10
1
2
3
10.19
15.43
9.18
Data-bdsed
. _.._..
"
·T..
St. dp.v. of data
bdsed e5timate
S.D. ("r.. )
..1) ... _.... _ .. . .. _ _ .1.1.
26.52
6.72
7.48
22.52
15.80
18.04
18.52
15.64
46.80
13.36
9.92
6.04
6.72
7.56
10.80
9.52
37.24
10.16
16.68
9.92
11.40
1. 76
10.20
15.44
10.12
41.44
9.20
18.32
38.16
24.52
27.64
9. 16
._...
Transitions
n..
•. ".__ ... _,,,) ....
3
19
1
31.72
20.32
28.80
36.24
22.32
46.28
15. 16
9.24
9.88
4.56
11 .12
15.84
8.20
49.04
17.72
19.20
15.04
15.68
1.88
14.80
23.68
9.96
37.04
14.16
22.20
40.04
17
20
27
18
40
19
61
7
18
4
21
26
3
14
60
22
43
25
4
30
29
10
99
234
5
3
1
_
TABLE 3.5 (continued)
NORMAL RETIREES: MODEL BASED AND DATA BASED
MEAN HOLDING TIMES IN TRANSITION UNITS
St.
Stclte
State
i
j
_ _ _ _,,,
r~ode1··hased
1'
.• _
ij
... _ ...... _._0- ••._
••
[lata-based
:~ii
.... __• • . .• _ _'L.
d~v. of dat~
:ldSf~d e<;timate
SD.
.. . .......
(~i")
.lJ..:
,,_ .. _
Trt'.!n::;itions
n
ij
.•._..._.
8
6
5.32
7.24
6
8
7
33.76
31.64
8
8
8
9
35.76
58.88
10
12.20
0.72
0.96
10.40
0.20
0.68
1.04
1.32
1. 12
1.12
1.32
1. 32
1.24
1.36
8
17
9
31
34
9
1
2
9
3
9
4
9
5
9
6
9
7
9
8
9
10
10
1
2
3
9
10
10
10
10
0.96
1. 54
1.08
9.03
10.55
5
10
10
6
10.70
10
8
10
9
7
1.68
9.04
0.36
2.40
19.28
6
25
13.24
20.32
103
10.56
10.24
11.44
14.56
15.52
17.68
125
20
72
10.72
10.16
15.28
15.88
68
252
18.04
17.52
24.16
23.08
19
7
.
10.23
11.43
4
1.08
0.64
5
15
15
91
8
_
TABLE 3.6
DISABILITY RETIREES: MODEL BASED AND DATA BASED
MEAN HOLDING TIMES IN TRANSITION UNITS
State
i
State
1
1
1
1
1
1
2
1
1
1
2
2
2
2
2
2
2
2
j
9
10
1
3
4
5
6
3
3
5
3
6
3
7
3
9
10
2
3
4
4
4
4
4
-lJ
T ..
3
4
5
6
7
8
7
9
10
2
4
3
Model-based Data-based
3
5
6
7
23.44
13.16
42.84
12.74
"
Tij
3.5
.5
.5
2.5
.5
22.85
.5
22.86
13.67
22.05
7.63
17.75
10.93
7.83
10.80
23.39
13.08
12.28
16.50
3.50
7.83
11.25
42.98
12.76
.50
4.83
2.35
3.83
5.25
St. dey. of data
based estimate
"S.D. (Tij)
4.18
27.79
33.72
11.89
21.98
7.62
22.92 .
14.94
11.84
19.38
27.47
15.54
15.97
25.16
1.41
11 .10
16.65
56.54
16.26
1.00
5.86
3.60
5.12
8.32
Transitions
n·lJ.
12
1
1
1
1
3
1
14
6
9
15
4
7
3
10
27
24
18
3
2
6
12
46
35
5
3
13
3
4
TABLE 3.6 (continued)
DISABILITY RETIREES: MODEL BASED AND DATA BASED
MEAN HOLDING TIMES IN TRANSITION UNITS
State
State
1
j
8
1
2
3
7
8
9
8
10
1
2
3
4
8
8
8"
9
9
9
9
9
9
9
9
9
10
10
10
10
10
10
10
10
Model-based Data-based
-T
5
6
7
8
10
1
2
3
5
6
7
8
9
"-
"-
ij
8.57
6.40
7.90
St. dev. of data
based estimate
T ••
lJ
16.50
.50
4.00
10.50
11 .79
8.50
0.83
1.27
1.18
0.86
0.77
1.33
1.96
0.70
2.07
4.30
4.92
9.66
6.24
2.86
7.72
12.00
19.13
S.D ..~)
4.96
10.55
16.08
13.73
1.51
1.88
2.30
1.31
1.81
1.22
4.75
0.76
2.82
4.62
5.17
14.97
6.99
2.55
14.49
14.53
18.00
Transitions
n·lJ.
1
1
2
4
7
6
9
13
19
11
18
6
35
I)
7
5
19
44
31
11
50
8
8
TABLE 3.7
FIRST PASSAGE PROBABILITIES TRUNCATED MEANS
(IN TRANSITION UNITS)
$\sum_{n=1}^{80} f_{rj}(n)\, n$ *

                       NORMAL    DISABILITY
f_29                    3.58        7.43
f_39                    0.62       11.90
f_59                    0.25        7.61
f_79                    0.59        9.54
f_10,9                  0.22        8.35
Pr[visiting sick]       .0979       .2294

*OTG 2 = milling, calendering, extrusion
OTG 3 = batch preparation
OTG 5 = curing
OTG 7 = maintenance, power plant, general service
OTG 9 = sick
OTG 10 = unknown, other and synthetic
CHAPTER IV
APPLICATION OF THE MODEL TO A CASE-CONTROL
STUDY OF LEUKEMIA
4.1 Introduction
The methodologies employed in Chapters II and III to characterize the retiree groups will be applied in this chapter to describe and explain differences between two populations distinguished by yet another health effect: leukemia. The purpose of examining these groups is twofold. First, we will be able to study a disease entity which is specific and for which the health outcome is fatal, unlike the disability retiree group. Second, the study design differs from the cohort approach used with the retirees, in that our techniques will be applied in a case-control setting. This gives us the opportunity to assess the effectiveness of the techniques set forth in the previous chapters when both the health event and the design strategy differ. The subsections in this chapter will be brief, including only the results, as the methodology has been thoroughly described in the previous chapters.
4.2 Background and Data Source
4.2.1 The Population
The 27 cases of leukemia and their matched controls (four per case) were drawn from the same rubber company in Akron, Ohio. A case is defined as any active or retired worker who died between 1964 and 1973 with leukemia appearing anywhere on the death certificate. All controls are members of the working and retired populations from which the cases derive. Workers were not eligible to be controls if they died of any cancer or of a blood disorder such as aplastic anemia. The controls also had to attain at least the same age as the corresponding case. Controls were matched on several factors to make them as similar as possible to the cases, so that any differences in their work experience would not be attributable to these factors. The factors matched on were age, race, sex, plant and date of birth. Two of the four controls were additionally matched on date of hire.
4.2.2 The Exposure Information (state space)
There have been several case reports of leukemia which suggest
that exposure to benzene may lead to an increased risk of developing
leukemia (34, 35, 37, 39).
Benzene is also known to be capable of causing bone marrow aplasia
and chromosomal damage and should thus be considered a potential
leukemogen (34).
The difficulty in assessing the health effects in an occupational setting
has arisen from the lack of cell type differentiation and the absence of
detailed exposure information. Studies conducted by the Occupational Health
Studies Group have attempted to describe the environmental exposures to
solvents in the workplace by grouping the occupational titles into
occupational title groups (OTGs) representing several levels of potential
solvent exposure, as evaluated by several staff industrial hygienists (38, 39).
The findings in the above studies are consistent with occupational
leukemogenesis; however, no specific agent has been identified, nor were
any direct measurements of solvent levels made. In order to focus more
specifically on solvent exposure in the rubber industry, Arp et al. (40)
conducted a study of retrospective solvent use based on company technical
and commercial records. Pertinent documentation spanning a fifty-year
period was used to assign potential benzene and other solvent exposure
codes to the jobs held by the leukemia cases and controls. The state
space for our analysis will be the codes which resulted from Arp's work.
The codes and their definitions are given in Appendix IV.1, but to
briefly summarize, exposure was assessed in terms of severity (contact
or area exposure) and type (benzene or other solvents). Additional
codes were created for instances when the exposure was not well documented,
there was no known solvent use, or the exposure was of an unknown nature.
Note that there is no attempt to quantify the exposure information;
rather, a qualitative assessment is the essence of the coding scheme.
The work experience of each case and control was coded to form a
continuous solvent history, with all periods of time between entry into
the plant and eventual exit, retirement or death accounted for. The
criteria by which these codes were assigned are given in Appendix IV.2.
These solvent histories were used to generate the paths for the workers
used in this analysis.
4.3 Appropriateness of a Markov Chain
In this section we will proceed with the necessary steps in chain
construction as outlined in Sections 2.4 through 2.7.
Briefly, we will
examine the time interval, the geometricity of the holding times and
proceed with testing other necessary assumptions should the preliminary
ones hold.
4.3.1 Time Considerations
For the reasons discussed in Section 2.4, a discrete time interval for
transition will be chosen instead of a continuous time parameter. We
will examine only the three-month interval for transition, as this seemed
the most practical given the length of exposure time necessary to produce
a significant result. However, we must pay attention to the possibility of
missing some exposure periods that are shorter than three months.
Table 4.1 gives the results of the sensitivity analysis, which counts the
visits lasting less than the basic time unit whose state differs from both
the previous and the subsequent states. The choice of the three-month
interval is quite satisfactory for both the cases and controls, the greatest
ratio of misses occurring among the cases for changes to code 2 (.0204).
All the remaining states yielded ratios of misses below 2% (the next largest
being .0183, for code 7 among the cases), and over the total populations
the cases and controls showed ratios of .0096 and .0043, respectively.
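The miss count behind Table 4.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used for the table: the exact definition of a missed transition is given with Table 2.4, which is not reproduced here, and the input format (a list of `(code, months)` spells per worker) and the function names `count_misses` and `miss_ratios` are hypothetical. A spell is counted as a miss when it lasts less than one three-month unit and its code differs from both neighbouring spells; the ratio divides by the number of transitions into the state, as in Table 4.1.

```python
from collections import Counter

UNIT_MONTHS = 3  # basic time unit used in this chapter


def count_misses(spells, unit=UNIT_MONTHS):
    """Count short interior spells that a 3-month grid could miss.

    spells: chronological list of (code, months) for one worker
    (hypothetical input format). A spell counts as a miss for its code
    when it lasts less than one time unit and its code differs from both
    neighbouring spells.
    """
    misses = Counter()
    for k in range(1, len(spells) - 1):
        code, months = spells[k]
        prev_code, next_code = spells[k - 1][0], spells[k + 1][0]
        if months < unit and code != prev_code and code != next_code:
            misses[code] += 1
    return misses


def miss_ratios(misses, transitions):
    """Ratio of misses to the number of transitions into each state."""
    return {s: misses.get(s, 0) / n for s, n in transitions.items()}


# Hypothetical history: a 2-month visit to code 2 between two code-7 spells.
history = [(7, 24), (2, 2), (7, 12), (3, 1), (3, 6), (9, 30)]
print(count_misses(history))  # Counter({2: 1})

# Totals from Table 4.1 (cases and controls).
totals = miss_ratios({"cases": 36, "controls": 75},
                     {"cases": 3755, "controls": 17589})
print({k: round(v, 4) for k, v in totals.items()})
# {'cases': 0.0096, 'controls': 0.0043}
```

The one-month visit to code 3 is not counted because the following spell is also code 3; the second call reproduces the overall ratios of Table 4.1 from its totals.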
4.3.2 Estimation of the Discrete Markov Chain Transition Matrices
Using the maximum likelihood estimates for $p_{ij}$, the transition
matrices for a simple Markov chain with a time unit of three months
were generated for the cases and controls. The results are given in
Tables 4.2 and 4.3. The construction of the state space is reflected
in the matrices: code 10 (death) is an absorbing state with
$p_{10,10} = 1.0$, and transitions from code 8 (retirement) can only
occur to state 10 or loop back to code 8. The most obvious
difference between the cases and controls lies in the tendency to remain
in area benzene and in area judgment-solvent exposures. The cases
apparently stay in secondary benzene with a greater probability than the
controls (.9184 versus .8596), with the reverse true for the judgment
solvents (code 6: .8333 versus .9272). If benzene is indeed associated
with the etiology of leukemia, then these two results are not surprising.
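The maximum likelihood estimates are simple count ratios, $\hat{p}_{ij} = n_{ij}/n_i$, where $n_{ij}$ is the number of three-month steps from code i to code j and $n_i = \sum_j n_{ij}$. The sketch below assumes each worker's solvent history has already been converted into a list of codes observed every three months; the helper name and the example paths are hypothetical, and the absorbing row for death is imposed by construction, as in Tables 4.2 and 4.3.

```python
from collections import defaultdict


def estimate_chain(paths, states, absorbing=()):
    """Maximum likelihood estimates p_ij = n_ij / n_i for a simple chain.

    paths: one list of state codes per worker, observed every 3 months.
    Rows for absorbing states (e.g. code 10, death) are set to a self-loop;
    rows with no observed exits are left as zeros.
    """
    counts = {i: defaultdict(int) for i in states}
    for path in paths:
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    P = {}
    for i in states:
        if i in absorbing:
            P[i] = {j: float(j == i) for j in states}
            continue
        n_i = sum(counts[i].values())
        P[i] = {j: (counts[i][j] / n_i if n_i else 0.0) for j in states}
    return P


# Two hypothetical 3-month paths; the first ends in death (code 10).
paths = [[3, 3, 3, 7, 7, 10], [3, 7, 7, 7, 9]]
P = estimate_chain(paths, states=[3, 7, 8, 9, 10], absorbing=(10,))
print(P[3][3], P[3][7], P[7][7], P[10][10])  # 0.5 0.5 0.6 1.0
```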
4.3.3 Geometricity of the Holding Times
We next test whether the lengths-of-stay in the states follow a geometric
distribution, as given in (2.6.1), using the $\hat{p}_{ij}$'s from Tables
4.2 and 4.3. We test the fit with the chi-square goodness-of-fit test.
Table 4.4 presents the results for both the cases and controls. Some of
the states do seem to follow a geometric distribution; however, the state
space as a whole does not, so we conclude that the holding times are not
geometric.
The failure of the data to satisfy this assumption suggests that
a simple chain is not appropriate and precludes any further testing
of other properties. The next section develops a semi-Markov model.
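A sketch of the goodness-of-fit calculation, assuming the geometric law in (2.6.1) has the form $\Pr(T = k) = (1 - p_{ii})\,p_{ii}^{k-1}$, $k = 1, 2, \ldots$ (equation (2.6.1) itself is not reproduced in this chapter). Observed lengths of stay are binned by whole time units, stays of `max_k` units or longer are pooled into a tail cell, and one degree of freedom is subtracted for the estimated $p_{ii}$. The cell structure and the function name are illustrative assumptions, not the exact grouping behind Table 4.4.

```python
from collections import Counter


def geometric_chi_square(holding_times, p_stay, max_k):
    """Pearson chi-square statistic for geometric holding times.

    holding_times: completed lengths of stay in whole time units (>= 1).
    p_stay: estimated self-transition probability p_ii.
    Assumed law: Pr(T = k) = (1 - p_stay) * p_stay**(k - 1); stays of
    max_k or more units are pooled into one tail cell.
    """
    n = len(holding_times)
    observed = Counter(min(t, max_k) for t in holding_times)
    stat = 0.0
    for k in range(1, max_k + 1):
        if k < max_k:
            prob = (1 - p_stay) * p_stay ** (k - 1)
        else:
            prob = p_stay ** (max_k - 1)  # Pr(T >= max_k)
        expected = n * prob
        stat += (observed.get(k, 0) - expected) ** 2 / expected
    # Degrees of freedom: cells - 1 - one estimated parameter (p_stay).
    return stat, max_k - 2


print(geometric_chi_square([1, 1, 2, 3], p_stay=0.5, max_k=3))  # (0.0, 1)
print(geometric_chi_square([1, 1, 1, 1], p_stay=0.5, max_k=2))  # (4.0, 0)
```

The p-value would then be read from the upper tail of a chi-square distribution with the returned degrees of freedom (for example with `scipy.stats.chi2.sf(stat, df)`); in practice, cells with small expected counts would also be pooled.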
4.4 Developing and Constructing the semi-Markov Process
This section will be concerned with sMP construction and modeling.
The state space will remain the same as that used in Section 4.3
and the transition interval will be three months.
4.4.1 Estimation of the sMP Transition Matrices
Tables 4.5 and 4.6 present the transition matrices for the
leukemia cases and controls. Comparing the summaries at the bottom
of each table, we note that there are no substantial differences between
the cases and controls with respect to the probability of visits to
the various states. We are concerned here with the proportion of visits
to a particular exposure, not the proportion of individuals visiting
that state.
We do not see an apparent association between leukemia and solvent
exposure based on the transition matrices. Should there be an association,
we would expect to find differences in the distribution of time spent
in the exposures, that is, in the holding time distributions.
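Since the simple chain and the semi-Markov process are estimated here from the same three-month paths, their matrices should be linked: removing the self-loop from a row of Table 4.2 or 4.3 and renormalising gives, up to rounding, the corresponding row of Table 4.5 or 4.6, that is $p_{ij}/(1 - p_{ii})$ for $j \ne i$. This relation is our reading of the two estimators rather than a statement from the text; the sketch, with a hypothetical function name, checks it on code 6 for the cases (Table 4.2 gives .1000, .8333, .0333, .0333; Table 4.5 gives .6000, .2000, .2000).

```python
def embedded_row(chain_row, i):
    """Semi-Markov (embedded) transition row from a simple-chain row.

    Removing the self-loop and renormalising gives p_ij / (1 - p_ii)
    for j != i, and 0 for j == i.
    """
    leave = 1.0 - chain_row.get(i, 0.0)
    if leave <= 0.0:
        raise ValueError(f"state {i} is never left; no embedded row")
    return {j: (0.0 if j == i else p / leave) for j, p in chain_row.items()}


# Row for code 6, cases (Table 4.2); zero entries omitted.
row6 = {2: 0.1000, 6: 0.8333, 7: 0.0333, 9: 0.0333}
print({j: round(p, 2) for j, p in embedded_row(row6, 6).items()})
# {2: 0.6, 6: 0.0, 7: 0.2, 9: 0.2}
```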
4.4.2 The Holding Time Distributions
We use the technique described in Section 3.4 to estimate the
holding time distributions, which take the general form of equation
(3.4.8). Because of the low number of transitions to some of the states,
we felt there were not sufficient data to fit holding time distributions
that depend on both i, the last state visited, and j, the state to which
a transition is being made. Therefore, for each population, the nine
holding time distributions dependent only on the last state visited
were estimated and fit to the data. (Recall that code 10 is an absorbing
state, death, and therefore a length-of-stay distribution is meaningless.)
Table 4.7 presents the theoretical holding time distributions for
both populations, together with the parameters for each state. The holding
time for the cases in state 7 takes a slightly different form, as the
interval of mixture is [0, 2] time units rather than [0, 4]. This form
is given by:
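$$
h_7(y) = \begin{cases}
\dfrac{\alpha}{2} + (1-\alpha)\,\psi^{-1} e^{-y/\psi}, & 0 < y \le 2 \\[4pt]
(1-\alpha)\,\psi^{-1} e^{-y/\psi}, & y > 2
\end{cases}
$$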
In the control group, fitting an exponential distribution to code 8
was difficult, and the modeling effort resulted in the simple step
function given in Table 4.7.
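The model-based means in Table 4.8 are consistent with reading the first term of the Table 4.7 density as a uniform component with density $\alpha/c$ on the mixture interval $[0, c]$ (c = 4, or c = 2 for code 7 among the cases), so that the mean is $\alpha c/2 + (1-\alpha)\psi$. That reading is an assumption reconstructed from the tables, and the function names below are hypothetical. The sketch evaluates the density and reproduces three entries of Table 4.8, including the step function used for code 8 among the controls.

```python
import math


def mixture_density(y, alpha, psi, c=4.0):
    """Uniform-exponential mixture: alpha/c + (1-alpha)/psi * exp(-y/psi)
    on 0 < y <= c, and only the exponential term for y > c."""
    if y <= 0:
        return 0.0
    value = (1.0 - alpha) / psi * math.exp(-y / psi)
    if y <= c:
        value += alpha / c
    return value


def mixture_mean(alpha, psi, c=4.0):
    # The uniform part on [0, c] has mean c/2; the exponential part has mean psi.
    return alpha * c / 2 + (1.0 - alpha) * psi


def step_mean(pieces):
    """Mean of a piecewise-constant density given as (a, b, height) pieces."""
    return sum(h * (b * b - a * a) / 2 for a, b, h in pieces)


print(round(mixture_mean(0.23, 23.40), 2))         # 18.48 (cases, code 3)
print(round(mixture_mean(0.37, 23.92, c=2.0), 2))  # 15.44 (cases, code 7)
print(round(step_mean([(0, 8, 0.025), (8, 88, 0.01)]), 2))  # 39.2 (controls, code 8)
```

For code 3 among the cases, for example, 0.46 + 0.77 x 23.40 = 18.48, the model-based value in Table 4.8.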
To make comparisons between the two groups, we look at the mean
holding times generated by both the model and the empirical data, as shown
in Table 4.8. For most states the data- and model-based estimates
correspond well, largely as a result of the method of moments technique
used for estimation. Perhaps the most striking differences between the
cases and controls are the mean holding times in both of the benzene
exposure codes. For code 1, the cases' visits are on average 8.8 time
units (2.2 years) longer than the controls' visits. The cases' visits to
code 2 last on average 5.11 time units (1.28 years) longer than the
controls'. Another major difference in the holding time distributions is
the length of time spent in retirement before death: the cases spend
approximately 2.5 years less in retirement than do the controls.
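With one transition unit equal to three months (0.25 years), the differences quoted above follow from the model-based means in Table 4.8:

$$
\begin{aligned}
\text{Code 1:}\quad & 21.52 - 12.72 = 8.80\ \text{t.u.}, && 8.80 \times 0.25 = 2.20\ \text{years}\\
\text{Code 2:}\quad & 11.75 - 6.64 = 5.11\ \text{t.u.}, && 5.11 \times 0.25 \approx 1.28\ \text{years}\\
\text{Code 8:}\quad & 39.20 - 29.12 = 10.08\ \text{t.u.}, && 10.08 \times 0.25 = 2.52\ \text{years}
\end{aligned}
$$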
4.4.3 Using First Passage Probabilities to Describe the Distribution of Latency Periods
We have defined the first passage as a measure of the time necessary
to reach a given state from another specified state for the first time.
In this setting the first passage probability appears to be a useful
measure of the latency period, as the latency period for cancer is
traditionally defined as the time between first exposure and diagnosis
of the malignancy, although there are a variety of other models
(41, 42, 43).
Identification of a latent period is quite useful in occupational
epidemiology studies because the relevant period of exposure is then
defined, which enables us to look for changes in chemical usage or
production processes in that era. In a practical sense, occupational
data need only be collected for the relevant calendar years, because
subsequent exposures should not affect an already induced malignancy.
Since date of diagnosis was not available for all the cases, we
use date of death to estimate the latent period.
This is accomplished
by calculating the first passage from the hypothesized exposure to
death (code 10).
The computational formula is:
$$
f_{i,10}(n) = \sum_{\substack{r=1 \\ r \neq 10}}^{N} p_{ir} \sum_{m=1}^{n-1} h_{ir}(m)\, f_{r,10}(n-m) \;+\; p_{i,10}\, h_{i,10}(n)
$$
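The recursion can be evaluated forward in n, because $f_{r,10}(n-m)$ with $m \ge 1$ only needs values already computed. A minimal sketch, assuming discrete holding time probabilities $h_i(m)$ that depend only on the current state (as in Section 4.4.2, so $h_{ir} = h_i$) and an embedded matrix with no self-transitions; the function name and the toy inputs are hypothetical.

```python
def first_passage(P, h, target, n_max):
    """First passage probabilities f[i][n] to `target` for a discrete
    semi-Markov process, using the recursion above with h_ir = h_i:
      f_i(n) = sum_{r != target} p_ir * sum_{m=1}^{n-1} h_i(m) f_r(n-m)
               + p_i,target * h_i(n)
    P: dict state -> dict of embedded-chain probabilities p_ir
       (every state that appears as a destination must be a key of P).
    h: dict state -> list with h[i][m] = Pr(holding time in i = m units),
       h[i][0] = 0 (destination-independent holding times are assumed).
    """
    def hold(i, m):
        return h[i][m] if m < len(h[i]) else 0.0

    f = {i: [0.0] * (n_max + 1) for i in P}
    for n in range(1, n_max + 1):
        for i in P:
            if i == target:
                continue
            value = P[i].get(target, 0.0) * hold(i, n)
            for r, p_ir in P[i].items():
                if r == target or p_ir == 0.0:
                    continue
                value += p_ir * sum(hold(i, m) * f[r][n - m] for m in range(1, n))
            f[i][n] = value
    return f


# Hypothetical example: exposure (1) -> other job (2) -> death (10).
P = {1: {2: 1.0}, 2: {10: 1.0}, 10: {}}
h = {1: [0.0, 1.0], 2: [0.0, 0.5, 0.5]}
f = first_passage(P, h, target=10, n_max=4)
print(f[1])  # [0.0, 0.0, 0.5, 0.5, 0.0]
```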
Figures 4.1 through 4.4 show the first passage probabilities for
exposure codes 1 through 4, respectively, as a function of time.
The question is how to determine a meaningful latent period
from these graphs, particularly when the distributions are so skewed
to the right. We will discuss three measures for use as descriptors
of the latent period: the mode (defined by the peak first passage
probability), the median and the mean. The median is defined as the
point n at which the cumulative distribution of first passages to death
reaches 0.50. That is, according to the model, 50% of the latent periods
for that exposure would be less than n in length and 50% would be
greater than n.
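The mean first passage time $\theta_{ij}$ is obtained by solving the
simultaneous system of equations

$$
\theta_{ij} = \bar{\tau}_i + \sum_{r \neq j} p_{ir}\, \theta_{rj},
$$

where $\bar{\tau}_i$ is the mean holding time in state i (Table 4.8).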
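Writing the equations for a fixed target j gives the linear system $(I - Q)\,\theta = \bar{\tau}$, where Q is the embedded matrix with the row and column of j removed. The sketch below solves it with plain Gaussian elimination so that it needs no external library; the function name, the toy states and the mean holding times are hypothetical, and a library linear solver would serve equally well.

```python
def mean_first_passage(P, tau, target):
    """Solve theta_i = tau_i + sum_{r != target} p_ir * theta_r for all
    states i != target, i.e. (I - Q) theta = tau with the target row and
    column removed. Plain Gaussian elimination with partial pivoting."""
    states = [s for s in P if s != target]
    idx = {s: k for k, s in enumerate(states)}
    n = len(states)
    A = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    rhs = [float(tau[s]) for s in states]
    for s in states:
        for r, p in P[s].items():
            if r != target:
                A[idx[s]][idx[r]] -= p
    for col in range(n):  # forward elimination
        pivot = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            rhs[row] -= factor * rhs[col]
    theta = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        acc = sum(A[row][k] * theta[k] for k in range(row + 1, n))
        theta[row] = (rhs[row] - acc) / A[row][row]
    return {s: theta[idx[s]] for s in states}


# Hypothetical: state 1 leads to death (10) or to state 2, which returns to 1.
P = {1: {2: 0.5, 10: 0.5}, 2: {1: 1.0}, 10: {}}
print(mean_first_passage(P, tau={1: 2.0, 2: 4.0}, target=10))  # {1: 8.0, 2: 12.0}
```

Here $\theta_1 = 2 + 0.5\,\theta_2$ and $\theta_2 = 4 + \theta_1$, which the elimination solves as 8 and 12 time units.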
Table 4.9 exhibits these three measures for each exposure category.
In every instance, the mode is less than the median, which in turn is
less than the mean. This ordering reflects the long right tail of the
first passage distribution: a few individuals with very long latent
periods pull the mean upward.
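The three descriptors can be read off a tabulated first passage distribution as follows. This is a sketch under stated assumptions: the probabilities are assumed to be available on a finite horizon (for example, from the recursion for $f_{i,10}(n)$); the mean is computed over that horizon and renormalised, so it only approximates the value from the equations for $\theta_{ij}$; and the example distribution and function name are invented to show the mode < median < mean ordering produced by a long right tail.

```python
def latency_summaries(f):
    """Mode, median and mean (in time units) of a first passage pmf.

    f[n] = first passage probability at n units, f[0] unused. The median
    is the first n at which the cumulative probability reaches 0.50. The
    mean is taken over the tabulated horizon and renormalised by its mass.
    """
    n_max = len(f) - 1
    mode = max(range(1, n_max + 1), key=lambda n: f[n])
    cumulative, median = 0.0, None
    for n in range(1, n_max + 1):
        cumulative += f[n]
        if cumulative >= 0.5:
            median = n
            break
    total = sum(f[1:])
    mean = sum(n * f[n] for n in range(1, n_max + 1)) / total
    return mode, median, mean


# A hypothetical right-skewed pmf: mode < median < mean, as in Table 4.9.
f = [0.0, 0.10, 0.30, 0.20, 0.15, 0.10, 0.05, 0.05, 0.05]
mode, median, mean = latency_summaries(f)
print(mode, median, round(mean, 2))  # 2 3 3.45
```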
4.4.4 First Passage Probabilities for Aggregates of States
In this section we derive a method for obtaining the first
passage probabilities for combined states. For example, we might
want estimates of these probabilities for entry into either of the
benzene codes. We are able to do this without re-estimating the P
matrix and the holding time distributions because of the relationships
among the conditional probabilities.
Suppose we are interested in combining codes 1 and 2 as suggested
above to form the aggregate state of benzene exposure at any level of
severity.
We would then like to find the first passage probability
to death from either code 1 or code 2.
Let

A = [death occurs at the nth transition after time $t_0$]
  = [$X_n$ = code 10, $X_m \neq$ code 10, $0 < m \le n - 1$]
B = [$X_{t_0}$ = code 1]
C = [$X_{t_0}$ = code 2]

where $t_0$ is the time at which a person enters the exposure code for
the first time.
We can then express the first passage probability to death from
a single exposure code as

$$ f_{1,10}(n) = \Pr[A \mid B]. $$

Similarly,

$$ f_{2,10}(n) = \Pr[A \mid C]. $$

We are interested in deriving $\Pr[A \mid B \cup C]$, that is, the first
passage probability to death from either code 1 or code 2:

$$
\Pr[A \mid B \cup C] = \frac{\Pr[(A \cap B) \cup (A \cap C)]}{\Pr[B] + \Pr[C] - \Pr[B \cap C]}.
$$

Since an individual cannot enter both code 1 and code 2 at time $t_0$,
$\Pr[B \cap C] = 0$, and the events $A \cap B$ and $A \cap C$ are disjoint.
We then have, by the definition of conditional probability,

$$
\Pr[A \mid B \cup C] = \frac{\Pr[A \cap B] + \Pr[A \cap C]}{\Pr[B] + \Pr[C]}
= \frac{\Pr[B]\, f_{1,10}(n) + \Pr[C]\, f_{2,10}(n)}{\Pr[B] + \Pr[C]},
$$

which is merely a weighted average of the first passage probabilities.
Recall from the definitions of B and C that

Pr[B] = Pr[entering code 1 at time $t_0$]
Pr[C] = Pr[entering code 2 at time $t_0$].

To obtain values for these probabilities, we will use estimates that
are external to the model:

$$
\Pr(B) = \Pr(\text{code 1}) = \frac{\text{no. of individuals who visit code 1 and not code 2, or who visit both but enter 1 prior to 2}}{\text{total number of individuals}}
$$

$$
\Pr(C) = \Pr(\text{code 2}) = \frac{\text{no. of individuals who visit code 2 and not code 1, or who visit both but enter 2 before 1}}{\text{total number of individuals}}
$$
For the group of cases, we obtain the following estimates:
To combine codes 1-2: Pr [code 1] = .148, Pr [code 2] = .185
To combine codes 3-4: Pr [code 3] = .370, Pr [code 4] = .370
To combine codes 5-6: Pr [code 5] = .296, Pr [code 6] = .111
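The external weights can be computed directly from the code sequences. A minimal sketch with a hypothetical function name: an individual contributes to Pr(code 1) when code 1 is the first of the two codes entered, and individuals who enter neither code remain in the denominator. The case estimates above are consistent with counts out of 27 cases (for example, .148 ≈ 4/27 and .185 ≈ 5/27), but the histories below are invented.

```python
def first_entry_weights(histories, codes):
    """Pr(code c) for c in `codes`: fraction of all individuals whose first
    entry among `codes` is c (e.g. visits code 1 and not code 2, or visits
    both but enters code 1 first). Individuals who enter none of the codes
    stay in the denominator."""
    first = {c: 0 for c in codes}
    for path in histories:
        for code in path:
            if code in first:
                first[code] += 1
                break
    n = len(histories)
    return {c: first[c] / n for c in codes}


# Hypothetical cohort of 27 code sequences: 4 enter code 1 first, 5 enter
# code 2 first, and 18 never visit either benzene code.
histories = [[7, 1, 2]] * 4 + [[3, 2, 1]] * 5 + [[3, 7, 8, 10]] * 18
w = first_entry_weights(histories, codes=(1, 2))
print({c: round(p, 3) for c, p in w.items()})  # {1: 0.148, 2: 0.185}
```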
The resulting peak latent periods for the three combination groups
1-2, 3-4 and 5-6 are 15.0, 9.75 and 12.75 years respectively.
Figure
4.5 depicts the latent period distributions graphically, with the three
location parameters given in Table 4.10.
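Once the weights are available, the aggregate distribution is the weighted average derived above, applied at every n. The sketch uses the case weights for codes 1 and 2 from the text together with toy first passage distributions; the function name is hypothetical. The mode, median and mean of the combined distribution can then be summarised in the same way as for single codes.

```python
def combine_first_passage(weights, f_by_code):
    """Weighted average of first passage pmfs for an aggregate state:
    (Pr[B] f_1(n) + Pr[C] f_2(n)) / (Pr[B] + Pr[C]), generalised to any
    number of mutually exclusive first-entry codes."""
    total = sum(weights.values())
    length = len(next(iter(f_by_code.values())))
    return [sum(weights[c] * f_by_code[c][n] for c in weights) / total
            for n in range(length)]


# Weights for codes 1 and 2 (cases) from the text; the pmfs are toy values.
weights = {1: 0.148, 2: 0.185}
f_by_code = {1: [0.0, 1.0, 0.0], 2: [0.0, 0.0, 1.0]}
print([round(p, 4) for p in combine_first_passage(weights, f_by_code)])
# [0.0, 0.4444, 0.5556]
```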
4.5 Conclusions and Comments
In this section we briefly summarize the results of applying the
Markov models to a case-control study of leukemia. Since the assumptions
of a simple Markov chain are not met, it was necessary to include the
distribution of length-of-stay as part of the model, and we therefore
developed a semi-Markov process.
We find that there are no great differences between the cases and
controls with regard to the probability of visiting any of the states,
yet the two populations differ with respect to the time spent per visit
to the benzene areas.
A parameter derived from the sMP, the first passage probability, is
used to describe the latency period. There are several difficulties
in assessing the latency period. In choosing the length of time between
first exposure and death as the latent period, the duration of exposure
at this first visit is ignored. Perhaps the first passage probability
should be conditioned upon a visit of some minimum length. We also
presume that all the leukemias are occupationally induced. It is
probable that a fraction of the cases are due to non-occupational
causes, for which the latent period as estimated in the industrial
setting is irrelevant. Most investigators choose a single point in time
to describe the latent period. In this study we have demonstrated that
estimating several location parameters better describes the distribution
of the latent period according to our model. This distribution is
highly skewed and the individual times highly variable.
Despite the inherent problems with the latent period, there are
several observations we can make at this point. It is apparent that
there is little difference between the two severity levels of benzene
exposure in terms of the length of time needed to initiate a biological
response. For other solvents, there is a marked difference between
primary and secondary exposure, with the latent periods being considerably
shorter than for either of the benzene categories. This suggests that
direct contact with solvents other than benzene may be potentially as
hazardous as exposure to benzene itself, or, if benzene is the
leukemogenic agent, that it may be present in small quantities as a
contaminant of the other solvents of choice in the industry.
Figure 4.1  Distribution of latent period: primary benzene exposure
(first passage probability $f_{1,10}(n)$ plotted against n, in transition units)
Figure 4.2  Distribution of latent period: secondary benzene exposure
(first passage probability $f_{2,10}(n)$ plotted against n, in transition units)
Figure 4.3  Distribution of latent period: primary other solvent exposure

Figure 4.4  Distribution of latent period: secondary other solvent exposure
Figure 4.5  Distribution of latent period for aggregate states
(curves labeled "Benzene (Codes 1 and 2)" and "Other Solvents (Codes 3 and 4)")
TABLE 4.1
ANALYSIS OF MISSED TRANSITIONS* FOR 3 MONTH INTERVAL

                Leukemia Cases                     Leukemia Controls
Code    Transitions  Number of   Ratio     Transitions  Number of   Ratio
        to state     misses                to state     misses
  1          88          0       .0000         383           1      .0026
  2          98          2       .0204         114           1      .0088
  3         854          7       .0082        3608          12      .0033
  4         407          4       .0098        1380          10      .0072
  5         168          0       .0000         388           0      .0000
  6          30          0       .0000         261           0      .0000
  7        1041         19       .0183        4925          26      .0052
  8         592          1       .0017        4301           0      .0000
  9         450          3       .0067        2210          25      .0113
 10          27          0       .0000          19           0      .0000
TOTAL      3755         36       .0096       17589          75      .0043

Average per individual over all cases:      .0193
Average per individual over all controls:   .0057

* A missed transition is defined the same as in Table 2.4.
TABLE 4.2
TRANSITION MATRIX, SIMPLE MARKOV CHAIN: CASES
(rows: Code i; columns: Code j)

Code i    1      2      3      4      5      6      7      8      9      10
  1     .9545  .0     .0227  .0     .0     .0     .0     .0     .0227  .0
  2     .0     .9184  .0102  .0204  .0     .0     .0408  .0     .0102  .0
  3     .0     .0     .9473  .0152  .0     .0     .0222  .0105  .0012  .0035
  4     .0     .0025  .0369  .9238  .0     .0     .0246  .0025  .0098  .0
  5     .0     .0     .0     .0179  .9345  .0     .0417  .0060  .0     .0
  6     .0     .1000  .0     .0     .0     .8333  .0333  .0     .0333  .0
  7     .0010  .0010  .0183  .0096  .0029  .0019  .9347  .0077  .0211  .0019
  8     .0     .0     .0     .0     .0     .0     .0     .9662  .0     .0338
  9     .0     .0045  .0134  .0045  .0067  .0     .0512  .0022  .9131  .0045
 10     .0     .0     .0     .0     .0     .0     .0     .0     .0     1.0
TABLE 4.3
TRANSITION MATRIX, SIMPLE MARKOV CHAIN: CONTROLS
(rows: Code i; columns: Code j)

Code i    1      2      3      4      5      6      7      8      9      10
  1     .9243  .0026  .0209  .0026  .0026  .0     .0339  .0     .0131  .0
  2     .0263  .8596  .0088  .0263  .0     .0088  .0526  .0     .0175  .0
  3     .0     .0003  .9559  .0147  .0006  .0     .0128  .0097  .0061  .0
  4     .0     .0     .0449  .9087  .0014  .0007  .0239  .0058  .0145  .0
  5     .0026  .0026  .0103  .0026  .9251  .0     .0207  .0078  .0284  .0
  6     .0     .0     .0077  .0038  .0077  .9272  .0230  .0     .0307  .0
  7     .0022  .0014  .0106  .0077  .0014  .0008  .9547  .0100  .0112  .0
  8     .0     .0     .0     .0     .0     .0     .0     .9986  .0     .0014
  9     .0009  .0     .0136  .0104  .0023  .0032  .0348  .0036  .9312  .0
 10     .0     .0     .0     .0     .0     .0     .0     .0     .0     1.0
TABLE 4.4
TEST FOR GEOMETRICITY OF HOLDING TIMES

                    Cases                              Controls
Code    Number of       χ²       p-value     Number of       χ²       p-value
        transitions                          transitions
  1          4                                    29          4.12      .249
  2          8                                    16          1.26      .262
  3         42         20.66      .001           159         54.94      .001
  4         31          7.14      .068           126         55.83      .001
  5         11          0.35      .555            29          8.83      .032
  6          5                                    19          3.51      .061
7
67
223
325.95
.0001
8
5
9
38
79.8
.0001
10
a
44.04
.000
6
6.45
.268
152
TABLE 4.5
TRANSITION MATRIX (SEMI-MARKOV), 3 MONTH INTERVAL: LEUKEMIA CASES
(rows: Code i; columns: Code j)

Code i    1      2      3      4      5      6      7      8      9      10
  1     .0     .0     .5000  .0     .0     .0     .0     .0     .5000  .0
  2     .0     .0     .1250  .2500  .0     .0     .5000  .0     .1250  .0
  3     .0     .0     .0     .2889  .0     .0     .4222  .2000  .0222  .0667
  4     .0     .0323  .4839  .0     .0     .0     .3226  .0323  .1290  .0
  5     .0     .0     .0     .2727  .0     .0     .6364  .0909  .0     .0
  6     .0     .6000  .0     .0     .0     .0     .2000  .0     .2000  .0
  7     .0147  .0147  .2794  .1471  .0441  .0294  .0     .1176  .3235  .0294
  8     .0     .0     .0     .0     .0     .0     .0     .0     .0     1.0000
  9     .0     .0513  .1538  .0513  .0769  .0     .5897  .0256  .0     .0513
 10     .0     .0     .0     .0     .0     .0     .0     .0     .0     .0
TOTAL   .0173  .0346  .1948  .1342  .0476  .0216  .2943  .0865  .1688
TABLE 4.6
TRANSITION MATRIX (SEMI-MARKOV), 3 MONTH INTERVAL: LEUKEMIA CONTROLS
(rows: Code i; columns: Code j)

Code i    1      2      3      4      5      6      7      8      9      10
  1     .0     .0345  .2759  .0345  .0345  .0     .4483  .0     .1724  .0
  2     .1875  .0     .0625  .1875  .0     .0625  .3750  .0     .1250  .0
  3     .0     .0062  .0     .3312  .0125  .0     .2875  .2188  .1375  .0062
  4     .0     .0     .4921  .0     .0159  .0079  .2619  .0635  .1587  .0
  5     .0333  .0333  .1333  .0333  .0     .0     .2667  .1000  .3667  .0333
  6     .0     .0     .1053  .0526  .1053  .0     .3158  .0     .4211  .0
  7     .0491  .0313  .2321  .1695  .0313  .0179  .0     .2188  .2455  .0045
  8     .0     .0     .0     .0     .0     .0     .0     .0     .0     1.0000
  9     .0132  .0     .1974  .1513  .0329  .0461  .5066  .0526  .0     .0
 10     .0     .0     .0     .0     .0     .0     .0     .0     .0     .0
TOTAL   .0337  .0693  .1881  .1463  .0348  .0221  .2602  .1196  .1765  .0
TABLE 4.7
HOLDING TIME DISTRIBUTIONS

$$
h_i(y) = \begin{cases}
\dfrac{\alpha}{4} + (1-\alpha)\,\psi^{-1} e^{-y/\psi}, & 0 < y \le 4 \\[4pt]
(1-\alpha)\,\psi^{-1} e^{-y/\psi}, & y > 4
\end{cases}
$$

                    Cases                            Controls
State   Transitions   Mixture              Transitions   Mixture
        n_j           α          ψ         n_j           α          ψ
  1          4        0.00      21.52           29       0.00      12.72
  2          8        0.08      12.60           16       0.00       6.64
  3         45        0.23      23.40          162       0.16      29.00
  4         31        0.22      15.64          126       0.29      13.88
  5         11        0.29      19.96           30       0.30      16.88
  6          5        0.00       5.52           19       0.00      13.24
  7         68        0.37*     23.92          224       0.45      24.04
  8         20        0.00      29.12          103       0.00       **
  9         39        0.34      14.64          152       0.17      15.20

*  Mixture interval is [0, 2].
** h_8(y) = .025 for 0 < y ≤ 8; .01 for 8 < y ≤ 88; 0 elsewhere.
TABLE 4.8
MODEL BASED AND DATA BASED ESTIMATES OF HOLDING TIMES

CASES
State   Model based    Data based     St. dev.          Transitions
        $\bar{\tau}_i$   $\hat{\tau}_i$   S.D.$(\hat{h}_i)$   $n_{ij}$
  1       21.52          21.50          19.61                4
  2       11.75          11.75          12.56                8
  3       18.48          18.48          25.16               45
  4       12.64          12.63          16.63               31
  5       14.75          14.77          22.25               11
  6        5.52           5.50           4.84                5
  7       15.44          14.81          22.71               68
  8       29.12          29.1           23.82               20
  9       10.34          10.37          17.01               39

CONTROLS
State   Model based    Data based     St. dev.          Transitions
        $\bar{\tau}_i$   $\hat{\tau}_i$   S.D.$(\hat{h}_i)$   $n_{ij}$
  1       12.72          12.70          12.81               29
  2        6.64           6.63           6.50               16
  3       24.68          21.77          32.56              162
  4       10.43          10.45          15.49              126
  5       12.42          12.43          19.05               30
  6       13.24          13.23          12.94               19
  7       14.12          21.49          37.19              224
  8       39.20          41.25          50.72              103
  9       12.96          14.04          21.03              152
TABLE 4.9
MEASURES USED TO DESCRIBE DISTRIBUTION OF LATENCY PERIOD BY EXPOSURE CODE

            Peak (mode)          Median               Mean
Code       t.u.    years       t.u.    years       t.u.     years
Code 1      68     17           113    28.25       147.08    36.77
Code 2      57     14.25        109    27.25       141.43    35.35
Code 3      28      7.0          87    21.75       120.70    30.17
Code 4      47     11.75        103    25.75       135.02    33.75
Code 5      47     11.75        103    25.75       136.35    34.09
Code 6      58     14.5         110    27.50       142.28    35.57
Code 7      40     10.0          94    23.50       129.07    32.26
Code 9      45     15.0          98    24.50       130.43    32.60

(t.u. = transition units of three months)
TABLE 4.10
LATENCY MEASURES FOR COMBINED STATES (IN TRANSITION UNITS)

Combined states                   Mode    Median    Mean     Mean (in years)
Codes 1 & 2 = Benzene              60      114      126.9        31.73
Codes 3 & 4 = Other solvent        37       96      106.8        26.96
Codes 5 & 6 = Judgment             50      106      122.2        30.55
CHAPTER V
SUMMARY AND SUGGESTIONS FOR FUTURE RESEARCH
5.1 Summary
The purpose of this research is to explore new techniques for
analyzing work histories. We have set forth the procedures for
developing and testing both the discrete Markov chain and the discrete
semi-Markov process and applied these methods to two data sets which
derive from different epidemiologic study designs.
We can conclude that in both applications a simple chain does not
adequately describe the dynamics of job changes within the plant.
With the sMP, the distribution of holding times provides additional
information regarding the nature of worker movement. In comparing the
application of this technique to cohort and case-control designs, we
can conclude that the practical reasons for choosing a case-control
design create a problem for our modeling: the size of the data base.
In epidemiological analyses, case-control studies are often selected
because they are more economical, the study population comprising fewer
individuals, whereas a cohort study involves complete enumeration of the
population at risk. Because of the large number of job transfers in the
retiree study, we were able to construct holding time distributions that
depended on both the previous and the subsequent jobs visited. This was
not possible with the leukemia study. However, with the case-control
study we are able to attach special significance to the first passage
probability, namely as a description of the latent period.
A comment regarding the holding time distributions is appropriate
here. In both Chapters III and IV, the holding time distributions were
fit only to a certain degree of adequacy. Had we attempted to find the
best possible fit, we expect that some of the results would be different.
5.2 Extensions and Suggestions for Future Research
1. Higher order chains - Perhaps some of the assumptions that were
not met with the simple chain could be satisfied through higher order
Markov models, which incorporate memory of several previous steps.
2. Variance of the parameters - Little work has been done on the
variance of some of the parameters derived from the sMP. Simulation
or log likelihood methods could be explored, as it would be useful to
be able to place confidence limits about some of the measures in this
study.
3. Taboo probabilities - Are there differences among groups of
individuals who do not visit certain states?
4. Backwards chains - Latency could be explored by reversing the
process with respect to time, starting with the endpoint in this
study and working backwards to the time of exposure.
5. Applications to other OHSG studies - It would be useful to
apply the approaches used in this research to other occupational health
studies already completed, to compare results. These techniques could
also be used in future studies, as a tremendous number of work
histories are available for building and testing these models.
6. Cost-benefit studies - It would be useful to propose alternative
strategies of worker movement by suggesting different transition
probability and holding time matrices, whereby the probability of
disability or other health outcome events might be considerably changed.
Threshold limits on time spent in certain hazardous areas would be
helpful in predicting disability, which in turn would help management
project its needs for disability award payments.
REFERENCES
1. Gamble J, Spirtas R: Job classification and utilization of complete work histories in occupational epidemiology. J Occup Med 18:403, 1976.
2. Gamble J, Spirtas R, Easter P: Applications of a job classification system in occupational epidemiology. AJPH 66:768, 1976.
3. Mancuso TF, Ciocco A, ElAttar AA: An epidemiologic approach to the rubber industry. J Occup Med 10:213-231, 1968.
4. Lloyd JW, Lundin FE, Redmond CK, Geiser PF: Long term mortality of steelworkers. IV. Mortality by work area. J Occup Med 12:151-157, 1970.
5. Parkes HG: Health in the rubber industry. A pilot study. Megson & Son, Ltd., Manchester, England, 1966.
6. Fox AJ, Lindars DC, Owen R: A survey of occupational cancer in the rubber industry: Results of five-year analysis, 1967-1971. Br J Ind Med 31:140, 1974.
7. Ott MG, Holder BB, Gordon HL: Respiratory cancer and occupational exposure to arsenicals. Arch Env Health 29:250-255, 1974.
8. Spirtas R: A statistical analysis of work history data in a retrospective occupational mortality study of the association of solvents with leukemia in rubberworkers. UNC Institute of Statistics Mimeo Series 1049:1976.
9. Feller W: An introduction to probability theory and its applications, Vol. I. John Wiley & Sons, New York, 1957.
10. Hillier F, Lieberman G: An introduction to operations research. Holden-Day, San Francisco, 1967.
11. Parzen E: Stochastic processes. Holden-Day, San Francisco, 1962.
12. Karlin S, Taylor H: A first course in stochastic processes. Academic Press, New York, 1975.
13. Howard RA: Dynamic probabilistic systems, Vol. II. Semi-Markov and decision models. John Wiley & Sons, New York, 1971.
14. Zahl S: A Markov process model for follow-up studies. Human Biol 27:90-120, 1955.
15. Doob JL: Stochastic processes. John Wiley & Sons, New York, 1953.
16. Anderson TW, Goodman LA: Statistical inference about Markov chains. Ann Math Stat 28:89-109, 1957.
17. Billingsley P: Statistical methods in Markov chains. Ann Math Stat 32:12-40, 1961.
18. Hoel PG: A test for Markoff chains. Biometrika 41:430-433, 1954.
19. Moore EH, Pyke R: Estimation of the transition distributions of a Markov renewal process. Ann Inst Stat Math 20:411-424, 1969.
20. Marshall AW, Goldhamer H: An application of Markov processes to the study of the epidemiology of mental diseases. JASA 50:99-129, 1955.
21. Bush JW, Chen MM, Zaremba J: Estimating health program outcomes using a Markov equilibrium analysis of disease development. AJPH 61:2362-75, 1971.
22. Alling DW: The after history of pulmonary tuberculosis: A stochastic model. Biometrics 14:527, 1958.
23. Boag JW: Maximum likelihood estimates of the proportion of patients cured by cancer therapy. J Roy Stat Soc 11:15-53, 1949.
24. Meridith J: Markovian analysis of a geriatric ward. Mgmt Sci 19:604, 1973.
25. Lu KH: A path probability approach to irreversible Markov chains with an application in studying the dental caries process. Biometrics 22:791-809, 1966.
26. Bithell JF: A class of discrete-time models for the study of hospital admissions systems. Oper Res 17:48-69, 1969.
27. Zung WWK, Wilson WP: Sleep and dream patterns in twins: Markov analysis of a genetic trait. Biol Psychiat 9:119-29, 1967.
28. Yang MCK, Hursh CJ: The use of a semi-Markov model for describing sleep patterns. Biometrics 29:667-676, 1973.
29. Thomas WH: A model for predicting recovery progress of coronary patients. Health Serv Res 185-212, Fall 1968.
30. Kao EPC: A semi-Markov model to predict recovery progress of coronary patients. Health Serv Res 191-208, Fall 1972.
31. Kullback S, Kupperman M, Ku H: Tests for contingency tables and Markov chains. Technometrics 4:573-608, 1962.
32. Shachtman RH, Hogue CJR, Schoenfelder J: The comparison of post-abortum and post-partum time to delivery using a data-based Markov chain. UNC Institute of Statistics Mimeo Series 1034:1975.
33. Cinlar E: Markov renewal theory: A survey. Mgmt Sci 21:727-752, 1975.
34. Cronkite EP: Evidence for radiation and chemicals as leukemogenic agents. Arch Env Health 3:297-303, 1961.
35. Aksoy M, Erdem S, Dincol G: Acute leukemia due to chronic exposure to benzene. Am J Med 52:160-166, 1972.
36. Hernberg S: Prognostic aspects of benzene poisoning. Br J Ind Med 23:204, 1966.
37. Forni A, Vigliani EC: Chemical leukemogenesis in man. Ser Haematol VII, 2:211, 1974.
38. McMichael AJ, Spirtas R, Kupper LL: An epidemiologic study of mortality within a cohort of rubber workers. J Occup Med 15:458-464, 1974.
39. McMichael AJ, Spirtas R, Kupper LL, Gamble JF: Solvent exposure and leukemia among rubber workers: An epidemiologic study. J Occup Med 17:234-239, 1975.
40. Arp EW, Wolf PH: A retrospective assessment of solvent exposure and the relationship to lymphatic leukemia. In preparation, 1978.
41. Cobb S, Miller M, Wald N: On the estimation of the incubation period in malignant disease. The brief exposure case, leukemia. J Chron Dis 9:385-393, 1959.
42. Armenian HK, Lilienfeld AM: The distribution of incubation periods of neoplastic diseases. Am J Epid 99:92-99, 1974.
43. Polednak AP: Latency periods in neoplastic disease. Am J Epid 100:354-356, 1974.
APPENDIX I.1
O.T. Dictionary
Letter Code | Occupational Title | Description | Numerical Code
BEBL | Bead Building | This involves all processes associated with building beads. | 26
BPTR | Batch Preparation-Tires | Mixing of rubber, accelerators, antioxidants, fillers, etc. in banburys. Includes B.B. operators and helpers. | 01
BPTU | Batch Preparation-Tubes | Same as 01 except for tubes, flaps and bladders. | 02
CALO | Calender operation | This O.T. is for those individuals who operate the calenders. | 14
CALT | Calender tending | This includes individuals that assist the calender operator (1st & 2nd helpers), and in the Firestone O.T.'s those on dip operations as well. | 15
CEMD | Cementing-Treads | After the treads are cut to the proper size, the cut edges have cement put on them so when the tire builders put the tread on the green tire they will stick together. | 25
CEMG | Cement mixing | Mixing of liquid material (mostly solvents) for use throughout the plant. | 04
CLRK | Clerical | | 56
CMTR | Cutting & Milling-Tires | After internal mixing in the BB, the batch is milled into long sheets, dried and then cut into long sheets and stacked, or pelletized. The pellets or sheets are then either returned to the BB, or milled preparatory to further use (calender for plystock, tuber for treads). | 05
CMTU | Cutting & Milling-Tubes | Same as CMTR only the stock is for tubes, flaps or bladders. | 06
CRBA | Curing-bladders | | 37
CRFB | Curing-flaps (black) | Mold press cure of flaps | 35
CRFW | Curing-flaps (white) | Pot heater cure of talc coated flaps | 36
CRTR | Curing-tires | | 33
CRTU | Curing-tubes | | 34
CRVV | Curing-valves | | 38
FINX | Finishing & Repairing-Flaps, Bladders & Sleeves | Same as FITU except for flaps, bladders and sleeves. | 43
FIRA | Finishing & Inspecting-Tires | Trim, balance, label, tube & inflate, service, force grind and classify cured tires. | 40
FIRB | Repairing tires | Repair, buff, rag, rib dress, cure repairs of cured tires. | 41
FITU | Finishing & Repairing-Tubes & Airbags | Final inspection & repair of tubes and airbags. | 42
FUSP | Tube & flap building | Splicing and booking tubes and flaps. | 32
INRE | Inspection & Repair-Green tires | Inspect green tires for defects, and repair, preparatory for curing. | 29
MECH | Mechanical building | Largely machinist type work. | 89
MIFB | Milling-Flaps & Bladders | Same as MITD only the tuber makes slugs for flaps & bladders. | 11
MIMG | Mill Mixing | Similar to Batch preparation except done on a mill, i.e., pigments are added by hand onto a mill. This operation has largely been replaced by the banburys. | 60
MIMS | Milling-Misc. | This is milling that does not fit any of the above products, or where it is not known what the product is. | 13
MIPL | Milling-plystock | Same as MITD only it is sent to a calender for making plystock, chafers, inner liner, etc. | 10
MITD | Milling-treads | The mixed rubber (with all its additives) is milled so that the rubber is hot and plastic. This tread stock is then sent to a tuber. | 09
MITU | Milling-tubes | Same as MITD only the tuber makes tubes. | 12
MNEC | Mechanic | | 49
MNCP | Carpentry | | 50
MNEL | Maintenance-electrical | | 46
MNMA | Machinist | | 51
MNMS | Maintenance, misc. | Includes pipefitters, cement finishers and other maintenance jobs not included in 44-51. | 52
MNSM | Maintenance-sheet metal | Ventilation ducts, BB collectors, etc. | 47
MNWL | Maintenance-welding | | 48
MNWR | Maintenance-millwright | Machine Repair | 45
OTHER | Others | Jobs not included in above and non-production (guard, waitress, cafeteria helper, etc.). | 59
PALL | Paint & line-green tires | Spray paint onto green tires before curing. Talc is applied to some tires, and treads jammed in others. | 30
PIBE | Pigment blending | Weighing & mixing of pigments (accelerators, antioxidants) for batch building. | 03
PLHA | Plystock handling | This O.T. includes the splicing of material making plystock for tires, and band building. | 17
PWPT | Power Plant | All operations associated with power plant. | 53
RCDV | Devulcanizing reclaim | Processes of devulcanization (heating, fiber separation, grinding, batch preparation), digestion (batch preparation, digestion, dewatering, drying). | 63
RCMI | Milling-reclaim | Processes of mixing and blending, refining, straining, packaging, extruding, batch building. | 64
RCPR | Preparation-reclaim | Processes of digestion, heating, sorting, cracking, sifting, magnetic separation, collection and storage. | 62
SALA | Salary | | 71
SBPR | Service-Batch Preparation-Tires | Truck materials to BB for mixing (such as tube stock from CMTR to be run through BB a second time, supply mixed pigments). | 07
SBPU | Service-Batch Preparation-Tubes | Same as SBPR only for tube, flap, and bladder stock. | 08
SCPC | Special Products | Largely defense products made during WWII. |
SJAN | Janitor | Some janitors (e.g., BB cleaners) may be in service operations. | 57
SLIN | Liner Service (reroll, mend, clean) | This O.T. takes the liners in which the plystock is wound into rolls and rerolls it after the plystock is used up by the tire builders, and cleans and mends the liners when necessary. | 18
SMOL | Mold cleaning & repair | Includes trucking of molds to be cleaned. | 39
SREC | Service-reclaim | All service operations associated with reclaim operation. | 65
SRLC | Roll changing (including trucking & service) | These are service people involved in plystock preparation. It includes those changing the rolls on the calenders, bias cutting machines, truckers (taking rolls to storage areas), etc. | 16
STBB | Service-tire & bead building | Trucking plystock, tread, inner liners, etc. and materials to tire builders and bead builders. | 28
STOD | Tuber service-tread tuber (including booking & slitting) | Similar to STTU except for tread stock. Includes slitting the treads and putting cemented treads onto trays. | 24
STOR | Shipping & receiving | All activities of receiving, unloading, packing and loading materials and finished products. | 58
STOX | Tuber service-flaps and bladders | Similar to STTU except for flaps and bladders - cutting slugs, booker, etc. | 23
STTU | Tuber service-tubes | This would include service operations related to tubers making tubes, such as cutting tubes, punching holes for valves, putting on valves. | 22
SYNT | Synthetic plant | Manufacture of synthetic rubber | 70
TEST | Quality control testing | All activities associated with quality control (pigments, stocks, tires, tubes, etc.) | 54
TCKR | Trucking (general) | All trucking not included in service jobs above. | 55
Tr::BL | Tire building | | 27
TUBC | Tuber operations-flaps & bladders | Same as TUBO and TUBU except flaps and bladders are made. | 21
TUBO | Tuber operation-treads | This O.T. runs the tuber which takes the stock from a mill, puts it through a die forming a tread. | 19
TUBU | Tuber operation-tubes | Similar to TUBO only a tube is formed instead of a tread. | 20
UNKN | Unknown | Don't know enough about the job to put in above O.T.'s, or doesn't fit above O.T.'s. | 99
VVMA | Valve Preparation | Making, spraying and buffing valves. | 31
OTHER OCCUPATIONAL TITLES
Letter Code | Occupational Title | Numerical Code
PLST | Plastics; VC & PVC production & manufacture |
FUCE | Fuel Cell | 88
ARSP | Aerospace | 78
INDA | Industrial products; processes prior to curing | 86
CRSX | Curing-misc.; mostly industrial products curing but may include unknown curing processes | 61
INDC | Industrial products; processes subsequent to curing | 87
80, 83-85
(INDA & INDC may be coded UNKN on earlier coding)
METL | Metal preparation; mostly degreasing operations |
INAC | Inactive file |
RETI | Retired | 69
SICK | Sick Suspense | 73
LOFF | Layoff | 72
 | Leave of Absence | 74
 | Misc. Absence | 75
89?
ACTV | |
DEAD | |
APPENDIX II.1
NINETEEN OCCUPATIONAL TITLE GROUPS
1. SHIPPING AND RECEIVING - All activities of receiving, unloading, packing and loading materials and finished products.
2. BATCH PREPARATION - Mixing of rubber, accelerators, antioxidants, fillers, etc. in banburies.
3. BATCH PREPARATION - INDUSTRIAL PRODUCTS - Same as above for tubes, flaps, bladders.
4. MILLING - The mixed rubber is milled so rubber becomes hot and plastic.
5. CALENDERING OPERATION, BANDBUILDING - Rubber rolled into sheets, cut.
6. TUBER OPERATIONS - Rubber put through a die to form a tread.
7. BEADBUILDING, TIRE BUILDING - Product fabrication, trucking materials to builders.
8. INSPECTION AND REPAIR - Inspecting and repairing green tires in preparation for curing, paint spraying, talcing.
9. VALVE PREPARATION - Making, spraying and buffing valves.
10. CURING - Green products cured under heat and pressure.
11. FINISHING - Trim, balance, label and classify finished products.
12. MAINTENANCE - Mold cleaning, pipefitting, electrical work.
13. METAL PREPARATION - Raw steel cleaned prior to welding.
14. RECLAIM - Scrap rubber shredded, devulcanized.
15. SYNTHETIC LATEX - Chemical plant making synthetic rubber.
16. POWER PLANT, CLERICAL, SALARIED
17. UNKNOWN
18. SICK
19. ALL OTHER
APPENDIX II.2
TEN OCCUPATIONAL TITLE GROUPS
AND THE AGGREGATION OF ORIGINAL NINETEEN OTG's
1. BATCH PREPARATION [2, 3]
2. MILLING, CALENDERING, EXTRUSION [4, 5, 6]
3. PRODUCT FABRICATION [7, 9]
4. CURING PREPARATION [8]
5. CURING [10]
6. FINISHING, INSPECTION, REPAIR [11]
7. MAINTENANCE, POWER PLANT GENERAL SERVICE [1, 12, 13, 16]
8. RECLAIM [14]
9. SICK [18]
10. UNKNOWN, OTHER AND SYNTHETIC [15, 17, 19]
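The aggregation above amounts to a lookup table from the nineteen original OTG numbers to the ten new groups. A minimal sketch, assuming work histories are coded as Python lists of the original OTG numbers; the helper name `aggregate_history` and its `merge_runs` option (collapsing consecutive visits that fall in the same new group into a single visit) are hypothetical illustrations, not documented procedures of this study.

```python
# Ten-group aggregation of the nineteen OTGs, transcribed from Appendix II.2.
TEN_GROUPS = {
    1: [2, 3],             # batch preparation
    2: [4, 5, 6],          # milling, calendering, extrusion
    3: [7, 9],             # product fabrication
    4: [8],                # curing preparation
    5: [10],               # curing
    6: [11],               # finishing, inspection, repair
    7: [1, 12, 13, 16],    # maintenance, power plant, general service
    8: [14],               # reclaim
    9: [18],               # sick
    10: [15, 17, 19],      # unknown, other and synthetic
}

# Invert to a lookup from original OTG (1-19) to new group (1-10).
OTG19_TO_10 = {old: new for new, olds in TEN_GROUPS.items() for old in olds}

# Each of OTGs 1-19 is assigned to exactly one group.
assert sum(len(olds) for olds in TEN_GROUPS.values()) == 19
assert sorted(OTG19_TO_10) == list(range(1, 20))


def aggregate_history(history, merge_runs=True):
    """Recode a work history from 19 OTGs to 10 groups (hypothetical helper)."""
    groups = [OTG19_TO_10[otg] for otg in history]
    if not merge_runs:
        return groups
    merged = []
    for g in groups:
        # A move between two OTGs in the same new group is no longer a transfer.
        if not merged or merged[-1] != g:
            merged.append(g)
    return merged


print(aggregate_history([2, 4, 5, 10, 18]))  # [1, 2, 5, 9]
```

Whether such within-group moves should be merged or kept as self-transitions changes the estimated transition and holding-time distributions, so the choice would need to match how the ten-state model was actually fitted.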
APPENDIX IV.1
JOB EXPOSURE CODES
CODE 1: Primary Benzene Exposure -- direct, routine use or handling of benzene or benzene-containing solutions.
CODE 2: Secondary Benzene Exposure -- no direct, routine use or handling of benzene or benzene-containing solutions; but routine use of benzene or benzene-containing solutions in the work area.
CODE 3: Primary Other Solvent Exposure -- direct, routine use or handling of solvents other than benzene.
CODE 4: Secondary Other Solvent Exposure -- routine use of solvents other than benzene in the work area; but no direct, routine use or handling of solvents.
CODE 5: Judgement Primary Solvent Exposure -- undocumented but assumed direct handling or use of solvents other than benzene.
CODE 6: Judgement Secondary Solvent Exposure -- undocumented but assumed use of solvents other than benzene in the work area.
CODE 7: No Known Solvent Exposure -- documentation does not show primary or secondary solvent exposure.
CODE 8: Retirement
CODE 9: Unknown -- available documentation has insufficient information to determine the presence or absence of solvent use or exposure.
CODE 10: Death
APPENDIX IV.2
CRITERIA FOR ASSIGNING CODES
CODE 1: Assigned when there is documented use of the commercial solvent benzene in the department during the job period; and documented routine, direct use or handling of benzene by the job task.
Documentation: SOL, SPD, FPS
CODE 2: Assigned when there is documented proximity to the use of the commercial solvent benzene during the job period; but no documented direct use or handling of benzene by the job title or task.
Documentation: SOL, SPD, FPS
CODE 3: Assigned when there is documented use of solvents in the department during the job period; and documented routine, direct use or handling of solvent by the job title or task.
Documentation: SOL, JCR
CODE 4: (A) Assigned when there is documented proximity to solvent use during the job period; but no documented direct use or handling of solvents by the job title or task; or
(B) Assigned when a job title or task has no documented direct use or handling of solvents; but is classified as an assistant to a job title that does.
Documentation: SOL, JCR
CODE 5: (A) Assigned when there is no documented direct solvent use, or area solvent use for the job period; but the job title is normally associated with direct, routine solvent use when found in other documented departments; or
(B) Assigned when a job title or task has documented routine, direct solvent use during later time periods in the same department; but the job period predates job documentation.
Documentation: SPD, JCR, Investigator judgement
CODE 6: (A) Assigned when there is no documented direct, routine solvent use, and no documented use of solvents in the immediate area of the department during the job period; but other documented job tasks and/or process equipment normally located in the immediate area could be reasonably expected, in the judgement of the investigator, to require the use of solvents; or
(B) Assigned to a job title with documented proximity to solvent use during later time periods; but which predates Solvent Use Charts.
Documentation: SOL, JCR, Investigator judgement
CODE 7: Assigned when there is no documented or suspected solvent exposure associated with the job, the department, the task, the area, or the equipment.
Documentation: SOL, JCR, SPD
CODE 9: Assigned when there is insufficient information to assign another code number.
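The criteria above can be read as a decision procedure. The sketch below assumes that each job period's documentation has been summarized as yes/no flags (all field names are hypothetical) and that the codes are checked in numerical order, so the lowest-numbered applicable code is assigned. The appendix does not state a precedence rule, so that ordering is an assumption. Codes 8 (retirement) and 10 (death) are status codes and are not covered.

```python
from dataclasses import dataclass


@dataclass
class JobPeriodEvidence:
    # Hypothetical flags summarizing the documentation for one job period.
    benzene_direct: bool = False          # CODE 1: benzene in dept. and routine, direct use by task
    benzene_nearby: bool = False          # CODE 2: documented proximity to benzene use
    solvent_direct: bool = False          # CODE 3: routine, direct use of other solvents
    solvent_nearby: bool = False          # CODE 4(A): documented proximity to solvent use
    assists_direct_user: bool = False     # CODE 4(B): assistant to a title with direct use
    title_direct_elsewhere: bool = False  # CODE 5(A): title uses solvents in other departments
    direct_later_same_dept: bool = False  # CODE 5(B): later direct use; period predates documentation
    judged_area_use: bool = False         # CODE 6(A): investigator judges nearby tasks need solvents
    nearby_later_predates_charts: bool = False  # CODE 6(B): later proximity; predates Solvent Use Charts
    enough_information: bool = True       # False -> CODE 9 if nothing else applies


def exposure_code(e: JobPeriodEvidence) -> int:
    """Return the job exposure code, checking codes 1-6 in numerical order (assumed)."""
    if e.benzene_direct:
        return 1
    if e.benzene_nearby:
        return 2
    if e.solvent_direct:
        return 3
    if e.solvent_nearby or e.assists_direct_user:
        return 4
    if e.title_direct_elsewhere or e.direct_later_same_dept:
        return 5
    if e.judged_area_use or e.nearby_later_predates_charts:
        return 6
    # No documented or suspected exposure: CODE 7 if the record is adequate, else CODE 9.
    return 7 if e.enough_information else 9


print(exposure_code(JobPeriodEvidence(solvent_nearby=True)))  # 4
```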