Williams, K.A. (1976). "An application of Poisson regression analysis to data on medication-taking behavior."

AN APPLICATION OF POISSON REGRESSION
ANALYSIS TO DATA ON MEDICATION-TAKING BEHAVIOR
by
Kenneth Arthur Williams
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1059
March 1976
A thesis submitted to the faculty of
The University of North Carolina at
Chapel Hill in partial fulfillment
of the requirements for the degree
of Doctor of Public Health in the
Department of Biostatistics.
Chapel Hill
1976
ACKNOWLEDGEMENTS
The Author wishes to thank the members of his advisory
committee, Drs. L. L. Kupper, B. S. Hulka, J. E. Grizzle,
J. R. Stewart, C. D. Turnbull, D. C. Leighton and W. K. Bentz,
for their support and patience with these efforts.
In particular,
thanks are due to his advisor, Dr. Larry Kupper, not only for his
continued guidance, but also for his continued friendship.
Dr. Barbara Hulka graciously permitted the Author
to use a portion of the data from the AAFP-UNC Study which was
supported by grant No. HS00026-03 between the National Center
for Health Services Research and Development and the Department
of Epidemiology, University of North Carolina.
The Author is also most grateful to his wife Cheryl
for her typing of the final manuscript and for her encouragement
to complete this work.
The Author's participation in this study was supported
by United States Public Health Service Traineeship 5-T01-MH12602.
WILLIAMS, KENNETH ARTHUR.
An Application of Poisson Regression
Analysis to Data on Medication-Taking Behavior (under the
direction of DR. L. L. KUPPER).
Two primary elements deemed necessary by the American Academy
of Family Physicians for the assessment of primary medical care
are physician-patient communication and patient compliance.
Medication-taking behavior is an important indicator of both of
these elements.
In this dissertation, the principles of Poisson
regression analysis were applied to data collected on patients
with Diabetes Mellitus or Congestive Heart Failure in an attempt
to identify factors which would prove to be useful in the
prediction of non-compliant medication-taking behavior.
It is demonstrated that the number of medication-taking
errors experienced by an individual is well represented by the
negative binomial distribution.
This negative binomial distribution
is shown to have developed via a compounding mechanism based
on an underlying Poisson process whose parameter λ, when expressed
as a function of k properly chosen independent variables X1, X2, ..., Xk,
follows a gamma distribution.
The results obtained from attempts to select factors useful
in the prediction of non-compliant medication-taking behavior
indicate that the independent variables exhibiting the most
potential are 1) the total number of drugs involved between
the physician and his patient and 2) the number of drugs
currently being taken for which the patient knows or understands
the function.
Both of these results underscore the notion that
there must be a reciprocal and effectual level of communication
between the physician and patient to insure the quality of the
medical care received.
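The compounding mechanism described in this abstract can be checked numerically. The following is a minimal sketch, not the thesis's own computation: it assumes the Poisson parameter λ follows a gamma distribution with a shape and a scale parameter (these names and the example values 2.0 and 0.5 are illustrative assumptions), and it compares the closed-form negative binomial probabilities with a direct numerical integration of the Poisson probabilities over the gamma density.

```python
import math


def neg_binomial_pmf(y, shape, scale):
    """P(Y = y) when Y | lam ~ Poisson(lam) and lam ~ Gamma(shape, scale).

    Closed form of the gamma-compounded Poisson (negative binomial).
    """
    log_p = (math.lgamma(y + shape) - math.lgamma(shape) - math.lgamma(y + 1)
             - shape * math.log1p(scale)
             + y * (math.log(scale) - math.log1p(scale)))
    return math.exp(log_p)


def mixture_pmf(y, shape, scale, steps=20000, upper=40.0):
    """Same probability by midpoint-rule integration over lam in (0, upper).

    The integrand is Poisson(y | lam) times the gamma density of lam.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        log_poisson = -lam + y * math.log(lam) - math.lgamma(y + 1)
        log_gamma = ((shape - 1) * math.log(lam) - lam / scale
                     - math.lgamma(shape) - shape * math.log(scale))
        total += math.exp(log_poisson + log_gamma) * h
    return total


if __name__ == "__main__":
    # Illustrative parameter values only; they are not estimates from the study.
    for y in range(6):
        print(y, neg_binomial_pmf(y, 2.0, 0.5), mixture_pmf(y, 2.0, 0.5))
```

Under these assumptions the mean of the compound distribution is shape × scale and its variance is shape × scale × (1 + scale), so the variance exceeds the mean, which is the overdispersion that distinguishes the negative binomial from a simple Poisson distribution.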
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER
I     INTRODUCTION
      1.1   Introduction to the AAFP-UNC Study
      1.2   Physician-Patient Communication and Patient Compliance
II    THE AAFP-UNC STUDY DESIGN
      2.1   Study Site
      2.2   Physician Enrollment
      2.3   Patient Enrollment
      2.4   Data Collection
      2.5   Validity and Completeness of Medication Recording
      2.6   Drug Coding and Organization
      2.7   Response Variables
      2.8   Sociodemographic Patient Characteristics
      2.9   Characteristics of Disease Severity
      2.10  Attitudinal Characteristics of Patients and Physician-Patient Communication
      2.11  Professional Characteristics of Physicians
      2.12  Characteristics of Patient's Medication Regimen
      2.13  Comment
III   A REVIEW OF THE LITERATURE ON MEDICATION-TAKING BEHAVIOR
      3.1   Relationship of Medication and Medication Regimen Related Characteristics to Medication-Taking Behavior
      3.2   Relationship of Sociodemographic Characteristics to Medication-Taking Behavior
      3.3   Characteristics of Disease Severity and Their Relation to Medication-Taking Behavior
      3.4   Attitudinal Characteristics of Patients and Their Relation to Medication-Taking Behavior
      3.5   Professional Characteristics of Physicians and Their Relation to Medication-Taking Behavior
IV    METHODS OF STATISTICAL ANALYSIS
      4.1   The General Theory of Multiple Linear Regression
      4.2   An Alternative Assumption for the Distribution of the Response Variable
      4.3   Negative Binomial Distribution
      4.4   The Geometric Distribution as a Special Case of the Negative Binomial Distribution
      4.5   Estimation of the Parameter λ
      4.6   The Selection of Independent Variables
V     CHARACTERISTICS OF THE INDICATOR CASE SAMPLE
      5.1   Description of the Patient Sample
      5.2   Combining of the DM and CHF Indicator Case Samples
      5.3   Selected Sociodemographic Characteristics of the Combined DM-CHF Sample
VI    A MODEL TO PREDICT NUMBER OF COMMISSIONS
      6.1   The Definition and Distributional Form of Number of Commissions
      6.2   The Initial Selection of Independent Variables to Predict the Number of Commissions
      6.3   Estimation of Regression Coefficients by Means of the Method of Maximum Likelihood
      6.4   A Model to Predict Number of Commissions
      6.5   An Investigation of the Underlying Assumptions for the Model Chosen to Predict Number of Commissions
      6.6   Summary of Chapter VI
VII   A MODEL TO PREDICT NUMBER OF OMISSIONS
      7.1   The Definition and Distributional Form of Number of Omissions
      7.2   The Initial Selection of Independent Variables to Predict the Number of Omissions
      7.3   A Model to Predict Number of Omissions
      7.4   Summary of Chapter VII
VIII  SUMMARY AND CONCLUSIONS
      8.1   Summary
      8.2   Recommendations
BIBLIOGRAPHY
LIST OF TABLES
TABLE
2.1   Type of Practice and Physician Participation
2.2   Medication-Taking Behaviors of Interest
5.1   Distribution of Males and Females
5.2   Distribution of Marital Status
5.3   Distribution of Education
5.4   Distribution of Social Class
5.5   Mean and Standard Errors for Patient Age and Variables Indicative of Disease Severity
6.1   The Observed Distribution of the Number of Commissions for Patients Included in the Study of Medication-Taking Behavior
6.2   The Fitting of a Negative Binomial Distribution to the Observed Distribution of the Number of Commissions for Patients with Diabetes Mellitus
6.3   The Fitting of a Negative Binomial Distribution to the Observed Distribution of the Number of Commissions for Patients with Congestive Heart Failure
6.4   The Fitting of a Negative Binomial Distribution to the Observed Distribution of the Number of Commissions for the Combined Sample
6.5   The Initial Set of Independent Variables Proposed for Consideration in the Prediction of the Number of Commissions
6.6   Independent Variables in Addition to (a + b + c) Initially Selected to Predict Number of Commissions
6.7   Mean Number of Commissions Given the Total Number of Drugs Involved Between the Physician and Patient
6.8   Predicted λ's Based upon the Unweighted Least Squares Regression Coefficients
6.9   Predicted λ's Using the Maximum Likelihood Estimates of the Regression Coefficients
6.10  Analysis of Variance for the Weighted Regression of Number of Commissions upon (a + b + c)^2
6.11  Analysis of Variance for the Weighted Regression of Number of Commissions upon (a + b + c)
6.12  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 0.2182, or (a + b + c) = 2
6.13  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 0.3652, or (a + b + c) = 3
6.14  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 0.5710, or (a + b + c) = 4
6.15  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 0.8356, or (a + b + c) = 5
6.16  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 1.1590, or (a + b + c) = 6
6.17  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 1.5412, or (a + b + c) = 7
6.18  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 1.9822, or (a + b + c) = 8
6.19  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 2.4820, or (a + b + c) = 9+
6.20  Intervals Containing the Predicted Values of the Parameter λ and Their Respective Observed Frequencies
6.21  Observed and Expected Frequencies Hypothesizing a Gamma Distribution
7.1   The Observed Distributions of the Number of Omissions for Patients Included in the Study of Medication-Taking Behavior
7.2   The Fitting of a Negative Binomial Distribution to the Observed Distribution of the Number of Omissions for the Combined Patient Sample
7.3   The Fitting of a Geometric Distribution to the Observed Distribution of the Number of Omissions for the Combined Patient Sample
7.4   The Fitting of a Geometric Distribution to the Observed Distribution of the Number of Omissions for Patients with Diabetes Mellitus
7.5   The Fitting of a Geometric Distribution to the Observed Distribution of the Number of Omissions for Patients with Congestive Heart Failure
7.6   Independent Variables in Addition to (a + b + c) Initially Selected to Predict Number of Omissions
7.7   Predicted λ's Using the Maximum Likelihood Estimates of the Regression Coefficients for Model (7.3.1) When (a + b + c) = 2
7.8   Predicted λ's Using the Maximum Likelihood Estimates of the Regression Coefficients for Model (7.3.1) When (a + b + c) = 3
7.9   Predicted λ's Using the Maximum Likelihood Estimates of the Regression Coefficients for Model (7.3.1) When (a + b + c) = 4
7.10  Predicted λ's Using the Maximum Likelihood Estimates of the Regression Coefficients for Model (7.3.1) When (a + b + c) = 5
7.11  Predicted λ's Using the Maximum Likelihood Estimates of the Regression Coefficients for Model (7.3.1) When (a + b + c) = 6
7.12  Predicted λ's Using the Maximum Likelihood Estimates of the Regression Coefficients for Model (7.3.1) When (a + b + c) = 7
7.13  Predicted λ's Using the Maximum Likelihood Estimates of the Regression Coefficients for Model (7.3.1) When (a + b + c) = 8
7.14  Predicted λ's Using the Maximum Likelihood Estimates of the Regression Coefficients for Model (7.3.1) When (a + b + c) = 9+
7.15  Analysis of Variance for the Weighted Regression of Number of Omissions Upon (a + b + c) and X10
7.16  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 0.7087, or (a + b + c) = 3 and X10 = 2
7.17  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 0.0504, or (a + b + c) = 3 and X10 = 3
7.18  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 0.7255, or (a + b + c) = 4 and X10 = 3
7.19  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 0.0672, or (a + b + c) = 4 and X10 = 4
7.20  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 1.4006, or (a + b + c) = 5 and X10 = 3
7.21  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 0.7423, or (a + b + c) = 5 and X10 = 4
7.22  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 0.0840, or (a + b + c) = 5 and X10 = 5
7.23  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 1.4174, or (a + b + c) = 6 and X10 = 4
7.24  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 0.7591, or (a + b + c) = 6 and X10 = 5
7.25  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 0.1008, or (a + b + c) = 6 and X10 = 6
7.26  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 0.7759, or (a + b + c) = 7 and X10 = 6
7.27  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 0.1176, or (a + b + c) = 7 and X10 = 7
7.28  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 0.8095, or (a + b + c) = 9+ and X10 = 8
7.29  Observed and Expected Frequencies Hypothesizing a Poisson Distribution with λ = 0.1512, or (a + b + c) = 9+ and X10 = 9
7.30  Intervals Containing the Predicted Values of the Parameter λ and Their Respective Observed Frequencies
7.31  Observed and Expected Frequencies Hypothesizing an Exponential Distribution
LIST OF FIGURES
FIGURE
1     Mean Number of Commissions Versus the Total Number of Drugs Involved Between a Physician and Patient, (a + b + c)
2     Mean Number and Predicted Number of Commissions Versus the Total Number of Drugs Involved Between a Physician and Patient, (a + b + c)
CHAPTER I
INTRODUCTION
1.1
Introduction to the AAFP-UNC Study
In 1969 representatives from two organizations, the American
Academy of Family Physicians and the Department of Epidemiology at the
University of North Carolina, undertook the task of developing a
research design, both original in concept and feasible to implement, as a method of assessing primary medical care (13).
Conceptually, there appear to be at least two relatively different paths
of approach to this problem.
One could adopt a prospective approach
and study the residents of a community prior to their entrance into the
health care system, or one could look retrospectively at those individuals who have already gained access to this system and have been
"treated" by it.
The model employed for this study was an attempt to
incorporate both approaches.
A household survey was designed in an
attempt to identify barriers and stimulants to the utilization of
existing health care services, while the effects of the various health
care delivery systems on individuals who enter them were evaluated
through the use of an "indicator case" model (13).
This dissertation
will relate solely to certain aspects of the indicator case data.
The AAFP-UNC representatives selected a number of conditions which,
it was felt, could adequately serve as indicators of the quality of care
received by individuals.
The four indicator conditions selected were
pregnancy, infancy, diabetes mellitus, and congestive heart failure.
These conditions were selected on the basis that either some improvement
in health status could be anticipated to follow good medical care, or
that a relatively high degree of unanimity existed concerning the appropriate management of the condition (13).
An assessment of the effectiveness of the medical services provided
for each indicator condition was made with reference to the following
eight designated elements.
Elements for Assessment
Utilization
Cost and Convenience
Physician Performance
Communication
Compliance
Physician Awareness of Patient Concerns
Attitudes Toward Physicians
Outcome
1.2
Physician-Patient Communication and Patient Compliance
The two Elements for Assessment with which this dissertation is
concerned are physician-patient communication and patient compliance.
Physician-patient communication is defined as the extent to which the
physician is successful in transmitting information and instructions to
the patient, while patient compliance measures the extent to which the
patient's behavior is modified by the physician's instructions.
Medication-taking behavior is an important indicator of both of these
elements.
Medication-taking behavior places an emphasis not only upon
how the patient is taking his currently prescribed medications, but also
upon the responsibility of the patient to inform his physician about
other drugs which he is currently consuming.
Conversely, it is also the
responsibility of the physician to elicit information from the patient
concerning medications which the patient is currently consuming.
When
multiple drugs are involved, this information is essential to assist the
physician in avoiding contraindicated medications and clinical complications resulting from drug misuse.
The non-compliant patient is not a consistent or readily identifiable individual, yet he is a part of every medical practice.
The major
thrust of this dissertation will be to identify factors which prove to
be useful in the prediction of non-compliant medication-taking behavior.
The areas to be investigated for such factors are 1) sociodemographic
characteristics of the patients, 2) characteristics indicative of
disease severity, 3) attitudinal characteristics of the patients,
4) characteristics of the patient's medications and medication regimens,
and 5) professional characteristics of the physicians.
CHAPTER II
THE AAFP-UNC STUDY DESIGN
2.1
Study Site
The selection of Fort Wayne, Indiana as the site of this study was
largely dependent upon the expressed willingness of the Fort Wayne-Allen County Medical Society membership and their Board of Directors to
participate in and provide sponsorship for this research (13).
The
city has a total population of nearly 200,000 persons and it was felt
to be appropriate in terms of population characteristics including
representation from all socioeconomic groups.
The organizational
patterns of medical practice in Fort Wayne include solo practitioners,
two or three doctor associations, and two loosely organized multispecialty groups.
Primary medical care in Fort Wayne is provided
almost exclusively through the private sector of medicine with support
from emergency rooms in three voluntary hospitals.
2.2
Physician Enrollment
A stratified random sampling procedure was applied to identify the
medical practices which would be asked to participate in this endeavor.
The overall sampling frame was composed of all general practitioners
and internists listed in the Fort Wayne Medical Society Directory with
the exception of 1) those over 70 years of age, 2) those with offices
outside the study area, and 3) those engaged primarily in the practice
of industrial or emergency medicine (13).
The individual practitioner
or group of practitioners, depending on the type of practice in which
the physician was engaged, was designated as the primary sampling unit.
The stratification of the physicians was based upon the four types of
medical practices listed in Table 2.1.
With the exception of solo
general practitioners, 100 percent of the physicians in the remaining
categories were asked to participate.
Table 2.1.
Type of Practice and Physician Participation

Type of             Number of Physicians   Number of Physicians   Number of Physicians
Practice            in Sampling Frame      Contacted              Participating
Solo G.P.'s                 49                     37                     18
Solo Internists              9                      9                      6
Group G.P.'s                16                     16                     15
Group Internists             8                      8                      7
Total                       82                     70                     46
The final sample consisted of 46 of the 70 physicians originally
contacted.
For the 24 physicians who either refused to participate or
withdrew their support before the termination date for the enrollment
of patients, reasons often cited were lack of interest, invasion of
privacy, loss of valuable time or office staff rejection.
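The counts in Table 2.1 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `participation_summary` and the derived "participation_rate" field are assumptions introduced here, not quantities reported in the study. It recomputes the column totals and the fraction of contacted physicians who participated in each stratum.

```python
# Counts transcribed from Table 2.1:
# (in sampling frame, contacted, participating)
TABLE_2_1 = {
    "Solo G.P.'s": (49, 37, 18),
    "Solo Internists": (9, 9, 6),
    "Group G.P.'s": (16, 16, 15),
    "Group Internists": (8, 8, 7),
}


def participation_summary(table):
    """Return per-stratum rates and the column totals of the table."""
    rows = {}
    for practice, (frame, contacted, participating) in table.items():
        rows[practice] = {
            "contact_fraction": contacted / frame,
            "participation_rate": participating / contacted,
        }
    totals = tuple(sum(column) for column in zip(*table.values()))
    return rows, totals


if __name__ == "__main__":
    rows, totals = participation_summary(TABLE_2_1)
    print("totals (frame, contacted, participating):", totals)
    print("overall participation rate:", round(totals[2] / totals[1], 3))
    for practice, stats in rows.items():
        print(practice, round(stats["participation_rate"], 3))
```

The totals reproduce the last row of the table (82, 70, 46), and the overall rate of 46 out of 70 corresponds to roughly 66 percent of the physicians contacted.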
2.3
Patient Enrollment
Of the four indicator conditions selected for study, this dissertation
will only be concerned with diabetes mellitus (DM) and congestive heart failure (CHF).
The criteria for admission to the study were as follows (13):
Diabetes Mellitus
1. Age at time of admission is no more than 65 years nor less
   than 30 years.
2. Age at time of diagnosis must be at least 29 years.
3. Duration of the disease must be no more than ten years.
4. Patients with psychosis, mental retardation, or mental
   confusion inhibiting the interview are excluded.
The rationale for criteria 1-3 is to include in this study only
adult onset diabetics who do not have a high probability of complications associated with advancing age or prolonged presence of the
disease.
The fourth criterion is to insure valid interview data.
Congestive Heart Failure
1. Age at time of admission is between 50 and 75 years.
2. The majority of patients have congestive heart failure due
   to either coronary heart disease or hypertensive heart
   disease.
3. Other medical diagnoses may be present.
4. Same as Diabetes Mellitus.
The age boundaries for congestive heart failure are intended to
exclude young persons with failure due to unusual causes and senile
individuals who would be unable to provide valid interview data.
Although no decision rule concerning the specific diagnosis which
caused the heart failure was established, the preponderance of cases
are due to arteriosclerotic or hypertensive heart disease (13).
Many
of the patients who were ultimately enrolled in the study did have
additional chronic conditions, including diabetes mellitus.
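The admission criteria above can be expressed as a simple screening rule. The following is a hedged sketch rather than the study's actual procedure: the `Candidate` record, its field names, and the treatment of the age limits as inclusive are all assumptions made for illustration. Criteria 2 and 3 for congestive heart failure are descriptive rather than exclusionary, so they are not encoded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical record for a patient considered for enrollment."""
    condition: str                              # "DM" or "CHF"
    age_at_admission: int
    age_at_diagnosis: Optional[int] = None      # used for DM only
    disease_duration_years: Optional[float] = None  # used for DM only
    interview_impaired: bool = False            # psychosis, retardation, confusion


def is_eligible(c: Candidate) -> bool:
    """Apply the admission criteria; age limits are assumed to be inclusive."""
    if c.interview_impaired:
        return False  # criterion 4, shared by both conditions
    if c.condition == "DM":
        if c.age_at_diagnosis is None or c.disease_duration_years is None:
            return False  # cannot verify criteria 2 and 3
        return (30 <= c.age_at_admission <= 65
                and c.age_at_diagnosis >= 29
                and c.disease_duration_years <= 10)
    if c.condition == "CHF":
        return 50 <= c.age_at_admission <= 75
    raise ValueError(f"unknown indicator condition: {c.condition!r}")


if __name__ == "__main__":
    print(is_eligible(Candidate("DM", 45, age_at_diagnosis=40,
                                disease_duration_years=5)))
    print(is_eligible(Candidate("CHF", 80)))
```

Treating a diabetic candidate with missing diagnosis data as ineligible is a design choice of this sketch; the source does not say how such cases were handled.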
Patients who satisfied the respective criteria were admitted into
the study at the time of an office visit to one of the 46 participating
physicians.
Patient enrollment for each medical practice extended over
a four-month period.
After the physician's office staff had explained
the purposes of the study to each patient, those who agreed to be
contacted were called by the study nurse-interviewer who scheduled an
appointment for the first home interview with the patient.
Eighty-four
percent of the patients contacted agreed to participate (14).
Although it would have been highly desirable to investigate the
effects of non-participating physicians and non-participating patients
upon the representativeness of the final patient sample, this was not
possible at this time due to the scarcity of available data.
2.4
Data Collection
During the first home interview, the patient was asked to produce
for the nurse-interviewer all medications he was currently taking.
After all of these containers were assembled, the patient was asked
about the function of each medication, how often he was told to take
the medication, and how many pills (or other units) he was instructed
to take each time.
The latter two questions were an attempt to discern
the patient's comprehension of the physician's scheduling instructions.
Finally, the patient was asked whether or not he was actually taking
the medication as directed by the physician.
The nurse-interviewer transcribed the prescription number and
pharmacy name and address from the label on each container.
Prescription numbers were presented to the indicated pharmacy,
where prescription data on drug name and schedule were obtained.
In
this manner, each medication presented to the nurse-interviewer by the
patient was identified.
Only one of the 76 pharmacies being patronized
by the patients in this study refused to supply the requested information (14).
From the patient's medical record in the physician's office, the
nurse-interviewer was able to abstract data on medications prescribed,
and not discontinued, during the year prior to the home visit with the
patient.
The abstracted drug data were recorded on a questionnaire which
was subsequently submitted to the physician for his review and/or
modification.
Drugs for which scheduling data were unavailable in the
medical record were specifically called to the doctor's attention for
completion of the information.
In addition, the physician was encouraged
to supplement the medical records data with his own statement of
the patient's current medications.
Further review of the various medications presented by the
patients and abstracted from the physician's medical records revealed
sporadic instances of over-the-counter drugs and topical agents.
Since
these medications were subject to incomplete and inconsistent reporting,
they were excluded.
Thus, with the exception of insulin, which can be
obtained without prescription, all medications included for analysis
were prescription medications only.
2.5
Validity and Completeness of Medication Recording
It was realized that questions could justifiably be raised concerning
the validity of the data collection methods utilized in this
study.
The presence or absence of a bottle containing pills is no
valid proof that the pills are actually being consumed.
Likewise,
patients may neglect to show current medications or they may present
previously prescribed drugs which they are not currently taking.
The
patient's statement was accepted as to whether or not he was actually
taking each medication as he reported.
A review of the literature
revealed that patients probably overestimate compliant behavior unless
a major deviation has occurred (19).
This phenomenon will have the
effect of underestimating non-compliance, thus forcing any estimates
of non-compliant medication-taking behavior to be conservative.
Pill
counts as a method for determining compliance were viewed to be unreliable as well as impractical while drug excretion tests were not
viable considering the broad spectrum of drugs being studied.
One
other question concerns the accuracy of the physician's listing of each
of his patients' currently prescribed medications.
It was hoped that these types of
errors and omissions were corrected by the physician's
review and/or modification, thus making the data not totally dependent
on the quality of the medical records.
Despite these questions, the data collection method used was
feasible to execute and was considered to be the most valid and complete method available in view of the overall study design.
2.6
Drug Coding and Organization
Each of the prescription medications included in this study was
coded according to the following criteria.
Any medication composed of
a different chemical constituent was given a unique code number.
Drugs with identical chemical composition, whether recorded by generic name
or differing trade names, were given the same code.
However, different strengths of a given drug, say, 0.025 mg and 0.05 mg, were each given
their own distinctive drug code.
Generic names were not differentiated
from brand names because almost all of the prescriptions, and listings
by physicians, referred to brand names (14).
Furthermore, it was felt
that there were no indications of a deliberate substitution policy
being implemented by any one pharmacy.
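The coding rule described above amounts to keying each medication on its chemical composition and strength while ignoring the name under which it was recorded. A minimal sketch follows, under the assumption that codes are sequential integers; the function name `assign_drug_codes` and the drug names and compositions in the example are hypothetical placeholders, not entries from the study's data.

```python
def assign_drug_codes(records):
    """Assign one code per (chemical composition, strength) combination.

    records: iterable of (recorded_name, composition, strength_mg) tuples.
    Returns a list of (recorded_name, code) pairs in input order.
    """
    codes = {}   # (composition, strength) -> code number
    coded = []
    for name, composition, strength_mg in records:
        key = (composition.strip().lower(), strength_mg)
        if key not in codes:
            codes[key] = len(codes) + 1  # sequential codes are an assumption
        coded.append((name, codes[key]))
    return coded


if __name__ == "__main__":
    # Hypothetical records: two names for one compound, plus a second strength.
    example = [
        ("BrandA", "compound-x", 0.025),
        ("generic x", "compound-x", 0.025),
        ("BrandA", "compound-x", 0.05),
        ("BrandB", "compound-y", 10.0),
    ]
    for name, code in assign_drug_codes(example):
        print(name, code)
```

In this sketch the first two records share a code because they have the same composition and strength, while the third receives its own code because the strength differs.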
The pharmacologic categories in the American Hospital Formulary
Service, 1972, were used as a basis for assigning each medication to a
specific pharmacologic category (14).
The American Hospital Formulary
Service lists 25 major pharmacologic categories with numerous subcategories under each major heading.
The indicator case drug data
were subsumed within 13 of these major categories.
However, either due to
their frequent occurrence or specific importance, some particular subcategories will be considered separately from their major categories.
One instance of this is that cardiac drugs will be considered separately from the larger major category of cardiovascular drugs.
2.7
Response Variables
Because the indicator case data were collected from both
a patient and his respective physician, who was the primary source of
care for that patient, it is important to incorporate this paired
relationship into the discussion.
Consequently, physicians and patients
will be presented in pairs, and measurements are based upon the extent
to which the patient's drug-taking behavior conforms to his physician's
recommendations.
In an effort to clarify the types of errors being made, and also
to elucidate the magnitude of these errors, a new approach to the
scoring of medication use and misuse was developed.
For each physician-patient
pair certain defined types of medication-taking behavior may
occur, and each behavior may occur with varying frequencies.
The
medication-taking behaviors of interest are defined below in Table 2.2
and the frequency with which they may occur is symbolized by an
alphabetic letter.
Table 2.2.
Medication-Taking Behaviors of Interest

Behavior                                             Frequency
Number of drugs the patient is currently taking
which his doctor has prescribed.                     a
Number of drugs the patient is not currently
taking which his doctor has prescribed.              b
Number of drugs the patient is currently taking
of which his doctor is unaware.                      c
Total number of drugs involved between the
doctor-patient pair.                                 a + b + c
From this table it can also be seen that the sum of frequencies
(a + b) is equal to the total number of drugs currently prescribed for
a patient by his physician.
Similarly, the sum of the frequencies
(a + c) equals the total number of drugs currently being consumed by
the patient.
The medication-taking behaviors symbolized by the letters "b" and
"c" have been termed the number of omissions and commissions respectively, and it is these two types of non-compliant behavior which will
be the response variables of interest for this dissertation.
It is
important to note that while the medications included in the behavior
symbolized by the letter "a" represent a contribution of information
from both the physician and the patient, omissions represent a contribution from the physician only since it is assumed that the patient
was unaware of their having been currently prescribed for him.
Conversely,
commissions represent an input of information from the patient
only due to the physician being unaware that his patient was currently
consuming these medications.
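The frequencies a, b, and c defined in Table 2.2 follow directly from comparing two lists of drug codes for a physician-patient pair. The sketch below is an illustration under stated assumptions: the function name `score_pair` and the integer drug codes in the example are hypothetical, and each list is treated as a set of distinct codes.

```python
def score_pair(prescribed, taking):
    """Score one physician-patient pair.

    prescribed: drug codes the physician reports as currently prescribed.
    taking:     drug codes the patient reports as currently taken.
    """
    prescribed, taking = set(prescribed), set(taking)
    a = len(prescribed & taking)   # prescribed and taken
    b = len(prescribed - taking)   # prescribed but not taken (omissions)
    c = len(taking - prescribed)   # taken but unknown to the physician (commissions)
    return {"a": a, "b": b, "c": c, "total": a + b + c}


if __name__ == "__main__":
    # Hypothetical drug codes for a single pair.
    scores = score_pair(prescribed={101, 102, 103}, taking={102, 103, 200})
    print(scores)
    # a + b is the number of drugs prescribed; a + c is the number consumed.
    print(scores["a"] + scores["b"], scores["a"] + scores["c"])
```

Using sets makes the identities stated in the text hold by construction: a + b equals the size of the prescribed list and a + c equals the size of the list of drugs being taken.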
2.8
Sociodemographic Patient Characteristics
One of the larger and more well explored classes of variables
commonly collected in connection with research concerning medication-taking behavior is that relating to the sociodemographic characteristics of the patients.
The following patient characteristics were
included in the research design.
1) Sex
2) Marital Status
3) Patient Age
4) Education
5) Social Class
6) Number of Persons in the Household
Sex is self-explanatory and marital status was categorized very
simply along the lines of either being married or not being married at
the time of enrollment.
The allowable ranges for age have previously
been discussed in relation to admission criteria for this study.
Education was categorized as a five-point scale with a range from ninth
grade or less through college graduate.
This scale is at least ordinal.
Social class was measured on the Hollingshead scale (11) which is a
five-class scale based upon a weighted combination of an individual's
educational level and the head of the household's current occupational
role.
The basic data upon which this scale is based are interviews
with respondents in a five percent sample of all households in the
metropolitan area of New Haven, Connecticut, which had a total population of approximately 236,940 persons.
Class I is defined to represent
the upper social class of this population while Class V represents the
lower social class.
To assure an adequate sample size, Classes I and
II were combined.
The Hollingshead scale is also at least ordinal in
nature.
The number of persons in the household was also recorded.
2.9 Characteristics of Disease Severity
As another possible source for obtaining variables which might
prove useful in predicting non-compliant behavior for the indicator
case data, characteristics believed to be indicative of disease severity were recorded.
For all patients, the duration of disease, recorded
in intervals of one year, and the number of other concurrent conditions
were determined.
Also, a variable assessing the patient's current
level of activity was included.
Current activity was dichotomized into
working or other.
The working category was meant to represent those
individuals who had not reduced their activity level as a result of
their indicator condition while the category other was intended to
include those whose activity level had, in some manner, been compromised by their condition.
2.10 Attitudinal Characteristics of Patients and Physician-Patient
Communication
It has also been suggested in various surroundings that more
subtle factors associated with non-compliance are present in the
patient's attitude toward his condition and in the ability of the
physician and patient to effectively communicate with one another.
A questionnaire was developed in an attempt to elicit the patient's
attitude and concerns regarding his condition (12).
Basically, a
series of items, each with an antecedent clause specifying its
content, was developed.
The content of the items relates to
attitudes and concerns which have previously been expressed by patients
with the given indicator conditions.
The response to each item was located along a continuum of 20
locations representing a range from highly negative to highly positive.
In order to clarify the meaning of each item, verbal anchors were
placed along the continuum.
When scored, the most positive or healthy
response is assigned to the "20" end of the continuum and "1" is the
most negative.
Items are mixed, i.e., the positive end of the scale
is placed on either the right-hand or left-hand side of the page, when
sequenced on the questionnaire to avoid a response set in the mind of
the respondent.
In this manner, a score can be assigned to each
patient representing the mean of their responses to the included items.
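The scoring rule just described can be sketched as follows. This is an illustration under stated assumptions, not the instrument's actual scoring program: it assumes that each response is recorded as a position from 1 to 20 counted from the left-hand side of the page, together with a flag (hypothetical name `positive_on_right`) saying on which side the positive anchor was printed.

```python
def attitude_score(marks):
    """Mean item score for one patient.

    marks: list of (position, positive_on_right) pairs, where position is
    the marked location counted 1..20 from the left of the page (assumed
    recording convention). Items whose positive end is on the left are
    reversed so that 20 is always the most positive response.
    """
    scores = []
    for position, positive_on_right in marks:
        if not 1 <= position <= 20:
            raise ValueError("position must lie between 1 and 20")
        scores.append(position if positive_on_right else 21 - position)
    return sum(scores) / len(scores)


# Hypothetical patient with three items, one of them reverse-printed.
example_score = attitude_score([(20, True), (1, False), (10, True)])
print(example_score)
```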
For each physician-patient pair, a score was also calculated to
represent the proportion of information communicated to the patient of
the total amount which the physician wanted to communicate.
Thus, the
higher the score the better is the level of communication from physician to patient.
It is important to note that the pretesting of this
variable revealed no significant correlation between the resulting
score and the number of items of information which the physician
wanted to communicate to the patient, which is what one might possibly
expect.
2.11 Professional Characteristics of Physicians
A review of the current literature reveals evidence documenting the possible effects of a physician's professional characteristics on the medication-taking behavior of his patients.
Based upon
these reports, the following professional characteristics were recorded
for each participating physician:
1) Age
2) Type of Physician
3) Board Certification
4) Type of Practice
5) Average Number of Patient Visits per Physician per Day
6) Length of Physician-Patient Relationship
The physician's age is self-explanatory with a note that the
upper age limit is 69 years as discussed in Section 2.2 on physician
enrollment.
Type of physician refers to whether the doctor was a
general practitioner or an internist.
Board Certification is an
indication of whether or not the physician was certified to practice
a specialty.
Type of practice is a reference to the earlier discussion
on the organizational structure of medical practice in Fort Wayne.
This variable indicates whether a physician was associated with a
group practice or had a solo practice.
Type of physician, board certification, and type of practice are all coded as dichotomous variables.
The average number of patient visits per physician per day represents
an attempt to gauge a physician's daily activity in terms of patient
load.
Its value is the result of an observation by the nurse-interviewer of each physician's average daily case load after consulting the
physician's appointment schedule and conversing with his office staff.
The length of the physician-patient relationship was recorded in terms
of years.
This variable is an attempt to define the strength of such
a relationship.
2.12 Characteristics of the Patient's Medication Regimen
There appears to be a consensus that features of the medication
itself and the regimen by which it is prescribed can affect patient
compliance.
The recording of the following medication regimen related
characteristics for each physician-patient pair was an attempt to
include such types of variables.
1) Total number of drugs involved between the physician-patient pair.
2) Total number of drugs currently prescribed by the physician.
3) Total number of drugs currently being consumed by the patient.
4) Number of antidiabetic drugs currently being taken.
5) Number of cardiac drugs currently being taken.
6) Number of hypotensive drugs currently being taken.
7) Number of diuretic drugs currently being taken.
8) Number of central nervous system drugs currently being taken.
9) Number of drugs currently being taken once-a-day.
10) Number of drugs currently prescribed to be taken once-a-day.
11) Proportion of the total number of drugs currently being taken with different schedules.
12) Proportion of the total number of drugs currently prescribed with different schedules.
13) Number of drugs currently being taken for which the patient knows the function.
The first three variables listed, namely (a + b + c), (a + b), and
(a + c) have previously been defined in Section 2.7.
The number of
antidiabetic drugs, cardiac drugs, hypotensive drugs, diuretic drugs
and CNS drugs have been categorized according to the guidelines of the
American Hospital Formulary Service.
One note with respect to the
number of drugs in the above pharmacologic categories is that they are
all based upon the number of drugs currently being consumed, i.e.,
(a + c).
Whenever a variable is based upon the number of drugs
currently being consumed, it is by definition based upon (a + c) drugs
as a result of the data collection method which was utilized.
The
number of drugs currently being taken once-a-day was an attempt to
record the complexity of the schedules associated with the patient's
medication regimens.
To record this score, the nurse-interviewer was
instructed to ascertain from the patient, for each medication presented,
how many units, usually pills, he was taking during a 24-hour period.
Conversely, from the physician's medical records, the nurse-interviewer
was also able to determine the number of medications which the patient
was currently prescribed to be taking once-a-day.
However, again due
to the nature of the data collection method, the number of medications
prescribed to be taken once-a-day is based upon the set of drugs
(a + b).
For the purposes of this study, a medication's "schedule" is
composed of two components, 1) the dosage and 2) the frequency with
which it is to be taken.
The proportion of the total number of drugs
currently being taken with different schedules represents an effort to
combine these two components into one single, yet meaningful, variable
which is, once again, aimed at scheduling complexity.
Given the medications which a patient is currently taking, i.e., (a + c), there exist
$\binom{a+c}{2}$ possible pairs of these medications.
It is also possible to
score each individual pair as to whether both dosage and frequency are
the same, different, or only matching on one of the two scheduling
components.
Using the following system of weights
\[
w_i =
\begin{cases}
0 & \text{if both dosage and frequency are the same} \\
\text{[illegible]} & \text{if either dosage or frequency, but not both, is the same} \\
1 & \text{if neither dosage nor frequency is the same}
\end{cases}
\]
for each of the $i = 1, 2, \ldots, \binom{a+c}{2}$ pairs of drugs, a score
\[
P = \frac{\sum_{i=1}^{\binom{a+c}{2}} w_i}{\binom{a+c}{2}}
\]
can be computed for each patient.
Similarly, by
using the set of drugs currently prescribed by a physician for the above
patient, i.e., (a + b), the same procedure can be carried out to derive
a score
\[
D = \frac{\sum_{i=1}^{\binom{a+b}{2}} w_i}{\binom{a+b}{2}}
\]
which represents the proportion of the total
number of drugs currently prescribed with different schedules.
A high
score for either P or D would indicate that a number of pairs of medications were differing on at least one of the two defined components of
scheduling.
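The pairwise scoring procedure may be easier to follow as a short computational sketch. The representation of a schedule as a (dosage, frequency) pair and the function name are illustrative assumptions. In particular, the weight given to a pair that matches on exactly one component is exposed as the parameter `partial_weight` and set to 0.5 purely as an assumed intermediate value.

```python
from itertools import combinations


def schedule_difference_score(schedules, partial_weight=0.5):
    """Proportion-type score over all pairs of drugs.

    schedules: list of (dosage, frequency) tuples, one per drug.
    partial_weight: ASSUMED weight for a pair agreeing on exactly one of
    the two components; weights 0 and 1 are used for pairs agreeing on
    both components and on neither, respectively.
    Returns None when fewer than two drugs are present (no pairs exist).
    """
    pairs = list(combinations(schedules, 2))
    if not pairs:
        return None
    total = 0.0
    for (dose_1, freq_1), (dose_2, freq_2) in pairs:
        n_matching = (dose_1 == dose_2) + (freq_1 == freq_2)
        if n_matching == 2:
            total += 0.0
        elif n_matching == 1:
            total += partial_weight
        else:
            total += 1.0
    return total / len(pairs)


# Hypothetical regimen: (dosage, times per day) for three drugs.
regimen = [("250 mg", 1), ("250 mg", 3), ("10 mg", 3)]
print(schedule_difference_score(regimen))
```

Applying the same function to the (a + c) drugs being consumed gives a score of the P type, and applying it to the (a + b) drugs prescribed gives a score of the D type.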
Finally, a variable attempting to assess the number of drugs which
the patient was currently taking for which he knew the function was
developed.
At the time of the home interview, the nurse-interviewer
determined for each presented medication whether or not the patient
knew the name of the medication or its intended action.
2.13 Comment
Given this rather extensive set of variables representing patient
characteristics, disease characteristics, medication regimen related
characteristics, and physician characteristics, let us now review the
existing literature to gauge how they have been found by previous
researchers to relate to non-compliant medication-taking behavior.
CHAPTER III
A REVIEW OF THE LITERATURE ON MEDICATION-TAKING BEHAVIOR
3.1 Relationship of Medication and Medication Regimen Related Characteristics to Medication-Taking Behavior
A review of the current literature concerning the relationships
which are thought to exist among selected aspects of the patient's
medications or medication regimens and non-compliant drug-taking
behavior produced the following findings.
Evidence suggests that there
is a positive relationship between non-compliance and the number of
different medications which a patient is currently taking (3, 19, 20).
Thus, the more medications a patient is currently involved with, the
greater the number of medication-taking errors.
Similarly, increased
non-compliance is also associated with an increased number of medications which must be taken many times during the day (3).
3.2 Relationship of Sociodemographic Characteristics to Medication-Taking Behavior
Due to their greater morbidity, the elderly acquire and consume
more prescription medications than do younger persons.
Based on the
Health Interview Survey, the National Center for Health Statistics
reported that Americans aged 65 years and over acquire 22 percent of
all prescription drugs sold, though they constitute only nine percent
of the total U.S. population (21).
However, even with this greater
than expected acquisition and consumption, most results indicate
that age alone is not related to non-compliant medication-taking
behavior (19, 20, 24).
Blackwell (3) and Schwartz (22) report that more medication errors
are made by persons who live alone giving tentative credence to the
proposition that a patient's non-compliant behavior may be indirectly
related to the number of persons in the patient's household.
A plausible explanation for this may be that given a larger number of individuals in the patient's home, there is an increased probability of the
patient's medication-taking behavior being monitored with greater
scrutiny.
Also, again from the NCHS, it was reported that individual
acquisition rates for prescription medications decrease as family size
increases (21).
Marston (19) reports that socioeconomic status has not generally
been found to be significantly related to non-compliance although a few
papers do report that low SES is associated with non-compliance.
One
of the major drawbacks to the use of socioeconomic status or social
class as a variable is the variety of methods which have been derived
to "measure" this phenomenon.
As a result, any comparability between
studies using different indices is extremely difficult.
A number of authors relate that there is no evidence to support an
association between non-compliance and sex (19, 20, 24).
However, in a
series of studies using anti-tuberculosis drugs, women were reported to
be more likely to discontinue their medications than were men (19).
In addition, the NCHS determined that women were more likely than men to
use or purchase prescription medications (21).
Thus, although the
present evidence does not strongly support this conjecture, it might
seem reasonable to hypothesize that women were more non-compliant in
their medication-taking behavior.
Rabin reports that as education increases, the per capita acquisition of prescription medicines declines (21).
The literature presents
a contradictory picture of the relationship between non-compliant
medication-taking behavior and education.
Neely and Patrick (20)
report no association between non-compliance and education as does
Marston (19).
But, Schwartz (22) reports that non-compliance is
inversely related to education, i.e., as level of education increases,
the number of medication-taking errors tends to decrease.
The existing evidence relating non-compliance to marital status
reveals that there is little or no relationship between them (19, 20).
The general impression which one forms after reviewing the literature on sociodemographic variables is that they have rarely been
helpful in the prediction of non-compliant medication-taking behavior.
3.3 Characteristics of Disease Severity and Their Relation to Medication-Taking Behavior
The fact that duration of disease does affect the level of communication regarding the condition would suggest that both self-education over time, as well as repeated instructions derived from the
physician, are influential in the development of patient knowledge.
However, knowledge per se regarding one's illness and its treatment
does not necessarily lead to compliance.
Evidence does exist which
supports the contention that there is a positive association between
knowledge regarding diabetes and the degree of patient compliance (24).
Neely and Patrick (20), however, report no association between non-compliance and the duration of the illness.
Another factor commonly associated with non-compliant behavior is
the extent to which the condition interferes with the patient's performing daily activities.
Davis (4) reports that the greater the
debilitating effect of the condition on performing daily activities,
the more pronounced is the patient's non-compliance.
However,
Donabedian and Rosenfeld report that severe illness is associated with
decreased non-compliance (6).
Schwartz (22) has studied the effect of the number of other concurrent conditions on compliance and reports that patients with more
diagnoses are more likely to commit more medication errors.
This is
possibly due to the fact that a patient with a number of secondary
diagnoses is generally a sicker individual and will usually be involved
with a greater number of medications.
3.4 Attitudinal Characteristics of Patients and Their Relation to Medication-Taking Behavior
To date, very little research has been done in this area and a
considerable effort is still required in the area of developing and
perfecting valid and reliable attitude instruments.
In reporting on
how patients viewed their illnesses and their associated non-compliance,
Davis (4) found that there was a tendency for persons with negative
attitudes toward their illness to be slightly more non-compliant.
Several investigators have emphasized the importance of physician-patient communication and its relation to non-compliance.
It is
generally reported that when physicians fail to clearly convey the
significance of a regimen to the patient, there is a reciprocal
failure on the part of the patient to comply (5).
3.5 Professional Characteristics of Physicians and Their Relation to Medication-Taking Behavior
The patterns of prescription drug use and misuse observed in a
population may also be significantly related to various characteristics
of the community's physicians.
It has already been pointed out that
non-compliance tends to increase as the number of medications that a
patient currently has prescribed for him increases.
In a National
Health Service survey of physicians in England and Wales, it was
reported that prescribing rates declined as the physician's age
increased (21).
Based upon this report, the patients of older physicians should have fewer currently prescribed medications and accordingly,
fewer medication-taking errors.
Becker and Stolley studied the prescribing behavior of primary care physicians and determined that among
the factors associated with good prescribing were youth, recency of
graduation, a high rating of training in therapeutics, more postgraduate training, and a skeptical attitude toward the pharmaceutical
industry (21).
The individual styles of consultation which therapists use are
also thought to differentially influence a patient's response to his
treatment regimen.
Blackwell indicates that the physician's relation
to the patient and the manner in which he explains treatment can have
an effect on non-compliance (3).
Marston reports that non-compliance
is inversely related to the strength of the doctor-patient relationship (19).
It has also been reported that non-compliance is lower in
private practices than in clinics (3).
Having briefly described the goals and research design of the
portion of the AAFP-UNC study with which this dissertation is concerned
and having developed a foundation upon which a number of potential
associations may be based relating the constellation of indicator case
independent variables with non-compliant medication-taking behavior,
it remains to determine empirically which, if any, of these factors
may be useful in the prediction of our response variable.
The remaining chapters are devoted to this task.
CHAPTER IV
METHODS OF STATISTICAL ANALYSIS
4.1 The General Theory of Multiple Linear Regression
In general, a major objective of many statistical investigations
is to establish functional relationships which make it possible to
predict a response, or dependent variable, $Y$, in terms of other independent variables $X_1, X_2, \ldots, X_k$.
Formally, if we are given the joint distribution of a set of
random variables, say $Y$ and $X_1, X_2, \ldots, X_k$, where the independent
variables are known to assume the values $x_1, x_2, \ldots, x_k$, the basic
problem is that of determining the conditional expectation of $Y$ given
$X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k$, i.e., $E(Y \mid x_1, x_2, \ldots, x_k)$.
If
the joint density of this set of random variables is known, the task
is one of finding the conditional density $\phi(y \mid x_1, x_2, \ldots, x_k)$ and
then, for the continuous case, evaluating the integral
\[
E(Y \mid x_1, x_2, \ldots, x_k) = \int_{-\infty}^{\infty} y \, \phi(y \mid x_1, x_2, \ldots, x_k) \, dy .
\]
However, if we are not given the joint density of the random variables
involved, the determination of $E(Y \mid x_1, x_2, \ldots, x_k)$ becomes a
problem of estimation based on sample data.
If we write the regression equation in the form
\[
E(Y \mid x_1, x_2, \ldots, x_k) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k
\]
where $\beta_0, \beta_1, \ldots, \beta_k$ are parameters (regression coefficients), the
Method of Least Squares allows one to estimate the (k + 1) parameters
in the regression model without making any assumptions about the joint
distribution of the random variables involved.
The shortcoming of this
technique is that without some distributional assumption one cannot
judge the "correctness" of the estimates obtained.
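For concreteness, a least squares fit of this kind can be sketched with NumPy. The data below are synthetic and noise-free, chosen only so that the recovered coefficients are easy to check; the function name `ols_fit` is an illustrative assumption.

```python
import numpy as np


def ols_fit(X, y):
    """Least squares estimates of (beta_0, beta_1, ..., beta_k).

    X: (n, k) array of independent variables (without the constant column).
    y: (n,) array of responses.
    """
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


# Synthetic example with known coefficients (1.0, 2.0, -0.5) and no noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 0.5 * X_demo[:, 1]
beta_hat = ols_fit(X_demo, y_demo)
print(beta_hat)
```

Note that nothing in this computation uses a distributional assumption, which is exactly why it provides point estimates but no basis for tests of hypotheses.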
One practice receiving considerable attention is to assume that,
for data from a sample of size n, the observed x's are constants and
the y's are values of mutually uncorrelated random variables having
conditional densities which are assumed to be normally distributed,
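i.e.,
\[
f(y_i \mid x_{1i}, x_{2i}, \ldots, x_{ki}) = \frac{1}{\sigma \sqrt{2\pi}}
\exp\left\{ -\frac{1}{2\sigma^2} \left[ y_i - (\beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki}) \right]^2 \right\}
\]
for $-\infty < y_i < \infty$,
where $\beta_0, \beta_1, \ldots, \beta_k$ and $\sigma$ do not depend upon $i$.
It can then be shown that the resulting estimates
$\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$
of $\beta_0, \beta_1, \ldots, \beta_k$ are linear combinations of the random variables
$Y_i$; since linear combinations of independent normal random variables
are themselves normally distributed, $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ also have
normal distributions.
By using this theorem and certain others, one
is able to test various hypotheses concerning the parameters
$\beta_0, \beta_1, \ldots, \beta_k$.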
In a number of practical situations however, it is readily
apparent that the assumption relating to the normality of the response
variable is clearly untenable.
Here, once again, without some other
distributional assumption about the joint distribution of the random
variables involved, we cannot perform valid tests of hypotheses
concerning the parameters included in our regression model.
4.2 An Alternative Assumption for the Distribution of the Response Variable
If we represent the parameter $\lambda$ of a Poisson distribution as
\[
\lambda = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k
\]
for $k$ properly chosen independent variables $X_1, X_2, \ldots, X_k$, then
"Poisson regression analysis" assumes that, for a given value of $\lambda$, the
associated distribution of the response variable is a Poisson distribution.
By further assuming that the parameter $\lambda$ follows a gamma
distribution, the resulting unconditional compound distribution of the
response is represented by the negative binomial distribution.
One of the earliest applications of the negative binomial distribution resulting from the compounding of a Poisson process and a gamma
distribution was as an accident frequency model proposed by Greenwood
and Yule in 1920 (10).
Based upon this earlier work, a number of
authors (1, 2, 7) have reported deriving various bivariate accident
distribution models, differing in various underlying assumptions, which
can be used to obtain distributions of future accidents conditioned
upon the number of past accidents.
Gillings (9) has also recently
shown this compound distribution to be of value in representing the
patterns of utilization for various health services.
4.3 Negative Binomial Distribution
Assume that the number of errors that an individual makes in
taking his prescription medications is a Poisson process, i.e., for
a fixed value of $\lambda$, the probability of his experiencing $y$ medication-taking errors during a time interval of length $t$ is given by
\[
P(y, t \mid \lambda) = \frac{e^{-\lambda t} (\lambda t)^y}{y!}, \qquad y = 0, 1, 2, \ldots, \quad \lambda > 0, \quad t > 0. \tag{4.3.1}
\]
For the purpose of simplifying the following development, let $t = 1$ in
equation (4.3.1), i.e.,
\[
P(y \mid \lambda) = \frac{e^{-\lambda} \lambda^y}{y!}, \qquad y = 0, 1, 2, \ldots, \quad \lambda > 0. \tag{4.3.2}
\]
Further assume that the population under study is composed of individuals with differing degrees of "proneness" toward making medication-taking errors and that this proneness is represented by different values
of $\lambda$.
Presume that the distribution of $\lambda$ in the population is represented by
\[
g(\lambda) = \frac{\lambda^{\alpha - 1} e^{-\lambda/\beta}}{\beta^{\alpha} \Gamma(\alpha)} \tag{4.3.3}
\]
where $\lambda > 0$, $\alpha > 0$, and $\beta > 0$.
The resulting unconditional distribution representing the $y$ medication-taking errors experienced by an individual is
\[
P(y) = \int_0^{\infty} \frac{e^{-\lambda} \lambda^y}{y!} \cdot \frac{\lambda^{\alpha - 1} e^{-\lambda/\beta}}{\beta^{\alpha} \Gamma(\alpha)} \, d\lambda \tag{4.3.4}
\]
or,
\[
P(y) = \binom{\alpha + y - 1}{y} \left( \frac{1}{\beta + 1} \right)^{\alpha} \left( \frac{\beta}{\beta + 1} \right)^{y} \tag{4.3.5}
\]
for $y = 0, 1, 2, \ldots$, $\alpha > 0$, $\beta > 0$.
This expression can be recognized
as the negative binomial distribution (16).
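The compounding argument can be checked numerically. The sketch below is an illustration, not part of the original analysis. It assumes the gamma density is parameterized with shape $\alpha$ and scale $\beta$, evaluates the closed-form negative binomial probability through log-gamma functions (which sidesteps the combinatorial term for non-integer $\alpha$), and compares it with a midpoint-rule integration of the Poisson probability against the gamma density. The parameter values and the integration settings are arbitrary choices.

```python
import math


def neg_binomial_pmf(y, alpha, beta):
    """Closed-form negative binomial probability of y errors."""
    log_comb = math.lgamma(alpha + y) - math.lgamma(alpha) - math.lgamma(y + 1)
    log_p = (log_comb
             - alpha * math.log(1.0 + beta)
             + y * (math.log(beta) - math.log(1.0 + beta)))
    return math.exp(log_p)


def mixture_pmf(y, alpha, beta, upper=60.0, steps=60_000):
    """Integrate Poisson(y | lam) * gamma(lam) over lam by the midpoint rule."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        log_poisson = -lam + y * math.log(lam) - math.lgamma(y + 1)
        log_gamma = ((alpha - 1.0) * math.log(lam) - lam / beta
                     - alpha * math.log(beta) - math.lgamma(alpha))
        total += math.exp(log_poisson + log_gamma)
    return total * h


closed_form = neg_binomial_pmf(3, alpha=1.5, beta=2.0)
numerical = mixture_pmf(3, alpha=1.5, beta=2.0)
print(closed_form, numerical)
```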
Various difficulties present themselves if we attempt to work with
the distribution of y written in this form.
The evaluation of the
combinatorial term is especially difficult for non-integer values of $\alpha$.
The following demonstrates a more manageable method for obtaining
the negative binomial probabilities associated with the $y = 0, 1, 2, \ldots$ medication-taking errors.
Note that
\[
\int_0^{\infty} \frac{\lambda^{\alpha - 1} e^{-\lambda/\beta}}{\beta^{\alpha} \Gamma(\alpha)} \, e^{-\lambda} e^{\lambda Z} \, d\lambda
\]
can be equivalently expressed as
\[
\int_0^{\infty} \frac{\lambda^{\alpha - 1} e^{-\lambda/\beta}}{\beta^{\alpha} \Gamma(\alpha)} \, e^{-\lambda} \sum_{y=0}^{\infty} \frac{(\lambda Z)^y}{y!} \, d\lambda
\]
or, as
\[
\sum_{y=0}^{\infty} \int_0^{\infty} \frac{\lambda^{\alpha - 1} e^{-\lambda/\beta}}{\beta^{\alpha} \Gamma(\alpha)} \cdot \frac{e^{-\lambda} \lambda^y}{y!} \, Z^y \, d\lambda \tag{4.3.6}
\]
since it is legitimate here to switch integral and summation signs.
We recognize this above expression as being the summation over $y$
of (4.3.4) multiplied by $Z^y$.
Recall that expression (4.3.4) represents
a form of the probability density function for the negative binomial
distribution.
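Thus, expression (4.3.6) is actually a probability
generating function, which can be written in closed form as
\[
P(Z) = \left( \frac{1}{\beta + 1} \right)^{\alpha} \left( 1 - \frac{\beta Z}{\beta + 1} \right)^{-\alpha} . \tag{4.3.7}
\]
The coefficient of $Z^y$ resulting from the expansion of (4.3.7) is
the corresponding negative binomial probability associated with the $y$th
medication-taking error.
For example, if we expand (4.3.7) we obtain
\[
P(Z) = \left( \frac{1}{\beta + 1} \right)^{\alpha}
\left\{ (1) Z^0 + \left[ \alpha \left( \frac{\beta}{\beta + 1} \right) \right] Z^1
+ \left[ \frac{\alpha (\alpha + 1)}{2!} \left( \frac{\beta}{\beta + 1} \right)^2 \right] Z^2 + \cdots \right\} .
\]
In this expression, the coefficient of $Z^y$, including the leading factor
$\left( \frac{1}{\beta + 1} \right)^{\alpha}$, is the negative binomial probability for the $y$th medication-taking
error.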
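The expansion suggests a simple recursion, since each successive coefficient differs from the previous one by the factor $\frac{\alpha + y}{y + 1} \cdot \frac{\beta}{\beta + 1}$. The following sketch generates the probabilities this way; the function name and the parameter values are illustrative assumptions.

```python
def neg_binomial_probs(alpha, beta, y_max):
    """Probabilities for y = 0, 1, ..., y_max from the series expansion.

    Starts from the constant term (1 / (beta + 1)) ** alpha and multiplies
    successively by (alpha + y) / (y + 1) * beta / (beta + 1).
    """
    ratio = beta / (beta + 1.0)
    probs = [(1.0 / (beta + 1.0)) ** alpha]
    for y in range(y_max):
        probs.append(probs[-1] * (alpha + y) / (y + 1) * ratio)
    return probs


probs = neg_binomial_probs(alpha=1.5, beta=2.0, y_max=400)
print(probs[:3], sum(probs))
```

No combinatorial term is evaluated, so non-integer values of $\alpha$ cause no difficulty.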
The mean of the negative binomial distribution given by (4.3.5) is
\[
\mu = \alpha \beta
\]
and the variance is
\[
\sigma^2 = \alpha \beta (1 + \beta) .
\]
Equating the population parameters with their respective sample estimates produces the equations
\[
\bar{x} = \alpha \beta
\]
and
\[
s^2 = \alpha \beta (1 + \beta) .
\]
Solving these two simultaneous equations yields the estimators
\[
\hat{\alpha} = \frac{\bar{x}^2}{s^2 - \bar{x}} \tag{4.3.8}
\]
and
\[
\hat{\beta} = \frac{s^2 - \bar{x}}{\bar{x}} . \tag{4.3.9}
\]
Since, for the negative binomial distribution, $\sigma^2 > \mu$, the estimates
of $\alpha$ and $\beta$ should, in almost all cases, be greater than zero.
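A minimal sketch of these moment estimators follows. The counts are hypothetical, and the use of the $n - 1$ denominator for the sample variance is an assumption, since the text does not say which form of $s^2$ was used.

```python
from statistics import mean, variance


def negative_binomial_moment_estimates(counts):
    """Method-of-moments estimates (alpha_hat, beta_hat) from error counts."""
    x_bar = mean(counts)
    s2 = variance(counts)  # sample variance with n - 1 denominator (assumed)
    if s2 <= x_bar:
        # The estimators are positive only when the sample is overdispersed.
        raise ValueError("sample variance must exceed the sample mean")
    beta_hat = (s2 - x_bar) / x_bar
    alpha_hat = x_bar ** 2 / (s2 - x_bar)
    return alpha_hat, beta_hat


error_counts = [0, 0, 1, 1, 2, 3, 5, 8]  # hypothetical numbers of errors
alpha_hat, beta_hat = negative_binomial_moment_estimates(error_counts)
print(alpha_hat, beta_hat)
```

The explicit check on $s^2 \le \bar{x}$ covers the occasional sample in which the estimates would not be greater than zero.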
4.4 The Geometric Distribution as a Special Case of the Negative Binomial Distribution
If one makes the assumption that the number of medication-taking
errors experienced by an individual is a Poisson process, and if the
distribution of $\lambda$ in the population is a gamma distribution, but with
$\alpha = 1$, a special case of the negative binomial distribution, namely,
the geometric distribution, arises via the compounding mechanism.
In particular, when $\alpha = 1$ in expression (4.3.5), the unconditional
distribution for $y = 0, 1, 2, \ldots$ medication-taking errors is given
by
\[
P(y) = \left( \frac{1}{\beta + 1} \right) \left( \frac{\beta}{\beta + 1} \right)^{y} \tag{4.4.1}
\]
where $\beta > 0$.
Expression (4.4.1) can be recognized as a geometric distribution
of the form $p(1-p)^y$ with $p = \frac{1}{\beta + 1}$ (16).
An advantage of representing the response variable as a geometric
distribution rather than as a negative binomial distribution is that the
former is characterized by only a single parameter whereas the latter
requires two parameters.
It can be shown that the maximum likelihood estimator of $\beta$ in
(4.4.1) is
\[
\hat{\beta} = \frac{\sum_{i=1}^{n} y_i}{n} . \tag{4.4.2}
\]
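As a quick numerical illustration (with hypothetical counts), the sample mean can be compared with nearby values of $\beta$ to confirm that it gives the larger geometric log-likelihood. The function names are assumptions made for this sketch.

```python
import math


def geometric_log_likelihood(beta, counts):
    """Log-likelihood of the geometric model with p = 1 / (beta + 1)."""
    n = len(counts)
    total = sum(counts)
    return -n * math.log(beta + 1.0) + total * (math.log(beta) - math.log(beta + 1.0))


def geometric_mle(counts):
    """Maximum likelihood estimate of beta: the sample mean of the counts."""
    return sum(counts) / len(counts)


error_counts = [0, 0, 1, 1, 2, 3, 5, 8]  # hypothetical numbers of errors
beta_mle = geometric_mle(error_counts)
p_mle = 1.0 / (beta_mle + 1.0)
print(beta_mle, p_mle)
```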
4.5 Estimation of the Parameter $\lambda$
Given that the distribution of the response variable is well
represented by either the negative binomial distribution or the geometric distribution, it remains to estimate the parameter A of the
Poisson process.
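Jorgenson (17) and Weber (25) have investigated
the situation where $\lambda$ is expressed as a linear combination of $k$
properly chosen independent variables, i.e.,
\[
\lambda = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k .
\]
If we hypothesize the model
\[
\lambda_j = \sum_{i=0}^{k} \beta_i x_{ij} \tag{4.5.1}
\]
for $j = 1, 2, \ldots, n$ subjects (taking $x_{0j} = 1$ so that $\beta_0$ is the constant term), then the probability that the $j$th
individual with independent variable values $X_{1j} = x_{1j}, X_{2j} = x_{2j}, \ldots, X_{kj} = x_{kj}$
will have $y_j$ medication-taking errors is given by
\[
P(y_j \mid \lambda_j) = \frac{\exp\left( -\sum_{i=0}^{k} \beta_i x_{ij} \right) \left( \sum_{i=0}^{k} \beta_i x_{ij} \right)^{y_j}}{y_j!} \tag{4.5.2}
\]
where $\sum_{i=0}^{k} \beta_i x_{ij} > 0$, $y_j = 0, 1, 2, \ldots$.
For a sample of size $n$, the likelihood function is
\[
L = \prod_{j=1}^{n} \left[ \frac{\exp\left( -\sum_{i=0}^{k} \beta_i x_{ij} \right) \left( \sum_{i=0}^{k} \beta_i x_{ij} \right)^{y_j}}{y_j!} \right] . \tag{4.5.3}
\]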
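Taking the natural logarithm of $L$ we obtain
\[
\ell = \sum_{j=1}^{n} y_j \ln\left( \sum_{i=0}^{k} \beta_i x_{ij} \right)
- \sum_{j=1}^{n} \sum_{i=0}^{k} \beta_i x_{ij}
- \sum_{j=1}^{n} \ln (y_j!) . \tag{4.5.4}
\]
Differentiating $\ell$ with respect to $\beta_i$, $i = 0, 1, \ldots, k$, we obtain
\[
\frac{\partial \ell}{\partial \beta_i} = \sum_{j=1}^{n} \frac{y_j x_{ij}}{\sum_{h=0}^{k} \beta_h x_{hj}} - \sum_{j=1}^{n} x_{ij} ,
\]
where $h$ is used as the index of the inner summation to keep it distinct from $i$.
On setting the $k + 1$ partials equal to zero, the system of normal
equations obtained is
\[
\sum_{j=1}^{n} \frac{y_j x_{ij}}{\sum_{h=0}^{k} \beta_h x_{hj}} = \sum_{j=1}^{n} x_{ij}, \qquad i = 0, 1, \ldots, k. \tag{4.5.5}
\]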
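Having assumed the model given in (4.5.1) we can also express it
in the form
\[
E(\mathbf{y}) = \boldsymbol{\lambda} = X \boldsymbol{\beta} \tag{4.5.6}
\]
where
$\mathbf{y}$ is an (n x 1) vector containing the number of medication-taking errors corresponding to the $j = 1, 2, \ldots, n$
patients,
$X$ is an [n x (k + 1)] matrix containing the values of the k
independent variables corresponding to the $j = 1, 2, \ldots, n$ patients,
and
$\boldsymbol{\beta}$ is a [(k + 1) x 1] vector of parameters.
Assuming that the $y_j$'s are independent, we have that
\[
\mathrm{Var}(\mathbf{y}) = V
\]
where
$V$ is an (n x n) diagonal matrix with elements
\[
v_{jj} = \lambda_j = \mathbf{x}_j' \boldsymbol{\beta}, \qquad j = 1, 2, \ldots, n,
\]
and $\mathbf{x}_j'$ denotes the $j$th row of $X$.
We note that the elements of $V$, i.e., the variances of the observed
responses, are not equal due to their dependence upon the values of $x_{1j}, x_{2j}, \ldots, x_{kj}$.
It is a well known result that the weighted least squares minimum
variance unbiased estimator for $\boldsymbol{\beta}$ is
\[
\hat{\boldsymbol{\beta}} = (X' V^{-1} X)^{-1} X' V^{-1} \mathbf{y} .
\]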
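Unfortunately, since $\boldsymbol{\beta}$ is unknown, the matrix
\[
V = \begin{bmatrix}
(\mathbf{x}_1' \boldsymbol{\beta}) & & & 0 \\
 & (\mathbf{x}_2' \boldsymbol{\beta}) & & \\
 & & \ddots & \\
0 & & & (\mathbf{x}_n' \boldsymbol{\beta})
\end{bmatrix}
\]
is also unknown.
Thus, the problem becomes one of either obtaining an
estimate of $V$ which will, in turn, provide us with an estimate of $\boldsymbol{\beta}$, or
we must attempt to estimate $\boldsymbol{\beta}$ by some other means.
Jorgenson (17) discusses two alternative methods of estimating $\boldsymbol{\beta}$ and reports on some of
the properties of the resulting estimates.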
One of the methods suggested for selecting an estimate of $\boldsymbol{\beta}$ is by
means of the Newton-Raphson iterative procedure.
This method is one of
a growing number of iterative Maximum Likelihood techniques in use
today.
The essential rationale of the method of maximum likelihood is
that, for a given set of observed sample values $x_1, x_2, \ldots, x_n$, the
probability density function $g(x_1, x_2, \ldots, x_n)$ is a statement concerning the a-priori probability of having obtained that specific
observed sample, given its parent population and its corresponding
parameters.
With $x_1, x_2, \ldots, x_n$ now being constants, $g(x_1, x_2, \ldots, x_n)$ is a function of the parameters $\theta_1, \theta_2, \ldots, \theta_k$ of the
population cumulative distribution function.
It then makes sense
mathematically to find those values $\theta_1^*, \theta_2^*, \ldots, \theta_k^*$ which maximize
this function $g(x_1, x_2, \ldots, x_n)$ and consider them to be a reasonable
choice for the estimates of $\theta_1, \theta_2, \ldots, \theta_k$.
For our problem, expression (4.5.4) corresponds to the function
$g(x_1, x_2, \ldots, x_n)$ mentioned above.
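The Newton-Raphson procedure
will iteratively attempt to select those estimates of $\beta_0, \beta_1, \ldots, \beta_k$
which satisfy the system of equations given in (4.5.5).
These estimates
are $\beta_0^*, \beta_1^*, \ldots, \beta_k^*$, i.e., those values which maximize (4.5.4) if both
of the following conditions are true
\[
(1) \quad \frac{\partial \ell}{\partial \beta_i} = 0 \qquad \text{for } i = 0, 1, \ldots, k,
\]
\[
(2) \quad 0 > \begin{bmatrix}
\dfrac{\partial^2 \ell}{\partial \beta_0^2} & \cdots & \dfrac{\partial^2 \ell}{\partial \beta_0 \partial \beta_k} \\
\vdots & & \vdots \\
\dfrac{\partial^2 \ell}{\partial \beta_k \partial \beta_0} & \cdots & \dfrac{\partial^2 \ell}{\partial \beta_k^2}
\end{bmatrix} ,
\]
i.e., the matrix of second partial derivatives is negative definite,
when evaluated at $\beta_0^*, \beta_1^*, \ldots, \beta_k^*$.
Jorgenson (17) points out that by using the unweighted least
squares estimates of the regression coefficients as the initial values
for the iterative procedure the resulting vector $\boldsymbol{\beta}^{*\prime} = (\beta_0^*, \beta_1^*, \ldots, \beta_k^*)$
will be best asymptotically normal (BAN).
Given that $\boldsymbol{\beta}^*$ satisfies the necessary and sufficient conditions
stated above, we then have
\[
\hat{\lambda}_j = (\mathbf{x}_j' \boldsymbol{\beta}^*) . \tag{4.5.7}
\]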
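A Newton-Raphson iteration of this kind can be sketched as follows, under the model in which $\lambda_j$ is a linear combination of the independent variables. This is an illustrative implementation and not the program used for the thesis. The data are hypothetical, the first column of `X` is a column of ones for the constant term, and the step-halving safeguard that keeps every fitted $\lambda_j$ positive is an addition of this sketch that the text does not describe.

```python
import numpy as np


def poisson_linear_newton(X, y, max_iter=50, tol=1e-10):
    """Maximize the Poisson log-likelihood with lambda = X @ beta.

    Starts from the unweighted least squares estimates and applies
    Newton-Raphson steps until the largest change in beta is below tol.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # initial values: OLS
    for _ in range(max_iter):
        lam = X @ beta
        if np.any(lam <= 0):
            raise ValueError("fitted lambda values must remain positive")
        grad = X.T @ (y / lam - 1.0)            # first partial derivatives
        hess = -(X.T * (y / lam ** 2)) @ X      # matrix of second partials
        step = np.linalg.solve(hess, -grad)
        # Safeguard (not in the source): shrink the step if needed so that
        # every fitted lambda stays strictly positive.
        while np.any(X @ (beta + step) <= 0):
            step = step / 2.0
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical data: one independent variable plus a constant term.
x = np.arange(10.0)
X = np.column_stack([np.ones_like(x), x])
y = np.array([2, 1, 3, 4, 3, 5, 6, 5, 7, 8], dtype=float)
beta_star = poisson_linear_newton(X, y)
lambda_hat = X @ beta_star
print(beta_star)
```

At convergence the first partial derivatives vanish, so the returned coefficients satisfy the normal equations of the log-likelihood.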
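A second suggested method is an iterative weighted least squares
approach.
If we let $V^*_{(m)}$ denote the estimate of $V$ obtained on the $m$th
iteration, then the corresponding estimate of $\boldsymbol{\beta}$ can be designated as
\[
\boldsymbol{\beta}^*_{(m)} = (X' V^{*-1}_{(m)} X)^{-1} X' V^{*-1}_{(m)} \mathbf{y} . \tag{4.5.8}
\]
On the first iteration, let $V^*_{(0)}$ be the (n x n) identity matrix
and define
\[
V^*_{(m+1)} = \mathrm{diag}\left[ \mathbf{x}_1' \boldsymbol{\beta}^*_{(m)}, \; \mathbf{x}_2' \boldsymbol{\beta}^*_{(m)}, \; \ldots, \; \mathbf{x}_n' \boldsymbol{\beta}^*_{(m)} \right] . \tag{4.5.9}
\]
By allowing $V^*_{(0)}$ to be $I_n$, the initial estimates, $\boldsymbol{\beta}^*_{(0)}$, are the
unweighted least squares estimates, i.e.,
\[
\boldsymbol{\beta}^*_{(0)} = (X'X)^{-1} X' \mathbf{y} .
\]
The iterative process should be continued until convergence is
realized, i.e., $\boldsymbol{\beta}^*_{(m+1)} = \boldsymbol{\beta}^*_{(m)}$.
Jorgenson (17) has shown that convergence will occur only if $V^*_{(m)}$ and $(X' V^{*-1}_{(m)} X)^{-1}$ are positive definite
for all $m$.
Denote the final vector of estimated parameters by
\[
\boldsymbol{\beta}^* = (X' V^{*-1} X)^{-1} X' V^{*-1} \mathbf{y} \tag{4.5.10}
\]
where $V^*$ is the corresponding estimated variance-covariance matrix.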
39
As our final estimate of "A we use
j
(4.5.11)
Since the amount of effort involved in computing successive solutions for (4.5.8) can become prohibitive, an equivalent but more direct computational route can be attained by applying the method of unweighted least squares to the original data after it has been transformed, or weighted. The weights are the reciprocals of the standard deviations of the predicted responses resulting from the $m$th iteration. Given the model in (4.5.6), we can form the transformed model
$$W\boldsymbol{\lambda} = WX\boldsymbol{\beta}$$
where
$$W = \mathrm{diag}\left[(\mathbf{x}_1'\boldsymbol{\beta}^*_{(m)})^{-1/2},\ (\mathbf{x}_2'\boldsymbol{\beta}^*_{(m)})^{-1/2},\ \ldots,\ (\mathbf{x}_n'\boldsymbol{\beta}^*_{(m)})^{-1/2}\right].$$
The resulting system of normal equations
$$\sum_{j=1}^{n} \frac{x_{ij}\left(Y_j - \sum_{i'=0}^{k} \beta^*_{i'(m+1)} x_{i'j}\right)}{\sum_{i'=0}^{k} \beta^*_{i'(m)} x_{i'j}} = 0$$
can be shown to readily reduce to those in expression (4.5.4) for all $i = 0, 1, \ldots, k$ when $\beta^*_{i(m+1)} = \beta^*_{i(m)} = \beta^*_i$.
Thus, all of the desired calculations may be carried out by using the appropriate weights from the $m$th iteration to transform the original observations, which may then be processed through an ordinary unweighted least squares regression program for the $(m + 1)$th iteration.
Jorgenson (17) points out that the final vector of estimated
coefficients resulting from the iterative weighted least squares
technique and the final vector of estimated coefficients produced by the
iterative maximum likelihood procedure will be identical given that
the initial parameter values used for both techniques were the
unweighted least squares estimates.
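The iterative weighted least squares route described above can be sketched in a few lines for the special case of a straight-line model, $\lambda_j = \beta_0 + \beta_1 x_j$. This is only an illustrative sketch under stated assumptions: the function names and the six data points are hypothetical and are not taken from the study, and the closed-form two-parameter solver stands in for a general least squares program.

```python
# Sketch only: iterative weighted least squares for a straight-line Poisson
# model lambda_j = b0 + b1 * x_j. All names and data here are hypothetical.

def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y on (1, x) with weights w."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx
    return (swxx * swy - swx * swxy) / det, (sw * swxy - swx * swy) / det

def iterative_wls(x, y, tol=1e-12, max_iter=500):
    # V*(0) = I_n, so the starting values are the unweighted estimates.
    b0, b1 = weighted_line_fit(x, y, [1.0] * len(x))
    for _ in range(max_iter):
        lam = [b0 + b1 * xi for xi in x]
        if min(lam) <= 0.0:
            raise ValueError("a predicted lambda is not positive")
        # Weighting squared residuals by 1 / lambda_j is the same as scaling
        # each observation by lambda_j ** -0.5, as the matrix W does.
        new_b0, new_b1 = weighted_line_fit(x, y, [1.0 / l for l in lam])
        if abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol:
            return new_b0, new_b1
        b0, b1 = new_b0, new_b1
    raise RuntimeError("no convergence")

def score(x, y, b0, b1):
    """Sums of x_ij * (y_j - lambda_j) / lambda_j for i = 0 and i = 1."""
    r = [(yi - (b0 + b1 * xi)) / (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return sum(r), sum(ri * xi for ri, xi in zip(r, x))

# Made-up counts, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [1, 3, 2, 5, 4, 6]
b0, b1 = iterative_wls(x, y)
s0, s1 = score(x, y, b0, b1)
```

At convergence the two sums returned by `score` are numerically zero, which is the sense in which the weighted least squares solution also solves the likelihood equations.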
4.6 The Selection of Independent Variables
Consider the situation where there exists one dependent variable
and a large number of independent variables, and suppose that it is
desired to determine which, if any, of the independent variables are
useful as predictors of the dependent variable.
A review of the literature reporting on various aspects of multiple Poisson regression
reveals that no one has yet developed a method for
properly selecting a set of k independent variables from this larger
set of independent variables in the absence of a priori knowledge
completely specifying the model to be investigated.
The method for
independent variable selection proposed in this dissertation does not
represent a rigorous investigation of this problem, but rather, a set
of basic ideas resulting in a workable solution.
An ideal approach would be to consider all possible regression
models, obtain the proper weights by means of one of the previously
mentioned iterative procedures and then produce the appropriate sums
of squares to evaluate each weighted regression model in relation to
some predetermined criterion.
However, when a large number of independent variables are of initial interest, say q independent variables,
the number of possible models is $2^q - 1$, a number which becomes prohibitively large quite rapidly.
For example, if there were ten independent variables of interest initially, the number of all possible
regressions is $2^{10} - 1 = 1{,}023$.
Added to this burden is the iterative
search for the proper estimates of the regression coefficients thus
requiring an inordinate amount of expensive computer processing.
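The count of candidate models quoted above can be checked by direct enumeration. The short sketch below, whose function name is hypothetical, simply lists every non-empty subset of q variables and counts them.

```python
from itertools import combinations

def count_candidate_models(q):
    """Number of non-empty subsets of q candidate independent variables."""
    return sum(1 for r in range(1, q + 1) for _ in combinations(range(q), r))

n_models = count_candidate_models(10)  # ten variables of initial interest
```

Each of these subsets would still require its own iterative fit, which is the source of the computational burden noted above.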
Given the above set of q independent variables, the approach
proposed by this dissertation is to initially select a set of $r \le q$
independent variables which appear to be strongly related to the
response variable.
This initial variable selection procedure can be
accomplished by considering the correlations existing between the
response variable and the various independent variables.
Rather than
establishing a rigid criterion to delineate which independent variables will be selected, it is proposed that the researcher be required
to make this decision based upon his judgment as well as tests of
statistical significance.
At this point, having chosen a set of r independent variables
for further study, one may begin to formulate all possible models and
critically compare them to determine which of the r variables should
be retained to constitute the set of k independent variables chosen
to predict the parameter $\lambda$.
For each possible model of interest, one should first obtain the
vector of estimated regression coefficients, $\boldsymbol{\beta}^*$, which provide a
solution to the system of maximum likelihood normal equations and
which satisfy the necessary and sufficient conditions required to
verify that they represent a relative maximum on the likelihood
surface.
Given these estimated coefficients, one can then transform
the original data by means of the weights
and through the use of an unweighted least squares regression program,
produce the appropriate weighted sums of squares.
One outgrowth of
computing the above sums of squares by means of weighting the original
data and then processing it through one of the available unweighted
least squares regression programs is that the resulting vector of
regression coefficients will be identical to $\boldsymbol{\beta}^*$ if the weights, i.e., the solutions to
(4.5.5), are correct.
Having postulated a number of models which are of interest,
suitable criteria must be developed to judge the adequacy of each
model.
Recalling the assumption that the $j$th individual's medication-taking
behavior is represented by a Poisson process, we have the
constraint that $\lambda_j > 0$ for all $j = 1, 2, \ldots, n$.
This constraint may be such that the solution for (4.5.5) lies outside of the defined
region, making it impossible for an iterative maximum likelihood
routine to obtain convergence.
Similarly, Jorgenson (17) states
that the iterative weighted least squares estimation procedure will
converge only if $V^*_{(m)}$ is positive definite for all $m$.
Since
$$V^*_{(m+1)} = \mathrm{diag}\left[\mathbf{x}_1'\boldsymbol{\beta}^*_{(m)},\ \mathbf{x}_2'\boldsymbol{\beta}^*_{(m)},\ \ldots,\ \mathbf{x}_n'\boldsymbol{\beta}^*_{(m)}\right]$$
or
$$V^*_{(m+1)} = \mathrm{diag}\left[\hat{\lambda}_{1(m)},\ \hat{\lambda}_{2(m)},\ \ldots,\ \hat{\lambda}_{n(m)}\right],$$
this condition no longer holds if any $\hat{\lambda}_{j(m)} < 0$.
Thus, any candidate
model for which a final vector of estimated regression coefficients
cannot be obtained by means of standard procedures should not, most
likely, be given prime consideration.
Another logical criterion would be that one require the coefficient of each independent variable included in the model to be significant at some predetermined level of significance.
Work done by
Wald (23) provides a theoretical basis for testing hypotheses of the
form
$$H_0\colon H\boldsymbol{\beta} = \boldsymbol{\delta}$$
where
$H$ is a known $[h \times (k + 1)]$ matrix of rank $h < k + 1$
and
$\boldsymbol{\delta}$ is a specified vector of constants.
The appropriate test statistic is
(4.6.1)
which is asymptotically distributed as a Chi-square with degrees of
freedom equal to the rank of H.
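For the simplest case, a hypothesis about a single coefficient (h = 1), a Wald-type test can be sketched as follows. This sketch assumes the usual Wald form for one coefficient, the squared difference between the estimate and its hypothesized value divided by the estimated variance; the function name and the numbers are hypothetical and are not results from this study.

```python
import math

def wald_single_coefficient(estimate, variance, hypothesized=0.0):
    """Wald-type statistic and p-value for one coefficient (1 degree of freedom)."""
    w = (estimate - hypothesized) ** 2 / variance
    # Upper tail of a Chi-square with 1 df: P(Z**2 >= w) = erfc(sqrt(w / 2)).
    p = math.erfc(math.sqrt(w / 2.0))
    return w, p

# Hypothetical estimate of 2.0 with estimated variance 1.0, tested against 0.
w, p = wald_single_coefficient(2.0, 1.0)
```

With more than one restriction the same idea requires the full estimated variance-covariance matrix of the coefficients rather than a single variance.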
When comparing different models composed of the same set of independent variables but differing in form, e.g., quadratic terms or
cross product terms, it is possible, by correcting the weighted total sum of squares
for the pure error existing in the data, to produce a
measure of the maximum attainable $R^2$.
The proportion of this maximum
possible $R^2$ explained by a model is another criterion for judging the
appropriateness of candidate models.
One can also judge the adequacy of various models by noting how
well they predict the observed data.
By using some or all of these proposed criteria plus one's own
professional judgment, it should be possible to choose those k independent variables which appear to be most useful in predicting a
patient's medication-taking behavior.
CHAPTER V
CHARACTERISTICS OF THE INDICATOR CASE SAMPLE
5.1 Description of the Patient Sample
Through the 46 physicians who agreed to participate in this study,
a total of 372 patients were initially inclined to participate and
were judged to be eligible.
Of this original group, 15 persons were
excluded because either medical record data were unavailable, the
physician did not complete the questionnaire on current medications,
or the patient refused to show his medications to the nurse-interviewer (14).
As a result, only 357 patients were included in the final sample.
Of these 357 patients, 123 were enrolled as congestive
heart failure patients and 234 had diabetes mellitus as their indicator
condition.
To be included in the portion of the study dealing with errors
relating to medication-taking behavior, two additional criteria were
added.
It was felt that in order to qualify for this portion of the
study each patient should currently have at least one drug being prescribed for him by his physician, i.e., $(a + b) \ge 1$, and each
patient should also currently be taking at least one prescription
medication, i.e., $(a + c) \ge 1$.
Fourteen persons with diabetes mellitus
did not satisfy these conditions and three patients with congestive
heart failure failed to qualify.
In addition, one congestive heart
failure patient was not included due to missing information concerning
the duration of his condition and another congestive heart failure
patient was excluded due to his being involved with what was considered
to be an excessive number of medications when compared with the other
patients. In summary, of the 234 patients initially enrolled with
diabetes mellitus as their indicator condition only 220 were retained
for the medication-taking behavior portion of the study.
Similarly,
of the 123 persons initially enrolled as congestive heart failure
patients, the number finally included was 118.
5.2 Combining of the DM and CHF Indicator Case Samples
For the following reasons, the diabetes mellitus sample of 220
patients and the 118 congestive heart failure patients were combined
into a single sample of 338 patients.
It has been previously noted
that although each patient was assigned a primary indicator case
diagnosis it was also possible for them to have additional chronic
conditions.
For the two indicator conditions of interest to this
dissertation, the number of patients with a primary diagnosis of
diabetes mellitus but a secondary diagnosis of congestive heart failure,
and vice versa, was considered substantial enough to
suggest combining the two diagnostic groups.
Secondly, it was felt
that any results which could apply to both groups simultaneously
would be of more clinical value than results which were specific to
either one of the two diagnostic groups.
In later chapters it will
also be demonstrated that, for both number of omissions and number
of commissions, the form of the distribution representing the response
variable is the same for each diagnostic group.
This finding removes
many of the statistically based objections to combining the two
samples.
In view of the fact that we would be combining the diabetic
and congestive heart failure samples, each independent variable was
reviewed to determine whether its meaning was the same for both groups
of patients.
None of the previously discussed independent variables
were excluded.
However, it was felt that the addition of a dichotomous
variable indicating the primary indicator condition of each
patient was required.
5.3 Selected Sociodemographic Characteristics of the Combined DM-CHF Sample
In the following tables we shall present selected sociodemographic
indices which are felt to demonstrate the composition of the
DM-CHF combined sample.
Table 5.1 represents the distribution of males and females for
this sample.
Table 5.1. Distribution of Males and Females

Sex          Frequency    Percent
Males           188          56
Females         150          44
Total           338         100
As one can see, the representation of males is slightly higher
than for females.
The distribution of Marital Status for the sample is presented
in Table 5.2.
Here we observe that considerably more patients are
married than unmarried.
Table 5.2. Distribution of Marital Status

Marital Status    Frequency    Percent
Unmarried             79          23
Married              259          77
Total                338         100

In Table 5.3 we observe the number of individuals falling into
each of the categories of Education.
Table 5.3. Distribution of Education

Education                   Frequency    Percent
College graduate                20           6
Partial college                 43          13
High school graduate           113          33
Partial high school             58          17
Jr. high school or less        104          31
Total                          338         100
Approximately 19 percent of the sample had attended college at
one time and a further 33 percent were high school graduates.
Of the
remaining 48 percent who did not complete high school, 17 percent
had completed their education beyond the ninth grade while
31 percent of the patients had an educational level of ninth grade
or less.
Social Class as represented by the Hollingshead scale is presented
in Table 5.4.
Table 5.4. Distribution of Social Class

Social Class    Frequency    Percent
I & II              30           9
III                 66          19
IV                 172          51
V                   70          21
Total              338         100
For the DM-CHF sample, Classes I and II, representing the upper
and upper middle social classes, comprise approximately nine percent
of the total sample.
The lower middle social class, Class III,
represents 19 percent of the total sample and Class IV, the working
class, accounts for 51 percent of the sample.
The remaining 21
percent is included in Class V, the lower social class.
Hollingshead
and Redlich (11) present a series of percentages which represent
that proportion of the New Haven, Connecticut population estimated
to be included within each defined category of social class.
The figures quoted are, for Classes I and II through Class V respectively,
11 percent, 20 percent, 50 percent, and 19 percent.
Comparing the
percentages observed for the combined patient sample with those
presented by Hollingshead and Redlich reveals that the distributions
are almost identical.
Table 5.5. Mean and Standard Errors for Patient Age and Variables Indicative of Disease Severity

Independent Variable               Mean ± S.E.M.
Patient age                         56.4 ± 0.5
Number of concurrent conditions      1.9 ± 0.1
Duration of disease                  2.9 ± 0.2
Having reported on the development of the theory underlying the
method of multiple Poisson regression analysis and after having
briefly described some of the sociodemographic characteristics of
the patient sample, the remaining chapters shall present and discuss
the results ensuing from this statistical approach.
CHAPTER VI
A MODEL TO PREDICT NUMBER OF COMMISSIONS
6.1 The Definition and Distributional Form of Number of Commissions
The number of commissions, designated by the letter "c", is the
number of drugs which a patient is currently consuming of which his
physician is unaware.
This type of error is felt to depict one
aspect of medication-taking behavior which affects the quality of
medical care received by an individual.
In this chapter we shall
discuss various results emanating from efforts to describe such
medication-taking behavior.
We recall that one of the principal assumptions of Poisson
regression is that the distribution of the response variable be well
represented by the negative binomial distribution.
In Chapter V it
was revealed that, for certain clinical reasons, it would be desirable
to combine the diabetic and congestive heart failure patient samples.
We shall demonstrate that the observed distributions of the number of
commissions for both indicator condition groups are of the same form,
and that it is appropriate to propose that they be combined to form
one patient sample.
Based upon this proposition, it will then be
demonstrated that the distribution of the number of commissions for
the combined indicator condition patient group is represented by the
negative binomial distribution.
Table 6.1 presents the observed number of commissions and their
associated frequencies of occurrence for the diabetic patient sample,
the congestive heart failure patient sample, and the combined patient
sample.
Table 6.1. The Observed Distribution of the Number of Commissions for Patients Included in the Study of Medication-Taking Behavior

                        DM                      CHF                   Combined
Number of                   Relative                Relative                Relative
Commissions     Frequency   Frequency   Frequency   Frequency   Frequency   Frequency
0                  133        0.60          59        0.50         192        0.57
1                   51        0.23          23        0.19          74        0.22
2                   23        0.10          16        0.14          39        0.11
3                    9        0.04          11        0.09          20        0.06
4                    1        0.01           5        0.04           6        0.02
5                    1        0.01           4        0.04           5        0.01
6                    2        0.01           0        0.00           2        0.01
Total Number
of Patients        220        1.00         118        1.00         338        1.00
It should be noted from inspection of the above distributions of
relative frequencies that each distribution is quite positively
skewed.
Also, number of commissions is a discrete random variable.
From the sample data, for those patients with diabetes mellitus,
the values estimated for the parameters $\alpha$ and $\beta$ are
$$\hat{\alpha} = 0.5983, \qquad \hat{\beta} = 1.0636.$$
The frequencies associated with c = 4, 5 and 6 were combined into
one category, 4+, to assure a reasonable expected frequency for that
cell.
Table 6.2 presents the information pertinent to the investigation
of the null hypothesis that the observed distribution of the number of
commissions is a negative binomial distribution.
Table 6.2. The Fitting of a Negative Binomial Distribution to the Observed Distribution of the Number of Commissions for Patients with Diabetes Mellitus

Number of       Observed     Expected
Commissions     Frequency    Frequency     χ²
0                 133          142.6      0.646
1                  51           44.0      1.114
2                  23           18.1      1.327
3                   9            8.1      0.100
4+                  4            3.7      0.024
Totals            220          216.5      3.211

$\Pr[\chi^2(2) \ge 3.211 \mid H_0] = 0.201$
By means of a Chi-square test statistic with, in this case, two
degrees of freedom, we would conclude that we do not have sufficient
evidence to reject the null hypothesis.
Thus, the distribution of the
number of commissions for the diabetic patient sample is adequately
represented by a negative binomial distribution.
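The figures in Table 6.2 can be approximately reproduced from the two parameter estimates. The sketch below assumes the parameterization in which $P(0) = (1 + \beta)^{-\alpha}$ and each later probability follows by the usual negative binomial recurrence; this choice is an inference from the tabled expected frequencies, not a formula quoted from the text. It also assumes that the 4+ cell is given the expected count for exactly four commissions, which is what matches the tabled value of 3.7. The function name is hypothetical.

```python
import math

def neg_binomial_probs(alpha, beta, max_count):
    """P(c) for c = 0..max_count, with P(0) = (1 + beta) ** -alpha and
    P(c) = P(c - 1) * (alpha + c - 1) / c * beta / (1 + beta)."""
    q = beta / (1.0 + beta)
    probs = [(1.0 + beta) ** (-alpha)]
    for c in range(1, max_count + 1):
        probs.append(probs[-1] * (alpha + c - 1.0) / c * q)
    return probs

observed = [133, 51, 23, 9, 4]        # c = 0, 1, 2, 3, 4+ from Table 6.2
n = sum(observed)                     # 220 diabetic patients
alpha_hat, beta_hat = 0.5983, 1.0636  # estimates quoted for the DM sample

# Assumption: the 4+ cell uses n * P(4), not the full upper tail.
expected = [n * p for p in neg_binomial_probs(alpha_hat, beta_hat, 4)]
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 2            # cells, less one, less two estimated parameters
p_value = math.exp(-chi_square / 2.0) # exact Chi-square upper tail when df = 2
```

Small differences from the tabled Chi-square value arise because the table works from expected frequencies rounded to one decimal place.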
The parameter estimates of $\alpha$ and $\beta$ based upon the data collected
from those patients with congestive heart failure are
$$\hat{\alpha} = 0.5788, \qquad \hat{\beta} = 1.8745.$$
Once again, under the null hypothesis that the observed distribution of the number of commissions for the congestive heart failure
patients is a random sample from a negative binomial population, we
present the appropriate figures in Table 6.3.
Table 6.3. The Fitting of a Negative Binomial Distribution to the Observed Distribution of the Number of Commissions for Patients with Congestive Heart Failure

Number of       Observed     Expected
Commissions     Frequency    Frequency     χ²
0                  59           64.1      0.406
1                  23           24.2      0.060
2                  16           12.4      1.045
3                  11            7.0      2.286
4                   5            4.1      0.198
5                   4            2.4      1.067
Totals            118          114.2      5.062

$\Pr[\chi^2(3) \ge 5.062 \mid H_0] = 0.167$
Here also, on the basis of the test statistic, we cannot reject
the null hypothesis and consequently, conclude that the negative
binomial distribution is an adequate representation for the distribution of the number of commissions recorded for the CHF sample.
Having demonstrated that the distribution of the number of commissions for patients with either diabetes mellitus or congestive
heart failure can be expressed as the negative binomial distribution,
we propose that, in order to gain a broader clinical interpretation
for any forthcoming results, these two distributions be combined.
Using the distribution of the number of commissions associated
with the sample of 338 chronic disease patients, we must now establish
that this distribution is also represented by the negative binomial
distribution.
Based upon the combined sample data, the following are
the estimates for the parameters $\alpha$ and $\beta$:
$$\hat{\alpha} = 0.5111, \qquad \hat{\beta} = 1.5803.$$
In Table 6.4 we present, for each level of number of commissions,
the observed frequency of occurrence and the expected frequency of
occurrence based upon an underlying negative binomial distribution
with parameter values $\hat{\alpha}$ and $\hat{\beta}$ above.

Table 6.4. The Fitting of a Negative Binomial Distribution to the Observed Distribution of the Number of Commissions for the Combined Sample

Number of       Observed     Expected
Commissions     Frequency    Frequency     χ²
0                 192          208.2      1.261
1                  74           65.2      1.188
2                  39           30.1      2.632
3                  20           15.4      1.374
4                   6            8.3      0.637
5                   5            4.6      0.035
6                   2            2.6      0.138
Totals            338          334.4      7.265

$\Pr[\chi^2(4) \ge 7.265 \mid H_0] = 0.122$
This table demonstrates that the assumption concerning the distribution of the response variable is satisfied, for the test statistic indicates that we cannot reject the null hypothesis that the
observed distribution of the number of commissions is well represented
by the negative binomial distribution.
6.2 The Initial Selection of Independent Variables to Predict the Number of Commissions
Having satisfied the initial assumption regarding the form of the
distribution of the response variable, we now begin the task of considering those independent variables which may prove useful in the
prediction of number of commissions.
The initial set of independent variables consisted of 31 different variables.
Although they have all been previously discussed,
Table 6.5 presents a brief listing.
Table 6.5. The Initial Set of Independent Variables Proposed for Consideration in the Prediction of the Number of Commissions

Variable     Description
(a+b+c)      Total number of drugs involved between the physician-patient pair.
(a+b)        Total number of drugs currently prescribed by the physician.
(a+c)        Total number of drugs currently being taken by the patient.
X1           Number of antidiabetic drugs currently being taken.
X2           Number of cardiac drugs currently being taken.
X3           Number of diuretic drugs currently being taken.
X4           Number of CNS drugs currently being taken.
X5           Number of hypotensive drugs currently being taken.
X6           Number of drugs currently being taken once-a-day.
X7           Number of drugs currently prescribed to be taken once-a-day.
X8           Proportion of the total number of drugs currently being taken with different schedules.
X9           Proportion of the total number of drugs currently prescribed with different schedules.
X10          Number of drugs currently being taken for which the patient knows the function.
X11          Patient's age (years).
X12          Sex.
X13          Marital status.
X14          Education.
X15          Social class.
X16          Number of persons in the household.
X17          Duration of disease (years).
X18          Number of other concurrent conditions.
X19          Current activity level.
X20          Patient's attitude score toward condition.
X21          Communication score.
X22          Physician's age (years).
X23          Type of physician.
X24          Board certification.
X25          Type of practice.
X26          Average number of patient visits per M.D. per day.
X27          Indicator condition.
X28          Length of physician-patient relationship.
The form of our model is
$$E(\mathbf{C}) = X\boldsymbol{\beta}$$
where
$\mathbf{C}$ is an $(n \times 1)$ vector whose $j$th element is the number of commissions
for the $j$th individual within the combined patient sample,
$X$ is an $[n \times (k + 1)]$ matrix which will be composed of those
k independent variables deemed to be useful in predicting
number of commissions plus a variable identically equal to
one for each patient,
and
$\boldsymbol{\beta}$ is a $[(k + 1) \times 1]$ vector containing the estimated regression
coefficients corresponding to the k + 1 columns in the matrix $X$.
As previously indicated, the first phase of independent variable
selection will involve choosing that set of $r \le 31$ variables which
appear to be most strongly related to our response variable.
Previous
investigations had indicated that, by far, the strongest association
existing between the response and any of the independent variables was
with (a + b + c).
Also, many of the candidate independent variables
tend to be conditioned upon some subset of (a + b + c), i.e., either
(a + b) or (a + c).
This is especially true for those variables
intended to reflect various elements of a patient's medication regimen.
In light of this, it was felt that (a + b + c) should be included in
any set of independent variables proposed as being useful in the
prediction of number of commissions.
It was further concluded that
it would be more appropriate to condition the inclusion of additional
independent variables upon the presence of (a + b + c).
However, the
selection of the variable (a + b + c) brings about a certain definitional restriction for any model in which it appears.
Two of the
criteria established for use in patient enrollment for the medication-taking behavior study were that 1) all patients must currently be
taking at least one prescription medication and 2) all patients must
currently have at least one drug prescribed for them by their
physician.
Thus, $(a + b) \ge 1$ and $(a + c) \ge 1$.
The situation where
$(a + b + c) = 0$, i.e., there were no prescription drugs involved
between the physician and patient, does not exist for this data.
However, when $(a + b + c) = 1$, implying that there was one prescription medication involved between the physician and patient, by definition, this medication must be of the type designated by the letter "a".
Consequently, when $(a + b + c) = 1$, any model attempting to predict
number of commissions is entirely deterministic since c will always
be equal to zero.
To circumvent this problem, the lower limit for
the number of drugs involved between the patient and his physician
was set at $(a + b + c) \ge 2$.
This resulted in a loss of 33 patients
from the combined sample, leaving a total of 305 subjects.
At the
same time, in order to assure an adequate number of responses, those
patients with values of $(a + b + c) \ge 9$ were combined into one group
labeled (a + b + c) = 9+.
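The definitional restriction described above can be verified by listing every admissible combination of a, b and c for a given total. The sketch below reads (a + b) as the number of drugs prescribed and (a + c) as the number of drugs taken, as in Table 6.5; the function name is hypothetical.

```python
def feasible_triples(total):
    """All (a, b, c) with a + b + c == total, (a + b) >= 1 and (a + c) >= 1."""
    return [
        (a, b, total - a - b)
        for a in range(total + 1)
        for b in range(total - a + 1)
        if a + b >= 1 and a + (total - a - b) >= 1
    ]

one_drug = feasible_triples(1)   # a single admissible triple, with c = 0
two_drugs = feasible_triples(2)  # c may now be 0 or 1
```

With a total of one drug the number of commissions cannot vary, whereas with a total of two it can, which is why the lower limit of two restores a non-degenerate response.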
Having made these changes, the first phase of independent variable selection was carried out by calculating the partial correlation
coefficients for all of the candidate independent variables and the
number of commissions, given (a + b + c).
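A first-order partial correlation of this kind can be computed from the three pairwise correlations. The sketch below uses the standard formula for the partial correlation of two variables given a third; the function names are hypothetical and the data are made up for illustration, not taken from the study.

```python
import math

def pearson(u, v):
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Made-up values, for illustration only.
total_drugs = [2, 3, 4, 5, 6, 7, 8, 9]   # plays the role of (a + b + c)
candidate = [1, 1, 2, 1, 3, 2, 3, 2]     # a candidate independent variable
commissions = [0, 1, 0, 2, 1, 3, 2, 4]
r_partial = partial_correlation(candidate, commissions, total_drugs)
```

When the conditioning variable is uncorrelated with both of the others, the partial correlation reduces to the ordinary correlation.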
On the basis of these partial correlations, three additional
independent variables appear to be suggestive of a potential for
predicting number of commissions.
Table 6.6 presents these three
independent variables and their respective partial correlations
with the number of commissions, given (a + b + c).
Table 6.6. Independent Variables in Addition to (a + b + c) Initially Selected to Predict Number of Commissions

Independent Variables      Partial Correlation
Selected                   Coefficient | (a + b + c)
X7 | (a + b + c)                 -0.336
X11 | (a + b + c)                -0.273
X9 | (a + b + c)                 -0.226

6.3 Estimation of Regression Coefficients by Means of the Method of Maximum Likelihood
As previously discussed, maximum likelihood estimation is one
technique available for producing a solution to the set of normal
equations given in (4.5.5).
The particular maximum likelihood technique
used for this research is a computer subroutine package developed
by Kaplan and Elston (18) named MAXLIK.
Briefly, given that the log
likelihood under a particular statistical model can be written, MAXLIK
will search the likelihood surface to find maximum likelihood estimates
of the parameters and then compute an estimate of their asymptotic
variance-covariance matrix.
A set of initial parameter estimates must
be supplied which, in our case, will be the unweighted least squares
estimates of the parameters.
The search procedure utilized consists
of three basic components.
Given that we desire to estimate k parameters,
the first procedure initiated is a direct search of the likelihood surface.
If the direct search of the likelihood surface converges
to a solution for (4.5.5), then a $2^k$-dimensional search around the
estimates at convergence is performed.
If this $2^k$-dimensional search
confirms that the estimates do produce convergence, the asymptotic
variance-covariance matrix of the estimates is computed.
Using this
estimated variance-covariance matrix, one iteration of the Newton-Raphson method is performed.
If this one iteration of the Newton-Raphson
method produces a significant improvement in the estimates,
further Newton-Raphson iterations, without recomputing the second
derivatives of the log likelihood, are performed.
The user is notified
of the result following the execution of each iteration of the
above search techniques.
By using the solutions from MAXLIK to
properly "weight" the original data, one can also further verify the
stability of the estimates by performing an unweighted least squares
regression analysis on this transformed data.
As mentioned in Section
4.6, the resulting vector of regression coefficients should be identical to the maximum likelihood solutions.
6.4 A Model to Predict Number of Commissions
Given the set of r = 4 independent variables
chosen during the first phase of variable selection, our task is now
to choose those k < r independent variables which best predict number
of commissions.
This selection process will proceed along the lines
established in Section 4.6.
First, each of the above independent variables was considered
separately to determine which was the single best predictor of number
of commissions.
After computing the appropriate $\beta$ weights, the single
independent variable which appeared to explain the largest relative
proportion of the variability in the data was (a + b + c).
This was as expected.
In addition to the above single variable models, models comprised
of all possible combinations of two or more of the four selected independent variables were investigated.
However, our efforts to include
as many of the r independent variables as possible into one predictive
model were not very successful.
Without exception, each of the multiple
independent variable models investigated produced negative $\hat{\lambda}$'s.
It is recalled that we are operating under the constraint that $\lambda_j > 0$
for all $j = 1, 2, \ldots, n$ subjects.
If $\hat{\lambda}_j \le 0$, this complication
results in the maximum likelihood procedure's inability to converge
to a solution for the normal equations.
Also, any model which predicts
a value of $\hat{\lambda}_j < 0$ is, in an interpretive sense, predicting that an
individual's proneness toward making a medication-taking error is less
than zero.
Since we have defined the problem in such a manner that
"proneness" may only assume a positive value, any model which predicts
a negative proneness is, by definition, inappropriate.
For these
models, the number of cases where negative estimates of the parameter
$\lambda$ were predicted was considered to be too numerous to attempt to
develop a general solution to this problem.
Since we were unable, by standard procedures, to produce the
appropriate estimates of the $\beta$ weights for any of the multiple independent
variable models which we had proposed, we began to investigate
the usefulness and appropriateness of a single variable model predicting number of commissions in terms of (a + b + c), i.e., the total
number of drugs involved between the physician and his patient.
A review of the literature on variables affecting patient non-compliance
revealed that there is a positive association between the number of
medication-taking errors which a patient makes and the total number
of medications with which he is involved.
This appears to be true
when the type of medication-taking error is defined to be a commission.
Table 6.7 presents the mean number of commissions associated with
each level of (a + b + c).
Table 6.7. Mean Number of Commissions Given the Total Number of Drugs Involved Between the Physician and Patient

(a + b + c)     Frequency     Mean Number of Commissions
2                  58              0.2069
3                  55              0.3455
4                  47              0.6383
5                  43              0.8372
6                  34              1.2647
7                  22              1.4091
8                  18              2.0000
9+                 28              2.3929
Total             305              0.8107
As the number of drugs involved between the doctor-patient pair
increases, we note a corresponding increase in the mean number of
commissions.
Figure 1 is a graphic presentation of the mean number
of commissions versus (a + b + c).
Figure 1. Mean Number of Commissions Versus the Total Number of Drugs Involved Between a Physician and Patient, (a + b + c)
[Figure not reproduced: mean number of commissions (vertical axis, 0.0 to 3.0) plotted against (a + b + c) (horizontal axis, through 9+).]
Since we will need the unweighted least squares estimates of the
parameters for any model which we shall choose to investigate further,
it seems logical to begin some preliminary investigations at the unweighted least squares stage.
After the consideration of a number
of models representing various forms of (a + b + c), e.g., quadratic
terms and cubic terms, either regressed through the origin or with an
intercept term included, two models appeared to be the best selections
for further investigation.
One consisted of a linear model with an intercept term,
$$\lambda = \beta_0 + \beta_1 (a + b + c), \tag{6.4.1}$$
and the other was a quadratic model with an intercept term,
$$\lambda = \beta_0 + \beta_2 (a + b + c)^2. \tag{6.4.2}$$
For the model expressed in (6.4.1), the estimated vector of unweighted least squares regression coefficients was
$$\hat{\boldsymbol{\beta}}^*_{(0)} = \begin{bmatrix} -0.5359 \\ 0.3042 \end{bmatrix}.$$
For (6.4.2) this vector was
$$\hat{\boldsymbol{\beta}}^*_{(0)} = \begin{bmatrix} 0.1247 \\ 0.0285 \end{bmatrix}.$$
The predicted $\hat{\lambda}$'s for both of these models based upon the unweighted
least squares estimates for the regression coefficients are
presented in Table 6.8.

Table 6.8. Predicted λ̂'s Based upon the Unweighted Least Squares Regression Coefficients

                  λ̂'s for      λ̂'s for      Mean Number
(a + b + c)       (6.4.1)       (6.4.2)       of Commissions
2                 0.0725        0.2388        0.2069
3                 0.3767        0.3813        0.3455
4                 0.6809        0.5809        0.6383
5                 0.9851        0.8375        0.8372
6                 1.2893        1.1511        1.2647
7                 1.5936        1.5217        1.4091
8                 1.8978        1.9494        2.0000
9+                2.2020        2.4341        2.3929

A short review of Table 6.8 will reveal that the model presented
as (6.4.2) does a "somewhat better" job of predicting the mean number
of commissions for given values of (a + b + c).
Turning our attention to MAXLIK, the maximum likelihood estimation subroutine, the following were
~he
estimated maximum likelihood
estimates for 60 and 61 in model (6.4.1), and for 6 and 6 in model
0
2
(6.4.2).
For the model presented as (6.4.1),
...
13*
-
=
[-0.3807J
0.2713
e·
67
and for the model in (6.4.2),
,..
13* ..
[0.1006J
0.0294
•
A comparison of the vectors $\hat{\beta}^{*(0)}$ and $\hat{\beta}^*$ for the respective
models reveals that, for the commission data, there is really very
little change in the magnitude of the estimates.
The predicted values of the parameter $\lambda$ using the final maximum
likelihood estimates for $\hat{\beta}^*$ are displayed in Table 6.9.

Table 6.9.  Predicted $\hat{\lambda}$'s Using the Maximum Likelihood Estimates of
            the Regression Coefficients

               $\hat{\lambda}$'s for    $\hat{\lambda}$'s for    Mean Number
a + b + c      (6.4.1)      (6.4.2)      of Commissions
2              0.1619       0.2182       0.2069
3              0.4332       0.3652       0.3455
4              0.7045       0.5710       0.6383
5              0.9758       0.8356       0.8372
6              1.2470       1.1590       1.2647
7              1.5183       1.5412       1.4091
8              1.7896       1.9822       2.0000
9+             2.0609       2.4820       2.3929
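The predicted values in Table 6.9 follow directly from the quoted maximum likelihood estimates. The short Python sketch below recomputes them under two assumptions that are ours rather than the thesis's: the pooled category 9+ is scored as 9, and the function names are purely illustrative. It also reports an unweighted sum of squared deviations from the observed means, which is only a rough yardstick and not the criterion used in the text.

```python
# Maximum likelihood estimates quoted in the text for the two candidate models.
BETA_LINEAR = (-0.3807, 0.2713)     # (beta_0, beta_1) for model (6.4.1)
BETA_QUADRATIC = (0.1006, 0.0294)   # (beta_0, beta_2) for model (6.4.2)

# Levels of (a + b + c); the pooled "9+" category is scored as 9 (assumption).
LEVELS = [2, 3, 4, 5, 6, 7, 8, 9]
OBSERVED_MEANS = [0.2069, 0.3455, 0.6383, 0.8372, 1.2647, 1.4091, 2.0000, 2.3929]


def predict_linear(x, beta=BETA_LINEAR):
    """Predicted lambda under model (6.4.1): beta_0 + beta_1 * x."""
    return beta[0] + beta[1] * x


def predict_quadratic(x, beta=BETA_QUADRATIC):
    """Predicted lambda under model (6.4.2): beta_0 + beta_2 * x**2."""
    return beta[0] + beta[1] * x ** 2


def sum_squared_deviation(predicted, observed):
    """Unweighted sum of squared differences (illustrative yardstick only)."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


linear_fit = [predict_linear(x) for x in LEVELS]
quadratic_fit = [predict_quadratic(x) for x in LEVELS]

for x, lin, quad, obs in zip(LEVELS, linear_fit, quadratic_fit, OBSERVED_MEANS):
    print(f"{x:>2}  {lin:7.4f}  {quad:7.4f}  {obs:7.4f}")

ssd_linear = sum_squared_deviation(linear_fit, OBSERVED_MEANS)
ssd_quadratic = sum_squared_deviation(quadratic_fit, OBSERVED_MEANS)
print("squared deviation, (6.4.1):", round(ssd_linear, 4))
print("squared deviation, (6.4.2):", round(ssd_quadratic, 4))
```

Because the coefficients are quoted to four decimal places, the recomputed values may differ from the tabled ones in the last digit.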
Figure 2 is the graphical presentation of Table 6.9.
Figure 2.
Mean Number and Predicted Number of Commissions Versus
the Total Number of Drugs Involved Between a Physician
and Patient, (a + b + c)
[Plot: observed mean and predicted number of commissions (vertical axis, 0.0 to 3.0) against (a + b + c) = 2, 3, ..., 9+, with the predicted $\hat{\lambda}$ curves for models (6.4.1) and (6.4.2).]
The graphs for both models appear to fit the observed data very
well.
A review of Table 6.9 shows that model (6.4.2) does a
somewhat better job of predicting the mean number of commissions across
the levels of (a + b + c).
If one uses the accuracy of prediction as a basis for selecting
a particular model, it appears that model (6.4.2) is more proficient
at predicting number of commissions.
It was previously mentioned that for the maximum likelihood
estimates to be a proper solution to the system of normal equations,
i.e., a local maximum, they must satisfy the necessary and sufficient
conditions that 1) the first partial derivatives of the log likelihood
function with respect to the parameters in the model must equal zero,
and 2) the determinant of the matrix of second partial derivatives
with respect to the parameters in the model must be greater than
zero.
It can be demonstrated that the vectors $\hat{\beta}^*$ for both model
(6.4.2) and model (6.4.1) satisfy these conditions.
Given that $\hat{\beta}^*$
for model (6.4.2) is a local maximum, the estimated asymptotic variance-covariance
matrix, V*, associated with this model is
$$V^* = \begin{bmatrix} 0.3029 \times 10^{-2} & -0.9946 \times 10^{-4} \\ -0.9946 \times 10^{-4} & 0.7218 \times 10^{-5} \end{bmatrix} .$$
Similarly, the estimated asymptotic variance-covariance matrix
associated with $\hat{\beta}^*$ for model (6.4.1) is
$$V^* = \begin{bmatrix} 0.5779 \times 10^{-2} & -0.1491 \times 10^{-2} \\ -0.1491 \times 10^{-2} & 0.5050 \times 10^{-3} \end{bmatrix} .$$
Given that we have found appropriate $\beta$ weights for the models
(6.4.1) and (6.4.2), we can now produce the appropriate sums of squares
for each and begin to test certain hypotheses relating to the adequacy
of the models as well as the level of significance of the estimated
regression coefficients.
The appropriate "weighted" sums of squares for the regression of
number of commissions upon (a + b + c)^2 are presented in Table 6.10.
These figures were produced by performing an unweighted least squares
regression analysis on the weighted, or transformed, original data.
It was noted that the vector of estimated regression coefficients,
$\hat{\beta}^{w}$, resulting from the unweighted least squares analysis of this
transformed data agreed with the estimates in $\hat{\beta}^*$ to seven decimal
places.
Table 6.10.  Analysis of Variance for the Weighted Regression of
             Number of Commissions upon (a + b + c)^2

Source           d.f.      S.S.
Regression       2         274.0000
Residual         303       319.3820
Total (unc.)     305       593.3820
As was proposed in Section 4.6, based upon work done by Wald, we
can test various hypotheses relating to the estimated regression
coefficients.
Expressed in the form
we are specifically interested in testing the hypotheses
(6.4.3)
(6.4.4)
and either
(6.4.5)
The proposed test statistic is
which is asymptotically distributed as a Chi-square with degrees of
freedom equal to the rank of H.
To test the null hypothesis that $\beta^* = 0$ for model (6.4.2) we
would use the H matrix
$$H = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} .$$
The appropriate test criterion would be
$$\Pr[\chi^2(2) \ge 274.000 \mid H_0] < .001 .$$
We note that this test statistic has two degrees of freedom since the
rank of H is two.
On the basis of our test criterion, we reject the
hypothesis that the regression coefficients for model
(6.4.2) are jointly equal to zero.
To test the null hypothesis that the intercept of our model is
equal to zero, the appropriate matrix H is
the vector
H = [1  0].
The rank of H is one, thus, the appropriate test criterion is
Although our intercept term is not highly significant, it is
suggested that it differs sufficiently from zero to warrant its
inclusion in the model.
We have no a priori knowledge to justify a
regression equation which must pass through the origin and, furthermore, it can be demonstrated that the inclusion of this parameter
actually aids in the prediction of number of commissions.
To test the null hypothesis that the coefficient of
(a + b + c)^2 is equal to zero, we would use
H = [0  1].
Since the rank of H is again one, the appropriate test criterion
is
Our conclusion is that the estimate of the parameter $\beta_2^*$ for
(6.4.2) is quite significantly different from zero.
We also note that
it is positive in sign, indicating a direct relationship between the
number of commissions experienced by a patient and the total number
of drugs involved between the patient and his physician.
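The Wald criteria described above can be recomputed, at least approximately, from the rounded estimates and the variance-covariance matrix quoted for model (6.4.2). The following Python sketch assumes the usual quadratic form (H b)'[H V H']^{-1}(H b); the function names are hypothetical, and the resulting figures come from our arithmetic on four-digit inputs, not from the thesis itself.

```python
import math


def wald_statistic(h_rows, beta, cov):
    """Return (H b)' [H V H']^{-1} (H b) for a two-parameter model.

    h_rows: one or two rows of H; beta: length-2 estimate; cov: 2 x 2 matrix.
    """
    hb = [sum(h * b for h, b in zip(row, beta)) for row in h_rows]
    # H V H', built entry by entry.
    hvh = [[sum(r1[i] * cov[i][j] * r2[j] for i in range(2) for j in range(2))
            for r2 in h_rows] for r1 in h_rows]
    if len(h_rows) == 1:
        return hb[0] ** 2 / hvh[0][0]
    det = hvh[0][0] * hvh[1][1] - hvh[0][1] * hvh[1][0]
    inv = [[hvh[1][1] / det, -hvh[0][1] / det],
           [-hvh[1][0] / det, hvh[0][0] / det]]
    return sum(hb[i] * inv[i][j] * hb[j] for i in range(2) for j in range(2))


def chi2_upper_tail(x, df):
    """Upper-tail chi-square probability (closed forms for df = 1 and 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch handles only df = 1 or 2")


beta_hat = [0.1006, 0.0294]                  # model (6.4.2)
v_star = [[0.3029e-2, -0.9946e-4],
          [-0.9946e-4, 0.7218e-5]]

w_joint = wald_statistic([[1, 0], [0, 1]], beta_hat, v_star)
w_intercept = wald_statistic([[1, 0]], beta_hat, v_star)
w_slope = wald_statistic([[0, 1]], beta_hat, v_star)

print("joint:    ", round(w_joint, 2), chi2_upper_tail(w_joint, 2))
print("intercept:", round(w_intercept, 3), round(chi2_upper_tail(w_intercept, 1), 3))
print("slope:    ", round(w_slope, 2), chi2_upper_tail(w_slope, 1))
```

With these inputs the joint statistic lands close to the regression sum of squares in Table 6.10, as one would expect if the two computations are equivalent, while the intercept statistic is modest and the coefficient of (a + b + c)^2 is very large.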
To summarize the results of the tests of hypotheses performed on
model (6.4.2), it would seem that there is evidence to suggest that
model (6.4.2) is adequate for the representation of number of commissions.
We may begin our investigation of the adequacy of model (6.4.1)
by observing the appropriate "weighted" sums of squares for the
regression of number of commissions on (a + b + c) which are presented
in Table 6.11.
Table 6.11.  Analysis of Variance for the Weighted Regression of
             Number of Commissions upon (a + b + c)

Source           d.f.      S.S.
Regression       2         273.9999
Residual         303       321.0557
Total (unc.)     305       595.0556
By using the transformed data and unweighted least squares estimation, it was possible to verify that the resulting vector of
estimated regression coefficients, $\hat{\beta}^{w}$, was identical to $\hat{\beta}^*$ for the
model (6.4.1).
Here again, we are interested in testing the same hypotheses as
were proposed for model (6.4.2).
Under the null hypothesis that the vector of regression coefficients,
$\beta^*$, is equal to zero, the test criterion is
$$\Pr[\chi^2(2) \ge 273.9999 \mid H_0] < .001 .$$
Here we note that we do have sufficient evidence to reject our
null hypothesis.
Testing the null hypothesis
$$H_0: \beta_0^* = 0$$
produces the test criterion
The conclusion which one would reach based upon this test
criterion is that the estimate of the intercept for the model (6.4.1)
is significantly different from zero.
To test the null hypothesis
$$H_0: \beta_1^* = 0$$
for model (6.4.1), we observe the test criterion
Once again, we would reject our null hypothesis in favor of the
alternative hypothesis that $\beta_1^*$ is not equal to zero.
It is noted
that $\hat{\beta}_1^*$ is also positive in sign.
To summarize our findings relating to model (6.4.1), we have
produced no evidence to indicate that this model does not provide an
adequate representation for the variation observed in our data.
Thus, in terms of the adequacy of each of the models (6.4.1) and
(6.4.2), we cannot reject either one based upon the preceding tests.
Since, for the single independent variable (a + b + c), there
are a number of possible values for number of commissions at each
value of our independent variable, it is impossible to fit a regression
line through each of these points.
Pure error is variation in the
data that cannot be explained by a regression equation.
When we speak
of the quantity R^2, defined as
$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} ,$$
this is the proportion of the total variation present in the data
which is being explained by a particular regression model, i.e.,
$$R^2 = \frac{\text{S.S. due to regression, corrected for the mean}}{\text{total S.S., corrected for the mean}} .$$
It makes sense to compare this quantity with a maximum possible
value of R^2.
This can be accomplished by subtracting from the total
sum of squares the sum of squares for pure error inherent in the
data, i.e., that variation which cannot be explained by the regression
model.
The sum of squares for pure error can be obtained by pooling
the individual sums of squares $\sum_{i=1}^{n} (y_i - \bar{y}_i)^2$ occurring at each of the
unique combinations of values for the independent variables involved
in the model.
It should be noted that the sum of squares for pure
error is dependent upon the levels of the independent variables
included in the model of interest.
Thus, the maximum R^2 for a
particular model can be found by
$$R^2_{MAX} = \frac{\left[ \sum_{i=1}^{n} (y_i - \bar{y})^2 \right] - PE}{\sum_{i=1}^{n} (y_i - \bar{y})^2} .$$
We shall attempt to demonstrate this concept through the use
of the two models proposed for predicting number of commissions.
Looking first at model (6.4.2), we see that R^2, presented as a
percentage, for this model is
$$R^2 = \frac{274.0000 - 237.6369}{593.3820 - 237.6369} \times 100 = \frac{36.3631}{355.7451} \times 100 = 10.22\% .$$
Thus, the regression of c on (a + b + c)^2 accounts for approximately ten percent of the total variation in the data.
The maximum possible R^2 for the regression of c on (a + b + c)^2
is based upon the following.
For model (6.4.2), it has been determined
that the sum of squares for pure error is
SS(PE) = 318.2461.
This figure represents that portion of the total sum of squares which
cannot be explained by a model with levels of the independent variable
equal to those observed for (a + b + c)^2 in model (6.4.2).
As a
result, we find that the maximum possible R^2 for model (6.4.2) is
$$R^2_{MAX} = \frac{355.7451 - 318.2470}{355.7451} \times 100 = 10.54\% .$$
Although a value of R^2 = 10.22% is not exceptionally high, we
have demonstrated that the maximum possible value of R^2 for model
(6.4.2) is only R^2_MAX = 10.54%.
One possible interpretation of this
result is that the model (6.4.2) actually explains almost all of the
variation in the data that is possible for a model with the independent variable (a + b + c)^2.
We can calculate the value of R^2 for
model (6.4.1) as being
$$R^2 = \frac{273.9999 - 235.7135}{595.0556 - 235.7135} \times 100 = \frac{38.2864}{359.3421} \times 100 = 10.66\% .$$
The sum of squares for pure error associated with model (6.4.1)
is
SS(PE) = 316.0912.
The resulting maximum value of R^2 for this model can be calculated to be
$$R^2_{MAX} = \frac{359.3421 - 316.0912}{359.3421} \times 100 = 12.04\% .$$
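The R^2 arithmetic for both models can be checked with a few lines of Python. This is only a sketch: the function names are ours, the sums of squares are copied from Tables 6.10 and 6.11 and from the text, and the pure-error figure for model (6.4.2) is taken as it appears in the quotient. The last lines express each R^2 as a fraction of its attainable maximum, which is one way, though not necessarily the author's, of reading the comparison.

```python
def r_squared_pct(ss_regression, ss_total, correction):
    """R^2 in percent, with both sums of squares corrected for the mean."""
    return 100.0 * (ss_regression - correction) / (ss_total - correction)


def r_squared_max_pct(ss_total_corrected, ss_pure_error):
    """Largest attainable R^2 in percent once pure error is set aside."""
    return 100.0 * (ss_total_corrected - ss_pure_error) / ss_total_corrected


# Model (6.4.2): quadratic in (a + b + c).
r2_quad = r_squared_pct(274.0000, 593.3820, 237.6369)
r2_quad_max = r_squared_max_pct(593.3820 - 237.6369, 318.2470)

# Model (6.4.1): linear in (a + b + c).
r2_lin = r_squared_pct(273.9999, 595.0556, 235.7135)
r2_lin_max = r_squared_max_pct(595.0556 - 235.7135, 316.0912)

print(f"(6.4.2): R2 = {r2_quad:.2f}%, R2 max = {r2_quad_max:.2f}%")
print(f"(6.4.1): R2 = {r2_lin:.2f}%, R2 max = {r2_lin_max:.2f}%")

# Share of the attainable variation actually explained by each model.
share_quad = r2_quad / r2_quad_max
share_lin = r2_lin / r2_lin_max
print(f"share of attainable R2: (6.4.2) {share_quad:.3f}, (6.4.1) {share_lin:.3f}")
```

The recomputed percentages agree with those in the text to within rounding in the second decimal place.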
On the basis of selecting a model which predicts as much as is
possible of the total variation present in the data, model (6.4.2),
once again, appears to be the stronger of the two proposed models
being investigated.
Although both models (6.4.1) and (6.4.2) were demonstrated to
be adequate in terms of explaining the variation present in the number
of commissions data, model (6.4.2) was shown to be more accurate in
its ability to predict number of commissions and hence explains a
higher proportion of the total variation present in the data.
In
view of this, it is proposed that the best model for predicting
number of commissions is (6.4.2).

6.5  An Investigation of the Underlying Assumptions for the Model
     Chosen to Predict Number of Commissions
In order to validate the assumptions made for the model (6.4.2)
which we have proposed for predicting number of commissions, we must
demonstrate that 1) the distribution of the number of commissions,
given a value of $\lambda > 0$, is represented by
$$p(c \mid \lambda) = \frac{e^{-\lambda} \lambda^{c}}{c!}$$
for c = 0, 1, 2, ..., and 2) the distribution of the parameter $\lambda$
is a gamma distribution.
In the series of Tables 6.12 - 6.19 we present the observed
frequencies and the expected frequencies arising from expression
(4.3.2) for the distribution of number of commissions resulting
from those patients with the same predicted value of $\lambda$.
Since,
for this model, there is a one-to-one correspondence between the
predicted values of $\lambda$ and (a + b + c), when we speak of that group
of patients having a given value of $\lambda$, this is analogous to discussing those patients with the same value of (a + b + c).
Table 6.12.  Observed and Expected Frequencies Hypothesizing a
             Poisson Distribution with $\hat{\lambda}$ = 0.2182, or
             (a + b + c) = 2

Number of      Observed     Expected
Commissions    Frequency    Frequency    $\chi^2$
0              46           46.6         .0077
1              12           11.4         .0316
Total          58           58.0         .0393

Table 6.13.  Observed and Expected Frequencies Hypothesizing a
             Poisson Distribution with $\hat{\lambda}$ = 0.3652, or
             (a + b + c) = 3

Number of      Observed     Expected
Commissions    Frequency    Frequency    $\chi^2$
0              40           38.2         .0848
1              11           13.9         .6050
2              4            2.5          1.0240
Total          55           54.6         1.7138

$\Pr[\chi^2(1) \ge 1.7138 \mid H_0] = .190$

Table 6.14.  Observed and Expected Frequencies Hypothesizing a
             Poisson Distribution with $\hat{\lambda}$ = 0.5710, or
             (a + b + c) = 4

Number of      Observed     Expected
Commissions    Frequency    Frequency    $\chi^2$
0              24           26.6         .2541
1              17           15.2         .2132
2+             6            5.2          .1231
Total          47           47.0         .5904

$\Pr[\chi^2(1) \ge 0.5904 \mid H_0] = .442$
Table 6.15.  Observed and Expected Frequencies Hypothesizing a
             Poisson Distribution with $\hat{\lambda}$ = 0.8356, or
             (a + b + c) = 5

Number of      Observed     Expected
Commissions    Frequency    Frequency    $\chi^2$
0              19           18.6         .0086
1              14           15.6         .1641
2              8            6.5          .3462
3              2            1.8          .0222
Total          43           42.5         .5411

$\Pr[\chi^2(2) \ge 0.5411 \mid H_0] = .763$

Table 6.16.  Observed and Expected Frequencies Hypothesizing a
             Poisson Distribution with $\hat{\lambda}$ = 1.1590, or
             (a + b + c) = 6

Number of      Observed     Expected
Commissions    Frequency    Frequency    $\chi^2$
0              13           10.7         .4944
1              9            12.4         .9323
2              5            7.2          .6722
3+             7            3.6          3.2111
Total          34           33.9         5.3100

$\Pr[\chi^2(2) \ge 5.3100 \mid H_0] = .070$
Table 6.17.  Observed and Expected Frequencies Hypothesizing a
             Poisson Distribution with $\hat{\lambda}$ = 1.5412, or
             (a + b + c) = 7

Number of      Observed     Expected
Commissions    Frequency    Frequency    $\chi^2$
0              8            4.7          2.3170
1              3            7.3          2.5329
2              5            5.6          .0643
3              6            2.9          3.3138
Total          22           20.5         8.2280

$\Pr[\chi^2(2) \ge 8.2280 \mid H_0] = .016$

Table 6.18.  Observed and Expected Frequencies Hypothesizing a
             Poisson Distribution with $\hat{\lambda}$ = 1.9822, or
             (a + b + c) = 8

Number of      Observed     Expected
Commissions    Frequency    Frequency    $\chi^2$
0              4            2.5          .9000
1              5            4.9          .0020
2              2            4.9          1.7163
3              4            3.2          .2000
4+             3            2.4          .1500
Total          18           17.9         2.9683

$\Pr[\chi^2(3) \ge 2.9683 \mid H_0] = .396$
Table 6.19.  Observed and Expected Frequencies Hypothesizing a
             Poisson Distribution with $\hat{\lambda}$ = 2.4820, or
             (a + b + c) = 9+

Number of      Observed     Expected
Commissions    Frequency    Frequency    $\chi^2$
0              4            2.3          1.2565
1              4            5.8          .5586
2              10           7.2          1.0889
3              3            6.0          1.5000
4+             7            6.3          .0778
Total          28           27.6         4.4818

$\Pr[\chi^2(3) \ge 4.4818 \mid H_0] = .214$
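The goodness-of-fit calculation behind Tables 6.12 through 6.19 can be sketched in Python for a single level, here (a + b + c) = 4 from Table 6.14. The helper names are ours, and the last cell is treated as "2 or more" so that the expected counts sum to the group size. Since the sketch carries full precision rather than expected frequencies rounded to one decimal, its statistic need not match the tabled value in every digit.

```python
import math


def poisson_pmf(c, lam):
    """Poisson probability of exactly c events given parameter lam."""
    return math.exp(-lam) * lam ** c / math.factorial(c)


def poisson_expected(n, lam, n_cells):
    """Expected counts for cells 0, 1, ..., with the last cell pooled as 'k or more'."""
    probs = [poisson_pmf(c, lam) for c in range(n_cells - 1)]
    probs.append(1.0 - sum(probs))
    return [n * p for p in probs]


def pearson_chi_square(observed, expected):
    """Sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [24, 17, 6]      # Table 6.14: 0, 1 and 2+ commissions
lam_hat = 0.5710            # predicted lambda at (a + b + c) = 4

expected = poisson_expected(sum(observed), lam_hat, len(observed))
statistic = pearson_chi_square(observed, expected)

# Three cells, one estimated parameter, one constraint: 1 degree of freedom.
# For 1 d.f. the chi-square upper tail equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2.0))

print([round(e, 1) for e in expected], round(statistic, 4), round(p_value, 3))
```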
Before we comment upon the appropriateness of the underlying
distributional assumption made for each sample of patients with a
given value of $\hat{\lambda}$ or (a + b + c), we shall briefly discuss the effect
of performing a series of significance tests upon the same data.
It
is known that, for this situation, if one performs a series of significance tests at some nominal significance level $\alpha$, the actual probability of a Type I error, say $\alpha^*$, is greater than $\alpha$.
Based upon the
Bonferroni inequalities (8), if we intend to perform V tests of significance, the expression
$$\alpha = \alpha^* / V \qquad (6.5.1)$$
will allow us to determine an appropriate value of $\alpha$ by controlling
for the value of $\alpha^*$.
To perform V = 7 tests of significance, i.e., (a + b + c) = 3, 4,
..., 9+, if we choose $\alpha^*$ = .05, according to (6.5.1) each test
should be performed at the $\alpha$ = 0.007 level of significance.
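As a small worked check of (6.5.1), the per-test level and the comparison against the seven probabilities reported with Tables 6.13 through 6.19 can be written out as follows. The dictionary layout and names are illustrative only, and the pooled category 9+ is keyed as 9.

```python
def bonferroni_level(alpha_star, n_tests):
    """Per-test significance level that bounds the overall Type I error by alpha_star."""
    return alpha_star / n_tests


# Tail probabilities reported with Tables 6.13 - 6.19, keyed by (a + b + c).
p_values = {3: 0.190, 4: 0.442, 5: 0.763, 6: 0.070, 7: 0.016, 8: 0.396, 9: 0.214}

alpha = bonferroni_level(0.05, len(p_values))
rejected = [level for level, p in p_values.items() if p < alpha]

print(round(alpha, 4))   # per-test level
print(rejected)          # levels at which the Poisson hypothesis would be rejected
```

None of the seven probabilities falls below the per-test level, so no level of (a + b + c) leads to rejection under this rule.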
When $\hat{\lambda}$ = 0.2182, i.e., (a + b + c) = 2, we are not able to perform
any such test since there are too few degrees of freedom.
However, we can note the observed and expected frequencies associated with
each level of number of commissions and speculate that a Poisson distribution conditioned upon a parameter value equal to $\hat{\lambda}$ when
(a + b + c) = 2 would adequately describe the medication-taking behavior
of this selected group of patients.
For the remaining distributions
of number of commissions, which occur at values of (a + b + c) =
3, 4, ..., 9+, we see that we cannot reject the null hypothesis that
they also can be represented by a Poisson distribution given their
respective parameter estimates $\hat{\lambda}$. Although there is some variation
in the "goodness-of-fit," Tables 6.12 through 6.19 demonstrate that
the assumption of a patient's medication-taking behavior being represented by a Poisson process with parameter $\lambda$ is well founded.
Secondly, we must establish that the distribution of the parameter
$\lambda$ can be represented as the gamma distribution.
The procedure used to demonstrate the validity of this assumption
is as follows:
Assume that the distribution of the parameter $\lambda$ is given by
(4.3.3), namely,
$$g(\lambda) = \frac{\lambda^{\alpha - 1} e^{-\lambda/\beta}}{\beta^{\alpha}\, \Gamma(\alpha)}$$
where $\lambda > 0$, $\alpha > 0$, and $\beta > 0$.
By the Method of Moments, given a
sample of j = 1, 2, ..., n individuals, the sample estimators of
We know that for the model
$$\lambda_j = \beta_0 + \beta_2 (a + b + c)_j^2$$
there is a certain observed frequency of occurrence, $O_i$, associated
with each predicted value of $\lambda$, or (a + b + c).
Thus, we can specify
the observed distribution of $\lambda$.
What is needed are the corresponding
expected frequencies, $E_i$, under (4.3.3).
The method proposed to estimate
these expected frequencies is to use a modified version of the
general Chi-square goodness-of-fit test.
Given the predicted values
of $\hat{\lambda}$, one can determine the midpoint of the interval between each of
these values.
By integrating the probability density function of the
gamma distribution, given $\hat{\alpha}$ and $\hat{\beta}$, from midpoint to midpoint, one can
estimate the area under this function corresponding to the respective
value of $\hat{\lambda}$ contained therein.
Unfortunately, for the model (6.4.2),
this exercise is somewhat artificial since, with only the independent
variable (a + b + c) included in the model, the observed distribution
of $\lambda$ is discrete.
Nonetheless, we shall perform this test to demonstrate
the proposed procedure.
The following figure is an attempt
to help illustrate this method.
[Figure: the density $g(\lambda)$ plotted against $\lambda$.]
Given the model $\lambda_j = \beta_0 + \beta_2 (a + b + c)_j^2$ for j = 1, 2, ..., n
subjects, since there are only i = 1, 2, ..., u distinct values of
(a + b + c), it is possible to let $\hat{\lambda}_i$ represent the predicted value of
the parameter $\lambda$ for those $n_i$ subjects who all have the same value of
(a + b + c).
Given the values of $\hat{\lambda}_i$, we may then determine the $(mp)_i$,
or midpoints of the intervals between $\hat{\lambda}_i$ and $\hat{\lambda}_{i+1}$.
If we perform the
integration
$$\int_0^{(mp)_1} g(\lambda)\, d\lambda ,$$
it is suggested that this area will represent that portion of the area
under $g(\lambda)$ for $\hat{\lambda}_1$.
Similarly, if this is done for all of the remaining
intervals, the entire area under $g(\lambda)$ can be assigned to the
respective values of $\hat{\lambda}_i$.
The expected frequency, $E_i$, corresponding
to each $\hat{\lambda}_i$, will be
$$E_i = n \int_{(mp)_{i-1}}^{(mp)_i} g(\lambda)\, d\lambda ,$$
with $(mp)_0 = 0$ and $(mp)_u = +\infty$.
If there are i = 1, 2, ..., u values of $\hat{\lambda}_i$, then our test statistic
is
$$\sum_{i=1}^{u} \frac{(O_i - E_i)^2}{E_i} ,$$
which is distributed as a Chi-square with u - (2 + 1) degrees of
freedom.
A problem arises in the evaluation of $g(\lambda)$ when $\alpha$ is not an
integer or some other tabulated value of the gamma function.
Johnson and Kotz (15) suggest that $g(\lambda)$ can be approximated by
We can estimate $\alpha$ and $\beta$ from the sample data and there are numerous
methods available for evaluating the probability density function of
the Chi-square distribution.
Table 6.20 presents the predicted values
of $\hat{\lambda}$ from (6.4.2), the boundaries for the intervals about these
values of $\hat{\lambda}$, and their associated observed frequencies.
Table 6.20.  Intervals Containing the Predicted Values of the
             Parameter $\lambda$ and Their Respective Observed Frequencies

               $\hat{\lambda}$'s for     Intervals about      Observed
a + b + c      (6.4.2)       $\hat{\lambda}$                Frequency
2              0.2182        0.0000-0.2917        58
3              0.3652        0.2917-0.4681        55
4              0.5710        0.4681-0.7033        47
5              0.8356        0.7033-0.9973        43
6              1.1590        0.9973-1.3501        34
7              1.5412        1.3501-1.7617        22
8              1.9822        1.7617-2.2321        18
9+             2.4820        2.2321- +$\infty$    28
From the number of commissions sample data the estimates of the
parameters $\alpha$ and $\beta$ were determined to be
$\hat{\alpha}$ = 1.6363
and
$\hat{\beta}$ = 0.5490.
The values of $\hat{\lambda}$ corresponding to the values of (a + b + c), the
observed frequencies, the expected frequencies under the distribution
given in (4.3.3), and the corresponding contribution to the total
value of the test statistic are presented in Table 6.21.
Table 6.21.  Observed and Expected Frequencies Hypothesizing a Gamma
             Distribution

               $\hat{\lambda}$'s for     Observed       Expected
a + b + c      (6.4.2)       Frequencies    Frequencies    $\chi^2$
2              0.2182        58             53.7           0.3443
3              0.3652        55             42.9           3.4128
4              0.5710        47             52.0           0.4802
5              0.8356        43             51.1           1.2840
6              1.1590        34             42.0           1.5238
7              1.5412        22             29.4           1.8626
8              1.9822        18             17.7           0.0051
9+             2.4820        28             16.2           8.5951
Total                        305            305.0          17.5079
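The midpoint-to-midpoint integration that produces the expected frequencies in Table 6.21 can be sketched in Python. This is an illustration under stated assumptions: the gamma distribution function is evaluated with a plain power series for the regularized incomplete gamma function (our choice, in place of the Chi-square approximation mentioned in the text), the first interval starts at zero, the last runs to infinity, and all function names are hypothetical. Small differences from the tabled values are therefore to be expected.

```python
import math


def regularized_lower_gamma(a, x, terms=200):
    """P(a, x) from the series x^a e^{-x} / Gamma(a) * sum_n x^n / (a (a+1) ... (a+n))."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x)) / math.gamma(a)


def gamma_cdf(lam, alpha, beta):
    """Distribution function of the gamma density g(lambda) with shape alpha, scale beta."""
    return regularized_lower_gamma(alpha, lam / beta)


lam_hat = [0.2182, 0.3652, 0.5710, 0.8356, 1.1590, 1.5412, 1.9822, 2.4820]
observed = [58, 55, 47, 43, 34, 22, 18, 28]
alpha_hat, beta_hat = 1.6363, 0.5490
n = sum(observed)

# Interval boundaries: midpoints between successive predicted values.
midpoints = [(lo + hi) / 2.0 for lo, hi in zip(lam_hat, lam_hat[1:])]

# Area under g between successive boundaries, from 0 up to +infinity.
cdf_at_cuts = [0.0] + [gamma_cdf(m, alpha_hat, beta_hat) for m in midpoints] + [1.0]
expected = [n * (hi - lo) for lo, hi in zip(cdf_at_cuts, cdf_at_cuts[1:])]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - (2 + 1)

print([round(m, 4) for m in midpoints])
print([round(e, 1) for e in expected])
print(round(chi_square, 3), degrees_of_freedom)
```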
Although one would undoubtedly conclude, based upon the value of
this test statistic, that we do have sufficient evidence to reject our
null hypothesis that the distribution of $\lambda$ is represented by a gamma
distribution, inspection of the last column of Table 6.21 will reveal
that nearly 50 percent of the total value of our test statistic (8.5951 of 17.5079) is contributed by the cell where
$\hat{\lambda}$ = 2.4820, or (a + b + c) = 9+.
It
should be remembered that this category of (a + b + c) represents a
collection of a number of values of (a + b + c) $\ge$ 9.
This was done
to assure an adequate sample size for this cell.
However, in pooling
those values of (a + b + c) $\ge$ 9, we have destroyed any trend for the
data to asymptotically decrease in the same manner as the gamma
distribution.
For other than this last interval, it is suggested
that a gamma distribution does provide an adequate representation of
the distribution of the parameter $\lambda$.
It is felt that the results of Section 6.5 tend to substantiate,
although somewhat artificially, the tenability of the distributional
assumptions underlying our proposed model for predicting number of
commissions.
6.6  Summary of Chapter VI
Of the 31 initial independent variables, the total number of
drugs involved between the physician and his patient was selected as
the variable best suited, by our criteria, to predict number of
commissions.
As previously indicated, the literature reports a
positive association between this independent variable and noncompliant medication-taking behavior.
However, the implication in
this statement is that this relationship is linear in nature.
This
means that for a given increase in the total number of drugs involved
between the doctor-patient pair, there is always a constant increase
in the patient's non-compliant medication-taking behavior.
For the
model
$$\lambda_j = \beta_0 + \beta_2 (a + b + c)_j^2 ,$$
this increase is not always constant.
As can be seen in Figure 2,
as the total number of drugs involved between a physician and his
patient increases, there is a correspondingly greater increase in the
number of commissions experienced by the patient.
The clinical
significance of this finding is possibly that the greater the number
of medications prescribed for and being consumed by the patient, the
ever greater is his proneness to consume more medications not
currently being prescribed by his physician.
As a result of his
being more prone to commit medication-taking errors of this type,
the probability of his consuming contraindicated medications or
developing medication-related complications in his physical condition
is possibly also heightened.
It is felt that this finding should
have the result of placing an even greater emphasis upon the
necessity for developing effective communication between a physician
and his patients.
CHAPTER VII
A MODEL TO PREDICT NUMBER OF OMISSIONS
7.1  The Definition and Distributional Form of Number of Omissions
Those medications which have been prescribed by a physician but
which are not being taken by the patient are defined as omissions.
The number of omissions is designated by the letter "b".
In Chapter
VII we shall propose and discuss various results derived from attempts
to select those variables which appear to best predict number of
omissions.
As for the number of commissions, we would like to combine the
distributions of number of omissions for the diabetes mellitus and
congestive heart failure patient samples in order to, hopefully,
broaden the scope of any results which may be obtained from this
work.
After having determined the form of the distribution of number
of omissions for the combined sample data, we shall then demonstrate
that this is also the form of the distribution of the response in each
of the respective indicator condition patient samples, thereby providing evidence to support the proposition that it is appropriate to
combine these two patient samples.
In Table 7.1 we present the observed number of omissions and
their associated frequencies of occurrence for both indicator condition patient samples as well as the combined patient sample.
Table 7.1.  The Observed Distributions of the Number of Omissions
            for Patients Included in the Study of Medication-Taking
            Behavior

                        DM                      CHF                   Combined
Number of                  Relative                Relative                Relative
Omissions      Frequency   Frequency   Frequency   Frequency   Frequency   Frequency
0              126         0.57        57          0.48        183         0.54
1              63          0.29        33          0.28        96          0.28
2              20          0.09        15          0.13        35          0.10
3              10          0.04        9           0.07        19          0.06
4              1           0.01        2           0.02        3           0.01
5              0           0.00        2           0.02        2           0.01
Total Number
of Patients    220         1.00        118         1.00        338         1.00
Number of omissions is a discrete random variable, and also, from
inspection of the distributions of relative frequencies, we note that
each distribution is positively skewed.
It is proposed that a negative
binomial distribution of the form given in (4.3.5) will be appropriate
to represent the distribution of number of omissions for the combined
patient sample.
Based upon the combined patient sample data, the estimates of the
parameters $\alpha$ and $\beta$ are
$\hat{\alpha}$ = 0.6804
$\hat{\beta}$ = 1.0652.
By means of the probability generating function (4.3.7), we are able
to investigate the assumption that a negative binomial distribution
with parameter values $\hat{\alpha}$ and $\hat{\beta}$ above provides an adequate representation
of the distribution of number of omissions for the combined
patient sample data.
Table 7.2 presents the figures pertinent to
the investigation of this assumption.
Table 7.2.  The Fitting of a Negative Binomial Distribution to the
            Observed Distribution of the Number of Omissions for the
            Combined Patient Sample

Number of      Observed     Expected
Omissions      Frequency    Frequency    $\chi^2$
0              183          206.3        2.632
1              96           72.4         7.693
2              35           31.4         0.413
3              19           14.5         1.397
4              3            6.9          2.204
5              2            3.3          0.512
Total          338          334.8        14.851

$\Pr[\chi^2(3) \ge 14.851 \mid H_0] = 0.002$
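The expected frequencies in Table 7.2 can be approximated with a short Python sketch. It assumes that (4.3.5) is the gamma-mixed Poisson form of the negative binomial, with P(0) = (1 + beta)^(-alpha) and mean alpha * beta; that parameterization is our reading, chosen because it is consistent with the tabled figures, and the function names are hypothetical.

```python
import math


def negative_binomial_probs(alpha, beta, max_count):
    """P(0), ..., P(max_count) for a gamma-mixed Poisson with shape alpha, scale beta."""
    probs = [(1.0 + beta) ** (-alpha)]
    ratio = beta / (1.0 + beta)
    for k in range(max_count):
        # Recurrence: P(k + 1) = P(k) * (alpha + k) / (k + 1) * beta / (1 + beta)
        probs.append(probs[-1] * (alpha + k) / (k + 1) * ratio)
    return probs


def chi2_upper_tail_3df(x):
    """Upper-tail chi-square probability for 3 degrees of freedom (closed form)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


observed = [183, 96, 35, 19, 3, 2]       # combined sample, 0 through 5 omissions
alpha_hat, beta_hat = 0.6804, 1.0652
n = sum(observed)

expected = [n * p for p in negative_binomial_probs(alpha_hat, beta_hat, len(observed) - 1)]
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Six cells, two estimated parameters, one constraint: 3 degrees of freedom.
p_value = chi2_upper_tail_3df(chi_square)

print([round(e, 1) for e in expected], round(chi_square, 3), round(p_value, 4))
```

Most of the statistic comes from the first two cells, which is the discrepancy at zero and one omissions noted in the text.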
The value of our test criterion indicates that there is sufficient evidence to reject the null hypothesis that these data are
adequately represented by a negative binomial distribution.
We note
from inspection of Table 7.2 that there is a considerable discrepancy
between the observed and expected frequency of occurrence for both
zero and one omissions.
Having demonstrated that a negative binomial distribution does
not provide an adequate representation of our response variable, we
recall that a special case of the negative binomial distribution, i.e.,
the geometric distribution, is another alternative for the distributional form of the response variable.
Under the null hypothesis that
the distributional form of the response variable for the combined
patient sample is given by expression (4.4.1), the data pertinent
to the investigation of this hypothesis are presented in Table 7.3.
The sample estimate for the parameter $\beta$ in (4.4.1) is
$\hat{\beta}$ = 0.7248.
Table 7.3.  The Fitting of a Geometric Distribution to the Observed
            Distribution of the Number of Omissions for the Combined
            Patient Sample

Number of      Observed     Expected
Omissions      Frequency    Frequency    $\chi^2$
0              183          196.0        0.862
1              96           82.3         2.281
2              35           34.6         0.005
3              19           14.5         1.397
4              3            6.1          1.575
5              2            2.6          0.138
Total          338          336.1        6.258

$\Pr[\chi^2(4) \ge 6.258 \mid H_0] = 0.181$
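The geometric fit in Table 7.3 can be retraced with the Python sketch below. It assumes that (4.4.1) is the negative binomial with alpha = 1, so that P(b) = beta^b / (1 + beta)^(b + 1) and the moment estimate of beta is the sample mean; both points are inferred from the tabled figures rather than quoted from the text, and the names are illustrative.

```python
import math

observed = [183, 96, 35, 19, 3, 2]      # combined sample, 0 through 5 omissions
n = sum(observed)

# Moment estimate of beta: the sample mean number of omissions (assumption).
beta_hat = sum(k * f for k, f in enumerate(observed)) / n


def geometric_probs(beta, max_count):
    """P(0), ..., P(max_count) for the geometric distribution with mean beta."""
    p0 = 1.0 / (1.0 + beta)
    q = beta / (1.0 + beta)
    return [p0 * q ** k for k in range(max_count + 1)]


expected = [n * p for p in geometric_probs(beta_hat, len(observed) - 1)]
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Six cells, one estimated parameter, one constraint: 4 degrees of freedom.
# For 4 d.f. the chi-square upper tail equals exp(-x / 2) * (1 + x / 2).
p_value = math.exp(-chi_square / 2.0) * (1.0 + chi_square / 2.0)

print(round(beta_hat, 4))
print([round(e, 1) for e in expected], round(chi_square, 3), round(p_value, 3))
```

The same three steps, with the group's own mean in place of the combined mean, apply to the diabetes mellitus and congestive heart failure samples.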
Here, given the sample estimate of the parameter $\beta$, we observe that
a geometric distribution does provide an adequate representation of
the data.
To substantiate the merging of the two indicator case patient
samples, it will be demonstrated that the distributional form of the
number of omissions associated with each indicator condition sample
is also a geometric distribution.
For patients with diabetes mellitus, the sample estimate of the
parameter $\beta$ in (4.4.1) is
$\hat{\beta}$ = 0.6227.
Given this parameter estimate, Table 7.4 presents the observed frequencies of occurrence for the number of omissions experienced by
the diabetic patient sample and the associated expected frequencies of
occurrence based upon the assumption that the data can be represented
by a geometric distribution.
Table 7.4.  The Fitting of a Geometric Distribution to the Observed
            Distribution of the Number of Omissions for Patients
            with Diabetes Mellitus

Number of      Observed     Expected
Omissions      Frequency    Frequency    $\chi^2$
0              126          135.6        0.680
1              63           52.0         2.327
2              20           20.0         0.000
3              10           7.7          0.687
4              1            2.9          1.245
Total          220          218.2        4.938

On the basis of this test statistic, one would conclude that
there is insufficient evidence to reject the assumption that the observed
distribution of the number of omissions for the diabetic patient sample
is adequately represented by a geometric distribution.
From the sample data for patients with a primary diagnosis of
congestive heart failure, the estimate of the parameter $\beta$ in (4.4.1)
is
$\hat{\beta}$ = 0.9152.
The entries in Table 7.5 pertain to the investigation of the
assumption that the observed distribution of the number of omissions
for the congestive heart failure sample can also be represented by
the geometric distribution.
Table 7.5.  The Fitting of a Geometric Distribution to the Observed
            Distribution of the Number of Omissions for Patients
            with Congestive Heart Failure

Number of      Observed     Expected
Omissions      Frequency    Frequency    $\chi^2$
0              57           61.6         0.344
1              33           29.4         0.441
2              15           14.1         0.057
3              9            6.7          0.790
4              2            3.2          0.450
5              2            1.5          0.167
Total          118          116.5        2.249

$\Pr[\chi^2(4) \ge 2.249 \mid H_0] = 0.690$
Here we observe that the geometric distribution does an excellent
job of representing the observed distribution of number of omissions
for the congestive heart failure patient group.
Having demonstrated not only that the observed combined patient
sample distribution of number of omissions can be represented by a
geometric distribution of the form given in (4.4.1) but also that
this distribution provides an adequate representation for the distribution of number of omissions for both of the indicator case patient
samples, attention is now directed toward selecting those independent
variables which appear to be best suited for predicting number of
omissions.

7.2  The Initial Selection of Independent Variables to Predict the
     Number of Omissions
The initial set of independent variables which were considered
as candidates for inclusion into a model attempting to predict the
number of omissions experienced by a patient consisted of the same 31
variables listed in Table 6.5.
All have previously been discussed in
detail in earlier sections.
Our model is of the form
$$E(b) = \lambda = X\beta$$
where
b is an (n x 1) vector containing the number of omissions associated with the jth patient within the combined patient sample,
X is an [n x (k + 1)] matrix composed of those k independent
variables chosen as being useful in the prediction of number
of omissions plus a variable identically equal to one for
each patient,
and
$\beta$ is a [(k + 1) x 1] vector of regression coefficients
associated with the k + 1 columns of the X matrix.
Our task is to select a set of r $\le$ 31 independent variables which
demonstrate a strong relationship to the dependent variable.
Other preliminary investigations of these data indicated that, with respect to the number of omissions, the independent variable (a + b + c) was the most informative single variable within our data set. Also, in view of the fact that, as explained in Section 6.2, many other candidate independent variables are conditional upon some subset of (a + b + c), e.g., (a + b) or (a + c), it was decided that this independent variable should be included in any set of variables proposed as being useful in the prediction of number of omissions. The inclusion of any additional independent variables is, therefore, conditioned upon the presence of (a + b + c). As previously discussed in Section 6.2, the inclusion of the independent variable (a + b + c) reduces the number of allowable patients in the combined sample to 305, since we must bound (a + b + c) to be greater than or equal to two. Also, the category (a + b + c) = 9+ was formed from those patients with values of (a + b + c) ≥ 9. This was done to assure an adequate number of responses at the upper end of the range of values for (a + b + c).
Given these modifications in the structure of our data set, the
partial correlation coefficients for all of the candidate independent
variables and the number of omissions, given (a + b + c), were calculated.
Using a liberal selection criterion, three additional independent variables, each indicative of some aspect of the patients' medication regimens, were selected as being potentially useful in the prediction of number of omissions based upon these partial correlation coefficients.
In Table 7.6, the three additional independent variables
and their respective partial correlations with the response variable,
conditioned upon the presence of (a + b + c), are presented.
Table 7.6. Independent Variables in Addition to (a + b + c) Initially Selected to Predict Number of Omissions

Independent Variables Selected     Partial Correlation Coefficient | (a + b + c)
X10 | (a + b + c)                  -0.651
X5 | (a + b + c)                   -0.324
X6 | (a + b + c)                   -0.260

7.3
A Model to Predict Number of Omissions
Having initially chosen a set of r = 4 independent variables, i.e., (a + b + c), $X_{10}$, $X_5$, $X_6$, by proceeding along the guidelines established in Section 4.6, we shall attempt to select those $k \le 4$ independent variables which appear to best predict number of omissions.
As previously outlined, by fitting the $2^4 - 1 = 15$ models composed of all possible combinations of the r = 4 independent variables initially selected for consideration, it should be possible to begin to select those k variables which appear to be contributing the most toward the prediction of the response variable.
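The count of candidate models can be checked by direct enumeration. The following is only an illustrative sketch: the variable labels are plain strings chosen here for display, and the helper name `all_subsets` is hypothetical, not part of the original analysis.

```python
from itertools import combinations

# Labels for the r = 4 initially selected independent variables
# (string names are illustrative only).
candidates = ["(a+b+c)", "X10", "X5", "X6"]

def all_subsets(variables):
    """Return every non-empty subset of `variables`, smallest subsets first."""
    return [
        subset
        for size in range(1, len(variables) + 1)
        for subset in combinations(variables, size)
    ]

models = all_subsets(candidates)
print(len(models))  # number of candidate models to fit
for subset in models:
    print(" + ".join(subset))
```

Each printed line corresponds to one regression that would have to be fitted under the all-possible-subsets approach.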
Preliminary investigation by means of unweighted least squares
regression techniques revealed that, as was expected, the best single
predictor of number of omissions is the independent variable
(a + b + c).
In lieu of developing a single independent variable model to predict number of omissions, a model which included as many of the initially selected independent variables as possible would, hopefully, furnish additional insight into other factors which may also play a role in omission behavior. To this end, all possible multiple independent variable models composed of the r = 4 initially selected independent variables were investigated. Once again, the great majority of these multiple variable models consistently produced negative estimates of the parameter $\lambda$ which, as previously stated, designates them as being inappropriate for the problem as we have defined it. The only combination of independent variables which appeared to show some potential for predicting number of omissions was (a + b + c) and $X_{10}$.
In the unweighted least squares situation, if we begin with the model
$$\lambda_j = \beta_0 + \beta_1 (a + b + c)_j,$$
the addition of the independent variable $X_{10}$ to this model produces an increase in the amount of variation explained from 20.5% to 54.9%.
This increase in the value of $R^2$ was felt to be large enough to justify the inclusion of the independent variable $X_{10}$. Furthermore, the inclusion of an independent variable representing the number of prescription medications currently being consumed for which the patient knows the function was felt to have meaning in a clinical sense. Given these incentives, it was decided that there was sufficient justification to propose the model
$$\lambda_j = \beta_0 + \beta_1 (a + b + c)_j + \beta_2 X_{10j}$$
as a possible model for predicting number of omissions.
By means of the maximum likelihood estimation subroutine MAXLIK, an attempt was made to arrive at a set of final estimates for the regression coefficients in the above model. However, all attempts to accomplish this task resulted in failure. Subsequent searching produced the conclusion that the parameter values which would satisfy the likelihood equations, given the restriction that $\lambda > 0$, were outside of our restricted region.
If we begin, in the unweighted least squares case, with the model
$$\lambda_j = \beta_1 (a + b + c)_j,$$
the addition of the independent variable $X_{10}$ to this model produced an increase in $R^2$ from 51.1% to 74.4%. It should be noted that since the above model contains no intercept, the formula for calculating $R^2$ is not corrected for the overall mean. Thus, no comparison of the values of $R^2$ for the two single variable models containing (a + b + c) should be attempted. Also, in terms of predicting values of $\lambda$, the no-intercept form of the model appeared to do almost as well as the form of the model with an intercept term. Based upon these results, it is proposed that the model
$$\lambda_j = \beta_1^{*} (a + b + c)_j + \beta_2^{*} X_{10j} \qquad (7.3.1)$$
be considered as a possible candidate for predicting number of omissions and that we begin to investigate the properties of this model.
The unweighted least squares estimates of the regression coefficients for model (7.3.1) were
$$\hat{\beta}_1^{*}(0) = 0.5701 \quad \text{and} \quad \hat{\beta}_2^{*}(0) = -0.5177.$$
Using $\hat{\beta}_1^{*}(0)$ and $\hat{\beta}_2^{*}(0)$ as initial estimates, by means of the maximum likelihood estimation subroutine MAXLIK, the final estimates for the regression coefficients in model (7.3.1) were
$$\hat{\beta}_1^{*} = 0.6751 \quad \text{and} \quad \hat{\beta}_2^{*} = -0.6583.$$
A comparison of the initial unweighted least squares estimates of the regression coefficients in model (7.3.1) with the final weighted least squares estimates of $\beta_1^{*}$ and $\beta_2^{*}$ reveals some slight change in their magnitude.
For the maximum likelihood estimates of $\beta_1^{*}$ and $\beta_2^{*}$ in model (7.3.1) to be a proper solution to the system of normal equations (4.5.5) they must satisfy the necessary and sufficient conditions presented in Section 4.5. It can be demonstrated that the vector of estimated regression coefficients, $\hat{\boldsymbol{\beta}}^{*}$, does satisfy these necessary and sufficient conditions, leading us to conclude that the elements of $\hat{\boldsymbol{\beta}}^{*}$ for model (7.3.1) represent a local maximum on the likelihood surface.
Given this finding, the estimated asymptotic variance-covariance matrix, $\hat{V}^{*}$, associated with this model is
$$\hat{V}^{*} = \begin{bmatrix} 0.2248 \times 10^{-2} & -0.2323 \times 10^{-2} \\ -0.2323 \times 10^{-2} & 0.2436 \times 10^{-2} \end{bmatrix}.$$
Tables 7.7 through 7.14 present, for observed values of $X_{10}$ given (a + b + c), the predicted values of $\lambda$ resulting from the weighted least squares form of model (7.3.1).
Table 7.7. Predicted $\hat{\lambda}$'s Using the Maximum Likelihood Estimates of the Regression Coefficients for Model (7.3.1) When (a + b + c) = 2

X10    $\hat{\lambda}$    Mean Number of Omissions    Frequency of Occurrence
0      1.3502             0.0000                      1
1      0.6919             1.0000                      14
2      0.0336             0.0000                      43
Table 7.8. Predicted $\hat{\lambda}$'s Using the Maximum Likelihood Estimates of the Regression Coefficients for Model (7.3.1) When (a + b + c) = 3

X10    $\hat{\lambda}$    Mean Number of Omissions    Frequency of Occurrence
0      2.0253             0.0000                      1
1      1.3670             2.0000                      3
2      0.7087             0.8888                      18
3      0.0504             0.0000                      33
Table 7.9. Predicted $\hat{\lambda}$'s Using the Maximum Likelihood Estimates of the Regression Coefficients for Model (7.3.1) When (a + b + c) = 4

X10    $\hat{\lambda}$    Mean Number of Omissions    Frequency of Occurrence
1      2.0421             2.0000                      4
2      1.3838             1.1667                      6
3      0.7255             0.6818                      22
4      0.0672             0.0000                      15
Table 7.10. Predicted $\hat{\lambda}$'s Using the Maximum Likelihood Estimates of the Regression Coefficients for Model (7.3.1) When (a + b + c) = 5

X10    $\hat{\lambda}$    Mean Number of Omissions    Frequency of Occurrence
0      3.3754             2.0000                      1
1      2.7171             1.0000                      3
2      2.0589             3.0000                      2
3      1.4006             1.7273                      11
4      0.7423             0.8571                      14
5      0.0840             0.0000                      12
Table 7.11. Predicted $\hat{\lambda}$'s Using the Maximum Likelihood Estimates of the Regression Coefficients for Model (7.3.1) When (a + b + c) = 6

X10    $\hat{\lambda}$    Mean Number of Omissions    Frequency of Occurrence
2      2.7339             4.0000                      1
3      2.0757             2.0000                      9
4      1.4174             1.5000                      10
5      0.7591             0.8571                      7
6      0.1008             0.0000                      7
Table 7.12. Predicted $\hat{\lambda}$'s Using the Maximum Likelihood Estimates of the Regression Coefficients for Model (7.3.1) When (a + b + c) = 7

X10    $\hat{\lambda}$    Mean Number of Omissions    Frequency of Occurrence
0      4.7256             2.0000                      1
2      3.4090             0.0000                      1
3      2.7507             2.0000                      3
4      2.0925             2.5000                      2
5      1.4342             1.0000                      1
6      0.7759             1.0000                      8
7      0.1176             0.0000                      6
Table 7.13. Predicted $\hat{\lambda}$'s Using the Maximum Likelihood Estimates of the Regression Coefficients for Model (7.3.1) When (a + b + c) = 8

X10    $\hat{\lambda}$    Mean Number of Omissions    Frequency of Occurrence
3      3.4258             5.0000                      1
4      2.7675             2.5132                      3
5      2.1093             1.9973                      4
6      1.4510             1.4815                      3
7      0.7927             0.9656                      4
8      0.1344             0.4498                      3
Table 7.14. Predicted $\hat{\lambda}$'s Using the Maximum Likelihood Estimates of the Regression Coefficients for Model (7.3.1) When (a + b + c) = 9+

$\hat{\lambda}$    Mean Number of Omissions    Frequency of Occurrence
4.1009             5.0000                      1
3.4426             3.0000                      1
2.7843             2.5000                      4
2.1261             2.0000                      4
1.4678             2.3333                      3
0.8095             1.2857                      7
0.1512             0.8750                      8
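The entries of Tables 7.7 through 7.14 can be reproduced, to within rounding of the published coefficients, from the two estimated regression coefficients of model (7.3.1). The sketch below is an illustration only: the function name `predict_lambda` is hypothetical, the second coefficient is taken as negative (which is what the tabulated values imply), and the category 9+ is treated as the value 9 on the assumption that this is how it entered the model.

```python
# Published four-decimal estimates for model (7.3.1); the thesis presumably
# carried more digits, so agreement with the tables is only approximate.
BETA1_HAT = 0.6751    # coefficient of (a + b + c)
BETA2_HAT = -0.6583   # coefficient of X10 (negative sign implied by the tables)

def predict_lambda(total_drugs, known_function):
    """Predicted mean number of omissions for one (a + b + c), X10 pair."""
    return BETA1_HAT * total_drugs + BETA2_HAT * known_function

# A few (a + b + c, X10, tabulated value) triples copied from the tables.
checks = [
    (2, 0, 1.3502),
    (3, 2, 0.7087),
    (4, 4, 0.0672),
    (5, 0, 3.3754),
    (8, 3, 3.4258),
    (9, 9, 0.1512),   # 9+ category treated as 9 (assumption)
]
for total, known, tabulated in checks:
    print(total, known, round(predict_lambda(total, known), 4), tabulated)
```

Differences of one unit in the fourth decimal place are expected, since the coefficients printed in the text are themselves rounded.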
Having found appropriate $\beta$ weights for the variables in model (7.3.1), we can produce the appropriate sums of squares and begin to test hypotheses concerning these estimated regression coefficients. For the regression of number of omissions upon (a + b + c) and $X_{10}$, the appropriate "weighted" sums of squares are presented in Table 7.15. Once again, these figures were developed by performing an unweighted least squares regression analysis on the weighted, or transformed, original data. The vector of estimated regression coefficients, $\hat{\boldsymbol{\beta}}_w$, produced by the unweighted least squares regression on the transformed data agreed very closely (to six decimal places) with the maximum likelihood estimates in $\hat{\boldsymbol{\beta}}^{*}$.
Table 7.15. Analysis of Variance for the Weighted Regression of Number of Omissions Upon (a + b + c) and X10

Source           d.f.    S.S.
Regression       2       246.0000
Residual         303     177.4417
Total (unc.)     305     423.4417
To test the null hypothesis that
$$H_0: \boldsymbol{\beta}^{*} = \mathbf{0}$$
for model (7.3.1) we would use the H matrix $H = I_2$. For this hypothesis, the appropriate test criterion would be
$$\Pr[\chi^2(2) \ge 246.0000 \mid H_0] < 0.001$$
which would lead us to conclude that the estimates of the regression coefficients for our model are not jointly equal to zero.
The test of the null hypothesis
$$H_0: \beta_1^{*} = 0$$
can be accomplished by letting
$$H = [1 \quad 0].$$
The test criterion for deciding if the estimated $\beta$ weight for the independent variable (a + b + c) is equal to zero is
[expression illegible in source]
Our conclusion would be that we have sufficient evidence to believe that the estimate of $\beta_1^{*}$ is quite significantly different from zero. We note that $\hat{\beta}_1^{*}$ is positive in sign, indicating a direct relationship between the number of omissions for a patient and the total number of prescription drugs involved between the patient and his physician.
Turning our interest to the estimated $\beta$ weight for the independent variable $X_{10}$, by letting
$$H = [0 \quad 1]$$
we can test the null hypothesis that
$$H_0: \beta_2^{*} = 0.$$
By observing the test criterion
$$\Pr[\chi^2(1) \ge 184.7882 \mid H_0] < 0.001$$
we would conclude that the estimated regression coefficient for $X_{10}$ given (a + b + c) is also significantly different from zero. However, in this case we note that $\hat{\beta}_2^{*}$ is negative, indicating that, for given values of (a + b + c), the number of omissions experienced by a patient increases as the number of prescription drugs which he is currently consuming for which he knows the function decreases.
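The tail probabilities quoted for these Chi-square criteria can be checked with closed-form expressions that exist for small degrees of freedom. The following is a minimal sketch using only the standard library; the function name `chi2_upper_tail` is hypothetical, and only the degrees of freedom 1, 2 and 4 are handled.

```python
import math

def chi2_upper_tail(x, df):
    """Pr[chi-square(df) >= x] via closed forms valid for df = 1, 2 or 4."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        # chi-square(1) is the square of a standard normal deviate
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 4:
        return math.exp(-x / 2.0) * (1.0 + x / 2.0)
    raise ValueError("this sketch handles df = 1, 2 or 4 only")

joint_p = chi2_upper_tail(246.0000, 2)    # joint test of both coefficients
x10_p = chi2_upper_tail(184.7882, 1)      # test of the X10 coefficient
print(joint_p < 0.001, x10_p < 0.001)
```

Both probabilities are far below 0.001, in agreement with the bounds stated in the text.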
To summarize our findings thus far relating to the model (7.3.1),
we have produced no evidence to suggest that this model does not
provide an adequate representation for the variation observed in our
data.
Another criterion for assessing the adequacy of the proposed model (7.3.1) is to consider the proportion of the total variability in our data which is explained by our model in comparison with the proportion of total variability which a two-variable model, with variable levels given by (a + b + c) and $X_{10}$, could possibly explain. Since model (7.3.1) is regressed through the origin, there is no need to correct for the overall mean, thus altering the expression for $R^2$ to the form
$$R^2 = \frac{\sum_{i=1}^{n} \hat{y}_i^2}{\sum_{i=1}^{n} y_i^2}.$$
Looking at (7.3.1), we see that the value of $R^2$, presented as a percentage, for this model is
$$R^2 = \frac{246.0000}{423.4417} \times 100 = 58.09\%.$$
An interpretation of this figure is that model (7.3.1) accounts for approximately 58 percent of the total variation present in our data.
The maximum possible amount of variability which could be explained by the regression of b, the number of omissions, upon (a + b + c) and $X_{10}$ is based upon the determination that the sum of squares for pure error inherent to this data space is
$$SS(PE) = 81.7740.$$
This figure represents that portion of the total sum of squares which cannot be explained by a model with independent variable levels equal to those observed for (a + b + c) and $X_{10}$. Therefore, the maximum possible value for $R^2$ for model (7.3.1) is
$$R^2_{MAX} = \frac{423.4417 - 81.7740}{423.4417} = 80.69\%.$$
A comparison of the value of R2 actually achieved by model
(7.3.1) and the maximum value of R2 which could possibly be achieved
by a model of the form represented by (7.3.1) reveals that the
proposed model for predicting number of omissions does indeed,
explain a sizable portion of the total explainable variation in our
data.
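The arithmetic behind this comparison can be laid out in a few lines. This is only a sketch using the sums of squares quoted in the text; the variable names are illustrative, and the final ratio (achieved $R^2$ as a share of the maximum attainable $R^2$) is a derived quantity that the thesis itself does not report.

```python
# Sums of squares as quoted in the text (Table 7.15 and SS(PE)).
ss_regression = 246.0000
ss_total_uncorrected = 423.4417
ss_pure_error = 81.7740

# R^2 for a regression through the origin (no correction for the mean).
r2 = ss_regression / ss_total_uncorrected

# Largest R^2 any model using these variable levels could attain.
r2_max = (ss_total_uncorrected - ss_pure_error) / ss_total_uncorrected

# Derived here, not reported in the thesis: share of explainable variation.
share_of_explainable = r2 / r2_max

print(f"R^2 = {100 * r2:.2f}%")
print(f"R^2 max = {100 * r2_max:.2f}%")
print(f"share of explainable variation = {100 * share_of_explainable:.1f}%")
```

The first printed figure may differ from the 58.09% in the text in the last digit, depending on whether the value is rounded or truncated.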
In view of the fact that no substantial evidence against the suitability of model (7.3.1) has been generated to this point, and also due to the fact that this model does include another clinically meaningful facet to the prediction of number of omissions, it is proposed that model (7.3.1) be accepted as an adequate predictor of number of omissions.
7.3
An Investigation of the Underlying Assumptions for the Model
Chosen to Predict Number of Omissions
As stated in Section 4.3, one of the basic underlying distributional assumptions of Poisson regression analysis is that the probability of an individual experiencing y medication-taking errors, given a fixed value of $\lambda$, is given by
$$p(y \mid \lambda) = \frac{e^{-\lambda} \lambda^{y}}{y!}$$
for y = 0, 1, 2, ..., when $\lambda > 0$. When the type of medication-taking error is defined to be an omission, we must demonstrate that, for a fixed value of $\lambda > 0$, the probability of an individual experiencing b = 0, 1, 2, ..., omissions is given by
$$p(b \mid \lambda) = \frac{e^{-\lambda} \lambda^{b}}{b!}.$$
For the sample of 305 indicator case patients, a problem arises in that for each predicted value of $\lambda$, i.e., a unique combination of values for the independent variables (a + b + c) and $X_{10}$, the sample size is not always sufficiently large to realistically test the aforementioned assumption. In light of this restriction, it is proposed that we demonstrate the validity of the assumption (4.3.2) for only those values of $\hat{\lambda}$ which have a sufficiently large number of patients. In the series of Tables 7.16 - 7.29, we present, for selected values of $\hat{\lambda}$, the observed frequencies and the expected frequencies arising from expression (4.3.2) when the random variable is number of omissions.
Table 7.16. Observed and Expected Frequencies Hypothesizing a Poisson Distribution with $\hat{\lambda}$ = 0.7087, or (a + b + c) = 3 and X10 = 2

Number of Omissions    Observed Frequency    Expected Frequency    $\chi^2$
0                      2                     8.9                   5.3494
1                      16                    6.3                   14.9349
2                      0                     2.2                   2.2000
Total                  18                    17.4                  22.4843
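The expected frequencies and Chi-square contributions in Table 7.16 follow directly from the Poisson probabilities. The sketch below is illustrative only: the function names are hypothetical, and the expected frequencies are rounded to one decimal place before the statistic is formed, on the assumption (suggested by the tabulated contributions) that this is how the published figures were obtained.

```python
import math

def poisson_expected(lam, n, max_count):
    """Expected frequencies n * P(Y = k) for k = 0, ..., max_count."""
    return [
        n * math.exp(-lam) * lam ** k / math.factorial(k)
        for k in range(max_count + 1)
    ]

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over the listed categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Values taken from Table 7.16: lambda-hat = 0.7087, 18 patients.
observed = [2, 16, 0]
expected = [round(e, 1) for e in poisson_expected(0.7087, 18, 2)]
statistic = chi_square(observed, expected)

print(expected)
print(round(statistic, 3))
```

The same two helper functions could be applied to the other tables in this series by substituting the appropriate $\hat{\lambda}$, sample size, and observed counts.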
Table 7.17. Observed and Expected Frequencies Hypothesizing a Poisson Distribution with $\hat{\lambda}$ = 0.0504, or (a + b + c) = 3 and X10 = 3

Number of Omissions    Observed Frequency    Expected Frequency    $\chi^2$
0                      33                    31.4                  0.0815
1+                     0                     1.6                   1.6000
Total                  33                    33.0                  1.6815
Table 7.18. Observed and Expected Frequencies Hypothesizing a Poisson Distribution with $\hat{\lambda}$ = 0.7255, or (a + b + c) = 4 and X10 = 3

Number of Omissions    Observed Frequency    Expected Frequency    $\chi^2$
0                      7                     10.7                  1.2794
1                      15                    7.7                   6.9208
2                      0                     2.8                   2.8000
3                      0                     0.7                   0.7000
Total                  22                    21.9                  11.7002

$$\Pr[\chi^2(2) \ge 11.7002 \mid H_0] < 0.005$$
Table 7.19. Observed and Expected Frequencies Hypothesizing a Poisson Distribution with $\hat{\lambda}$ = 0.0672, or (a + b + c) = 4 and X10 = 4

Number of Omissions    Observed Frequency    Expected Frequency    $\chi^2$
0                      15                    14.0                  0.0714
1+                     0                     0.9                   0.9000
Total                  15                    14.9                  0.9714
Table 7.20. Observed and Expected Frequencies Hypothesizing a Poisson Distribution with $\hat{\lambda}$ = 1.4006, or (a + b + c) = 5 and X10 = 3

Number of Omissions    Observed Frequency    Expected Frequency    $\chi^2$
0                      1                     2.7                   1.0704
1                      1                     3.8                   2.0631
2                      9                     2.7                   14.7000
3                      0                     1.2                   1.2000
Total                  11                    10.4                  19.0335

$$\Pr[\chi^2(2) \ge 19.0335 \mid H_0] < 0.001$$
Table 7.21. Observed and Expected Frequencies Hypothesizing a Poisson Distribution with $\hat{\lambda}$ = 0.7423, or (a + b + c) = 5 and X10 = 4

Number of Omissions    Observed Frequency    Expected Frequency    $\chi^2$
0                      2                     6.7                   3.2970
1                      12                    4.9                   10.2877
2                      0                     1.8                   1.8000
3                      0                     0.5                   0.5000
Total                  14                    13.9                  15.8847
Table 7.22. Observed and Expected Frequencies Hypothesizing a Poisson Distribution with $\hat{\lambda}$ = 0.0840, or (a + b + c) = 5 and X10 = 5

Number of Omissions    Observed Frequency    Expected Frequency    $\chi^2$
0                      12                    11.0                  0.0909
1+                     0                     0.9                   0.9000
Total                  12                    11.9                  0.9909

Table 7.23. Observed and Expected Frequencies Hypothesizing a Poisson Distribution with $\hat{\lambda}$ = 1.4174, or (a + b + c) = 6 and X10 = 4

Number of Omissions    Observed Frequency    Expected Frequency    $\chi^2$
0                      1                     2.4                   0.8167
1                      3                     3.4                   0.0470
2+                     6                     3.6                   1.6000
Total                  10                    9.4                   2.4637

$$\Pr[\chi^2(1) \ge 2.4637 \mid H_0] < 0.150$$
Table 7.24. Observed and Expected Frequencies Hypothesizing a Poisson Distribution with $\hat{\lambda}$ = 0.7591, or (a + b + c) = 6 and X10 = 5

Number of Omissions    Observed Frequency    Expected Frequency    $\chi^2$
0                      1                     3.3                   1.6030
1                      6                     2.5                   4.9000
2+                     0                     1.1                   1.1000
Total                  7                     6.9                   7.6030
Table 7.25. Observed and Expected Frequencies Hypothesizing a Poisson Distribution with $\hat{\lambda}$ = 0.1008, or (a + b + c) = 6 and X10 = 6

Number of Omissions    Observed Frequency    Expected Frequency    $\chi^2$
0                      7                     6.3                   0.0778
1+                     0                     0.6                   0.6000
Total                  7                     6.9                   0.6778
Table 7.26. Observed and Expected Frequencies Hypothesizing a Poisson Distribution with $\hat{\lambda}$ = 0.7759, or (a + b + c) = 7 and X10 = 6

Number of Omissions    Observed Frequency    Expected Frequency    $\chi^2$
0                      0                     3.7                   3.7000
1+                     8                     4.3                   3.1837
Total                  8                     8.0                   6.8837
Table 7.27. Observed and Expected Frequencies Hypothesizing a Poisson Distribution with $\hat{\lambda}$ = 0.1176, or (a + b + c) = 7 and X10 = 7

Number of Omissions    Observed Frequency    Expected Frequency    $\chi^2$
0                      6                     5.3                   0.0924
1+                     0                     0.6                   0.6000
Total                  6                     5.9                   0.6924
Table 7.28. Observed and Expected Frequencies Hypothesizing a Poisson Distribution with $\hat{\lambda}$ = 0.8095, or (a + b + c) = 9+ and X10 = 8

Number of Omissions    Observed Frequency    Expected Frequency    $\chi^2$
0                      2                     3.1                   0.3903
1                      3                     2.5                   0.1000
2+                     2                     1.4                   0.2571
Total                  7                     7.0                   0.7474
Table 7.29. Observed and Expected Frequencies Hypothesizing a Poisson Distribution with $\hat{\lambda}$ = 0.1512, or (a + b + c) = 9+ and X10 = 9+

Number of Omissions    Observed Frequency    Expected Frequency    $\chi^2$
0                      4                     6.9                   1.2188
1+                     4                     1.1                   7.6454
Total                  8                     8.0                   8.8642
For Tables 7.16 through 7.29 we note that we can only perform a test of significance on seven of the fourteen tables presented. The remaining seven tables are meant to illustrate the "goodness-of-fit" of the expected frequencies to their respective observed frequencies even though we cannot determine their statistical "goodness-of-fit". Based upon expression (6.5.1), if we choose $\alpha^{*} = 0.05$, each of the seven tests should be performed at the $\alpha = 0.007$ level of significance. If we first consider only those seven tables where it is possible to determine their statistical goodness-of-fit, we see that we accept the null hypothesis of a Poisson distribution representing the probability of an individual experiencing b = 0, 1, 2, ..., omissions, given a specified value of $\lambda > 0$, about as many times as we reject this null hypothesis. By considering all 14 tables, it would appear that the null hypothesis would be accepted more times than it would be rejected. Although certain weaknesses in our data prevent a rigorous investigation of the tenability of the assumption that the conditional distribution of number of omissions given a specified "proneness" or value of $\lambda$ is a Poisson distribution, it is proposed that we accept this proposition based upon the results of our attempts to demonstrate this assumption.
It also remains to be shown that the form of the distribution of the parameter $\lambda$ is a gamma distribution. For the response variable number of omissions, we initially assumed that rather than a gamma distribution, $\lambda$ could be represented as a special case of the gamma distribution, namely, an exponential distribution of the form
$$g(\lambda) = \frac{1}{\beta} e^{-\lambda/\beta} \qquad (7.3.2)$$
where $\beta > 0$. The maximum likelihood estimator for the parameter $\beta$ is
$$\hat{\beta} = \frac{\sum_{j=1}^{n} \hat{\lambda}_j}{n}.$$
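As a worked illustration of why the exponential assumption leads to a geometric distribution for the number of omissions, the Poisson probability can be integrated against the density (7.3.2). The derivation below is a standard result written in the notation of this section; the exact form in which the thesis states expression (4.4.1) is not reproduced here, so the final parameterization should be read as an assumption.

```latex
\begin{align*}
p(b) &= \int_0^{\infty} \frac{e^{-\lambda}\lambda^{b}}{b!}\,
        \frac{1}{\beta}\, e^{-\lambda/\beta}\, d\lambda
      = \frac{1}{\beta\, b!} \int_0^{\infty} \lambda^{b}\,
        e^{-\lambda (1+\beta)/\beta}\, d\lambda \\
     &= \frac{1}{\beta\, b!}\; b!\,
        \left(\frac{\beta}{1+\beta}\right)^{b+1}
      = \frac{1}{1+\beta}\left(\frac{\beta}{1+\beta}\right)^{b},
        \qquad b = 0, 1, 2, \ldots
\end{align*}
```

This is a geometric distribution with mean $\beta$, which is consistent with estimating $\beta$ by an average as in the expression above.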
In Section 6.5, a method was proposed for demonstrating the validity of the distributional assumption for the parameter $\lambda$. To briefly reiterate this proposed solution, given the model
$$\hat{\lambda}_j = \hat{\beta}_1^{*} (a + b + c)_j + \hat{\beta}_2^{*} X_{10j}$$
for j = 1, 2, ..., n subjects, we must, in some manner, group the $\hat{\lambda}_j$'s into i = 1, 2, ..., u distinct categories. For the values of $\hat{\lambda}_j$ contained within the ith category, let the maximum value, i.e., the upper boundary point, be $\hat{\lambda}_i$. Associated with each $\hat{\lambda}_i$ are the $n_i$ subjects contained between $\hat{\lambda}_i$ and $\hat{\lambda}_{i+1}$. These $n_i$ subjects are the observed frequencies of occurrence for the i = 1, 2, ..., u categories. Let these observed frequencies be denoted by $O_i$. Given the values of $\hat{\lambda}_i$, we may then determine the $(mp)_i$, or midpoints of the intervals between $\hat{\lambda}_i$ and $\hat{\lambda}_{i+1}$. If we perform the integration
$$\int_{(mp)_{i-1}}^{(mp)_i} g(\lambda)\, d\lambda,$$
taking $(mp)_0 = 0$ and $(mp)_u = +\infty$ as in Table 7.30, it is suggested that this area will represent that portion of the area under $g(\lambda)$ associated with $\hat{\lambda}_i$. If this is done for all of the i = 1, 2, ..., u intervals, the entire area under $g(\lambda)$ can be assigned. The expected frequency of occurrence associated with each $\hat{\lambda}_i$ will be $E_i = g(\hat{\lambda}_i) \times n$. Given the i = 1, 2, ..., u categories, the test statistic
$$\chi^2 = \sum_{i=1}^{u} \frac{(O_i - E_i)^2}{E_i}$$
will be distributed as a Chi-square with u - (1 + 1) degrees of freedom.
For the data concerning number of omissions, a natural grouping of the $\hat{\lambda}_j$'s appeared to be based upon the number of drugs which the patient was currently involved with for which he did not understand the function. This value is the difference between (a + b + c), the total number of drugs involved between the patient and physician, and $X_{10}$, the number of drugs which the patient is currently consuming for which he understands the function. For example, if (a + b + c) = 3 and $X_{10}$ = 3, the value of (a + b + c) - $X_{10}$ = 0 and the associated value of $\hat{\lambda}_j$ would be assigned to the i = 1 category. Using this condition, i = 1, 2, ..., 6 categories were fashioned for the range of $\hat{\lambda}_j$. Where the difference between (a + b + c) and $X_{10}$ was greater than or equal to five, a category i = 6+ was formed to assure a reasonable sample size for this category. Table 7.30 presents the predicted values of $\hat{\lambda}_i$ for model (7.3.1), the boundaries of the intervals about these values of $\hat{\lambda}_i$, and their associated observed frequencies.
Table 7.30. Intervals Containing the Predicted Values of the Parameter $\lambda$ and Their Respective Observed Frequencies

i     $\hat{\lambda}_i$'s for (7.3.1)    Intervals About $\hat{\lambda}_i$    Observed Frequency
1     0.1512                             0.0000 - 0.4216                      127
2     0.8095                             0.4216 - 1.0799                      94
3     1.4678                             1.0799 - 1.7466                      38
4     2.1261                             1.7466 - 2.4216                      26
5     2.7843                             2.4216 - 3.0799                      14
6+    4.7256                             3.0799 - +∞                          6
From the sample data the estimate of the parameter $\beta$ in (7.3.2) was determined to be
$$\hat{\beta} = 0.8066.$$
In Table 7.31, we present the information pertinent to the investigation of the assumption that the distribution of the parameter $\lambda$ for model (7.3.1) can be represented by an exponential distribution of the form given in expression (7.3.2) with a parameter value of $\hat{\beta} = 0.8066$.
Table 7.31. Observed and Expected Frequencies Hypothesizing an Exponential Distribution

$\hat{\lambda}_i$'s for (7.3.1)    Observed Frequency    Expected Frequency    $\chi^2$
0.1512                             127                   124.2                 0.0631
0.8095                             94                    100.9                 0.4718
1.4678                             38                    45.0                  1.0889
2.1261                             26                    19.8                  1.9414
2.7843                             14                    8.4                   3.7333
4.7256                             6                     6.7                   0.0731
Total                              305                   305.0                 7.3716

$$\Pr[\chi^2(4) \ge 7.3716 \mid H_0] < 0.20$$
Based upon this test criterion, we would accept the null hypothesis that the distributional form of the parameter $\lambda$ for model (7.3.1) can be represented by an exponential distribution.
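The expected frequencies of Table 7.31 can be approximated by integrating the exponential density (7.3.2) over the intervals of Table 7.30. The sketch below is an illustration under stated assumptions: the function name `exponential_expected` is hypothetical, the four-decimal value of the estimated parameter is used, and small discrepancies from the published figures (which presumably used more digits) are to be expected.

```python
import math

def exponential_expected(boundaries, beta, n):
    """Expected counts under g(lam) = (1/beta) * exp(-lam/beta).

    `boundaries` holds the lower limit of each interval; the last
    interval is taken to extend to +infinity.
    """
    cdf = [1.0 - math.exp(-b / beta) for b in boundaries] + [1.0]
    return [n * (upper - lower) for lower, upper in zip(cdf, cdf[1:])]

# Interval limits and observed counts copied from Tables 7.30 and 7.31.
boundaries = [0.0, 0.4216, 1.0799, 1.7466, 2.4216, 3.0799]
observed = [127, 94, 38, 26, 14, 6]
published = [124.2, 100.9, 45.0, 19.8, 8.4, 6.7]

expected = exponential_expected(boundaries, beta=0.8066, n=305)
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

for e, p in zip(expected, published):
    print(round(e, 1), p)
print(round(statistic, 2))
```

The statistic computed this way falls close to, though not exactly on, the published value of 7.3716, because the published contributions appear to be based on expected frequencies rounded to one decimal place.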
It is felt that the results of Sections 7.2 and 7.3 tend to
substantiate both the tenability of model (7.3.1) as a predictor
of number of omissions as well as the validity of the distributional
assumptions underlying this model.
7.4
Summary of Chapter VII
In this chapter we have been able to identify two aspects of
patient medication regimens which appear to be useful in the prediction
of number of omissions.
The number of prescription medications currently being prescribed which the patient is not taking was shown to vary directly with the total number of prescription medications involved between the physician and patient. This result is generally supported by the literature. In addition, given the total number of drugs involved between the physician and patient, it was demonstrated that as the number of prescription medications which a patient is currently consuming for which he knows or understands their function decreases, the number of prescription medications he has been prescribed but which he is not taking increases. This finding serves to reaffirm the conviction that effective communication between the patient and physician is of the essence in assuring that individuals receive quality primary medical care. This result also reinforces the notion that there is a definite need for the physician to invest some amount of time with each patient in an effort to educate them in the function of the medications which he is prescribing for their condition. This does not, however, release the patient from his responsibility to assist the physician in his task.
CHAPTER VIII
SUMMARY AND CONCLUSIONS
8.1
Summary
Chapter I of this dissertation began with a statement expressing
the desire of the American Academy of Family Physicians and the
Department of Epidemiology at the University of North Carolina to
design a research study with the intent of assessing primary medical
care.
In connection with this intent, the concept of the
"indicator case" model was developed and presented.
Eight Elements
for Assessment were selected as being indicative of the effectiveness
of the medical services being provided and of these, physician-patient
communication and patient compliance were chosen for primary consideration in this dissertation.
An observable behavior which was felt to
be indicative of both of these chosen elements was medication-taking
behavior.
Having selected this response, the major goal of this
dissertation was set forth as being an investigation aimed at identifying factors which would prove to be useful in the prediction of
non-compliant medication-taking behavior.
Chapter II dealt with a brief description of the study site
followed by a discussion of physician and patient enrollment procedures.
Data collection methods were described in some detail along with a number of problems which could possibly affect the validity or completeness of the data. Later sections of this chapter presented the response variables of interest as well as the independent variables from which predictive factors would possibly be chosen.
The relevant literature in the field of non-compliant medication-taking behavior was reviewed and related to the design of the AAFP-UNC
study in Chapter III.
Chapter IV was a presentation of the general theory underlying
the method of Multiple Regression of a Poisson Process.
Beginning
with a demonstration that the form of the distribution of a response
variable can be described by a negative binomial distribution arising
as the result of the compounding of a Poisson process with a gamma
distribution, methods for estimating the parameter A, when expressed
as a linear combination of k properly chosen independent variables,
were also discussed along with some of the statistical properties of
these estimates.
A final section dealt with a proposed methodology
for selecting independent variables from some larger set of independent variables.
Selected characteristics of the final patient sample were
presented in Chapter V.
In Chapter VI results derived from attempts to predict the number
of commissions experienced by a patient, i.e., the number of drugs
which the patient is currently taking of which his physician is
unaware, were presented.
Initial independent variable selection revealed that, for these data, the most suggestive independent variables appeared to be those relating to the number of drugs involved between the physician and patient as well as to the scheduling complexity of these medications. The final model chosen to predict number of commissions was a single variable model containing (a + b + c), the total number of drugs involved between the physician-patient pair. This relationship was positive in nature but non-linear in that the greater the total number of drugs involved between the physician-patient pair, the ever greater is the proneness of the patient to consume more medications of which his physician is unaware.
The major implication of this result is possibly that, given an ever
increasing number of medications with which the patient must become
involved, the likelihood of his consuming contraindicated medications
is also heightened.
The results obtained from attempts to select factors useful in
the prediction of number of omissions were presented in Chapter VII.
It is recalled that omissions are those drugs which are currently
being prescribed by a physician but which are not being taken by the
patient.
Those independent variables which appeared to be most
highly associated with this response variable were, once again,
factors relating to various aspects of the patient's medication
regimen.
Those factors selected to predict number of omissions were
1) (a + b + c), and 2) the number of drugs currently being taken for
which the patient knows or understands their function.
Given these
two aspects of a patient's medication regimen, it can be demonstrated
that, for these data, the number of prescription medications currently
being prescribed which the patient is not taking varies directly with
the total number of drugs involved between the physician and the
patient.
Also, given (a + b + c), the number of prescription medications which a patient is currently consuming for which he knows or understands their function varies inversely with the number of omissions which he experiences.
That is to say, given (a + b + c),
the fewer the number of drugs for which the patient knows their
function, the higher is the likelihood of his not taking all of the
medications currently prescribed for him by his physician.
This
result effectively confirms the notion that there must be a reciprocal
and effectual level of communication between the physician and the
patient to insure the quality of the medical care received.
In summary, if any one statement concerning factors which are
useful in the prediction or identification of the non-compliant
patient can be made, it would possibly be that behavior of this type
is relatively difficult to predict in a consistent fashion.
This is
generally supported by the current literature on non-compliance.
The one factor which consistently appeared to be related to non-compliance
was the total number of medications involved between the physician
and patient.
For these data, increased non-compliance was positively
related to an increase in the total number of medications with which
the patient is involved.
8.2
Recommendations
Although the results produced by this dissertation appear to be
less than definitive regarding those specific factors which may be
employed to predict non-compliant medication-taking behavior, I do
believe that they are indicative of areas requiring further research
and refinement.
From the large set of independent variables considered initially, those variables which were related to various aspects of the medications or medication regimens appeared to display the most potential for predicting non-compliant behavior.
For the present study,
the major emphasis with respect to variables in this category was to
simply define aspects of the medications and medication regimens which
were logical and measurable.
In future efforts, variables of this
type should receive more emphasis and refinement.
One recommendation would be to define sets of independent variables which are specific to the source or orientation of the response.
For example, number of omissions is a "physician-oriented" response
since the patient did not contribute any information concerning these
drugs.
Conversely, number of commissions is a "patient-oriented"
response due to the physician being unaware of the presence of these
drugs.
Variables which are proposed as being useful in the prediction
of physician-oriented responses should only be based upon the (a + b)
drugs, i.e., those drugs which are currently being prescribed.
Under
this condition, physician-oriented information is available for each
medication comprising the independent variables.
When variables based
upon (a + c) drugs, those drugs currently being taken, are used as the
basis for the prediction of physician-oriented responses, one has
physician-oriented information for only a portion of the medications
comprising these variables.
This situation can sometimes detract from
the strength of a result because it is not possible to clearly state
whether the result is due to the contribution of the physician-oriented
drugs, i.e., the "a" drugs, or the contribution of the patient-oriented
drugs, i.e., the "c" drugs.
Another variation of the above recommendation would be to base
all independent variables upon only the "a" drugs, or those upon which
information from both the physician and patient is present.
A possible
problem with this approach may be that a sizable number of prescription
medications would have to be discarded due to the lack of information
from one source.
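The drug groupings used in the recommendations above can be sketched as simple set operations. The mapping is inferred from the usage in this section: "a" drugs are both prescribed and taken, "b" drugs are prescribed but not taken (omissions), and "c" drugs are taken without the physician's knowledge (commissions). The function name and drug labels are hypothetical.

```python
def classify_drugs(prescribed, taken):
    """Split drugs into the a, b, and c groups from the two reporting sources."""
    prescribed, taken = set(prescribed), set(taken)
    return {
        "a": prescribed & taken,   # information from both physician and patient
        "b": prescribed - taken,   # physician-oriented only: omissions
        "c": taken - prescribed,   # patient-oriented only: commissions
    }

# Hypothetical drug labels for one physician-patient pair.
groups = classify_drugs(
    prescribed=["drug_1", "drug_2", "drug_3"],
    taken=["drug_1", "drug_3", "drug_4"],
)

omissions = len(groups["b"])
commissions = len(groups["c"])
currently_prescribed = len(groups["a"]) + len(groups["b"])   # (a + b)
currently_taken = len(groups["a"]) + len(groups["c"])        # (a + c)
print(omissions, commissions, currently_prescribed, currently_taken)
```

Restricting independent variables to the "a" group, as in the second variation, would correspond to using only `groups["a"]` and discarding the remaining drugs.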
The general statistical theory of Poisson regression analysis
appears to be adequate, but it is only appropriate when one's model is
completely specified.
Quite often researchers in the health, social,
and physical sciences are interested in discerning and interpreting
complex relationships involving large numbers of interrelated variables.
To date, there is little or no literature available regarding
independent variable selection procedures for Poisson regression
analysis.
Without the development of some such techniques, the value
of this type of statistical analysis is severely limited for many
potential users.
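As one illustration of what such a selection technique might look like, the sketch below performs forward selection using likelihood-ratio statistics. It is not a procedure proposed or evaluated in this dissertation. It assumes a log-linear Poisson mean and a cutoff of 3.84 (the 5 percent critical value of chi-square with one degree of freedom). All function names, variable names, and data are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def max_loglik(columns, y, max_iter=100, tol=1e-10):
    """Maximized Poisson log-likelihood (constant term dropped) for the mean
    exp(b0 + sum_j b_j * column_j), fitted by Newton-Raphson."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(max_iter):
        mu = [math.exp(sum(b * v for b, v in zip(beta, r))) for r in rows]
        score = [sum(r[j] * (yi - mi) for r, yi, mi in zip(rows, y, mu))
                 for j in range(p)]
        info = [[sum(r[j] * r[k] * mi for r, mi in zip(rows, mu))
                 for k in range(p)] for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    eta = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return sum(yi * e - math.exp(e) for yi, e in zip(y, eta))

def forward_select(candidates, y, critical=3.84):
    """Repeatedly add the candidate with the largest likelihood-ratio
    statistic; stop when no candidate reaches the critical value."""
    selected = []
    current = max_loglik([], y)
    while len(selected) < len(candidates):
        best_name, best_ll = None, current
        for name in candidates:
            if name in selected:
                continue
            ll = max_loglik([candidates[n] for n in selected + [name]], y)
            if ll > best_ll:
                best_name, best_ll = name, ll
        if best_name is None or 2 * (best_ll - current) < critical:
            break
        selected.append(best_name)
        current = best_ll
    return selected

# Hypothetical illustrative data, NOT taken from the AAFP-UNC Study.
candidates = {
    "total_drugs": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "hypothetical_flag": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
}
omissions = [0, 0, 0, 1, 0, 1, 1, 2, 1, 2, 2, 3]

selected = forward_select(candidates, omissions)
print(selected)
```

A stepwise rule of this kind inherits the usual caveats of stepwise selection in ordinary regression, so it should be read only as a starting point for the development called for above.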
Non-compliant medication-taking behavior remains a prevalent
phenomenon in most medical practices and, given this, the medical care
received by such patients cannot realize its full potential.
Hopefully, this work has provided some direction for future research
into the identification of such individuals.