PATHMOX: A PLS-PM Segmentation Algorithm

Gastón Sánchez
Tomàs Aluja
Laboratory of Information Analysis and Modelling (LIAM)
Universitat Politècnica de Catalunya
e-mail: [email protected], [email protected]
Summary: One of the main issues in path modeling, especially in
business and marketing applications, is the identification of different segments in
the model population. The approach proposed by the authors consists of building a tree
of path models with a decision tree-like structure by means of the PATHMOX (Path
Modeling Segmentation Tree) algorithm. This algorithm is specifically designed for
situations in which prior information in the form of external variables (such as
socio-demographic variables) is available. Inner models are compared using an extension
of a test for the equality of two regression models, and outer models are compared by
means of a Ryan-Joiner correlation test.
Keywords: PLS-PM Segmentation, equality of regression models, Ryan-Joiner test,
1 Path Modeling and Segmentation
In marketing and business management research, PLS-PM has been applied
successfully in studies concerning the measurement of intangibles such as customer and
employee perceptions (e.g. satisfaction, motivation, loyalty). In this type of study it is
of interest to identify groups of individuals with similar behavior, that is, to
identify customer/employee segments. This segmentation task is crucial to managers, since
it helps them improve their decision making and increase the organization’s
profitability. Different proposals have been developed for tackling this problem: the
finite mixture approach for PLS was proposed by Hahn et al (2002) and later extended in
Ringle et al (2005); Squillacciotti (2005) extends PLS Typological Regression to perform
PLS path modeling classification.
On many occasions, external information (information outside the model) is available
about individuals’ characteristics, such as socio-demographic variables (e.g. age,
gender, level of studies, etc.). In these cases the identification of segments has
to take into account not only the structure of the model but also the available external
information. However, there is one main problem, common to all path
modeling segmentation approaches: given two
segments and their corresponding models, how should they be compared? This may
require the definition of a measure of distance between models, which is not an easy
task because every path model consists of two sub-models, the inner model and the
outer model, and these are hard to compare jointly with those of other path
models.
2 PATHMOX Algorithm
In this work a different approach to path modeling segmentation is proposed: the
PATHMOX¹ (Path Modeling Segmentation Tree) algorithm. The idea is to build a tree
of path models with a decision tree-like structure, with a model for a different segment
in each of its nodes. The identification of segments takes into account not only the
available prior information, in the form of external variables (such as socio-demographic
variables), but also the structural relationships between variables. That is,
segments that differ at the level of construct relationships can be identified using
external information. This is very useful for managers, who often rely on such
variables (gender, age, level of studies, etc.) to guide decision making and
allocate company resources to increase profits. Also, because the result is a binary
segmentation tree, the segments are clearly identified and easily described.
So far, there is no agreement about which criterion should be used to compare path
models. We suggest that this comparison be performed at two levels:
first at the level of the inner models, and then at the level of the outer models. For
the inner models, the comparison, which serves to identify different segments, should be
based on the path coefficients because they represent the causal structural relationships.
Once the segments are identified, the outer models are compared in order to answer the
following question: should the latent variables in the child nodes remain the same as in
the parent node?
The algorithm starts with the estimation of the global PLS path model (over all the
individuals) at the root node. Then, with the help of the external explanatory variables,
all possible binary splits of the latent variable scores (that is, of the individuals) are
produced and local models for each partition are calculated. Among all the possible
splits, the best one is selected by means of a test for comparing inner models. The test
applied is an extension of the test for the equality of two regression models given in
Lebart et al (1985). In addition, outer models are compared using a Ryan-Joiner
correlation test to decide whether the latent variables in the child nodes remain the
same as in the parent node. The stop criterion considers the
number of individuals in a node and the significance level of the best split.
PATHMOX Algorithm
Step 1: Start with the global PLS path model at the root node
Step 2: Find the best split: test for equality of coefficients of the inner model
Step 3: If (stop criteria = false) then
            For each child node
                Evaluate the outer models equivalence by means of Ryan-Joiner test
                If (distinct outer models = true) then
                    Re-estimate the PLS path model in the child node
                Return to Step 2
        Else
            Stop algorithm
Path Modeling Segmentation Tree Algorithm
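The control flow of the steps listed above can be sketched in Python as follows. This is only an illustrative skeleton, not the authors' implementation: the names `Node`, `grow`, `best_split`, `outer_differs` and `fit_child` are hypothetical, the PLS-PM estimation and both tests are passed in as plain functions, and the stop rule is assumed to mean "stop when the best split is not significant at level `alpha` or a child would have fewer than `min_size` individuals".

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple

# (p-value of the inner-model test, rows of child 1, rows of child 2)
Split = Tuple[float, List[int], List[int]]


@dataclass
class Node:
    rows: List[int]                 # indices of the individuals in this node
    model: Any                      # path model attached to this node
    children: List["Node"] = field(default_factory=list)


def grow(node: Node,
         best_split: Callable[[Node], Optional[Split]],
         outer_differs: Callable[[Node, List[int]], bool],
         fit_child: Callable[[Node, List[int], bool], Any],
         min_size: int = 100,
         alpha: float = 0.05) -> Node:
    """Recursively split `node` (Steps 2 and 3 of the algorithm box)."""
    split = best_split(node)                      # Step 2: best binary split
    if split is None:                             # no admissible split left
        return node
    p_value, left, right = split
    # Step 3: assumed stop rule (node size and significance of the split).
    if p_value > alpha or min(len(left), len(right)) < min_size:
        return node
    for rows in (left, right):
        # Outer-model check: True means the child needs its own PLS-PM fit,
        # False means the parent's latent variables are kept.
        reestimate = outer_differs(node, rows)
        child = Node(rows=rows, model=fit_child(node, rows, reestimate))
        # "Return to Step 2" for this child.
        node.children.append(
            grow(child, best_split, outer_differs, fit_child, min_size, alpha))
    return node


def leaves(node: Node) -> List[Node]:
    """Terminal nodes of the tree, i.e. the final segments."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]
```

Keeping the three callables separate mirrors the two-level comparison: `best_split` only concerns the inner models, while `outer_differs` decides whether `fit_child` has to re-estimate the latent variables.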
¹ The term MOX in PATHMOX refers to Moxexeloa, a word in Nahuatl (the Aztec
language) that means “divide into groups”.
The splitting process is as follows. Consider the modalities of an external explanatory
variable. For every possible two-way split of these modalities, the latent variable
scores are divided into two groups, 1 and 2, of sizes n1 and n2 respectively, and the
inner model is estimated for each group; that is, every partition produces two potential
models for the child nodes. The inner models of the child nodes are compared with the
inner model of the parent node by means of an extension of a test for
the equality of regression models, based on an F statistic. This
test is based on the path coefficients and assumes that the residual terms ε are
normally distributed.
In the null hypothesis H0 all the coefficients are assumed to be identical; in the
alternative hypothesis H1 the coefficients of the two models are considered to be
different:
η1 = ξ1 Β1 + ε1 ,
η2 = ξ2 Β2 + ε2
where, for each group i = 1, 2, ηi is a column vector of all endogenous latent variables;
ξi is the matrix of the explanatory latent variables related to each endogenous LV; Βi is
a column vector of all path coefficients; and εi is the residual vector, assumed to be
normally distributed.
Under the null hypothesis H0 all coefficients are equal: Β1 = Β2 = Β
η1 = ξ1 Β + ε1 ,
η2 = ξ2 Β + ε2
The models in both hypotheses can be expressed in matrix notation as follows:
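Under H0:
\[
\begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix}
=
\begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix} \mathrm{B}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix}
\]
Under H1:
\[
\begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix}
=
\begin{bmatrix} \xi_1 & 0 \\ 0 & \xi_2 \end{bmatrix}
\begin{bmatrix} \mathrm{B}_1 \\ \mathrm{B}_2 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix}
\]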
Computing the sums of squared errors SSE0 and SSE1 under each hypothesis, the test
statistic is an F statistic with p* and (n* − 2p*) degrees of freedom:
\[
F = \frac{n^* - 2p^*}{p^*} \cdot \frac{SSE_0 - SSE_1}{SSE_1}
\]
where n* = N·J; N = n1 + n2 is the number of elements (individuals) in the model; J is
the number of endogenous LVs; p* = Σj pj; and pj is the number of explanatory LVs of the
j-th endogenous LV, j = 1,…,J.
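As a hedged illustration of the statistic defined above, the following NumPy sketch computes SSE0 (common coefficients), SSE1 (separate coefficients for each group) and the resulting F value. The function names `sse` and `f_test_equal_coefficients` are hypothetical, and the sketch assumes that the caller has already stacked the endogenous LV scores into one vector per group and arranged the explanatory LV scores into the matching design matrix with p* columns; it is not the authors' implementation.

```python
import numpy as np


def sse(X, y):
    """Sum of squared errors of the least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def f_test_equal_coefficients(X1, y1, X2, y2):
    """F statistic and degrees of freedom for H0: B1 = B2.

    X1, X2 : stacked design matrices of groups 1 and 2 (p* columns each).
    y1, y2 : stacked endogenous LV scores of groups 1 and 2.
    """
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
    n_star = X1.shape[0] + X2.shape[0]          # rows of the stacked system
    p_star = X1.shape[1]                        # number of path coefficients
    # H0: one coefficient vector for both groups (pooled fit).
    sse0 = sse(np.vstack([X1, X2]), np.concatenate([y1, y2]))
    # H1: block-diagonal design, equivalent to two separate fits.
    sse1 = sse(X1, y1) + sse(X2, y2)
    df1, df2 = p_star, n_star - 2 * p_star
    f_value = (df2 / df1) * (sse0 - sse1) / sse1
    return f_value, df1, df2
```

The p-value is the upper tail probability of an F distribution with (df1, df2) degrees of freedom. Since n* and p* are the same for every candidate split of a given node, ranking the splits of that node by decreasing F is equivalent to ranking them by increasing p-value.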
For a given external variable, the partition with the most significant (smallest)
p-value is taken as a candidate for the best split. This process is applied to each
external explanatory variable, and the candidate with the minimum p-value among all the
candidates is selected as the optimal split.
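The enumeration of candidate partitions and the choice of the optimal split can be sketched as below. The names `binary_splits`, `best_split` and `p_value_of` are hypothetical; in particular `p_value_of` stands for whatever routine returns the p-value of the inner-model F test for a given partition, and is simply assumed to exist.

```python
from itertools import combinations


def binary_splits(modalities):
    """Yield every two-way split of the modalities as (left, right) tuples.

    The first modality always stays on the left side, so each unordered
    split is produced once: k modalities give 2**(k - 1) - 1 splits.
    """
    first, *rest = modalities
    for size in range(len(rest)):            # how many of `rest` join the left
        for extra in combinations(rest, size):
            left = (first, *extra)
            right = tuple(m for m in rest if m not in extra)
            yield left, right


def best_split(external_vars, p_value_of):
    """Return (p_value, variable, left, right) with the smallest p-value.

    external_vars : dict mapping a variable name to its list of modalities.
    p_value_of    : callable (variable, left, right) -> p-value of the test.
    """
    candidates = (
        (p_value_of(var, left, right), var, left, right)
        for var, mods in external_vars.items()
        for left, right in binary_splits(mods)
    )
    return min(candidates, key=lambda c: c[0])
```

For example, a variable with the three modalities A, B and C yields the splits {A} | {B, C}, {A, B} | {C} and {A, C} | {B}.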
Once a child node (segment) is identified, the next step consists in testing the
equivalence of the child’s outer model with the parent’s outer model. This is done in
order to verify whether the estimated latent variables remain the same as in the parent
node or should be re-estimated in the child node. To compare the outer models,
we focus on the correlations between the LVs in the parent node and the LVs in
the child node. The assumption is that, if the outer model in a child node is very
similar to the outer model in the parent node, the correlations between the LVs in the
parent node and the LVs in the child node should be high (close to unity). To assess how
high the correlations are, the Ryan-Joiner correlation test (Ryan & Joiner, 1976) is used.
The Ryan-Joiner test is an objective way of judging normal probability plots when
testing a data set for normality. In other words, the test measures the
straightness of a probability plot. In using the Ryan-Joiner test we do not intend to
perform any normality test; instead we use it as a tool for assessing how close to unity
the correlations between the LVs in the parent node and the LVs in the child node are. It
may be argued that the test is being misused in the way it is applied in the PATHMOX
algorithm; however, we use it as a first (although primitive) tool for outer model
comparison.
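One possible reading of this outer-model check is sketched below, under stated assumptions: the LV scores of the parent and of the child are available for the same individuals, the correlation is the ordinary Pearson coefficient, and the critical value is supplied by the caller (for instance taken from the Ryan-Joiner tables for the relevant sample size). The names `pearson`, `outer_model_differs` and `critical_value` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def outer_model_differs(parent_scores, child_scores, critical_value):
    """Compare parent and child LV scores construct by construct.

    parent_scores, child_scores : dict mapping an LV name to its scores,
        computed with the parent's and the child's outer model respectively.
    critical_value : assumed threshold below which a correlation is judged
        not close enough to unity.
    Returns (differs, correlations).
    """
    corr = {lv: pearson(parent_scores[lv], child_scores[lv])
            for lv in parent_scores}
    differs = any(r < critical_value for r in corr.values())
    return differs, corr
```

If any construct falls below the threshold, the outer models are treated as distinct and the path model would be re-estimated in the child node.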
Finally, the stop rule evaluates two conditions: (1) a fixed minimum number of
individuals in a node, and/or (2) the significance level of the p-value. The first
condition avoids small segments, which are not useful in practice. The second
avoids identifying segments from splits that are not statistically significant.
3 Job Motivation and Satisfaction
For many decades, the importance of intangibles has been recognized in the
business-management literature. But it is only in recent years that more attention has
been paid to the development of methodologies for measuring and reporting them
(Eskildsen et al, 2004a). Some examples are the American Customer Satisfaction Index
(ACSI) and the European Customer Satisfaction Index (ECSI).
Measuring the levels of employee perceptions such as satisfaction, motivation,
commitment, or intention to leave the job is an important task because of their
implications for productivity, absenteeism, competitiveness, etc.
Usually, a motivated worker is assumed to perform better and, consequently,
to make a better contribution to the business. Another aspect closely related
to motivation is satisfaction, and it is important to know how satisfied (or
dissatisfied) an employee is, because a frustrated or passive worker could have serious
intentions to leave the job. Turnover can be a serious problem for many businesses
because the skills and experience employees must have are hard to
acquire and require years of education and training. Even if employees’ knowledge and
skills are not critical, a company faces a decrease in productivity when its labor
force is reduced because of turnover.
The analysis of employee perceptions has a long tradition among psychologists,
sociologists and human resources researchers, as part of a field of study known as
Organizational Psychology. However, the application of causal models aimed at developing
standardized measurement methods for such perceptions is relatively new.
3.1 Causal Model
The proposed causal model is an adaptation of the models presented in Känd and Rekor
(2005) and Eskildsen et al (2004b). Other similar models are found in Gaertner
(1999), Currivan (1999), and Kim (1999). One of the main differences between the
present model and those on which it is based is that it considers not only a satisfaction
construct but also a motivation construct. The reason
for considering satisfaction and motivation separately lies in their definitions. It is
assumed that a positive emotional state causes someone to perform efficiently on the
job; in other words, it is assumed that satisfaction causes motivation.
The theoretical framework for the causal model comprises Herzberg’s two factor theory
and expectancy theory. Herzberg’s theory states that persons have two classes of needs:
(1) hygiene needs and (2) motivation needs (Furnham, 2001). The first type of needs is
influenced by the physical and psychological conditions in which people work. Factors
related to hygiene needs are immediate supervision, work conditions, workload, salary,
corporate policies, managerial practices, personal relationships, among others.
Motivation needs are related to autonomy, achievement, responsibility, recognition,
feedback, enrichment and promotional chances.
On the other hand, expectancy theory assumes that employees enter organizations with a
set of beliefs about their workplace divided into three categories: expectancy,
instrumentality and valence. Expectancy is the belief that personal effort will lead
to efficient performance. Instrumentality is the belief that good performance will be
rewarded. Valence is the perceived value of the expected rewards. Thus, motivation is
seen as a multiplicative process of expectancy, instrumentality and valence: motivation
is achieved with high levels of all three.
The model includes eight constructs of which five are exogenous and three are
endogenous. The exogenous constructs can be divided among four main groups of work
environment characteristics (hygiene factors) and one construct related to the
motivational factors. These are the following:
1. Conditions of work: Perceptions of the workplace conditions and facilities
2. Salary: Remuneration of work performed in the organization
3. Leadership: Degree of consideration expressed from an employee in a subordinate
position
4. Image: Degree to which an employee feels identified with the organization
5. Empowerment: Perceptions of autonomy, initiative, responsibility, recognition
The three endogenous latent variables are satisfaction, motivation and loyalty. The
concept of loyalty is a complex one because it involves the employee’s propensity to stay
in the organization and the degree to which he or she is committed to it. It is assumed
that loyalty is the outcome of satisfaction and motivation.
The proposed causal model, shown in Figure 1, focuses on the causal relationships
between job satisfaction, job motivation, and loyalty. All three endogenous variables are
assumed to be influenced by the exogenous constructs. In addition, satisfaction affects
motivation, and both of them affect loyalty. By relating all the exogenous latent
variables to the three endogenous constructs, the aim is to cover different
dimensions of job satisfaction and motivation and to allow comparison of impacts on the
three endogenous variables.
[Figure 1: path diagram with the exogenous constructs Empowerment, Image, Salary, Work Conditions and Leadership and the endogenous constructs Satisfaction, Motivation and Loyalty]
Figure 1 Path diagram for the Causal Model
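The inner model described above can be written down as a small data structure, which also makes the quantities J and p* of the F test explicit. The sketch below is illustrative only; the names `EXOGENOUS`, `INNER_MODEL` and `f_test_dof` are hypothetical and the structure is taken from the verbal description of the model (every exogenous construct points to every endogenous construct, Satisfaction points to Motivation, and both point to Loyalty).

```python
EXOGENOUS = ["Empowerment", "Work Conditions", "Leadership", "Image", "Salary"]

# Explanatory constructs of each endogenous latent variable.
INNER_MODEL = {
    "Satisfaction": EXOGENOUS,
    "Motivation": EXOGENOUS + ["Satisfaction"],
    "Loyalty": EXOGENOUS + ["Satisfaction", "Motivation"],
}

J = len(INNER_MODEL)                                        # endogenous LVs
P_STAR = sum(len(preds) for preds in INNER_MODEL.values())  # path coefficients


def f_test_dof(n_individuals):
    """Degrees of freedom (p*, n* - 2p*) of the inner-model F test."""
    n_star = n_individuals * J
    return P_STAR, n_star - 2 * P_STAR
```

With this structure p* = 5 + 6 + 7 = 18 and J = 3, so a node with N individuals gives n* = 3N and an F test with 18 and 3N − 36 degrees of freedom.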
3.2 Data and Results
The PATHMOX algorithm is applied to data collected in a satisfaction and motivation
survey of employees working in a Spanish banking
entity. The data contain 41 manifest variables observed on 8020 employees and grouped in
8 sets corresponding to the 8 latent variables: Empowerment, Image, Salary, Work
Conditions, Leadership, Satisfaction, Motivation, and Loyalty. In addition to the 41
manifest variables, five external explanatory variables describing socio-demographic
aspects are considered: gender, age, job level, seniority, and sector.
For comparison purposes, Table 1 gives the estimates of the path coefficients (inner
model) for the global model and for the models of the 6 identified segments. The final
segments are: (1) female managers, (2) male managers, (3) senior assistant managers,
(4) medium-junior assistant managers, (5) employees in sector A, and (6) employees in
sector B.
Figure 2 shows the resulting segmentation tree. The number of employees in
each segment is shown inside each node, and the segments in the terminal nodes are
numbered from 1 to 6. Every split is labeled with its corresponding partition of the
explanatory variable. The segmentation tree shows a first division by job level into
managers and other employees. Managers are further split by gender. The other employees
are divided into assistant managers and the rest of the workers.
Assistant managers are partitioned according to their seniority (before 1975, after 1975),
and finally the rest of the workers are segmented according to sector (A or B).
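Because the result is a binary tree on external variables, the six segments described above reduce to a few nested rules. The following sketch illustrates this; the function name `assign_segment`, the argument names and the codings of the variables ("manager", "assistant manager", "female", a seniority year, "A") are hypothetical and chosen only for the example.

```python
def assign_segment(job_level, gender, seniority_year, sector):
    """Return the segment number (1 to 6) of an employee.

    Assumed codings: job_level in {"manager", "assistant manager", "other"},
    gender in {"female", "male"}, seniority_year as a four-digit year,
    sector in {"A", "B"}.
    """
    if job_level == "manager":
        # Managers are split by gender.
        return 1 if gender == "female" else 2
    if job_level == "assistant manager":
        # Assistant managers are split by seniority (1975 as cut point).
        return 3 if seniority_year <= 1975 else 4
    # The rest of the workers are split by sector.
    return 5 if sector == "A" else 6
```

Only the variables on the path from the root to a terminal node matter for a given employee, which is what makes the segments easy to describe.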
Table 1 Inner model results for global model and final segments

SATISFACTION       Global    Seg 1    Seg 2    Seg 3    Seg 4    Seg 5    Seg 6
R2                  0,568    0,505    0,533    0,571    0,572   0,6356   0,6263
Empowerment        0,4972   0,3883   0,4467   0,4865   0,4931   0,5161   0,5698
Work Conditions    0,1263   0,2354   0,1587   0,0609   0,1568   0,1486   0,1331
Leadership         0,0822   0,1176   0,0624  -0,0119   0,0784   0,1238   0,1149
Image              0,1494   0,2155   0,2045   0,2023   0,1528   0,1315   0,0634
Salary             0,1432   0,0244   0,1137   0,2241   0,1135   0,1272   0,0869

MOTIVATION         Global    Seg 1    Seg 2    Seg 3    Seg 4    Seg 5    Seg 6
R2                   0,47   0,3904    0,376     0,51   0,4263   0,5538   0,5412
Empowerment        0,1536   0,1686   0,1463   0,2869   0,1313   0,1491   0,1366
Work Conditions    0,1456  -0,1062   0,0751  -0,0144   0,1592   0,1936   0,1243
Leadership         0,1036   0,2317   0,1118  -0,0438   0,1332   0,0771   0,1002
Image              0,1025  -0,0218    0,096   0,1395   0,0713   0,1588   0,0815
Salary            -0,0649  -0,0485  -0,0644   0,0254  -0,0238  -0,1315   0,0397
Satisfaction       0,3964   0,4344   0,3741   0,4141   0,3663   0,4095    0,428

LOYALTY            Global    Seg 1    Seg 2    Seg 3    Seg 4    Seg 5    Seg 6
R2                   0,56   0,5338    0,526    0,568   0,5087   0,5777      0,6
Empowerment        0,1457   0,3213   0,1334   0,1219   0,1795   0,1573   0,1873
Work Conditions    0,0004   0,0156  -0,0296   0,0143   -0,024   0,0448   0,0225
Leadership        -0,0405   0,0938   0,0274  -0,0222  -0,0091  -0,0703  -0,0854
Image              0,2109   0,2459   0,1899   0,0614   0,2339   0,2271   0,1825
Salary             0,1405  -0,0867   0,1513   0,1659   0,1039   0,1394   0,1405
Satisfaction       0,2879   0,1502   0,2863   0,3194   0,2795   0,2082   0,2455
Motivation         0,2302   0,2036   0,2182   0,2603   0,1744   0,2803   0,2748
Table 1 shows some differences between the global model and the identified
segments for the three endogenous constructs. Regarding Satisfaction, female
managers give great importance to Work Conditions and Image; male managers are
satisfied mainly by Image; Salary influences senior assistant managers; and Leadership
is important for employees in Sectors A and B.
With respect to Motivation, female managers are motivated mainly by good
Leadership and Image; senior assistant managers consider Empowerment and Salary
important factors, while the other assistant managers consider Leadership important; and
workers in Sector A relate Motivation to Work Conditions and Image.
Finally, with respect to Loyalty, female managers are influenced by Empowerment, Work
Conditions and Leadership; senior assistant managers give importance to Work
Conditions, Satisfaction and Motivation; other assistant managers are influenced by
Empowerment; and the rest of the workers consider Work Conditions and Motivation
important for being loyal.
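[Figure 2: segmentation tree. Root node: 8020 employees, split into Managers (2650) and Other employees (5370). Managers are split into Women and Men (terminal segments 1 and 2; node sizes 272 and 2378). Other employees are split into Assistant Managers (2102) and Rest of employees (3268). Assistant Managers are split into Sen <= 75 (segment 3, 174) and Sen > 75 (segment 4, 1928). Rest of employees are split into Sector A (segment 5, 1609) and Sector B (segment 6, 1659).]
Figure 2 Segmentation Tree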
4 Conclusions
Future research on path modeling segmentation with the PATHMOX algorithm will
focus mainly on two problematic issues, which are also common to other approaches
to path modeling segmentation. They concern (1) the comparison of different path
models, and (2) whether the structural model remains the same for all segments.
Specifically, the model comparison criteria applied within the PATHMOX algorithm
have two problems. On the one hand, the F-test used for inner model comparison
assumes that the residual terms ε are normally distributed, which may not hold in
practice. On the other hand, the Ryan-Joiner correlation test for normal probability
plots may be regarded as misused when applied to outer model comparison. Thus, other
comparison methods that do not rely on normal distributional assumptions should
be developed.
With respect to the second issue, all path modeling segmentation
techniques assume that the inner model remains the same for all segments. However,
further analysis might consider the possibility that segments also
differ in the structural model itself.
References
Currivan D. B. (1999) The Causal Order of Job Satisfaction and Organizational
Commitment in Models of Employee Turnover, Human Resource Management Review,
9 (4), 495-524
Eskildsen J. K., Kristensen K., Westlund A. (2004a) Work motivation and job
satisfaction in the Nordic countries, Employee Relations, 26 (10), 122-136
Eskildsen J. K., Kristensen K., Westlund A. (2004b) Measuring employee assets: The
Nordic Employee Index, Business Process Management, 10 (5), 537-550
Furnham A. (2001) Psicología Organizacional: el comportamiento del individuo en las
organizaciones. Oxford University Press, Mexico. (Trans from: The Psychology of
behaviour at work: the individual in the organization)
Gaertner S. (1999) Structural Determinants of Job Satisfaction and Organizational
Commitment in Turnover Models, Human Resource Management Review, 9 (4), 479-493
Hahn C., Johnson M., Herrmann A., Huber F. (2002) Capturing Customer
Heterogeneity using a Finite Mixture PLS Approach, Schmalenbach Business Review,
54, 243-269.
Känd M., Rekor M. (2005) Perceived Involvement in Decision Making and job
Satisfaction: The Evidence from a Job Satisfaction Survey among Nurses in Estonia,
SSE Riga Working Papers, 6, Stockholm School of Economics in Riga.
http://www.sseriga.edu.lv/library/working_papers/FT_2005_6.pdf
Kim S. (1999), Behavioral Commitment Among the Automobile Workers in South
Korea, Human Resource Management Review, 9 (4), 419-451
Lebart L., Morineau A., Fénelon J. P. (1985) Tratamiento estadístico de datos,
Marcombo, Barcelona, Spain.
Ringle C. M., Wende S., Will A. (2005) Customer Segmentation with FIMIX-PLS, in:
Proceedings of the PLS’05 International Symposium, T. Aluja, J. Casanovas, V.
Esposito, A. Morineau, M. Tenenhaus (Eds.), SPAD Test&go, 507-514.
Ryan T. A., Joiner B. L. (1976) Normal Probability Plots and Tests for Normality,
Technical Report, Statistics Department, The Pennsylvania State University, USA.
Squillacciotti S. (2005) Prediction oriented classification in PLS Path Modelling, in:
Proceedings of the PLS’05 International Symposium, T. Aluja, J. Casanovas, V.
Esposito, A. Morineau, M. Tenenhaus (Eds.), SPAD Test&go, 499-506.