The transformation and search of
semi-structured knowledge in
organizations
Chun-Che Huang and Chia-Ming Kuo
Chun-Che Huang is a Professor,
Laboratory of Intelligent Systems
& Knowledge Management,
Department of Information
Management, National Chi-Nan
University, Taiwan
([email protected]).
Chia-Ming Kuo is a graduate
student, Department of
Information Management, National
Chi-Nan University, Taiwan.
Abstract Knowledge is perceived as a very important asset for organizations, and knowledge
management is critical for organizational competitiveness. Because knowledge is by nature
complex and varied, it is difficult to extend the effectiveness of knowledge
re-use in organizations. In this article, an approach based on the Zachman Framework
is developed to externalize organizational knowledge into semi-structured knowledge, and
eXtensible Markup Language (XML) is applied to transform the knowledge into documents. In
addition, latent semantic indexing (LSI), which is capable of solving problems of synonymy and
polysemy, as well as improving the accuracy of document searches, is incorporated to facilitate
the search of semi-structured knowledge (SSK) documents based on user demands. The SSK
approach shows great promise for organizations to acquire, store, disseminate, and reuse
knowledge.
Keywords Knowledge management, Organizations
Introduction
Knowledge management is becoming more important for individuals and organizations, and is
increasingly considered as a main source of competitive advantage for corporations (Grant,
1996; Prusak, 1996; Roth, 1996; Spender and Grant, 1996). Davenport and Prusak (1998)
define knowledge as a fluid mix of framed experience, values, contextual information, and
expert insight that provides a framework for evaluating and incorporating new experience and
information. Knowledge is assumed to originate in the minds of knowledge workers (Alavi
and Leidner, 1999) and spirals constantly through organizations (Nonaka and Takeuchi, 1995). Also,
in organizations it often becomes embedded not only in documents or repositories, but also
in organizational routines, processes, practices, and norms (Davenport and Prusak, 1998).
Knowledge can also be defined as the power to act and to make value-producing decisions
(Kanter, 1999; Polanyi, 1962). The characteristics of knowledge are complexity, high
changeability, abstraction, fuzziness, and lack of structure. Therefore, capturing, transforming,
and searching knowledge is a critical issue for organizations. Basically, knowledge cannot be
controlled or managed in a rational, top-down fashion like other assets of the organization (Klint
and Verhoef, 2002).
Documents are a common source of knowledge in organizations; they partially provide the
contents of knowledge based on entries. However, it is known that their contents reflect only a
fraction of the knowledge that is encoded in a document (Dzbor et al., 2000). Traditional
documents and searching techniques not only make knowledge difficult to discover, but also
PAGE 106
|
JOURNAL OF KNOWLEDGE MANAGEMENT
|
VOL. 7 NO. 4 2003, pp. 106-123, © MCB UP Limited, ISSN 1367-3270
DOI 10.1108/13673270310492985
make it brittle. Consequently, it can only be applied in very limited applications (Miller et al.,
1992).
There are numerous studies on knowledge management. Mo and Menzel (1998) created a
model to capture knowledge from company domain experts and from field information available
from experienced users. O’Leary (1998) developed a system which is able to capture company-wide knowledge. Liebowitz (1997) proposed a study of knowledge assets and the scheduling of
their use within organizations. Overviews and developments in knowledge management are
provided in Satyadas et al. (2001), Alavi (2001), Fischer and Ostwald (2001), and Chauvel and
Despres (2002). Although many scholars studying KM focus on structured knowledge in
defined formats like rules or procedures, only a few attempt to emphasize semi-structured or
unstructured knowledge. This article aims to focus on:
• A definition of semi-structured knowledge: as mentioned above, knowledge is changing
constantly, and the general observation is that knowledge cannot be controlled or managed
in a rational, top-down fashion, frequently existing in an unstructured format (Klint and
Verhoef, 2002). Furthermore, people play different roles in an organization, so the
requirements of knowledge and its application are diverse. These requirements are
expressed with a series of dimensions or abstractions in the Zachman Framework (Inmon
and Zachman, 1997). This format of 5W1H (what, where, who, when, why, and how)
represents a suitable solution approach to externalize knowledge in organizations because it
can capture the nature of each dimension (perspective) and integrate the target knowledge.
Applying the Zachman Framework, knowledge in organizations can be transformed
systematically into semi-structured knowledge documents.
• Transformation of semi-structured knowledge: it is assumed that whenever organizations
experience internal or external changes, complex and large amounts of data, information,
and knowledge are generated. Furthermore, most of the generated knowledge is probably
expressed in unstructured or semi-structured form. Hence, the management of
unstructured or semi-structured knowledge is an important issue. Due to the lack of well-structured storage approaches for KM in the literature, the integration of knowledge activities has
been impossible (Nonaka and Takeuchi, 1995). In addition, it is known that the storage
structure significantly impacts the effectiveness and efficiency of operating KM activities.
Therefore, this article attempts to apply eXtensible Markup Language (XML) to effectively
transform organizational experience into documents of semi-structured knowledge and
manage them.
• Search of semi-structured knowledge: using organizational knowledge effectively
and agilely is crucial: if the desired organizational knowledge cannot be accessed in time,
the knowledge is of no use and no longer active (Turban and Aronson, 2001). However, the form
of semi-structured knowledge usually makes that knowledge difficult to search and use
because it does not have a systematic format, which is one of the major problems in the re-use
of semi-structured knowledge. Information retrieval comprises technologies used to extract
interrelated information from semi-structured or unstructured documents, and then to
present the documents in structured ways (Van Rijsbergen, 1979). However, this approach is
limited in handling synonyms, polysemy, or dictionaries, and low accuracy of
searching results is observed in information retrieval. To resolve this, latent semantic indexing
(LSI) is used as a technique to search relevant semi-structured knowledge documents, which
provides faster and more accurate retrieval performance.
Semi-structured knowledge (SSK) in organizations
Main issues of knowledge management
Asymmetry, structural uncertainty (Becker, 2001), and re-use of knowledge are the main issues
when organizations attempt to put knowledge management into practice. These issues are
illustrated as follows:
• Asymmetry: organizational knowledge is retrieved from experts, documents,
repositories, organizational routines, processes, practices, norms, and other events, and it
has strong experiential and reflective elements. Moreover, this knowledge is information that
is contextual, relevant, and actionable (Turban and Aronson, 2001). Having the knowledge
implies that it can be used to solve a problem, whereas having information does not carry the
same connotation. Hence, the ability to act is an integral part of being knowledgeable (Turban
and Aronson, 2001).
• Structural uncertainty: the knowledge in organizations is hard to manage and store,
because of its structural uncertainty. To resolve this, the Zachman Framework can be used to
describe knowledge architecture and to capture organizational knowledge derived from
knowledge management activities, and this knowledge has significant impacts on the
organization. Although there may be many types of knowledge, as far as organizations are
concerned, knowledge is a fundamental factor behind all of the organization’s activities
(Liebowitz, 1997; Wiig, 1999). Structured knowledge may be represented through the format
of rules, models, procedures, etc. In contrast, semi-structured knowledge uses six
dimensions (5W and 1H) of the Zachman Framework to capture knowledge in organizations.
In organizations, semi-structured knowledge is more significant than structured knowledge
because more non-structured information can easily be found.
• Re-use: building complex knowledge-based applications requires the incorporation of large
amounts of domain knowledge (Levy and Rousset, 1998). In this article, semi-structured
knowledge is externalized through XML, whose significant attribute is the ability to comprise
large amounts of domain knowledge and know-how (Birbeck, 2000; Fabio, 2001). Thus, by
using XML, the re-usability of SSK is increased and the domain knowledge and know-how
can be shared in organizations more effectively.
Definition of SSK
Semi-structured knowledge is here defined as a collection of knowledge that results from KM
activities in organizations and is constructed along the six dimensions of the Zachman
Framework; it contains solution approach information, domain knowledge and know-how, and
has an impact on organizations. Some knowledge in organizations is hard to manage and store,
because of its structural uncertainty. The Zachman Framework provides a systematic
approach to externalize unstructured knowledge in organizations, although not all contents of
knowledge can be represented in this format. That is the reason why the term “semi” is used
in this article (see Figure 1).
Characteristics of SSK
Abiteboul (1997) defined semi-structured data as data that is neither raw nor strictly
organized, as in conventional database systems, and described the characteristics of semi-structured
data. Semi-structured knowledge has characteristics analogous to those of semi-structured data,
which can be described as follows:
• The structure is irregular: SSK includes several heterogeneous dimensions. Some
dimensions may be incomplete, and some may comprise additional information (e.g.
annotations); or different perspectives on the same kind of information may result in using
different dimensions. For example, the pricing of merchandise is perceived differently by
purchasing and accounting departments: one perceives it as cost, whereas the other sees it
as selling price.
• The structure is divisible: SSK documents consist of text and grammar, so parsing
SSK documents can divide crude information into pieces and discover the relationships
between them. Therefore, the SSK structure is constituted from the divided information and its
significant relationships.
Figure 1 Conceptual diagram for semi-structured knowledge
• The structure is a-posteriori: a database management system (DBMS) is based on the
hypothesis of a fixed schema that has to be predefined before introducing any data. This is
not the case for semi-structured knowledge, where the notion of schema is often posterior to
the existence of data.
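The a-posteriori property can be illustrated with a minimal sketch: instead of declaring a schema first, the set of element paths is collected from documents that already exist. The two fragments and the function names `element_paths` and `infer_schema` are hypothetical and serve only as an illustration; they are not part of the approach described in this article.

```python
import xml.etree.ElementTree as ET

def element_paths(xml_text):
    """Return the set of slash-separated tag paths found in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths

def infer_schema(documents):
    """Count, for every path, how many documents contain it (schema after the data)."""
    counts = {}
    for doc in documents:
        for path in element_paths(doc):
            counts[path] = counts.get(path, 0) + 1
    return counts

# Two hypothetical fragments with irregular structure: the second one omits
# Event_When and adds an annotation that the first one does not have.
doc_a = "<Event><Event_Name>Low seat utilization</Event_Name><Event_When>2002/06/01</Event_When></Event>"
doc_b = "<Event><Event_Name>Late departure</Event_Name><Note>annotation</Note></Event>"
schema = infer_schema([doc_a, doc_b])
```

Paths that occur in every document behave like required fields, while paths that occur in only some documents reflect the irregular structure described above.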
Semi-structured knowledge not only includes similar characteristics of the semi-structured
data, as shown above; but it also contains the characteristics which are based on organizational
activities as driven through problematic events. These are subject oriented, time dependent,
and reference oriented (see Figure 2).
• Subject oriented: semi-structured knowledge in organizations is built around a clear
and concrete subject. The subject, for instance, could be the desired event, which requires
analyzing the nature of entities, e.g. customers in the organization or business processes.
Therefore, the generated knowledge is only substantially meaningful to some specific
problem domains. For example, knowledge generated from classification analysis of
customers is only beneficial to the marketing department.
• Time dependent: semi-structured knowledge requires information to be retrieved in time. If
the information cannot be delivered at the appropriate time, the value of the
semi-structured knowledge can be greatly reduced.
• Reference oriented: semi-structured knowledge, which is discovered from organizational
knowledge through data analysis techniques and domain experts, not only provides unique
conclusions, but also discovers knowledge and supports conclusions from multiple
perspectives. Whenever an organization uses SSK, it should eventually invigorate other types
of knowledge.
Numerous classification approaches have been used to distinguish different types of
knowledge. For example, Polanyi (1962) divided knowledge into tacit and explicit knowledge.
Tuthill and Levy (1991) proposed that knowledge could be separated into declarative
knowledge, procedural knowledge, heuristic knowledge, commonsense knowledge, and
informed commonsense knowledge. Knowledge in organizations may be rules, such as rule
Figure 2 Characteristics of semi-structured knowledge (the structure is irregular; the structure is divisible; the structure is a-posteriori; subject oriented; time dependent; reference oriented)
knowledge; or a procedure, as symbol-type knowledge (Collins, 1997); or commonsense,
as commonsense knowledge (Tuthill and Levy, 1991); or a written description, as declarative
knowledge (Tuthill and Levy, 1991); or a process and result of experts’ inference, as inference
knowledge (Wielinga et al., 1992), and so on. In general, different types of generated knowledge
are the results of different classification approaches. Most of the classifications of knowledge
still remain at the conceptual level, rather than having practical utility.
The semi-structured knowledge presented in this article can flexibly externalize most of the
knowledge described above and incorporate its characteristics, but it does not organize
the beliefs and culture existing in organizations. This is because semi-structured knowledge is
simply the collected knowledge that is produced through the generation processes (presented
in the next section). The SSK does not aim at influencing the relationships, standards, and attitudes
between individuals and organizations (e.g. organizational culture), but it does focus on decision
support and problem solving.
Externalization of semi-structured knowledge with the Zachman Framework
Whenever a problematic event occurs, managers in organizations apply solution approaches
and the desired knowledge to solve the problem. Knowledge often evolves from past knowledge
and experience, and knowledge by nature is continuous and extendable. Knowledge
changes constantly. Furthermore, the perception of a single event in organizations varies across
different perspectives (e.g. time and place). Therefore, organizational knowledge must be
constructed as semi-structured knowledge with additional information from various
perspectives (e.g. the 5W1H dimensions in the Zachman Framework).
The Zachman Framework represents the perspectives and dimensions in matrix form, with the
perspectives representing the rows and the dimensions representing the columns (see Table I).
The columns include:
• Entities (What? Interest or focus areas): considering semi-structured knowledge in
organizations, entities are the data, information, events, and knowledge to be manipulated.
• Activities (How? Methods of problem-solving): considering semi-structured knowledge in
organizations, activities are the capture, interpretation, measurement, accumulation,
deployment, externalization, innovation, and feedback of the entities.
Table I Dimensions of semi-structured knowledge in organizations
Dimension | Perspectives | Goal | Example
Entities (What) | Data, information, knowledge | Accuracy | An association rule
Activities (How) | Capture, interpretation, measurement, accumulation, deployment, externalization, innovation, feedback | Effectiveness/efficiency | Approach of discovering association rule
Locations (Where) | Organization, process | Accuracy location | Marketing department
People (Who) | Users, administrators, developers | Accuracy location | Marketing manager
Time (When) | Time instance | Accurate timing | 2000-2001 year
Motivation (Why) | Reason needed | Motivated or not | Study the relationship between products A and B
• Locations (Where? Places of interest or focus): considering semi-structured knowledge in
organizations, locations are situated in organizations or processes.
• People (Who? Individuals and organizations of interest or focus): considering semi-structured knowledge in organizations, people are users, administrators, or developers.
• Times (When? Activities occurring): considering semi-structured knowledge in organizations, time is the instance when activities occur.
• Motivations (Why? Reasons for inspiration): considering semi-structured knowledge in
organizations, motivations are the reasons why activities occur.
A simple example of the semi-structured knowledge about “a good bank account” is
presented in the six dimensions. The perspectives and goals of semi-structured knowledge are
classified in Table I.
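As a rough illustration of the six dimensions, the record below holds one value per dimension and reports which dimensions are still empty, since SSK tolerates incomplete dimensions. The class name `SixDimensions` and the method `missing` are hypothetical; the field values are taken from the example row of Table I.

```python
from dataclasses import dataclass, asdict

@dataclass
class SixDimensions:
    what: str   # Entities
    how: str    # Activities
    where: str  # Locations
    who: str    # People
    when: str   # Time
    why: str    # Motivation

    def missing(self):
        """Names of the dimensions that are still empty."""
        return [name for name, value in asdict(self).items() if not value.strip()]

# The example row of Table I.
example = SixDimensions(
    what="An association rule",
    how="Approach of discovering association rule",
    where="Marketing department",
    who="Marketing manager",
    when="2000-2001",
    why="Study the relationship between products A and B",
)

# A partially captured record: only the entity is known so far.
partial = SixDimensions(what="An association rule", how="", where="", who="", when="", why="")
```

A record with empty dimensions is still a usable SSK entry; `missing()` only makes the gaps visible so they can be filled in later.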
Generation and content of semi-structured knowledge in organizations
Semi-structured knowledge in an organization is generated through a series of transformation
processes, which are triggered by problematic events. Every step of these processes involves
technical and managerial issues. Figure 3 illustrates the generation process of semi-structured
knowledge. More details are as follows:
• A dashed line represents the source of data, which indicates a problematic “event”. Here,
the needs for data or for processing information from organizations are triggered by the
problematic event occurring in a working process or organization. In other words, the
problematic event is defined as the source from which semi-structured knowledge is generated.
• The blocks represent the entities in the process. These are the assessment of requirements,
classification of events, subject identification, experts’ involvement, and group discussions.
The contents in parentheses represent the information or knowledge generated in each step.
• The oval blocks represent data storage systems or knowledge repositories.
• Three different types of arrows are used in the diagram. Those with solid lines illustrate the
direction of the process; those with long dashed lines point out the needs for semi-structured
knowledge documents; and those with dot-dashed lines show paths to construct semi-structured
knowledge. As shown above, semi-structured knowledge includes general information,
solution approach information, and feedback knowledge.
Five detailed models that serve as building blocks in the generation process are formulated
explicitly. These are the event classification, subject, knowledge processing, knowledge
transformation, and semi-structured knowledge models. Each function in a detailed model
can be perceived as a mechanism that transforms input variables into output.
Figure 3 Flow diagram of generating semi-structured knowledge
[Diagram labels: event occurred; data operation requirement; data process requirement; documents; repositories; distribution of problem (general information, $I_G$); subject about problems (solution approach information, $I_S$); experts’ knowledge and involvement (knowledge, $K_P$); professional feedback knowledge (knowledge, $K_T$); semi-structured knowledge documents design; semi-structured knowledge (SSK)]
Model 1: Event classification model
$E = f_E(o, p)$ (1)
where: $f_E$ = event transformation function; $o$ = business organization; $p$ = business process;
$E$ = problematic event
(1) Function: formula (1) corresponds to the relationship between a business process and a
problematic event.
(2) Parameter: problematic event $E$ – semi-structured knowledge is derived from original
problematic events. Therefore, the purpose of event classification is to support the
distinction of semi-structured knowledge.
(3) Contents: general information ($I_G$) is described as fundamental information corresponding
to a problematic event, for example, a description, observer, location and time etc. General
information is not the most important information, but it is necessary. Knowledge users are
assumed to be able to capture the whole framework of semi-structured knowledge if
general information is given appropriately. At this stage, generated information mainly
focuses on business structures and processes corresponding to the problematic events.
Model 2: Subject model
$S = f_S(w_1, w_2, w_3, w_4, w_5, h_1)$ (2)
where: $f_S$ = subject transformation function; $S$ = subject of problematic event; $w_1$ = entity of
subject (what); $w_2$ = place of subject (where); $w_3$ = people of subject (who); $w_4$ = timing of
subject (when); $w_5$ = motivations and reasons of subject (why); $h_1$ = process of subject (how).
(1) Function: formula (2) corresponds to the relationship between a subject and a problematic
event, which is correlated to 5W1H.
(2) Parameter: (a) Subject, $S$: the subject should be clear, concise and able to represent the
characteristics of semi-structured knowledge. Furthermore, the structure of the subject
should be dynamic; i.e. sensitive to dimension and level changes. (b) Entity of subject, $w_1$:
the entity of the subject is “what”. This parameter dominates the rest of the parameters (where,
who, when, why, and how). For example, if the entity is clear then the place, people, timing and
motivation are also certain. (c) Place, $w_2$. (d) People, $w_3$. (e) Timing, $w_4$. (f) Motivation, $w_5$. (g) Process of subject, $h_1$.
(3) Contents: solution approach information ($I_S$): the solution approach information includes
important interaction messages and results obtained by applying the approach, but not the details of
processes. The purpose of this information is to allow users to conveniently and clearly
understand the messages of interaction among users, the knowledge and the environment
while solution approaches are applied.
Model 3: Knowledge processing model
$K_P = f_P(E, S, P)$ (3)
where: $f_P$ = knowledge process function; $K_P$ = knowledge coming from the expert; $E$ = problematic event; $S$ = subject; $P$ = expert.
(1) Function: formula (3) corresponds to the knowledge resulting from an expert or analyzer
using a particular solution approach, e.g. the Apriori association rules (Agrawal and Srikant,
1994).
(2) Parameter: (a) The knowledge derived from experts, $K_P$: this knowledge is more focused
on subject-oriented expertise; i.e. the knowledge is generated based on a subject $S$ and
corresponds to a problematic event. (b) Problematic event, $E$ (refer to Model 1). (c) Subject, $S$
(refer to Model 2). (d) Expert, $P$: the definition of expert is confined to a domain expert for the
particular subject.
(3) Contents: the knowledge is a combination of the domain expert’s expertise and the results
produced based on this subject.
Model 4: Knowledge transform model
$K_T = f_T(K_P, P', C_i)$ (4)
where: $f_T$ = knowledge transformation function; $K_T$ = feedback knowledge; $K_P$ = knowledge
derived from experts; $P'$ = people in general (not only experts); $C_i$ = the $i$-th content category.
(1) Function: formula (4) corresponds to the feedback knowledge produced by domain
experts and administrators, using their knowledge and experience.
(2) Parameter: (a) Feedback knowledge, $K_T$: the feedback of a group of members on this
information and knowledge. (b) The knowledge derived from experts, $K_P$ (refer to Model 3).
(c) The people, $P'$: here, people include both experts and non-experts. (d) Content category,
$C_i$: contents are based on experts’ insights, which are extracted from the solution approach
information. Contents should incorporate different categories, for instance, fact,
reason, suggestion, etc.
(3) Contents: feedback knowledge shows different professional knowledge from different
standpoints and opinions about the truth from each member.
Model 5: Semi-structured knowledge model
$K = I_G + I_S + K_P + K_T$ (5)
where: $K$ = semi-structured knowledge; $I_G$ = general information; $I_S$ = solution approach information; $K_P$ = knowledge derived from experts; $K_T$ = feedback knowledge.
Formula (5) corresponds to the final semi-structured knowledge, which incorporates general
information, solution approach information, the knowledge coming from experts and feedback
knowledge.
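The way the five models feed into each other can be sketched as a chain of functions. This is only an illustration under stated assumptions: the article does not specify how each function computes its output, so the bodies below are placeholders, the "+" in formula (5) is read as grouping the four parts into one record, and the class name, the helper `assemble`, and the sample values (such as "route planning") are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SemiStructuredKnowledge:
    general_information: dict   # I_G, content of Model 1
    solution_information: dict  # I_S, content of Model 2
    expert_knowledge: str       # K_P, output of Model 3
    feedback_knowledge: list    # K_T, output of Model 4

def f_E(o, p):
    """Model 1: describe the problematic event E for organization o and process p."""
    return {"organization": o, "process": p}

def f_S(w1, w2, w3, w4, w5, h1):
    """Model 2: describe the subject S by its 5W1H dimensions."""
    return {"what": w1, "where": w2, "who": w3, "when": w4, "why": w5, "how": h1}

def f_P(E, S, P):
    """Model 3: placeholder for the knowledge an expert P derives for subject S."""
    return f"{P}: findings on '{S['what']}' for the {E['process']} process"

def f_T(K_P, people, C_i):
    """Model 4: placeholder feedback, one entry per person, in content category C_i."""
    return [{"from": person, "category": C_i, "on": K_P} for person in people]

def assemble(E, S, P, people, C_i):
    """Model 5: K = I_G + I_S + K_P + K_T, read here as grouping the four parts."""
    K_P = f_P(E, S, P)
    K_T = f_T(K_P, people, C_i)
    return SemiStructuredKnowledge(E, S, K_P, K_T)

E = f_E("Routes Plan Dep.", "route planning")
S = f_S("Routes analysis", "Taipei division", "John Li", "2001/06/01",
        "relate customers to their behaviors", "Association Analysis")
K = assemble(E, S, "John Li", ["John Li", "Mendy Wang"], "suggestion")
```

The point of the sketch is the data flow: Models 1 and 2 produce the inputs of Model 3, the output of Model 3 is an input of Model 4, and Model 5 collects all four results.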
Transformation of semi-structured knowledge with XML
Different types of information in the generation process are derived from different subjects and
solution approaches. For example, an association rule approach may be used to generate
solution approach information, whereas a classification approach may be used to form
captured information. Therefore, it is critical to design the structure of knowledge to achieve
knowledge externalization and a standard storage system providing a place for the semi-structured knowledge to reside. Moreover, XML shares many features of semi-structured data
and semi-structured knowledge. For example, its structure can be irregular, is not always
known in advance, and may change frequently and without any notice. Therefore, this article
delineates the semi-structured knowledge using XML because XML-based SSK documents are
easy to store and manage. Figure 4 illustrates an XML-based documentation of contents for the
semi-structured knowledge, which is constructed using the Zachman Framework.
To be able to support and transform semi-structured knowledge, XML is implemented to
define standard documents for externalization, including three different categories: general
information, solution approach information, and feedback knowledge. The association rule of
data mining is illustrated as an example below.
General information ($I_G$) – includes common and fundamental information. To help users
clearly understand the overall framework of semi-structured knowledge, “documents”
correspond to the objective in this format (see Figure 5).
Problematic event:
According to formula (1) above, the event to which the problem pertains is clarified, and the types
of organizational structure and processes needed to solve the problem are specified. For
Figure 4 An example of XML-based documentation representing semi-structured
knowledge
Figure 5 XML format for general information ($E = f_E(o, p)$)
example, the department is indicated by “place” while the relevant person is specified
by “who”. Here, identification of the problematic types of organization structure and
process is similar to business process reengineering.
Format:
<General_Information>
  <Event>
    <Event_Name>
      Less percentage of seat utilization
    </Event_Name>
    <Event_Observer>
      #56321578, Mendy Wang
    </Event_Observer>
    <Event_Description>
      During this month (2002/05) travel route 13, 13% of seat utilization decline.
    </Event_Description>
    <Event_When>2002/06/01</Event_When>
    <Event_Where>travel route 13, plan routes</Event_Where>
  </Event>
</General_Information>
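A general-information document of this shape could be generated programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the function name `build_general_information` is hypothetical, and the tag names and values are those of the example above.

```python
import xml.etree.ElementTree as ET

def build_general_information(name, observer, description, when, where):
    """Build a <General_Information> element with one <Event> child."""
    root = ET.Element("General_Information")
    event = ET.SubElement(root, "Event")
    fields = [
        ("Event_Name", name),
        ("Event_Observer", observer),
        ("Event_Description", description),
        ("Event_When", when),
        ("Event_Where", where),
    ]
    for tag, text in fields:
        ET.SubElement(event, tag).text = text
    return root

root = build_general_information(
    name="Less percentage of seat utilization",
    observer="#56321578, Mendy Wang",
    description="During this month (2002/05) travel route 13, 13% of seat utilization decline.",
    when="2002/06/01",
    where="travel route 13, plan routes",
)
# Serialize to a string that can be stored as an SSK document.
xml_text = ET.tostring(root, encoding="unicode")
```

Building the element tree instead of concatenating strings keeps the document well-formed even when a field contains characters such as `<` or `&`.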
Solution approach information ($I_S$) – the contents of Figure 6 show that 5W and 1H
are important components of solution approach information. The components of
“<Subject_How>” are emphasized. Different approaches contain different contents. The
association rule analysis in XML is exemplified below:
Figure 6 XML format for solution approach information ($S = f_S(w_1, w_2, w_3, w_4, w_5, h_1)$)
Subject:
The 5W1H (What, Where, Who, When, Why and How) is used to specify ‘‘Subject’’
corresponding to formula (2) above.
Format:
<Process_Information>
  <Subject>
    <Subject_What>
      <Id>#S0982834</Id>
      <Name>Routes analysis</Name>
      <Classification>Plan</Classification>
    </Subject_What>
    <Subject_Who>
      <Expert_Info>
        <Expert_Name>John Li</Expert_Name>
        <Expert_Id>#7985462</Expert_Id>
        <Expert_Depart>Routes Plan Dep.</Expert_Depart>
        <Expert_Position>Plan manager</Expert_Position>
      </Expert_Info>
      <Target>customer and travel routes</Target>
    </Subject_Who>
    <Subject_When>
      <Process_Time>2001/06/01</Process_Time>
      <Data_Period>2001/05/01~2001/05/31</Data_Period>
    </Subject_When>
    <Subject_Where>
      <Process_Location>Taipei division</Process_Location>
      <Target_Location>Routes in Taipei</Target_Location>
      <Source_Location>
        "SELECT * FROM customers, sales"
      </Source_Location>
      <Storage_Location>20010601132201.xml
      </Storage_Location>
    </Subject_Where>
    <Subject_Why>
      <Reason>
        By checking the customer and sales data, we can find
        out the relation between a customer and his behaviors.
      </Reason>
    </Subject_Why>
    <Subject_How>
      <Tech_info>
        <Tech_Field>Data Mining</Tech_Field>
        <Tech_Method>Association Analysis</Tech_Method>
        <Tech_Tools>IBM Intelligent Miner</Tech_Tools>
      </Tech_info>
      <Specific_Info>
        <Itemset>
          {20~25, 25~30, Yang Ming Shan NP, Tourism River, 2days,
          3days, tickets}
        </Itemset>
        <Min_Sup>0.5</Min_Sup>
        <Min_Conf>0.75</Min_Conf>
        <Result>
          <Rule>
            <Id>#R785123</Id>
            <Large_Itemset>
              {25~30, Tourism River, 2days, one way ticket}
            </Large_Itemset>
          </Rule>
        </Result>
      </Specific_Info>
    </Subject_How>
  </Subject>
</Process_Information>
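To show how the 5W1H fields of such a document might be read back, the sketch below parses a trimmed copy of the example and returns one value per dimension together with the two thresholds. The trimmed sample, the function name `summarize_subject`, and the choice of which child element represents each dimension are assumptions made for illustration; the tag `Specific_Info` is assumed to be spelled as written here.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical copy of the solution approach example.
SAMPLE = """<Process_Information>
  <Subject>
    <Subject_What><Id>#S0982834</Id><Name>Routes analysis</Name></Subject_What>
    <Subject_Who><Expert_Info><Expert_Name>John Li</Expert_Name></Expert_Info></Subject_Who>
    <Subject_When><Process_Time>2001/06/01</Process_Time></Subject_When>
    <Subject_Where><Process_Location>Taipei division</Process_Location></Subject_Where>
    <Subject_Why><Reason>Find the relation between customers and their behaviors.</Reason></Subject_Why>
    <Subject_How>
      <Tech_info><Tech_Method>Association Analysis</Tech_Method></Tech_info>
      <Specific_Info><Min_Sup>0.5</Min_Sup><Min_Conf>0.75</Min_Conf></Specific_Info>
    </Subject_How>
  </Subject>
</Process_Information>"""

def summarize_subject(xml_text):
    """Return one representative value per 5W1H dimension plus the thresholds."""
    subject = ET.fromstring(xml_text).find("Subject")

    def text(path):
        node = subject.find(path)
        return node.text.strip() if node is not None and node.text else None

    return {
        "what": text("Subject_What/Name"),
        "who": text("Subject_Who/Expert_Info/Expert_Name"),
        "when": text("Subject_When/Process_Time"),
        "where": text("Subject_Where/Process_Location"),
        "why": text("Subject_Why/Reason"),
        "how": text("Subject_How/Tech_info/Tech_Method"),
        # The thresholds are assumed to be present; float(None) would raise otherwise.
        "min_sup": float(text("Subject_How/Specific_Info/Min_Sup")),
        "min_conf": float(text("Subject_How/Specific_Info/Min_Conf")),
    }

summary = summarize_subject(SAMPLE)
```

Dimensions other than the thresholds come back as `None` when absent, which matches the observation that some SSK dimensions may be incomplete.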
Knowledge ($K_P + K_T$) – the knowledge ($K_P + K_T$) includes: (1) experts’ explanation of the results
of solution approach information, and (2) feedback from a group of members regarding this
information and knowledge. The same fact, explained differently by different members,
may have different implications. Thus, the value of knowledge increases in a spiral through
communication and sharing in an organization. Through (1) different expert knowledge from
group members, (2) different standpoints and opinions on the fact, and (3) the function of
grouping feedback, the knowledge expresses not only various personal professional opinions
but also the understanding of each member. It can help to avoid the trap of self-consciousness,
and to increase the quality and accuracy of decision-making.
The format of XML illustrating the association rule analysis is shown in Figure 7.
Knowledge:
There are two types of knowledge – experts’ explanation of the results of solution
approach information, and the feedback from group members regarding this information
and knowledge, corresponding to formula (3) and formula (4) above.
Format:
<Feedback_Knowledge>
  <Knowledge>
    <Knowledge_Who>
      <Id>#7985462</Id>
      <Depart>Routes Plan Dep.</Depart>
      <Name>John Li</Name>
      <Position>Plan manager</Position>
    </Knowledge_Who>
    <Knowledge_When>2001/06/01</Knowledge_When>
    <Knowledge_Where>Routes plan</Knowledge_Where>
    <Reference_Id>#R785123,#R7895462</Reference_Id>
    <Knowledge_Why>
      (1) New policy about holidays
      (2) Special celebration
    </Knowledge_Why>
    <Suggest>
      Company should improve routing schedule for special celebration.
    </Suggest>
  </Knowledge>
</Feedback_Knowledge>
Figure 7 XML format of feedback knowledge ($K_P = f_P(E, S, P)$; $K_T = f_T(K_P, P', C_i)$)
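Because each feedback entry cites the rules it comments on through `Reference_Id`, feedback can be looked up by rule. The sketch below, based on a trimmed copy of the example, builds such a lookup; the function name `feedback_by_rule` is hypothetical, and treating `Reference_Id` as a comma-separated list is an assumption drawn from the example value.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical copy of the feedback knowledge example.
FEEDBACK = """<Feedback_Knowledge>
  <Knowledge>
    <Knowledge_Who><Id>#7985462</Id><Name>John Li</Name></Knowledge_Who>
    <Knowledge_When>2001/06/01</Knowledge_When>
    <Reference_Id>#R785123,#R7895462</Reference_Id>
    <Suggest>Company should improve routing schedule for special celebration.</Suggest>
  </Knowledge>
</Feedback_Knowledge>"""

def feedback_by_rule(xml_text):
    """Map each referenced rule id to the suggestions that cite it."""
    index = {}
    for knowledge in ET.fromstring(xml_text).iter("Knowledge"):
        suggestion = (knowledge.findtext("Suggest") or "").strip()
        refs = knowledge.findtext("Reference_Id") or ""
        for rule_id in (r.strip() for r in refs.split(",")):
            if rule_id:
                index.setdefault(rule_id, []).append(suggestion)
    return index

rule_feedback = feedback_by_rule(FEEDBACK)
```

With such a lookup, the expert explanation and group feedback attached to a rule such as `#R785123` can be retrieved together with the rule itself.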
In this article, the generation process of semi-structured knowledge represented through the
Zachman Framework clarifies the processes and activities of knowledge management in
organizations. This semi-structured knowledge presented as XML documents is helpful in
packaging, storing, managing, and sharing knowledge in organizations.
The search of semi-structured knowledge in organizations
Because the contents of these semi-structured knowledge documents are variable and
irregular, they are difficult to search and share. Latent semantic indexing (LSI) is a useful,
automatic way to retrieve relevant semi-structured knowledge documents that match the sentences
or articles users submit as queries. Furthermore, LSI is able to mitigate
problems of synonymy and polysemy, and to improve accuracy in searching for solution
documents (Letsche, 1996; Deerwester et al., 1990). Therefore, in this article LSI is used to
find the desired XML documents for the people who need them.
Using LSI for semi-structured knowledge documents
The latent semantic indexing information retrieval model builds upon prior research in
information retrieval, and uses the singular value decomposition (SVD) to reduce the dimensions
of the term-document space to solve the synonymy and polysemy problems that plague
automatic information retrieval systems (Letsche, 1996). By reducing the dimensionality of
the term-document space, the underlying semantic relationships between documents are
revealed, and much of the ‘‘noise’’ (differences in word usage, terms that do not help
distinguish documents, etc.) is eliminated.
There are three main steps in using LSI to identify relevant semi-structured knowledge
documents in organizations (Letsche, 1996; Story, 1996; Chen, 1999):
Step 1 – Pre-processing: some markup is removed, and all hand-indexed entries are
removed from the collection. Upper-case characters are converted to lower case, punctuation is
removed, white space is used to delimit terms, and so on.
Step 2 – Singular value decomposition (SVD): in the LSI model, terms and documents are
represented by an m × n incidence matrix, A. Once the m × n matrix A has been created, a
rank-k approximation to A, Ak (k ≪ min(m, n)), is computed using an orthogonal
decomposition known as the singular value decomposition (SVD). With regard to LSI, Ak is the closest
k-dimensional approximation to the original term-document space represented by the
incidence matrix A.
Step 3 – Query formulation: in the LSI model, queries are formed into ‘‘pseudo-documents’’
that specify the location of the query in the reduced term-document space.
Once the query is projected into the term-document space, similarity measures, e.g. the
cosine similarity measure, can be applied to compare the position of the pseudo-document
to the positions of the terms or documents in the reduced term-document space. Once the
similarities between the pseudo-document and all the terms and documents in the space
have been computed, the terms or documents are ranked according to the results of the
similarity measure, and the highest-ranking terms or documents, or all the terms and
documents exceeding some threshold value, are returned to the user.
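The pre-processing of Step 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name preprocess is hypothetical, markup is removed with a simple regular expression rather than a full XML parser, and hyphens are deliberately kept so that compound terms such as ‘‘Passenger-Kilometers’’ survive as single tokens. The removal of hand-indexed entries is not shown.

```python
import re
import string


def preprocess(xml_text):
    """Strip markup, lower-case, drop punctuation, and split on white space."""
    # Remove anything that looks like a tag, e.g. <Event_Name> or </Event>.
    text = re.sub(r"<[^>]+>", " ", xml_text)
    # Translate upper-case characters into lower case.
    text = text.lower()
    # Replace punctuation with spaces; hyphens are kept (an assumption of
    # this sketch) so that hyphenated terms remain one token.
    punct = string.punctuation.replace("-", "")
    text = text.translate(str.maketrans(punct, " " * len(punct)))
    # White space delimits terms.
    return text.split()


tokens = preprocess("<Event_Name>The decreasing of Passenger-Kilometers</Event_Name>")
```

Note that this simple rule also splits decimal numbers such as 157.294 at the period; a real system would need a more careful tokenizer for numeric content.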
Illustrative example
This example, which illustrates the three main steps in the transportation industry, assumes that:
(1) a new problematic event, ‘‘utilization of flight seats is decreasing’’, has been discovered;
(2) to keep the example compact, the collection contains only six semi-structured knowledge
documents, abstracts of which are shown in
Table II; and
(3) as the first step, pre-processing is completed through intelligent agents (Huang, 2001), and
the terms and the keywords are taken from the abstracts of the documents.
The SSK searching system determines which semi-structured knowledge document in the
collection is most likely to contain the potential knowledge, conceptualized by a set of
keywords, that supports the manager in solving the problem. After pre-processing, a
term-document matrix, whose entries record the occurrence of terms in the documents, is
formed. The term-document matrix, matrix A, resulting from the selected terms and
documents, is shown in Table III. Each entry in the matrix gives the number of times a
particular term occurs in a given document.
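A term-document matrix of this kind can be built by counting how often each selected term occurs in each document. The sketch below is illustrative only: the function name term_document_matrix is hypothetical, matching is done by plain case-insensitive substring counting, and the two sample texts are short excerpts taken from the event descriptions in Table II rather than the full documents.

```python
def term_document_matrix(terms, documents):
    """Rows are terms, columns are documents; entries are occurrence counts."""
    lowered = [doc.lower() for doc in documents]
    # str.count returns the number of non-overlapping occurrences.
    return [[doc.count(term.lower()) for doc in lowered] for term in terms]


terms = ["passengers carried", "grow up", "ticket revenues"]
documents = [
    # Excerpt from the passenger traffic volume document.
    "In 1998, the passengers carried is 157.294 million, to grow up 5.38%. "
    "In 1999, the passengers carried is 160.330 million, to grow up 1.93%.",
    # Excerpt from the ticket revenues document.
    "E1 station to A1 station have most Ticket Revenues, "
    "17.09% of the total Ticket Revenues.",
]

A = term_document_matrix(terms, documents)
for term, row in zip(terms, A):
    print(term, row)
```

Because multi-word terms such as ‘‘passengers carried’’ are matched as phrases, the counting is done on the text itself rather than on single-word tokens.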
Next, matrix A is decomposed by using the singular value decomposition (SVD). SVD produces
matrices U, S, and V, such that $A = U S V^{T}$. Matrix A is the term-document matrix shown in
Table III; its rank is 6, so only six dimensions are needed to represent it. U (19 × 19) and
V (6 × 6) are unitary matrices, and S (19 × 6) is a diagonal matrix of singular
values. The six-dimensional space is then projected onto a two-dimensional subspace by
selecting the rows and columns of U, S, and V corresponding to the two largest singular values.
The resulting two-dimensional matrices are:
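$$
U_2\ (19 \times 2) =
\begin{bmatrix}
0.388949 & 0.115807 \\
0.06021 & -0.11727 \\
0.023998 & -0.03115 \\
0.145742 & -0.23285 \\
0.026749 & -0.04346 \\
0.319761 & -0.63229 \\
0.118993 & -0.1894 \\
0.046536 & 0.024753 \\
0.023998 & -0.03115 \\
0.011999 & -0.01557 \\
0.632945 & 0.340607 \\
0.284105 & 0.148832 \\
0.051426 & 0.025066 \\
0.098844 & 0.054132 \\
0.146262 & 0.083198 \\
0.246228 & -0.53228 \\
0.093072 & 0.049507 \\
0.093072 & 0.049507 \\
0.331925 & 0.203465
\end{bmatrix}
$$

$$
S_2\ (2 \times 2) =
\begin{bmatrix}
10.518864 & 0 \\
0 & 10.171535
\end{bmatrix}
$$

$$
V_2\ (6 \times 2) =
\begin{bmatrix}
0.281365 & -0.442010 \\
0.126218 & -0.158400 \\
0.351975 & -0.750821 \\
0.540943 & 0.254956 \\
0.489504 & 0.251780 \\
0.498782 & 0.295650
\end{bmatrix}
$$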
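For readers who wish to reproduce the two largest singular values, the sketch below applies power iteration with deflation to the Gram matrix of the term-document matrix, using only the Python standard library. Power iteration is a technique substituted here for illustration; it is not the method used in this article, and in practice a library SVD routine would normally be used. The function name top_singular_triplets is hypothetical. One assumption must be stated loudly: Table III prints 2 for the term ‘‘Decrease’’ in Doc2, but the published U2, S2, and V2 are consistent only with a count of 0 (which also agrees with the text of Doc2), so 0 is used in the code.

```python
import math

# Columns of Table III, one list of 19 term counts per document.
# ASSUMPTION: "Decrease" in Doc2 is taken as 0 (Table III prints 2); see lead-in.
DOCS = [
    [1, 1, 0, 5, 1, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # Doc1
    [1, 0, 2, 1, 0, 1, 1, 0, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0],  # Doc2
    [0, 1, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0],  # Doc3
    [5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 1, 1, 1, 1, 0, 0, 0, 0],  # Doc4
    [2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 4, 5, 0, 0, 0, 0, 2, 2, 0],  # Doc5
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 1, 2, 0, 0, 0, 7],  # Doc6
]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def top_singular_triplets(docs, k=2, iterations=2000):
    """Return [(sigma, u, v), ...] for the k largest singular values of A."""
    n = len(docs)
    n_terms = len(docs[0])
    # Gram matrix G = A^T A; its eigenvectors are the right singular vectors.
    gram = [[dot(docs[i], docs[j]) for j in range(n)] for i in range(n)]
    triplets = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iterations):
            # Deflation: remove components along vectors already found.
            for _, _, prev in triplets:
                c = dot(v, prev)
                v = [a - c * b for a, b in zip(v, prev)]
            w = [dot(row, v) for row in gram]
            norm = math.sqrt(dot(w, w))
            v = [a / norm for a in w]
        # Rayleigh quotient gives sigma^2 for the unit vector v.
        sigma = math.sqrt(dot(v, [dot(row, v) for row in gram]))
        # Left singular vector u = A v / sigma.
        u = [sum(docs[j][i] * v[j] for j in range(n)) / sigma
             for i in range(n_terms)]
        triplets.append((sigma, u, v))
    return triplets


triplets = top_singular_triplets(DOCS)
for sigma, _, v in triplets:
    print(round(sigma, 4), [round(x, 4) for x in v])
```

Singular vectors are determined only up to sign, so a computed column may be the negative of the corresponding column of U2 or V2; this does not affect the cosine ranking of documents.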
The vector for the query ‘‘To find anything of passengers carried and decrease’’ is

$$q = [0\ 1\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0]^{T}$$

where T indicates transpose. The terms ‘‘decrease’’ and ‘‘passengers carried’’ are the 2nd and
4th terms in the index, and no other terms are selected. With q as the query vector, the
document space vector corresponding to q is given by:

$$D_q = q^{T}\, U_2\, S_2^{-1}$$

For the product vector, the result is:

$$D_q = [0.0196\ \ -0.0344]$$
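The projection of the query into the reduced space can be checked with a short Python sketch. The values of U2 and the two singular values are those given above; the function name project_query is a hypothetical helper, and the diagonal matrix S2 is stored simply as a pair of singular values, so multiplying by its inverse becomes a division.

```python
# Rows of U2 (one row per indexed term, in the order of Table III).
U2 = [
    (0.388949, 0.115807), (0.06021, -0.11727), (0.023998, -0.03115),
    (0.145742, -0.23285), (0.026749, -0.04346), (0.319761, -0.63229),
    (0.118993, -0.1894), (0.046536, 0.024753), (0.023998, -0.03115),
    (0.011999, -0.01557), (0.632945, 0.340607), (0.284105, 0.148832),
    (0.051426, 0.025066), (0.098844, 0.054132), (0.146262, 0.083198),
    (0.246228, -0.53228), (0.093072, 0.049507), (0.093072, 0.049507),
    (0.331925, 0.203465),
]
# Diagonal of S2.
SINGULAR_VALUES = (10.518864, 10.171535)


def project_query(q, u, singular_values):
    """Compute q^T U_k S_k^{-1} for a diagonal S_k."""
    return [
        sum(q[i] * u[i][j] for i in range(len(q))) / s
        for j, s in enumerate(singular_values)
    ]


# The 2nd and 4th terms ("decrease", "passengers carried") are selected.
q = [0, 1, 0, 1] + [0] * 15
Dq = project_query(q, U2, SINGULAR_VALUES)
print([round(x, 4) for x in Dq])
```

Rounded to four decimal places, the two components agree with the Dq vector reported in the text.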
Table II Examples extracted from semi-structured knowledge documents

Doc1
<General_Information>
  <Event>
    <Event_Name>The decreasing of passenger traffic volume</Event_Name>
    <Event_Observer>Kevin Lee</Event_Observer>
    <Event_Description>
      Passenger traffic volume decrease year by year, because the passengers carried to shift personal traffic tools.
      In 1998, the passengers carried is 157.294 million, to grow up 5.38%.
      In 1999, the passengers carried is 160.330 million, to grow up 1.93%.
      In 2000, the passengers carried is 159.981 million, to grow up -0.22%.
      In 2001, the passengers carried is 159.438 million, to grow up -0.34%.
    </Event_Description>
    <Event_When>2002/12/03</Event_When>
    <Event_Where>Customer service Dep.</Event_Where>
  </Event>
</General_Information>

Doc2
<General_Information>
  <Event>
    <Event_Name>Passenger-Kilometers raised</Event_Name>
    <Event_Observer>Joe Lu</Event_Observer>
    <Event_Description>
      The FAST rapid transit system to finished, FAST rapid transit system’s passengers carried to increase, bring ABC Transport Company’s passenger traffic volume increase.
      In 2000, Passenger-Kilometers is 9978 million, grow up 2.11%.
    </Event_Description>
    <Event_When>2002/11/05</Event_When>
    <Event_Where>Sales department</Event_Where>
  </Event>
</General_Information>

Doc3
<General_Information>
  <Event>
    <Event_Name>The decreasing of Passenger-Kilometers</Event_Name>
    <Event_Observer>Tom Wu</Event_Observer>
    <Event_Description>
      Passenger-Kilometers decrease year by year.
      In 1996, Passenger-Kilometers is 9542 billion, grow up -0.5%.
      In 1997, Passenger-Kilometers is 9505 billion, grow up -3.14%.
      In 1998, Passenger-Kilometers is 9489 billion, grow up -2.49%.
      In 1999, Passenger-Kilometers is 8969 billion, grow up -0.63%.
      In 2000, Passenger-Kilometers is 9254 billion, grow up -0.89%.
      In 2001, Passenger-Kilometers is 9784 billion, grow up -0.52%.
    </Event_Description>
    <Event_When>2002/03/08</Event_When>
    <Event_Where>Sales department</Event_Where>
  </Event>
</General_Information>

Doc4
<General_Information>
  <Event>
    <Event_Name>A1 Station go to north Passenger Traffic Volume</Event_Name>
    <Event_Observer>Tom Wu</Event_Observer>
    <Event_Description>
      A1 Station go to north Passenger Traffic Volume.
      A1 station to B1 station Passenger Traffic Volume is 3.75 billion.
      A1 station to C1 station Passenger Traffic Volume is 1.4 billion.
      A1 station to D1 station Passenger Traffic Volume is 1.1 billion.
      A1 station to E1 station Passenger Traffic Volume is 0.65 billion.
    </Event_Description>
    <Event_When>2002/03/08</Event_When>
    <Event_Where>A1 Station</Event_Where>
  </Event>
</General_Information>

Doc5
<General_Information>
  <Event>
    <Event_Name>A1 station to B1 station Passenger Traffic Volume not same</Event_Name>
    <Event_Observer>Bill Hu</Event_Observer>
    <Event_Description>
      A1 station to B1 station and B1 station to A1 station Passenger Traffic Volume not same.
      B1 station to A1 station select columniation to check the number, every day Passenger Traffic Volume is 27 hundred.
      B1 station to A1 station not sure select columniation to check the number.
    </Event_Description>
    <Event_When>2002/01/28</Event_When>
    <Event_Where>A1 Station</Event_Where>
  </Event>
</General_Information>

Doc6
<General_Information>
  <Event>
    <Event_Name>A1 station to every station Ticket Revenues</Event_Name>
    <Event_Observer>Charlie Tseng</Event_Observer>
    <Event_Description>
      A1 station to every station Ticket Revenues, E1 station to A1 station have most Ticket Revenues, 17.09% of the total Ticket Revenues.
      A1 station to E1 station Ticket Revenues, 15.51% of the total Ticket Revenues.
      A1 station to D1 station Ticket Revenues, 13% of the total Ticket Revenues.
    </Event_Description>
    <Event_When>2002/05/08</Event_When>
    <Event_Where>Station</Event_Where>
  </Event>
</General_Information>
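Table III Term-document matrix

| Term | Doc1 | Doc2 | Doc3 | Doc4 | Doc5 | Doc6 |
|---|---|---|---|---|---|---|
| Passenger traffic volume | 1 | 1 | 0 | 5 | 2 | 0 |
| Decrease | 1 | 2 | 1 | 0 | 0 | 0 |
| Increase | 0 | 2 | 0 | 0 | 0 | 0 |
| Passengers carried | 5 | 1 | 0 | 0 | 0 | 0 |
| Personal traffic tools | 1 | 0 | 0 | 0 | 0 | 0 |
| Grow up | 4 | 1 | 6 | 0 | 0 | 0 |
| Million | 4 | 1 | 0 | 0 | 0 | 0 |
| Hundred | 0 | 0 | 0 | 0 | 1 | 0 |
| FAST rapid transit system | 0 | 2 | 0 | 0 | 0 | 0 |
| Finished | 0 | 1 | 0 | 0 | 0 | 0 |
| A1 station | 0 | 0 | 0 | 5 | 4 | 4 |
| B1 station | 0 | 0 | 0 | 1 | 5 | 0 |
| C1 station | 0 | 0 | 0 | 1 | 0 | 0 |
| D1 station | 0 | 0 | 0 | 1 | 0 | 1 |
| E1 station | 0 | 0 | 0 | 1 | 0 | 2 |
| Passenger-Kilometers | 0 | 1 | 7 | 0 | 0 | 0 |
| Columniation | 0 | 0 | 0 | 0 | 2 | 0 |
| Check the number | 0 | 0 | 0 | 0 | 2 | 0 |
| Ticket Revenues | 0 | 0 | 0 | 0 | 0 | 7 |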
To Žnd the best document match, the Dq vector is compared against all the document vectors
in the two-dimensional V2 space. The document vector that is nearest in direction to Dq is the
best match. The cosine of the angle between the query vector and the document vector is a
convenient measure of goodness-of-Žt. The cosine values for the six document vectors and the
query vector are:
[0.9988
0.9880
0.9968
0.0774
0.0428
–0.0172]
The best Žt for product query vectors is indicated by the Žrst document, which is Doc1. The
third document, Doc3, is also indicated as a good solution.
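The cosine comparison and ranking can be sketched as follows, using the rows of V2 as document vectors and the rounded Dq reported in the text. The function name cosine and the threshold value of 0.9 are assumptions made for this illustration; the article does not prescribe a particular threshold.

```python
import math

# Rows of V2: one two-dimensional vector per document, Doc1 to Doc6.
V2 = [
    (0.281365, -0.442010),
    (0.126218, -0.158400),
    (0.351975, -0.750821),
    (0.540943, 0.254956),
    (0.489504, 0.251780),
    (0.498782, 0.295650),
]
# Query vector in the reduced space, as reported in the text.
Dq = (0.0196, -0.0344)


def cosine(x, y):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


scores = [cosine(Dq, doc) for doc in V2]
# Rank documents from most to least similar to the query.
ranking = sorted(range(len(V2)), key=lambda i: scores[i], reverse=True)
labels = ["Doc%d" % (i + 1) for i in ranking]
# Alternatively, return every document above a (hypothetical) threshold.
THRESHOLD = 0.9
hits = ["Doc%d" % (i + 1) for i in ranking if scores[i] >= THRESHOLD]
print(labels)
print(hits)
```

With these values the first three documents score close to 1 and the last three close to 0, so either the top-ranked document or the thresholded list could be returned to the user, as described in Step 3.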
Conclusion
Knowledge should be seen from different viewpoints and be used diversely. Through the
Zachman Framework and XML, organizational knowledge can be externalized systematically
as semi-structured knowledge documents, and stored and managed effectively. Because the
irregular structure of semi-structured knowledge documents lowers the accuracy of search results,
latent semantic indexing (LSI) is applied to retrieve relevant semi-structured knowledge
documents quickly and accurately in support of knowledge management activities.
The main contributions of this article are that:
(1) The generation of the novel concept of semi-structured knowledge (SSK) is represented
through the Zachman Framework, which clarifies the processes and activities in organizational
knowledge management.
(2) The semi-structured knowledge, delineated as XML documents, is helpful in packaging, storing,
managing, and sharing SSK in organizations.
(3) The LSI model and an easy search system are developed to diffuse relevant knowledge to
the person who needs it. Through the LSI tools, relevant semi-structured knowledge
documents can be identified to support knowledge workers in their daily duties or
decision-making.
The SSK approach shows great promise for organizations to acquire, store, disseminate, and
reuse knowledge.
Acknowledgment
This work was partially supported by funding from the National Science Council of Taiwan
(NSC91-2146-E-260-005).
References
Abiteboul, S. (1997), ‘‘Querying semi-structured data’’, ICDT ’97, 6th International Conference, Delphi,
Greece, pp. 1-18.
Agrawal, R. and Srikant, R. (1994), ‘‘Fast algorithms for mining association rules in large databases’’,
Proceedings of 20th International Conference on Very Large Data Bases, San Francisco, CA, pp. 487-99.
Alavi, L. (2001), ‘‘Review: Knowledge management and knowledge management systems: conceptual
foundations and research issues’’, MIS Quarterly, Vol. 25 No. 1, pp. 107-36.
Alavi, M. and Leidner, D. (1999), ‘‘Knowledge management systems: emerging views and practices from the
field’’, Proceedings of 32nd Hawaii International Conference on System Sciences, Hawaii, pp. 4-11.
Becker, M.C. (2001), ‘‘Managing dispersed knowledge: organizational problems, managerial strategies,
and their effectiveness’’, Journal of Management Studies, Vol. 38 No. 7, pp. 1037-51.
Birbeck, M. (2000), Professional XML, Wrox Press, Chicago, IL.
Chauvel, D. and Despres, C. (2002), ‘‘A review of survey research in knowledge management: 1997-2001’’,
Journal of Knowledge Management, Vol. 6 No. 3, pp. 207-23.
Chen, C. (1999), ‘‘Visualising semantic spaces and author co-citation networks in digital libraries’’,
Information Processing and Management, Vol. 35, pp. 401-20.
Collins, H. (1997), Human, Machines, and the Structure of Knowledge, Knowledge Management Tools,
Butterworth-Heinemann, New York, NY.
Davenport, T. and Prusak, L. (1998), Working knowledge, Harvard Business School Press, Boston, MA.
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K. and Harshman, R. (1990), ‘‘Indexing by latent
semantic analysis’’, Journal of the American Society for Information Science, Vol. 41, pp. 391-407.
Dzbor, M., Paralic, J. and Paralic, M. (2000), ‘‘Knowledge management in a distributed organization’’, 4th
IEEE/IFIP Conference on IT for Balanced Automation Systems, Berlin.
Fabio, A.A. (2001), XML Developer’s Guide, McGraw-Hill, New York, NY.
Fischer, G. and Ostwald, J. (2001), ‘‘Knowledge management: problems, promises, realities, and
challenges’’, IEEE Intelligent Systems, Vol. 16 No. 1, pp. 60-72.
Grant, R.M. (1996), ‘‘Toward a knowledge management and the N-form corporation’’, Strategic
Management Journal, Vol. 15, pp. 73-90.
Huang, Chun-Che (2001), ‘‘Using intelligent agents to manage fuzzy business process’’, IEEE Transactions
on Systems, Man, and Cybernetics, Part A, Vol. 31 No. 6, pp. 508-23.
Inmon, W.H., Zachman, J.A. and Geiger, J.G. (1997), Data Stores, Data Warehousing, and the Zachman
Framework: Managing Enterprise Knowledge, McGraw-Hill, New York, NY.
Kanter, J. (1999), ‘‘Knowledge management, practically speaking’’, Information Systems Management,
pp. 7-15.
Klint, P. and Verhoef, C. (2002), ‘‘Enabling the creation of knowledge about software assets’’, Data and
Knowledge Engineering, Vol. 41 No. 2-3, pp. 141-58.
Letsche, T.A. (1996), ‘‘Toward large-scale information retrieval using latent semantic indexing’’, Master’s
thesis, Department of Computer Science, University of Tennessee.
Levy, A. and Rousset, M. (1998), ‘‘Carin: a knowledge representation language combining horn rules and
description logics’’, Artificial Intelligence Journal, Vol. 104, pp. 165-209.
Liebowitz, J. (1997), Knowledge Management Handbook, CRC Press, New York, NY.
Miller, J.A., Potter, W.D. and Kochut, K.J. (1992), ‘‘Knowledge, data, and models: taking an objective
orientation on integrating these three’’, IEEE Potentials, Vol. 11 No. 4, pp. 13-17.
Mo, J.P.T. and Menzel, C. (1998), ‘‘An integrated process model driven knowledge based system for
remote customer support’’, Computers in Industry, Vol. 37 No. 3, pp. 171-83.
Nonaka, I. and Takeuchi, H. (1995), The Knowledge Creating Company, Oxford University Press, New York,
NY.
O’Leary, D.E. (1998), ‘‘Enterprise knowledge management’’, IEEE Computer, Vol. 31 No. 3, pp. 54-61.
Polanyi, M. (1962), Personal Knowledge, corrected ed., Routledge, London.
Prusak, L. (1996), ‘‘The knowledge advantage’’, Strategy and Leadership, Vol. 24, pp. 6-8.
Roth, A.V. (1996), ‘‘Achieving strategic agility through economies of knowledge’’, Strategy and Leadership,
Vol. 24, pp. 30-7.
Satyadas, A., Harigopal, U. and Cassaigne, N.P. (2001), ‘‘Knowledge management tutorial: an editorial
overview’’, IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews, Vol. 31
No. 4, pp. 429-37.
Spender, J.C. and Grant, R.M. (1996), ‘‘Knowledge and the firm: overview’’, Strategic Management
Journal, Vol. 17, pp. 5-9.
Story, R.E. (1996), ‘‘An explanation of the effectiveness of latent semantic indexing by means of a Bayesian
regression model’’, Information Processing & Management, Vol. 32 No. 3, pp. 329-44.
Turban, E. and Aronson, J.E. (2001), Decision Support Systems and Intelligent Systems, Prentice Hall,
Upper Saddle River, NJ.
Turban, E. and Aronson, J.E. (2001), Decision Support Systems and Intelligent Systems Sixth Edition:
Knowledge Management, Prentice Hall, Upper Saddle River, NJ.
Tuthill, G.S. and Levy, S.T. (1991), Knowledge Based Systems: A Manager’s Perspective, TAB Book, New
York, NY.
Van Rijsbergen, C.J. (1979), Information Retrieval, 2nd ed., Butterworths, London.
Wielinga, B.J., Schreiber, A.Th. and Breuker, J.A. (1992), ‘‘KADS: a modelling approach to knowledge
engineering’’, Knowledge Acquisition, Vol. 4 No. 1, pp. 5-53.
Wiig, K.M. (1999), ‘‘The intelligent enterprise and knowledge management’’, invited article for the UNESCO
Encyclopedia of Life Support Systems, Knowledge Research Institute, Arlington, TX.
ISKM Laboratory of Information Management, National Chi-Nan University, Pu-Li, Nan-Tau, Taiwan (2002),
‘‘Semi-structured knowledge’’, available at: http://iskmlab.im.ncnu.edu.tw/SSK.