The transformation and search of semi-structured knowledge in organizations

Chun-Che Huang and Chia-Ming Kuo

Chun-Che Huang is a Professor, Laboratory of Intelligent Systems & Knowledge Management, Department of Information Management, National Chi-Nan University, Taiwan ([email protected]). Chia-Ming Kuo is a graduate student, Department of Information Management, National Chi-Nan University, Taiwan.

Abstract
Knowledge is perceived as a very important asset for organizations, and knowledge management is critical for organizational competitiveness. Because knowledge is by nature complex and varied, it is difficult to extend the effectiveness of knowledge re-use in organizations. In this article, an approach based on the Zachman Framework is developed to externalize organizational knowledge into semi-structured knowledge, and eXtensible Markup Language (XML) is applied to transform the knowledge into documents. In addition, latent semantic indexing (LSI), which is capable of solving problems of synonymy and polysemy as well as improving the accuracy of document searches, is incorporated to facilitate the search of semi-structured knowledge (SSK) documents based on user demands. The SSK approach shows great promise for organizations to acquire, store, disseminate, and reuse knowledge.

Keywords
Knowledge management, Organizations

Introduction
Knowledge management is becoming more important for individuals and organizations, and is increasingly considered a main source of competitive advantage for corporations (Grant, 1996; Prusak, 1996; Roth, 1996; Spender and Grant, 1996). Davenport and Prusak (1998) define knowledge as a fluid mix of framed experience, values, contextual information, and expert insight that provides a framework for evaluating and incorporating new experience and information. Knowledge is assumed to originate in the minds of knowledge workers (Alavi and Leidner, 1999) and spirals constantly through organizations (Nonaka and Takeuchi, 1995).
Also, in organizations it often becomes embedded not only in documents or repositories, but also in organizational routines, processes, practices, and norms (Davenport and Prusak, 1998). Knowledge can also be defined as the power to act and to make value-producing decisions (Kanter, 1999; Polanyi, 1962). The characteristics of knowledge are complexity, high changeability, abstraction, fuzziness, and lack of structure. Therefore, capturing, transforming, and searching knowledge is a critical issue for organizations.

Basically, knowledge cannot be controlled or managed in a rational, top-down fashion like other assets of the organization (Klint and Verhoef, 2002). Documents are a common source of knowledge in organizations; they partially provide the contents of knowledge based on entries. However, it is known that these contents reflect only a fraction of the knowledge that is encoded in a document (Dzbor et al., 2000). Traditional documents and searching techniques not only make knowledge difficult to discover, but also make it brittle. Consequently, it can be applied only in very limited applications (Miller et al., 1992).

Journal of Knowledge Management, Vol. 7 No. 4, 2003, pp. 106-123, © MCB UP Limited, ISSN 1367-3270, DOI 10.1108/13673270310492985

There are numerous studies on knowledge management. Mo and Menzel (1998) created a model to capture knowledge from company domain experts and from field information available from experienced users. O'Leary (1998) developed a system that is able to capture companywide knowledge. Liebowitz (1997) proposed a study of knowledge assets and the scheduling of their use within organizations. Overviews of developments in knowledge management are provided by Satyadas et al. (2001), Alavi (2001), Fischer and Ostwald (2001), and Chauvel and Despres (2002).
Although many scholars studying KM focus on structured knowledge in defined formats such as rules or procedures, only a few emphasize semi-structured or unstructured knowledge. This article focuses on:
- A definition of semi-structured knowledge: as mentioned above, knowledge changes constantly, and the general observation is that knowledge cannot be controlled or managed in a rational, top-down fashion, frequently existing in an unstructured format (Klint and Verhoef, 2002). Furthermore, people play different roles in an organization, so the requirements of knowledge and its application are diverse. These requirements are expressed as a series of dimensions or abstractions in the Zachman Framework (Inmon and Zachman, 1997). This 5W1H format (what, where, who, when, why, and how) is a suitable approach for externalizing knowledge in organizations because it can capture the nature of each dimension (perspective) and integrate the target knowledge. By applying the Zachman Framework, knowledge in organizations can be transformed systematically into semi-structured knowledge documents.
- Transformation of semi-structured knowledge: it is assumed that whenever organizations experience internal or external changes, large and complex amounts of data, information, and knowledge are generated. Furthermore, most of the generated knowledge is probably expressed in an unstructured or semi-structured manner. Hence, the management of unstructured or semi-structured knowledge is an important issue. Owing to the lack of well-structured storage approaches for KM in the literature, the integration of knowledge activities has been impossible (Nonaka and Takeuchi, 1995). In addition, it is known that the storage structure significantly affects the effectiveness and efficiency of KM activities.
Therefore, this article attempts to apply eXtensible Markup Language (XML) to transform organizational experience effectively into documents of semi-structured knowledge and to manage them.
- Search of semi-structured knowledge: using organizational knowledge effectively and agilely is crucial, because if the desired organizational knowledge cannot be accessed in time, the knowledge is of no use and is no longer active (Turban and Aronson, 2001). However, semi-structured knowledge is usually difficult to search and use because it has no systematic format, which is one of the major problems in its re-use. Information retrieval comprises technologies used to extract interrelated information from semi-structured or unstructured documents and then to present the documents in structured ways (Van Rijsbergen, 1979). However, this approach is limited when operating with synonyms, polysemy, or dictionaries, and search results show low accuracy. To resolve this, latent semantic indexing (LSI) is used to search relevant semi-structured knowledge documents; it provides faster and more accurate retrieval.

Semi-structured knowledge (SSK) in organizations

Main issues of knowledge management
Asymmetry, structural uncertainty (Becker, 2001), and re-use of knowledge are the main issues when organizations attempt to put knowledge management into practice. These issues are illustrated as follows:
- Asymmetry: organizational knowledge is retrieved from experts, documents, repositories, organizational routines, processes, practices, norms, and other events, and it has strong experiential and reflective elements.
Moreover, this knowledge is information that is contextual, relevant, and actionable (Turban and Aronson, 2001). Having the knowledge implies that it can be used to solve a problem, whereas having information does not carry the same connotation. Hence, the ability to act is an integral part of being knowledgeable (Turban and Aronson, 2001).
- Structural uncertainty: knowledge in organizations is hard to manage and store because of its structural uncertainty. To resolve this, the Zachman Framework can be used to describe the knowledge architecture and to capture organizational knowledge derived from knowledge management activities; this knowledge has significant impacts on the organization. Although there may be many types of knowledge, as far as organizations are concerned, knowledge is a fundamental factor behind all organizational activities (Liebowitz, 1997; Wiig, 1999). Structured knowledge may be represented in the format of rules, models, procedures, etc. In contrast, semi-structured knowledge uses the six dimensions (5W and 1H) of the Zachman Framework to capture knowledge in organizations. In organizations, semi-structured knowledge is more significant than structured knowledge because non-structured information is more readily found.
- Re-use: building complex knowledge-based applications requires the incorporation of large amounts of domain knowledge (Levy and Rousset, 1998). In this article, semi-structured knowledge is externalized through XML, whose significant attribute is the ability to hold large amounts of domain knowledge and know-how (Birbeck, 2000; Fabio, 2001). Thus, by using XML, the re-usability of SSK is increased, and domain knowledge and know-how can be shared in organizations more effectively.
Definition of SSK
Semi-structured knowledge is defined here as a collection of knowledge that results from KM activities in organizations and is constructed along the six dimensions of the Zachman Framework; it contains solution approach information, domain knowledge, and know-how, and has an impact on organizations. Some knowledge in organizations is hard to manage and store because of its structural uncertainty. The Zachman Framework provides a systematic approach to externalize unstructured knowledge in organizations, although not all contents of knowledge can be represented in this format. That is why the term "semi" is used in this article (see Figure 1).

Figure 1 Conceptual diagram for semi-structured knowledge

Characteristics of SSK
Abiteboul (1997) defined semi-structured data as data that is neither raw nor strictly organized as in conventional database systems, and described the characteristics of semi-structured data. Semi-structured knowledge has analogous characteristics, which can be described as follows:
- The structure is irregular: SSK includes several heterogeneous dimensions. Some dimensions may be incomplete, and some may comprise additional information (e.g. annotations); or different perspectives on the same kind of information may result in using different dimensions. For example, the pricing of merchandise is perceived differently by the purchasing and accounting departments: one perceives it as cost, whereas the other sees it as selling price.
- The structure is divisible: SSK documents consist of text and grammar, so parsing an SSK document can divide crude information into pieces and discover the relationships between them. Therefore, the SSK structure is constituted from divided information and its significant relationships.
- The structure is a-posteriori: a database management system (DBMS) is based on the hypothesis of a fixed schema that has to be predefined before any data are introduced. This is not the case for semi-structured knowledge, where the notion of schema is often posterior to the existence of data.

Semi-structured knowledge not only shares the characteristics of semi-structured data shown above, but also has characteristics that stem from organizational activities driven by problematic events. It is subject oriented, time dependent, and reference oriented (see Figure 2).
- Subject oriented: semi-structured knowledge in organizations is constituted from a clear and concrete subject. The subject could be, for instance, the desired event, which requires analyzing the nature of entities, e.g. customers in the organization or business processes. Therefore, the generated knowledge is substantially meaningful only to some specific problem domains. For example, knowledge generated from a classification analysis of customers is beneficial only to the marketing department.
- Time dependent: semi-structured knowledge requires information to be retrieved in time. If the information cannot be delivered at the appropriate time, the value of the semi-structured knowledge can be severely affected.
- Reference oriented: semi-structured knowledge, which is discovered from organizational knowledge through data analysis techniques and domain experts, not only provides unique conclusions, but also discovers knowledge and supports conclusions from multiple perspectives. Whenever an organization uses SSK, it should eventually invigorate other types of knowledge.

Numerous classification approaches have been used to distinguish different types of knowledge. For example, Polanyi (1962) divided knowledge into tacit and explicit knowledge.
Tuthill and Levy (1991) proposed that knowledge could be separated into declarative knowledge, procedural knowledge, heuristic knowledge, commonsense knowledge, and informed commonsense knowledge. Knowledge in organizations may be rules, such as rule knowledge; or a procedure, as symbol-type knowledge (Collins, 1997); or commonsense, as commonsense knowledge (Tuthill and Levy, 1991); or a written description, as declarative knowledge (Tuthill and Levy, 1991); or a process and result of experts' inferring, as inference knowledge (Wielinga et al., 1992), and so on. In general, different types of generated knowledge are the results of different classification approaches. Most classifications of knowledge still remain at the conceptual level rather than having practical utility.

Figure 2 Characteristics of semi-structured knowledge (the structure is irregular; the structure is divisible; the structure is a-posteriori; subject oriented; time dependent; reference oriented)

The semi-structured knowledge presented in this article can flexibly externalize most of the knowledge described above and incorporate its characteristics, but it does not organize the beliefs and culture existing in organizations. This is because semi-structured knowledge is simply the collected knowledge that is produced through the generation processes (presented in the next section). SSK does not aim at influencing the relationships, standards, and attitudes between individuals and organizations (e.g. organizational culture); it focuses on decision support and problem solving.

Externalization of semi-structured knowledge with the Zachman Framework
Whenever a problematic event occurs, managers in organizations apply solution approaches and the desired knowledge to solve the problem. Knowledge often evolves from past knowledge and experience, and knowledge is by nature continuous and extendable.
Knowledge changes constantly. Furthermore, the perception of a single event in organizations varies with perspective (e.g. time and place). Therefore, organizational knowledge must be constructed as semi-structured knowledge with additional information from various perspectives (e.g. the 5W1H dimensions in the Zachman Framework). The Zachman Framework represents the perspectives and dimensions in matrix form, with the perspectives as the rows and the dimensions as the columns (see Table I). The columns include:
- Entities (What? Interest or focus areas): considering semi-structured knowledge in organizations, entities are the data, information, events, and knowledge to be manipulated.
- Activities (How? Methods of problem-solving): considering semi-structured knowledge in organizations, activities are the capture, interpretation, measurement, accumulation, deployment, externalization, innovation, and feedback of the entities.
- Locations (Where? Places of interest or focus): considering semi-structured knowledge in organizations, locations are organizations or processes.
- People (Who? Individuals and organizations of interest or focus): considering semi-structured knowledge in organizations, people are users, administrators, or developers.
- Times (When? Activities occurring): considering semi-structured knowledge in organizations, time is the instant when activities occur.
- Motivations (Why? Reasons for inspiration): considering semi-structured knowledge in organizations, motivations are the reasons why activities occur.

A simple example of the semi-structured knowledge about "a good bank account" is presented in the six dimensions. The perspectives and goals of semi-structured knowledge are classified in Table I.

Table I Dimensions of semi-structured knowledge in organizations
Dimension | Perspectives | Goal | Example
Entities (What) | Data, information, knowledge | Accuracy | An association rule
Activities (How) | Capture, interpretation, measurement, accumulation, deployment, externalization, innovation, feedback | Effectiveness/efficiency | Approach of discovering association rule
Locations (Where) | Organization, process | Accuracy location | Marketing department
People (Who) | Users, administrators, developers | Accuracy location | Marketing manager
Time (When) | Time instance | Accurate timing | 2000-2001 year
Motivation (Why) | Reason needed | Motivated or not | Study the relationship between products A and B

Generation and content of semi-structured knowledge in organizations
Semi-structured knowledge in an organization is generated through a series of transformation processes, which are triggered by problematic events. Every step of these processes involves technical and managerial issues. Figure 3 illustrates the generation process of semi-structured knowledge. More details are as follows:
- A dashed line represents the source of data, which indicates a problematic "event". Here, the need for data or for processing information from the organization is triggered by a problematic event occurring in a working process or organization. In other words, the problematic event is defined as the source from which semi-structured knowledge is generated.
- The blocks represent the entities in the process. These are the assessment of requirements, classification of events, subject identification, experts' involvement, and group discussions. The contents in parentheses represent the information or knowledge generated at each step.
- The oval blocks represent data storage systems or knowledge repositories.
- Three different types of arrows are used in the diagram.
Those with solid lines illustrate the direction of the process; those with long dashed lines point out the needs of semi-structured knowledge documents; and those with dot-dashed lines show the paths used to construct semi-structured knowledge.

Figure 3 Flow diagram of generating semi-structured knowledge (elements: event occurred; data operation requirement; data process requirement; documents; repositories; distribution of problem (general information I_G); subject about problems (solution approach information I_S); experts' knowledge and involvement (knowledge K_P); professional feedback knowledge (knowledge K_T); semi-structured knowledge documents design; semi-structured knowledge (SSK))

As shown above, semi-structured knowledge includes general information, solution approach information, and feedback knowledge. Five detailed models that serve as building blocks in the generation process are formulated explicitly. These are the event classification, subject, knowledge processing, knowledge transformation, and semi-structured knowledge models. Each function in a detailed model can be perceived as a mechanism that transforms input variables into output.

Model 1: Event classification model
E = f_E(o, p)    (1)
where: f_E = event transformation function; o = business organization; p = business process; E = problematic event.
(1) Function: formula (1) corresponds to the relationship between a business process and a problematic event.
(2) Parameter: problematic event E. Semi-structured knowledge is derived from original problematic events. Therefore, the purpose of event classification is to support the distinction of semi-structured knowledge.
(3) Contents: general information (I_G) is the fundamental information corresponding to a problematic event, for example, a description, observer, location, and time.
General information is not the most important information, but it is necessary. Knowledge users are assumed to be able to grasp the whole framework of the semi-structured knowledge if general information is given appropriately. At this stage, the generated information mainly focuses on the business structures and processes corresponding to the problematic events.

Model 2: Subject model
S = f_S(w_1, w_2, w_3, w_4, w_5, h_1)    (2)
where: f_S = subject transformation function; S = subject of problematic event; w_1 = entity of subject (what); w_2 = place of subject (where); w_3 = people of subject (who); w_4 = timing of subject (when); w_5 = motivations and reasons of subject (why); h_1 = process of subject (how).
(1) Function: formula (2) corresponds to the relationship between a subject and a problematic event, which is correlated to 5W1H.
(2) Parameter:
(a) Subject, S: the subject should be clear, concise, and able to represent the characteristics of semi-structured knowledge. Furthermore, the structure of the subject should be dynamic, i.e. sensitive to dimension and level changes.
(b) Entity of subject, w_1: the entity of the subject is "what". This parameter dominates the rest of the parameters (where, who, when, why, and how). For example, if the entity is clear, then the place, people, timing, and motivation are also certain.
(c) Place, w_2.
(d) People, w_3.
(e) Timing, w_4.
(f) Motivation, w_5.
(g) Process of subject, h_1.
(3) Contents: solution approach information (I_S) includes the important interaction messages and the results of applying the approach, but not the details of the processes. The purpose of this information is to allow users to understand, conveniently and clearly, the messages of interaction among users, the knowledge, and the environment while solution approaches are applied.
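As an illustration of formula (2), the subject can be held as a small record with one field per 5W1H dimension. The following is a minimal sketch, not the authors' implementation: the class name `Subject`, its field names, and the helper `missing_dimensions` are hypothetical, and the sample values are borrowed from the routes example used elsewhere in this article.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Subject:
    """Hypothetical record for S = f_S(w_1, ..., w_5, h_1); names are assumptions."""
    what: Optional[str] = None   # w_1, entity of subject
    where: Optional[str] = None  # w_2, place of subject
    who: Optional[str] = None    # w_3, people of subject
    when: Optional[str] = None   # w_4, timing of subject
    why: Optional[str] = None    # w_5, motivation of subject
    how: Optional[str] = None    # h_1, process of subject

    def missing_dimensions(self) -> List[str]:
        # SSK is irregular: some dimensions may be incomplete, so report them.
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


subject = Subject(
    what="Routes analysis",
    where="Taipei division",
    who="John Li",
    when="2001/06/01",
    how="Association Analysis",
)
print(subject.missing_dimensions())
```

Leaving every field optional is a deliberate choice in this sketch: it mirrors the observation that some dimensions of semi-structured knowledge may be incomplete, while still making the gaps easy to detect.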
Model 3: Knowledge processing model
K_P = f_P(E, S, P)    (3)
where: f_P = knowledge process function; K_P = knowledge coming from the expert; E = problematic event; S = subject; P = expert.
(1) Function: formula (3) corresponds to the knowledge resulting from an expert or analyzer using a particular solution approach, e.g. the Apriori association rules (Agrawal and Srikant, 1994).
(2) Parameter:
(a) The knowledge derived from experts, K_P: this knowledge is focused on subject-oriented expertise, i.e. the knowledge is generated based on a subject S and corresponds to a problematic event.
(b) Problematic event, E (see Model 1).
(c) Subject, S (see Model 2).
(d) Expert, P: the definition of expert is confined to a domain expert for the particular subject.
(3) Contents: the knowledge is a combination of the domain expert's expertise and the results produced based on this subject.

Model 4: Knowledge transform model
K_T = f_T(K_P, P', C_i)    (4)
where: f_T = knowledge transformation function; K_T = feedback knowledge; K_P = knowledge derived from experts; P' = people in general (not only experts); C_i = the i-th content category.
(1) Function: formula (4) corresponds to the feedback knowledge produced by domain experts and administrators using their knowledge and experience.
(2) Parameter:
(a) Feedback knowledge, K_T: the feedback of a group of members on this information and knowledge.
(b) The knowledge derived from experts, K_P (see Model 3).
(c) The people, P': here, people include both experts and non-experts.
(d) Content category, C_i: contents are based on experts' insights, which are extracted from the solution approach information. Contents should incorporate different categories, for instance, fact, reason, suggestion, etc.
(3) Contents: feedback knowledge shows different professional knowledge from different standpoints, and each member's opinions about the facts.
Model 5: Semi-structured knowledge model
K = I_G + I_S + K_P + K_T    (5)
where: K = semi-structured knowledge; I_G = general information; I_S = solution approach information; K_P = knowledge derived from experts; K_T = feedback knowledge.
Formula (5) corresponds to the final semi-structured knowledge, which incorporates general information, solution approach information, the knowledge coming from experts, and feedback knowledge.

Transformation of semi-structured knowledge with XML
Different types of information in the generation process are derived from different subjects and solution approaches. For example, an association rule approach may be used to generate solution approach information, whereas a classification approach may be used to form captured information. Therefore, it is critical to design the structure of knowledge to achieve knowledge externalization, together with a standard storage system in which the semi-structured knowledge can reside. Moreover, XML shares many features of semi-structured data and semi-structured knowledge: for example, its structure can be irregular, is not always known in advance, and may change frequently and without notice. Therefore, this article delineates semi-structured knowledge using XML, because XML-based SSK documents are easy to store and manage. Figure 4 illustrates an XML-based documentation of the contents of semi-structured knowledge, which is constructed with the Zachman Framework.

To support and transform semi-structured knowledge, XML is used to define standard documents for externalization in three categories: general information, solution approach information, and feedback knowledge. The association rule of data mining is used as an example below.

General information (I_G): includes common and fundamental information.
To help users clearly understand the overall framework of semi-structured knowledge, "documents" correspond to the objective in this format (see Figure 5).

Figure 4 An example of XML-based documentation representing semi-structured knowledge

Figure 5 XML format for general information, E = f_E(o, p)

Problematic event: following formula (1) above, the event to which the problem pertains is clarified, and the types of organizational structure and process needed to solve the problem are specified. For example, the department is indicated by "place", while the persons concerned are stipulated by "who". Here, identifying the problematic types of organizational structure and process is similar to business process reengineering.

Format:
    <General_Information>
      <Event>
        <Event_Name>Less percentage of seat utilization</Event_Name>
        <Event_Observer>#56321578, Mendy Wang</Event_Observer>
        <Event_Description>During this month (2002/05) travel route 13, 13% of seat utilization decline.</Event_Description>
        <Event_When>2002/06/01</Event_When>
        <Event_Where>travel route 13, plan routes</Event_Where>
      </Event>
    </General_Information>

Solution approach information (I_S): the contents of Figure 6 show that 5W and 1H are the important components of solution approach information. The components of "<Subject_How>" are emphasized. Different approaches contain different contents. The association rule analysis in XML is exemplified below.

Figure 6 XML format for solution approach information, S = f_S(w_1, w_2, w_3, w_4, w_5, h_1)

Subject: the 5W1H (What, Where, Who, When, Why, and How) is used to specify the "Subject" corresponding to formula (2) above.
Format:
    <Process_Information>
      <Subject>
        <Subject_What>
          <Id>#S0982834</Id>
          <Name>Routes analysis</Name>
          <Classification>Plan</Classification>
        </Subject_What>
        <Subject_Who>
          <Expert_Info>
            <Expert_Name>John Li</Expert_Name>
            <Expert_Id>#7985462</Expert_Id>
            <Expert_Depart>Routes Plan Dep.</Expert_Depart>
            <Expert_Position>Plan manager</Expert_Position>
          </Expert_Info>
          <Target>customer and travel routes</Target>
        </Subject_Who>
        <Subject_When>
          <Process_Time>2001/06/01</Process_Time>
          <Data_Period>2001/05/01~2001/05/31</Data_Period>
        </Subject_When>
        <Subject_Where>
          <Process_Location>Taipei division</Process_Location>
          <Target_Location>Routes in Taipei</Target_Location>
          <Source_Location>"SELECT * FROM customers, sales"</Source_Location>
          <Storage_Location>20010601132201.xml</Storage_Location>
        </Subject_Where>
        <Subject_Why>
          <Reason>Through check the data of customer and sales, we can find out the relation between customer and his behaviors.</Reason>
        </Subject_Why>
        <Subject_How>
          <Tech_info>
            <Tech_Field>Data Mining</Tech_Field>
            <Tech_Method>Association Analysis</Tech_Method>
            <Tech_Tools>IBM Intelligent Miner</Tech_Tools>
          </Tech_info>
          <Specific_Info>
            <Itemset>{20~25, 25~30, Yang Ming Shan NP, Tourism River, 2days, 3days, tickets}</Itemset>
            <Min_Sup>0.5</Min_Sup>
            <Min_Conf>0.75</Min_Conf>
            <Result>
              <Rule>
                <Id>#R785123</Id>
                <Large_Itemset>{25~30, Tourism River, 2days, one way ticket}</Large_Itemset>
              </Rule>
            </Result>
          </Specific_Info>
        </Subject_How>
      </Subject>
    </Process_Information>

Knowledge (K_P + K_T): the knowledge (K_P + K_T) includes (1) experts' explanations of the results of the solution approach information, and (2) feedback from a group of members regarding this information and knowledge. The same fact, explained differently by different members, may have different implications. Thus, the value of knowledge increases in a spiral through communication and sharing in an organization.
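Because the <Process_Information> format is ordinary XML, a standard parser can read its 5W1H fields back. The following is a minimal sketch using Python's standard library; the function name `summarize_subject` and the dictionary keys are hypothetical, the document is abbreviated from the example in this section (the <Reason> text is shortened), and tag names are assumed to be spelled out in full (e.g. <Specific_Info>).

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the solution approach example; an assumption for illustration.
SSK_XML = """<Process_Information>
  <Subject>
    <Subject_What><Id>#S0982834</Id><Name>Routes analysis</Name></Subject_What>
    <Subject_Who><Expert_Info><Expert_Name>John Li</Expert_Name></Expert_Info></Subject_Who>
    <Subject_When><Process_Time>2001/06/01</Process_Time></Subject_When>
    <Subject_Where><Process_Location>Taipei division</Process_Location></Subject_Where>
    <Subject_Why><Reason>Relate customers to their behaviors.</Reason></Subject_Why>
    <Subject_How>
      <Tech_info><Tech_Method>Association Analysis</Tech_Method></Tech_info>
      <Specific_Info><Min_Sup>0.5</Min_Sup><Min_Conf>0.75</Min_Conf></Specific_Info>
    </Subject_How>
  </Subject>
</Process_Information>"""


def summarize_subject(xml_text):
    """Return the 5W1H fields of the first <Subject> as a flat dictionary."""
    subject = ET.fromstring(xml_text).find("Subject")

    def text(path):
        # Missing elements yield None, since SSK dimensions may be incomplete.
        node = subject.find(path)
        return node.text.strip() if node is not None and node.text else None

    return {
        "what": text("Subject_What/Name"),
        "who": text("Subject_Who/Expert_Info/Expert_Name"),
        "when": text("Subject_When/Process_Time"),
        "where": text("Subject_Where/Process_Location"),
        "why": text("Subject_Why/Reason"),
        "how": text("Subject_How/Tech_info/Tech_Method"),
        "min_sup": float(text("Subject_How/Specific_Info/Min_Sup")),
    }


summary = summarize_subject(SSK_XML)
print(summary["what"], "|", summary["how"], "|", summary["min_sup"])
```

A flat dictionary is only one possible target; the same paths could just as well feed a repository record or the text that is later indexed for search.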
Through (1) different expert knowledge from group members, (2) different standpoints and opinions on the facts, and (3) the function of grouping feedback, the knowledge expresses not only various personal professional opinions but also the understanding of each member. It can help to avoid the trap of self-consciousness and to increase the quality and accuracy of decision-making. The XML format illustrating the association rule analysis is shown in Figure 7.

Figure 7 XML format of feedback knowledge, K_P = f_P(E, S, P) and K_T = f_T(K_P, P', C_i)

Knowledge: there are two types of knowledge, namely experts' explanations of the results of the solution approach information, and the feedback from group members regarding this information and knowledge, corresponding to formula (3) and formula (4) above.

Format:
    <Feedback_Knowledge>
      <Knowledge>
        <Knowledge_Who>
          <Id>#7985462</Id>
          <Depart>Routes Plan Dep.</Depart>
          <Name>John Li</Name>
          <Position>Plan manager</Position>
        </Knowledge_Who>
        <Knowledge_When>2001/06/01</Knowledge_When>
        <Knowledge_Where>Routes plan</Knowledge_Where>
        <Reference_Id>#R785123,#R7895462</Reference_Id>
        <Knowledge_Why>(1) New policy about holidays (2) Special celebration</Knowledge_Why>
        <Suggest>Company should improve routing schedule for special celebration.</Suggest>
      </Knowledge>
    </Feedback_Knowledge>

In this article, the generation process of semi-structured knowledge represented through the Zachman Framework clarifies the processes and activities of knowledge management in organizations. This semi-structured knowledge, presented as XML documents, is helpful in packing, storing, managing, and sharing knowledge in organizations.

The search of semi-structured knowledge in organizations
Because the contents of these semi-structured knowledge documents are variable and irregular, they are difficult to search and share.
Latent semantic indexing (LSI) is a useful, automatic way to find relevant semi-structured knowledge documents from the sentences or articles that users submit as queries. Furthermore, LSI is able to address the problems of synonymy and polysemy, and to improve accuracy in searching for solution documents (Letsche, 1996; Deerwester et al., 1990). Therefore, in this article LSI is used to discover the desired XML documents for the persons who need them.

Using LSI for semi-structured knowledge documents

The latent semantic indexing information retrieval model builds upon prior research in information retrieval, and uses the singular value decomposition (SVD) to reduce the dimensions of the term-document space in order to address the synonymy and polysemy problems that plague automatic information retrieval systems (Letsche, 1996). By reducing the dimensionality of the term-document space, the underlying semantic relationships between documents are revealed, and much of the “noise” (differences in word usage, terms that do not help distinguish documents, etc.) is eliminated. There are three main steps in using LSI to identify relevant semi-structured knowledge documents in organizations (Letsche, 1996; Story, 1996; Chen, 1999):

Step 1 – Pre-processing: some markup is removed, and all hand-indexed entries are removed from the collection. Upper-case characters are converted to lower case, punctuation is removed, white space is used to delimit terms, and so on.

Step 2 – Singular value decomposition (SVD): in the LSI model, terms and documents are represented by an $m \times n$ incidence matrix, $A$. Once the $m \times n$ matrix $A$ has been created, a rank-$k$ approximation to $A$, $A_k$ (with $k \ll \min(m, n)$), is computed using an orthogonal decomposition known as the singular value decomposition (SVD). With regard to LSI, $A_k$ is the closest $k$-dimensional approximation to the original term-document space represented by the incidence matrix $A$.
Step 3 – Query formulation: in the LSI model, queries are formed into “pseudo-documents” that specify the location of the query in the reduced term-document space. Once the query is projected into the term-document space, similarity measures, e.g. the cosine similarity measure, can be applied to compare the position of the pseudo-document with the positions of the terms or documents in the reduced term-document space. Once the similarities between the pseudo-document and all the terms and documents in the space have been computed, the terms or documents are ranked according to the results of the similarity measure, and the highest-ranking terms or documents, or all the terms and documents exceeding some threshold value, are returned to the user.

Illustrative example

This example illustrates the three main steps in the transportation industry. It assumes that: (1) a new problematic event, “utilization of flight seats is decreasing”, has been discovered; (2) to keep the example compact, the collection contains only six semi-structured knowledge documents, abstracts of which are shown in Table II; and (3) as the first step, pre-processing is completed through intelligent agents (Huang, 2001), and the terms and keywords are taken from the abstracts of the documents. The SSK searching system determines which semi-structured knowledge document in the collection is most likely to contain the potential knowledge conceptualized by a set of keywords to support the manager in solving the problem.

After pre-processing, a term-document matrix, whose entries indicate the occurrence of terms among the documents, is formed. The term-document matrix, matrix A, resulting from the selected terms and documents, is shown in Table III. Each entry in the matrix indicates how often a particular term occurs in a given document. Next, matrix A is decomposed by using the singular value decomposition (SVD).
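The decomposition and query steps of this example can be sketched with NumPy, as a minimal illustration rather than the authors' implementation. The rows of `A` follow Table III with one labeled assumption: the entry for “Decrease” in Doc2 is set to 0 instead of the printed 2, because the reported U2 entry for that term matches 0 and the Doc2 abstract contains no “decrease”. The names `TERMS`, `A`, and `lsi_rank` are hypothetical.

```python
import numpy as np

TERMS = [
    "passenger traffic volume", "decrease", "increase", "passengers carried",
    "personal traffic tools", "grow up", "million", "hundred",
    "fast rapid transit system", "finished", "a1 station", "b1 station",
    "c1 station", "d1 station", "e1 station", "passenger-kilometers",
    "columniation", "check the number", "ticket revenues",
]

# Term-document counts, columns Doc1..Doc6, rows in the order of TERMS.
# ASSUMPTION: row "decrease", column Doc2 is 0 here (Table III prints 2).
A = np.array([
    [1, 1, 0, 5, 2, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 2, 0, 0, 0, 0],
    [5, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [4, 1, 6, 0, 0, 0],
    [4, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 2, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 5, 4, 4],
    [0, 0, 0, 1, 5, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 2],
    [0, 1, 7, 0, 0, 0],
    [0, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 7],
], dtype=float)


def lsi_rank(A, q, k=2):
    """Project query q into the rank-k space and score every document."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(s) Vt
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T        # keep k largest singular values
    d_q = (q @ U_k) / s_k                                # q^T U_k inv(S_k)
    # Cosine between d_q and each document row of V_k.
    cos = (V_k @ d_q) / (np.linalg.norm(V_k, axis=1) * np.linalg.norm(d_q))
    return s_k, d_q, cos


# Query selecting the terms "decrease" and "passengers carried".
q = np.zeros(len(TERMS))
q[TERMS.index("decrease")] = 1
q[TERMS.index("passengers carried")] = 1

s_k, d_q, cos = lsi_rank(A, q)
ranking = [f"Doc{i + 1}" for i in np.argsort(-cos)]
print(np.round(s_k, 6))
print(np.round(cos, 4))
print(ranking)
```

Because the signs of singular vectors are arbitrary, `d_q` may come out with flipped signs relative to a printed vector; the cosine values, and therefore the ranking, are unaffected.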
SVD produces matrices $U$, $S$, and $V$, such that $A = U S V^T$. Matrix A is the term-document matrix shown in Table III; its rank is 6, so only six dimensions are needed to represent it. $U$ (19 × 19) and $V$ (6 × 6) are unitary matrices and $S$ (19 × 6) is a diagonal matrix of singular values. The six-dimensional space is then projected onto a two-dimensional subspace by selecting the rows and columns of $U$, $S$, and $V$ corresponding to the two largest singular values. The resulting two-dimensional matrices are:

\[
U_2\ (19 \times 2) =
\begin{pmatrix}
0.388949 & 0.115807 \\
0.06021 & -0.11727 \\
0.023998 & -0.03115 \\
0.145742 & -0.23285 \\
0.026749 & -0.04346 \\
0.319761 & -0.63229 \\
0.118993 & -0.1894 \\
0.046536 & 0.024753 \\
0.023998 & -0.03115 \\
0.011999 & -0.01557 \\
0.632945 & 0.340607 \\
0.284105 & 0.148832 \\
0.051426 & 0.025066 \\
0.098844 & 0.054132 \\
0.146262 & 0.083198 \\
0.246228 & -0.53228 \\
0.093072 & 0.049507 \\
0.093072 & 0.049507 \\
0.331925 & 0.203465
\end{pmatrix}
\]

\[
S_2\ (2 \times 2) =
\begin{pmatrix}
10.518864 & 0 \\
0 & 10.171535
\end{pmatrix}
\]

\[
V_2\ (6 \times 2) =
\begin{pmatrix}
0.281365 & -0.442010 \\
0.126218 & -0.158400 \\
0.351975 & -0.750821 \\
0.540943 & 0.254956 \\
0.489504 & 0.251780 \\
0.498782 & 0.295650
\end{pmatrix}
\]

A vector of the query, “To find anything of passengers carried and decrease”, would be

\[
q = [0\ 1\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0]^T
\]

where $T$ indicates transpose. The terms “decrease” and “passengers carried” are the 2nd and 4th terms in the index, respectively, and no other terms are selected. Let $q$ be the query vector. Then the document space vector corresponding to $q$ is given by:

\[
D_q = q^T U_2 S_2^{-1}
\]

For the product vector, the result is:

\[
D_q = [0.0196 \quad -0.0344]
\]

Table II Examples extracted from semi-structured knowledge documents

Document / Content

Doc1
<General_Information>
  <Event>
    <Event_Name>The decreasing of passenger traffic volume</Event_Name>
    <Event_Observer>Kevin Lee</Event_Observer>
    <Event_Description>Passenger traffic volume decrease year by year, because the passengers carried to shift personal traffic tools. In 1998, the passengers carried is 157.294 million, to grow up 5.38%.
In 1999, the passengers carried is 160.330 million, to grow up 1.93%. In 2000, the passengers carried is 159.981 million, to grow up -0.22%. In 2001, the passengers carried is 159.438 million, to grow up -0.34%.</Event_Description>
    <Event_When>2002/12/03</Event_When>
    <Event_Where>Customer service Dep.</Event_Where>
  </Event>
</General_Information>

Doc2
<General_Information>
  <Event>
    <Event_Name>Passenger-Kilometers raised</Event_Name>
    <Event_Observer>Joe Lu</Event_Observer>
    <Event_Description>The FAST rapid transit system to finished, FAST rapid transit system’s passengers carried to increase, bring ABC Transport Company’s passenger traffic volume increase. In 2000, Passenger-Kilometers is 9978 million, grow up 2.11%.</Event_Description>
    <Event_When>2002/11/05</Event_When>
    <Event_Where>Sales department</Event_Where>
  </Event>
</General_Information>

Doc5
<General_Information>
  <Event>
    <Event_Name>A1 station to B1 station Passenger Traffic Volume not same</Event_Name>
    <Event_Observer>Bill Hu</Event_Observer>
    <Event_Description>A1 station to B1 station and B1 station to A1 station, Passenger Traffic Volume not same. B1 station to A1 station select columniation to check the number, every day Passenger Traffic Volume is 27 hundred. B1 station to A1 station not sure select columniation to check the number.</Event_Description>
    <Event_When>2002/01/28</Event_When>
    <Event_Where>A1 Station</Event_Where>
  </Event>
</General_Information>

Doc3
<General_Information>
  <Event>
    <Event_Name>The decreasing of Passenger-Kilometers</Event_Name>
    <Event_Observer>Tom Wu</Event_Observer>
    <Event_Description>Passenger-Kilometers decrease year by year. In 1996, Passenger-Kilometers is 9542 billion, grow up -0.5%. In 1997, Passenger-Kilometers is 9505 billion, grow up -3.14%. In 1998, Passenger-Kilometers is 9489 billion, grow up -2.49%. In 1999, Passenger-Kilometers is 8969 billion, grow up -0.63%. In 2000, Passenger-Kilometers is 9254 billion, grow up -0.89%.
In 2001, Passenger-Kilometers is 9784 billion, grow up -0.52%.</Event_Description>
    <Event_When>2002/03/08</Event_When>
    <Event_Where>Sales department</Event_Where>
  </Event>
</General_Information>

Doc6
<General_Information>
  <Event>
    <Event_Name>A1 station to every station Ticket Revenues</Event_Name>
    <Event_Observer>Charlie Tseng</Event_Observer>
    <Event_Description>A1 station to every station Ticket Revenues, E1 station to A1 station have most Ticket Revenues, 17.09% of the total Ticket Revenues. A1 station to E1 station Ticket Revenues, 15.51% of the total Ticket Revenues. A1 station to D1 station Ticket Revenues, 13% of the total Ticket Revenues.</Event_Description>
    <Event_When>2002/05/08</Event_When>
    <Event_Where>Station</Event_Where>
  </Event>
</General_Information>

Doc4
<General_Information>
  <Event>
    <Event_Name>A1 Station go to north Passenger Traffic Volume</Event_Name>
    <Event_Observer>Tom Wu</Event_Observer>
    <Event_Description>A1 Station go to north Passenger Traffic Volume. A1 station to B1 station Passenger Traffic Volume is 3.75 billion. A1 station to C1 station Passenger Traffic Volume is 1.4 billion. A1 station to D1 station Passenger Traffic Volume is 1.1 billion. A1 station to E1 station Passenger Traffic Volume is 0.65 billion.
</Event_Description>
    <Event_When>2002/03/08</Event_When>
    <Event_Where>A1 Station</Event_Where>
  </Event>
</General_Information>

Table III Term-document matrix

Term                         Doc1  Doc2  Doc3  Doc4  Doc5  Doc6
Passenger traffic volume        1     1     0     5     2     0
Decrease                        1     2     1     0     0     0
Increase                        0     2     0     0     0     0
Passengers carried              5     1     0     0     0     0
Personal traffic tools          1     0     0     0     0     0
Grow up                         4     1     6     0     0     0
Million                         4     1     0     0     0     0
Hundred                         0     0     0     0     1     0
FAST rapid transit system       0     2     0     0     0     0
Finished                        0     1     0     0     0     0
A1 station                      0     0     0     5     4     4
B1 station                      0     0     0     1     5     0
C1 station                      0     0     0     1     0     0
D1 station                      0     0     0     1     0     1
E1 station                      0     0     0     1     0     2
Passenger-Kilometers            0     1     7     0     0     0
Columniation                    0     0     0     0     2     0
Check the number                0     0     0     0     2     0
Ticket Revenues                 0     0     0     0     0     7

To find the best document match, the $D_q$ vector is compared against all the document vectors in the two-dimensional $V_2$ space. The document vector that is nearest in direction to $D_q$ is the best match. The cosine of the angle between the query vector and the document vector is a convenient measure of goodness-of-fit. The cosine values for the six document vectors and the query vector are:

[0.9988 0.9880 0.9968 0.0774 0.0428 -0.0172]

The best fit for the query vector is the first document, Doc1. The third document, Doc3, is also indicated as a good solution.

Conclusion

Knowledge should be seen from different viewpoints and be used diversely. Through the Zachman Framework and XML, organizational knowledge can be externalized systematically as semi-structured knowledge documents, and stored and managed effectively. Because the variable structure of semi-structured knowledge documents leads to low accuracy in search results, latent semantic indexing (LSI) is used to retrieve relevant semi-structured knowledge documents quickly and more accurately in support of knowledge management activities.
The main contributions of this article are that:

(1) The novel concept of semi-structured knowledge (SSK) has its generation represented by the Zachman Framework, which clarifies the processes and activities in organizational knowledge management.
(2) The semi-structured knowledge delineated as XML documents is helpful in packing, storing, managing, and sharing SSK in organizations.
(3) The LSI model and an easy search system are developed to diffuse relevant knowledge to the person who needs it. Through the LSI tools, relevant semi-structured knowledge documents can be identified to support the daily duties or decision-making of knowledge workers.

The SSK approach shows great promise for organizations to acquire, store, disseminate, and reuse knowledge.

Acknowledgment

This work was partially supported by funding from the National Science Council of Taiwan (NSC91-2146-E-260-005).

References

Abiteboul, S. (1997), “Querying semi-structured data”, ICDT ’97, 6th International Conference, Delphi, Greece, pp. 1-18.
Agrawal, R. and Srikant, R. (1994), “Fast algorithms for mining association rules in large databases”, Proceedings of 20th International Conference on Very Large Data Bases, San Francisco, CA, pp. 487-99.
Alavi, L. (2001), “Review: Knowledge management and knowledge management systems: conceptual foundations and research issues”, MIS Quarterly, Vol. 25 No. 1, pp. 107-36.
Alavi, M. and Leidner, D. (1999), “Knowledge management systems: emerging views and practices from the field”, Proceedings of 32nd Hawaii International Conference on System Sciences, Hawaii, pp. 4-11.
Becker, M.C. (2001), “Managing dispersed knowledge: organizational problems, managerial strategies, and their effectiveness”, Journal of Management Studies, Vol. 38 No. 7, pp. 1037-51.
Birbeck, M. (2000), Professional XML, Wrox Press, Chicago, IL.
Chauvel, D. and Despres, C. (2002), “A review of survey research in knowledge management: 1997-2001”, Journal of Knowledge Management, Vol. 6 No. 3, pp. 207-23.
Chen, C. (1999), “Visualising semantic spaces and author co-citation networks in digital libraries”, Information Processing and Management, Vol. 35, pp. 401-20.
Collins, H. (1997), Human, Machines, and the Structure of Knowledge, Knowledge Management Tools, Butterworth-Heinemann, New York, NY.
Davenport, T. and Prusak, L. (1998), Working Knowledge, Harvard Business School Press, Boston, MA.
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K. and Harshman, R. (1990), “Indexing by latent semantic analysis”, Journal of the American Society for Information Science, Vol. 41, pp. 391-407.
Dzbor, M., Paralic, J. and Paralic, M. (2000), “Knowledge management in a distributed organization”, 4th IEEE/IFIP Conference on IT for Balanced Automation Systems, Berlin.
Fabio, A.A. (2001), XML Developer’s Guide, McGraw-Hill, New York, NY.
Fischer, G. and Ostwald, J. (2001), “Knowledge management: problems, promises, realities, and challenges”, IEEE Intelligent Systems, Vol. 16 No. 1, pp. 60-72.
Grant, R.M. (1996), “Toward a knowledge management and the N-form corporation”, Strategic Management Journal, Vol. 15, pp. 73-90.
Huang, Chun-Che (2001), “Using intelligent agents to manage fuzzy business process”, IEEE Transactions on Systems, Man, and Cybernetics, Part A, Vol. 31 No. 6, pp. 508-23.
Inmon, W.H., Zachman, J.A. and Geiger, J.G. (1997), Data Stores, Data Warehousing, and the Zachman Framework: Managing Enterprise Knowledge, McGraw-Hill, New York, NY.
Kanter, J. (1999), “Knowledge management, practically speaking”, Information Systems Management, pp. 7-15.
Klint, P. and Verhoef, C. (2002), “Enabling the creation of knowledge about software assets”, Data and Knowledge Engineering, Vol. 41 No. 2-3, pp. 141-58.
Letsche, T.A.
(1996), “Toward large-scale information retrieval using latent semantic indexing”, Master’s thesis, Department of Computer Science, University of Tennessee.
Levy, A. and Rousset, M. (1998), “Carin: a knowledge representation language combining horn rules and description logics”, Artificial Intelligence Journal, Vol. 104, pp. 165-209.
Liebowitz, J. (1997), Knowledge Management Handbook, CRC Press, New York, NY.
Miller, J.A., Potter, W.D. and Kochut, K.J. (1992), “Knowledge, data, and models: taking an objective orientation on integrating these three”, IEEE Potentials, Vol. 11 No. 4, pp. 13-17.
Mo, J.P.T. and Menzel, C. (1998), “An integrated process model driven knowledge based system for remote customer support”, Computers in Industry, Vol. 37 No. 3, pp. 171-83.
Nonaka, I. and Takeuchi, H. (1995), The Knowledge Creating Company, Oxford University Press, New York, NY.
O’Leary, D.E. (1998), “Enterprise knowledge management”, IEEE Computer, Vol. 31 No. 3, pp. 54-61.
Polanyi, M. (1962), Personal Knowledge, corrected ed., Routledge, London.
Prusak, L. (1996), “The knowledge advantage”, Strategy and Leadership, Vol. 24, pp. 6-8.
Roth, A.V. (1996), “Achieving strategic agility through economies of knowledge”, Strategy and Leadership, Vol. 24, pp. 30-7.
Satyadas, A., Harigopal, U. and Cassaigne, N.P. (2001), “Knowledge management tutorial: an editorial overview”, IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews, Vol. 31 No. 4, pp. 429-37.
Spender, J.C. and Grant, R.M. (1996), “Knowledge and the firm: overview”, Strategic Management Journal, Vol. 17, pp. 5-9.
Story, R.E. (1996), “An explanation of the effectiveness of latent semantic indexing by means of a Bayesian regression model”, Information Processing & Management, Vol. 32 No. 3, pp. 329-44.
Turban, E. and Aronson, J.E.
(2001), Decision Support Systems and Intelligent Systems, Prentice Hall, Upper Saddle River, NJ.
Turban, E. and Aronson, J.E. (2001), Decision Support Systems and Intelligent Systems Sixth Edition: Knowledge Management, Prentice Hall, Upper Saddle River, NJ.
Tuthill, G.S. and Levy, S.T. (1991), Knowledge Based Systems: A Manager’s Perspective, TAB Book, New York, NY.
Van Rijsbergen, C.J. (1979), Information Retrieval, 2nd ed., Butterworths, London.
Wielinga, B.J., Schreiber, A.Th. and Breuker, J.A. (1992), “KADS: a modelling approach to knowledge engineering”, Knowledge Acquisition, Vol. 4 No. 1, pp. 5-53.
Wiig, K.M. (1999), “The intelligent enterprise and knowledge management”, invited article for the UNESCO Encyclopedia of Life Support Systems, Knowledge Research Institute, Arlington, TX.
ISKM Laboratory of Information Management, National Chi-Nan University, Pu-Li, Nan-Tau, Taiwan (2002), “Semi-structured knowledge”, available at: http://iskmlab.im.ncnu.edu.tw/SSK.