Decision Sciences Institute 2002 Annual Meeting Proceedings
IS DESIGN USING ANALOGICAL GENERATION
Dan Zhu
Department of Logistics, Operations and MIS
300 Carver Hall
Iowa State University
Ames, IA 50011
515-294-5041
515-294-2534 (Fax)
[email protected]
Chang Zhang
Microsoft Corp.
Dongrong Xu
Center for Biomedical Image Computing
Johns Hopkins University
Abstract: This paper presents a framework for using analogical reasoning and generation to automate the process of
designing forms and reports. We begin by separating successful cases into different regions, and then storing them in a
knowledge base that allows future retrieval based on identity and similarity. The target can be obtained through a three-layered
structure matching and analogical reasoning process.
1. INTRODUCTION
Information systems development is a complex process, which requires a clear understanding of the organization
or enterprise being modeled and systematic skills in developing computer-based systems. This research aims to
use an analogical reasoning approach to provide design strategies for forms and report design in IS project
development. Analogy forms a significant part of our cognitive endeavors. Analogical reasoning occurs when we
evaluate and classify objects according to their similarities. Significant research has been conducted on the
important role of the comparative process in human decision-making (Markman and Medin 1995; Medin,
Goldstone, and Gentner 1995; Simonson and Tversky 1992). This research focuses on the conceptual design phase of
information systems development. The conceptual design phase is one of the most critical phases and has been viewed as a
predominantly creative process (French 1971). This is where analogy can play an important role. Research on
creative design has explored the use of analogies in proposing solutions to design problems in the design process's
conceptual phase (Goel 1997). This conceptual design phase is also the phase which makes most demands on the
designers and which offers the greatest scope for improvements (French 1971). In general, system designers learn
through experience, with guidance from proven methods. As they seek out additional information, they also begin
to formulate alternative solutions to the problem. Devising solutions leads to a search for more information, which
in turn leads to improvements in the alternatives.
More specifically, we focus on the design of the database relations. The design of database relations (tables)
is probably the most important part of the job, generally requiring 30-40% of the total time and effort spent on the
project. The tables must be designed according to certain criteria, but more importantly, they must be convenient
for users to operate, allowing them to efficiently retrieve information from the database. Often, different tables
share the same resources and thus may be similar in many ways. Despite these similarities, tables are designed
manually and individually, which can be a mundane and routine task that is both inefficient and subject to a wide
margin of error. An alternative method is to collect examples and separate them into elements with descriptions
attached. When a new task is given, the descriptions can be searched to determine if any of the previous
information can be used in the new table. If so, data corresponding to each item in the table can be obtained
directly from the system and inserted into a new table. This can potentially automate the design process by
allowing designers to work from existing tables rather than constantly designing new ones from scratch (Poulin
1995).
The rest of the paper is organized as follows. Section 2 introduces the theory of analogical reasoning and
multi-source analogical generation. Section 3 presents the new representation schema of forms as matrices.
Section 4 describes the process of applying multi-source analogical generation to the design of tables and forms.
Section 5 uses an example to illustrate our approach. Section 6 concludes the paper with some directions for
future research.
2. BACKGROUND
A typical system development life cycle includes project identification and selection, project initiation
and planning, analysis, logical and physical design, implementation, and maintenance (McFadden et al. 2001).
Given the problem specification, people will need to analyze the problem and determine the ways to solve it.
Simon (1960) proposed a problem-solving model that lends significant insight into how people solve certain
types of problems. The model includes four phases for analyzing and solving problems. These phases are
intelligence, design, choice and implementation. First of all, all relevant information is collected during the
intelligence phase. Following that, several alternatives are formulated during the design phase. The best
alternative solution is then chosen during the choice phase. Finally, the solution is put into practice during the
implementation stage.
The development of information systems can be broken down into many activities. In the conceptual
design phase, when the system's requirements and constraints have been identified and prioritized, people may develop
several alternative design strategies for the organization's information system problem. No matter which design
approach is adopted, a successful design that satisfies all the requirements in one step is impractical, if not
impossible. It is usually necessary to subdivide the overall job into smaller tasks. In design problems for large
organizations, the decomposition is often done by considering different user views separately. A
user view is the set of requirements that is necessary to support the operations of a particular user. Therefore, an
information-level design involves representing each individual user view, refining the views, and then merging them
into a cumulative design. When given a user view or some sort of stated requirement, we can develop a collection
of tables that will support it. People will find that the more designs they have done, the easier it will be for them to
develop such a collection without resorting to any special procedure.
Tables are a major component of information systems. End users access and manage information by
manipulating and querying the data contained in the tables. This paper explores the process of table design and
investigates using a structured matrix to access the knowledge embedded in the tables. First, a table is represented
as a matrix, each element of which can be either a data item or a matrix. Therefore, a table is treated as a general matrix
whose elements are sub-matrices. In this way, a table is represented as a multi-layered matrix. There are usually
up to two layers, and the 2-layered matrix structure can be expressed by a 3-layered structure, like a tree with
many branches. Every branch of the tree has attributes that are the data items constructing the basic structure of
the matrix. The root of the tree represents a table, and its attributes are the name of the table, the number of its
sub-matrices, and the number of different matrices, respectively. The leaves of the tree are the data items, and their
attributes are the size of the data item, the number of lines in the matrix of the specific data item, and the number
of columns in the matrix of the specific data item. The intermediate nodes of the tree stand for the sub-matrices,
and the attributes of each node are the number of data items and sub-matrices that belong to the current sub-matrix,
the feature of the sub-matrix, and its position, given by its column and row location with the sub-matrix as the
unit. Figure 1 illustrates the tree analogy by showing a table that has been divided into modules. Table A can be
considered a 3 by 2 matrix consisting of 4 sub-matrices: A1, A2, A3 and A4:
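\[
A = \begin{pmatrix} A_1 & A_2 \\ 0 & A_3 \\ A_4 & 0 \end{pmatrix}
\]
[Figure 1 content: the data items of each sub-matrix, listed row by row.
A1: a111 a112 | a121 a122 a123 | a131 a132
A2: a211 a213 | a221
A3: a311 | a321 a322 a323 | a331 a332
A4: a411 a412 a413 | a421 a422 a423 | a431 a432 a433]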
Figure 1. An Example Matrix
Therefore,
\[
A_1 = \begin{pmatrix} a^{1}_{11} & a^{1}_{12} & 0 \\ a^{1}_{21} & a^{1}_{22} & a^{1}_{23} \\ a^{1}_{31} & a^{1}_{32} & 0 \end{pmatrix}
\]
In this case, A4 consists of records with the same three fields. In the sense of the whole table, it is obtained
by reading the database more than once. When each record occupies one row, the records are laid out horizontally and form
rows. When a certain number of records are combined into one column, they are arranged vertically, forming more
than one row and column. In either the vertical or the horizontal direction, the 4 sub-matrices cannot be
aligned; each is read from the database once and only once. The original matrix can be rewritten in the format of a
3-layered tree, as shown in Figure 2.
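The 3-layered tree can be sketched as a small data structure. The following is only an illustration under stated assumptions: the class names (TableTree, SubMatrix, DataItem), the field names, and the reading of the leaf attributes as a size plus a row and column index are ours, not part of the original system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataItem:
    """Leaf node: one data item (field) of a sub-matrix."""
    name: str
    size: int  # assumed unit: display width in characters
    row: int   # assumed: row index inside the enclosing sub-matrix
    col: int   # assumed: column index inside the enclosing sub-matrix


@dataclass
class SubMatrix:
    """Intermediate node: one sub-matrix of the table."""
    name: str
    feature: str    # short description of the sub-matrix
    block_row: int  # position in the table, counted in sub-matrix units
    block_col: int
    items: List[DataItem] = field(default_factory=list)

    def grid(self) -> List[List[Optional[str]]]:
        """Lay the data items out as rows and columns; None marks an empty cell."""
        if not self.items:
            return []
        n_rows = max(item.row for item in self.items) + 1
        n_cols = max(item.col for item in self.items) + 1
        cells: List[List[Optional[str]]] = [[None] * n_cols for _ in range(n_rows)]
        for item in self.items:
            cells[item.row][item.col] = item.name
        return cells


@dataclass
class TableTree:
    """Root node: the whole table."""
    name: str
    sub_matrices: List[SubMatrix] = field(default_factory=list)


# Sub-matrix A1 of the example: rows of 2, 3 and 2 data items.
a1 = SubMatrix("A1", "hypothetical header block", block_row=0, block_col=0)
rows = [["a111", "a112"], ["a121", "a122", "a123"], ["a131", "a132"]]
for r, names in enumerate(rows):
    for c, name in enumerate(names):
        a1.items.append(DataItem(name, size=10, row=r, col=c))

table_a = TableTree("A", [a1])
for line in a1.grid():
    print(line)
```

Keeping the position of each sub-matrix in sub-matrix units, separate from the positions of the data items inside it, is what later allows the root to be cut and the sub-matrices to be matched independently.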
There may exist other ways to combine the tables. However, the basic hypothesis suggests that a successful
case in the knowledge base is attributable to a successful general form, which mainly refers to the relative location
of the sub-sections. Therefore, if we can find a similar case in the knowledge base, the result should be considered
acceptable and other solutions should be discarded. In such a way, many designed tables are collected and
recorded as knowledge based on the structured representation. When the task of designing a new table is given,
similar cases can be retrieved from the knowledge base and new solutions can be obtained by analogical reasoning
on the “successful experience.” The objective of retrieval is to find trees with layer-structures matching the one
that follows, when it comes to the level of description.
3. METHODOLOGY
Analogy is an important aspect of human learning and thinking. Analogical reasoning occurs when people recall
knowledge from a previous problem and relate it to a new problem (Mayer 1991). Research on analogical problem
solving is rooted in cognitive psychology (Carbonell 1983; Gentner 1983; Holyoak and Thagard 1989, 1995).
Analogies have been used to enhance students' abilities to model and solve, for example, algebra and geometry
problems (Weaver and Kintsch 1992; Lovett and Anderson 1994; Chee 1993). Several models of an analogy
machine have been developed, each with some limitations. For example, analogy imposes only loose constraints, so the final
result cannot be guaranteed to be true in all cases.
Analogy is one of the most important human abilities and it is the kernel of human intelligence. Gentner's
theory of structure mapping laid the foundation for analogical reasoning. However, the theory is limited by the fact
that it requires detailed knowledge and understanding of the objects, plans, and targets as well as an accurate
description and expression of the targets. Therefore, useful and satisfactory results can only be obtained when
sources and targets are greatly similar, such as the model of the atom and the solar system. Analogical problem
solving involves retrieving a source that is similar to the target problem and then subsequently using its solution to
solve the target problem. One of the keys to successful analogical reasoning lies in ignoring the similarities or
dissimilarities in the surface features of the problems but recognizing analogies in their structures (Mayer 1991).
Research on analogy and design has focused primarily on mental models of design (Bhatta & Goel 1996, Goel
1997). Mental models are characterized by the type of knowledge they capture. One of the most common mental models is
the structure-behavior-function model. This model represents the structure of a design in some object-attribute-relation
ontology, and also represents its internal causal behavior as well as its functions.
In this paper, we propose the concept of multi-source analogy, which releases the traditional analogy from
the limitation of presumed restriction, thus allowing its newly added sources to be mapped to a more extended
target field. The similarity calculation based on multi-source analogy can thus be applied to a variety of domains.
In order to apply analogical reasoning in the field of design to produce a new result, we should have no
presumption of similarity. A less rigid definition of similarity is needed. Because analogical reasoning (AR) is not a restrictive form of reasoning,
its conclusion is not necessarily true. Therefore, it is important to verify its outcome. However, the fact that it is
not always true complicates the verification process. Since the method presented in this article uses the same
structure in the final result and in its sources, we suggest using a separate set of evaluating standards on each part,
and then obtaining a generally evaluated score. If the score is too low, the process is guided back into the redesign
loop. The design is considered complete when the score is sufficient.
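The evaluate-and-redesign loop described above might look roughly like the following sketch. Several things here are assumptions made only for illustration: the overall score is taken as the plain average of the per-part scores, the threshold of 0.8 is arbitrary, and the evaluator (fill_ratio) and the redesign step (add_missing_field) are hypothetical stand-ins for real evaluation standards.

```python
from typing import Callable, Dict, List, Tuple

Part = Dict[str, object]
Evaluator = Callable[[Part], float]


def overall_score(parts: List[Part], evaluators: Dict[str, Evaluator]) -> float:
    """Score each part with the evaluator for its kind, then average (assumed aggregation)."""
    if not parts:
        raise ValueError("a design needs at least one part")
    scores = [evaluators[str(part["kind"])](part) for part in parts]
    return sum(scores) / len(scores)


def design_loop(
    parts: List[Part],
    evaluators: Dict[str, Evaluator],
    redesign: Callable[[List[Part]], List[Part]],
    threshold: float = 0.8,
    max_rounds: int = 10,
) -> Tuple[List[Part], float, int]:
    """Send the design back for redesign until its score reaches the threshold."""
    for round_no in range(max_rounds):
        score = overall_score(parts, evaluators)
        if score >= threshold:
            return parts, score, round_no
        parts = redesign(parts)
    return parts, overall_score(parts, evaluators), max_rounds


def fill_ratio(part: Part) -> float:
    """Hypothetical standard: fraction of required fields already placed."""
    return int(part["filled"]) / int(part["total"])


def add_missing_field(parts: List[Part]) -> List[Part]:
    """Hypothetical redesign step: place one more field in every incomplete part."""
    return [{**p, "filled": min(int(p["filled"]) + 1, int(p["total"]))} for p in parts]


evaluators = {"header": fill_ratio, "body": fill_ratio}
draft = [
    {"kind": "header", "filled": 1, "total": 2},
    {"kind": "body", "filled": 2, "total": 2},
]
final_parts, final_score, rounds = design_loop(draft, evaluators, add_missing_field)
print(final_score, rounds)
```

In this run the draft scores 0.75, goes through one redesign round, and is then accepted with a score of 1.0.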
4. DESIGN FRAMEWORK
The next problem is how to create a new table or form, according to the given restrictions, from a group of
retrieved cases. Because the task involves synthesizing more than one analog into a single result, conflicts among
the source cases are bound to arise. For simplicity, consider only two original tables (Table A and Table B), each consisting of
two sub-matrices. Now let us take A1 out of Table A and B2 out of Table B, and recombine them into a new table,
Table C. According to analogical reasoning theory, this is an example of 2-source analogical generation
(AG). Before Table C is created, analogical reasoning is used to revise A1 and B2 into A1' and B2'. Therefore
the resulting Table C is similar to both Table A and Table B.
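[Figure 2 content: FORM-A consists of sub-matrices A1 and A2; FORM-B consists of sub-matrices B1 and B2. The mappings M: A1 → A1' and M: B2 → B2' produce the parts of FORM-C.]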
Figure 2. An example of Analogical Generation (AG)
Since Table A and Table B already exist, it would be meaningless to design Table C so that it duplicates A or
B. Table C, however, differs from both A and B: it is derived from A and B, so the design makes
sense, yet only C is new. In order to make the idea of AG practical and operable, a measure of similarity has to
be defined to support the procedures of retrieving, matching, and reasoning. A mapping in analogical reasoning
is written m: s/t. Here it is revised to m (p%): s/t, where s stands for the source, t for the target, and p%
is the similarity between s and t. Since an object consists of many parts with various similarities between them,
another parameter is needed to determine the influence of each part, that is, how strongly one part distinguishes
the object, or how easily the object is recognized if only that specific part is known. Denote the identity of the
j-th sub-part as Ij; then:
\[
\sum_{j=1}^{N} I_j = 1
\]
We can see that the more parts an object has, the more difficult it will be for a sub-object to represent it. If
the sub-part identity vectors of A and B are (a1, a2, …, an) and (b1, b2, …, bn), respectively, and the vector of
similarity is (s1, s2, …, sn), then the general similarity between A and B is defined as:
\[
S = w_1 s_1 + w_2 s_2 + \cdots + w_n s_n = \sum_{i=1}^{n} w_i s_i
\]
where w_i is the weight of the i-th sub-part, with
\[
\sum_{i=1}^{n} w_i = 1 \quad \text{and} \quad w_i = \frac{a_i b_i}{\sum_{j=1}^{n} a_j b_j}
\]
As a result:
\[
S = \frac{a_1 b_1}{\sum_{j=1}^{n} a_j b_j}\, s_1 + \frac{a_2 b_2}{\sum_{j=1}^{n} a_j b_j}\, s_2 + \cdots + \frac{a_n b_n}{\sum_{j=1}^{n} a_j b_j}\, s_n = \frac{\sum_{i=1}^{n} a_i b_i s_i}{\sum_{j=1}^{n} a_j b_j}
\]
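As a quick check of the definition, the general similarity can be computed directly from the two identity vectors and the similarity vector. This is a minimal sketch; the function name general_similarity and the sample numbers are ours and are chosen only for illustration.

```python
from typing import Sequence


def general_similarity(a: Sequence[float], b: Sequence[float], s: Sequence[float]) -> float:
    """Return S = sum_i(a_i * b_i * s_i) / sum_j(a_j * b_j).

    a, b: identities of the sub-parts of objects A and B
    s:    similarity of each pair of corresponding sub-parts
    """
    if not (len(a) == len(b) == len(s)):
        raise ValueError("a, b and s must have the same length")
    products = [ai * bi for ai, bi in zip(a, b)]
    total = sum(products)
    if total == 0:
        raise ValueError("at least one pair of sub-parts needs a non-zero identity product")
    weights = [p / total for p in products]  # the weights w_i, which sum to 1
    return sum(w * si for w, si in zip(weights, s))


# Two sub-parts: the first carries most of A's identity and matches perfectly,
# the second does not match at all.
a = [0.8, 0.2]
b = [0.5, 0.5]
s = [1.0, 0.0]
print(general_similarity(a, b, s))
```

With these numbers the weights are 0.8 and 0.2, so the overall similarity is about 0.8: a mismatch in a sub-part with low identity lowers the score only slightly.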
Extending the situation discussed above to the general case, we should consider the following: (1) a generated
table can have more than two analogues; (2) the decomposition of each table into parts is not necessarily the same (e.g.,
some have two parts while others may have three or more); and (3) a resulting table may inherit different
sub-parts from different analogues, and it may also inherit a sub-result synthesized from the
corresponding parts of different analogues. Accordingly, the concepts are extended to a similarity matrix and an identity matrix
(both n by m).
When a new table is designed, the cases stored as analogues in the knowledge base should be removed a priori.
In other words, the root of the 3-layer tree is cut to cancel the high-order constraints between the source and the target,
making each sub-matrix a sub-part of the table. Thus, each sub-matrix of the target can be compared with all the
sub-matrices of every table. This means that all of the tables can act as analogues of the sub-part. This is a multi-source
analogy, and it is necessary to choose the matrix most similar to the target matrix, which is a problem of similarity
calculation. Each data item is considered a sub-part, and all of their identities are considered equal. The best
matched source sub-matrices can then be retrieved; thus a map from the source sub-matrix to the target sub-matrix is
formed, and the target sub-matrices are obtained.
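The retrieval step can be sketched as a search over every sub-matrix of every stored table. The sketch below rests on assumptions of ours: a data item of the target counts as matched (similarity 1) only when an item of the same name occurs in the source sub-matrix, and, because all identities are equal, the overall similarity reduces to the fraction of matched items. The knowledge base contents and field names are hypothetical.

```python
from typing import Dict, List, Tuple

# table name -> sub-matrix name -> data item names (hypothetical contents)
KnowledgeBase = Dict[str, Dict[str, List[str]]]


def sub_matrix_similarity(target: List[str], source: List[str]) -> float:
    """Equal identities: similarity is the share of target items found in the source."""
    if not target:
        return 0.0
    source_items = set(source)
    return sum(1 for item in target if item in source_items) / len(target)


def best_source(target: List[str], kb: KnowledgeBase) -> Tuple[str, str, float]:
    """Compare the target with all sub-matrices of all tables and keep the best one."""
    best = ("", "", -1.0)
    for table_name, sub_matrices in kb.items():
        for sub_name, items in sub_matrices.items():
            score = sub_matrix_similarity(target, items)
            if score > best[2]:
                best = (table_name, sub_name, score)
    return best


kb: KnowledgeBase = {
    "FORM-A": {
        "A1": ["order_no", "order_date", "customer"],
        "A2": ["item", "qty", "price"],
    },
    "FORM-B": {
        "B1": ["customer", "address"],
        "B2": ["item", "qty", "unit", "price"],
    },
}

target_sub_matrix = ["item", "qty", "unit"]
print(best_source(target_sub_matrix, kb))
```

Here B2 is retrieved with similarity 1.0, ahead of A2 with 2/3, even though the two candidates come from different tables; this is the sense in which all tables act as analogues of a single sub-part.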
Subsequently, these target sub-matrices need to be reunited as an integrated table. This reunification task is also
performed by analogical generation. This time the goal is to find a table in the source base that includes matrices that
have almost the same style as the target sub-matrices. We assume that all of the sub-matrices here possess the same
identity. Therefore, the objective is to find a table in the base that has the greatest sum of similarity over the
corresponding parts. After finding the table, all of the target sub-parts are arranged in the same
way as the sub-parts of the retrieved table. Note that only the general layout, rather than the content, is of concern here.
Finally, since analogical reasoning is not 100 percent accurate, the results must be reviewed and validated by a
human user.
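The reunification step can be sketched in the same spirit. The assumptions are again ours: each stored table is reduced to a map from block position to data item names, corresponding parts are paired greedily one to one, and similarity is the share of matching item names. The stored forms and target sub-matrices are hypothetical, and the returned placement is only a proposal for a human user to review.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]                 # (block row, block column)
StoredTable = Dict[Position, List[str]]    # position -> data item names


def similarity(target: List[str], source: List[str]) -> float:
    """Share of target items that also occur in the source (equal identities assumed)."""
    if not target:
        return 0.0
    source_items = set(source)
    return sum(1 for item in target if item in source_items) / len(target)


def match_layout(targets: Dict[str, List[str]], table: StoredTable) -> Tuple[float, Dict[str, Position]]:
    """Pair each target sub-matrix with its best unused block; return total similarity and placement."""
    unused = dict(table)
    placement: Dict[str, Position] = {}
    total = 0.0
    for name, items in targets.items():
        if not unused:
            break
        pos = max(unused, key=lambda p: similarity(items, unused[p]))
        total += similarity(items, unused[pos])
        placement[name] = pos
        del unused[pos]
    return total, placement


def best_layout(targets: Dict[str, List[str]], base: Dict[str, StoredTable]) -> Tuple[str, float, Dict[str, Position]]:
    """Pick the stored table with the greatest sum of similarity and reuse its layout."""
    best_name = max(base, key=lambda n: match_layout(targets, base[n])[0])
    total, placement = match_layout(targets, base[best_name])
    return best_name, total, placement


base = {
    "FORM-A": {(0, 0): ["order_no", "order_date"], (1, 0): ["item", "qty", "price"]},
    "FORM-B": {(0, 0): ["customer", "address"], (0, 1): ["supplier"]},
}
targets = {"C1": ["order_no", "customer"], "C2": ["item", "qty"]}
print(best_layout(targets, base))
```

FORM-A wins with a summed similarity of 1.5 against 0.5 for FORM-B, so C1 and C2 are stacked vertically as in FORM-A. Only the block positions are reused; the contents of the retrieved table are not copied.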
Analogical reasoning techniques can be a tremendous help in automating the design of information systems.
We tested our theory of multi-source analogical generation on an inventory order processing system and have found
tremendous success.
5. CONCLUSION
Designing an information system can be a time-consuming task. In this paper, we propose a novel approach of multi-source
analogical generation, based on analogical reasoning, to facilitate information system design and development.
We demonstrate the successful use of the theory in designing tables and forms in databases. Traditional analogical
reasoning is a one-source, two-part process whose operation is binary, which does not
accurately reflect the human thought process. Multi-source analogy uses analogical reasoning to go beyond
simple repetition. This allows creative and practical design of databases with few constraints. The
methodology presented in this paper is helpful to database designers, especially novices, and can be generalized to
other domains as well. It makes it possible for a beginner to use the rich experience stored in the knowledge base, and
the efficiency of the system improves as the system is used.
References
Bhatta, S. and Goel, A. From Design Experiences to Generic Mechanisms: Model-Based Learning in Analogical
Design, Artificial Intelligence in Engineering Design, Analysis, and Manufacturing, special issue on machine
learning in design, Vol. 10, 1996, 131-136.
Carbonell, J. Learning By Analogy: Formulating and Generalizing Plans from Past Experience, in Machine
Learning: An Artificial Intelligence Approach, R. Michalski, J. Carbonell and T. Mitchell, eds., Tioga, Palo Alto,
California, 1983.
Chee,Y.S., Applying Gentner's theory of analogy to the teaching of computer programming, Int. J. Man-Machine
Studies, 38, 1993, 347-368
French, MJ. Engineering Design: The Conceptual Stage. Heinemann Educational Books, London, 1971.
Gentner, D. Structure-mapping: A Theoretical framework for analogy. Cognitive Science, 1983, 7, 155-170.
Gentner, D., & Markman, A. B. Structural Alignment in Comparison: No difference without similarity.
Psychological Science, 1994, 5, 152-158.
Gentner, D., Rattermann, M. J., & Forbus, K. D. The roles of similarity in transfer: Separating retrievability from
inferential soundness, Cognitive Psychology, 25, 1993, 524-575.
Goel, A. "Design, Analogy and Creativity", IEEE Expert, Vol. 12, no. 3, May/June, 1997, 62-70.
Holyoak, K. J. and P. Thagard. A computational model of analogical problem solving, Similarity and Analogical
Reasoning, Cambridge University Press, Cambridge, England, S. Vosniadou, A. Ortony, eds. 1989, 242-266.
Holyoak KJ and P. Thagard, Mental Leaps: Analogy in Creative Thought, MIT Press, Cambridge, MA, London,
England, 1995.
Lung, CH. and Urban, J.E., An Approach to the Classification of Domain Models in Support of Analogical Reuse,
ACM, Software Engineering Notes, 1995, ACM No.595950
Lovett, M.C. and Anderson, J.R. Effects of solving related proofs on memory and transfer in geometry problem
solving. Journal of Experimental Psychology: Learning, Memory and Cognition, 1994, 20 (2), 366-378.
McFadden, F. R., J. A. Hoffer, and M.B. Prescott, Modern Database Management. 6th edition. Reading, MA:
Addison Wesley Longman, 2001.
Markman, A.B. & Gentner, D. Structural alignment during similarity comparisons. Cognitive Psychology, 1993,
25, 431-467.
Markman, A. B. & Medin, D. L. Similarity and Alignment in Choice. Organizational Behavior and Human
Decision Processes, 1995, 63, 117-130.
Mayer, RE. Thinking, Problem Solving, Cognition. Second Edition. W. H. Freeman and Company, New York.
1991.
Medin, DL, R. L. Goldstone and D. Gentner, Comparison and Choice: Relations between similarity processes and
decision processes. Psychological Review, 2, 1995, 1-19.
Mukhopadhyay, D., Dalezman, B. Designing Open System with CASE, Information System Management, Vol.12
, No.1,Winter 1995
Poulin, JS, Populating Software Repositories: Incentives and Domain-Specific Software, The Journal of Systems
and Software, Vol. 30, No.3, September 1995
Simon, H. A. 1960. The New Science of Management Decision, New York: Harper & Row.
Weaver, C.A. and Kintsch, W. Enhancing Students' Comprehension of the Conceptual Structure of Algebra Word
Problems. Journal of Educational Psychology, 1992, 84 (4), 419-428.