www.scianta.com
Business-to-Business (B2B) opportunities in the Internet world establish lines of communication between
business peers on the Web. Capitalizing on the opportunities in this world means leveraging knowledge and
marketplace intelligence at ever accelerating rates. This has driven corporations into a race to build and use
advanced computational models derived from sophisticated data mining and machine learning capabilities.
In this article we explore two important kinds of computational intelligence models: fuzzy and neural
systems. In particular, we look at how they fit into the emerging distributed data warehouse and data mart
architectures that are forming the core knowledge repositories for companies doing business across the
Internet.
The Age of Distributed Knowledge
To almost no one’s surprise the Internet has transformed our perspective on the nature and the availability
of knowledge. Perhaps nowhere has this change been more keenly felt than in the corridors of business.
Modern Chief Technology and Chief Information Officers are struggling to fuse seemingly diametrically
opposed data management objectives: assurance, availability, and integrity. At issue is the control of a
corporation’s intellectual property – which, as we move well into the next millennium, threatens to become
the sine qua non of an organization’s robustness. And, as Figure 1 illustrates, corporate knowledge assets
are no longer isolated behind the glass walls of the computer room.
Figure 1. [Diagram labels: Internet; Intranet; Local Networks; Corporation; Knowledge Assets]
Corporate knowledge is distributed, often in an apparently random manner, throughout the organization.
Internet, intranet, and local networks provide users with nearly immediate access to on-line client, human
resource repositories, and departmental or division level financials. An ever increasing spectrum of users
has access to corporate web sites. Understanding, maintaining, and managing this sea of data often taxes the
capabilities (and sometimes the vision) of the corporate data processing division. Yet, a failure to account
for and integrate a corporation’s knowledge assets will become a critical competitive and survival issue in
the very near future.
Already corporate information officers have learned that their future is intimately tied to their past
as formal data modeling, data mining, and knowledge discovery processes have become critical
components in such traditional business activities as risk assessment, customer profitability analysis,
budgeting, and new product positioning. These formal methodologies, aimed at creating and maintaining a
competitive edge, rely on the security and integrity of information. A consideration of these objectives has
seen the rapid rise of the centralized data warehouse. But, as corporations and government agencies steadily
move toward a consolidation of their data assets in data warehouse and data mart architectures, the simple
availability of vast amounts of readily accessible data coupled with emerging gigahertz desktop
computers will drive an accelerated push toward deeper and broader forms of analysis. Conventional offline
data mining using historical data will give way to high speed Online Analytical Processing (OLAP)
engines that dynamically integrate history with the Online Data Store (ODS).
Perhaps a larger problem facing management is the synthesis of information into an adaptable
knowledge base. Using this knowledge base, an organization can construct and connect an entire suite of
cooperating, synergistic business process models. These models share information and support their
conclusions through an accumulation of evidence only possible when they have access to the company’s
complete information framework. How to build these models and what technology should be used are
common themes in distributed data warehouse projects. In this article we look at an approach that combines
several computational intelligence techniques. Fundamentally we look at the use of fuzzy rule induction to
build business process models and the use of self-organizing neural networks to create text exploration
models.
Fuzzy Logic for Business Process Models
In previous articles we have looked at various aspects of fuzzy logic used in such areas as
adaptive expert systems and case based reasoning. But in a more focused and structured approach to model
building in the Age of Distributed Knowledge, fuzzy systems provide the means of combining, weighing,
and using multiple competing experts. Often these experts are not people but other knowledge sources
(such as other business models and expert systems). Figure 2 shows a somewhat simplified example of
how several models work cooperatively in a distributed environment.
[Diagram labels: Inventory Model (Required, On Hand); Sales Forecast Model (Units sold); Pricing Model (Price); Network Portal; Product; Budgeting; Inventory; Customers; Sales]
Figure 2. Multiple Business Process Models
In a distributed environment, multiple experts compete for attention, either as peers in the decision
process or as components of a larger decision making model. The ability of fuzzy models to easily
incorporate evidence from several expert sources (as well as assign degrees of credibility to each source)
makes them an ideal vehicle for building shared decision models in the distributed data warehouse and data
mart environment. As an example, Figure 3 shows the product pricing model and the various distributed
sources of information.
Our Price Must Be Low
Our Price Must Be High
Our Price should be Around 2*MfgCosts
if Competition.Price is Not Very High
then Our Price should be Near the Competition Price
[Diagram labels: Pricing Model; Network Portal; Manufacturing; Sales; Inventory; Customers]
Figure 3. The Product Pricing Model
In Figure 3 the bold parts of the business rules represent fuzzy sets. These fuzzy sets are combined
under the methods of fuzzy composition. Combining rules in this way accumulates evidence for the final
product pricing position. Because fuzzy models are adept at handling evidence from many sources,
they provide a flexible and powerful method of modeling distributed processes. These are typically the
kinds of processes we find at the data warehouse and data mart level in Internet-centered organizations.
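The way the four pricing rules accumulate evidence can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model behind Figure 3: the membership shapes, the price range of 10 to 50, the width of the "Around" and "Near" sets, the use of squaring for the hedge "Very", the additive accumulation of rule outputs, and the centroid defuzzification are all choices made here for the example. The names recommend_price, falling, rising, and around are hypothetical.

```python
def falling(x, lo, hi):
    """Membership that is 1 at or below lo and falls linearly to 0 at hi."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def rising(x, lo, hi):
    """Membership that is 0 at or below lo and rises linearly to 1 at hi."""
    return 1.0 - falling(x, lo, hi)


def around(x, center, width):
    """Triangular membership peaking at center, zero beyond center +/- width."""
    return max(0.0, 1.0 - abs(x - center) / width)


def recommend_price(mfg_cost, competitor_price,
                    lo=10.0, hi=50.0, width=8.0, steps=400):
    """Combine the four pricing rules and return a single crisp price.

    All numeric defaults are assumptions made for this sketch.
    """
    # Truth of the premise "Competition.Price is Not Very High".
    # "Very" is modeled by squaring the membership (an assumed convention).
    high = rising(competitor_price, lo, hi)
    not_very_high = 1.0 - high ** 2

    weighted_sum = 0.0
    total_evidence = 0.0
    for i in range(steps + 1):
        price = lo + (hi - lo) * i / steps
        evidence = (
            falling(price, lo, hi)                  # Our Price Must Be Low
            + rising(price, lo, hi)                 # Our Price Must Be High
            + around(price, 2.0 * mfg_cost, width)  # Around 2*MfgCosts
            # Conditional rule: the consequent is capped by the premise truth.
            + min(not_very_high, around(price, competitor_price, width))
        )
        weighted_sum += price * evidence
        total_evidence += evidence
    # Centroid of the accumulated evidence over the candidate prices.
    return weighted_sum / total_evidence


if __name__ == "__main__":
    print(round(recommend_price(mfg_cost=12.0, competitor_price=26.0), 2))
```

In this sketch the Low and High rules pull in opposite directions and largely cancel, so the manufacturing cost and the competitor's price decide where the recommendation settles. A credibility weight per source could be added by multiplying each term by a factor between 0 and 1.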
Evolving Distributed Fuzzy Models
Due to the way fuzzy sets can represent approximate patterns, fuzzy models are especially easy to
evolve using knowledge discovery or data mining techniques. Rule induction – the isolation and extraction
of if-then rules from large databases – provides a way of creating a prototypical fuzzy model from patterns
occurring naturally in the data. Figure 4 illustrates how this step is used to combine knowledge from
several distributed data sources.
[Diagram labels: Rules; Fuzzy Models; Rule Induction (Data Mining); Network Portal; Manufacturing; Sales; Inventory; Customers]
Figure 4. Building a Fuzzy Model with Rule Induction
By incorporating a rule induction step in your business models, you can maintain currency with
the external world. As the long term behavior of your customers changes, the model can adapt to those
changes by discovering new rules reflecting a shift in buying habits (as one example). Rule induction, of
course, need not be viewed as a complete model building technique; rather, it can be used to “prime the
pump” so to speak when you have many experts contributing knowledge to a set of models.
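To make the rule induction step concrete, here is a minimal Python sketch. The article does not say which induction algorithm is used, so this example swaps in one well-known approach, in the style of the Wang-Mendel method: each record votes for the rule formed by its best-fitting fuzzy sets, and when two records propose different conclusions for the same condition, the one with the higher degree wins. The three fuzzy sets, the scaling of every field to the range 0 to 1, and the field names (price, promotion, units sold) are assumptions made for illustration.

```python
# Centers of three assumed fuzzy sets on a variable scaled to [0, 1].
LABELS = {"LOW": 0.0, "MEDIUM": 0.5, "HIGH": 1.0}


def membership(x, center, width=0.5):
    """Triangular membership peaking at center."""
    return max(0.0, 1.0 - abs(x - center) / width)


def best_label(x):
    """Return (label, degree) for the fuzzy set that x fits best."""
    return max(((name, membership(x, center)) for name, center in LABELS.items()),
               key=lambda pair: pair[1])


def induce_rules(records):
    """Induce if-then rules from records of scaled values.

    Each record is a tuple whose last element is the output and whose
    other elements are inputs. Returns {antecedent: (consequent, degree)}.
    """
    rules = {}
    for *inputs, output in records:
        fitted = [best_label(value) for value in inputs]
        out_label, degree = best_label(output)
        for _, input_degree in fitted:
            degree *= input_degree  # rule degree = product of memberships
        antecedent = tuple(name for name, _ in fitted)
        # Keep only the strongest rule for each antecedent.
        if antecedent not in rules or degree > rules[antecedent][1]:
            rules[antecedent] = (out_label, degree)
    return rules


if __name__ == "__main__":
    # Hypothetical records: (price, promotion, units_sold), all scaled to [0, 1].
    history = [(0.1, 0.9, 0.95), (0.9, 0.1, 0.05),
               (0.85, 0.2, 0.6), (0.5, 0.5, 0.5)]
    for (price, promo), (units, degree) in induce_rules(history).items():
        print(f"if Price is {price} and Promotion is {promo} "
              f"then UnitsSold is {units}  (degree {degree:.2f})")
```

Re-running the induction as new records arrive is one simple way to let the rule set follow a shift in buying habits. The induced rules can then be reviewed and edited by the contributing experts rather than accepted as a finished model.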
Self-Organizing Maps and Textual Models
As corporations and government agencies move toward a complete automation of the knowledge
discovery and modeling process they are increasingly concerned with incorporating vast quantities of
textual material. This is especially true in the public sector where agencies are trying to extract knowledge
from years of grant initiatives covering such diverse areas as breast cancer research, environmental
monitoring, toxic waste disposal, genetic engineering, physiological warfare problems (such as the Gulf
War syndrome), and blue water resource assessments. Even private corporations are turning to text mining
and text content analysis as a way to analyze customer retention, problem resolution, and product warranty
costs through scripts recorded from help desk and field agent conversations. Data Warehouses and Data
Marts also benefit from a text analysis component, which can serve as an important barometer of customer health
and service levels.
Like fuzzy systems, text models must be adaptive and flexible. A comprehensive but easy to build
and use approach to text analysis involves the self-organizing map (SOM), an unsupervised type of neural
network. These are also known as Kohonen nets (after their inventor, Teuvo Kohonen, a professor at the
Helsinki University of Technology in Espoo, Finland). As illustrated in Figure 5, the neurons of a
Kohonen net are connected in all directions.
The weights on the connection edges indicate how strongly adjacent neurons are related. A self-organizing
neural map is not trained like the more familiar back propagation neural networks. Instead, the
relationships between neurons are learned from patterns that exist in the data. A nearest-neighbor algorithm
is used to reinforce weights on the connections between each neuron. In this way, self-organizing maps are
similar to the induced rules in a fuzzy model. Inducing the rules for a self-organizing map involves
presenting the map with patterns from the data. For textual data this generally means removing noise words
(such as articles), plurals, prepositions, and similar language artifacts. Vectors of related concept
semaphores are used to create related domains in the self-organizing map. From these related domains, we
can find related concepts.
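The training process described above can be sketched with the Python standard library alone. This is a minimal illustration under assumptions, not the system the article describes: it uses the standard Kohonen update (find the best-matching neuron, then pull it and its grid neighbors toward the input), a small hand-picked noise word list, word-count vectors in place of the article's concept vectors, and a 3 by 3 map. Plural stripping is left out for brevity. The sample help desk sentences and all function names are hypothetical.

```python
import math
import random

# A tiny, assumed noise word list (articles, prepositions, and the like).
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "on", "is", "and", "my", "for"}


def tokenize(text):
    """Lowercase the text, strip punctuation, and drop noise words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return [w for w in words if w and w not in STOP_WORDS]


def vectorize(tokens, vocab):
    """Unit-length word-count vector over a fixed vocabulary."""
    counts = [float(tokens.count(term)) for term in vocab]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def best_unit(weights, vec):
    """Grid position (row, col) of the neuron closest to vec."""
    return min(weights,
               key=lambda rc: sum((w - v) ** 2 for w, v in zip(weights[rc], vec)))


def train_som(vectors, rows=3, cols=3, epochs=200, seed=0):
    """Train a small map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs                  # decays from 1 toward 0
        rate = 0.5 * frac                            # learning rate
        radius = 1.0 + max(rows, cols) / 2.0 * frac  # neighborhood radius
        for vec in rng.sample(vectors, len(vectors)):
            br, bc = best_unit(weights, vec)
            for (r, c), w in weights.items():
                dist2 = (r - br) ** 2 + (c - bc) ** 2
                pull = rate * math.exp(-dist2 / (2.0 * radius ** 2))
                for i in range(dim):
                    w[i] += pull * (vec[i] - w[i])
    return weights


if __name__ == "__main__":
    calls = ["The printer is jammed", "Printer paper jammed again",
             "Refund for my order", "Order refund is late"]
    vocab = sorted({t for text in calls for t in tokenize(text)})
    vectors = [vectorize(tokenize(text), vocab) for text in calls]
    som = train_som(vectors)
    for text, vec in zip(calls, vectors):
        print(best_unit(som, vec), text)
```

Documents that land on the same or neighboring grid positions share vocabulary, which is how related domains on the map point to related concepts.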
Combined into a set of analytical tools, fuzzy systems and self-organizing neural networks provide
the knowledge engineer and business analyst with a rich set of modeling tools. In particular, Kohonen nets
provide data warehouse and data mart designers, builders, and users with the technology for exploring
non-numeric data in a way that can illuminate deeply buried as well as sparse patterns. These tools become very
important for organizations that want to explore the relationship between customers and service agents
(running help desks, call centers, general support, and related client support activities).