DB2 Best Practices

IBM® InfoSphere Information Server
Best Practices
Topology Design
Martin Breining
Software Architect, Information Server
Thomas Cherel
Software Architect, Information Server
Jean-Claude Mamou
STSM and Program Director, Information Server
Background
Executive Summary
Foreword
Assess your needs
  High availability
  Performance and scalability
  Ease of installation and manageability (aka simplicity)
Assess your means
  Hardware
  Technical expertise
Start shopping
  Basic topologies
    Single machine
    Dedicated Engine machine
    Dedicated machine for each tier
  Parallel Processing and Grid Engine Topologies
    All tiers collocated, including the Engine conductor node
    Dedicated machine for Engine conductor node
  Basic High Availability Topologies
    All tiers collocated
    Dedicated Engine machine
    Parallel processing and grid Engine configurations
  Advanced Topologies
    Highly scalable Services tier
    Highly scalable Services tier with parallel processing and grid Engine configurations
    Advanced high availability
    Advanced high availability with parallel processing and grid Engine configurations
Conclusion
Appendix: Topology scorecards
Further reading
  Contributors
Notices
  Trademarks
Background
A successful deployment of IBM InfoSphere Information Server must include a topology that meets
customer expectations, such as performance, ease of maintenance, security, high availability, and
scalability. Determining the optimal deployment is a complex task that requires a firm understanding of
customer requirements and hands-on deployment experience with InfoSphere Information Server. What
works well for one customer might be a poor choice for others.
Customers are trending toward an increased number of topologies, which further complicates the process
of identifying a suitable topology. For example, InfoSphere Information Server, Version 8.5 introduces
various high availability solutions ranging from a basic active/passive failover cluster configuration to
more advanced high availability solutions such as IBM WebSphere Application Server Network
Deployment and IBM DB2 HADR.
Summary
This document provides a blueprint on how to design an ideal topology for InfoSphere Information
Server based on a set of available resources (such as hardware and skills) and a set of functional
requirements (such as high availability, scalability, and simplicity). Each of these variables represents
different dimensions of a topology. A change in any of these dimensions can greatly impact the resulting
topology, so identifying and quantifying these dimensions is important to remain within set constraints
while meeting expectations. After these dimensions are defined, a particular topology, or perhaps a
family of topologies, is expected to emerge almost naturally when following the guidance that is outlined
in this document.
Dimensions often compete with one another. For example, scalability is rarely associated with a
single-computer environment, and high availability and simplicity are typically opposites. As a general rule, the
greater the functional requirements, the greater the cost and the more sophisticated the topology.
Therefore, do not expect to achieve the highest levels of performance, high availability, scalability and
simplicity at a minimal cost.
Foreword
This document assumes that the reader has a solid understanding of the overall InfoSphere Information
Server architecture and the various tiers (Metadata Repository, Services, Engine and Client). See
‘Understanding tiers and components’ in Further reading for more information.
Assess your needs
The first step is to evaluate the overall system expectations or requirements, mainly in terms of high
availability, performance and scalability, and simplicity. Additional requirements might need to be
clarified (such as security), but the aforementioned requirements dictate the core of the design.
Remain conservative when assessing your functional requirements. Every requirement that is added to
the system has a cost, so if the requirement is not currently needed, do not factor it into the design.
The following questions are important to consider when quantifying each requirement:
- If high availability is a requirement, how much downtime can be tolerated? Is 20
  minutes acceptable?
- If scalability is a requirement, how many concurrent clients or data integration jobs need to be
  serviced at any given time? In the single-digit range? In the two- or three-digit range?
- If simplicity is a requirement, how do you define simplicity? Is running and maintaining
  InfoSphere Information Server on an environment with two or three computers considered
  simple?
After assessing your requirements, conduct capacity planning to ensure that your
anticipated needs can be accommodated. Make sure to size your memory, disk space,
and database needs.
High availability
High availability refers to the level of operational performance that is ensured by the system during a
specific time period. InfoSphere Information Server provides several high availability solutions, each
with varying degrees of guaranteed availability and applicability. The tiers in the InfoSphere Information
Server suite (Metadata Repository, Services, and Engine tiers) are based on different technologies, so not
all high availability solutions can be applied to all tiers.
Consider high availability as a suite requirement rather than a tier requirement for a given environment
type (development, test, or production). If InfoSphere Information Server must be highly available, then
all tiers must be configured with a high availability solution. Otherwise, if one tier is not configured for
high availability, then that tier becomes a single point of failure for the overall system, which defeats the
purpose of configuring the other tiers for high availability. Again, this configuration applies for a given
environment type only: it is perfectly acceptable to have a development environment with no high
availability capabilities and a production environment that is configured for high availability.
- Active/passive failover cluster – This solution requires a shared file system (where the
  InfoSphere Information Server binaries are installed), two separate computers (one active, one
  passive), a virtual IP address (floating between the active and passive computers) and
  high availability software that manages the entire failover process. InfoSphere Information Server is
  initially started on the primary server. When this server fails, the high availability software
  triggers a failover of the InfoSphere Information Server processes to the backup server, which
  effectively becomes the active server. In an InfoSphere Information Server environment, the
  failover process can take up to 20 minutes to complete, so some downtime must be tolerable. IBM
  HACMP and Tivoli System Automation are examples of high availability software supported for
  this solution. This solution can be applied to all of the InfoSphere Information Server tiers.
- IBM WebSphere Application Server Network Deployment – This solution is typically more
  costly (hardware, skills) than the previous solution but offers a greater level of high availability.
  This solution provides an active/active N-to-N cluster (see the ‘High Availability’ reference for
  more information) for services that are running in WebSphere Application Server. In most cases,
  automatic and transparent failover can be achieved by the WebSphere Application Server,
  guaranteeing no downtime. This solution can only be applied to the Services tier of InfoSphere
  Information Server.
- IBM DB2 HADR with Automatic Client Reroute – This solution turns an IBM DB2 database into
  a highly efficient active/passive cluster (or active/standby cluster) where data replication between
  a primary and a standby database is performed periodically to protect against data loss. Several
  data replication settings exist (synchronous, near-synchronous, and asynchronous) depending on
  the amount of data loss that can be tolerated versus the level of performance that is required.
  Combined with the Automatic Client Reroute facility, a DB2 HADR database provides a greater
  level of high availability than the basic active/passive failover cluster because the standby
  database is already running, and the Automatic Client Reroute facility ensures a transparent and
  speedy failover. This solution can only be applied to the Metadata Repository tier of InfoSphere
  Information Server.
Note: InfoSphere Information Server also supports Oracle Real Application Clusters (RAC),
which are more of an active/active cluster of Oracle RDBMS databases that share a single
database. The Oracle RAC is a very different architecture than IBM DB2 HADR, but the level of
high availability achieved is comparable.
The following table summarizes the main characteristics of the high availability solutions that are
supported by InfoSphere Information Server.
Table 1: High availability solutions supported by InfoSphere Information Server
Solution | Failover | Level of high availability | Applicability
Active/passive failover | Automatic but slow (downtime must be tolerated) | Basic | All tiers
IBM WebSphere Application Server Network Deployment | Automatic and instantaneous (no downtime) | Advanced | Services tier
IBM DB2 HADR with Automatic Client Reroute | Automatic and instantaneous (no downtime) | Advanced | Metadata Repository tier
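The rule that every tier needs a high availability solution can be checked mechanically. The following Python sketch is only an illustration: the applicability data is transcribed from Table 1, while the function name single_points_of_failure and the shape of the plan dictionary are assumptions made for this example and are not part of InfoSphere Information Server.

```python
# Tiers that must all be covered for the suite to be highly available.
TIERS = ("Metadata Repository", "Services", "Engine")

# Applicability of each solution, transcribed from Table 1.
HA_APPLICABILITY = {
    "Active/passive failover": set(TIERS),
    "IBM WebSphere Application Server Network Deployment": {"Services"},
    "IBM DB2 HADR with Automatic Client Reroute": {"Metadata Repository"},
}


def single_points_of_failure(plan):
    """Return the tiers that have no applicable HA solution.

    `plan` is a hypothetical mapping of tier name -> solution name (or None).
    """
    unprotected = []
    for tier in TIERS:
        solution = plan.get(tier)
        applicable = HA_APPLICABILITY.get(solution, set())
        if tier not in applicable:
            unprotected.append(tier)
    return unprotected


# Example plan: the Engine tier was left without a solution.
plan = {
    "Metadata Repository": "IBM DB2 HADR with Automatic Client Reroute",
    "Services": "IBM WebSphere Application Server Network Deployment",
    "Engine": None,
}
print(single_points_of_failure(plan))
```

In this example plan, the Engine tier is reported as a single point of failure; assigning it the active/passive failover solution would clear the list.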
Performance and scalability
Performance refers to the rate at which work is being processed by the system. Scalability refers to the
system’s ability to handle growing amounts of work. In the context of InfoSphere Information Server, the
term work typically refers to data integration job designs, metadata management, and data integration job
executions.
Performance and scalability are treated equally in this document to simplify the topology discussion
without jeopardizing the end result.
Performance and scalability should be researched more thoroughly than other dimensions. For a given
topology, performance and scalability can vary greatly depending on how InfoSphere Information Server
is being used in conjunction with the suite products that are being installed (such as IBM InfoSphere
DataStage, IBM InfoSphere QualityStage, or IBM InfoSphere Information Analyzer) and the type and
scale of environment that you plan to run.
A typical example is InfoSphere DataStage job design versus InfoSphere DataStage job execution. These
usages of InfoSphere Information Server are very different and stress the tiers in their own way, leading
to different topologies. The following table shows how the tiers are being stressed based on particular
InfoSphere Information Server usages.
Table 2: How InfoSphere Information Server tiers are stressed based on usage
(N/E = Not Exercised)
Product | Engine Tier | Services Tier | Metadata Repository Tier
IBM InfoSphere Business Glossary | N/E | Medium | Medium
IBM InfoSphere Business Glossary Anywhere | N/E | Medium | Medium
IBM InfoSphere Business Glossary Browser | N/E | Medium | Medium
InfoSphere Information Server Console | N/E | Heavy | Heavy
InfoSphere Information Server Web Console | N/E | Medium | Medium
InfoSphere Metadata Workbench | N/E | Heavy | Heavy
InfoSphere Asset Interchange (istool) | Light | Heavy | Heavy
InfoSphere DataStage or InfoSphere QualityStage Designer | Light | Heavy | Heavy
InfoSphere FastTrack | Light | Light | Medium
InfoSphere Information Analyzer Client | Medium | Medium | Medium
InfoSphere Information Server Manager | Light | Heavy | Heavy
InfoSphere Information Services Director | Light | Heavy | Heavy
InfoSphere DataStage or InfoSphere QualityStage Administrator | Light | Light | Light
InfoSphere DataStage or InfoSphere QualityStage Director | Light | N/E | N/E
InfoSphere DataStage Job Execution | Heavy | Light | Light
InfoSphere Information Analyzer Execution (not Reports) | Heavy | Light | Light
InfoSphere Information Services Director Execution | Heavy | Heavy | Light
Typically, development environments consist of InfoSphere Information Server clients that interact with
InfoSphere Information Server to design data integration jobs and other artifacts (such as the InfoSphere
DataStage Designer) or manage metadata (such as the InfoSphere Business Glossary or InfoSphere
Metadata Workbench clients). These interactions tend to exercise the Services and Metadata Repository
tiers most. Test and production environments primarily run data integration jobs (such as InfoSphere
DataStage job executions) and tend to stress the Engine tier more.
Make sure that you understand how you intend to use InfoSphere Information Server so that you know
which tiers require a particular focus with respect to performance and scalability. The InfoSphere
Information Server suite includes at least ten different products, each stressing the various tiers
differently. An InfoSphere Information Server installation might include any of these products, so
numerous unique combinations exist. Use the previous table as a reference to help you understand and
quantify how each tier is stressed based on your particular usage of InfoSphere Information Server.
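One way to use Table 2 is to look up each planned usage and keep the heaviest stress level per tier. The Python sketch below illustrates that idea under stated assumptions: only three rows of Table 2 are transcribed, the row keys are shortened product names, and taking the maximum level per tier is a simplification chosen for this example rather than a method prescribed by this document.

```python
# Stress levels ordered from lightest to heaviest ("N/E" = Not Exercised).
LEVELS = {"N/E": 0, "Light": 1, "Medium": 2, "Heavy": 3}
LEVEL_NAMES = {rank: name for name, rank in LEVELS.items()}

TIER_ORDER = ("Engine", "Services", "Metadata Repository")

# A few rows transcribed from Table 2: (Engine, Services, Metadata Repository).
STRESS = {
    "Business Glossary": ("N/E", "Medium", "Medium"),
    "DataStage or QualityStage Designer": ("Light", "Heavy", "Heavy"),
    "DataStage Job Execution": ("Heavy", "Light", "Light"),
}


def peak_stress(usages):
    """Return the heaviest stress level per tier across the given usages."""
    peaks = {}
    for index, tier in enumerate(TIER_ORDER):
        ranks = [LEVELS[STRESS[usage][index]] for usage in usages]
        peaks[tier] = LEVEL_NAMES[max(ranks, default=0)]
    return peaks


# Example: a production system that runs jobs and serves Business Glossary users.
print(peak_stress(["Business Glossary", "DataStage Job Execution"]))
```

The result points at the tiers that deserve the most attention; here the Engine tier carries the heaviest load, while the other two tiers are moderately stressed.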
Additionally, ensure that you understand the scale of the InfoSphere Information Server environment
that you intend to deploy. The scale of an InfoSphere Information Server environment is defined by the
number of clients or job executions that are concurrently being serviced by InfoSphere Information
Server. Experience shows that environments typically fall between small and large, where small refers to
five or fewer concurrent clients or concurrent jobs and large refers to more than five concurrent clients or
concurrent jobs. Try to quantify the scale of your environment. For development environments, estimate
the number of concurrent clients that are expected to interact with InfoSphere Information Server at any
time. For test and production environments, size the number of jobs and other runtime artifacts that are
executed concurrently.
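The small versus large distinction above reduces to a single threshold. The following Python sketch encodes it; the function name classify_scale is an assumption made for this illustration, and the threshold of five comes from the definition in the preceding paragraph.

```python
SMALL_ENVIRONMENT_LIMIT = 5  # five or fewer concurrent clients or jobs is "small"


def classify_scale(concurrent_clients_or_jobs):
    """Classify an environment as 'small' or 'large' by its concurrent load."""
    if concurrent_clients_or_jobs < 0:
        raise ValueError("concurrency cannot be negative")
    if concurrent_clients_or_jobs <= SMALL_ENVIRONMENT_LIMIT:
        return "small"
    return "large"


# Example: a development team of four designers and a production system
# that runs thirty jobs at the same time.
print(classify_scale(4), classify_scale(30))
```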
After you understand your performance and scalability requirements, you can determine whether you need an additional
performance or scalability boost. InfoSphere Information Server provides a few performance
optimization solutions that are targeted for environments where high levels of performance are expected,
each with varying degrees of applicability.
- Parallel processing and grid configurations – This solution can only be applied to the Engine tier
  of InfoSphere Information Server. Although architecturally different, both of these configurations
  aim to maximize the performance and throughput of the Engine tier by distributing and
  parallelizing job executions across multiple processors. Refer to ‘Parallel processing and grid
  topologies’ in Further reading for more information on parallel processing and grid
  configurations.
- IBM WebSphere Application Server Network Deployment – Along with increasing availability,
  this solution also improves scalability by spreading the work on multiple clustered WebSphere
  Application Server instances as opposed to a single, standalone WebSphere Application Server
  instance. This solution can only be applied to the Services tier of InfoSphere Information Server.
Table 3: Performance and scalability optimization solutions supported by InfoSphere Information Server
Solution | Benefits | Applicability
Parallel processing and grid configurations | Improved throughput | Engine tier
IBM WebSphere Application Server Network Deployment | Improved scalability | Services tier
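Because each optimization applies to exactly one tier, the choice follows directly from the tiers that need a boost. The Python sketch below shows that lookup; the mapping is transcribed from Table 3, and the function name suggest_optimizations is an assumption made for this example. A tier for which Table 3 lists no solution maps to None.

```python
# Optimization solution per tier, transcribed from Table 3.
OPTIMIZATIONS = {
    "Engine": "Parallel processing and grid configurations",
    "Services": "IBM WebSphere Application Server Network Deployment",
}


def suggest_optimizations(tiers_needing_boost):
    """Map each tier to its optimization solution, or None if Table 3 lists none."""
    return {tier: OPTIMIZATIONS.get(tier) for tier in tiers_needing_boost}


# Example: the Engine and Metadata Repository tiers are the bottlenecks.
print(suggest_optimizations(["Engine", "Metadata Repository"]))
```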
Ease of installation and manageability (a.k.a. simplicity)
Simplicity encapsulates ease of deployment, ease of configuration, ease of maintenance, and
serviceability. Simplicity refers to quickly deploying an InfoSphere Information Server installation and
easily maintaining it.
Simplicity might not be a first-class requirement because it is often assumed that customers want their
topology to be simple. However, the level of complexity generally increases as more focus is placed on
other dimensions such as high availability and scalability. A byproduct of increased complexity is
increased cost (hardware, skills, and time), so it is important to consider simplicity as one of the
dimensions that must be assessed when designing a topology for InfoSphere Information Server.
The simplest InfoSphere Information Server deployment includes all tiers running in standalone mode
(no high availability solution, no clustering, and no grid) on one single computer. This topology requires
minimal installation, configuration, and maintenance efforts because all components are collocated.
Anything that diverges from this configuration increases the overall level of complexity. Use this basic
InfoSphere Information Server configuration as a reference point whenever assessing the complexity of a
particular topology. Ensure that you can absorb any added complexity and its associated hidden costs.
Assess your means
Hard constraints must be addressed when designing a topology. You must ensure that adequate
resources are available, especially hardware resources and technical expertise. Both of these requirements
must be met to ensure that the topology can succeed.
Do not design a topology based on resources that are not yet accounted for. Plan your topology based on
the resources that are currently available rather than what might be available in the future. Adding
hardware to an existing topology might require a complete revision of the topology, and quickly
educating a team on high availability and scalability technologies is not a trivial task.
Hardware
Evaluate all available computers and ensure that they meet the system requirements for InfoSphere
Information Server. A computer refers to a physical computer, LPAR, or virtual environment. A single
InfoSphere Information Server deployment can require anywhere between one computer to dozens of
computers, depending mostly on the high availability, performance, and scalability needs of the
topology.
Technical expertise
Evaluate the technical expertise in areas such as high availability, scalability, and networking. Complex
InfoSphere Information Server deployments can require expert skills during the initial installation and
configuration phase and also during regular operational hours for miscellaneous maintenance and
administration tasks. Ensure that you understand these considerations before designing a complex
topology.
Start shopping
After you understand the resources and requirements, you can scrutinize the supported InfoSphere
Information Server topologies and determine which ones fit best. InfoSphere Information Server supports
a wide range of topologies, and singling out a specific topology is not easy. However, this task is greatly
simplified when a thorough list of requirements and resources has been created.
The following section is not a comprehensive list of supported InfoSphere Information Server topologies.
Rather, this list highlights the most popular families of topologies that, over time, have proven to yield
the best results and are often recommended to customers by IBM consulting teams. If you cannot locate a
suitable topology in the following list, then you might have overlooked or misunderstood your
requirements during the assessment phase. In all instances, contact an IBM representative to validate
your topology before moving forward.
Topologies are grouped into four main families based on their architectural and requirements similarities.
- Basic
- Parallel processing and grid engine
- Basic high availability
- Advanced
Basic topologies are installations of InfoSphere Information Server that do not provide any high
availability capabilities, advanced job processing, or scalability optimizations. Basic topologies provide
the foundation on which the other topology families are built and in many cases are sufficient. Parallel
processing and grid engine topologies allow InfoSphere Information Server to achieve higher levels of job
processing performance while the basic high availability topologies turn InfoSphere Information Server
into an entry-level highly available platform. The advanced topologies push InfoSphere Information
Server to its limits in areas such as performance, scalability, and high availability.
Within each topology family, a few popular topologies are described and rated based on the various
dimensions that are mentioned previously:
- Zero (VERY POOR) - A dimension rated zero indicates that the topology is a very poor fit and
  should not be considered if the dimension is part of your specifications list.
- One (POOR) - A dimension rated one indicates that the topology is a poor fit and should only be
  considered if the dimension is not high in your specifications list.
- Two (GOOD) - A dimension rated two indicates that the topology is a good fit and should be
  considered if the dimension is high in your specifications list.
- Three (VERY GOOD) - A dimension rated three indicates that the topology is an excellent fit and
  should be considered if the dimension is high in your specifications list.
Read through the list of topologies and see which fits your needs best. It is unrealistic to expect a topology
to rate three (very good) on all of the dimensions that you are interested in (keep in mind that
dimensions often work against each other), so do your best to find a topology that successfully addresses
the dimensions at the top of your specifications list but might not rate as high on other dimensions on
your specifications list.
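The selection process described above can be sketched as a filter followed by a ranking. The Python example below is only an illustration: the ratings are taken from Tables 4 and 5 (the Large Environments dimension is omitted because Table 4 splits it by tier), while the dimension keys, the function name shortlist, and the rule of requiring at least a rating of two (GOOD) on every top-priority dimension are assumptions made for this example.

```python
# Ratings (0 = very poor ... 3 = very good) taken from Tables 4 and 5.
RATINGS = {
    "Single computer": {
        "hardware": 3, "skills": 3, "high_availability": 0,
        "simplicity": 3, "perf_small": 3,
    },
    "Dedicated Engine computer": {
        "hardware": 2, "skills": 3, "high_availability": 0,
        "simplicity": 2, "perf_small": 3,
    },
}


def shortlist(ratings, priorities, minimum=2):
    """Keep topologies rated >= `minimum` on every priority, best total first."""
    fits = [
        name for name, dims in ratings.items()
        if all(dims[dim] >= minimum for dim in priorities)
    ]
    return sorted(
        fits,
        key=lambda name: sum(ratings[name][dim] for dim in priorities),
        reverse=True,
    )


# Example: simplicity and small-environment performance matter most.
print(shortlist(RATINGS, ["simplicity", "perf_small"]))
# Neither basic topology qualifies when high availability is a priority.
print(shortlist(RATINGS, ["high_availability"]))
```

An empty shortlist is a signal to look at another topology family or to revisit the priorities, in line with the guidance above.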
Basic topologies
Single computer
Characteristics:
- Hardware: One computer
- Tier configuration: All tiers collocated on computer
Figure 1. Topology with all tiers collocated on one computer
As mentioned previously, this topology is the simplest because it requires minimal hardware and
minimal skills. All tiers are installed on the same computer, which further extends the simplicity of the
design. This layout ensures a speedy and straightforward installation and configuration phase while
minimizing maintenance work and increasing overall serviceability.
This simple yet powerful topology is ideal for demos and small scale environments where scalability is
not the main concern. This topology is also great for many other types of environments, especially
environments in which the Services and Metadata Repository tiers are not exercised much, so experiment
with this topology before jumping to more sophisticated ones. All runtime components run in standalone
mode (as opposed to cluster or grid environments), so this topology does not offer any high availability
capabilities.
Table 4: Dimensions for a topology with a single computer
Dimension | Rating
Minimal Hardware Required | 3
Minimal Skills Required | 3
High Availability | 0
Simplicity | 3
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 1 (for environments that stress mostly the Services and Metadata Repository tiers); 2 (for environments that stress mostly the Engine tier)
Dedicated Engine computer
Characteristics:
- Hardware: Two computers
- Tier configuration:
o Engine tier on dedicated computer
o Services/Metadata Repository tiers collocated on another computer
Figure 2. Topology with tiers split between two computers, one dedicated for the Engine tier
Along with the Single computer topology, this topology is very effective when high availability is not a
requirement. Some simplicity is sacrificed for increased scalability on larger environments, making this
topology ideal for larger scale environments that have no high availability requirements.
Table 5: Dimensions for a topology with a dedicated engine computer
Dimension | Rating
Minimal Hardware Required | 2
Minimal Skills Required | 3
High Availability | 0
Simplicity | 2
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 2
Dedicated computer for each tier
Characteristics:
- Hardware: Three computers
- Tier configuration: All tiers configured on dedicated computers
Figure 3. Topology with tiers split between three computers
This topology is typically not recommended because it does not offer any benefits compared to the
previous topologies. This topology increases the cost (hardware, complexity, and technical expertise) for
no tangible added value. This topology does not provide any high availability capabilities and ranks
below most other topologies in terms of performance and scalability.
In some instances, organizational constraints within a company (usually large companies) require that the
tiers be deployed on separate computers, perhaps managed by separate teams or system administrators,
making this topology a relatively popular one.
Table 6: Dimensions for a topology with a dedicated computer for each tier
Dimension | Rating
Minimal Hardware Required | 1
Minimal Skills Required | 2
High Availability | 0
Simplicity | 2
Performance and Scalability – Small Environments | 2 (for environments that stress mostly the Services and Metadata Repository tiers); 3 (for environments that stress mostly the Engine tier)
Performance and Scalability – Large Environments | 2
Parallel Processing and Grid Engine Topologies
All tiers collocated, including the Engine conductor node
Characteristics:
- Hardware: One computer with multiple processors for SMP configurations;
alternatively, two or more computers for MPP or cluster configurations
- Tier configuration:
o Engine tier configured for parallel or grid processing. The conductor node is
collocated with the other two tiers. Depending on whether a parallel processing
or grid configuration is configured, the compute nodes can be on the same
computer or dedicated computers. The following graphic depicts a parallel
processing configuration with the conductor and compute nodes on the same
computer.
o Services/Repository tiers on same computer as the Engine conductor node
Figure 4. Topology with all tiers on a single computer, plus conductor and compute nodes
This topology is an enhanced version of the Single computer topology in which the Engine tier is
configured as either a parallel processing or grid configuration. This topology maximizes the
performance and throughput of the Engine tier by distributing job executions across multiple processors.
For optimal performance, consider offloading the compute nodes to dedicated computers (cluster or
grid). Scalability and high availability are not improved with this type of configuration.
This topology is an ideal choice for InfoSphere Information Server deployments that have strong
throughput requirements on job executions at the Engine tier level, that is, deployments where business
constraints mandate that a certain number of jobs must complete within a particular timeframe.
Table 7: Dimensions for a topology with all tiers collocated, including the Engine conductor node
Dimension | Rating
Minimal Hardware Required | 1
Minimal Skills Required | 1
High Availability | 0
Simplicity | 0
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 1 (for environments that stress mostly the Services and Metadata Repository tiers); 3 (for environments that stress mostly the Engine tier)
Dedicated computer for Engine conductor node
Characteristics:
- Hardware: At least two computers
- Tier configuration:
o Engine tier configured for parallel or grid processing with the conductor node on a
dedicated computer. Depending on whether parallel processing or grid processing
is configured, the compute nodes can be on the same computer or dedicated
computers. The following graphic depicts a parallel processing configuration with
the conductor and compute nodes on the same computer.
o Services/Repository tiers collocated on another computer
Figure 5. Topology with tiers split between two computers; one dedicated for the engine tier configured
for parallel or grid processing
This topology is an enhanced version of the Dedicated Engine computer topology where the Engine tier
is configured as either a parallel processing or grid configuration. As with the All tiers collocated,
including the Engine conductor node topology, this topology maximizes the performance and
throughput of the Engine tier by distributing job executions across multiple processors. For optimal
performance, consider offloading the compute nodes to dedicated computers (cluster or grid). This
topology maximizes the performance of the Services and Metadata Repository tiers by providing
dedicated hardware for these tiers. Scalability and high availability are not improved with this
configuration.
While the previous topology is ideal for InfoSphere Information Server deployments that have strong
performance requirements on the Engine tier, this topology is ideal for InfoSphere Information Server
deployments with strong performance needs on all tiers and is typically better suited for larger scale
deployments.
Table 8: Dimensions for a topology with a dedicated computer for the Engine conductor node
Dimension | Rating
Minimal Hardware Required | 1
Minimal Skills Required | 1
High Availability | 0
Simplicity | 0
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 2 (for environments that stress mostly the Services and Metadata Repository tier); 3 (for environments that stress mostly the Engine tier)
Basic High Availability Topologies
All tiers collocated
Characteristics:
- Hardware: Two computers plus one SAN
- Tier configuration: All tiers collocated on one computer
- High availability technology:
o Active/passive failover solution set up for all tiers. Failover is completed on the second
computer.
Figure 6. Topology with all tiers on one computer, one computer for failover, and one SAN
The All tiers collocated topology is the entry-level topology for InfoSphere Information Server
deployments that have high availability requirements. This topology takes the Single computer topology
and complements it with the active/passive failover solution so that the tiers fail over to a secondary
computer if the primary computer fails.
This topology is best suited for small scale environments that have a high availability need and can
tolerate a certain amount of downtime, because the passive computer must be booted if a failover occurs.
Among the small scale InfoSphere Information Server deployments, this topology is very popular: it
rates high on the performance and high availability dimensions while keeping the hardware, skills,
and complexity demands modest.
Table 9: Dimensions for a topology with all tiers collocated plus active/passive failover
Dimension | Rating
Minimal Hardware Required | 2
Minimal Skills Required | 2
High Availability | 2
Simplicity | 1
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 1 (for environments that stress mostly the Services and Metadata Repository tiers); 2 (for environments that stress mostly the Engine tier)
Dedicated Engine computer
Characteristics:
- Hardware: Two computers plus one SAN
- Tier configuration:
o Engine on dedicated computer
o Services/Metadata Repository tiers collocated on another computer
- High availability technology:
o Active/passive failover solution configured for the Engine tier on one side and
the Services/Metadata Repository tier on the other side. The computers fail over
to each other.
Figure 7. Topology with one computer for the Services tier, one computer for the Engine tier, and one SAN
This topology builds on the Dedicated Engine computer topology from the ‘basic topologies’ family by
adding the active/passive failover solution. This topology is an alternative to the previous topology
because it offers the same levels of high availability without compromising the hardware, skills, and
simplicity dimensions. This topology works well for most environments because it allows the various
tiers to scale optimally. Performance and scalability might be impacted if a failover occurs because all
tiers are collocated and are competing for the same computer resources after a failover completes.
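The effect of a failover on tier placement can be illustrated with a minimal sketch. It is not a failover implementation; it only models which tiers run where before and after one of the two computers fails. The computer names computer_a and computer_b and the fail_over helper are hypothetical and are used only for illustration.

```python
# Hypothetical model of mutual active/passive failover between two computers.
# The computer names and this helper are illustrative assumptions, not product APIs.

def fail_over(placement, failed):
    """Return the tier placement after `failed` goes down and its tiers move to the survivor."""
    survivors = [name for name in placement if name != failed]
    if len(survivors) != 1:
        raise ValueError("this sketch models exactly two computers, one of which fails")
    survivor = survivors[0]
    # The surviving computer keeps its own tiers and takes over the failed computer's tiers.
    return {failed: [], survivor: placement[survivor] + placement[failed]}


# Normal operation: the Engine tier has a dedicated computer.
normal = {
    "computer_a": ["Engine"],
    "computer_b": ["Services", "Metadata Repository"],
}

after_engine_failure = fail_over(normal, "computer_a")
for computer, tiers in after_engine_failure.items():
    print(computer, "->", tiers)
```

After the failover, all three tiers appear on the surviving computer, which is why performance and scalability can degrade until the failed computer is restored.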
Table 10: Dimensions for a topology with a dedicated engine computer, plus active/passive failover
Dimension | Rating
Minimal Hardware Required | 2
Minimal Skills Required | 2
High Availability | 2
Simplicity | 1
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 2
Parallel processing and grid Engine configurations
Characteristics:
- Hardware: At least two computers plus one SAN
- Tier configuration:
o Engine tier configured for parallel or grid processing. The conductor node is
collocated with the other two tiers. Depending on whether a parallel processing
or grid configuration is configured, the compute nodes can be on the same
computer or dedicated computers. The following graphic depicts a parallel
processing configuration with the conductor and compute nodes on different
computers.
o Services/Repository tiers on the same computer as the Engine conductor node.
- High availability technology:
o Active/passive failover solution configured for the Engine conductor node and
the Services/Metadata Repository tiers. The Engine compute nodes are not
configured for high availability.
Figure 8. Topology with one computer for the Services tier, one computer for the Engine tier, one
computer for failover, and one SAN
You can configure this topology in various ways depending on how the tiers and Engine nodes are
configured. For example, you could place all tiers on the same computer or use a dedicated Engine tier
computer, and you could collocate the Engine conductor node and compute nodes or separate them. This
topology offers the combined benefits of the All tiers collocated solution and the All tiers collocated,
including the Engine conductor node solution.
This topology works well for InfoSphere Information Server deployments that require high efficiency at
the Engine tier and some level of high availability where some downtime can be tolerated. You should
not consider this topology if you do not expect the Engine tier to be exercised much. The downside to this
topology is increased complexity and cost due to the hardware and technical expertise that are required
to configure and maintain this environment.
Table 11: Dimensions for a topology with parallel processing and grid Engine configurations
Dimension | Rating
Minimal Hardware Required | 0
Minimal Skills Required | 0
High Availability | 2
Simplicity | 0
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 1 (this is the worst case: all tiers, including Engine compute nodes, are collocated); 2 or 3 (this is the best case: Services and Metadata Repository are on dedicated computer, Engine tier is on dedicated computer with compute nodes also on dedicated computers)
Advanced Topologies
Highly scalable Services tier
Characteristics:
- Hardware: Six or more computers
- Tier configuration: All tiers on dedicated computers
- High availability/scalability technology:
o WebSphere Application Server Network Deployment cluster configured for
the Services tier
Figure 9. Topology with one computer for each tier plus a WebSphere Application Server Network
Deployment cluster configured for the Services tier
This topology is the first that introduces the WebSphere Application Server Network Deployment cluster
technology. In this setup, the Services tier is configured as a WebSphere Application Server Network
Deployment cluster, primarily as a means to increase the scalability of the Services tier rather than a
means to increase the overall availability of the InfoSphere Information Server installation. Because the
Engine and Metadata Repository tiers are not configured for high availability, this topology is not suited
for environments that have high availability needs. High availability should be configured for all tiers or
no tiers at all. This topology is suited for environments that exercise mostly the Services tier and have
strong scalability requirements for that tier.
The complexity and hardware cost associated with this topology are high. A minimum of six computers
are required:
- One computer for the Engine tier
- One computer for the Metadata Repository tier
- Three or more computers for the Services tier, configured as a WebSphere Application Server
Network Deployment cluster (the WAS Deployment Manager and managed nodes are deployed
on dedicated computers)
- One computer for the front-end dispatcher (typically a web server) that spreads requests to the
WebSphere Application Server Network Deployment cluster (this computer is not depicted in the
previous graphic)
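The hardware count above can be checked with a short tally. The sketch below is illustrative only. It assumes that the three Services tier computers are one WAS Deployment Manager plus two managed nodes, which is one reading of the list and not something the list states explicitly. The function name minimum_computers is hypothetical.

```python
def minimum_computers(managed_nodes=2):
    """Tally the computers for the Highly scalable Services tier topology.

    Assumption: the Services tier uses one dedicated computer for the
    WAS Deployment Manager and one dedicated computer per managed node.
    """
    if managed_nodes < 2:
        raise ValueError("this sketch assumes a cluster of at least two managed nodes")
    breakdown = {
        "Engine tier": 1,
        "Metadata Repository tier": 1,
        "WAS Deployment Manager": 1,
        "WAS managed nodes": managed_nodes,
        "Front-end dispatcher": 1,
    }
    return breakdown, sum(breakdown.values())


breakdown, total = minimum_computers()
for role, count in breakdown.items():
    print(f"{role}: {count}")
print(f"Total: {total}")
```

Under this assumption, each managed node that is added to scale the Services tier adds one computer to the total.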
Table 12: Dimensions for a topology with a highly scalable Services tier
Dimension | Rating
Minimal Hardware Required | 0
Minimal Skills Required | 0
High Availability | 1 (only the Services tier is highly available)
Simplicity | 0
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 2 (for environments that stress mostly the Engine tier); 3 (for environments that stress mostly the Services tier)
Highly scalable Services tier with parallel processing and grid
Engine configurations
Characteristics:
- Hardware: Six or more computers
- Tier configuration:
o All tiers on dedicated computers
o Engine tier configured for parallel or grid processing with the conductor
node on a dedicated computer. Depending on whether parallel processing or
grid processing is configured, the compute nodes can be on the same
computer or dedicated computers.
- High availability/scalability technology:
o WebSphere Application Server Network Deployment cluster configured for
the Services tier
Figure 10. Topology with one computer for each tier, a WebSphere Application Server Network
Deployment cluster configured for the Services tier, plus the conductor and compute nodes on a dedicated
computer
This topology adds a parallel processing or grid Engine configuration to the Highly scalable Services tier
topology. The end result is a topology that optimizes throughput and scalability at the Engine and
Services tiers respectively, which is ideal for environments that have strong performance requirements.
This topology does not improve high availability, and the complexity, hardware cost, and amount of
technical expertise required are high.
Table 13: Dimensions for a topology with a highly scalable Services tier with parallel processing and
grid Engine configurations
Dimension | Rating
Minimal Hardware Required | 0
Minimal Skills Required | 0
High Availability | 1 (only the Services tier is highly available)
Simplicity | 0
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 3
Advanced high availability
Characteristics:
- Hardware: Eight or more computers plus one SAN
- Tier configuration: All tiers on dedicated computers
- High availability technology:
o Active/passive failover solution configured for the Engine tier
o IBM DB2 HADR with Automatic Client Reroute configured for the Metadata
Repository tier
o WebSphere Application Server Network Deployment cluster configured for
the Services tier
Figure 11. Topology with all tiers on dedicated computers, one SAN, and active/passive failover
configured
This topology seeks to maximize the high availability capabilities of InfoSphere Information Server. All
tiers are configured with the most advanced high availability solutions. The cost in hardware is high
because a minimum of eight computers are required:
- Two computers for the Engine tier, configured with the active/passive failover solution
- Three or more computers for the Services tier, configured as a WebSphere Application Server
Network Deployment cluster (the WAS Deployment Manager and managed nodes are deployed
on dedicated computers)
- One computer for the front-end dispatcher (typically a web server) that spreads requests to the
WebSphere Application Server Network Deployment cluster (this computer is not depicted in the
previous graphic)
- Two computers for the Metadata Repository tier that runs on IBM DB2 HADR with the
Automatic Client Reroute.
In addition to the hardware costs, this topology requires strong technical expertise and introduces a
great deal of complexity.
You can adapt this topology as needed to meet available resources and specific requirements. For
example, in an effort to reduce hardware cost, you can consolidate some of the tiers and components on
fewer computers. You could also collocate some of the Services tier cluster nodes with the DB2 HADR
instances of Metadata Repository tier or collocate the front-end web server with one of the Services tier
nodes. Similarly, if performance and scalability at the Services tier are required more than high availability,
then you might consider configuring the Metadata Repository tier with the active/passive failover
solution rather than with the IBM DB2 HADR technology. The Engine and Metadata Repository tiers can
be consolidated on the same computers to mitigate the cost of two additional computers.
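The hardware trade-off behind these adaptations can be sketched as a simple count. The example below is illustrative only. It starts from the eight-computer breakdown listed earlier and subtracts the savings for two of the consolidation options mentioned above. The option names are hypothetical labels, and the saving of one computer for collocating the web server is an assumption inferred from the breakdown.

```python
# Baseline breakdown for the Advanced high availability topology (eight computers).
BASELINE = {
    "Engine tier (active/passive pair)": 2,
    "Services tier (WAS ND cluster)": 3,
    "Front-end dispatcher": 1,
    "Metadata Repository tier (DB2 HADR pair)": 2,
}

# Hypothetical option labels mapped to the number of computers each one saves.
SAVINGS = {
    # Assumption: the web server moves onto an existing Services tier node.
    "collocate_dispatcher_with_services_node": 1,
    # The text states that this consolidation saves two computers.
    "consolidate_engine_and_repository": 2,
}


def computers_needed(*options):
    """Return the computer count after applying each named option once."""
    total = sum(BASELINE.values())
    for option in set(options):
        total -= SAVINGS[option]
    return total


print(computers_needed())
print(computers_needed("collocate_dispatcher_with_services_node"))
print(computers_needed(*SAVINGS))
```

Each consolidation lowers the hardware cost but makes the affected tiers compete for the same computer resources, so the count is only one input to the decision.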
Table 14: Dimensions for a topology with eight or more computers, plus active/passive failover
Dimension | Rating
Minimal Hardware Required | 0
Minimal Skills Required | 0
High Availability | 3
Simplicity | 0
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 2 (for environments that stress mostly the Engine tier); 3 (for environments that stress mostly the Services and Metadata Repository tiers)
Advanced high availability with parallel processing and grid
Engine configurations
Characteristics:
- Hardware: Nine or more computers plus one SAN
- Tier configuration:
o All tiers on dedicated computers
o Engine tier configured for parallel or grid processing. Depending on whether
a parallel processing or grid configuration exists, the compute nodes can be
located on the same computer or dedicated computers. The following
graphic depicts a parallel processing configuration with the conductor and
compute nodes on different computers.
- High availability technology:
o Active/passive failover solution configured for the Engine tier
o IBM DB2 HADR with Automatic Client Reroute, configured for the Metadata
Repository tier
o WebSphere Application Server Network Deployment cluster, configured for
the Services tier
Figure 13. Topology with all tiers on dedicated computers, one SAN, active/passive failover configured,
and an Engine tier configured for parallel processing
This topology is the most complex because it adds a parallel processing or grid configuration to the
Engine tier on top of the Advanced high availability topology. This topology supports great levels of high
availability along with strong performance and scalability at the various tiers. The high complexity
translates into a high cost in hardware and a strong requirement for technical expertise.
As with the previous topology, numerous variations exist due to the number of combinations between
the layouts of the tiers and their components. You can experiment with these combinations to meet
specific needs, such as consolidating some of the tiers on a smaller set of computers to help reduce
hardware costs.
Table 15: Dimensions for a topology with nine or more computers with parallel processing and grid
Engine configurations
Dimension
Minimal Hardware Required | 0
Minimal Skills Required | 0
High Availability | 3
Simplicity | 0
Performance and Scalability – Small Environments | 3
Performance and Scalability – Large Environments | 3
Conclusion
Designing a well-rounded topology for InfoSphere Information Server is a complex task that is
paramount to a successful deployment and customer experience. An erroneous or awkward
design can lead to future complications, which are often difficult to rectify after the system is
active. Designing a topology that meets your customer’s needs and accounts for available
resources is critical to a successful implementation.
Following a methodical approach like the one described in this document helps to simplify the
design process. You must conduct a thorough analysis and quantification of the resources and
requirements. This step helps to measure what the system can handle and at what cost. Most
importantly, a thorough analysis forces you to evaluate your constraints and remove any
unrealistic expectations. Using this input data, you can begin planning your topology and pairing
it with one of the well-known and proven topologies in this document.
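The pairing step can be sketched as a simple filter over the dimension ratings. The example below is a minimal illustration, not a sizing tool. The ratings are copied from the dimension tables for the parallel processing, high availability, and advanced topologies in this document. The sketch assumes that a higher rating on the 0 to 3 scale is more favorable, and it uses the worst-case rating where a table gives a range for large environments. The shortened topology labels and the shortlist function are hypothetical.

```python
# Tuple order: hardware, skills, high availability, simplicity,
# performance (small), performance (large, worst case), performance (large, best case).
# Assumption: on the 0-3 scale, a higher rating is more favorable.
RATINGS = {
    "All tiers collocated, including the Engine conductor node": (1, 1, 0, 0, 3, 1, 3),
    "Dedicated computer for Engine conductor node": (1, 1, 0, 0, 3, 2, 3),
    "All tiers collocated with active/passive failover": (2, 2, 2, 1, 3, 1, 2),
    "Dedicated Engine computer with active/passive failover": (2, 2, 2, 1, 3, 2, 2),
    "Parallel processing and grid Engine configurations with failover": (0, 0, 2, 0, 3, 1, 3),
    "Highly scalable Services tier": (0, 0, 1, 0, 3, 2, 3),
    "Highly scalable Services tier with parallel processing and grid": (0, 0, 1, 0, 3, 3, 3),
    "Advanced high availability": (0, 0, 3, 0, 3, 2, 3),
    "Advanced high availability with parallel processing and grid": (0, 0, 3, 0, 3, 3, 3),
}


def shortlist(min_high_availability, min_large_performance):
    """Return topology names that meet both minimums, least demanding hardware first."""
    candidates = [
        (name, ratings)
        for name, ratings in RATINGS.items()
        if ratings[2] >= min_high_availability and ratings[5] >= min_large_performance
    ]
    # Sort by hardware rating, then simplicity rating (both descending), then name.
    candidates.sort(key=lambda item: (-item[1][0], -item[1][3], item[0]))
    return [name for name, _ in candidates]


for name in shortlist(min_high_availability=2, min_large_performance=2):
    print(name)
```

A shortlist like this is only a starting point. The descriptions in each topology section explain the trade-offs that the numbers summarize.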
Appendix: Topology scorecards
The following tables summarize the various topologies and their dimension ratings that are
covered in this document. For more details on each topology, refer to the corresponding topology
section.
Table 16: Basic topologies
Topology | Hardware requirements | Skills requirements | High availability | Simplicity | Performance and scalability – Small Environments | Performance and scalability – Large Environments
Single computer | Very low | Very low | Very poor | Very good | Very good | Moderate to good
Dedicated Engine computer | Low | Very low | Very poor | Good | Very good | Good
Dedicated computer for each tier | Moderate | Low | Very poor | Good | Good to very good | Good
Table 17: Parallel processing and grid engine configurations
Topology | Hardware requirements | Skills requirements | High availability | Simplicity | Performance and scalability – Small Environments | Performance and scalability – Large Environments
All tiers collocated, including the Engine conductor node | Moderate | Moderate | Very poor | Very poor | Very good | Good to very good
Dedicated computer for Engine conductor node | Moderate | Moderate | Very poor | Very poor | Very good | Good to very good
Table 18: Basic high availability topologies
Topology | Hardware requirements | Skills requirements | High availability | Simplicity | Performance and scalability – Small Environments | Performance and scalability – Large Environments
All tiers collocated | Low | Low | Good | Moderate | Very good | Moderate to good
Dedicated Engine computer | Low | Low | Good | Moderate | Very good | Good
Parallel processing and grid Engine configurations | High | High | Good | Poor | Very good | Moderate to very good
Table 19: Advanced topologies
Topology | Hardware requirements | Skills requirements | High availability | Simplicity | Performance and scalability – Small Environments | Performance and scalability – Large Environments
Highly scalable Services tier | High | Very high | Poor | Very poor | Good to very good | Good to very good
Highly scalable Services tier with parallel processing and grid Engine configurations | High | Very high | Poor | Very poor | Very good | Very good
Advanced high availability | Very high | Very high | Very good | Very poor | Good to very good | Good to very good
Advanced high availability with parallel processing and grid Engine configurations | Very high | Very high | Very good | Very poor | Very good | Very good
Further reading
Understanding tiers and components [InfoSphere Information Server, Version 8.5 information center]
http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r5/topic/com.ibm.swg.im.iis.productization.iisinfsv.install.doc/topics/wsisinst_pln_configurations.html
High Availability [Wikipedia]
http://en.wikipedia.org/wiki/High-availability_cluster
Planning for the installation of IBM InfoSphere Information Server [InfoSphere Information Server, Version 8.5 information center]
http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r5/topic/com.ibm.swg.im.iis.productization.iisinfsv.migrate.doc/topics/wsismig_migrate_intro.html
Parallel processing and grid topologies [InfoSphere Information Server, Version 8.5 information center]
http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r5/topic/com.ibm.swg.im.iis.productization.iisinfsv.overview.doc/topics/cisoarchparalov.html
Oracle RAC [Wikipedia]
http://en.wikipedia.org/wiki/Oracle_RAC
Contributors
The document authors would like to recognize the following individuals for their
feedback on this document and their contributions to this topic:
Robert Johnston
Application Architect, Information Management Data Integration for InfoSphere
DataStage
Ernie Ostic
Specialist, Worldwide Sales
Sriram Padmanabhan
Distinguished Engineer, Information Management; Chief Architect for
InfoSphere Information Server