ThruPut Manager® and Production Batch
FEBRUARY 2017
ThruPut Manager automatically and intelligently optimizes the processing of batch jobs. Its automation capabilities
improve your z/OS production batch environment, regardless of which job scheduler you use.
PRODUCTION BATCH IN A JES2 ENVIRONMENT
Production batch workload is different from other batch workloads in a number of ways.

- It is usually submitted by a job scheduler that knows the job dependencies and submits jobs according to date/time schedules or other logical requirements.
- Production is usually the most important batch workload, taking precedence over development and other ad hoc workloads. Consequently, it should only compete for resources with other production work and possibly online workload.
- Production workload is fairly predictable and well understood. When it does change, it is subject to strict change control procedures.
- Production batch, like other batch, has a propensity for using external resources—indeed this is one reason why batch processes aren’t written as online transactions in the first place. Fetching a resource, such as staging a virtual volume, may be relatively time-consuming and delays prompt job execution.
AUTOMATING BATCH SERVICE MANAGEMENT
If all your batch jobs had equal importance, equal demand for external resources, and equal CPU consumption, then all jobs could be presumed to have the same service goal. However, every data center needs to give preference to its more important workload, whether it is called “production,” a “mission-critical” application or a special class of work. Once that’s taken care of, other workloads can be accommodated.
Data centers define batch service goals to ThruPut Manager by creating policies. A policy specifies the service goals and
relative batch importance for each workload, together called a Service Group. It also controls the deployment of the
JESplex and LPARs, in the form of constraints on jobs and drives. Jobs are analyzed to determine which goals apply to
them. Data center criteria and constraints regarding resource utilization are also defined in the policy.
The automation engine uses the policy to determine what level of service each job is entitled to, as well as to deploy the
available data center resources (LPARs, initiators, tape drives) according to the data center’s goals. When you need new
goals, you change the policy. ThruPut Manager expects the policy to change over time (e.g., MONTH END policy,
HOLIDAY policy, DISASTER RECOVERY policy). Whether the policy change is anticipated or spontaneous, the
ThruPut Manager automation engine transitions from one to another seamlessly.
With these simple settings and the built-in heuristics, the automation engine makes sure your production workload gets the service you want with minimal manual intervention. It provides automatic escalation, resource management and improved throughput. Each of these is accomplished via a single queue, utilizing TM-managed JES2 initiators.
AUTOMATIC ESCALATION
ThruPut Manager provides automatic escalation when a job waits too long in the queue. Your policy settings indicate
what escalation you want for jobs in each Service Group. The escalation is relative to other jobs in the system at the
same time. Though all jobs may enjoy escalation, production jobs are most likely to need its benefits.
In a policy, each Service Group defines TARGET and ACCEPTABLE queue times reflecting what you would expect under normal conditions. Since values for shorter production jobs are normally lower than for long-running jobs, a shorter production job reaches its thresholds faster. This is an “aging” strategy: jobs that have aged more are given higher priority in the queue. Though some jobs travel faster than others, if a job waits long enough, it will reach the ACCEPTABLE threshold.
If jobs are exceeding their ACCEPTABLE threshold, it suggests the system is stressed, and ThruPut Manager knows it’s time to give preference to the most important work—i.e., to use an “importance strategy.” The Batch Importance assigned to each Service Group from a business point of view now comes into play.
Once past the ACCEPTABLE threshold, all jobs with a higher Batch Importance are given greater priority than any jobs with lower Batch Importance. Jobs with the same Batch Importance “age” towards the CRITICAL threshold value set for their Service Group.
When there is an outage or unprecedented load, for example, some jobs must be run, while others can wait until the
situation improves. ThruPut Manager invokes an “urgency” strategy under these circumstances. Once a job “ages” past
the CRITICAL threshold without being selected, it is placed at one of the three highest priorities, indicated by the setting
in the Service Group, and jobs are selected in order of this CRITICAL priority, regardless of the Service Group to which
they belong.
In summary, production jobs get the service they need under normal circumstances with automatic escalation as the
system degrades and automatic recovery as the system improves.
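Taken together, the aging, importance, and urgency strategies can be sketched as a single priority-key computation. The following is a minimal illustration in Python; the `ServiceGroup` fields and threshold semantics are modeled on the description above, not on ThruPut Manager’s actual policy syntax:

```python
from dataclasses import dataclass

@dataclass
class ServiceGroup:
    # Policy settings per the text: queue-time thresholds (minutes) and
    # business importance. critical_priority is one of the three highest
    # priorities used by the "urgency" strategy (1 = highest).
    target: int
    acceptable: int
    critical: int
    importance: int
    critical_priority: int

def selection_key(waited: int, sg: ServiceGroup):
    """Return a sort key for a queued job; lower keys are selected sooner."""
    if waited >= sg.critical:
        # Urgency strategy: top priorities, regardless of Service Group.
        return (0, sg.critical_priority, 0.0)
    if waited >= sg.acceptable:
        # Importance strategy: more important work first; equal importance
        # continues aging toward the CRITICAL threshold.
        aging = (waited - sg.acceptable) / (sg.critical - sg.acceptable)
        return (1, -sg.importance, -aging)
    # Aging strategy: jobs closer to their ACCEPTABLE threshold rise.
    return (2, 0, -(waited / sg.acceptable))

# Illustrative Service Groups: a short production job has tighter thresholds
# than a long-running one, so it reaches escalation faster.
short = ServiceGroup(target=5, acceptable=10, critical=30, importance=8, critical_priority=1)
long_ = ServiceGroup(target=30, acceptable=60, critical=120, importance=8, critical_priority=2)

# After 12 minutes, the short job has passed ACCEPTABLE while the long job
# is still only aging, so the short job sorts ahead of it.
assert selection_key(12, short) < selection_key(12, long_)
```

Sorting the whole queue by this key reproduces the behavior described: normal aging under light load, importance ordering under stress, and a fixed top band once a job passes CRITICAL.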
RESOURCE MANAGEMENT
Rather than tie up an initiator while fetching external resources, ThruPut Manager collects resources while the job is in
the queue. The initiator takes over when the job is ready for execution. While resources are being fetched, the initiator
can provide execution services for other jobs that don’t need, or already have, their external resources. This strategy
delivers better service for high priority work than the “zero queue time” approach and delivers better overall throughput
for the entire workload.
In addition to queue time resource fetching, ThruPut Manager allows for prioritized fetching. For instance, if two jobs are
trying to recall HSM archived datasets, it gives precedence to the job with the higher Batch Importance.
Some data centers use job scheduler features, such as negative dependencies, to control external resources and avoid
contention. ThruPut Manager does this with much less specification effort and does it better, since it can detect and
automatically avoid contention at runtime.
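Prioritized queue-time fetching can be sketched as a priority queue ordered by Batch Importance rather than arrival order. This is a hypothetical sketch; the job names, importance values, and dataset names are illustrative only:

```python
import heapq

class RecallQueue:
    """Service recall requests highest Batch Importance first,
    FIFO within equal importance."""
    def __init__(self):
        self._heap = []
        self._seq = 0  # arrival counter used as a tie-breaker

    def request(self, job, importance, dataset):
        # Negate importance so the highest importance pops first.
        heapq.heappush(self._heap, (-importance, self._seq, job, dataset))
        self._seq += 1

    def next_recall(self):
        _, _, job, dataset = heapq.heappop(self._heap)
        return job, dataset

q = RecallQueue()
q.request("NIGHTLY1", importance=3, dataset="HSM.ARCHIVE.A")
q.request("PRODPAY", importance=9, dataset="HSM.ARCHIVE.B")
# The more important production job's recall is initiated first,
# even though it arrived later.
assert q.next_recall()[0] == "PRODPAY"
```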
IMPROVED THROUGHPUT
It is almost impossible to manually balance every job’s performance with the optimization of overall batch throughput.
ThruPut Manager automation algorithms optimize resource utilization and throughput of the workload as a whole.
ThruPut Manager uses its automation engine to optimize the workload, based on service goals, knowledge of the
workload and data center configuration and constraints. It also automatically responds to unexpected obstacles and
problems (e.g., having an LPAR down). When goals specify that some applications or jobs are more important than
others, the engine delivers the most important jobs first when it can’t deliver them all.
Here are some examples of the ThruPut Manager built-in automation behavior:

- A job that has been waiting for 30 minutes takes precedence over another job with the same goals, but which has only been waiting 5 minutes.
- A job that cannot immediately begin execution is passed over for selection until it is ready.
- A job is managed individually to its target (i.e., not ‘averaged’ with a group of jobs), making it impossible for a single job to languish.
- When two jobs both need HSM recalls, the recalls for the more important job are initiated first.
- A job needing DB2 to run is only selected by initiators on LPARs that have DB2 available.
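Two of the behaviors above, passing over jobs that are not ready and honoring LPAR subsystem affinity, can be sketched as a selection filter. The field names (`ready`, `needs`) are assumptions for illustration, not the product’s interface:

```python
def select_next(queue, lpar_subsystems):
    """Pick the first eligible job for an initiator on a given LPAR.

    queue: jobs already in priority order; each job is a dict with
           'ready' (resources fetched?) and 'needs' (required subsystems).
    lpar_subsystems: set of subsystems available on this LPAR, e.g. {"DB2"}.
    """
    for job in queue:
        if not job["ready"]:
            continue  # still fetching resources: skipped, not blocking
        if not job["needs"] <= lpar_subsystems:
            continue  # e.g. needs DB2, but DB2 is unavailable here
        return job
    return None

queue = [
    {"name": "JOBA", "ready": False, "needs": set()},
    {"name": "JOBB", "ready": True, "needs": {"DB2"}},
    {"name": "JOBC", "ready": True, "needs": set()},
]
# On an LPAR without DB2: JOBA (not ready) and JOBB (needs DB2) are skipped.
assert select_next(queue, lpar_subsystems=set())["name"] == "JOBC"
# On an LPAR with DB2, JOBB is the first eligible job.
assert select_next(queue, lpar_subsystems={"DB2"})["name"] == "JOBB"
```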
ThruPut Manager verifies that WLM Service Classes receive service before starting another job in that Service Class. It
manages dataset contention automatically, giving preference to higher priority work and avoiding a dataset contention
“ripple” when production jobs finish earlier. ThruPut Manager preemptively captures information for later reporting,
including performance against the original policy settings, reasons for delays and the number of Service Units consumed.
Besides prioritized fetching of external resources, ThruPut Manager prevents interference from non-production workload,
such as contention from TSO users. It also allows you to reserve and cap drives for specific workloads, thus allowing
several workloads to share drives but ensuring no one workload ‘hogs’ them.
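The reserve-and-cap idea can be sketched as a simple admission check: a workload never exceeds its cap, and drives still owed to other workloads’ reserves stay available. The numbers and the decision rule are illustrative assumptions, not the product’s algorithm:

```python
def grant_drive(workload, in_use, reserve, cap, total_drives):
    """Decide whether `workload` may allocate one more tape drive."""
    if in_use[workload] >= cap[workload]:
        return False  # workload is already at its cap
    free = total_drives - sum(in_use.values())
    # Drives still owed to the other workloads' reserves must remain free.
    owed = sum(max(0, reserve[w] - in_use[w]) for w in in_use if w != workload)
    return free > owed

in_use = {"PROD": 2, "TEST": 4}
reserve = {"PROD": 4, "TEST": 2}
cap = {"PROD": 8, "TEST": 4}
# PROD is under its cap and TEST's reserve is satisfied, so PROD gets a drive.
assert grant_drive("PROD", in_use, reserve, cap, total_drives=10)
# TEST is at its cap of 4, so it cannot "hog" further drives.
assert not grant_drive("TEST", in_use, reserve, cap, total_drives=10)
```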
More efficient use of resources without sacrificing production service. There is a tendency for data centers to run
production and non-production workloads on separate LPARs/JESplexes to give production the preference it needs.
ThruPut Manager can achieve this without such extreme measures—and allows organizations to realize the full benefits
of consolidating workloads.
Hardware upgrades may be deferred. ThruPut Manager automatically limits the number of jobs it starts on an
“online” image, while completing all work according to SLAs. The resulting throughput efficiencies are good for
production, and the workload runs with fewer CPU resources and at a lower cost, leaving more slack in the batch
window and an opportunity to defer hardware upgrades.
Operator training is reduced and procedure maintenance is simplified. When better throughput is achieved
through optimization and automation of batch workload management, then users also reduce their dependence on
manual operator intervention. While that is a direct benefit in and of itself, the benefit for the user is more consistent
response and service.
DYNAMIC INITIATORS
A key function of the automation engine is the provision and placement of the optimum number of initiators as well as
the manipulation of the job queue to make sure the “right” job is the next one to be initiated. ThruPut Manager dynamic
initiators are JES2 initiators managed by the automation engine and are designed for batch.
ThruPut Manager dynamic initiators take into account the unique characteristics of a queued workload with a wide
variety of service goals and resource characteristics. ThruPut Manager decides to add or remove initiators based on
current system load and data center policies.
A TM-managed initiator strategy impacts batch processing in a variety of ways:

- Initiators are started and stopped based on system load, data center controls and workload performance against service targets.
- Workload is distributed across systems.
- The data center controls which systems can process which jobs.
- Initiators always select the ‘best’ job to execute next.
- Queue time is used to reduce initiator delays.
- Execution time delays are reduced to increase initiator availability.
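A start/stop decision of the kind listed above can be sketched as a small control function. The thresholds, parameter names, and the rule itself are assumptions for illustration; ThruPut Manager’s actual heuristics are not published here:

```python
def adjust_initiators(active, ready_jobs, jobs_past_target, max_inits, cpu_busy_pct):
    """Return the change (+1, 0, or -1) to the active initiator count."""
    if active < max_inits and cpu_busy_pct < 90:
        # Ready work is queued beyond current capacity, or jobs are
        # missing their service targets: add an initiator.
        if ready_jobs > active or jobs_past_target > 0:
            return +1
    if ready_jobs < active and jobs_past_target == 0:
        # Idle capacity and no service pressure: drain one initiator.
        return -1
    return 0

# Queue pressure with CPU headroom: scale up.
assert adjust_initiators(active=4, ready_jobs=10, jobs_past_target=2,
                         max_inits=12, cpu_busy_pct=70) == +1
# Few ready jobs and targets being met: scale down.
assert adjust_initiators(active=6, ready_jobs=2, jobs_past_target=0,
                         max_inits=12, cpu_busy_pct=70) == -1
```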
One of the objectives of ThruPut Manager is to minimize delays. Much of this is achieved through the preparation
activities occurring while jobs are in the queue, as discussed previously. However, runtime delays can still occur due to
dataset contention and tape drive shortages. ThruPut Manager automatically mitigates these delays, optimizing
initiator utilization and improving turnaround time.
Jobs wait in a single queue until selected by TM-managed initiators. ThruPut Manager automates the management of this
queue to optimize the overall workload throughput. This is accomplished by dynamically reordering the queue and using
queue time to prepare jobs for execution while reducing the time initiators are tied up.
ThruPut Manager analyzes every job and can therefore detect any job characteristic directly, without reference to the class. ThruPut Manager reorders the jobs in its queue so that the right job is always at the top. A job pushes ahead in the queue if it is closer to missing its service goals than other jobs. A job with aggressive targets pushes ahead faster than a job with less aggressive targets. Initiators select from the top of the queue, mindful of the data center goals and constraints.

Testing has shown that ThruPut Manager starts fewer initiators than WLM, yet completes the batch workload in a shorter period of time because of the efficiencies realized by automated, intelligent workload and resource management.
SUMMARY
ThruPut Manager has many features to enforce standards, reduce reliance on end-user compliance, distribute workload,
optimize and increase throughput, and save on software license costs. All of these contribute to a healthy batch
management environment, and in many cases, the impact is greater for production batch. ThruPut Manager is the
product of choice to automate and optimize your z/OS production batch service.
Learn more at Compuware.com/thruputmanager.