Overview of Archival and Purge Process in IBM Sterling B2B Integrator
- Bhavya M Reddy, Staff Software Engineer, IBM Sterling B2B Integrator L2 Support
Table of Contents
Introduction to Sterling B2B Integrator
Importance of Database Maintenance
Business Process and Document lifespan Configuration
Lifespan calculation for the Business Process and the Document
Business Processes Involved in Archival and Purge
Archival and Purge Process Flow depicted pictorially
Life Cycle of a BP and a Document in Sterling B2B Integrator
PurgeAll BP
Related Links
Introduction to Sterling B2B Integrator
Sterling B2B Integrator (SBI) is a Transaction Engine that runs the processes we define
and helps in managing these processes. It ties together applications, processes, data, and
people, both within and outside your organization. It therefore helps in integrating
businesses.
The Sterling B2B Integrator approach to integration centers around business process
management. A business process is a goal-driven, ordered flow of activities that
accomplishes a business objective. Using Sterling B2B Integrator, you integrate the
activities that make up your company's business processes.
Common examples of such activities include:
• XML, EDI, and proprietary file translation, transformation, and filtering
• Human interaction through a browser interface (such as reviewing and approving data)
• Content-based routing of messages
• Data publishing
• Extended process models that integrate the execution of a B2B protocol, such as AS2,
with enterprise system integration, such as invoking the SAP adapter
Importance of Data Maintenance
Every activity here involves a Business Process (BP), a Business Document, or a BP
associated with a Business Document. These BPs and Documents have to be stored for
future use. SBI maintains this data in the Database or the File System, as per the
Document Storage settings.
During the course of the business activity in SBI, data starts building up in the database
or the file system (if the document storage is File system). It is very important to clear
this data at scheduled intervals to maintain the good health of the database.
SBI has well-defined clean-up processes, listed below, that help clear this data from the
database:
• Schedule_IndexBusinessProcessService
• Schedule_BackupService
• Schedule_PurgeService
• Schedule_AssociateBPsToDocs
• Schedule_BPRecovery
• Schedule_AutoTerminateService
• Schedule_BPLinkagePurgeService
Business Process and Document lifespan
The time period for which a BP and its Documents are available in SBI is determined by
their lifespan. The lifespan can be configured in the SBI dashboard User Interface (UI)
under Operations → Archive Manager → Archive Configurations → Configure Archive
settings.
In the screenshot below, the lifespan of the BP/Workflow is 2 days and the lifespan of
the document (trackable business process information) associated with the BP is 1 day
12 hours.
The same screen determines whether the BP and the associated document have to be
archived and then purged, or directly purged. If the “Archived” option is chosen, the
Backup Directory field for choosing the location to archive the documents has to be
updated.
When a BP is configured to use “System Default”, the BP takes the lifespan set in the
Archive Manager. Consider the MapTest BP shown in the screenshot below.
To find the lifespan chosen for a BP, open it in the source manager, click edit, and step
through to the lifespan page.
If the lifespan has to be different for a particular BP, choose the “Process Specific”
option and set the lifespan accordingly. This overrides the default settings in the
Archive Manager for this BP.
According to the configuration in the screenshot below, the MapTest BP will be available
in the system for 2 days.
The same screen also lets you choose the archival option: archive first and then purge,
or purge directly.
Lifespan calculation for the BP and the Document
SBI provides an option to enable Document Tracking, for which we set the trackable
BP lifespan in the archive manager page. This option is available to ensure the document
is available for an extended period of time.
Document Tracking can be enabled at a global level for all BPs by setting
tracking.global.enabled=true in the doc_tracking.properties file.
Document Tracking can also be enabled for a specific BP by editing the BP and enabling
document tracking, as shown below.
Now let us consider two scenarios for lifespan calculation:
1. If Document Tracking is enabled, with the Archive Manager configuration described
earlier (BP lifespan of 2 days, Trackable Business Process Information lifespan of
1 day 12 hours):
lifespan of the BP = 2 days
lifespan of the Document = lifespan of the BP + lifespan of the Trackable Business Process Information
                         = 2 days + 1 day 12 hours
                         = 3 days 12 hours
2. If Document Tracking is not enabled, with the same configuration:
lifespan of the BP = 2 days
lifespan of the Document = lifespan of the BP = 2 days
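The two scenarios can be checked with a few lines of Python. This is only an illustrative sketch of the arithmetic: the function name document_lifespan and its parameters are hypothetical and are not part of SBI.

```python
from datetime import timedelta


def document_lifespan(bp_lifespan: timedelta,
                      trackable_lifespan: timedelta,
                      tracking_enabled: bool) -> timedelta:
    """Return the document lifespan for the two scenarios described above."""
    if tracking_enabled:
        # Scenario 1: BP lifespan plus Trackable Business Process Information lifespan.
        return bp_lifespan + trackable_lifespan
    # Scenario 2: the document simply takes the BP lifespan.
    return bp_lifespan


bp = timedelta(days=2)
trackable = timedelta(days=1, hours=12)
print(document_lifespan(bp, trackable, True))   # 3 days, 12:00:00
print(document_lifespan(bp, trackable, False))  # 2 days, 0:00:00
```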
Business Processes involved in clean up activity
Schedule_IndexBusinessProcessService (BP moving service)
• The BP name is ‘Schedule_IndexBusinessProcessService’
• Runs every 10 minutes by default
• Index works only on BPs that are in the Completed or Terminated state, that is, on BPs
with ARCHIVE_FLAG=-1
• Calculates the lifespan and Removal Method for the BP based on the Workflow
Definition
• The Removal Method is either 0 (Archive/backup) or 1 or 2 (Purge)
• Updates the records in ARCHIVE_INFO, setting ARCHIVE_DATE to the date on which the
lifespan expires and ARCHIVE_FLAG to the Removal Method from WF_INST_S
• For messages added to a mailbox through FTP/SFTP or a Mailbox Add service, Index
updates the ARCHIVE_INFO records with a 10-year lifespan; the lifespan is reset
following a mailbox delete
• If a BP fails Index, it is re-marked with an ARCHIVE_FLAG of -5
Schedule_BackupService (Archive)
• Runs every morning at 2:00 AM by default
• ‘Schedule_BackupService’ archives data that has an ARCHIVE_FLAG of 0
• Uses the records in WF_INST_S to calculate the eligible data for the backup
• Once archived, the ARCHIVE_FLAG is changed from 0 to 1 or 2 to indicate that the BP is
now ready for Purge
• The change to the ARCHIVE_FLAG is determined by the purge settings in the BP or in the
Archive Manager in the UI
Schedule_PurgeService
• By default, the Schedule_PurgeService BP runs every 10 minutes
• Deletes expired BP and Document data from the various tables
• Deletes documents from disk
• Uses the ARCHIVE_INFO table and looks for BPs that have an ARCHIVE_FLAG of either 1
or 2 and an ARCHIVE_DATE earlier than the current system date
• Deletes all eligible BP records in batches
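The selection rule described above (ARCHIVE_FLAG of 1 or 2, ARCHIVE_DATE in the past, deletion in batches) can be sketched as follows. This is a simplified model, not the service's actual implementation: ArchiveInfoRow is a hypothetical stand-in for an ARCHIVE_INFO record, and the batch size of 2 is chosen only to keep the example small.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ArchiveInfoRow:
    """Hypothetical, simplified stand-in for one ARCHIVE_INFO record."""
    wf_id: int
    archive_flag: int
    archive_date: datetime


def purge_batches(rows, now, batch_size=2):
    """Yield lists of WF_IDs that are eligible for purge, batch_size at a time."""
    eligible = [r.wf_id for r in rows
                if r.archive_flag in (1, 2) and r.archive_date < now]
    for start in range(0, len(eligible), batch_size):
        yield eligible[start:start + batch_size]


now = datetime(2015, 6, 28)
rows = [
    ArchiveInfoRow(101, 1, datetime(2015, 6, 27)),  # purge directly, expired
    ArchiveInfoRow(102, 2, datetime(2015, 6, 27)),  # archived, expired
    ArchiveInfoRow(103, 0, datetime(2015, 6, 27)),  # still waiting for backup
    ArchiveInfoRow(104, 2, datetime(2015, 6, 26)),  # archived, expired
    ArchiveInfoRow(105, 1, datetime(2015, 7, 1)),   # not yet expired
]
print(list(purge_batches(rows, now)))  # [[101, 102], [104]]
```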
Schedule_AssociateBPsToDocs
• A very important housekeeping process
• Looks through the DOCUMENT, DOCUMENT_LIFESPAN and WORKFLOW_CONTEXT tables for
eligible records whose workflow_id is 0 or -1, and updates their BP ID to its own
workflow_id (associates them)
• For example, if Schedule_AssociateBPsToDocs runs with a BP ID of ‘12345’, then
Documents with a BP ID of ‘0’ or ‘-1’ have their BP ID updated to ‘12345’
• This process flags records in the following tables: DOCUMENT, DOCUMENT_LIFESPAN,
DOCUMENT_EXTENSION, TRANS_DATA, CORRELATION_SET
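The association step in the example above can be pictured with a small Python sketch. It models records as plain dictionaries; the function name associate_orphans and the record layout are hypothetical and only illustrate the rule that IDs of 0 or -1 are replaced with the ID of the running BP.

```python
def associate_orphans(records, own_workflow_id):
    """Assign own_workflow_id to records whose workflow_id is 0 or -1.

    Returns the number of records that were updated.
    """
    updated = 0
    for rec in records:
        if rec["workflow_id"] in (0, -1):
            rec["workflow_id"] = own_workflow_id
            updated += 1
    return updated


records = [
    {"doc": "A", "workflow_id": 0},
    {"doc": "B", "workflow_id": -1},
    {"doc": "C", "workflow_id": 777},   # already owned by a BP, left untouched
]
updated = associate_orphans(records, 12345)
print(updated, [r["workflow_id"] for r in records])  # 2 [12345, 12345, 777]
```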
Schedule_BPRecovery
• If an SBI instance fails abnormally (the JVM crashes or is killed via hardstop.sh), the
WorkFlowEngine (WFE) does not have an opportunity to synchronize the database.
• Therefore, Business Processes that are in an ACTIVE, HALTING or WAITING_ON_IO state
remain that way indefinitely (referred to as Active Hung processes), and the UI does not
offer any actions to repair them (since operating on an in-flight BP is not safe).
• BPRecovery attempts to synchronize the database and the WFE without impacting any
newly executing BPs.
• The BPReportService obtains the list of BPs in the ACTIVE, HALTING or WAITING_ON_IO
state from the database. This set is then compared to the list of threads, messages in
the queues and ActivityData entries (objects in memory that can be associated with an
in-flight process). This is done 3 times with a 10-second sleep interval. If a candidate
makes it through all 3 checks, it is considered active hung.
• The BP recovery level can be set individually for a BP in the Business Process
manager.
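The three-pass check described above can be sketched in Python as follows. This is a rough model under stated assumptions, not the BPReportService code: find_hung and snapshot_in_flight are hypothetical names, and the in-memory view (threads, queue messages, ActivityData entries) is reduced to a callable that returns a set of BP IDs.

```python
import time


def find_hung(db_candidates, snapshot_in_flight,
              checks=3, interval_sec=10, sleep=time.sleep):
    """Return the IDs that are absent from the in-flight view in every check."""
    suspects = set(db_candidates)
    for attempt in range(checks):
        # IDs currently seen in threads, queues or ActivityData entries.
        suspects -= set(snapshot_in_flight())
        if not suspects:
            break
        if attempt < checks - 1:
            sleep(interval_sec)
    return suspects


# Usage with canned snapshots and a no-op sleep so the example returns at once.
snapshots = iter([{1}, {1, 2}, {1}])
hung = find_hung({1, 2, 3}, lambda: next(snapshots), sleep=lambda s: None)
print(hung)  # {3}
```

BP 2 shows up in the second snapshot, so it is dropped from the suspects; only BP 3 is missing from all three snapshots and is reported as active hung.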
Schedule_AutoTerminateService
• The Auto Terminate service is pre-configured and, by default, is scheduled to run each
day at 4:00 A.M. The service checks for business processes that have been in a specified
state for a specified length of time and then terminates them.
• By default, the Auto Terminate service checks for and terminates business processes
that have been in a halted state for over 14 days. You can adjust these settings to suit
your specific business needs.
• Overriding the bprecovery.properties file settings: the number of days a business
process must be in a specified state before being terminated by the Auto Terminate
service, and the specified state or states, are defined by properties in the
bprecovery.properties file. The default settings are specified by the following lines:
auto_terminate_days=14
num_states=1
auto_terminate_state1=halted
auto_terminate_batch=1000
• The default settings can be overridden using the customer_overrides.properties file.
You can change the number of days before termination, change the specified state, or
add additional states.
• The value of auto_terminate_days in the bprecovery.properties file can also be
overridden using BPML in your business process, using a statement in the following
format:
<assign to="AUTO_TERM_DAYS">new_value</assign>
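How these properties drive the selection can be sketched in Python. This is an illustration only: the tuple layout (bp_id, state, since) is hypothetical, and treating auto_terminate_batch as a cap on the number of BPs handled per run is an assumption rather than documented behaviour.

```python
from datetime import datetime, timedelta

DEFAULTS = """\
auto_terminate_days=14
num_states=1
auto_terminate_state1=halted
auto_terminate_batch=1000
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def select_for_termination(processes, props, now):
    """Pick BPs that have been in a configured state longer than the limit."""
    days = int(props["auto_terminate_days"])
    states = {props[f"auto_terminate_state{i}"].lower()
              for i in range(1, int(props["num_states"]) + 1)}
    cutoff = now - timedelta(days=days)
    hits = [bp_id for bp_id, state, since in processes
            if state.lower() in states and since < cutoff]
    # Assumption: the batch value caps how many BPs one run handles.
    return hits[:int(props["auto_terminate_batch"])]


props = parse_properties(DEFAULTS)
processes = [
    (101, "Halted", datetime(2015, 6, 1)),   # halted for more than 14 days
    (102, "Halted", datetime(2015, 6, 20)),  # halted, but too recent
    (103, "Active", datetime(2015, 5, 1)),   # not in a configured state
]
print(select_for_termination(processes, props, datetime(2015, 6, 26)))  # [101]
```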
Schedule_BPLinkagePurgeService
• Cleans the workflow_linkage table
• The workflow_linkage table contains parent-child BP information
• Runs once every night
• Needs to run more frequently on heavily loaded systems
• Adjust max_business_processes if needed:
<assign to="max_business_processes">180000</assign>
Archival and Purge Process Flow depicted pictorially
Message enters SBI → BP processes the message → BP gains ARCHIVE_FLAG=-1 after
completion
The clean-up processes (Schedule_IndexBusinessProcessService, Schedule_BackupService
and Schedule_PurgeService) act on the completed BP and remove it from SBI based on the
lifespan set for it.
The Schedule_IndexBusinessProcessService BP is responsible for changing the
ARCHIVE_FLAG from -1 to 0 or 1, and the Schedule_BackupService BP is responsible for
changing the ARCHIVE_FLAG from 0 to 2. The Schedule_PurgeService BP looks for records
with an ARCHIVE_FLAG of 1 or 2 and purges them.
Flow chart:
1. BP with ARCHIVE_FLAG = -1
2. Index BP runs and sets the ARCHIVE_FLAG to 0 or 1
3. Backup and Purge BPs run; the next step depends on the ARCHIVE_FLAG value:
If ARCHIVE_FLAG = 0: the Backup BP archives the record (the data is backed up to the
folder provided in the Archive Manager), sets the ARCHIVE_FLAG to 2 and makes it
eligible for Purge
If ARCHIVE_FLAG = 1: Purge
If ARCHIVE_FLAG = 2: Purge
4. The Purge BP acts on ARCHIVE_FLAG 1 and 2 records and purges them
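The flag transitions in this section can be summarised as a small state machine. The Python sketch below is a reading aid, not SBI code: the function names and the "archive"/"purge" labels for the removal method are hypothetical.

```python
ARCHIVE, PURGE = "archive", "purge"


def index(flag, removal_method):
    """Index step: -1 becomes 0 (archive first) or 1 (purge directly)."""
    if flag != -1:
        return flag
    return 0 if removal_method == ARCHIVE else 1


def backup(flag):
    """Backup step: a record with flag 0 is archived and then marked 2."""
    return 2 if flag == 0 else flag


def purge_eligible(flag):
    """Purge step: only records flagged 1 or 2 are picked up."""
    return flag in (1, 2)


def lifecycle(removal_method):
    """Return the sequence of ARCHIVE_FLAG values a completed BP passes through."""
    flags = [-1]
    flags.append(index(flags[-1], removal_method))
    if flags[-1] == 0:
        flags.append(backup(flags[-1]))
    return flags


print(lifecycle(ARCHIVE))  # [-1, 0, 2]
print(lifecycle(PURGE))    # [-1, 1]
```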
Life Cycle of a BP and a Document in SBI
Based on the BP and Document lifespan set in the Archive Manager configuration, the BP
and document go through different stages within SBI.
Consider the example below, in which the default document storage is Database, the
lifespan of the BP is 1 day, the lifespan of the Trackable BP is 1 day, and Document
Tracking is disabled.
BP manager Configuration: lifespan
Let us now check the process of archival for the BP StockQuoteBP which has
completed execution.
Screenshot of the BP Execution
SELECT * FROM WORKFLOW_CONTEXT WHERE WORKFLOW_ID='61618'
This BP is also associated with a message:
SELECT * FROM DOCUMENT WHERE WORKFLOW_ID='61618'
Every BP gets an ARCHIVE_FLAG of -1 after completing execution:
SELECT * FROM ARCHIVE_INFO WHERE WF_ID='61618'
The BP is now eligible to be indexed. When the Schedule_IndexBusinessProcessService BP
runs, it picks up this Workflow ID for indexing and makes it eligible for archival by
setting ARCHIVE_FLAG=0; it also sets an ARCHIVE_DATE.
SELECT * FROM ARCHIVE_INFO WHERE WF_ID='61618'
The BP is now eligible for archival. When the Schedule_BackupService BP runs, it
archives this workflow and the message associated with it, and then sets the
ARCHIVE_FLAG to 2.
SELECT * FROM ARCHIVE_INFO WHERE WF_ID='61618'
Note: The archived documents are saved in the directory specified in the Archive Manager
and remain there until they are removed manually. SBI plays no role in removing these
documents from the file system.
The BP and the message are now archived and are therefore eligible to be purged. As the
BP execution screenshot shows, the BP was executed on 26-06-2015. Because the BP
lifespan is 1 day and document tracking is not enabled, the BP lifespan = document
lifespan = 1 day.
Therefore the ARCHIVE_DATE is set to 27-06-2015 (the execution date plus the 1-day
lifespan), which means that when the Schedule_PurgeService BP runs on that date, it
purges this BP and the associated document from the system.
After the Purge service BP completes its execution on the specified ARCHIVE_DATE, the
BP and document are removed from the system.
The queries below return no results, which confirms this.
SELECT * FROM WORKFLOW_CONTEXT WHERE WORKFLOW_ID='61618'
SELECT * FROM ARCHIVE_INFO WHERE WF_ID='61618'
SELECT * FROM DOCUMENT WHERE WORKFLOW_ID='61618'
Searching for the BP ID in the UI also returns no results.
This confirms that the BP and its document have been completely removed from the system.
When the Document Storage is set to File System, the process above remains the same.
However, for the Documents and Workflows to be physically removed (purged) from the file
system, the parameters below have to be set in the archive_thread.properties file:
GENERATE_PURGE_DOCDISK_LIST=true
PURGE_DOCS_ON_DISK=true
PURGE_DOCDISK_LIST_FILENAME=/SBI_install_directory/documents/purge_dod_list.txt
When a document is created, a file is placed in the documents directory. If the process
using the document is archived, a copy of the file goes to the arc_data folder. When
purge runs, it creates a purge_dod_list of the files to be deleted from the documents
directory. If PURGE_DOCS_ON_DISK is set to true, purge consumes that list and deletes
the files from the documents directory. The files in the arc_data directory are not
deleted, however; they have to be removed manually.
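The interplay of the list file and the two flags can be illustrated with a short Python sketch that works on a temporary directory. It is a simplified model and not the purge service itself: purge_docs and demo are hypothetical names, and removing the list file after it has been processed is an assumption about what "consumes" means.

```python
import os
import tempfile


def purge_docs(list_file, generate_list, purge_on_disk, expired_paths):
    """Write the purge list, then delete the listed files if purging is enabled."""
    if generate_list:
        with open(list_file, "w") as fh:
            fh.write("\n".join(expired_paths))
    deleted = []
    if purge_on_disk and os.path.exists(list_file):
        with open(list_file) as fh:
            for path in filter(None, (line.strip() for line in fh)):
                if os.path.exists(path):
                    os.remove(path)
                    deleted.append(path)
        os.remove(list_file)  # assumption: the list is discarded once consumed
    return deleted


def demo():
    """Create one document in documents/ and arc_data/, purge, report what is left."""
    with tempfile.TemporaryDirectory() as root:
        docs = os.path.join(root, "documents")
        arc = os.path.join(root, "arc_data")
        os.makedirs(docs)
        os.makedirs(arc)
        for folder in (docs, arc):
            with open(os.path.join(folder, "doc1"), "w") as fh:
                fh.write("payload")
        list_file = os.path.join(docs, "purge_dod_list.txt")
        purge_docs(list_file, True, True, [os.path.join(docs, "doc1")])
        return (os.path.exists(os.path.join(docs, "doc1")),
                os.path.exists(os.path.join(arc, "doc1")))


print(demo())  # (False, True)
```

The result shows the file gone from the documents directory while the archived copy in arc_data remains, which matches the behaviour described above.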
Purge All Business Process
• Used to purge all the eligible records in the system
• The PurgeAll BP contains two flags: Purge (set to ALL) and Max Loops (set by default to
100). The Max Loops value can be changed.
• The scheduled Purge BP and the PurgeAll BP cannot be run at the same time.
CAUTION:
The Purge All business process should not be used for ordinary production purposes. It
is intended only for use, generally on the advice of IBM Support, to immediately remove
data from the live system, regardless of its expiration date. This may be advisable, for
example, if the Scheduled Purge business process has encountered a failure that caused
a backlog of purge-eligible data. There is an additional flag (MAX_LOOPS) that limits
the number of loops made by the Purge All business process, thereby helping to control
how much data the system handles in a single execution. If a large amount of data has
accumulated, this limit helps the system continue with other processing.
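The effect of the loop limit can be pictured with a minimal Python sketch. It is not the Purge All implementation: purge_all is a hypothetical function, and the batch size stands in for however many records one loop actually handles.

```python
def purge_all(pending, batch_size, max_loops=100):
    """Remove up to max_loops batches from pending; return the number purged."""
    purged = 0
    loops = 0
    while pending and loops < max_loops:
        batch = pending[:batch_size]
        del pending[:batch_size]   # stand-in for deleting one batch of records
        purged += len(batch)
        loops += 1
    return purged


pending = list(range(250))
purged = purge_all(pending, batch_size=100, max_loops=2)
print(purged, len(pending))  # 200 50
```

With the limit set to 2 loops, 50 records are left for a later run, so a single execution cannot monopolise the system.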
There is also a PurgeAll script in the bin directory. Running this script truncates all
transaction-related data, so running it on a production system is not recommended.
CAUTION: Before running this script, even on UAT or DEV, customers are requested to
open a PMR with IBM Support.
Related Links:
IBM Sterling B2B Integrator Documentation Home Page:
http://www-01.ibm.com/support/knowledgecenter/SS3JSW/welcome
IBM Sterling B2B Integrator – Understanding and Monitoring DB growth:
http://www.ibm.com/support/docview.wss?uid=swg27044160&aid=1