Batch Resiliency

On 4 May 2021, IBM announced IBM Z Batch Resiliency V1.2, delivering significant updates to support the resiliency of key applications running on z/OS. These enhancements reduce the time required to decide and act on issues affecting business-critical data, such as corruption incidents. In this article we’ll take a closer look at the major updates provided.

What is IBM Z Batch Resiliency?

An estimated 70% of enterprise data resides on the mainframe[1], making the platform critical to the success of any business exploiting this technology. These high-value assets are found within databases, files and applications running 24 hours a day, so protecting the integrity of this data is paramount. For key subsystems that provide databases, such as Db2 and IMS, built-in features and associated tooling can help with management, journaling and recovery from problems. For data that resides outside this scope, however, such as VSAM files read and updated by batch workloads, there may be no equivalent processes or visibility, which increases the risk of extended or failed recovery when problems occur.

IBM Z Batch Resiliency provides resiliency management for this non-database-managed data and the applications that use it, leveraging detailed analytic reporting that can reduce reliance on complex, domain-specific and possibly error-prone manual approaches. It also helps reduce mean time to recovery from hours or days by identifying at-risk data at the point of failure within minutes, which enables rapid restoration of data sets and helps determine possible downstream impacts from the point of corruption forward.

Other capabilities within IBM Z Batch Resiliency can help reduce operational and storage costs by identifying critical data sets that are not being backed up, as well as unnecessary backups that can be eliminated so that only what’s needed is backed up. Pre-defined reports also support audit preparation, including details on data sharing between production and non-production workloads, to reduce the risk of non-compliance with data management requirements.

Let’s take a look at some of the new features delivered in IBM Z Batch Resiliency V1.2.

Enhanced integration with IBM Z Workload Scheduler

To analyze your workloads and the data sets they use, IBM Z Batch Resiliency captures data from many sources, including SMF, tape management and your scheduler. While all major schedulers are supported, the new release provides an enhancement for closer real-time integration with IBM Z Workload Scheduler. Typically, job information is collected from the scheduler on a daily basis, but what happens if changes are made during the day, either planned or ad hoc? Knowing about such a change could help when investigating what data needs to be restored and what jobs may need to be resubmitted to complete the recovery.

IBM Z Batch Resiliency takes advantage of the EQQUX007 exit point provided by IBM Z Workload Scheduler to capture changes to the schedule that may have been made during the day. This enables a more accurate analysis of which workloads and jobs were submitted when investigating the impact of an issue, and shows which jobs will need to be resubmitted by IBM Z Workload Scheduler following recovery of selected data sets.
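
To make the idea concrete, here is a minimal Python sketch of why intraday changes matter: a daily baseline of planned jobs is combined with change events captured during the day to give the schedule as it actually stood. The event layout, the action names and the function apply_schedule_changes are all hypothetical and chosen for illustration only. They do not reflect the EQQUX007 interface or how the product stores this information.

```python
def apply_schedule_changes(baseline, changes):
    """Return the effective schedule after applying intraday changes in order.

    baseline: {job_name: planned_start} collected once a day (hypothetical layout)
    changes:  list of (action, job_name, planned_start) tuples, oldest first,
              where action is "ADD", "MODIFY" or "DELETE" (hypothetical names)
    """
    effective = dict(baseline)  # copy so the daily baseline is left untouched
    for action, job, planned_start in changes:
        if action in ("ADD", "MODIFY"):
            effective[job] = planned_start
        elif action == "DELETE":
            effective.pop(job, None)
        else:
            raise ValueError(f"unknown action: {action}")
    return effective


# Illustrative data only
baseline = {"PAYUPD01": "01:00", "PAYRPT01": "02:00", "ORDLOAD1": "03:00"}
changes = [
    ("ADD", "FIXJOB01", "01:30"),     # ad hoc job added during the day
    ("MODIFY", "PAYRPT01", "02:30"),  # planned job moved later
    ("DELETE", "ORDLOAD1", None),     # job removed from today's plan
]
effective = apply_schedule_changes(baseline, changes)
```

Working from the baseline alone, an investigation would miss FIXJOB01 and would still expect ORDLOAD1 to have run, which is exactly the gap the closer integration is meant to close.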

Support for managing and restoring data found on zFS

Many batch workloads increasingly make use of data stored on a z/OS UNIX file system, such as zFS. IBM Z Batch Resiliency now provides insight into this usage through data captured in SMF type 92 records, enabling backup reporting and restore capabilities for files on zFS.

Strengthen cyber resiliency strategy 

With the threat from malicious data corruption a major concern for many enterprises, the IBM Z Cyber Vault solution is positioned to help clients address the challenge of logical data corruption. When full system backups are taken to provide an air-gapped cyber vault of critical data, knowing which batch jobs were running and which files were open at copy time can help clients understand the value of each snapshot when making a decision to restore.

New capabilities in IBM Z Batch Resiliency support this solution with health check insight into IBM DS8000 Safeguarded Copy snapshots. The new Cyber Vault Health Check report identifies any non-database-managed data set that was open for output at the time of the Safeguarded Copy.
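
The check described here can be pictured as a simple interval test. The following Python sketch assumes a hypothetical OpenInterval record holding when a job opened and closed a data set and in which mode. The record layout, field names and sample values are illustrative assumptions, not the product's data model or report format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class OpenInterval:
    """Hypothetical usage record: one job holding one data set open."""
    dataset: str
    job: str
    mode: str                    # "INPUT" or "OUTPUT"
    opened: datetime
    closed: Optional[datetime]   # None means the data set was never seen closing


def open_for_output_at(records, snapshot_time):
    """Return {dataset: [jobs]} for data sets open for output at snapshot_time."""
    at_risk = {}
    for rec in records:
        if rec.mode != "OUTPUT":
            continue  # read-only access cannot leave a partial update in the copy
        still_open = rec.closed is None or rec.closed > snapshot_time
        if rec.opened <= snapshot_time and still_open:
            at_risk.setdefault(rec.dataset, []).append(rec.job)
    return at_risk


# Illustrative data only
snapshot = datetime(2021, 5, 4, 2, 0)
records = [
    OpenInterval("PROD.PAYROLL.MASTER", "PAYUPD01", "OUTPUT",
                 datetime(2021, 5, 4, 1, 45), datetime(2021, 5, 4, 2, 15)),
    OpenInterval("PROD.PAYROLL.MASTER", "PAYRPT01", "INPUT",
                 datetime(2021, 5, 4, 1, 50), datetime(2021, 5, 4, 2, 5)),
    OpenInterval("PROD.ORDERS.HISTORY", "ORDLOAD1", "OUTPUT",
                 datetime(2021, 5, 4, 0, 30), datetime(2021, 5, 4, 1, 0)),
    OpenInterval("PROD.LEDGER.WORK", "LEDGER01", "OUTPUT",
                 datetime(2021, 5, 4, 1, 55), None),
]
at_risk = open_for_output_at(records, snapshot)
```

In this sample, PROD.PAYROLL.MASTER and PROD.LEDGER.WORK are flagged because an updating job spans the snapshot time, while PROD.ORDERS.HISTORY was closed before the copy and is treated as consistent.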

The pre-defined Reverse Cascade report can assist in forensic investigation to identify the original source of the corruption by identifying the jobs and steps that updated the corrupted files, aiding in the isolation of the program that caused the error. In addition, IBM Z Batch Resiliency can identify which data sets are critical and need to be restored and which ones can simply be recreated. From there, the surgical recovery of any non-database-managed data sets that were open for output at the time of the Safeguarded Copy can be performed, with accurate restore JCL generated automatically. Finally, the Forward Cascade report helps develop a forward recovery plan for the applications that use the surgically recovered data.
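
Conceptually, the two cascade reports walk the same job-to-data-set relationships in opposite directions. The Python sketch below illustrates that idea with a hypothetical JobRun record and plain integer timestamps. The names reverse_cascade and forward_cascade are borrowed from the report titles for readability only, and the logic is a simplified assumption rather than the product's actual analysis.

```python
from collections import namedtuple

# Hypothetical record: one job step, when it started, what it read and wrote.
JobRun = namedtuple("JobRun", "job step start reads writes")


def reverse_cascade(runs, corrupted, before):
    """Job steps that updated the corrupted data set up to 'before', newest first."""
    writers = [r for r in runs if corrupted in r.writes and r.start <= before]
    writers.sort(key=lambda r: r.start, reverse=True)
    return [(r.job, r.step) for r in writers]


def forward_cascade(runs, corrupted, since):
    """Job steps and data sets affected downstream from the point of corruption."""
    tainted = {corrupted}
    affected = []
    for run in sorted(runs, key=lambda r: r.start):
        if run.start < since:
            continue  # ran before the corruption, so it cannot have consumed it
        if tainted & set(run.reads):
            affected.append((run.job, run.step))
            tainted |= set(run.writes)  # its outputs are now suspect as well
    return affected, tainted - {corrupted}


# Illustrative data only; timestamps are arbitrary increasing integers
runs = [
    JobRun("LOADJOB", "STEP1", 100, ["IN.FEED"], ["PROD.MASTER"]),
    JobRun("UPDJOB", "STEP2", 200, ["PROD.MASTER"], ["PROD.MASTER"]),
    JobRun("RPTJOB", "STEP1", 300, ["PROD.MASTER"], ["PROD.REPORT"]),
    JobRun("OTHERJOB", "STEP1", 350, ["PROD.OTHER"], ["PROD.OTHER2"]),
    JobRun("EXTJOB", "STEP1", 400, ["PROD.REPORT"], ["PROD.EXTRACT"]),
]
suspects = reverse_cascade(runs, "PROD.MASTER", before=250)
affected, downstream = forward_cascade(runs, "PROD.MASTER", since=200)
```

Here the reverse walk lists UPDJOB and then LOADJOB as candidates for the source of the corruption, while the forward walk shows that RPTJOB and EXTJOB consumed suspect data and that PROD.REPORT and PROD.EXTRACT would need to be recreated or recovered.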

How can I learn more?

I hope this has given you some ideas on how IBM Z Batch Resiliency could make a difference to your operational resiliency. Together with the latest Z hardware, storage and software updates, it forms a key foundation of an IT enterprise resiliency strategy on IBM Z. If you want to learn more, take a look at our solution brief or the product documentation. Please also reach out to me or your IBM representative if you have any questions or would like a demonstration of any of the features described here.

References
[1] Leveraging IBM Z for a hybrid cloud world

Originally published on IBM Community Blog
