If It Ain’t Broke…

May 15, 2019

Gil Peleg

With over two decades of hands-on experience in enterprise computing, data center management, mainframe system programming, and storage development, I’m now on a mission to accelerate cloud adoption at large enterprises by making their most trusted core business platforms more flexible, affordable, and cloud compatible.

It was Bert Lance, Jimmy Carter’s Director of the Office of Management and Budget, who popularized the phrase “If it ain’t broke, don’t fix it” when, in mid-1977, he publicly urged the US government to adopt it as a motto in order to save billions of dollars annually. He was quoted as saying, “That’s the trouble with government: Fixing things that aren’t broken and not fixing things that are broken.”

On the one hand, “If it ain’t broke” is a common-sense approach to many situations in life. Why waste energy and resources on something that’s working just fine? On the other hand, in the 21st century we know that innovation and progress are driven by the unceasing desire to take things to the next level: making them better, faster, and smarter. It’s hard to imagine any business today that could compete successfully by embracing the “If it ain’t broke” worldview.

There is one sector, however, where the “If it ain’t broke” attitude is still prevalent: mainframe shops. In our article “The Mainframe Manager’s Essential Guide to Hiring Millennials,” we explore in depth the challenges mainframe shops face in replacing highly experienced retirees with next-gen software engineers. When the authors of legacy apps or the originators of workflows are no longer around to explain their code or processes, the tendency is to avoid application and other software updates unless something actually breaks, i.e., stops working.

In this article, we discuss how the “If it ain’t broke” approach has crept into the mainframe world and caused real business damage. We also present examples of how mainframe shops can embrace new technologies in key areas, such as backup, security, and app modernization, to mitigate risk and enhance business outcomes. We know a little bit about this at Model9; it’s what we do.

Breaking the Broken Barrier

Let’s go back for a moment to Bert Lance’s observation about government: that it wastes resources by fixing things that aren’t broken and fails to fulfill its mission by not fixing things that are broken. It raises a very important question: How do you know if something is broken? For example, children might be sitting in classrooms across the nation by virtue of universal access to free schooling, but you will not know if they are actually getting an education until you test them.

Interestingly enough, the same holds true in a mainframe shop. An app or a process may be working, but you won’t know whether it’s achieving its intended goal until you test it. An attitude of “Nothing’s broken in my shop” creates risk. Your backups may be working like clockwork, but have you checked lately whether you can use them for a successful recovery? Your security controls are in place, but can you really know if your mainframe is hack-proof without conducting regular penetration tests? Have routine processes like backup crept unnoticed into your rolling four-hour peak, driving up your monthly license costs? And what are you going to do if a legacy app that hasn’t been touched for twenty years develops a show-stopping bug, and no one in the shop knows the programming language it was written in well enough to fix it?

In our view, an “If it ain’t broke” approach in a mainframe shop is, more often than not, closer to another popular expression: “burying one’s head in the sand.” Always being on the lookout for where and how to improve environments, apps, and processes is vital to staying on the cutting edge and optimizing availability and performance.

We’ll now explore three concrete scenarios in which the “If it ain’t broke” approach has been problematic:

Backups/DR
Security
App Modernization

Your Backups May Be Broken and You Don’t Know It

Organizations often do not realize that their backups do not provide adequate coverage because they do not test them and, fortunately for them, have never had to use them in a significant restore scenario. Our product often runs in parallel with legacy backup solutions in enterprises that are transitioning to, or adding, our cloud-based mainframe backup solution. When we prepare a side-by-side comparison of our backup and recovery technology with the existing tools, it is not unusual for the mainframe shop manager to discover that their backup process has been broken for quite a while without anyone noticing. Here are three real-life examples:

While preparing a backup job to assess our solution side by side with their existing backup product, one mainframe manager was surprised to see that they were backing up only about one-third of their DASD to tape. Their legacy backup process ran every weekend, consuming tapes and CPU, but the gap had gone unnoticed because they ran their DR drills on their secondary DASD. Had they ever needed this tape “doomsday” copy, they would not have been able to achieve a full-site recovery, which is the last thing you want to discover in that kind of crisis.
A mainframe shop was using HSM for incremental backup of data sets to virtual tapes in their VTL. This activity should have produced a significant volume of data in the VTL, but when we analyzed the contents of their VTL database to prepare a migration, we saw that HSM was producing only a small number of tapes. In other words, their coverage was not nearly as wide as they had thought, leaving them at considerable risk in data corruption or disaster recovery scenarios, not to mention out of compliance with regulations.
A government defense agency had been running the same weekly backup process for years. While migrating to a new tape solution, however, they discovered an error in the backup process and realized that, in fact, no backups were being created.
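
All three cases share a root cause: nobody had compared what the backups actually contained against what they were supposed to protect. Here is a minimal sketch of such a check in Python. It assumes you can export the online DASD volumes with their allocated sizes, plus the set of volumes found in the backup inventory. The function name, volume serials, and sizes are hypothetical and for illustration only.

```python
def backup_coverage(dasd_inventory, backed_up):
    """Return (fraction of bytes covered, sorted list of unprotected volumes).

    dasd_inventory: hypothetical mapping of volume serial -> allocated bytes
    backed_up: hypothetical set of volume serials found in the backup inventory
    """
    total = sum(dasd_inventory.values())
    if total == 0:
        raise ValueError("DASD inventory is empty")
    # Bytes on volumes that appear in the backup inventory.
    covered = sum(size for vol, size in dasd_inventory.items() if vol in backed_up)
    # Volumes with no backup at all, listed for follow-up.
    missing = sorted(vol for vol in dasd_inventory if vol not in backed_up)
    return covered / total, missing


if __name__ == "__main__":
    # Hypothetical inventory: three equally sized volumes, only one on tape.
    dasd = {"PROD01": 300, "PROD02": 300, "DB2A01": 300}
    on_tape = {"PROD01"}
    ratio, unprotected = backup_coverage(dasd, on_tape)
    print(f"coverage: {ratio:.0%}, unprotected: {unprotected}")
    # prints: coverage: 33%, unprotected: ['DB2A01', 'PROD02']
```

A backup process that silently writes nothing, as in the third example, shows up immediately as 0% coverage. The same comparison could also be run at the data set level, or as a count of expected versus actual tape volumes, as in the HSM example.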

Beyond the rude awakening that an organization’s coverage is broken, an analysis of its legacy backup process often reveals hidden costs. Backup-related and other routine processes are typically scheduled to run outside the rolling four-hour peak. However, as data sets grow or infrastructures change, these processes take longer than originally planned, spill into the peak, and start driving up monthly licensing costs. Yet nobody monitors these issues because, more often than not, the engineers who defined these rather complex workflows are no longer on staff, and it has seemed safer to just let them run as is.

The fact is that, before our tapeless cloud-based mainframe backup and recovery solution, it didn’t really matter whether these hidden costs were discovered, since there was no viable remediation strategy.

With Model9, however, both of these issues, backup coverage and backup and archive costs, can be easily remediated. Its intuitive policy management interface and comprehensive reporting mean that mainframe shops no longer need in-depth knowledge and expertise to monitor the completeness and coverage of their backups. In addition, automated recovery tests run continuously in the background to verify that data can actually be recovered from the backups.
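
Verifying recoverability, rather than just the existence of backups, comes down to restoring data and checking that it matches what was backed up. The sketch below is not how Model9 implements its recovery tests. It is a generic illustration that assumes a SHA-256 digest was recorded for each data set at backup time, and that a hypothetical `restore` callable returns the restored bytes for a given name.

```python
import hashlib


def sha256_hex(data):
    """Digest used to fingerprint a data set's contents."""
    return hashlib.sha256(data).hexdigest()


def verify_recovery(catalog, restore):
    """Restore every cataloged item and return the names that failed.

    catalog: mapping of data set name -> SHA-256 hex digest recorded at backup time
    restore: hypothetical callable taking a name and returning the restored bytes
    """
    failures = []
    for name, expected in sorted(catalog.items()):
        try:
            restored = restore(name)
        except Exception:  # any restore error counts as a failed recovery
            failures.append(name)
            continue
        if sha256_hex(restored) != expected:
            failures.append(name)  # restored, but contents differ from the backup
    return failures


if __name__ == "__main__":
    # Hypothetical data sets captured at backup time.
    originals = {"PAYROLL.MASTER": b"employee records", "GL.LEDGER": b"ledger entries"}
    catalog = {name: sha256_hex(data) for name, data in originals.items()}

    # Simulated backup store in which one copy was silently truncated.
    store = dict(originals)
    store["GL.LEDGER"] = b"ledger entr"
    print(verify_recovery(catalog, store.__getitem__))  # ['GL.LEDGER']
```

Running a check like this on a schedule turns “we have backups” into “we restored them last night and they matched,” which is the question the examples above never asked.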

In terms of cost containment, Model9 lets mainframe shops offload backup, archiving, and recovery processes to zIIP engines, which can run Java workloads and whose processing is not included in the rolling four-hour peak calculation. In this way, they can implement highly robust backup and disaster recovery plans while lowering costs.
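
Under IBM’s sub-capacity pricing, monthly software charges are driven by the highest rolling four-hour average (R4HA) of MSU consumption on general-purpose processors, so a backup job that drifts into the busiest hours can raise the bill. The toy model below illustrates the effect and shows why moving work to zIIPs helps. It is a sketch, not IBM’s calculation. It assumes hourly MSU samples (real reporting uses much finer intervals), made-up workload figures, and a hypothetical `ziip_share` parameter for the fraction of backup work that runs on zIIPs.

```python
from collections import deque


def rolling_peak(msu_samples, window=4):
    """Highest average over any `window` consecutive samples (hours, in this toy model)."""
    if window <= 0 or len(msu_samples) < window:
        raise ValueError("need at least one full window of samples")
    recent = deque()
    total = 0.0
    peak = float("-inf")
    for msu in msu_samples:
        recent.append(msu)
        total += msu
        if len(recent) > window:
            total -= recent.popleft()  # oldest sample leaves the window
        if len(recent) == window:
            peak = max(peak, total / window)
    return peak


def general_purpose_load(online, backup, ziip_share=0.0):
    """MSUs counted on general-purpose CPs; the zIIP share of backup work is excluded."""
    return [o + b * (1.0 - ziip_share) for o, b in zip(online, backup)]


if __name__ == "__main__":
    online = [100, 120, 120, 110, 60, 40, 40, 40]    # hypothetical hourly MSUs
    backup_as_designed = [0, 0, 0, 0, 0, 50, 50, 50]  # runs after the peak
    backup_grown = [0, 0, 0, 50, 50, 50, 50, 50]      # now overlaps the peak

    print(rolling_peak(general_purpose_load(online, backup_as_designed)))  # 112.5
    print(rolling_peak(general_purpose_load(online, backup_grown)))        # 127.5
    print(rolling_peak(general_purpose_load(online, backup_grown, 1.0)))   # 112.5
```

With these made-up numbers, the grown backup window lifts the peak from 112.5 to 127.5 MSUs, while shifting all of the backup work to zIIPs brings it back to 112.5. In practice only zIIP-eligible work can move, so a realistic `ziip_share` may be below 1.0.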

In short, our solution is a classic example of how mainframe shops can and should embrace advanced technologies in order to solve problems rather than ignore them.

Is Your Mainframe as Secure as You Think?

In a June 2017 survey of 400 CIOs, 78% considered the mainframe their most secure computing system, and 64% entrusted it with their most sensitive data, including PII (Personally Identifiable Information). However, several trends are making it harder to maintain truly robust security controls on mainframe systems. For example, enterprises today leverage their mainframe data for business intelligence and other analytics, with more and more business units requesting, and getting, access to the mainframe’s data stores. In addition, it is increasingly common for mainframes to be connected directly to the Internet, with their services exposed as APIs. Both trends expose the mainframe to malicious attacks, whether from insiders or from external attackers.

One of the more spectacular mainframe security breaches took place at the Office of Personnel Management (OPM), which is responsible for human resources management for the entire US federal government. Between March 2014 and March 2015, hackers exfiltrated sensitive information about OPM’s IT environment, as well as security clearance background files, personnel files, and fingerprint data. In one of the exploits, an attacker used a contractor’s credentials to log into the OPM system and create a backdoor to the network.

One of the conclusions drawn from the breach was that “OPM lacked an effective managerial structure to implement reliable IT security policies.” This aligns with another finding of the survey noted above: 84% of CIOs reported that they lacked visibility into their mainframe environments and could not effectively track, in real time, who had accessed their mainframe data, through which applications, and for what purpose. This blind spot creates not only security risks but compliance risks as well.

Rather than being lulled into a false sense of security, mainframe shops should seek out solutions that provide the same kind of cutting-edge, real-time security controls that are implemented in open-system and cloud computing environments, such as SIEM (Security Information and Event Management) systems. One example of an innovative solution that brings the latest open-system security controls and processes to the mainframe world is CorreLog, which BMC acquired in November 2018.

Can You Troubleshoot Your Legacy Software?

Not long ago, one of our customers, a large banking institution, faced a crisis when a relatively small bug in a legacy application threatened to disrupt business-critical operations. The application, written decades ago in a now-obsolete mainframe development environment, had not taken into account a calendar event unique to the year 2018. Because it had worked faithfully for many years, no effort had been made to convert it to a more modern programming language. Now, however, no one in the mainframe department could troubleshoot the bug, and the original developer had to be cajoled out of retirement to fix the problem. Although the story ultimately had a happy ending, it shook the bank management’s confidence in their mainframe environment.

However, migrating mainframe applications to new, non-mainframe platforms is also very risky. These legacy applications embody a great deal of domain knowledge and drive core business activities, both as systems of record and, increasingly, as an integral part of systems of engagement. A Forrester report published in March 2017 found that 96% of new business initiatives involved the mainframe as a system of record, a system of engagement, or both.

A better strategy is to invest in continuously modernizing and repurposing legacy mainframe applications and processes—even if they are currently not broken. This way, you can improve performance, mitigate risk, and ensure that the current mainframe team has the skills to troubleshoot and resolve bugs.

A Final Note

Over the last fifty years, the mainframe has proven itself time and again as a resilient computing platform that adapts to and embraces the latest innovations in technology. Some of the most advanced 21st-century technologies, such as cloud connectivity, containers, and cutting-edge encryption, have already been successfully incorporated into mainframe hardware and operating systems.

Mainframe shops must look to the future with confidence and make sure their applications and processes are aligned with next-generation software approaches. We are proud to be part of the cutting edge of mainframe evolution.
