Based on estimates and benchmarks from Technavio and Gartner, the mainframe market—hardware and software—is worth about $44 billion a year. An estimated 90% of all credit and debit cards were still processed on the mainframe in 2017, and IBM continues to sell more processing capability each year.
Considering its size, the mainframe market does not have many vendors. As a result, it is unusually stable for a technology market, with rigid control over technological direction.
While stability can be extremely useful, the stagnation of data protection solutions in particular has proven to be of concern.
Mainframes, and the ecosystem of hardware, services, and applications that surround them, are costly. They act on vital data and run critical workloads. They are counted on to provide the highest level of reliability and speed. These factors have created a high level of risk aversion among mainframe administrators. Change controls are extremely strict and new technologies are introduced slowly. Mainframe customers want to see others adopt the technologies successfully before taking the plunge themselves.
However, with the volume and velocity of data at unprecedented levels, with business continuity SLAs becoming more demanding, and with a high-priority focus on mainframe costs, the management of legacy data and system recovery processes needs to be looked at in an entirely different light. In this article we will explore the real costs of mainframe backup and archive today.
The Data Protection Dilemma on Mainframes
Data protection and archive is one area of the mainframe ecosystem where evolution has been significantly slower than in the open systems world. It is dominated by a small group of vendors such as IBM, CA, EMC, Oracle, and Innovation Data Processing.
These vendors offer a limited number of backup and archive products, all of them based on tape architecture and all of them consuming costly central processor (CP) resources. The only significant innovation over the years has been the introduction of Virtual Tape Libraries (VTLs): hard disk arrays that serve as a buffer between the backup streams and the physical tape storage devices, often doing away with physical tape altogether.
With mainframes typically handling critical data in highly regulated business environments, risk-averse mainframe administrators have not been clamoring for novel backup/restore solutions despite the high costs of hardware and software, cumbersome restore procedures, and other drawbacks of these legacy systems.
Backup and Archive Impact on MLC
Analysis of numerous mainframe logs from a wide range of companies worldwide indicates that backup and space management workloads can take up to 10% of a mainframe’s compute consumption, much of which is the result of emulating tape systems. These workloads expend costly main CP cycles on housekeeping tasks that are only necessary because the backup and space management solutions need to believe that their data sits on tape. This will be explored further in a future article on the idiosyncrasies of Virtual Tape.
IBM employs the Monthly License Charge (MLC) model, in which organizations are charged based on the month’s highest average usage over any four consecutive hours, known as the peak Rolling 4-Hour Average (R4HA). Keeping backup and space management workloads out of the R4HA peak is a challenge for many administrators. Organizations frequently face a dilemma: restrict backup timing and scope so that backups do not fall into the monthly R4HA peak, or allow backups to run during peak times and increase MLC charges.
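To make the billing mechanics concrete, here is a minimal Python sketch of how a peak rolling four-hour average responds to backup timing. The hourly sampling, the MSU figures, and the function names are all assumptions for illustration; real sub-capacity reporting is more detailed than this.

```python
from typing import List, Sequence


def peak_rolling_average(samples: Sequence[float], window: int = 4) -> float:
    """Return the highest average over any `window` consecutive samples."""
    if window <= 0 or len(samples) < window:
        raise ValueError("need a positive window and at least one full window")
    window_sum = sum(samples[:window])
    best = window_sum
    # Slide the window one sample at a time, updating the running sum.
    for i in range(window, len(samples)):
        window_sum += samples[i] - samples[i - window]
        best = max(best, window_sum)
    return best / window


def add_workload(base: Sequence[float], start: int, hours: int, msu: float) -> List[float]:
    """Return a copy of `base` with `msu` added to `hours` samples from `start`."""
    result = list(base)
    for i in range(start, start + hours):
        result[i] += msu
    return result


# Hypothetical hourly MSU readings for one day: quiet night, busy midday.
online = [100.0] * 8 + [300.0, 320.0, 340.0, 330.0] + [200.0] * 8 + [100.0] * 4

# A hypothetical 4-hour backup job that consumes 40 MSU per hour.
backup_in_peak = add_workload(online, start=8, hours=4, msu=40.0)
backup_off_peak = add_workload(online, start=0, hours=4, msu=40.0)

print(peak_rolling_average(online))           # 322.5
print(peak_rolling_average(backup_in_peak))   # 362.5
print(peak_rolling_average(backup_off_peak))  # 322.5
```

In this made-up day, the same backup job raises the billable peak only when it overlaps the busiest four hours; run overnight, it leaves the peak unchanged.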
Shifting Data Protection Workloads to Specialty Processors
For many mainframe administrators the R4HA and its impact on MLC are constantly top of mind. In a 2017 BMC mainframe survey, 60% of the respondents said that they spend at least 30% of their mainframe budgets on IBM MLC costs, while 63% listed cost control as their primary concern. And these costs are continually increasing. IBM raised its MLC fees by 4% in 2017 alone.
As noted above, workloads that use the main Central Processor (CP)—including data protection workloads—drive the bulk of the MLC costs. However, there are specialty processors that allow organizations to execute some percentage of a workload’s compute time without impacting the main CP. Specialty processor cycles are charged at significantly lower rates than main CP cycles.
The z Systems Integrated Information Processor (zIIP) is one such specialty processor. It can be used to offload numerous types of workloads, such as Java, XML, and Db2 for z/OS. The zIIP is proving to be of increasing importance as Java usage grows. According to the previously mentioned BMC survey, 64% of the respondents reported Java usage growing, and it is the language of choice for new applications.
Encryption and decryption are also examples of processor-intensive data protection tasks that can be offloaded to specialty processors. This cost-saving measure becomes even more significant in light of regulatory guidelines that strongly advise that backups, and even production storage, be encrypted. Fines for violations of the EU’s General Data Protection Regulation (GDPR), for example, can reach 4% of annual global turnover or €20,000,000, whichever is higher.
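The "whichever is higher" rule is easy to misread, so a small Python sketch may help. It only encodes the upper bound described above; the function name and the turnover figures are hypothetical, and an actual fine depends on the regulator's assessment.

```python
def gdpr_max_fine_eur(annual_global_turnover_eur: float) -> float:
    """Upper bound on a fine: the higher of 4% of turnover or EUR 20 million."""
    if annual_global_turnover_eur < 0:
        raise ValueError("turnover cannot be negative")
    four_percent = annual_global_turnover_eur * 4 / 100
    return max(four_percent, 20_000_000.0)


# Hypothetical turnovers, for illustration only.
small = gdpr_max_fine_eur(100_000_000)    # 4% is EUR 4M, so the EUR 20M figure applies
large = gdpr_max_fine_eur(2_000_000_000)  # 4% is EUR 80M, which exceeds EUR 20M
```

For smaller organizations the fixed €20,000,000 figure is the ceiling, while for large enterprises the 4% term dominates.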
Fortunately, a new generation of Java-based backup and archive solutions is emerging that can move some or all of the backup and archive processing over to the less expensive specialty processors, avoiding the main CP overhead (and reducing the MLC charges) of virtual-tape-based products. This shift can greatly decrease costs and may have the added benefit of allowing backups to be executed with fewer timing constraints, even as higher priority tasks are executed on the main CP.
In the open systems world, centralized storage and x86 virtualization are two great examples of solutions that have delivered impressive savings to organizations of all sizes. While the reliability, speed, and functionality of open systems storage keep advancing, the average storage prices are declining consistently year over year.
Adopting the modern commodity storage solutions utilized by open systems for use with mainframes would allow organizations to store 3x to 10x more data compared to traditional mainframe storage solutions of the same price. This would seem to be a compelling value proposition for the 63% of survey respondents who listed cost control as their primary concern. Using cloud storage from public providers such as Amazon’s AWS, Microsoft’s Azure, and others further reinforces the value proposition. For archive specifically, cold storage on public cloud providers can improve the capacity/cost ratio 100x.
However, while solutions that enable mainframes to make use of commodity storage exist, significant barriers to adoption remain. Risk aversion, based on outdated beliefs that commodity storage solutions are inadequately resilient for mainframe usage, is one issue. Another is that many of the extant approaches to adding commodity storage to mainframes simply make this commodity storage available as part of a VTL, inheriting the problems and costs associated with the VTL approach. Furthermore, the current approaches may place commodity storage behind DASD, which maintains lock-in to specific storage hardware.
In reality, commodity storage solutions have evolved rapidly, and now provide a wide range of features that can make them at least as resilient as traditional mainframe storage at a significantly reduced cost. Cloud storage, for example, can be easily and inexpensively configured to be locally and geographically redundant. Mainframes can be configured to use cloud storage for primary, secondary, or backup storage, as well as a replication tier.
Commodity storage needn’t be restricted to a storage solution hidden behind a VTL or DASD. Mainframes can use commodity storage solutions, ranging from extremely high speed all-flash arrays to cold archival disk warehouses, or even inexpensive mainstream LTO (linear tape-open) tape libraries. Perhaps more importantly, mainframes can embrace all that cloud storage has to offer.
“Data protection for us was easy – everything was on 1/4″ tape back then so we just duplicated them. We ran a twice-yearly recovery exercise to our DR site near Heathrow—in the 13 years I worked there I think the test partially worked once and failed the other 25 times…” –Jon Waite, CTO, Computer Concepts Ltd
For some organizations, the high cost of classic mainframe storage solutions and MLC cost concerns can lead to compromising on the number of separate remote backup copies, the frequency of their backups, and/or the amount of backup testing and verification that they perform. This is a very serious problem. Backups aren’t worth anything if you aren’t sure you can restore from them.
Administrators don’t want backup jobs to create contention with other workloads, so they ensure that only absolutely critical backup jobs run during R4HA peaks. Pushing backup jobs off peak can save money, but it also reduces the window in which backups can run, potentially allowing fewer backup runs.
Amid all of this, backups need to be verified, restores tested, and disaster recovery failovers planned and executed. In a perfect world, all of this is automated so that it occurs regularly. In reality, many organizations make a series of compromises to juggle cost and backup execution windows while still leaving enough time to test restores.
These issues are not unique to mainframes. Administrators in the open systems world juggle these problems as well. There, only 18% of respondents to a June 2016 Dell/EMC data protection survey felt their current data protection solutions were adequate.
Open systems administrators, however, don’t have to worry about MLC pricing. The pressure on mainframe administrators to compromise is even greater than it is in the open systems world. Considering the criticality of mainframe workloads and data, this is alarming.
The Complexity Problem
If the vendor and technology issues were insufficient to cause angst, there is a looming skills gap. Many of today’s mainframes are maintained by practitioners who are retiring or near retirement. While there are some younger administrators choosing mainframes as a career, they are not compensating for the number of individuals looking to exit the workforce.
Consider the previously mentioned BMC survey. BMC tries to put a positive spin on workforce statistics by promoting the fact that 53% of respondents are under the age of 50. This is right next to the statistic that says 45% of respondents hold executive positions. They do not say how many respondents consider themselves actual mainframe practitioners, or what the age spread of those individuals is.
Elsewhere in that survey BMC notes that “Technicians are more concerned than executives about staffing and skills: Almost half of technicians indicate staffing/skills as a top challenge, compared to one-third of executives.” Reading between the lines on this, not only is the graying of the mainframe workforce a very real problem for mainframe customers, but there’s a significant disconnect between the views of executives and practitioners on this issue.
Many organizations have intricate or bespoke backup implementations. The backup solutions that serve as the VTLs are often separate from the solutions that schedule and run the backups themselves. Instead of a single, simple solution to this critical aspect of mainframe infrastructure, a complex web of multiple applications often exists.
This complexity is a problem. Not only is the pool of experts shrinking because of the mainframe skills shortage, but many administrators also feel that time is running out to transfer knowledge to a new generation of mainframe practitioners.
Data protection has been a perennial, and expensive, bugbear for mainframe customers. When we asked Gil Peleg, CEO of Model 9 and an IBM mainframe veteran, how the mainframe backup cost could be quantified, here’s what he had to say:
“If you consider, as mentioned above, that 35% of mainframe cost is workload related (MLC) and that 10% of that is backup and archive, then you’re already looking at ~3.5% of MF cost. Add to that the cost of secondary storage, which can come to 25% of mainframe hardware costs (based on our experience with customer budgets), and Gartner saying that hardware accounts for 18% of total mainframe cost, then we’re looking at another 4.5% of total budget. So with a total cost item of 8% of budgets and the potential to save ~30–70% of that cost with offloads and commodity storage we are looking at a potential savings of 2–5.5% on mainframe budgets. For a typical enterprise, that could translate into hundreds of thousands to millions of dollars per year. By all accounts that’s a significant chunk of money, with the additional upside being that the enterprise also benefits from superior data protection and access to the best in storage paradigms the cloud can offer.”
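The arithmetic in that estimate can be retraced with a short Python sketch. The percentages are the ones quoted above; the function name and the $20 million annual budget are hypothetical and serve only to turn the shares into dollar figures.

```python
def backup_cost_share(
    mlc_share: float,
    backup_share_of_mlc: float,
    hardware_share: float,
    secondary_storage_share_of_hw: float,
) -> float:
    """Fraction of the total mainframe budget attributable to backup and archive."""
    workload_part = mlc_share * backup_share_of_mlc              # MLC spent on backup
    storage_part = hardware_share * secondary_storage_share_of_hw  # secondary storage
    return workload_part + storage_part


# Percentages as quoted in the estimate above.
share = backup_cost_share(0.35, 0.10, 0.18, 0.25)
low, high = 0.30 * share, 0.70 * share

annual_budget = 20_000_000  # hypothetical annual mainframe budget in dollars

print(f"backup and archive share of budget: {share:.1%}")
print(f"potential savings: {low:.1%} to {high:.1%}")
print(f"on ${annual_budget:,}: ${low * annual_budget:,.0f} to ${high * annual_budget:,.0f}")
```

With these inputs the backup and archive share comes to about 8% of the budget and the savings to roughly 2.4% to 5.6%, in line with the quoted range.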
Outside of the mainframe ecosystem, the world has moved on. Open systems needn’t be feared, and commodity storage has demonstrated reliability. Mainstream data protection solutions have grown and evolved so much that they are no longer merely a check against various mistakes or disasters; they are themselves part of solutions that deliver tangible benefits. Cloud storage in particular can offer a number of resilience options and deliver consistent and significant cost reductions.
Fortunately, mainframe customers do have the choice to embrace commodity storage. They can enjoy the best of both worlds by combining the unmatched reliability of mainframes with the rapid evolution of capabilities and significantly decreased costs of open systems.
Gil Peleg has over two decades of hands-on experience in mainframe system programming and a deep understanding of methods of operation, components, and diagnosis tools. Prior to founding Model 9, Gil worked at the IBM ITSO center in Poughkeepsie, NY and at storage companies XIV and Infinidat. He is a co-author of eight IBM Redbooks on z/OS implementation and holds a B.Sc. in Computer Science.