Total Network Failure

The chances of a total network failure are perhaps higher than you may think.

Total loss of network connectivity is a critical event that can be hard to diagnose and recover from, especially if the systems specialists cannot connect to the system.

The three most likely causes of a total network failure on a production system are a hardware outage, a software crash, or a denial-of-service attack.

Open Systems Adapters (OSAs) can fail, but network routing will detect an OSA outage and automatically reroute traffic to an alternate OSA if one exists. Software crashes are also possible but rare in major operating system components, and even then, losing all connectivity would be unlikely.

That leaves a denial-of-service attack as the remaining possible cause. But could an unauthorised user really bring the entire network down? The IBM z/OS Communications Server has a wide range of security mechanisms to protect TCP/IP connections.

Unless someone could elevate their security privileges or gain access to APF-authorised libraries to issue commands, it seems impossible.

How could it happen?

The introduction of TCP/IP on the mainframe over 30 years ago, together with the migration away from Systems Network Architecture (SNA) technologies, significantly increased the risk of network security exposures. Over the same period, those risks have come to be understood, and security mechanisms have been introduced to counter them. Ironically, an exposure that could lead to a total network outage today is related to legacy SNA, not TCP/IP.

As SNA services and related products were decommissioned on mainframe systems, redundant definitions were often left behind. Many mainframe products relied on Virtual Telecommunications Access Method (VTAM) definitions, and although these products may since have been upgraded for TCP/IP or uninstalled, their associated VTAM definitions could remain in place. This may have happened because of time pressure, a lack of knowledge of older technologies, or a reluctance to delete things we're not sure about on a production system.


During recent security reviews conducted by Vertali for financial institutions across the UK and Europe, we found many VTAM application definitions still active on production systems. Although most of these cause no harm, some can be dangerous.

VTAM application definitions, also called ACBs, can include primary or secondary programmed operator (PPO/SPO) privileges, which are typically used by network monitoring and management solutions. Even if these products are still in use, upgrades may mean that some formerly used ACBs are no longer opened by the product, leaving them inactive and still available for another program to open.

Why is this a problem?

VTAM is still a critical component on mainframe systems. Although it is historically associated with SNA networking, it still provides the hardware channel support required by TCP/IP and the conduit for TN3270 connections to applications. Using simple commands to disable key VTAM components can disconnect the logical partition (LPAR) from the network, cutting off all TCP/IP connectivity and potentially the VTAM connections intended for use when TCP/IP is unavailable.

Without the correct security mechanisms in place, any user can run a batch job of about 50 lines of code to open an available VTAM ACB. With another 30 lines, a bad actor can issue any VTAM command, including commands that disable critical VTAM components. The job can also trap the command output, making the event even harder to detect in the system logs or on the console.

The bad actor does NOT require authorised program facility (APF) authorisation or even read access to any VTAM datasets. The programming interfaces to perform these tasks are well documented and available to download. 

The impact on a large financial or retail institution if one of these batch jobs were submitted on Black Friday could be catastrophic, and the cause would be very difficult to detect. The most likely route to a quick recovery would be to restart VTAM and TCP/IP, or even to re-IPL the system. But the attacker could then log in and resubmit the job, continuing a repetitive, hard-to-detect denial of service.

Fortunately, the solution to this denial-of-service exposure is quite simple, but often overlooked. In the security reviews undertaken by Vertali, two out of three sites did not adequately protect against it.

A simple solution

The first action is to remove redundant ACBs from the system, especially any with extended privileges such as SPO, PPO and even CNM (communication network management). CNM authority can be used by more complex code to trace data, such as TN3270 application logins, after decryption.
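One way to find removal candidates is to scan the VTAMLST APPL definitions for privileged values of the AUTH operand. The Python sketch that follows is a minimal illustration under stated assumptions: the sample definitions and ACB names are hypothetical, each APPL statement is assumed to sit on a single line (real VTAMLST members often continue a statement across several lines), and only the PPO, SPO and CNM values are treated as privileged. It works on a text sample and does not read live system data.

```python
import re

# Hypothetical VTAMLST extract with illustrative AUTH values.
# Assumption: each APPL statement is on one line (no continuations).
SAMPLE_VTAMLST = """\
*        SAMPLE APPLICATION MAJOR NODE (HYPOTHETICAL NAMES)
         VBUILD TYPE=APPL
NETMON01 APPL  AUTH=(NVPACE,SPO,PPO)
OLDMON   APPL  AUTH=(SPO,ACQ)
TRACE01  APPL  AUTH=CNM
CICSPRD  APPL  AUTH=(ACQ,VPACE)
TSOAPPL  APPL  AUTH=(NOCNM,NOPO)
"""

# AUTH values this sketch treats as extended privileges.
PRIVILEGED = {"PPO", "SPO", "CNM"}

# Name field in column 1, then the APPL statement and its operands.
APPL_RE = re.compile(r"^(?P<name>[A-Z0-9$#@]+)\s+APPL\b(?P<operands>.*)$")
# AUTH=(A,B,C) or AUTH=A
AUTH_RE = re.compile(r"AUTH=(?:\((?P<list>[^)]*)\)|(?P<single>[^,\s]+))")


def privileged_acbs(vtamlst: str) -> dict:
    """Map each APPL (ACB) name to the privileged AUTH values it carries."""
    found = {}
    for line in vtamlst.splitlines():
        if line.startswith("*"):
            continue  # comment line
        appl = APPL_RE.match(line)
        if not appl:
            continue  # not an APPL statement (e.g. VBUILD)
        auth = AUTH_RE.search(appl.group("operands"))
        if not auth:
            continue  # no AUTH operand coded
        values = auth.group("list") or auth.group("single")
        privs = {v.strip() for v in values.split(",")} & PRIVILEGED
        if privs:
            found[appl.group("name")] = privs
    return found


if __name__ == "__main__":
    for name, privs in sorted(privileged_acbs(SAMPLE_VTAMLST).items()):
        print(f"{name:<8} {','.join(sorted(privs))}")
```

With the sample input, it reports NETMON01 (PPO, SPO), OLDMON (SPO) and TRACE01 (CNM). Each reported ACB would then need to be checked against the products still in use before deciding whether to remove it.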

For the remaining ACBs, the VTAMAPPL class in RACF (or the equivalent in ACF2 or Top Secret) can be used to prevent users without the appropriate permission from opening them. In our reviews, we have found the VTAMAPPL class active but with few or no ACB resources defined. If an ACB has no matching profile in VTAMAPPL, any user can open it.
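The coverage gap described here can be checked by comparing ACB names against VTAMAPPL profile names. The following Python sketch is illustrative only: the ACB and profile names are hypothetical, the generic-name matching is a simplification of RACF's rules, and the printed RDEFINE commands are candidates for a security administrator to review rather than a ready-made change. It treats an ACB as protected if any profile matches it, reflecting the point that an ACB with no matching profile can be opened by any user.

```python
import re

# Hypothetical inputs. In practice the ACB names would come from the VTAMLST
# APPL definitions and the profile names from a RACF listing of VTAMAPPL.
ACB_NAMES = ["NETMON01", "OLDMON", "TRACE01", "CICSPRD", "CICSTST"]
VTAMAPPL_PROFILES = ["NETMON*", "CICS%%%"]


def generic_to_regex(profile: str) -> "re.Pattern":
    """Translate a RACF-style generic profile name into a compiled regex.

    Simplification: '%' matches exactly one character and '*' matches any
    run of characters, including none. RACF's full generic rules have more
    cases than this.
    """
    parts = []
    for ch in profile:
        if ch == "%":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def unprotected_acbs(acb_names, profiles):
    """Return, sorted, the ACB names that no VTAMAPPL profile covers."""
    patterns = [generic_to_regex(p) for p in profiles]
    return sorted(
        name for name in acb_names
        if not any(p.fullmatch(name) for p in patterns)
    )


if __name__ == "__main__":
    for name in unprotected_acbs(ACB_NAMES, VTAMAPPL_PROFILES):
        # Candidate command only: review it, and grant access (PERMIT) to any
        # product that legitimately opens this ACB.
        print(f"RDEFINE VTAMAPPL {name} UACC(NONE)")
```

With the sample data, OLDMON and TRACE01 are reported, because neither NETMON* nor CICS%%% matches them.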

Summary

Network security is as important as system security for data and applications. The security focus is typically on TCP/IP services, encryption and intrusion detection, but don't forget VTAM. It remains an essential system component, and an easily overlooked legacy definition can potentially be exploited with severe consequences.

A specialist in designing and developing leading-edge system software products, Tony Amies has more than 40 years of experience in IBM z mainframes and related fields. A recognized expert in networking and communications, he is Software Technical Director at Vertali.
