Soft capping, using IBM’s Defined Capacity (DC) or Group Capacity (GC) options, is an effective method to help control software costs. An even more effective method is to automate the limits of these caps based on application consumption and budgetary requirements. This article will discuss how such automation should be designed and implemented.
Sub-Capacity Pricing background
The Rolling Four-Hour Average (R4HA) is a calculated value based on system utilization in Millions of Service Units (MSU). Each IBM machine has an MSU rating; much like MIPS, it is a measure of capacity. This capacity is consumed by the LPARs and applications on the machine on demand. Because application demand is often unpredictable, CPU/MSU consumption may be very high or very low at any given moment. To smooth out these brief spikes, the system records utilization every five minutes and recalculates the average at the same interval. Once four hours' worth of five-minute values (48 samples) have accumulated, the system has a true “Four-Hour Average”, and because the calculation is refreshed with every new five-minute value, it continues to “roll”: a continuously updated Rolling Four-Hour Average (R4HA).
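The rolling calculation can be sketched in a few lines of Python. The MSU samples below are invented for illustration; they simply show how a short spike barely moves the four-hour average:

```python
from collections import deque

def rolling_four_hour_average(samples_msu):
    """Yield the rolling average after each 5-minute MSU sample.

    48 five-minute samples make up the 4-hour window; until the
    window fills, the average is over the samples seen so far.
    """
    window = deque(maxlen=48)  # 48 x 5 minutes = 4 hours
    for msu in samples_msu:
        window.append(msu)
        yield sum(window) / len(window)

# Hypothetical utilization: a 30-minute spike to 400 MSU inside
# an otherwise steady 100 MSU workload.
samples = [100] * 48 + [400] * 6 + [100] * 48
r4ha = list(rolling_four_hour_average(samples))
peak = max(r4ha)  # the brief spike lifts the 4-hour average only modestly
```

The six spike samples raise the peak rolling average to 137.5 MSU, far below the instantaneous 400 MSU, which is exactly why the R4HA tolerates brief bursts.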
Each month, via the SCRT report (technically this is from midnight (the very beginning) of the second day of the month up to midnight (the very beginning) of the second day of the next month) the peak R4HA hour is reported and this is the basis for the software bill.
LPAR Scope
Note that the R4HA (SMF70LAC) has an LPAR scope and is stored in the SMF 70 record. Quite often it is necessary to calculate an R4HA for multiple LPARs; in that case, their values are added together. Note that the peak value is not the “sum of the peaks” but rather the “peak of the sums”. That is to say, it is the hour with the highest combined value across the LPARs, which is not necessarily the hour in which any individual LPAR or sub-group peaked.
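The distinction can be shown with hypothetical hourly R4HA values for two LPARs:

```python
# Hypothetical hourly R4HA values (MSU) for two LPARs over four hours.
lpar_a = [120, 150, 200, 130]
lpar_b = [180, 160, 110, 190]

# "Sum of the peaks": each LPAR's individual peak, added together.
sum_of_peaks = max(lpar_a) + max(lpar_b)                    # 200 + 190 = 390

# "Peak of the sums": combine the LPARs hour by hour, then take the peak.
peak_of_sums = max(a + b for a, b in zip(lpar_a, lpar_b))   # max of 300, 310, 310, 320
```

Here the billing-relevant “peak of the sums” is 320 MSU, while the naive “sum of the peaks” would be 390: the individual LPAR peaks fall in different hours.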
Subsystem billing
Most IBM subsystems, such as z/OS, DB2, CICS, and IMS, are billed at individual per-subsystem rates. While products are licensed on a CPC/CEC basis, the installation has the option to start or not start a given product on a given LPAR, and products are only billed on an LPAR if they are started there. This is an easy way to avoid unnecessary MLC charges. Note that once a product is started on a given LPAR, even for a moment, it will be billed for that billing cycle. SCRT tracks this via the SMF 89 record.
One further note on subsystem billing: the invoice is based on total MSU consumption on the LPAR(s) where a given subsystem is started, not on what the subsystem itself consumes. Every MSU consumed on those LPARs contributes to the bill whether or not the subsystem is actually used. In other words, if you have IMS started on a given LPAR and the workload is entirely batch jobs that never use IMS, all of the MSU consumption generated by those batch jobs still contributes to the IMS bill.
LPARs and Subsystem scope
Billing is actually calculated by combining the previous two points. The R4HA of every LPAR on a CPC that hosts a given subsystem is combined, and the peak of that combined value determines the invoice for that subsystem. If CICS, for example, is started on two LPARs on the CPC, then the combined R4HA of those two LPARs determines the CICS invoice. If z/OS is running on all LPARs on the CPC (as is common), then the R4HA of the entire machine determines the z/OS bill. These peak hours may fall on completely different days in the month. SCRT will report all of this.
Standard WLC pricing (CEC/CPC scope)
Sub-capacity pricing rates are organized into tiers: the first few MSUs are very expensive, and the incremental rate decreases as capacity increases. Under standard WLC, this is done within CEC/CPC boundaries. Extend the earlier CICS example and assume that CICS is running on two LPARs on each of two CECs/CPCs. The combined R4HA peak of the two LPARs on each machine is calculated separately; each machine starts at zero MSU and counts up through the WLC tiers. The two values are then added together to determine the CICS invoice.
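The tier walk can be sketched in Python. The tier table below is invented purely for illustration; real IBM rates are contract-specific:

```python
def tiered_charge(msus, tiers):
    """Walk MSUs up through price tiers.

    Each tier is (tier_capacity_in_msus, price_per_msu); the last
    tier uses float('inf') capacity to absorb the remainder.
    """
    total, remaining = 0.0, msus
    for capacity, rate in tiers:
        in_tier = min(remaining, capacity)
        total += in_tier * rate
        remaining -= in_tier
        if remaining <= 0:
            break
    return total

# Hypothetical tier table (NOT real IBM rates): first 45 MSUs at
# $100/MSU, the next 130 at $70, everything above at $40.
TIERS = [(45, 100.0), (130, 70.0), (float("inf"), 40.0)]

# Standard WLC: each CEC/CPC starts at zero and is priced separately.
# Suppose the CICS LPAR pair on each machine peaks at 200 MSU combined.
per_cec = tiered_charge(200, TIERS)
invoice = per_cec * 2
```

With these invented numbers, each machine's 200 MSU costs 4,500 + 9,100 + 1,000 = 14,600, and the invoice is the sum across both machines.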
Country Multiplex Pricing (CMP)
CMP is a relatively new offering from IBM that allows installations with more than one CEC/CPC to take advantage of workload distribution without cost penalties. As discussed above, the traditional WLC model uses a CEC/CPC scope. CMP allows the MSUs billed to a subsystem to cross machine boundaries, provided all of the machines are within the same country. Consider once again the CICS example above: under CMP, the MSUs from the four CICS LPARs (two on each machine) would be added together, and the total determines the tier (incremental rate), which is very likely lower.
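To see why the combined total lands at a lower incremental rate, here is a small sketch comparing separate (WLC-style) and combined (CMP-style) pricing of the same consumption. The tier table is invented for illustration, not real IBM rates:

```python
def tiered_charge(msus, tiers):
    """Price MSUs through (capacity, rate_per_msu) tiers; the last
    tier uses float('inf') capacity to absorb the remainder."""
    total, remaining = 0.0, msus
    for capacity, rate in tiers:
        in_tier = min(remaining, capacity)
        total += in_tier * rate
        remaining -= in_tier
        if remaining <= 0:
            break
    return total

# Hypothetical rates: 45 MSUs at $100, next 130 at $70, the rest at $40.
TIERS = [(45, 100.0), (130, 70.0), (float("inf"), 40.0)]

# Two machines, each peaking at 200 MSU for the subsystem.
separate = tiered_charge(200, TIERS) * 2   # WLC: each CEC priced from zero
combined = tiered_charge(400, TIERS)       # CMP: one country-wide total
```

In this sketch the combined total pays the expensive early tiers only once (22,600 versus 29,200), which is the essence of the CMP advantage for growing workloads.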
CMP baseline (‘the catch’)
When an installation switches to CMP, IBM requires that you provide the last three months of SCRT reports. IBM averages this consumption to determine a ‘baseline’ for CMP billing. In short, if your consumption does not change under CMP, your bill will not change. The primary benefit of CMP is that future growth will be charged at a lower incremental rate. If your consumption is decreasing, you may even pay more by moving to CMP. Assuming this is not the case (your business is growing), your best course of action is to lower your MSU consumption as much as possible in the three months prior to converting to CMP.
Controlling MLC with capping
The first point to understand is that the MLC invoice is based on the peak of the R4HA OR the peak of the DC/GC – whichever is LOWER. In any five-minute interval, the DC/GC limit in effect may differ from the R4HA. At the end of each hour, the hourly average of the DC/GC is compared to the hourly average of the R4HA, and the LOWER VALUE is what SCRT will use for billing.
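The hourly comparison can be illustrated with hypothetical numbers:

```python
# Hypothetical hourly averages (MSU) for one LPAR across four hours.
hourly_r4ha = [310, 305, 298, 320]    # rolling four-hour average
hourly_dc_avg = [280, 280, 300, 300]  # Defined Capacity, averaged per hour

# For each hour, SCRT takes the LOWER of the two averages;
# the billable peak is then the highest of those hourly values.
billable_per_hour = [min(r, d) for r, d in zip(hourly_r4ha, hourly_dc_avg)]
billable_peak = max(billable_per_hour)
```

Here the raw R4HA peak is 320 MSU, but because the DC average never exceeded 300, the billable peak is 300 MSU: the cap, not the workload, set the bill.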
It may then seem a simple strategy to maintain lower DC/GC limits. The problem is that capping can impact application performance. If the R4HA exceeds the DC/GC limit, the LPAR(s) become ‘eligible’ for capping: WLM informs PR/SM to limit the provisioned capacity to the DC/GC limit. Should application demand exceed this limit, the work will be delayed. Note that your WLM priority cannot guarantee protection from this. Your WLM Service Class (dispatching priority) can get you dispatched onto a logical processor within the LPAR, but the logical processor must still be dispatched by PR/SM onto a physical processor in order to run, and the LPAR (a logical construct) is simply not dispatched on the hardware as frequently as the workload demands. Under such conditions, all applications in the LPAR may be impacted. In short, DC/GC capping has LPAR scope; to limit individual service classes, you should use Resource Groups (another subject…)
Capping without application impact
Recall that you can lower your MLC by maintaining a DC average that is lower than the R4HA. Under such conditions, the LPAR is ‘eligible’ for capping, but whether applications are actually delayed depends on their current (not average) demand. If current demand is lower than the R4HA, you can lower the DC to the demand level with no application impact. Should demand increase, you need to raise the DC quickly. By now you have likely surmised that this function should be automated. An intelligent, responsive automation solution that tracks instantaneous demand as well as R4HA values, combined with installation-defined min/max DC values (based on budget), can lower MLC without impacting applications.
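One decision step of such a controller might be sketched as follows. The function name, thresholds, and values are hypothetical and greatly simplified compared to any real product:

```python
def next_dc(current_demand, r4ha, dc_min, dc_max):
    """One decision step of a hypothetical soft-capping controller.

    When instantaneous demand is below the R4HA, the DC can be
    lowered toward demand (never below the installation's budget
    floor) to harvest savings; when demand rises, the DC is raised
    immediately toward the ceiling so applications are not delayed.
    All values are in MSUs.
    """
    if current_demand < r4ha:
        target = max(current_demand, dc_min)  # follow demand down
    else:
        target = dc_max                       # protect the workload
    return min(max(target, dc_min), dc_max)   # clamp to budget limits

# Demand (120 MSU) is well below the R4HA (200 MSU): the cap can
# follow demand down without delaying anything.
dc = next_dc(current_demand=120, r4ha=200, dc_min=100, dc_max=300)
```

A real solution evaluates something like this every few minutes per LPAR, exchanging MSUs between LPARs according to priorities; this sketch only shows the single-LPAR intuition.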
Soft Capping automation
As you might guess, such solutions exist today. There are only a handful of them, and the most effective ones were conceived shortly after IBM released the basic DC option over ten years ago. With these solutions, installations can set a global MSU limit for all their LPARs. On the surface, this may seem like Group Capacity (GC): it’s not. For one thing, GC is limited to a single CEC/CPC scope, whereas the managed LPARs may span multiple machines, and Country Multiplex Pricing (CMP) is fully supported. Further, you can set priorities, minimums, maximums, and desired responsiveness for individual LPARs, well beyond the scope of a simple GC.
While the concept may seem simple, the algorithms are complex. The best solutions do not subscribe to the popular (and risky) strategy of attempting to cap workloads. While billing is on an hourly basis, application demand may change quickly. The solution monitors both instantaneous and average (R4HA) demand on all LPARs and factors in numerous variables such as WLM and user-set priorities. As frequently as every minute, the solution will ‘exchange’ MSUs between LPARs, raising and lowering DCs below the R4HA when demand allows, without impacting workloads. Should demand increase, the DC(s) will immediately be raised as required.
Once properly configured, automation allows the WLM to do its job, and allows your DBAs to do their jobs. Costs are controlled to the greatest extent possible without risking application performance. For datacenters looking for cost savings but concerned about the risks associated with capping, this is a natural choice that I highly recommend.
John Baker has more than 20 years’ experience as an IT performance engineer specializing in IBM z mainframe systems performance. Formerly a banking IT specialist and IBM z performance analyst for a variety of IT software and consulting companies, John became well known in the industry as an experienced, trusted mainframe professional and a regular presenter at several trade show events over the past 10 years. John is now a senior analysis consultant providing objective advice and clear, detailed insights for EPS and DataKinetics mainframe customers across North America.