Orphaned Data

Mainframes are renowned for their ability to store large amounts of data and to access it quickly. IMS, for example, has for many years been able to store huge amounts of data and retrieve individual records very quickly. It’s what made IMS so popular with financial institutions and other organizations.

And most of the time, we happily assume that someone knows what all that data is and is able to access it. Unfortunately, that’s not always the case. All over the mainframe’s storage devices are datasets that don’t seem to belong to anyone. And inside those datasets lives the orphaned data. It’s probably being backed up, needlessly taking up space on those backups, while it just sits there.

Orphaned data was probably created by an employee who has now left. Or it was created by a current member of staff for a project that never got taken any further – you may well find that there are a few of these on your nascent cloud storage. Or it was created by someone who simply made a mistake. Or it is old data that is no longer used in that format by any current applications.

An example might be someone who created a new VSAM KSDS with data and index components. However, for whatever reason, they deleted the cluster, then re-created it with new data and index components. That leaves the original data and index components orphaned, with no connection to the new cluster.

Or someone may have deleted log data sets, but an allocation error occurred while freeing space in the directory extent, so the log data sets continue to occupy space on the DASD.

These are just a couple of ways that orphaned data can be created on a mainframe. And both of those can be solved with the IDCAMS DELETE command (a sketch follows below). The problem that often occurs is that people are busy, and so they decide to delete the orphaned data later. Their top priority is to get a piece of work completed. And once that task is complete, there’s probably another high-priority job that has to be done. And so the orphaned data is forgotten about.
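To make that concrete, here is a minimal IDCAMS sketch. The data set names (PROD.ORPHAN.KSDS.DATA, PROD.ORPHAN.KSDS.INDEX, PROD.OLD.LOGDS) and the volume serial VOL001 are hypothetical, and the DELETE options you actually need depend on which piece was left behind – the catalog entry, the VVDS record, or the VTOC entry – so treat this as a starting point rather than a recipe.

//DELORPH  JOB (ACCT),'DELETE ORPHANS',CLASS=A,MSGCLASS=X
//* Sketch only: data set names and the volume serial are made up.
//STEP1    EXEC PGM=IDCAMS
//DDVOL    DD UNIT=3390,VOL=SER=VOL001,DISP=OLD
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  /* Check whether the old components are still cataloged anywhere */
  LISTCAT ENTRIES(PROD.ORPHAN.KSDS.DATA -
                  PROD.ORPHAN.KSDS.INDEX) ALL
  /* Remove the leftover VVDS records for the uncataloged VSAM     */
  /* components that the deleted cluster left behind               */
  DELETE PROD.ORPHAN.KSDS.DATA VVR FILE(DDVOL)
  DELETE PROD.ORPHAN.KSDS.INDEX VVR FILE(DDVOL)
  /* Scratch an old log data set (assumed still cataloged) that is */
  /* still taking up DASD space; PURGE overrides an unexpired      */
  /* retention period                                              */
  DELETE PROD.OLD.LOGDS PURGE
/*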

But it’s not really a problem, is it? Mainframes have loads of data storage capacity these days. And the cloud has an almost infinite amount of space. My couple of files aren’t going to make much difference at all, are they? The answer is that orphaned data does pose a risk.

Firstly, orphaned data is taking up space. And as more people create files that are disconnected from any applications running on the mainframe or in the cloud, the bigger the problem gets. As I said before, those files may well be taking up backup space and causing backups to take longer to complete, which can disrupt other activities. Storing and maintaining orphaned files has a real cost to the organization.

There may also be a compliance risk. There are various regulations and standards, such as FISMA, SOX, GDPR, and HIPAA, with which organizations must comply. An accumulation of orphaned data may lead to non-compliance, which could lead to financial and legal penalties.

Non-compliance could then lead to reputational risk. No CEO wants to see their company’s name headlined in the trade press and all over Google for the wrong reasons, because customers and potential customers are likely to start looking for a new business partner. This loss of business, following on from potential legal actions, could see organizations going out of business or being taken over by their more compliant competitors.

And, as with everything else, there is a security risk. Bad actors could access those orphaned files and find all sorts of sensitive, confidential, or personal data, which they could use for their own criminal purposes. Were that to happen, it would also do enormous damage to the organization’s reputation.

So, what’s the answer? Organizations need to ensure that there are policies and procedures in place to address the problem of orphaned data, both on the mainframe and in their cloud storage. Data needs to be regularly audited to identify potential orphaned data – a simple audit job of the kind sketched below is one place to start. And responsibility for dealing with any orphaned data that is found needs to be assigned to a specific team.
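As a rough sketch of what such an audit might look like on the mainframe side – the volume serial VOL001 and the high-level qualifier PROD are hypothetical – one approach is to list what is physically recorded in a volume’s VTOC and what the catalog holds under the same qualifier, then compare the two. Anything that appears in one list but not the other, or that belongs to no known application, is a candidate for investigation.

//AUDIT    JOB (ACCT),'ORPHAN AUDIT',CLASS=A,MSGCLASS=X
//* Sketch only: volume serial and high-level qualifier are made up.
//* Step 1: list every data set recorded in the volume's VTOC
//LISTVTOC EXEC PGM=IEHLIST
//SYSPRINT DD SYSOUT=*
//DD1      DD UNIT=3390,VOL=SER=VOL001,DISP=OLD
//SYSIN    DD *
  LISTVTOC FORMAT,VOL=3390=VOL001
/*
//* Step 2: list what the catalog holds under the same qualifier;
//* compare the two reports to spot uncataloged or unowned data sets
//LISTCAT  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  LISTCAT LEVEL(PROD) ALL
/*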

It’s also worth training staff so they understand, firstly, the importance of not creating orphaned data and, secondly, which team needs to be informed should they become aware of orphaned data. Often, applications are tested in the cloud using a copy of live mainframe data. However, the application is not taken forward and the data is left in the cloud because everyone moves on to their next project. Staff need to be made aware that this is an issue and that the appropriate people need to be informed so that the application and its data can be deleted.

Too often, orphaned data is simply ignored at organizations. Successful organizations won’t let that happen and will take steps to ensure orphaned data is removed from their mainframe storage and the cloud.

Regular Planet Mainframe Blog Contributor
Trevor Eddolls is CEO at iTech-Ed Ltd, and an IBM Champion since 2009. He is probably best known for chairing the Virtual IMS, Virtual CICS, and Virtual Db2 user groups, and is featured in many blogs. He has been editorial director for the Arcati Mainframe Yearbook for many years.