Podcast: The Evolution of Observability

Mar 14, 2024

Mainframe, Media, Observability, podcast

Amanda Hendley

Amanda Hendley is the Managing Editor of Planet Mainframe and host of the Virtual Mainframe User Groups. With a career rooted in the technology community, she has held leadership roles at the Technology Association of Georgia, Computer Measurement Group (CMG), and Planet Mainframe. A proud Georgia Tech graduate, Amanda spends her free time renovating homes and volunteering with SEGSPrescue.org in Atlanta, Georgia.

For our latest episode of the Planet Mainframe podcast, Amanda Hendley sat down with Paul DiMarzio at SHARE Orlando. Amanda and Paul talked about the evolution of observability, what observability means in the mainframe space, and how Broadcom’s new WatchTower platform is improving mainframe observability and enabling all areas of the organization.

WatchTower is highlighted as a unique observability platform that focuses on streamlining workflows for users of varying skill levels. It incorporates machine learning and AI to sift through alerts, provide contextual information, and facilitate seamless collaboration among teams. The platform not only offers historical tracking for alerts but also integrates with third-party observability tools, enabling a holistic view across hybrid environments.

Transcription

[00:00:04]
Welcome to the Planet Mainframe podcast, your gateway to the forefront of technology in the digital age. Join us as we dive deep into the heart of tech innovation, where industry experts and thought leaders gather to explore the ever-evolving world of mainframes and beyond. In each episode, we’ll unravel the complexities of the digital realm, dissecting the technology that shapes our lives. From the giants of mainframe computing to the latest breakthroughs in AI, cybersecurity, and more, we’re here to guide you through it all. Our mission is clear: to bring you the brightest minds, the boldest ideas, and the most captivating stories from the dynamic world of tech. Whether you’re a tech veteran or simply tech curious, get ready to embark on this enriching journey with us. So fasten your seatbelts for a world of knowledge, innovation, and inspiration. Welcome to the Planet Mainframe podcast.

[00:00:53] – Amanda Hendley (Planet Mainframe)
Welcome to the Planet Mainframe podcast. This is your host, Amanda Hendley. And joining me today from Orlando, Florida, at SHARE is Paul DiMarzio. Paul is customer engagement strategist at Broadcom Mainframe Software. Paul, thank you for joining me today.

[00:01:09] – Paul DiMarzio (Broadcom)
Well, thanks, Amanda. Glad to be here.

[00:01:11] – Amanda Hendley (Planet Mainframe)
So before we jump in, I was hoping you could just further introduce yourself and tell us about your expertise in mainframe.

[00:01:18] – Paul DiMarzio (Broadcom)
Well, I hate to do this because I’m dating myself now, but I started in mainframes in 1984. So this is my 40th year of working with mainframes. And I got into this space because I learned at UConn, at the University of Connecticut. They taught on mainframes, so all my education was there. It made a lot of sense for me to start there. So other than, what was it, about four years I took out of mainframes, and I went into the blockchain space.

[00:01:46] – Amanda Hendley (Planet Mainframe)
Interesting.

[00:01:47] – Paul DiMarzio (Broadcom)
And decided, yeah, no, that’s not for me. So then I came into Broadcom about three years ago, and I’m very happy that I came back to the space. It’s really the place to be.

[00:01:58] – Amanda Hendley (Planet Mainframe)
Awesome. Well, we’re glad you’re back.

[00:02:00] – Paul DiMarzio (Broadcom)
Thanks.

[00:02:01] – Amanda Hendley (Planet Mainframe)
And one of the reasons we’re here today and having a conversation is this recent announcement about the WatchTower platform. And monitoring, it’s not anything new. It’s been around for quite some time. And I would say, in my opinion, monitoring became observability. And observability has been around and gaining traction, especially when you talk about DevOps and AIOps and Agile. But tell me, what is mainframe observability and why is it crucial in the IT landscape today?

[00:02:34] – Paul DiMarzio (Broadcom)
Yeah, so you’re right. Observability isn’t a new term. I mean, if you loosely define it, it’s the ability to tell what a system is doing based on its outputs. Right. And I used to be a programmer, I’m not programming anymore, but I remember back in the day I was doing MVS operating system development, and we were very concerned about RAS: reliability, availability, recoverability, serviceability. And we were taught to continuously throw write-to-operator messages out there. So we were flooding the screens with information so that people could go back. So that’s basically observability. You’re producing the telemetry that people will look at to understand what’s happening. So that’s been around forever. And if you look at some of our products, like SYSVIEW, OPS/MVS, NetMaster, and Vantage, these are not new products. They’ve been around for a long time and they’ve been allowing people to see what’s happening within the mainframe by looking at the telemetry and trying to make some sense out of it and giving it to other people so that they can decide what to do.

[00:03:36] – Amanda Hendley (Planet Mainframe)
And in the context of the mainframe, what is observability really solving for us?

[00:03:43] – Paul DiMarzio (Broadcom)
Well, what we’re trying to do is, like most platforms, it’s really no different. You’re trying to shorten the time it takes to identify issues and then resolve them. So you’re using these observability tools and the telemetry coming out of the system to see if anything’s wrong. And it doesn’t necessarily mean an outage. It could just be a performance issue, it could be that resources are constrained. You want to understand that that’s happening and then know very quickly how to remediate that problem and make everything work again. So it’s really simple. It would be the same in a distributed system or even in the cloud. You want to make sure that your system is running properly, and your observability is helping you understand what’s happening and how to go and fix it.

[00:04:24] – Amanda Hendley (Planet Mainframe)
So I know I could plug in a lot of different observability tools and things that are going to give me reports and data, but that becomes an issue because there is a lot of data I could get. How do I manage this? How do I know what data to read and trust, and how do I just deal with all this data?

[00:04:45] – Paul DiMarzio (Broadcom)
Yeah, you’ve actually hit on the main problem that I think we’re trying to resolve with WatchTower: it’s become untenable for most people to really see what’s going on. I remember back in the day there was one colleague, she impressed me totally because she could look at a dump, a system dump, which is just hexadecimal characters, and see the problem. Nobody can do that anymore, I don’t think. And so if you were to look at some of these operator screens and all the information that scrolls through there, when I was talking about putting out WTOs, we would put out a message: “Oh, this process you started, it’s finished.” That’s not an interesting message.

[00:05:27] – Amanda Hendley (Planet Mainframe)
Right.

[00:05:28] – Paul DiMarzio (Broadcom)
And somewhere in there there might be a message that, oh, “you’ve got a lock contention on a database.” How do you see that? So we need to simplify. So we need to use more tools like AI and machine learning to try to weed out the noise. Somebody once described this as trying to spot a snowflake in a storm. How do you find that one little piece of information that you need? It’s hard. And so the tools have to adapt and make it easier for people to see what’s really important and what’s going on and what’s actually causing that issue.
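
[Editor’s sketch] To make the “snowflake in a storm” idea concrete, here is a minimal illustration in Python of separating routine console messages from ones worth a person’s attention. It is a simple keyword rule standing in for the machine learning Paul mentions, not how WatchTower works. The `ConsoleMessage` shape, the message IDs, and both phrase lists are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class ConsoleMessage:
    """Hypothetical, simplified message; real console messages carry more fields."""
    msg_id: str
    text: str


# Assumed phrases that mark routine, low-value messages.
ROUTINE_PHRASES = ("started", "ended", "finished")
# Assumed phrases that should always be surfaced.
URGENT_PHRASES = ("lock contention", "abend", "constrained")


def is_interesting(message: ConsoleMessage) -> bool:
    """Keep urgent messages, drop routine ones, and keep anything unrecognized."""
    text = message.text.lower()
    if any(phrase in text for phrase in URGENT_PHRASES):
        return True
    return not any(phrase in text for phrase in ROUTINE_PHRASES)


def surface(messages):
    """Return only the messages a person should look at, in original order."""
    return [m for m in messages if is_interesting(m)]
```

Unrecognized messages are kept on purpose: with a rule this crude, hiding something unknown is riskier than showing one extra line.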

[00:06:03] – Amanda Hendley (Planet Mainframe)
Awesome. So let’s use that for a segue into what is WatchTower. It was just announced this week, we’ve got this WatchTower platform. Tell me a little bit about it and its key features.

[00:06:15] – Paul DiMarzio (Broadcom)
Okay, so WatchTower is our mainframe observability platform. And we just had that conversation about that. But what I think makes it different, I’d even say unique, is that we’re focused on workflows, and we’re focused on workflows of people of different skill levels. So there are the people, the frontline folks, the level one, who aren’t always the most skilled. They haven’t necessarily been with the mainframe for a very long time. A lot of these folks come from distributed platforms or they learned on distributed platforms. I learned on the mainframe. That’s not usual in college anymore. You’re usually learning on something else. So you want them to be able to handle this. But then you’ve got your experts that have to work with this information, and you’ve got non-mainframe teams as well that are looking at business applications and need to understand, is the mainframe causing an issue or not? So WatchTower is tailorable to all of these different skill levels, and it correlates information and it puts things in contextual reference. So, for example, let’s say that a level one operator is looking at alerts. The first thing we would do is we would use machine learning, AI and automation to sift out all of those stupid alerts, like “this process just ended.”

[00:07:33] – Paul DiMarzio (Broadcom)
Not important. Try to put up what’s really interesting and important and help that person identify where problems may exist. And let’s say they click on that alert. Now we give them all the information they need to understand what that alert is actually about. If you were to think about how things usually work today, that person is going to log into multiple different tools. Some of them may be green screen tools, some of them may have a nice UI, but they still all work individually. And you have to go and look at that. And the context doesn’t carry from tool to tool. With WatchTower, we’re giving an experience now where if I click on an alert, I can look at the documentation for that alert, I can see what topology of assets are related to that alert. I can look at machine learning insights related to those assets. I can look at a lot of different things, and it’s all correlated and in context. And let’s say this person finds that, okay, this is a Db2 problem, for example. I need to pass this along to the Db2 SME. We’re integrated with ticketing tools, like ServiceNow. So I write up the ticket, it goes to the expert. The expert uses the same tooling, but can dig deeper.

[00:08:49] – Amanda Hendley (Planet Mainframe)
Okay.

[00:08:49] – Paul DiMarzio (Broadcom)
All right, so we’re passing context from team to team so that nothing is lost and everything is correlated within the context of the problem they’re trying to resolve.

[00:09:00] – Amanda Hendley (Planet Mainframe)
I’m curious to know how much historical tracking for alerts is in there. So if I get an alert, will it also tell me or show me the last time I got this alert or anything of that nature?

[00:09:14] – Paul DiMarzio (Broadcom)
Yeah, a lot of the history does flow with this, but where the historical data really shines for us is in the creation of the models, the machine learning models that help identify potential problems based on past experience. Right. So we have a team of really mainframe-knowledgeable folks working on building these models out, so we know how to look at historical data, look at patterns. So let’s say I’m looking at a CICS transaction, and I know that certain times of day it goes up, certain times down. We get a good understanding of that, and we can use that past historical information and build that into the model so that as things are progressing, we know when failures may be occurring. So this is this whole concept of trying to get from being reactive to proactive. The system is now giving folks a hint that, hey, we think a problem may be occurring here because we’ve seen this in the past. And that’s where past information is most helpful: in building models to predict future behavior.
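
[Editor’s sketch] The time-of-day pattern Paul describes for a CICS transaction can be illustrated with a very small baseline model in Python. This is a plain mean-and-standard-deviation check per hour, assumed here for illustration only. The interview does not say what models WatchTower uses, and the function names, the sample shape, and the threshold of 3.0 are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baseline(history):
    """Build a per-hour baseline from (hour_of_day, transactions_per_minute) samples.

    Returns {hour: (average, standard_deviation)} for hours with at least
    two samples, since a standard deviation needs two or more points.
    """
    by_hour = defaultdict(list)
    for hour, rate in history:
        by_hour[hour].append(rate)
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}


def is_unusual(baseline, hour, rate, threshold=3.0):
    """Flag a rate that sits more than `threshold` deviations from that hour's norm."""
    if hour not in baseline:
        return False  # no history for this hour, so nothing to compare against
    average, spread = baseline[hour]
    if spread == 0:
        return rate != average
    return abs(rate - average) / spread > threshold
```

The point of keying the baseline by hour is the one made in the conversation: a rate that is normal at 9 a.m. may be a warning sign at 2 a.m., so each reading is judged against its own time of day.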

[00:10:15] – Amanda Hendley (Planet Mainframe)
Awesome. And tell me, is this tool a standalone, or does it interact and work with other observability tools?

[00:10:23] – Paul DiMarzio (Broadcom)
So now we’re moving into the next phase of things, right? We’ve been talking about the mainframe teams up to now. There are also enterprise teams. Generally, these are the SREs, or system reliability engineers, that are focused on business applications. Today, they use tools like Splunk or Datadog or Grafana or a whole variety of them. These tools don’t have visibility into the mainframe. So an SRE may see a problem with a web app and they start to go through the traces and they may find that, oh, there’s a call out to a mainframe. They may not even know that. It’s hard to tell sometimes, and then their vision stops. So we get a lot of false calls into the mainframe team, right? “We think this is a mainframe problem.” And then the mainframe team goes and looks at it: “No, it’s not us, it’s got to be somebody else.” Or even if it is a mainframe problem, who on that team should look at it? So what we’re doing here is we export telemetry through OpenTelemetry. It’s an open standard that these products are using, and that allows the mainframe information to be included in these traces.

[00:11:30] – Paul DiMarzio (Broadcom)
So now if I was looking at a trace and before it was stopping at the mainframe call, now I’m actually going all the way through, so that SRE can understand very quickly, and this is all done in real time. They can understand whether the problem is in the network, in the connection, in the transaction manager, or even in the database, and know precisely who to talk to on the mainframe team to get the problem resolved. So we are not just consuming telemetry and giving that to the mainframe team, but we’re also exporting that telemetry so that the enterprise teams have access to it as well.
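
[Editor’s sketch] To show why an end-to-end trace helps an SRE decide who to call, here is a small Python illustration. It does not use the OpenTelemetry SDK or any WatchTower interface. It models a trace as a flat list of sequential, non-overlapping segments, which is a simplification (real spans nest), and the `Span` class and segment names are assumptions taken from the stages Paul lists.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical, simplified trace segment with times in milliseconds."""
    name: str
    start_ms: float
    end_ms: float
    error: bool = False

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def suspect_span(spans):
    """Return the first segment that reported an error, else the slowest one."""
    failed = [s for s in spans if s.error]
    if failed:
        return failed[0]
    return max(spans, key=lambda s: s.duration_ms)
```

If the trace stopped at the mainframe call, everything after it would be a single opaque segment. With the mainframe segments included, the same lookup can point at the connection, the transaction manager, or the database specifically.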

[00:12:02] – Amanda Hendley (Planet Mainframe)
I think that’s so important because we did a study recently of mainframe users, and what we saw was that mainframe organizations, they’re continuing to grow in their mainframe usage. And you would think that in some cases, mainframe and cloud are going to have this reverse correlation, right? As mainframe usage declines, cloud will go up, and as mainframe usage goes up, cloud would decline. But what we actually saw is mainframe usage in an organization is going up and cloud usage is going up because they’re all becoming these hybrid environments and that plays right into what their needs are.

[00:12:43] – Paul DiMarzio (Broadcom)
Absolutely. I mean, we live in a hybrid world. There’s no doubt about it. Nobody runs everything on a mainframe and nobody runs everything in the cloud. If they have needs for strong transactional backgrounds, the mainframe is perfect for transactional workloads. That’s what it was built for. It handles them better than anything. But it’s not a web UI. You can’t do large-scale distributed stuff on it. You can, but it’s not really the right fit. You really want to look for the right fit for the right piece of work in an enterprise application. And every application is going to span some combination of distributed, cloud, and mainframe pieces. And, yeah, that visibility, then, is important. So if you want to track what’s happening to that application, the people who are tracking that have to know what’s happening in each system. And until we really did this work with WatchTower, I honestly don’t think that any other tool provided mainframe telemetry into these third-party tools through OpenTelemetry. I think we’re the first to do that. Can’t guarantee that, but I haven’t found anybody else doing it. So I think we are the first, but we are also then the only ones that actually integrate that with the mainframe team.

[00:13:51] – Paul DiMarzio (Broadcom)
So as we were talking about the SRE, let’s say they find that a problem is in Db2 and they contact the Db2 expert. That expert is using the WatchTower tools as well. So they’re now digging down into the mainframe part of it. So we’re trying to build this seamless correlation so that whoever is worrying about problems or concerns or tracking information, they can always get the problem to the next person and all that information flows with it. So no more worrying about a dozen different tools, a dozen different logons, no writing information down, scribbling it on a pad or on a whiteboard. Everything is consolidated and correlated, and that’s really the focus of WatchTower.

[00:14:30] – Amanda Hendley (Planet Mainframe)
Awesome. So tell me a little bit about what your personal take is on the future of observability.

[00:14:37] – Paul DiMarzio (Broadcom)
Well, we just have to keep strengthening these relationships, right? I mean, where we’re trying to go is not just, let’s say, looking at the mainframe resources, but starting to be able to correlate those with the applications they serve. So not just from the top down, where we’re looking at apps and then driving down, but what if we’re looking at it from the mainframe side? Which CICS transactions, IMS transactions, databases are related to this transfer application? How can we make those relationships visible? So we’re focusing a lot now, not just on mainframe resources, but on how these relate to particular applications, to make it easier for those teams to know that, hey, if there’s an issue here, maybe it’s in an application that’s not highly used, so we can work on that later. But if it’s in an application that’s critical to the business, then I’ve got to work on that one now. You may have fatal errors in two different places. One may not matter. Maybe it’s a backup system. Worry about that later. So it’s all about triage, right? Trying to find out which are the important problems to the business, and then let’s work on those first.

[00:15:47] – Paul DiMarzio (Broadcom)
So that’s kind of where we’re heading with this. Continuing to draw those interrelations and correlations with applications, I think, is probably the next step.
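
[Editor’s sketch] The triage idea, two fatal errors where only one matters to the business right now, can be illustrated with a short Python ordering rule. The `Alert` fields, the severity scale, the application names, and the criticality scores are all hypothetical, and this is not a description of WatchTower’s own prioritization.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical alert tied to the application its resource serves."""
    resource: str
    severity: int  # assumed scale: higher means worse
    application: str


# Assumed business ranking: higher means more critical to the business.
APP_CRITICALITY = {"funds-transfer": 3, "reporting": 2, "backup": 1}


def triage(alerts):
    """Order alerts by business criticality first, then by severity."""
    return sorted(
        alerts,
        key=lambda a: (APP_CRITICALITY.get(a.application, 0), a.severity),
        reverse=True,
    )
```

Sorting on criticality before severity is the design choice that mirrors the conversation: a fatal error on a backup system waits behind a fatal error in a business-critical application, even though the raw severity is identical.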

[00:15:57] – Amanda Hendley (Planet Mainframe)
Awesome. So, for anyone that’s interested in adopting this or learning more about WatchTower, where can they get the resources and information?

[00:16:06] – Paul DiMarzio (Broadcom)
Well, we’ve tried to consolidate everything on mainframe.broadcom.com/WatchTower. So that’s a good place to start. If you really want to get into it, there’s a form out there, and one of our specialists will talk to you. These are the folks that are here at SHARE working with people. They’ll be very happy to work with you. And we have lots of education programs within Broadcom. We have lots of try-and-buys, so there are many opportunities to kick the tires. And let’s face it, if you are already licensed for any of the products I had talked about, SYSVIEW, OPS/MVS, NetMaster, Vantage, you already have this, okay? So if you have those products, this is all yours right now. You just have to download it and start it. And you can just start small. Maybe the first thing you might want to do is work with topology, auto-generate some topologies, and if you like that, maybe you want to see how you could connect that with your alerts. And if you like that, maybe you want to then start to do ML insights. So we tried to build this, and the reason we call it a platform is we wanted it to be modular, so you can get started on this journey wherever you are and go wherever you want to go and then continue to add on pieces as you move along.

[00:17:22] – Amanda Hendley (Planet Mainframe)
Awesome. Well, I appreciate you talking to me today about mainframe observability and the new WatchTower platform. Before we go, I wanted to know if I could get a book or a podcast recommendation from you for our listeners.

[00:17:35] – Paul DiMarzio (Broadcom)
A book or a podcast recommendation?

[00:17:38] – Amanda Hendley (Planet Mainframe)
It doesn’t have to be mainframe-related.

[00:17:43] – Paul DiMarzio (Broadcom)
Wow, you stumped me on that one. I consume my information in bits and pieces. So some of it comes from you guys, some of it comes from some of the feeds I get. So it comes from a variety of sources. No, I don’t really have one for you. I’m sorry.

[00:18:04] – Amanda Hendley (Planet Mainframe)
That’s all right. I think it’s fair. You’ve answered all my other questions. Great. Well, I really enjoyed talking to you and getting to know more about this.

[00:18:12] – Paul DiMarzio (Broadcom)
Yeah, same here. We’re very excited about this platform. It’s a project I know this team was working on the pieces of before I even joined Broadcom, and just watching them pull together has been a special thing for me because, as a former engineer, I do love the engineering still, even though I don’t write any code. But I could see how this was moving together. And I’m looking at the plans ahead, which we can’t talk about today, but I think this is really going to be pretty cool. It’s cool now. It’s going to be cooler later.

[00:18:43] – Amanda Hendley (Planet Mainframe)
Awesome. Well, I am looking forward to finding out more about that, and I’ll ask you when we stop the recording for you to tell me.

[00:18:52]
Thank you for tuning in to another enlightening episode of the Planet Mainframe podcast. We hope you’ve gained valuable insights and discovered new horizons in the world of technology. This is the Planet Mainframe podcast signing off. Stay curious.
