I was at a regional CMG meeting in Raleigh, North Carolina, giving a presentation on Big Data to a group of about forty Capacity Planners (big turnout for North Carolina a week after Hurricane Matthew hit). The group was interested in understanding the impact the nascent Big Data wave would have on their capacity plans. The answer is—a LOT.
The major benefit of “Big Data” is actually the data analytics provided by the various suppliers. But in every case there is an ETL (Extract, Transform, Load) process that must be run to gather the operational data (say, from IMS or DB2) and format it into a Big Data store. Each of those steps requires processor, storage, network and other resources that must be provisioned by the Capacity Planners. Frequently, a Big Data project is departmental in nature, so the Capacity Planners may not even know it is underway until it is well along in development and deployment.
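To make the resource implications concrete, here is a minimal, hypothetical sketch of one full ETL cycle in Python. The record layout, field names and the JSON-lines target file are illustrative assumptions, not any particular vendor's tooling; the point is simply that each stage consumes CPU, I/O and network that someone has to plan for.

```python
# Hypothetical full-refresh ETL cycle: extract operational records,
# transform them into an analytics-friendly shape, and load them
# into a Big Data store (represented here by a JSON-lines file).
import json
from datetime import datetime, timezone

def extract(source_rows):
    """Extract: pull every record from the operational source.
    In practice this is a bulk unload from IMS or DB2; here an
    in-memory list stands in for that unload."""
    return list(source_rows)

def transform(record):
    """Transform: reshape one operational record for analytics,
    e.g. normalize types and stamp the load time."""
    return {
        "account_id": str(record["ACCT_ID"]).strip(),
        "balance": float(record["BALANCE"]),
        "loaded_at": datetime.now(timezone.utc).isoformat(),
    }

def load(records, path):
    """Load: append the transformed records to the target store.
    Every record written here is storage and network traffic the
    capacity plan must account for."""
    with open(path, "a", encoding="utf-8") as target:
        for rec in records:
            target.write(json.dumps(rec) + "\n")

# Full refresh: everything is extracted and reloaded on each run.
operational_data = [
    {"ACCT_ID": " 1001 ", "BALANCE": "250.75"},
    {"ACCT_ID": " 1002 ", "BALANCE": "9125.00"},
]
load((transform(r) for r in extract(operational_data)), "bigdata_store.jsonl")
```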
There are solutions on the market that can mitigate some of the cost of the ETL process by capturing changes to the operational data (an approach commonly known as change data capture) and running only the changed data through the same Transform and Load steps. This can save a lot of processing, since it reduces the need for a full ETL cycle. In some cases the full ETL process can be bypassed entirely and reserved for emergency backup or for initializing a new data store.
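For contrast, here is an equally hypothetical sketch of the change-capture approach: only records that are new or different since the previous run flow through Transform and Load, which is where the processing savings come from. The snapshot-comparison logic below is a deliberate simplification; real change data capture products typically read the database log rather than diffing snapshots.

```python
# Hypothetical change-data-capture pass: feed only changed records
# into the same Transform and Load logic as the full ETL cycle.
# Real CDC tools read the database log; this sketch just compares
# the current snapshot of the source against the previous one.

def changed_records(previous, current):
    """Yield only records that are new or different, keyed by account."""
    for key, record in current.items():
        if previous.get(key) != record:
            yield record

previous_snapshot = {
    "1001": {"ACCT_ID": "1001", "BALANCE": "250.75"},
    "1002": {"ACCT_ID": "1002", "BALANCE": "9125.00"},
}
current_snapshot = {
    "1001": {"ACCT_ID": "1001", "BALANCE": "310.20"},   # updated
    "1002": {"ACCT_ID": "1002", "BALANCE": "9125.00"},  # unchanged
    "1003": {"ACCT_ID": "1003", "BALANCE": "47.00"},    # new
}

# Only the changed and new rows need reprocessing, so downstream
# processing, storage and network cost shrinks to the rate of change
# rather than the full data volume.
delta = list(changed_records(previous_snapshot, current_snapshot))
print(f"{len(delta)} of {len(current_snapshot)} records need reprocessing")
```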
Another surprise to the Capacity Planners was the sheer number of players in the Big Data space. Most of them had heard of Hadoop and perhaps a few others. I presented a slide with over 30 logos of other Big Data suppliers, most of them Open Source. With Open Source communities, all it takes is a couple of smart people to look at one approach, make some changes or additions, and create a new offering, so the 30 logos are sure to change and grow.
That being said, several players in the field have such widespread acceptance, and such broad bases of open source contributors, that they will be around for a long time and will continue to be well supported into the future. These include Hadoop, Spark and Kafka, among others.
Big Data is the next wave of new technologies for Capacity Planners to include in their plans. It is still early in the hype cycle, and many customers are just now launching exploratory projects to evaluate the cost and value of these technologies. It would behoove Capacity Planners to talk with their Enterprise Architects about those plans so there are no surprises when new hardware and network bandwidth are needed.
Big Data is real, not just a management-magazine phenomenon, and businesses that want to remain competitive must embrace it. Capacity Planners shouldn't be in the dark about it.
Regular Planet Mainframe Blog Contributor
Over the past 25 years, Rick Weaver has become a well-known mainframe expert specializing in database protection, replication, recovery and performance. Drawing on that expertise, he has authored numerous articles, white papers and other pieces on database technologies, and has frequently spoken on database recovery and performance at conferences, symposiums and user groups.