IT organizations are always chasing better performance. This involves monitoring, tuning, and improving all aspects of the IT infrastructure, encompassing servers, networks, applications, desktops, and databases. But let’s focus on the database, shall we?
Five factors influence database performance: workload, throughput, resources, optimization, and contention.
The workload that is requested of the DBMS defines the demand. It is a combination of online transactions, batch jobs, ad hoc queries, data warehousing analysis, utilities, and system commands directed through the system at any given time. Workload can fluctuate drastically from day to day, hour to hour, and even minute to minute. Sometimes workload can be predicted (such as heavy month-end processing of payroll, or light access after 7:00 p.m., when most users have left for the day), but at other times it is unpredictable.
Throughput defines the overall capability of the hardware and software to process data. It is a composite of I/O speed, CPU speed, the parallel capabilities of the machine, the core of the DBMS, and the efficiency of the operating system and system software. The hardware and software tools at the disposal of the system are known as the resources of the system; examples include the database kernel, disk space, memory, cache controllers, and microcode.
The fourth defining element of database performance is optimization. All types of systems can be optimized, but database systems are unique in that query optimization is primarily accomplished internal to the DBMS. Even so, there are other factors that need to be optimized (SQL formulation, database parameters, database organization, etc.) to enable the database optimizer to create the most efficient access paths to the data.
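To see why SQL formulation matters to the optimizer, consider this sketch against a hypothetical orders table with an index on order_date (both the table and the index are assumptions for illustration):

```sql
-- Wrapping the indexed column in a function makes the predicate
-- non-sargable; many optimizers cannot use the index on order_date
-- and fall back to scanning the whole table:
SELECT cust_id, order_total
FROM   orders
WHERE  YEAR(order_date) = 2024;

-- Reformulated as a range predicate on the bare column, the same
-- question lets the optimizer choose an index-based access path:
SELECT cust_id, order_total
FROM   orders
WHERE  order_date >= '2024-01-01'
  AND  order_date <  '2025-01-01';
```

Both queries return the same rows, but the access paths available to the optimizer, and therefore the performance, can differ dramatically.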
When the demand (workload) for a particular resource is high, contention can result. Contention is the condition in which two or more components of the workload are attempting to use a single resource in a conflicting way (for example, dual updates to the same piece of data). As contention increases, throughput decreases.
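Here is a minimal sketch of contention in action, assuming a hypothetical accounts table and two concurrent sessions (transaction syntax and locking granularity vary by DBMS):

```sql
-- Session 1
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE acct_id = 42;
-- Session 1 now holds a write lock on the row for acct_id 42

-- Session 2, running at the same time
BEGIN;
UPDATE accounts SET balance = balance + 50 WHERE acct_id = 42;
-- Session 2 blocks here, waiting for Session 1 to COMMIT or ROLLBACK;
-- while it waits, no work gets done and throughput drops
```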
Therefore, database performance can be defined as the optimization of resource use to increase throughput and minimize contention, enabling the largest possible workload to be processed.
The core engine of the DBMS also contributes to the speed side of the performance equation. How fast can the DBMS handle read/write operations? How does the DBMS utilize system memory to enhance performance? What mechanism does the DBMS use to lock data to ensure integrity while maintaining concurrency?
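The answers differ by product, but many of the relevant knobs are visible to the DBA. As one hedged illustration, PostgreSQL (chosen here only as an example) exposes both the size of its shared page cache and explicit row-level locking:

```sql
-- How much memory is dedicated to caching data pages?
SHOW shared_buffers;   -- the PostgreSQL default is modest (e.g., 128MB);
                       -- a common rule of thumb is roughly 25% of RAM

-- How is data locked to ensure integrity while maintaining concurrency?
-- (the accounts table is a hypothetical example)
BEGIN;
SELECT balance FROM accounts WHERE acct_id = 42 FOR UPDATE;  -- pessimistic row lock
UPDATE accounts SET balance = balance - 100 WHERE acct_id = 42;
COMMIT;  -- releases the lock so waiting sessions can proceed
```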
These are core questions, and each database system implements different techniques and capabilities to deliver different levels of performance and integrity. In this day and age of polyglot persistence, most organizations have multiple operational DBMSes installed and in production. This means that DBAs must be cognizant of the capabilities of each DBMS and recommend which database is most suitable for each application and system to be developed. This is a complex and difficult thing to do! Being an expert in even one database system is challenging; keeping up to date on many different types of database systems is a constant struggle.
The required speed of the application is an important consideration, but so is its complexity. Some types of processing are beyond the capabilities of some types of database systems, especially when the processing is complex. The anticipated size and growth of the data to be managed is also an important consideration. Will you be required to store, manage, and access unstructured data or just structured data? Are there many complex relationships between and among the data?
Let’s not forget about the anticipated user base. Is it a static number of users or is it dynamic? Will you be serving in-house business users or customers? Will the application be available over the web or just behind the firewall? Can you even reliably predict the number of users? This, too, can impact database performance.
Are the developers knowledgeable about the access methodology needed for each type of DBMS under consideration? For relational database systems this means SQL, but even then Transact-SQL and PL/SQL, for example, are not exactly the same. And if you are using a NoSQL database system, then you need to gauge the level of expertise in JSON, SPARQL, GraphQL, CQL, or whatever method is used to access data.
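As a small illustration of how even "the same" language diverges, here is one way to fetch the top ten orders in two dialects (the orders table is again a hypothetical example):

```sql
-- Transact-SQL (SQL Server): the proprietary TOP clause
SELECT TOP 10 cust_id, order_total
FROM   orders
ORDER  BY order_total DESC;

-- Oracle 12c and later (also standard SQL): FETCH FIRST
SELECT cust_id, order_total
FROM   orders
ORDER  BY order_total DESC
FETCH  FIRST 10 ROWS ONLY;
```

Neither statement runs unmodified on the other system, which is exactly the kind of gap a DBA has to keep track of across platforms.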
STAYING UP-TO-SPEED ON THE OPTIONS
Truly, the speed and performance of your production database systems encompass a wide range of parameters and decisions that are made well before implementation. Be sure that your DBAs understand the options available and the factors that impact performance and development with each DBMS, and that they work to keep the IT organization up to speed and educated on all of the available options.
Yes, that is a lot of work … but it is necessary in this day and age where speed rules and there are numerous DBMS options available.
Originally published on Database Trends and Applications
Regular Planet Mainframe Blog Contributor
Craig Mullins is President & Principal Consultant of Mullins Consulting, Inc., and the publisher/editor of The Database Site. Craig also writes for many popular IT and database journals and web sites, and is a frequent speaker on database issues at IT conferences. He has been named by IBM as a Gold Consultant and an Information Champion. He was recently named one of the Top 200 Thought Leaders in Big Data & Analytics by AnalyticsWeek magazine.