Basics of VSAM

Mar 2, 2021

Colin Pearce

I have been a z/OS and CICS Systems Programmer for over 30 years and have been teaching more than 25 years. I have worked in permanent and contract roles. I have installed CICS and related software many times.

I am well versed in the Application side of CICS using Command level, and in CICS Debugging at both the Transaction level and with CICS System Dumps using IPCS. I have written many CICS courses, namely CICS for System Programmers, CICS Transaction Debugging, CICS Performance and Tuning, CICS Command Level Programming and CICS Internals.

I am also well versed in z/OS, and in 2013 I upgraded the Bank of America z/OS systems from V1.12 to V1.13 on more than 20 LPARs. I have written courses in RACF, SMP/E, JES2, TSO/ISPF, JCL, Storage, VSAM and z/OS for System Programmers. I know and have taught Assembler, and I’m conversant in the COBOL programming language.

I have also written many articles and shared them on LinkedIn.

I have conducted training for customers in many countries, including Australia, New Zealand, South Africa, India, Singapore, Malaysia, Hong Kong, Thailand, the Philippines, the Netherlands, Belgium, Hungary and the UK. Many of my customers are from the financial industry, and I have worked at and taught for Development Bank of Singapore (DBS), Citibank, Bank of America, Westpac and Commonwealth Bank.

I have taught for IBM in the UK, Australia, India and Singapore.

This article will take you inside VSAM and explain the environment of Virtual Storage Access Method. VSAM has been around since the 1970s and provides a way of storing large amounts of data for 3 types of access.

Random
Sequential
Skip Sequential

VSAM has a number of different structures.

Key Sequenced Dataset structure (KSDS)
Entry Sequenced Dataset structure (ESDS)
Relative Record Dataset structure (RRDS)
Linear Dataset structure (LDS)
Variable Relative Record Dataset structure (VRRDS)

Each structure is defined in much the same way, but each one offers a different processing capability and type of access. Once a dataset has been defined, it can only be found via a Catalog. Each z/OS system has one Master Catalog, which is chosen at IPL, and a number of User Catalogs. In addition, each volume that contains one or more VSAM datasets has a VSAM Volume Dataset (VVDS) defined on it.

The Master and User Catalogs are KSDS structures, and the VVDS is an ESDS. These Catalogs are known as Integrated Catalog Facility (ICF) Catalogs. A User Catalog is known as a Basic Catalog Structure (BCS), so the complete ICF Catalog is the combination of the BCS and the VVDS.

Key Sequenced Dataset:

The Key Sequenced Dataset (KSDS) is defined as 2 Components: the Index Component and the Data Component.

The Index Component is further divided into one or more Index Sets and a Sequence Set. The Sequence Set is the lowest level of the Index.

Each record in the KSDS has a Key. The definition of the KSDS includes specifying the starting position of the Key in the record and the length of the key. The records must be in key sequence in order for them to be properly loaded using the REPRO command.

If REPRO finds 4 keys not in sequence, it abandons the load and reports which keys are out of sequence.

The KSDS is the most popular structure because it allows for the Adding, Updating, Inserting and Deleting of records. The Index Component consists only of the Keys, and the Data Component consists of the entire records.

This structure allows for 2 types of access: Random, where the record is found via the Index, and Sequential, where the record is found by following the Sequence Set and using the internal pointers to the next record in the Data Component.

In terms of Buffers, Random access needs more Buffers for the Index Component, and Sequential access needs more Buffers for the Data Component.

Before we move on to the other VSAM structures, let’s take a look inside a VSAM dataset. Concentrating on the KSDS, it has an Index Component and a Data Component. The definition is called a Cluster. Records can be fixed or variable in length. Records are stored in Control Intervals (CIs). CIs come in set sizes, starting at 512 bytes, increasing in steps of 512 bytes up to 8192 and then in steps of 2K up to 32K. So choosing the correct CI size can be tricky.
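The CI size rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function names `valid_ci_sizes` and `round_up_ci` are made up for this example and are not part of any VSAM interface.

```python
def valid_ci_sizes():
    """Return the CI sizes described above, in bytes, smallest first."""
    small = list(range(512, 8192 + 1, 512))      # 512, 1024, ..., 8192
    large = list(range(10240, 32768 + 1, 2048))  # 10240, 12288, ..., 32768
    return small + large


def round_up_ci(requested):
    """Return the smallest listed CI size that is >= the requested byte count."""
    for size in valid_ci_sizes():
        if size >= requested:
            return size
    raise ValueError("no CI size is large enough for %d bytes" % requested)


print(valid_ci_sizes())
print(round_up_ci(5000))
```

A helper like `round_up_ci` is handy when experimenting with record sizes, because it shows how quickly the gap between usable sizes grows once you pass 8192 bytes.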

CIs are further grouped together into Control Areas. We can specify the CI size but VSAM computes the CA size as the smallest of Primary, Secondary Space or 1 cylinder.
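As a rough sketch of the CA size rule just described, the calculation is simply a minimum over three values. The 15 tracks per cylinder figure assumes 3390 geometry, and ignoring a zero secondary allocation is an assumption made for this example; the function name is hypothetical.

```python
TRACKS_PER_CYLINDER = 15  # assumption: 3390 device geometry


def ca_size_tracks(primary_tracks, secondary_tracks):
    """Smallest of primary space, secondary space and one cylinder, in tracks."""
    candidates = [primary_tracks, TRACKS_PER_CYLINDER]
    if secondary_tracks > 0:  # assumption: a zero secondary is left out of the comparison
        candidates.append(secondary_tracks)
    return min(candidates)


print(ca_size_tracks(30, 5))     # small secondary allocation limits the CA size
print(ca_size_tracks(150, 150))  # both allocations exceed one cylinder
```

The first call illustrates why a small secondary allocation can leave you with small CAs even when the primary allocation is generous.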

There are 2 additional types of control field in every CI. The first is the Control Interval Definition Field (CIDF), which is 4 bytes in length. It indicates where the free space in the CI begins and how much is available.

The second is the Record Definition Field (RDF). This is 3 bytes in length and describes the length of records in the CI and how many adjacent records are the same length.

Freespace is important. It is specified in the Cluster Definition as 2 percentages. The first percentage is how much freespace to leave in each CI and the second is how much freespace to leave in each CA. Freespace is only allocated during a Load of the dataset and after a Control Interval Split.
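To make the two freespace percentages concrete, here is a small sketch of the arithmetic. The function names and the `cis_per_ca` parameter are hypothetical, and rounding down to whole bytes and whole CIs is an assumption made for this example.

```python
def ci_free_bytes(ci_size, ci_percent):
    """Bytes left empty in each CI at load time (rounded down, by assumption)."""
    return ci_size * ci_percent // 100


def ca_free_cis(cis_per_ca, ca_percent):
    """Whole CIs left empty in each CA at load time (rounded down, by assumption)."""
    return cis_per_ca * ca_percent // 100


# Example: 20% CI freespace and 10% CA freespace, 4096-byte CIs,
# and a hypothetical CA holding 180 CIs.
print(ci_free_bytes(4096, 20))
print(ca_free_cis(180, 10))
```

Working through numbers like these before defining the Cluster gives a feel for how much of each CI and CA is held back for later inserts.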

The other options specified in the Define Cluster are also important. The SPEED option matters because it makes the load of the file faster than RECOVERY does.

RECOVERY causes the preformatting of CAs, and SPEED must be specified if System-Managed Buffering is to be employed.

RECORDSIZE has 2 parameters: the first is the average record size and the second is the maximum record size. If they are the same, then the records are fixed in size.

SHAREOPTIONS also needs to be specified. This defines how the dataset will be shared across different Address Spaces (Cross Region) and across systems (Cross System). There are two values. The first is Cross Region, which can be specified as 1, 2, 3 or 4:

1. The dataset can be accessed by any number of Read requests OR by only one Write request.
2. The dataset can be accessed by any number of Read requests AND by only one Write request.
3. The dataset can be fully shared by any number of Users; however, each User is responsible for maintaining Read and Write integrity.
4. Same as 3, except that buffers are refreshed for each request.

The second value is Cross System, which applies between z/OS LPARs. It can be specified as 3 or 4. Sharing VSAM datasets between LPARs is not recommended; however, Record Level Sharing (RLS) can overcome this.

One of the challenges with VSAM is that when a READ with UPDATE is issued, VSAM will return the entire CI to the request. Imagine a CI size of 4096 with 80-byte records: allowing for the CIDF and RDFs, there are about 50 records in the CI. If the request wanted record 27, the other records in the CI are locked as well until the request releases the CI.
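The "about 50 records" figure can be checked with a short calculation. This sketch uses the 4-byte CIDF and 3-byte RDF sizes given earlier, and it assumes that a run of fixed-length records is described by two RDFs; the function name is made up for illustration.

```python
CIDF_BYTES = 4  # Control Interval Definition Field
RDF_BYTES = 3   # Record Definition Field


def fixed_records_per_ci(ci_size, record_length, rdf_count=2):
    """Whole fixed-length records that fit in one CI.

    rdf_count=2 is an assumption: one pair of RDFs describing a run
    of records that all have the same length.
    """
    usable = ci_size - CIDF_BYTES - rdf_count * RDF_BYTES
    return usable // record_length


print(fixed_records_per_ci(4096, 80))
```

Every one of those records is tied up while a single record in the CI is held for update, which is why the CI size has such a direct effect on contention.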

In CICS, this became a big problem, mostly with Conversational Programs. The answer was to implement Record Level Sharing (RLS), as long as a Coupling Facility (CF) was available. RLS needs 2 structures to be defined via the XCF Administration Utility IXCMIAPU. One is a Cache Structure, which can have any name; the other is a Lock Structure, named IGWLOCK00. RLS uses the SMSVSAM Address Space to communicate with the CF. The Define Cluster only needs the LOG option included to make the dataset eligible for RLS. In CICS we need RLS=YES in the SIT and RLSACCESS(YES) in the File definition. If no CF is available, then the SIT option CILOCK needs to be set to NO, to allow CICS to release the CI after the READ UPDATE.

Entry Sequenced Dataset:

I have discussed the KSDS. The other VSAM structures are the Entry Sequenced Dataset (ESDS), the Relative Record Dataset (RRDS), the Linear Dataset (LDS) and the Variable Relative Record Dataset (VRRDS). For the ESDS, all records are added in sequence in the order presented.

They receive a Relative Byte Address (RBA). So with a record size of 80, the first record receives an RBA of 0, the next record receives an RBA of 80, the next 160 and so on. After the record has been added, the RBA of that record is returned to the application program.
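The RBA sequence above can be sketched as follows. This is a simplified model that assumes fixed-length, non-spanned records, zero-based record numbers, a 4096-byte CI and 10 bytes of control information per CI (one CIDF plus two RDFs); all of these are assumptions for illustration, and the function name is hypothetical. Because an RBA is a byte offset from the start of the dataset, the simple 0, 80, 160 pattern holds within a CI and restarts at the next CI boundary.

```python
def esds_rba(record_number, record_length=80, ci_size=4096, control_bytes=10):
    """RBA of the record with the given zero-based number in a simplified ESDS."""
    per_ci = (ci_size - control_bytes) // record_length  # whole records per CI
    ci_index, slot = divmod(record_number, per_ci)
    return ci_index * ci_size + slot * record_length


for n in (0, 1, 2):
    print(n, esds_rba(n))
```

With the default values, the first three records land at RBAs 0, 80 and 160, as described in the text.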

There is no support for Inserting or Deleting records, although it is possible to replace a record with one of the same length at the same RBA.

In CICS, only records in a KSDS that is marked as Recoverable in the File Entry will be recovered during a failure. There is no such facility in CICS for backout of an ESDS, although the exit XFCLDEL can mark the record as logically deleted.

The same applies to an RRDS: there is no backout. Records are added in sequence and receive a Slot number. The first record receives Slot 1, the second receives Slot 2 and so on.

Linear datasets have a fixed CI size of 4096 or a multiple of 4096. They are usually used as a base for other systems, such as DB2 and zFS Aggregates.

Let’s discuss VSAM Catalogs. You cannot define a Master Catalog via IDCAMS, because z/OS already has a Master Catalog and cannot define another. Instead, IDCAMS defines a User Catalog, and it is then up to the Systems Programmer to build this User Catalog into a new Master Catalog and update the LOADxx member so that it is chosen at the next IPL.

The only definitions that should be in the Master Catalog are SYS1 datasets such as SYS1.NUCLEUS, Aliases (High Level Qualifiers relating to their User Catalogs) and Import Connects for those User Catalogs. These are the BCS definitions.

To complete the Catalog definition, you will need to use IDCAMS to define a VSAM VOLUME DATASET (VVDS) on the volume that will contain any VSAM datasets and any Non-VSAM SMS managed datasets.

If the VVDS is not found on the volume when the first VSAM dataset is defined there, then VSAM will define it implicitly as SYS1.VVDS.Vvolser.

This must be the name, however it is defined. It is an ESDS. The implicit definition might not be large enough, so it might be worthwhile to issue the Modify command F CATALOG,VVDSSPACE(primary,secondary) in order to change the implicit definition of the VVDS. An F CATALOG,REPORT command will list what this VVDS definition of space is.
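The VVDS naming rule described above is easy to express in code. The helper below is a hypothetical illustration only; it assumes a volume serial of 1 to 6 characters and builds the name SYS1.VVDS.V followed by that serial.

```python
def vvds_name(volser):
    """Build the VVDS dataset name for a volume serial (volser)."""
    volser = volser.strip().upper()
    if not 1 <= len(volser) <= 6:  # assumption: a volser is 1 to 6 characters
        raise ValueError("volser must be 1 to 6 characters")
    return "SYS1.VVDS.V" + volser


print(vvds_name("prod01"))
```

The volume serial `prod01` is an invented example; substitute the serial of the volume you are interested in.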

For VSAM processing, consider using SMB (System-Managed Buffering). There are a number of options, depending on the way the VSAM dataset is accessed. If you just rely on BUFFERSPACE, as defined in the Cluster definition, you will not get the best buffering.

In a COBOL program with Access is Sequential for a KSDS, VSAM will try to allocate more Data buffers. With Access is Random, VSAM will try to allocate more Index buffers.

The problem comes when the COBOL program specifies Access is Dynamic: VSAM then has no way of knowing how to structure the allocation of buffers.

However, if you specify AMP=(ACCBIAS=xx) in the JCL for the dataset, where xx is DO, DW, SO or SW, you can influence how VSAM opens the dataset and allocates buffers to it. The Cluster must be defined with SPEED, be SMS managed and have Extended Format specified in the assigned SMS Data Class.

DO (Direct Optimized) specifies that access is Random. DW (Direct Weighted) specifies that access is mostly Random with some Sequential access. SO (Sequential Optimized) specifies that access is Sequential, and SW (Sequential Weighted) specifies that access is mostly Sequential with some Random access. For improvements when loading a KSDS, specify AMP=(ACCBIAS=SYSTEM) in the JCL, and REPRO will operate in CO (Create Optimized) mode.

Now we have all this ‘complexity’ of VSAM. As long as the structures work there is no problem, but what happens when things break between the BCS and the VVDS? VSAM offers 2 commands that can help.

One is EXAMINE and the other is DIAGNOSE. EXAMINE INDEXTEST examines the index component of a key-sequenced data set cluster by cross-checking the vertical and horizontal pointers contained within the index control intervals and by analyzing the index information. EXAMINE DATATEST evaluates the data component of the cluster by sequentially reading all data control intervals, including free space control intervals.

DIAGNOSE ICFCATALOG (without compare) checks information integrity within each BCS record (inside the BCS only), and DIAGNOSE VVDS (without compare) checks information integrity within each VVDS record (inside the VVDS only).

In the VVDS, the first record is the VSAM Volume Control Record (VVCR). It contains space information and the names of the BCSs that have datasets on the volume. The records that follow are the VSAM Volume Records (VVRs). These hold the name of each dataset on the volume, together with the name of the BCS the dataset is cataloged in. Run a PRINT command with COUNT(10) against a VVDS to see these.

Originally published on IBM Mainframer
