Long Running IBM i (AS/400, iSeries) Batch Jobs

The duration of business-critical batch processing within your AS/400, iSeries and System i environments can have a significant impact on end users, customers and the bottom line.

Many shops look to hardware upgrades in an attempt to get these processes finished within service delivery guidelines. If your processing is using only 15% of your current system's CPU and memory resources, yet the jobs still take far too long to run, something is being overlooked.

In that situation, the system and these jobs are most likely I/O bound. Discovering the underlying I/O issues and culprits will give you the best chance of finding optimum solutions.

Tips & Techniques

Before any long running batch job can be improved, we must first analyze and understand its complexity. Some jobs simply call one program, open some files, perform a business function, close the files and then terminate.

Other jobs would be better classified as a 'job stream' rather than simply a job. These processes could consist of a CL program with numerous sequential calls to other programs - each performing different processing or business functions.

Imagine this CL program using numerous SBMJOB commands to a single-threaded job queue rather than just the CALL command. Each of these separate batch jobs would need to be analyzed individually to determine whether performance improvement opportunities exist. What is needed to optimize the performance of the first 'job' will likely be completely different from what is necessary for each of the subsequent jobs or steps.

Getting your long running batch job submitted with the LOG(4 00 *SECLVL) LOGCLPGM(*YES) parameters is a good start in determining its complexity. With these job settings, you can analyze the joblog and see each of the CALL commands and their date and time stamps.
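Once the CALL timestamps are visible in the joblog, finding the slow step is simple subtraction. The following is a minimal Python sketch of that arithmetic. It assumes the timestamps have already been transcribed from the joblog by hand; the program names, times and the step_durations helper are invented for illustration and do not reflect the real joblog layout.

```python
from datetime import datetime

# Hypothetical (timestamp, program) pairs copied by hand from a joblog.
# The values are invented; a real joblog has a different layout.
CALLS = [
    ("2024-01-15 03:00:05", "EXTRACT"),
    ("2024-01-15 03:12:40", "DATECONV"),
    ("2024-01-15 05:47:10", "POSTING"),
]
JOB_END = "2024-01-15 06:02:00"  # time the job ended (also invented)

FMT = "%Y-%m-%d %H:%M:%S"


def step_durations(calls, job_end):
    """Return (program, seconds) for each step, longest step first.

    A step is assumed to run from its own CALL timestamp until the
    next CALL timestamp (or until the end of the job for the last step).
    """
    stamps = [datetime.strptime(ts, FMT) for ts, _ in calls]
    stamps.append(datetime.strptime(job_end, FMT))
    steps = [
        (program, int((stamps[i + 1] - stamps[i]).total_seconds()))
        for i, (_, program) in enumerate(calls)
    ]
    return sorted(steps, key=lambda step: step[1], reverse=True)


if __name__ == "__main__":
    for program, seconds in step_durations(CALLS, JOB_END):
        print(f"{program:<10} {seconds / 60:7.1f} min")
```

In this made-up job stream, the step that deserves attention first is the one at the top of the sorted list, not necessarily the one with the most complex code.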

Our Application Optimizer 'preview' features automate the collection and presentation of this data.

Frequently Asked Questions

What are some of the most common culprits making batch jobs run long?

It depends heavily on what kind of code you are using. If your application is running extensive SQL or queries, you clearly need to be looking at database and SQL tuning first. Not having the proper indexes or logical files on your system to run SQL requests efficiently is a top issue.

The single biggest performance issue encountered within long running batch jobs would be what we call 'excessive initiation and termination'. If you have a job that runs 500,000 SQL requests (1 for each record read), it will not perform as well as 1 request that reads 500,000 records in a single initiation and termination of SQL.
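The difference between one request per record and one request for all records can be sketched in a few lines. The Python example below uses the standard library's sqlite3 module purely as a stand-in; it is not Db2 for i, and the table, column and function names are invented for illustration. It shows that both approaches produce the same answer while issuing very different numbers of SQL requests.

```python
import sqlite3
import time


def build_db(rows):
    """Create an in-memory table with `rows` records (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(i, i % 100) for i in range(1, rows + 1)],
    )
    return conn


def total_per_record(conn, rows):
    """One SQL request per record: `rows` separate requests."""
    total = 0
    requests = 0
    for i in range(1, rows + 1):
        (amount,) = conn.execute(
            "SELECT amount FROM orders WHERE id = ?", (i,)
        ).fetchone()
        total += amount
        requests += 1
    return total, requests


def total_single_request(conn):
    """One SQL request that reads every record."""
    total = 0
    for (amount,) in conn.execute("SELECT amount FROM orders"):
        total += amount
    return total, 1


if __name__ == "__main__":
    rows = 5000
    conn = build_db(rows)
    for label, run in (
        ("per record", lambda: total_per_record(conn, rows)),
        ("single request", lambda: total_single_request(conn)),
    ):
        start = time.perf_counter()
        total, requests = run()
        elapsed = time.perf_counter() - start
        print(f"{label:<15} requests={requests:<6} total={total} {elapsed:.4f}s")
```

Timings on an in-memory database will not resemble those on a production system, so treat the request counts, not the elapsed times, as the point of the sketch.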

Batch processing that submits a separate job for each transaction, initiating and terminating jobs excessively, is not going to perform as well as a single job doing the same work. A single job will not function at its highest efficiency if it is opening and closing files for every transaction that is processed. This same job could be calling external programs for each transaction. If these programs are not 'returning control' properly - they could be a big issue.

Any excessive initiation and termination is usually unnecessary and contributes dramatically to a job's duration. There is significant overhead associated with the starting and stopping of a single SQL request. Minimizing the number of requests can reduce your job's duration. Opening files once at the beginning of a job and closing them once at the end will improve that job's performance. Any external calls that can be reduced or eliminated with inline subroutines can be a great help. You don't want the operating system loading a program in and out of memory 60,000 times per minute.
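The open-once principle can be illustrated outside of RPG as well. The Python sketch below is only an analogy: it writes to an ordinary stream file rather than an IBM i database file, and the function names are invented. Both functions write identical output, but the first pays for an open and a close on every transaction.

```python
import os
import tempfile


def post_reopening(path, transactions):
    """Open and close the file for every transaction. Returns the open count."""
    opens = 0
    for txn in transactions:
        with open(path, "a") as out:
            opens += 1
            out.write(f"{txn}\n")
    return opens


def post_open_once(path, transactions):
    """Open the file once, process every transaction, then close it once."""
    with open(path, "a") as out:
        for txn in transactions:
            out.write(f"{txn}\n")
    return 1


if __name__ == "__main__":
    transactions = [f"TXN{i:05d}" for i in range(10000)]
    with tempfile.TemporaryDirectory() as workdir:
        slow = post_reopening(os.path.join(workdir, "slow.txt"), transactions)
        fast = post_open_once(os.path.join(workdir, "fast.txt"), transactions)
        print(f"reopening: {slow} opens, open once: {fast} open")
```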

It is not uncommon to see a single external program call make a batch job run for 3 hours. This program could be as simple as a stand-alone 'Date Conversion Routine'. All it does is flip the format of dates from MM/DD/YY to YY/MM/DD. If this RPG program sets on the LR indicator (SETON LR) before it ends, we have found a culprit. Simply changing this to a RETRN operation (RETURN in RPG IV) tells the operating system to leave the program active in memory between calls. This 3 hour batch job could now finish in less than 10 minutes.

This might seem like an extreme example or an exaggeration. It is not. We see this all the time. How much you gain will depend on the specific job and on how much of its duration is consumed by these patterns. You must use the IBM commands TRCJOB or STRTRC to gather the necessary data for this type of analysis. Our Application Optimizer 'trace' features can help automate this data collection and presentation process.

New Features

Existing Workload Performance Series customers will be most pleased with the enhancements to our Application Optimizer data collection process. As with many IBM commands for working with or collecting data on jobs, a qualified job name has always been required.

A qualified job name consists of a system-assigned job number, a user name and a job name. The system-assigned job number is not known until the job is submitted. This makes data collection on a job running nightly at 3:00 AM a tedious and tiring task. Someone must be awake at 3:01 AM to find the active job and its job number so that data collection can be started.

Our Application Optimizer has 3 different levels of data collection - preview, trace and source. The 'preview' level is used on long running batch jobs as described earlier to determine their complexity and to perform a high level analysis of the steps within a complex 'job stream'.

Once specific steps are identified or single step jobs are found, a 'trace' level analysis would be appropriate to identify the possible excessive initiation and termination issues mentioned earlier. If a single program is identified, with many subroutines and complex, self-contained logic, a 'source' level analysis would be a good next step.

Regardless of which level of data you choose to collect, our Application Optimizer has a unique data collection approach which consists of multiple components. You can start a data collection 'monitor' job that runs for a pre-defined number of hours. This job looks for and waits for any job on the system that meets your filter criteria. You can specify just a job name or just a user name. This monitoring process finds that job at 3:00 AM automatically and submits a 'handler' job once the system-assigned job number is known. The 'source' level analysis also submits a 'receiver' job for each detected job number. This additional job is a necessary part of the batch debug process used by our 'source' level analysis.
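The general idea of such a monitor can be sketched as a polling loop. The Python example below is a conceptual illustration only and is not how the Application Optimizer is implemented. Every name in it (Job, matches, monitor, list_active_jobs, on_found) is hypothetical, and the caller is assumed to supply a function that returns the currently active jobs.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    number: str  # system-assigned job number, known only once the job exists
    user: str
    name: str

    @property
    def qualified(self):
        """Qualified job name in number/user/name form."""
        return f"{self.number}/{self.user}/{self.name}"


def matches(job, name=None, user=None):
    """True when the job meets every filter that was specified."""
    return (name is None or job.name == name) and (
        user is None or job.user == user
    )


def monitor(list_active_jobs, on_found, name=None, user=None,
            max_polls=10, interval=0.0):
    """Poll for matching jobs and hand each new one to `on_found` once.

    `list_active_jobs` is a hypothetical callable returning Job objects.
    `on_found` plays the role of submitting a handler for the job.
    Returns the set of job numbers that were handled.
    """
    seen = set()
    for _ in range(max_polls):
        for job in list_active_jobs():
            if job.number not in seen and matches(job, name, user):
                seen.add(job.number)
                on_found(job)
        time.sleep(interval)
    return seen


if __name__ == "__main__":
    # Simulated snapshots of active jobs: nothing yet, then NIGHTLY appears.
    snapshots = iter([
        [],
        [Job("123456", "NIGHTOPR", "NIGHTLY")],
        [Job("123456", "NIGHTOPR", "NIGHTLY"), Job("123457", "QSYSOPR", "OTHER")],
    ])
    monitor(lambda: next(snapshots),
            lambda job: print("start collection for", job.qualified),
            name="NIGHTLY", max_polls=3)
```

The filter is defined ahead of time by job name or user name; the job number is filled in only when a matching job actually appears.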

The main point is that you can define data collection by job name or by user name. You can do this in the middle of the day, while you are at work. There is no need to log into the system at 3:01 AM to manually run the STRSRVJOB, TRCJOB, STRTRC or STRDBG commands.

There are additional new features within the Workload Performance Series software that let you automatically send PDF documents (with bar and pie charts). You can distribute Microsoft Excel spreadsheets if you prefer. Or maybe you just need to take advantage of the new 'Automated Alerts' features so that you can closely monitor your long running batch jobs and stay informed as their durations keep increasing.

Our Application Optimizer 'preview' is definitely where you should start, but don't forget about our new Query Optimizer and our Journal Optimizer features.

Special Offer

We invite you to visit our web site at http://www.mb-software.com and take us up on our Free 30 Day Trial offer, or just make use of our extensive online Resource Center. White Papers and Webcasts are available which provide educational value to managers, administrators and software developers.