Sunday, April 28, 2013

Datastage Sequence Looping

We've shown how to create DataStage jobs that can process tables and files generically. What we now need is a way to control those jobs in some sort of loop such that a new file or table is passed on every loop. There are many ways to do this - shell script with dsjob, a sequence with the names of the files and/or tables hard-coded into the StartLoop activity, a DataStage Server routine that uses the DataStage API to start and run a list of jobs read from a text file or even another job that calls the UtilityRunJob routine that comes with DataStage.
The approach taken here will be a combination of a Routine, a Sequence and a text file. This design will make maintenance very simple. If a file needs to be added or deleted to or from the list, only the text file and the job parameter changes.
Here's the plan:
  • Create a file containing one record for each file to be processed. Each record is a comma-delimited record containing the file path, the table name and the schema file path.
  • Write a routine that reads a comma delimited file with three fields - file name, table name and schema file name. Each
    record in the file will be loaded into an array and the routine returns the array.
  • Create a sequence that calls the routine once and then loops through each element in the array passing the required fields to a job

Create List of Files

Here is our list of files containing only one entry for this example.

Create Routine

The routine is a fairly simple one. It takes one argument which is the file name (including the path) of the file containing a comma-separated list of parameters for each file to be processed. It opens the file, breaks each line into fields and pushes them on to a dynamic array which it then returns to the caller.
Function ReadFile
Routine = 'ReadFile'

vFileName = Arg1
vArray = ''
vCounter = 0

OPENSEQ vFileName to vFileHandle
  Else Call DSLogFatal("Error opening file list: ":vFileName,Routine)
Loop
While ReadSeq vLine FROM vFileHandle 
  vCounter = vCounter + 1
  vArray = Fields(vLine,',',1)
  vArray = Fields(vLine,',',2)
  vArray = Fields(vLine,',',3)
Repeat
CLOSESEQ vFileHandle

Ans = vArray
Return Ans

Create Sequence


In the first activity of the sequence (uvReadFile), the routine is called and stored in a variable named fileList.

The next step is the loop initialization. We set up a numeric loop with a stop value defined by a job parameter whose value is the size of the list (in this case 1).

Now we set up a variable containing the ith record of the array. This record will contain the set of parameters required to run the job.

The set of parameters is then further split up into individual variables.

Finally we are ready to call the job using the required parameters.

1 comments:

Unknown said...

Hi Midhu,
Great work. :)
The images are not available.
Can you please correctly load it one again.

Post a Comment

Please Post your Comments..!