Sunday, April 28, 2013

ETL Run Management

This is a very simple run management system. It records only the bare minimum of information, but that is still enough to report elapsed times, error frequency and trends in processing times. It is also quite easy to implement.

Requirements

  • Every run must be uniquely identifiable
  • The following run attributes must be recorded
    • start time
    • end time
    • data start date
    • data end date
    • run status
  • The run identifier must be system generated
  • A run after a failed run will be related back to the failed run
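
As an illustration of the attributes listed above, a run record could be modelled roughly as follows. This is only a sketch: the class name `RunRecord`, the field names and the status values (`RUNNING`, `SUCCESS`, `FAILED`) are assumptions made for the example, not part of the original design.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from decimal import Decimal
from typing import Optional


@dataclass
class RunRecord:
    """One row of run metadata (hypothetical layout)."""
    run_id: Decimal                 # system generated, unique per run
    start_time: datetime            # when the run started
    data_start_date: date           # first business date covered by the run
    data_end_date: date             # last business date covered by the run
    status: str = "RUNNING"         # assumed values: RUNNING, SUCCESS, FAILED
    end_time: Optional[datetime] = None  # unknown until the run finishes

    def elapsed(self) -> Optional[timedelta]:
        """Elapsed time of the run, or None while it is still running."""
        if self.end_time is None:
            return None
        return self.end_time - self.start_time
```

With start and end times stored per run, elapsed times and their trend over time can be derived directly from the run history, and error frequency can be derived from the status column.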

Process Flow


Implementation Details

The run id is defined as a decimal value. After every successful run, the next run id is set to the next highest integer. After an unsuccessful run, the next run id is set to the last run id + 0.01, so every rerun shares its integer part with the original run and can be related back to it. The increment chosen (0.01) allows up to 99 unsuccessful runs: the ids .01 through .99 are available before the value would roll over into the next integer.

This process is implemented using a DataStage sequence.
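
The run id rule can be sketched in Python as shown here. This is an illustration only and not the DataStage sequence itself; the function names `next_run_id` and `original_run_id` are hypothetical, and raising an error once the .99 id is used up is an assumption about how the limit might be handled. `Decimal` is used rather than a float so that repeated additions of 0.01 stay exact.

```python
from decimal import Decimal, ROUND_FLOOR

RETRY_STEP = Decimal("0.01")  # increment applied after a failed run


def original_run_id(run_id: Decimal) -> Decimal:
    """Integer part of a run id, i.e. the run that a rerun relates back to."""
    return run_id.to_integral_value(rounding=ROUND_FLOOR)


def next_run_id(last_run_id: Decimal, last_run_succeeded: bool) -> Decimal:
    """Derive the id of the next run from the outcome of the last one."""
    if last_run_succeeded:
        # Success: move on to the next highest integer.
        return original_run_id(last_run_id) + 1
    # Failure: stay within the same integer and add 0.01.
    candidate = last_run_id + RETRY_STEP
    if candidate == original_run_id(candidate):
        # x.99 + 0.01 would collide with the next integer run id
        # (assumed handling: refuse rather than reuse an id).
        raise ValueError("no retry ids left for run %s" % original_run_id(last_run_id))
    return candidate
```

For example, if run 5 fails, the rerun gets id 5.01; if that fails too, the next rerun is 5.02; once a rerun succeeds, the following run is 6.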
