Sunday, April 28, 2013

Datastage: Runtime Column Propagation

In its simplest form, RCP can be used to copy data from one table to another without the need for defining the meta-data. However this feature can be used in any other circumstance where tables may share a subset of column definitions. If the logic to be applied only involves the subset of columns, then a generic job can be used to move the data. A typical example is a set of tables that share one column - a checksum value on the natural key. In this case a generic job can be designed using the Change Capture stage to compare previous and current data producing a dataset containing a delta for the table being processed. This one job can be used to process any number of tables thus producing the necessary files for an incremental load.

Runtime Column Propagation - Simple Copy Job

One simple job is all that's needed to copy any number of tables.

The first thing is to ensure that RCP is enabled for both the project and the job.

Use job parameters to define the source and target table names.

Define properties for source and target database connectors


Note that we have not defined any columns in either the source or the target. At runtime, DataStage will interrogate the source to determine the column definitions and propagate those definitions all the way to the target.

That's all there is to it.
This very simple job can be enhanced in any number of ways.
  • Convert to multi-instance to allow running in parallel
  • Modify the table action in the target stage to Replace rather than Append so that DDL changes are automatically replicated
At the end of the day, we have a single job that can copy any number of tables.

0 comments:

Post a Comment

Please Post your Comments..!