Friday, July 26, 2013

Find Number of years between 2 dates DAtastage

I understand MONTHS_BETWEEN database function will work. 

Sunday, April 28, 2013

How to start Application Express

After installing Oracle Express, point your web browser to http://localhost:8080/apex
To change the default port:
sqlplus /nolog
connect sys/mypassword as sysdba
exec  dbms_xdb.sethttpport('9090');

Oracle How to pivot a table

create table winners
(year varchar(4) not null,
team varchar(30) not null,
constraint pk_winners primary key (year));

insert into winners (year,team) values ('1990','SAINTS');
insert into winners (year,team) values ('1991','SWANS');
insert into winners (year,team) values ('1992','DOGS');
insert into winners (year,team) values ('1993','MAGPIES');
insert into winners (year,team) values ('1994','TIGERS');


SQL> select * from winners;

YEAR TEAM
---- ------------------------------
1990 SAINTS
1991 SWANS
1992 DOGS
1993 MAGPIES
1994 TIGERS

5 rows selected.

SQL> select max(decode(year,'1990',team,'')) as "1990",
max(decode(year,'1991',team,'')) as "1991",
max(decode(year,'1992',team,'')) as "1992",
max(decode(year,'1993',team,'')) as "1993",
max(decode(year,'1994',team,'')) as "1994"
from winners

1990       1991       1992       1993       1994
---------- ---------- ---------- ---------- ----------
SAINTS     SWANS      DOGS       MAGPIES    TIGERS

1 row selected.

Oracle:How to get first and last rows in an aggregation

create table transactions (
id integer not null,
tdate date not null,
opening_bal number(8,2) not null,
closing_bal number(8,2) not null,
transaction number(8,2) not null,
constraint pk_transactions primary key (id,tdate));

insert into transactions (id,tdate,opening_bal,closing_bal,transaction) values (1,'01-jan-10',0,10,10);
insert into transactions (id,tdate,opening_bal,closing_bal,transaction) values (1,'02-jan-10',10,20,10);
insert into transactions (id,tdate,opening_bal,closing_bal,transaction) values (1,'03-jan-10',20,25,5);
insert into transactions (id,tdate,opening_bal,closing_bal,transaction) values (1,'04-jan-10',25,-15,-40);
insert into transactions (id,tdate,opening_bal,closing_bal,transaction) values (1,'05-jan-10',-15,0,15);
insert into transactions (id,tdate,opening_bal,closing_bal,transaction) values (1,'06-jan-10',0,13,13);


SQL> select * from transactions;

        ID TDATE     OPENING_BAL CLOSING_BAL TRANSACTION
---------- --------- ----------- ----------- -----------
         1 01-JAN-10           0          10          10
         1 02-JAN-10          10          20          10
         1 03-JAN-10          20          25           5
         1 04-JAN-10          25         -15         -40
         1 05-JAN-10         -15           0          15
         1 06-JAN-10           0          13          13

6 rows selected.
SQL> select id, count(tdate) as tcount
  ,max(opening_bal) keep (dense_rank first order by tdate) as opening_bal
  ,max(closing_bal) keep (dense_rank last order by tdate) as closing_bal 
from transactions 
where tdate between '02-jan-10' and '04-jan-10' 
group by id;

        ID TCOUNT       OPENING_BAL CLOSING_BAL
---------- ------------ ----------- -----------
         1            3          10         -15

SQL>

Oracle How to generate random data (without PL/SQL)

This is a handy way of generating random data. The number of rows generated is defined by the "connect by level" (currently set to 5).
column s format A5
column r format 999999999999

select level,dbms_random.random r,dbms_random.value(1,10) v,dbms_random.string('','5') s,sysdate+level d
from dual
connect by level <= 5;

LEVEL     r               V           s      D        
---------- ------------- ---------- ----- ---------
         1     384454788 5.86831193 PBZVI 14/AUG/10
         2     568121861 8.77328309 YHVDP 15/AUG/10
         3    -918508229 6.24565917 CHKXW 16/AUG/10
         4    1409236401 4.15254419 HAZUS 17/AUG/10
         5   -1356731387 4.12237254 HDDDX 18/AUG/10

5 rows selected.

Oracle How to create a user

SQL> create user DEV identified by DEV;
User created.
SQL> grant create session to dev;
Grant succeeded.
SQL> grant resource to DEV;
Grant succeeded.
SQL> grant select any table to dev;
Grant succeeded.
SQL> grant select_catalog_role to dev;
Grant succeeded.
SQL>

Oracle: How to convert to local time

alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss';
select
cast(from_tz(cast (to_date('2011-01-01 16:00:00') as timestamp),'GMT') at local as date) as LOCAL_EVENT_DATE
from dual;

alter session set succeeded.
LOCAL_EVENT_DATE
-------------------------
2011-01-02 02:00:00
1 rows selected

How to check for valid numbers


-- select records from a table where a varchar field contains only valid numbers

-- Method 1
select emp_id,hours_worked
from emp
where trim(translate(hours_worked,'+-.0123456789',' ')) is null

translate(expr_string, from_string, to_string) converts searches through expr_string and replaces every character in from_string to the corresponding character in to_string. If no corresponding character is found (i.e. the nth character in from_string has no corresponding character in to_string), then the character is removed.
Examples
  • translate('12345', '14', '69') => '69345'
  • translate('12345', '41', '96') => '69345'
  • translate('12345', '12345', ' ') => ' '
Note that to_string must have at least one character (a space in the last example) otherwise all characters are removed because to_string is effectively NULL.
Note also that this will not work for values with the pattern 9999-9999
This function can be used to test if a value is numeric by trimming as shown in the SQL snippet above.
-- Method 2
select emp_id,hours_worked
from emp
where regexp_like(hours_worked,'^\d+$')

Oracle Dates

Internal Date Format
Oracle dates are stored as a structure made up of 7 fields each of which is 1 bytes long.
byte 1: century + 100
byte 2: year + 100
byte 3: month
byte 4: day of month
byte 5: hour + 1
byte 6: minute + 1
byte 7: second + 1
create table t (d date);

insert into t values (to_date('2001-05-30 08:05:30','yyyy-mm-dd hh24:mi:ss'));

select dump(d) from t;

DUMP(D)
--------------------------------------------------------------------------------
Typ=12 Len=7: 120,101,5,30,1,1,1
Default Date Format
The default date format is used whenever a date value has to be displayed (e.g. the result of a select). It is set using the nls_date_format initialization parameter. By default it is 'dd-MON-yy'. You can change this for the current session using the 'ALTER SESSION' command.
alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss';
Adding days
Adding days is very simple - just use normal arithmetic operators (+ and -).
alter session set nls_date_format='yyyy-mm-dd';
var d varchar2(10);
exec :d := '2010-01-31';
select :d, to_date(:d)+1 from dual;

:D                               TO_DATE(:D
-------------------------------- ----------
2010-01-31                       2010-02-01
Adding months
Adding months is just as easy but you have to be aware of a little quirk. When adding a month to an end of month date, the result is also the end of month. Note that you can use negative numbers for the second argument.
Moving from a month with 30 days to a month with 31 days could "lose" a day.
exec :d := '2001-04-30';

select :d,add_months(:d,1) from dual;

:D                               ADD_MONTHS
-------------------------------- ----------
2001-04-30                       2001-05-31
This is what could happen when moving from a non-leap year to a leap year.
exec :d := '2003-02-28';

select :d,add_months(:d,12) from dual;

:D                               ADD_MONTHS
-------------------------------- ----------
2003-02-28                       2004-02-29

Oracle Analytic Function

SELECT
SALES_OFFICER_CODE
, SALES_OFFICER_NAME
, SOURCE_START_DATE
, EFFECTIVE_DATE
, END_DATE
, SOURCE_END_DATE
, ROW_NUMBER() OVER (PARTITION BY SALES_OFFICER_CODE ORDER BY SALES_OFFICER_CODE,SOURCE_START_DATE desc) as rn
, FIRST_VALUE(SOURCE_START_DATE) OVER (PARTITION BY SALES_OFFICER_CODE ORDER BY SALES_OFFICER_CODE,SOURCE_START_DATE desc) as fv
, NVL(LAG(SOURCE_START_DATE) OVER (PARTITION BY SALES_OFFICER_CODE ORDER BY SALES_OFFICER_CODE,SOURCE_START_DATE desc),to_date('1900/01/01','yyyy-mm-dd')) as lag
FROM SDMSYSDB.TSDMD_SALES_OFFICER_ROLE

Datastage :Importing Table Definitions With OrchDbUtil

orchdbutil is a command line utility that allows you to print out table table definitions. It connects to the database,
queries the database catalog and prints the table definition to the console. The table definition is an orchestrate
definition - i.e. it uses internal data types rather than SQL data types when defining columns.
The following example shows how orchdbutil is used.

$ orchdbutil show MSF020 -dbtype oracle -server orcl -dboptions {user=jim,password=jim}
##I IIS-DSEE-TFCN-00001 21:27:30(000) 
IBM WebSphere DataStage Enterprise Edition 8.5.0.5746
Copyright (c) 2001, 2005-2008 IBM Corporation. All rights reserved

##I IIS-DSEE-TDOR-00303 21:27:31(001)  Oracle session NLS parameters: CODESET=.1252  NLS_LANG="AMERICAN_AMERICA.WE8MSWIN1252" NLS_LANGUA
GE=AMERICAN NLS_TERRITORY=AMERICA NLS_CURRENCY=$ NLS_ISO_CURRENCY=AMERICA NLS_NUMERIC_CHARACTERS=., NLS_CALENDAR=GREGORIAN NLS_DATE_FORMAT=DD-MON-RR N
LS_DATE_LANGUAGE=AMERICAN NLS_CHARACTERSET=AL32UTF8 NLS_SORT=BINARY NLS_TIME_FORMAT=HH.MI.SSXFF AM NLS_TIMESTAMP_FORMAT=DD-MON-RR HH.MI.SSXFF AM NLS_T
IME_TZ_FORMAT=HH.MI.SSXFF AM TZR NLS_TIMESTAMP_TZ_FORMAT=DD-MON-RR HH.MI.SSXFF AM TZR NLS_DUAL_CURRENCY=$ NLS_NCHAR_CHARACTERSET=AL16UTF16 NLS_COMP=BI
NARY NLS_LENGTH_SEMANTICS=BYTE NLS_NCHAR_CONV_EXCP=FALSE.
##I IIS-DSEE-TCDU-00009 21:27:31(002)  Database connect options: dbname= , server='orcl', dboptions='{user=busi_ldg,passwor
d=busi_ldg}'.
##I IIS-DSEE-TCDB-00002 21:27:31(003)  Database connect options: dbname= , server='orcl', dboptions='{user=busi_ldg,passwor
d=busi_ldg}'Library orchoracle is loaded.
##I IIS-DSEE-TCDU-00001 21:27:31(004)  record
( ENTRY_TYPE: string[1];
  ENTITY: string[10];
  DSTRCT_CODE: string[4];
  DEFAULT_FLAG: string[1];
  PASS_WORD: string[10];
  DEFAULT_MENU: string[10];
  SECURITY_ACCESS: string[1];
  LAST_ACC_DATE: string[8];
  PSW_UPD_DATE: string[8];
  LAST_MOD_DATE: string[8];
  LAST_MOD_TIME: string[6];
  LAST_MOD_USER: string[10];
  CREATION_DATE: string[8];
  GLOBAL_PROFILE: string[10];
  EMPLOYEE_ID: string[10];
  PROFILE: string[250];
  OWNER_ID: string[10];
  OGUI_GLOBAL_PROF: string[10];
  LAST_NO_ERR_LOGS: string[2];
  PROFILE_LOGIN_LCKD: string[1];
  RECENT_PWD_GRP: string[204];
)
$
Unlike the Designer interface, the output is fairly raw. However it provides us with the necessary information to begin our scripting process.

We will use Perl to call orchdbutil, capture and parse its output and then convert the output to DSX format. The script will contain a configuration section that defines database default connection details and other information necessary to build the DSX file. It will accept a number of command line options that change the default values. One of the options is a list of tables. The script will process every table in the list producing a DSX file for each one. These DSX files can then be concatenated into a single DSX and imported using the dscmdimport utility.
Here's how it works.
$ ll
total 24
-rwxrwxrwa   1 Administrators  None              11429 Apr 16 07:07 importTableDefs.pl
-rwxrwxrwa   1 Administrators  None                 16 Apr 16 07:08 tables.lst
$ cat tables.lst
MSF020
MSF034
$ ./importTableDefs.pl -tableList allTables.txt
Exporting table definition for MSF020
Exporting table definition for MSF034
$ cat *.dsx > allTables.dsx
$ dscmdimport /D=kuatro:9080 /U=jim /P=jim /H=kuatro /NUA EPM ./allTables.dsx
Attaching to 'EPM' ...
Importing './allTables.dsx' to project 'EPM' ...
DSImport started at: 07:15:01 2011-04-16

Importing Table definitions
.
Importing Table definitions
.DSImport ended at: 07:15:02 2011-04-16

$
As you can see, we are now able to import dozens of tables very quickly and easily.

Scripting the import process provides a quick and painless way of importing table definitions for any number of tables and, if required, to refresh these on a regular basis. This saves Developers from having to manually import and they can also be confident that they are working with the latest table definitions. Furthermore, the table definition DSX files can be stored in a Change Repository such as subversion. This will keep track of when and what changes where found.
The full listing is in the attached file.

How To Clear Locked datastage Jobs

There are times when a user may be disconnected from the network while a job is open. You would normally use the web console to look for open sessions and then disconnect these but there are some occassions when the sessions do not appear on the web console.
This handy little script is what you need.

cd \IBM\InformationServer\ASBServer\bin
C:\IBM\InformationServer\ASBServer\bin>cleanup_abandoned_locks.bat
Enter the username: jim
Enter the password: jim
C:\IBM\InformationServer\ASBServer\bin>

Note that this script is found (and run) on the DSEngine Server.

Creating a project from the command line using dsadmin

reating a project using the Administrator client is easy and simple. The problem with graphical tools is that it is very difficult to ensure repeatability. Sure you can document the process with lots of pictures but at the end of the day there is still the human factor. The best way of ensuring repeatability is by scripting the process.

dsadmin

The dsadmin utility can be used to create and configure a DataStage project, set project properties and manage project environment variables.
Command Syntax:
        dsadmin [-file    | -domain  -user  -password  -server ]
                         []

Valid primary command options are:
        -createproject
        -deleteproject
        -oshvisible
        -enablercp
        -enablejobadmin
        -envadd
        -envdelete
        -envset
        -advancedruntime
        -basedirectory
        -deploymentdirectory
        -customdeployment
        -listprojects
        -listproperties
        -listenv
        -enablegeneratexml
        -protectproject
I've created the attached shell script that creates a project, sets project properties and then sets the project's environment variables. If the project exists, it simply skips the creation part. This script allows me to create the project and once it is created, I can run the script any time a variable is added or modified.
Once I'm satisfied with my development environment, I can run the process in my test and production environment with different values for project variables. Although environments may have differing sets of project variables, in most cases there will be a common subset. This script allows me to ensure that the environments share this common subset of project variables.

Compiling datastage jobs in parallel

function ds_compile_batch {
  if [[ "$DEBUG" = "Y" ]]; then
    set -x
  fi
  logVerbose "Compiling routines in ${ds_project}@${ds_host}"
  $CMD_DSCC /f /r '*' ${ds_project} >> $COMPILE_LOG 2>&1
  typeset -i COUNTER=0
  SLEEP=5
  cat ${DSJOB_LIST} | while read DSJOB ; do
    logVerbose "Compiling $DSJOB"
    if [[ "$DRY_RUN_FLAG" != "Y" ]]; then
      let COUNTER=$COUNTER+1
      ID="${temp_dir}/${COUNTER}_$(date '+%Y%m%d%H%M%S')"
      $CMD_DSCC /f /j ${DSJOB} ${ds_project} >>${ID}.log 2>&1 &
      PID=$!
      echo $PID > ${ID}.pid
      echo "Compiling $DSJOB (P${COUNTER})" >> ${ID}.log
      # sleep to avoid write errors to c:/Program\ Files/ITCB/ds_logs/dstage_wrapper_trace_n.log
      #sleep 2

      # wait if max processes running
      logDebug "Number of compile processes running $COUNTER (MAX=$MAX_PROCESS)"
      if [[ $COUNTER -ge $MAX_PROCESS ]]; then
      # wait for processes to finish
        WAIT="Y"
        while [[ "$WAIT" = "Y" ]]; do
          sleep $SLEEP
          for PID_FILE in ${temp_dir}/*.pid ; do
            PID=$(cat ${PID_FILE})
            ps -p ${PID} >/dev/null 2>&1
            if [[ $? -ne 0 ]]; then
              THIS_LOG="${temp_dir}/$(basename $PID_FILE .pid).log"
              cat "$THIS_LOG" >> $COMPILE_LOG
              echo "===========================================================" >> $COMPILE_LOG
              WAIT="N"
              rm "$PID_FILE"
              rm "$THIS_LOG"
            fi
          done
          COUNTER=$(ls ${temp_dir}/*.pid 2>/dev/null | wc -l)
          logDebug "Number of compile processes running $COUNTER (MAX=$MAX_PROCESS)"
        done
      fi
    fi
  done
}

Installing and Configuring IBM Information Server 8.5

After a few attempts trying to install on a VM (VMWare Workstation 6.5), I gave up and decided to set up on a physical machine. There were a couple problems with VM:
  1. It recommended a maximum of 2.8G for the guest machine
  2. The VM kept hanging when I tried to install Windows 2003 Enterprise
Anyway after reading the manual, I found out that DS8.5 requires a minimum of 4G of ram for each tier. There are four tiers - client, services, engine and metadata repository. If you intend to put more than one tier on a host, you will require a minimum of 6G. Since this is a development/sandpit box, I've decided to put all tiers on one host and the following is what I ended up with.
  • Intel Quad-core
  • 8G RAM
  • Windows 2003 Enterprise Edition (Standard Edition can only address up to 4G)
Having installed the base system, I then added the following in preparation for IIS installation:
  • Get latest Windows 2003 updates (Service Pack 2)
  • Install a copy of Acrobat Reader
  • Install a copy of Internet Explorer 7
  • Install WIC (Windows imaging component is required by IIS)
  • Install Oracle 10gR2
  • Create a single Oracle database with UTF8 and UTF16 character encoding
  • Install Oracle SQL Developer
  • Run scripts to create XMeta and IA objects (these can be found under DatabaseSupport in the IIS installation pack)
The actuall installation process is all web-based. Setup.exe starts up a web server in a command window. It then starts up a web browser and connects to that server. The installation process goes through quite a few validation checks before finally starting the install but once all the validation checks passed, I could leave it alone to do it's thing which was a good thing because the install process took something like 6 hours to complete.
The first thing I noticed after installation was that there were quite a few new icons on my desktop some of which were familiar others not so.
  • Administrator Client
  • Designer Client
  • IBM Import Export Manager
  • IBM InfoSphere FastTrack Client
  • IBM InfoSphere Information Server Command Line Interface
  • IBM InfoSphere Information Server Console
  • IBM InfoSphere Information Server Documentation
  • IBM InfoSphere Information Server Manager
  • Multi-Client Manager
  • Start ASB Agent
  • Stop ASB Agent
  • Web Console for IBM Information Server
Before I could use the DataStage clients, users and groups had to be created on the Engine host with the correct privileges. After installation, there is only one IIS user - wasadmin. This user needs to be mapped to an Administrator user on the operating system.
Create OS users and groups
  1. Login to the Engine host as administrator
  2. Create a new group (e.g. DataStage Developers)
  3. Create a new user (e.g. etluser)
  4. Add new user to new group
  5. Grant write permissions to new group on IIS server directory and all subdirectories (c:\IBM\InformationServer\Server)
Create/configure IIS users
  1. Login to the web console ("Web Console for IBM InfoSphere Information Server") as wasadmin
  2. Navigate to Administration/Domain Management/Engine Credentials
  3. Select the server engine then click on "Open My Credentials"
  4. Enter the name and password of an Administrator user on the Engine host then click on "Save and Close"
  5. Navigate to Administration/Users and Groups/New User
  6. Add a new DataStage user (e.g. jim) by entering the required fields
  7. Click on "Suite Component" and select "DataStage and QualityStage User"
  8. Click on "Save and Close"
Grant roles to the new user
  1. Open Administrator Client
  2. Navigate to Projects and select a project (e.g. dstage1)
  3. Select Permissions
    Note that wasadmin is shown with the role of "DataStage and QualityStage Administrator" but the new user is not shown.
  4. Click on "Add User or Group"
  5. Change the role to "DataStage and QualityStage Developer"
  6. Click on "OK" and "Close"
At this point the new user (jim) was able to create/manage new jobs with DataStage Designer and DataStage Director.

Datastage Sequence Looping

We've shown how to create DataStage jobs that can process tables and files generically. What we now need is a way to control those jobs in some sort of loop such that a new file or table is passed on every loop. There are many ways to do this - shell script with dsjob, a sequence with the names of the files and/or tables hard-coded into the StartLoop activity, a DataStage Server routine that uses the DataStage API to start and run a list of jobs read from a text file or even another job that calls the UtilityRunJob routine that comes with DataStage.
The approach taken here will be a combination of a Routine, a Sequence and a text file. This design will make maintenance very simple. If a file needs to be added or deleted to or from the list, only the text file and the job parameter changes.
Here's the plan:
  • Create a file containing one record for each file to be processed. Each record is a comma-delimited record containing the file path, the table name and the schema file path.
  • Write a routine that reads a comma delimited file with three fields - file name, table name and schema file name. Each
    record in the file will be loaded into an array and the routine returns the array.
  • Create a sequence that calls the routine once and then loops through each element in the array passing the required fields to a job

Create List of Files

Here is our list of files containing only one entry for this example.

Create Routine

The routine is a fairly simple one. It takes one argument which is the file name (including the path) of the file containing a comma-separated list of parameters for each file to be processed. It opens the file, breaks each line into fields and pushes them on to a dynamic array which it then returns to the caller.
Function ReadFile
Routine = 'ReadFile'

vFileName = Arg1
vArray = ''
vCounter = 0

OPENSEQ vFileName to vFileHandle
  Else Call DSLogFatal("Error opening file list: ":vFileName,Routine)
Loop
While ReadSeq vLine FROM vFileHandle 
  vCounter = vCounter + 1
  vArray = Fields(vLine,',',1)
  vArray = Fields(vLine,',',2)
  vArray = Fields(vLine,',',3)
Repeat
CLOSESEQ vFileHandle

Ans = vArray
Return Ans

Create Sequence


In the first activity of the sequence (uvReadFile), the routine is called and stored in a variable named fileList.

The next step is the loop initialization. We set up a numeric loop with a stop value defined by a job parameter whose value is the size of the list (in this case 1).

Now we set up a variable containing the ith record of the array. This record will contain the set of parameters required to run the job.

The set of parameters is then further split up into individual variables.

Finally we are ready to call the job using the required parameters.

Datastage Schema Files

Using RCP is great when your source is a database whose meta-data can be discovered but what about flat files? How do we create generic jobs that will process flat files.
The answer is Schema files. Schema files allow you to move column definitions from the job to an external file. This way a single generic job can be created to move flat files from source to destination.
There are two steps involved in creating a generic job using schema files:
  1. Create a schema file
  2. Create the DataStage job
    1. Example CSV File

      We have a very simple CSV file to load for this example. This file contains four fields - Name, Street Address, Suburb and Post Code.

      Creating a schema file

      A schema file is simply a text file containing an orchestrate table definition. The shema file for the employee file is shown below.
      Note that the data types used here are internal data types not SQL data types used in column definitions. Note also that the format tab of the Sequential file stage is ignored if a schema file is used. The schema file should contain the formatting options as shown below.

      The DataStage job is again a very simple job. It uses the schema file to define the formatting and meta-data of the sequential file and it uses RCP to propagate the column definitions.

      . The name of the schema file is defined in the Sequential File stage properties tab.

      The use of schema files allows a single DataStage job to populate multiple tables from multiple files. Since the schema files are plain text files, maintenance is easy and no special skills are required.

Datastage: Runtime Column Propagation

In its simplest form, RCP can be used to copy data from one table to another without the need for defining the meta-data. However this feature can be used in any other circumstance where tables may share a subset of column definitions. If the logic to be applied only involves the subset of columns, then a generic job can be used to move the data. A typical example is a set of tables that share one column - a checksum value on the natural key. In this case a generic job can be designed using the Change Capture stage to compare previous and current data producing a dataset containing a delta for the table being processed. This one job can be used to process any number of tables thus producing the necessary files for an incremental load.

Runtime Column Propagation - Simple Copy Job

One simple job is all that's needed to copy any number of tables.

The first thing is to ensure that RCP is enabled for both the project and the job.

Use job parameters to define the source and target table names.

Define properties for source and target database connectors


Note that we have not defined any columns in either the source or the target. At runtime, DataStage will interrogate the source to determine the column definitions and propagate those definitions all the way to the target.

That's all there is to it.
This very simple job can be enhanced in any number of ways.
  • Convert to multi-instance to allow running in parallel
  • Modify the table action in the target stage to Replace rather than Append so that DDL changes are automatically replicated
At the end of the day, we have a single job that can copy any number of tables.

Datastage :Generic Data Processing

ETL is all about moving and transforming data. Transformation is almost always specific to the file or table that is being processed but moving is a generic action. Data is moved from on place to another and the only thing that differs is the structure of the data.
In almost all cases, the very first set of jobs created in an ETL application is used to moved data from the source to a landing area. In the past, a job had to be created for every table or file that had to be moved. DataStage has some nice features that allow us to build a single job that can be used to process data in a generic way.

Datastage:Slowly Changing Dimension (SCD) Stage

There are a number of ways to manage slowly changing dimensions using DataStage. The choice of which method to use depends partly on the type of dimension you are populating.

Types of SCD

  • Type 1 - no history, records are simply overwritten
  • Type 2 - changes trigger an update that expire the existing record and an insert to add the new record
  • Type 1 and Type 2 - fields are classified as either type1 or type 2
DataStage offers specialized stages as well as the more generic stages. Regardless of the implementation, the basic algorithm for adding new records to a dimension is as follows:
for each new record get business key check if business key exists in dimension if so then check if record has changed if so then if (type 1 change) then update dimension record if (type 2 change) then expire dimension record and add new dimension record end if else add new dimension record end if next record
In this tutorial, I will be discussing the SCD stage. I've got a very simple employee table with the following structure.

The natural key for the table is Name. I will be using this table as the source for the employee dimension. The employee dimension structure is shown below.

SCD Stage

The SCD stage doesn't have what I would call an intuitive interface. It takes 2 inputs (a primary link, a reference link) and 2 output links . The primary link is connected to the incoming data while the reference link is connected to the dimension.

Open the SCD stage and there is a type of wizard called "Fast Path". I found this to be the simplest way of editing the stage.

Step 1 - Define the output

Define the output link. The output link is used to propagate fields to fact tables or to other stages including other SCD stages.

Step 2 - Define the lookup fields

On this page, we define the business key, type 1 and type 2 fields, current row indicator and the effective and expiry date fields. Note that if there are any type 2 fields, there must also be a current row indicator.

Step 3 - Define surrogate key source

This will be used to populate the EMP_ID field for new records.

Step 4 - Define the dimension fields

This page is where we define how the dimension fields are populated. Note that END_DATE and CURRENT_ROW_IND have 2 deriviations - one for adding records and one for expiring records. Note that these derivations must be either literals or functions. Also note that Job Parameters cannot be used here.

Step 5 - Define output fields

The final page defines what fields we want to send to the output. We have the choice of fields from the Primary and Reference link.


For my money, the SCD stage is not really great value. It basically combines a lookup and a transform stage but it does it in a very clumsy way. Also, it does not give you the ability to process deleted records. It's designed to process incremental loads so if you have a full load and you need to detect deleted records, this stage will not do it for you.

Datastage: Importing Table Definitions

As we all know, Oracle table definitions can be imported from DataStage Designer. The Designer client provides a simple and convenient interface that connects to the database, extracts the table definition using Orchestrate Schema Definitions and imports the definition into DataStage. A friendly wizard steps us through the process.
Select Import/Table Definitions/Orchestrate Schema Definiitions

Define database connection

This screen is a bit of a mystery to me. It's titled "Select Schema" and implies that you may be able to import more than
one schema at a time but, if this is possible, I have not figured out how.

The "Field Selection" screen allows us to choose which fields we want to import.

Next we define the Table Definition name and the folder it will be imported into.

The next screen allows us to change the default descriptions for the new table definition.

Finally we're ready to import and the wizard asks for final confirmation.

Eureka! - We've done it.

As I said - very simple and convenient.
But what if we need to import dozens or hundreds of tables? Even if we only have a few tables, what if we wanted to refresh the definitions
on a regular basis? This could become quite a tedious chore. Well here's one way of automating the process.

ETL Run Management

This is a very simple run management system. It provides the bare minimum information but still enough to provide elapsed times, error frequency and trends in processing times. It is also quite easy to implement.

Requirements

  • Every run must be uniquely identifiable
  • The following run attributes must be recorded
    • start time
    • end time
    • data start date
    • data end date
    • run status
  • The run identifier must be system generated
  • A run after a failed run will be related back to the failed run

Process Flow


Implementation Details

The run id is defined as a decimal value. The value of the run id is set to the next highest integer after every successful run. If a run is unsuccessful, the next run id is set to the last run id + 0.01. This means that each failed run will be related to the original run. The increment chosen (0.01) allows up to 99 unsuccessful runs.
This process is implemented using a DataStage sequence.

Date Transformations in datastage

DataStage provides a number of format strings that can be used to convert dates to strings and vice-versa. In the following examples, we use a DATE variable (date_value) that contains a value of "2000-02-29".
Description Syntax Result
Convert date to string DateToString(date_value,"%dd-%mm-%yyyy") 29-02-2000
Get short month name DateToString(date_value,"%mmm") Feb
Get long month name DateToString(date_value,"%mmmm") February
Get short weekday name DateToString(date_value,"%eee") Tue
Get long weekday name DateToString(date_value,"%eeee") Tuesday
Note that although the documentation mentions the "%eee" and "%eeee" format strings , they don't seem to work in isolation. I've had success with a format such as "%eee, %yyyy-%mm-%dd" but if I use "%eee" in isolation, I get the following error.
Transformer_2,0: Caught exception from runLocally(): APT_ParseError: Parsing pa rameters "%eee" for conversion "string=string_from_date[%yyyy-%mm-%dd](date)": APT_Conversion_String_Date: Invalid Format [%eee] used for string_from_date type conversion.

Calculating Last Day of Month in datastage


Calculating the last day of the month is a little bit more work but is certainly easier than using if-else statements as I've seen in some jobs. This is how it's done using stage variables. Of course you could just use a single variable with lots of nested functions but it's a lot easier to understand (and maintain) with mulitple variables.

svDate = inputlink.dateVar
svYear = YearFromDate(svDate)
svMonth = MonthFromDate(svDate)
svDay = DaysInMonth(svDate)
svEndOfMonth = DateFromComponents(svYear,svMonth,svDay)

Days Elapsed in datastage

Days Elapsed

Another common requirement is to calculate the number of days between 2 dates. This is easily accomplished with the DaysSinceFromDate function.
DaysSinceFromDate("2001-01-20","2001-01-28") = -8

Adding and Subtracting Days datatage

Adding and Subtracting Days
Converting to Julian date is quite a common way of performing date arithmetic. Once you have a Julian day, you can simply add or subtract any number of days from the original value.
JDATE = JulianDayFromDate(DT_VAR)
TOMORROW = JDATE + 1
YESTERDAY = JDATE - 1
LAST_WEEK = JDATE - 7
Once you have performed your arithmetic, you simply change back to a date.
NEW_DT = DateFromJulianDay(LAST_WEEK)
Another way is to use the DateFromDaysSince function.
TOMORROW = DateFromDaysSince(1,DT_VAR)
YESTERDAY = DateFromDaysSince(-1,DT_VAR)
The great thing about this function is it will accept a string so long as the string is in the format ("%yyyy-%mm-%dd").
Both options have the same drawback - no options to add months or years. Trying to roll your own functions to add months and years (although fairly simple) is a pain at the best of times and it's just another component that has to be deployed. It would be nice to have a built-in function that does the job - and there is. The best and most flexible option is to use the DateOffsetByComponents function. This function provides almost everything you will need for date aritmetic. The function takes four parameters - date, year_offset, month_offset, day_offset. In other words, this function can perform calculations similar to the following examples.
Use DateOffsetByComponents(%basedate%,%yearoffset%,%monthoffset%,%dayoffset%) to:
  • Calculate the date 5 days before or after date
  • Calculate the date 3 months before or after date
  • Calculate the date 2 years before or after date
  • Calculate the date 2 years, 3 months and 5 days before or after the date
Furthermore, this function also accepts strings in the format "%yyyy-%mm-%dd".
Date Year Offset Month Offset Day Offset Result
2004-02-29 0 0 5 2004-03-05
2004-02-29 0 3 0 2004-05-29
2004-02-29 2 0 0 2006-03-01
2004-02-29 2 3 5 2006-06-03

How to manipulate dates in DataStage 8.5

DataStage has two data types that can be used to store dates - DATE and TIMESTAMP. Transforming data often requires that we manipulate dates in some way. DataStage has a rich set of functions for all types of date manipulation yet I constantly see hand-written code to perform operations that are readily available out of the box.
The following is a discussion of some of the more common date operations and transformations.

Let's begin with some definitions of data types straight out of IBM's documentation.

Data Types

SQL Type Underlying Data Type Size Description
Date date 4 bytes Date with month, day, and year
Timestamp timestamp 9 bytes Single field containing both date and time value

Validation

If you are lucky enough to be working with Date types from a database then you won't have to worry about invalid dates but more often than not you will have to validate a string to ensure that you have a valid date. DataStage provides a function that can validate all data types.
IsValid(%typestring%,%valuestring%,[%format%]) 
Where typestring is one of the following:
"date", "decimal", "dfloat", "sfloat", "int8", "uint8", "int16", "uint16", "int32", "uint32", "int64", "uint64", "raw", "string", "time", "timestamp". "ustring"
DataStage also provides a number of convenience functions to make life just that little bit easier so instead of the function shown above you can use
IsValidDate(%date%) 

Date Arithmetic

Adding and Subtracting Days
Converting to Julian date is quite a common way of performing date arithmetic. Once you have a Julian day, you can simply add or subtract any number of days from the original value.
JDATE = JulianDayFromDate(DT_VAR)
TOMORROW = JDATE + 1
YESTERDAY = JDATE - 1
LAST_WEEK = JDATE - 7
Once you have performed your arithmetic, you simply change back to a date.
NEW_DT = DateFromJulianDay(LAST_WEEK)
Another way is to use the DateFromDaysSince function.
TOMORROW = DateFromDaysSince(1,DT_VAR)
YESTERDAY = DateFromDaysSince(-1,DT_VAR)
The great thing about this function is it will accept a string so long as the string is in the format ("%yyyy-%mm-%dd").
Both options have the same drawback - no options to add months or years. Trying to roll your own functions to add months and years (although fairly simple) is a pain at the best of times and it's just another component that has to be deployed. It would be nice to have a built-in function that does the job - and there is. The best and most flexible option is to use the DateOffsetByComponents function. This function provides almost everything you will need for date aritmetic. The function takes four parameters - date, year_offset, month_offset, day_offset. In other words, this function can perform calculations similar to the following examples.
Use DateOffsetByComponents(%basedate%,%yearoffset%,%monthoffset%,%dayoffset%) to:
  • Calculate the date 5 days before or after date
  • Calculate the date 3 months before or after date
  • Calculate the date 2 years before or after date
  • Calculate the date 2 years, 3 months and 5 days before or after the date
Furthermore, this function also accepts strings in the format "%yyyy-%mm-%dd".
Date Year Offset Month Offset Day Offset Result
2004-02-29 0 0 5 2004-03-05
2004-02-29 0 3 0 2004-05-29
2004-02-29 2 0 0 2006-03-01
2004-02-29 2 3 5 2006-06-03

Days Elapsed

Another common requirement is to calculate the number of days between 2 dates. This is easily accomplished with the DaysSinceFromDate function.
DaysSinceFromDate("2001-01-20","2001-01-28") = -8

Last Day of Month

Calculating the last day of the month is a little bit more work but is certainly easier than using if-else statements as I've seen in some jobs. This is how it's done using stage variables. Of course you could just use a single variable with lots of nested functions but it's a lot easier to understand (and maintain) with mulitple variables.

svDate = inputlink.dateVar
svYear = YearFromDate(svDate)
svMonth = MonthFromDate(svDate)
svDay = DaysInMonth(svDate)
svEndOfMonth = DateFromComponents(svYear,svMonth,svDay)

Date Transformations

DataStage provides a number of format strings that can be used to convert dates to strings and vice-versa. In the following examples, we use a DATE variable (date_value) that contains a value of "2000-02-29".
Description Syntax Result
Convert date to string DateToString(date_value,"%dd-%mm-%yyyy") 29-02-2000
Get short month name DateToString(date_value,"%mmm") Feb
Get long month name DateToString(date_value,"%mmmm") February
Get short weekday name DateToString(date_value,"%eee") Tue
Get long weekday name DateToString(date_value,"%eeee") Tuesday
Note that although the documentation mentions the "%eee" and "%eeee" format strings , they don't seem to work in isolation. I've had success with a format such as "%eee, %yyyy-%mm-%dd" but if I use "%eee" in isolation, I get the following error.
Transformer_2,0: Caught exception from runLocally(): APT_ParseError: Parsing pa rameters "%eee" for conversion "string=string_from_date[%yyyy-%mm-%dd](date)": APT_Conversion_String_Date: Invalid Format [%eee] used for string_from_date type conversion.

Sunday, April 7, 2013

Neeyam Thanalinu Thazhe Chords- Cocktail

Guitar tabs / chords for Neeyam Thanalinu Thazhe from Cocktail

Song: Neeyam Thanalinu Thazhe
Movie: Cocktail (2012)
Singer: Rahul Namiar | Vijay Yesudas, Thulasi Yatheendran
Music Director: Ratheesh Vegha


Check out a cover of me and my friends doing this song to get a better idea of how to play :)

 
Place a capo on the 1st fret 

Intro
Em        Bm         Em        G     


Em        C          Em           C


Em                     G                            C                  D
-7-7-7h8-5-7-3-5-5-7-5-3-2h3-0-0-0-2-3-2/3-3/4-4-------|  
-------------------------------------------------------|
-------------------------------------------------------|
-------------------------------------------------------|
-------------------------------------------------------|
-------------------------------------------------------| 
 
Em                            G                            C                  D  
-3/5-7-7-7h8-5-7-3-5-5-7-5-3-2h3-0-0-0-2-3-2-3-2-------|  
-------------------------------------------------3-0-3-|
-------------------------------------------------------|
-------------------------------------------------------|
-------------------------------------------------------|
-------------------------------------------------------| 

Em                         Bm 
Neeyam thanalinu thazhe,
C                       D
Njan ini aliyam kanavukalal
 Em                         Bm
Nin sneha mazhayude chottil,
C                       D 
Njan eni nanayam ninavukalal.
A                          
Kangalal manasin mozhikal
Em
Swanthamaki namal,
A
Neela jalakam nee
Bm
Thuranna neram.
Em                      D
Pakaram hridayamaduram
A                         D
pranayardramam.....
(Neeyam thanalinu thazhe..)

Background
Em            D            C           D

Em                D
Kattu padum aabheri raagam
A                  G         D
Modhamayi thalodiyo,

Bm                       D
Neratha sandhya meghangal ninte
C                          D
nerukayil charthi sindhooram.
Em                        A            
Niramolum nenjil oru thudi thalam thanjum neram
G                         
Tharum poovum theduvatharo
Bm
tharathirimizhiyo.
 Em                            D
Ennalum namonnayi kanum ponvanam
 C                                  D                           
charathaneram kootayi kanum ninchiriyum.
(Neeyam thanalinu thazhe...)

Nila Nila : Chords Unplugged - Tournament

Guitar tabs / chords for Nila Nila Mizhiye (Unplugged) from Tournament 

Song: Nila Nila Mizhiye
Movie: Tournament (2010)
Singer: Karthik
Music Director: Deepak Dev






















Place a capo on 1st fret

E                 Esus4       B5         
----------------------------0---|  
-----0-----0----------0---------|    
-------1-----2-2--------4-------|         
---2----------------4-----4-----|
-2-------0--------2-------------|
--------------------------------|
Nilaa         Nila           Mizhiye
----------------------------0---|  
-----0-----0----------0---------|    
-------1-----2-2--------4-------|         
---2----------------4-----4-----|
-2-------0--------2-------------|
--------------------------------|
Nilaa         Nila           Mozhiye

F#                      E
-----------------------0--------|  
-----2-----------0--------------|    
-------3---3-------1------------|         
---4-----4-----2-----2----------|
-4-----------2------------------|
--------------------------------|
Oliambine        Eruthe


B5                     E
-----------------------0--------|  
-----0-----------0--------------|    
-------4---4-------1------------|         
---4-----4-----2-----2----------|
-2-----------2------------------|
--------------------------------|
Ini   Ente   Sundariye

Now Strum the following chords

E Esus4   (Strumming DDU D)

Esus4 E B5 (DDU DU D)

E                      
----------------|  
-----------0----|    
-------1-----1--|         
-----2---2------|
---2------------|
-0--------------|
Kannadi Kadamizhi Munnil

E                      
----------------|  
-----------0----|    
-------1-----1--|         
-----2---2------|
---2------------|
-0--------------|
Ponnaninja thiri Neeyo

Esus4          E                  

Maayaathe Maayalle

Repeat the same.. 

Kalivakkin Munayude Naavil
Mulluthanne Ennaalum
Theenai theenai nee

G
Parayille ente kaathil

F#
Pathiye nee

Esus4       B5
Njaanalle Njanalle ninmaaran..

Aayiram Kannumai Guitar Chords- Nokketha Doorathu Kannum Nattu


Aayiram Kannumai - Nokketha Doorathu Kannum Nattu

Guitar tabs / chords for Aayiram Kannumai from the movie Noketha Doorath Kannum Nattu

For the version from Thattathin Marayathu (sung by Vineeth), click here.

Song: Aayiram Kannumai
Movie: Noketha Doorath Kannum Nattu (1985)
Singer: Yesudas
Music Director: Jerry Amaldev




Strumming: D UU

Capo 1st Fret

D                                                    A
Aayiram kannumaay kaathirunnu ninne njaan

A
Ennil ninnum parannakannoru

A                       D
Painkilee malar then kilee (painkilee)

(Aayiram)

----------

D                    A
Manjuveenatharinillaa

D
Painkilee malarthenkilee

D                     A
Veyil vannupoyatharinjilla

D
Painkilee malarthenkilee

(Manjuveenatharinjillaa)

Em       F#
Omane nee varum

A                     D
Naalumenniyirunnu njaan

D
Painkilee malarthenkilee

D
Vannu nee vannu ninnu neeyente

Am                 G
Janma saaphallyame

Em                      A
Vannu nee vannu ninnu neeyente

A                    
Janma saaphallyame
(Aayiram)

-----------

D                         A
Thennalum makalekiyo

D                                   A
Kunju thumbi thamburu meettiyo

G        Em           A                     D
Ullile maamayil neela peelikal veeshiyo

D                       G
Painkilee malar then kilee

A                       D
Painkilee malar then kilee (thennalum)

D
Ente ormayil poothu ninnoru

Am                     G
Manja mandaarame

Em                  A
Ennil ninnum parannupoyoru

A
Jeeva chaithannyame

(Aayiram)

Azhalinte Aazhangalil Guitar Chords- Ayalum Njanum Thammil

Guitar tabs / chords for Azhalinte Azhangalil from Ayalum Njanum Thammil

Song: Azhalinte Aazhangalil
Movie: Ayalum Njanum Thammil (2012)
Singers: Nikhil Mathew
Music Director: Ouseppachan

















F#           B                C#   
Azhalinte azhangalil aval maanju poyi

F#           B                C#            F#
Novinte theerangalil njan maathramaayi

Gm#        F#
Irul jeevane pothinju 


Am#      B
chithal praananil menju 

                   Am#      C#
kithekunnu nee shwasame

(Azhalinte..) 

------------


F#        G#          F#          C#

Pinnott nokaathe pokunnu nee

F#              C#           Dm#         B
marayunnu jeevante thirayaaya nee

F#        G#          Am#          B 
annente ull chundill then thulli nee

Am#       B            C#           F#
ini ente ull poovill mizhi neerum nee 

F#         Dm#                G#          Gm#
enthinu vithumbalaayi cherunnu nee 

Am#        B               C#
poku vishaatha raave enn nidhrayill 

F#              A
punaraathe nee...

(Azhalinte..)