DataStage has two data types that can be used to store dates - DATE
and TIMESTAMP. Transforming data often requires that we manipulate dates
in some way. DataStage has a rich set of functions for all types of
date manipulation yet I constantly see hand-written code to perform
operations that are readily available out of the box.
The following is a discussion of some of the more common date operations and transformations.
Let's begin with some definitions of data types straight out of IBM's documentation.
Data Types
SQL Type |
Underlying Data Type |
Size |
Description |
Date |
date |
4 bytes |
Date with month, day, and year |
Timestamp |
timestamp |
9 bytes |
Single field containing both date and time value |
Validation
If you are lucky enough to be working with Date types from a database
then you won't have to worry about invalid dates but more often than
not you will have to validate a string to ensure that you have a valid
date. DataStage provides a function that can validate all data types.
IsValid(%typestring%,%valuestring%,[%format%])
Where typestring is one of the following:
"date", "decimal", "dfloat", "sfloat", "int8", "uint8", "int16", "uint16", "int32", "uint32", "int64", "uint64", "raw", "string", "time", "timestamp". "ustring"
DataStage also provides a number of convenience functions to make life
just that little bit easier so instead of the function shown above you
can use
IsValidDate(%date%)
Date Arithmetic
Adding and Subtracting Days
Converting to Julian date is quite a common way of performing date
arithmetic. Once you have a Julian day, you can simply add or subtract
any number of days from the original value.
JDATE = JulianDayFromDate(DT_VAR)
TOMORROW = JDATE + 1
YESTERDAY = JDATE - 1
LAST_WEEK = JDATE - 7
Once you have performed your arithmetic, you simply change back to a date.
NEW_DT = DateFromJulianDay(LAST_WEEK)
Another way is to use the DateFromDaysSince function.
TOMORROW = DateFromDaysSince(1,DT_VAR)
YESTERDAY = DateFromDaysSince(-1,DT_VAR)
The great thing about this function is it will accept a string so long as the string is in the format ("%yyyy-%mm-%dd").
Both options have the same drawback - no options to add months or
years. Trying to roll your own functions to add months and years
(although fairly simple) is a pain at the best of times and it's just
another component that has to be deployed. It would be nice to have a
built-in function that does the job - and there is. The best and most
flexible option is to use the DateOffsetByComponents function. This
function provides almost everything you will need for date aritmetic.
The function takes four parameters - date, year_offset, month_offset,
day_offset. In other words, this function can perform calculations
similar to the following examples.
Use
DateOffsetByComponents(%basedate%,%yearoffset%,%monthoffset%,%dayoffset%) to:
- Calculate the date 5 days before or after date
- Calculate the date 3 months before or after date
- Calculate the date 2 years before or after date
- Calculate the date 2 years, 3 months and 5 days before or after the date
Furthermore, this function also accepts strings in the format "%yyyy-%mm-%dd".
Date |
Year Offset |
Month Offset |
Day Offset |
Result |
2004-02-29 |
0 |
0 |
5 |
2004-03-05 |
2004-02-29 |
0 |
3 |
0 |
2004-05-29 |
2004-02-29 |
2 |
0 |
0 |
2006-03-01 |
2004-02-29 |
2 |
3 |
5 |
2006-06-03 |
Days Elapsed
Another common requirement is to calculate the number of days between
2 dates. This is easily accomplished with the DaysSinceFromDate
function.
DaysSinceFromDate("2001-01-20","2001-01-28") = -8
Last Day of Month
Calculating the last day of the month is a little bit more work but
is certainly easier than using if-else statements as I've seen in some
jobs. This is how it's done using stage variables. Of course you could
just use a single variable with lots of nested functions but it's a lot
easier to understand (and maintain) with mulitple variables.
svDate = inputlink.dateVar
svYear = YearFromDate(svDate)
svMonth = MonthFromDate(svDate)
svDay = DaysInMonth(svDate)
svEndOfMonth = DateFromComponents(svYear,svMonth,svDay)
Date Transformations
DataStage provides a number of format strings that can be used to
convert dates to strings and vice-versa. In the following examples, we
use a DATE variable (date_value) that contains a value of "2000-02-29".
Description |
Syntax |
Result |
Convert date to string |
DateToString(date_value,"%dd-%mm-%yyyy") |
29-02-2000 |
Get short month name |
DateToString(date_value,"%mmm") |
Feb |
Get long month name |
DateToString(date_value,"%mmmm") |
February |
Get short weekday name |
DateToString(date_value,"%eee") |
Tue |
Get long weekday name |
DateToString(date_value,"%eeee") |
Tuesday |
Note that although the documentation mentions the "%eee" and "%eeee" format strings
, they don't seem to work in isolation. I've had success with a format
such as "%eee, %yyyy-%mm-%dd" but if I use "%eee" in isolation, I get
the following error.
Transformer_2,0: Caught exception from runLocally():
APT_ParseError: Parsing pa rameters "%eee" for conversion
"string=string_from_date[%yyyy-%mm-%dd](date)":
APT_Conversion_String_Date: Invalid Format [%eee] used for
string_from_date type conversion.