Removing time part of a date fails in Kettle

Might be an interesting info.

I encountered an error running a Pentaho Data Integration transformation. After simplification I got this:

Generate Row creates one row with the date field DATETIME=1981-01-01. Calculator calculates field DATE using operation "Remove time from a date". And this trivial transformation failed with a weird error:

Unexpected error : 
java.lang.IllegalArgumentException: MINUTE
	at java.util.GregorianCalendar.computeTime(Unknown Source)
	at java.util.Calendar.updateTime(Unknown Source)
	at java.util.Calendar.getTimeInMillis(Unknown Source)
	at java.util.Calendar.getTime(Unknown Source)
	at org.pentaho.di.core.Const.removeTimeFromDate(Const.java:1958)
	at org.pentaho.di.core.row.ValueDataUtil.removeTimeFromDate(ValueDataUtil.java:628)
	at org.pentaho.di.trans.steps.calculator.Calculator.calcFields(Calculator.java:476)
	at org.pentaho.di.trans.steps.calculator.Calculator.processRow(Calculator.java:165)
	at org.pentaho.di.trans.step.BaseStep.runStepThread(BaseStep.java:2889)
	at org.pentaho.di.trans.steps.calculator.Calculator.run(Calculator.java:626)

After some digging into Pentaho code, I have replicated the error in Java:

import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

public class Prog {

    public static void main(String[] args) throws ParseException {
        //String dateStr = "1982-01-01", timeZoneID = "Europe/Berlin";
        //String dateStr = "1981-01-01", timeZoneID = "Asia/Singapore";
        //String dateStr = "1982-01-01", timeZoneID = "Asia/Seoul";
        String dateStr = "1982-01-01", timeZoneID = "Asia/Singapore";

        DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");
        Date date = dateFormat.parse(dateStr);        

        Calendar calendar = Calendar.getInstance();
        calendar.setTimeZone(TimeZone.getTimeZone(timeZoneID));
        calendar.setLenient(false);
        calendar.setTime(date);
        calendar.set(Calendar.HOUR_OF_DAY, 0);
        calendar.set(Calendar.MINUTE, 0);
        calendar.set(Calendar.SECOND, 0);
        calendar.set(Calendar.MILLISECOND, 0);

        System.out.println(calendar.getTime());
    }
}

Here Asia/Singapore is the time zone setting of the Data Integration server.

The code failed with the same error:

Exception in thread "main" java.lang.IllegalArgumentException: MINUTE
	at java.util.GregorianCalendar.computeTime(GregorianCalendar.java:2482)
	at java.util.Calendar.updateTime(Calendar.java:2265)
	at java.util.Calendar.getTimeInMillis(Calendar.java:1049)
	at java.util.Calendar.getTime(Calendar.java:1022)
	at Prog.main(Prog.java:28)

Interesting is that the code works fine for very close parameters:

  • dateStr = "1982-01-01", timeZoneID = "Europe/Berlin"
  • dateStr = "1981-01-01", timeZoneID = "Asia/Singapore"
  • dateStr = "1982-01-01", timeZoneID = "Asia/Seoul"

Asia/Seoul and Asia/Singapore is actually the same time zone.

The cause of the error has been explained here.

"When you enter a date with no time, the time is assumed to be 12:00:00 AM.

But there was no 12:00:00 AM on January 1, 1982, in Singapore. After 11:59:59 PM on December 31, 1981, Singapore jumped ahead by half an hour to 12:30 AM. It had previously been at UTC+7:30, but moved to the whole-hour zone of UTC+8."