The usual approach to build a timestamp for a log files in a batch is to use standard Windows utilities date and time. However the format of the output depends on locale and it is almost not possible to make the script which runs on any machine. A solution might be to create a small java […]
Category Archives: Kettle
Archiving Pentaho log table in SQL Server
Pentaho Data Integration displays only the last 50 log entries but anytime you open a transformation it has to read the whole log table. When the log table gets large, opening a transformation may take long time and PDI freezes for this time. A solution is to archive old entries from the log table. Create […]
Removing time part of a date fails in Kettle
Might be an interesting info. I encountered an error running a Pentaho Data Integration transformation. After simplification I got this: Generate Row creates one row with the date field DATETIME=1981-01-01. Calculator calculates field DATE using operation "Remove time from a date". And this trivial transformation failed with a weird error: Unexpected error : java.lang.IllegalArgumentException: MINUTE […]
How to Obfuscate Password for Command Line Kettle
Pentaho Kettle does not provide possibility to obfuscate password in batch files (though the connection passwords are obfuscated in XML files). You can write: kitchen.bat /rep repos /job test /user admin /pass admin but this will not work: kitchen.bat /rep repos /job test /user admin /pass "Encrypted 2be98afc86aa7f2e4cb79ce71da9fa6d4" Here "Encrypted 2be98afc86aa7f2e4cb79ce71da9fa6d4" is encrypted version of "admin". You can get […]
Handling Records Removal in Dimension Tables
The records can be removed from a table in a operational system, but we never remove records from dimension tables. This post describes how the removal of records can be handled in dimension tables. Let's consider an example. There is a list of contact persons in a source system. The list is loaded into a […]
Creating Aggregate Tables
This post demonstrates an example how to create aggregation tables in Kettle. Let's assume the original transformation loading delta is the following: It reads data from source database, makes some transformations and loads data into data warehouse. The source table JobEntry has two grouping fields project_id and employee_id and one additive measure registered_hrs. The task […]
Use of Index and Characterset
This post demonstrates influence of characterset on use of indexes. See also Language and Use of Indexes in Oracle /* DROP TABLE t; ALTER SESSION SET NLS_LANGUAGE='AMERICAN'; */ SELECT PARAMETER, VALUE FROM V$NLS_PARAMETERS; CREATE TABLE t (x VARCHAR2(255)); INSERT INTO t VALUES ('a'); INSERT INTO t VALUES ('b'); INSERT INTO t VALUES ('c'); INSERT INTO […]
Language and Use of Indexes in Oracle
Symptoms It takes a very long time to load a data warehouse on Oracle database with Kettle when the language in the Windows Region and Language settings is different from English (United States). Cause Oracle JDBC driver used by Kettle issues command ALTER SESSION SET NLS_LANGUAGE and changes default database language. Changing NLS_LANGUAGE also changes […]
Io exception: Socket read timed out
If you get the following message, check the firewall settings. ERROR 30-09 09:13:36,485 – YOUR_LOG_CONNECTION – Error disconnecting from database: Error comitting connection Io exception: Socket read timed out When the transformation/job is started, Kettle opens a connection for logging. The transformation could run for hours, the connection for logging will stay idle all this […]
Influence of Nr of rows in rowset on Merge Join
Let's consider a transformation that merges two flows: Here are some experiments with different Nr of rows in rowset: So the speed of the Merge join depends on the parameter Nr of rows in rowset. It should be reasonably high (3000K). Note that if the parameter is too high, the transformation might fail with exception: java.lang.OutOfMemoryError: […]