Creating Aggregate Tables

This post demonstrates an example how to create aggregation tables in Kettle.

Let's assume the original transformation loading delta is the following:

It reads data from source database, makes some transformations and loads data into data warehouse.

The source table JobEntry has two grouping fields project_id and employee_id and one additive measure registered_hrs. The task is to update aggregate tables (by project_id, and by employee_id) without complete rebuilding them.

Before updating the fact table in data warehouse, we need to retain the current value from the fact table (step Get previous fact value). After the fact table is updated, we are updating the aggregate tables. We calculate difference between the new and old value (step Calculate change), summarize the change of value to necessary granularity (steps Sort rows, Aggregate), and add the change to the value in the aggregate table (steps Get old agg.value, Calculate new value, Insert/update agg table). The transformation may look like this:

Demo Data (SQL Server)

The source database table:

create table JobEntry (
  project_id int,
  employee_id int,
  registered_hrs numeric(22,2)
)

The script changing the table (updating existing or inserting new rows):

declare
   @project_id int = RAND() * 3,
   @employee_id int = RAND() * 3,
   @registered_hrs numeric(22,2) = RAND() * 10

declare @cnt int = (
      select COUNT(*) from JobEntry
      where employee_id = @employee_id
      and project_id = @project_id
   )

if @cnt = 0
begin
   insert JobEntry values (
      @project_id,
      @employee_id,
      @registered_hrs
   )
end
else begin
   update JobEntry set registered_hrs = @registered_hrs
   where employee_id = @employee_id
   and project_id = @project_id
end

select * from JobEntry

The transformation

The data warehouse tables:

create table f_jobentry(
  project_id int,
  employee_id int,
  registered_hrs decimal(22,2)
);
create table f_jobentry_employee (
  employee_id int,
  registered_hrs decimal(22,2)
);
create table f_jobentry_project (
  project_id int,
  registered_hrs decimal(22,2)
);

Kettle transformation

Test

  • Create the necessary tables.
  • Run the script generating data.
  • Run the transformation updating the fact table and aggregate tables.
  • Check the output tables.
  • Run the script.
  • Run the transformation.
  • Check the output tables.

Summary

This approach seems too complex. Maybe the complete rebuilding the aggregate tables is not bad…