Loading Dimension Tables with PyGramETL

There is a promising Python framework for ETL – PyGramETL (http://pygrametl.org/)

This post demonstrates an example how to import data from a CSV file and load it into a dimension table using PyGramETL in comparison with the same functionality implemented in a Kettle transformation.

The input file customer.txt has the following structure:

101;Paul;Brendt;19;(212) 555 2146;10 Jasper Blvd.
102;Robin;McCarthy;29;(214) 555 3075;27 Pasadena Drive
103;Peter;Travis;34;(510) 555 4448;7835 Hartford Drive

Dimention Table

MySQL is used for data warehouse. The following code create the dimension table.

CREATE TABLE customer_dim
    customer_sk INT,
    customer_id INT,
    valid_from DATETIME,
    valid_to DATETIME,
    version INT,
    phone_number VARCHAR(20),
    first_name VARCHAR(20),
    last_name VARCHAR(20),
    age INT,
    address VARCHAR(50)

INSERT customer_dim (customer_sk, version) VALUES (0, 1)


Let’s consider how this can be done in Kettle

The transformation has two steps:

The first step “Text file input” reads the CSV file. The second step “Dimensional lookup/update” generates surrogate key customer_sk as well as technical fields: valid_from, valid_to, version. The update type for first_name and last_name is set to Punch through.


The same can be done with the following code using PyGramETL

import MySQLdb
import pygrametl
from pygrametl.datasources import CSVSource
from pygrametl.tables import SlowlyChangingDimension

myconn = MySQLdb.connect(user='test', passwd='test', db='test')

connection = pygrametl.ConnectionWrapper(myconn)

customer_dim = SlowlyChangingDimension(
    attributes=['customer_id', 'valid_from', 'valid_to', 'version',
                'first_name', 'last_name', 'age', 'phone_number', 'address'],
    type1atts=('first_name', 'last_name'),

customer_dim.defaultidvalue = 0

def load_customer_dim():
    customer = CSVSource(file('./customer.txt', 'r', 16384), delimiter=';')
    for row in customer:

def main():

if __name__ == '__main__':

On the one hand, it is much easier to get overview of a transformation in Kettle. On the other hand, when the ETL transformation is small, it is much easier to see details in the Python code because you do not have to open each step to see its settings.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s