All steps in a transformation run in parallel. The hops between steps act as row buffers. The maximum number of rows each hop can hold is controlled by the transformation parameter Nr of rows in rowset. However, it is important to understand that if this parameter is set too high, Kettle might not be able to allocate the required amount of memory and will fail with the exception java.lang.OutOfMemoryError: Java heap space.
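As a rough sketch of why a large rowset size is risky, the worst-case memory held in hop buffers can be approximated as rowset size times number of hops times bytes per row. The function name below is hypothetical, and the figures (10M rows per rowset, two hops, 100 bytes per row, a 256 MB heap) are taken from the experiment in this post; real rows carry extra per-object overhead, so this is only a lower bound on the worst case.

```python
def max_buffered_bytes(rowset_size, hops, bytes_per_row):
    """Lower-bound estimate of memory needed if every hop buffer fills up."""
    return rowset_size * hops * bytes_per_row

# Assumed figures from the experiment: 10M rows per rowset, 2 hops, 100-byte rows.
worst_case = max_buffered_bytes(10_000_000, 2, 100)

# Heap limit of 256 MB, as set with -Xmx256m.
heap_limit = 256 * 1024 * 1024

print(worst_case)               # 2000000000
print(worst_case > heap_limit)  # True
```

With these numbers the buffers alone could ask for about 2 GB, far more than the heap allows, so the transformation is expected to fail long before the hops are full.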
Experiment:
Heap memory for Kettle is restricted to 256 MB with the JVM parameter -Xmx256m, and Nr of rows in rowset is set to 10M. The following transformation fails after some time, when the amount of required memory reaches the limit.
The error:
UnexpectedError: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
    at org.pentaho.di.core.row.RowDataUtil.allocateRowData(RowDataUtil.java:34)
    at org.pentaho.di.core.row.RowMeta.cloneRow(RowMeta.java:311)
    at org.pentaho.di.trans.steps.rowgenerator.RowGenerator.processRow(RowGenerator.java:151)
    at org.pentaho.di.trans.step.BaseStep.runStepThread(BaseStep.java:2889)
    at org.pentaho.di.trans.steps.rowgenerator.RowGenerator.run(RowGenerator.java:215)
Generate Rows adds a string field containing a 100-byte string. The other two steps do not do anything significant; the Dummy step could be used instead of them.
We can calculate the number of rows held in each hop as the difference between the rows written by the upstream step and the rows read by the downstream step. The hop between Generate Rows and Add sequence holds 6445K - 3650K = 2795K rows, and the hop between Add sequence and Calculator holds 3650K - 3435K = 215K rows. The total is 3010K rows, each of which requires at least 100 bytes, so the required memory is at least 301M bytes, which exceeds the 256M heap limit.
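The same arithmetic can be written as a short script. This is only a sketch: the dictionary layout and function name are made up for illustration, "K" is assumed to mean 1,000 rows, and 100 bytes per row counts only the string payload, not any object overhead.

```python
BYTES_PER_ROW = 100  # lower bound: payload of the generated string only

# (rows written by upstream step, rows read by downstream step), in thousands,
# as reported in the step metrics at the moment of failure.
hops = {
    "Generate Rows -> Add sequence": (6445, 3650),
    "Add sequence -> Calculator": (3650, 3435),
}

def rows_in_hop(written_k, read_k):
    """Rows still buffered in a hop: written minus read, converted from K."""
    return (written_k - read_k) * 1000

buffered = {name: rows_in_hop(w, r) for name, (w, r) in hops.items()}
total_rows = sum(buffered.values())
min_bytes = total_rows * BYTES_PER_ROW
heap_limit = 256 * 1024 * 1024  # -Xmx256m

for name, rows in buffered.items():
    print(name, rows)
print(total_rows)              # 3010000
print(min_bytes)               # 301000000
print(min_bytes > heap_limit)  # True
```

Even this lower bound is above the heap limit, which matches the point at which the transformation runs out of memory.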