## What is inside an OLAP Cube

Filed under OLAP.

What is inside an OLAP cube, and why do we need a special data structure for it? What is the difference between an OLAP cube and the data in a relational database?

When I tried to find information about the data model behind OLAP cubes, it turned out not to be so easy. First, there is no common approach to OLAP: each vendor uses its own physical model of the cube and its own algorithms. Second, the multidimensional approaches are far more complex than those used for relational databases. This post describes one of the approaches.

Let's consider a simplistic example: there are three dimensions – Product, Customer and Date. The fact is a sale of a product to a customer on a date. The measures could be the number of items sold and the total price. Both in OLAP and in an RDB, we store this data as a list of records. Each record contains a reference to the product sold, the customer, the date, the number of items sold and the total price. The difference is in aggregations and indexing.

To provide consistent performance, an OLAP system stores and maintains aggregations for all possible subsets of dimensions (well, not always "all"). A cube based on a subset of dimensions is called a cuboid or a group-by. For an n-dimensional cube there are 2^n dimension combinations, so the full data cube has 1 base cube and 2^n − 1 cuboids. E.g. for n = 3:

• Base cube: (Product, Customer, Time)
• 2-dimensional cuboids: (Product, Customer), (Customer, Time), (Product, Time)
• 1-dimensional cuboids: (Product), (Customer), (Time)
• 0-dimensional (apex) cuboid: the grand total.
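The enumeration of cuboids can be sketched in code: each of the 2^n subsets of the dimension list corresponds to a bitmask. This is a minimal sketch, not tied to any particular OLAP engine:

```java
import java.util.ArrayList;
import java.util.List;

public class Cuboids {
    // Enumerate all 2^n subsets of the dimension list (cuboids)
    static List<List<String>> cuboids(List<String> dims) {
        int n = dims.size();
        List<List<String>> result = new ArrayList<>();
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> cuboid = new ArrayList<>();
            for (int i = 0; i < n; i++)
                if ((mask & (1 << i)) != 0)
                    cuboid.add(dims.get(i));
            result.add(cuboid);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> all = cuboids(List.of("Product", "Customer", "Time"));
        // 8 combinations: the base cube plus 7 smaller cuboids (including the apex)
        System.out.println(all.size()); // 8
    }
}
```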

This inherent feature of an OLAP system can be modeled in an RDB, but this is not so easy and natural. The intended use of OLAP is ad hoc queries with an unpredictable set of dimensions, therefore the best solution is to store all possible combinations. An RDB is mostly used for a fixed set of queries, so storing many aggregations could be wasteful.

Another requirement is consistent performance of slicing (when one or more attributes are restricted to a single value). RDB indexes are usually not good for this purpose: multidimensional data requires multidimensional access methods. The basic problem is the nature of multidimensional data:

> There exists no total ordering among spatial objects that preserves spatial proximity. In other words, there is no mapping from two- or higher-dimensional space into one-dimensional space such that any two objects that are spatially close in the higher-dimensional space are also close to each other in the one-dimensional sorted sequence. This makes the design of efficient access methods in the spatial domain much more difficult than in traditional databases, where a broad range of efficient and well-understood access methods is available.

A solution is to use heuristics that preserve spatial proximity at least to some extent. One of the efficient approaches is the Hilbert R-tree.

### Hilbert R-Tree

Let's consider a 2-dimensional cuboid, e.g. (Product, Time).

We build a Hilbert space-filling curve. It gives us a basis for a one-dimensional ordering of the data.

One-dimensional ordering is necessary because the data is stored on disk, which has a one-dimensional nature. It is important to store the data in an order that allows efficient multidimensional queries. With the Hilbert space-filling curve, it is possible to build an efficient index.
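For illustration, the position of a grid cell along the Hilbert curve can be computed with the classic iterative rotate-and-fold algorithm. This is a sketch; real OLAP engines have their own implementations:

```java
public class HilbertIndex {
    // Map a cell (x, y) in an n x n grid (n a power of two) to its
    // position along the Hilbert curve.
    static long xy2d(long n, long x, long y) {
        long d = 0;
        for (long s = n / 2; s > 0; s /= 2) {
            long rx = (x & s) > 0 ? 1 : 0;
            long ry = (y & s) > 0 ? 1 : 0;
            d += s * s * ((3 * rx) ^ ry);
            // Rotate the quadrant so the curve stays continuous
            if (ry == 0) {
                if (rx == 1) {
                    x = s - 1 - x;
                    y = s - 1 - y;
                }
                long t = x; x = y; y = t;
            }
        }
        return d;
    }

    public static void main(String[] args) {
        // Adjacent cells get adjacent positions: sorting by xy2d keeps
        // spatially close cells close in the one-dimensional sequence
        System.out.println(xy2d(8, 0, 0)); // 0
        System.out.println(xy2d(8, 0, 1)); // 1
    }
}
```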

In one-dimensional space, a B-tree (balanced tree) is commonly used for indexes; it allows searches, sequential access, insertions and deletions in logarithmic time. In multidimensional space, an R-tree is often used. The R-tree is a generalization of the B-tree: it groups nearby points and represents them with their minimum bounding rectangle at the next higher level of the tree.

The Hilbert R-tree is an R-tree variant that uses the Hilbert curve to impose a linear ordering on the data rectangles. The Hilbert-Pack algorithm can be used to build the tree.

In the example, we group points 1..3 into the first rectangle, 4..6 into the second and 7..9 into the third. After this, these three rectangles are covered with a fourth rectangle – the minimum bounding rectangle of the first three. And so on.

The result is a balanced tree (all leaf nodes are on the same level). The maximum number of rectangles in a node is called the capacity; in the example it is 3.

The intuitive reason why Hilbert-based methods result in good performance is that the resulting R-tree nodes tend to be small, square-like rectangles, so the nodes will likely have small areas and small perimeters. Small areas result in good performance for point queries; small areas and small perimeters lead to good performance for larger queries. It was also shown experimentally that the Hilbert curve achieves the best clustering among space-filling curves.

The search starts from the root: it descends the tree and examines all nodes that intersect the query rectangle. At the leaf level, it reports all entries that intersect the query rectangle as qualified data items.
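The packing and search steps can be sketched as follows: points already sorted by their Hilbert value are sliced into leaf nodes of capacity 3, each represented by its minimum bounding rectangle, and a query then examines only the rectangles it intersects. This is a simplified, leaf-level-only sketch:

```java
import java.util.ArrayList;
import java.util.List;

public class HilbertPack {
    // Minimum bounding rectangle over a group of 2D points
    record Rect(int minX, int minY, int maxX, int maxY) {
        boolean intersects(Rect o) {
            return minX <= o.maxX && o.minX <= maxX
                && minY <= o.maxY && o.minY <= maxY;
        }
    }

    // Slice points (already sorted by Hilbert value) into leaves of the
    // given capacity and return the MBR of each leaf
    static List<Rect> pack(List<int[]> sorted, int capacity) {
        List<Rect> leaves = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i += capacity) {
            int x0 = Integer.MAX_VALUE, y0 = Integer.MAX_VALUE;
            int x1 = Integer.MIN_VALUE, y1 = Integer.MIN_VALUE;
            for (int j = i; j < Math.min(i + capacity, sorted.size()); j++) {
                int[] p = sorted.get(j);
                x0 = Math.min(x0, p[0]); x1 = Math.max(x1, p[0]);
                y0 = Math.min(y0, p[1]); y1 = Math.max(y1, p[1]);
            }
            leaves.add(new Rect(x0, y0, x1, y1));
        }
        return leaves;
    }

    public static void main(String[] args) {
        List<int[]> points = List.of(
            new int[]{0, 0}, new int[]{0, 1}, new int[]{1, 1},   // leaf 1
            new int[]{1, 0}, new int[]{2, 0}, new int[]{3, 0},   // leaf 2
            new int[]{3, 1}, new int[]{2, 1}, new int[]{2, 2});  // leaf 3
        Rect query = new Rect(0, 0, 1, 1);
        // The search would descend only into leaves that intersect the query
        for (Rect leaf : pack(points, 3))
            System.out.println(leaf + " intersects query: " + leaf.intersects(query));
    }
}
```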

Ok, that is basically it. This was a short overview of the structure of an OLAP cube.

## Removing the Failed Instances Manually From the Database

Filed under Administration.

An error in BO XI 3.1 SP3 causes millions of failed instances to be generated (SAP note 1448881 "Multiple instances spawned after daylight savings time change"). If the problem is not solved in time, BO becomes slow and services start to fail often. It is usually possible to remove the instances programmatically (SAP note 1568718 "How to delete all failed instances in XI3.1 programmatically"), however this script does not work if the number of failed instances is large – millions of records.

SAP note 1654183 "How to delete all the failed instances in SAP BusinessObjects Enterprise manually" is not that clear and, moreover, it is incorrect. The correct script (for Oracle) is given below.

1. Stop the SIA.
2. Make a backup of the CMS database and the FileStore folder.
3. Execute the following script:

```sql
CREATE TABLE cms_infoobjects6_temp AS (
    SELECT * FROM cms_infoobjects6
    WHERE schedulestatus != 3
       OR schedulestatus IS NULL);

TRUNCATE TABLE cms_infoobjects6;

INSERT INTO cms_infoobjects6
SELECT * FROM cms_infoobjects6_temp;

COMMIT;

DROP TABLE cms_infoobjects6_temp;
```

4. Start SIA and test the system.

## How to Upload an Excel File to CMS Programmatically Using BO Java SDK

Filed under SDK.

```java
static public void uploadFile(IInfoStore infoStore, String filename, String title, int parentId)
    throws SDKException
{
    IInfoObjects infoObjects = infoStore.newInfoObjectCollection();
    // Create a new Excel info object in the collection
    IInfoObject newInfoObject = (IInfoObject) infoObjects.add(CeKind.EXCEL);
    newInfoObject.setTitle(title);
    newInfoObject.setParentID(parentId);
    // Attach the local file to the new object
    newInfoObject.getFiles().addFile(filename);
    infoStore.commit(infoObjects);
}
```

## Example of Token Generation

Filed under SDK.

The following JSP page, logonAuto.jsp, generates a token and logs on to InfoView with the token. It can be placed into the folder:

[BOInstallationFolder]\Tomcat55\webapps\InfoViewApp

### logonAuto.jsp

```jsp
<%@ page language="java" contentType="text/html;charset=UTF-8"%>
<%@ page import="com.crystaldecisions.sdk.exception.SDKException"%>
<%@ page import="com.crystaldecisions.sdk.framework.CrystalEnterprise"%>
<%@ page import="com.crystaldecisions.sdk.framework.IEnterpriseSession"%>
<%@ page import="com.crystaldecisions.sdk.framework.ISessionMgr"%>
<%@ page import="com.crystaldecisions.sdk.occa.security.ILogonTokenMgr"%>
<%
String url = "http://servername:8080";
IEnterpriseSession enterpriseSession = null;
String token = "";
try {
ISessionMgr sessionMgr = CrystalEnterprise.getSessionMgr();
// the user name was lost here; "Administrator" is a placeholder
enterpriseSession = sessionMgr.logon("Administrator",
"", "localhost", "secEnterprise");
ILogonTokenMgr logonTokenMgr = enterpriseSession.getLogonTokenMgr();
token = logonTokenMgr.createLogonToken("", 60, 1);
} catch (SDKException ex) {
ex.printStackTrace();
} finally {
if (enterpriseSession != null)
enterpriseSession.logoff();
}
%>
<html>
<meta http-equiv="REFRESH"
content="0;url=<%=url%>/InfoViewApp/logon/start.do?ivsLogonToken=<%=token%>"/>
<body>
</body>
</html>
```

## CMS operation timed out after 9 minutes

Filed under Administration.

The error "CMS operation timed out after 9 minutes" (see below) means that the BO was not able to complete a query to a CMS database. This could be the case when the CMS database became too large (e.g. if the system is affected by the issue described on the SAP note 1448881). A workaround is to increase the timeout in the registry from 9 minutes (which is the default value) to, say, 60 minutes.

For a 64-bit machine:

## Undocumented parameters of BIAR Command Line Tool

Filed under Administration.

There are a number of undocumented parameters of the BIAR Command Line Tool. You might find some of them interesting.

| Documented | Parameter | Default | Comment |
| --- | --- | --- | --- |
| yes | action | null | importXML/exportXML |
| yes | userName | null | |
| yes | password | null | |
| yes | CMS | null | |
| yes | authentication | null | |
| yes | includeSecurity | true | |
| yes | exportBiarLocation | null | |
| yes | exportDependencies | false | |
| yes | importBiarLocation | null | |
| yes | exportQuery | | |
| yes | exportQueriesTotal | | |
| no | exportFileLocation | null | ? |
| no | importFileLocation | null | ? |
| no | importXmlLocation | null | ? |
| no | exportXmlLocation | null | ? |
| no | exportXsdLocation | null | ? |
| no | rootFolderCUID | null | ? |
| no | useLegacyEngine | false | ? |
| no | token | null | |
| no | resolveDuplicateNames | false | ? |
| no | importFileToFRS | true | ? |
| no | exportFileFromFRS | true | ? |
| no | exportStrict | true | ? |
| no | exportWSStrict | false | ? |
| no | includeHash | false | ? |
| no | importRelationsNotDelta | false | ? |
| no | xsdDelta | false | ? |
| no | validateXML | false | ? |
| no | enforceUnimportable | false | ? |
| no | exportCallPlugins | false | ? |
| no | importCallPlugins | false | ? |
| no | outputIds | false | |
| no | twoStepImport | false | ? |
| no | validateParents | true | ? |
| no | printEvents | false | ? |
| no | stacktrace | false | |

## BusinessObjects Tomcat Configuration in BOE XI 3.1

Filed under Administration.

For the default installation of BOE XI 3.1, the following command opens the Tomcat properties dialog:

`"C:\Program Files\Business Objects\Tomcat55\bin\tomcat5w.exe" //ES//BOE120Tomcat`

## Slowly Changing Fact Tables

Filed under Design.

Let's assume the organization has a huge number of customers, and the task is to track very detailed customer profiles that include customer preferences, market segmentation, customer geography etc. The customer profiles change very often. The resulting table would be very wide, huge and fast-growing.

We can split the dimension table into a number of smaller tables (mini-dimensions). The question is how to join them together to be able to get a complete customer profile as of any point in time.

### Snapshot Fact Table

The first design could be to add a snapshot factless fact table that joins these customer mini-dimensions.

This is not the best solution, as the current snapshot of the customers will be added to the fact table on every load. The table will grow extremely fast.

### Slowly Changing Fact Tables

Most often the rows inserted into the snapshot fact table will be the same as the previous ones, just with a new date. Therefore we can apply the slowly changing dimension technique to the fact table to avoid duplication of rows when the attributes are not changing. Let's call this a slowly changing fact table.

Customer RK is the retained key of the Customer dimension; it is a surrogate key that does not change from one version of the row to another.

If any attribute of a customer profile changes, a new row is inserted into the fact table. The number of rows in this fact table will be the same as in the original dimension table Customer Profiles, however the data will take less disk space.

It is often reasonable to apply the slowly changing fact technique to prevent the snapshot fact table from growing too fast.
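The idea can be sketched in code: a new fact row is appended only when the profile attributes differ from the latest version. Class and field names here are illustrative, not from any real schema:

```java
import java.util.ArrayList;
import java.util.List;

public class SlowlyChangingFact {
    // One version of a customer profile: retained key + mini-dimension keys
    record ProfileRow(int customerRk, int demographicsKey, int geographyKey, String validFrom) {}

    // Append a snapshot only if the attributes changed since the last version
    static void addSnapshot(List<ProfileRow> fact, ProfileRow snapshot) {
        if (!fact.isEmpty()) {
            ProfileRow last = fact.get(fact.size() - 1);
            if (last.demographicsKey() == snapshot.demographicsKey()
                    && last.geographyKey() == snapshot.geographyKey())
                return; // unchanged: no new row, the last row stays valid
        }
        fact.add(snapshot);
    }

    public static void main(String[] args) {
        List<ProfileRow> fact = new ArrayList<>();
        addSnapshot(fact, new ProfileRow(1, 10, 20, "2011-01-01"));
        addSnapshot(fact, new ProfileRow(1, 10, 20, "2011-01-02")); // duplicate, skipped
        addSnapshot(fact, new ProfileRow(1, 11, 20, "2011-01-03")); // changed, inserted
        System.out.println(fact.size()); // 2
    }
}
```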

### References

The ideas here were taken from Kimball Design Tip #50, Factless Fact Tables, and diluted with my interpretations :)

Kimball's Design Tip #50: Factless Fact Tables

## What is Data Vault?

Filed under Design.

The Data Vault model of a data warehouse, proposed by Dan E. Linstedt, is an alternative to the well-known approaches of Ralph Kimball's dimensional modeling and Bill Inmon's 3NF data warehouse. It has not gained much recognition, primarily because the author is not willing to share his ideas for free. You are welcome to support Dan by buying his book or attending the Data Vault certification program :)

The main benefit of Data Vault is the ease of modification: you do not have to alter anything in your current model to add a new dimension or attributes of existing dimension.

The largest drawback of this modeling technique is the larger number of joins compared to other modeling techniques, which hurts performance. Such a complex data model also makes it unsuitable for the data access layer.

So let's consider this modeling technique with an example, starting from the 3NF and dimensional models.

### 3NF Model

Consider the following 3NF data model. Usually it is close to the original model in the data source. There are three master tables (Customer, Order and Product) and one transactional table (Order Line).

SK fields are surrogate keys; BK fields are business keys, aka natural keys. A surrogate key is a generated key used instead of the combination of a business key plus the start date of validity (Valid From).

### Dimensional Model

In our dimensional model, we have a fact table in the middle and dimension tables around the fact table. We added order date as a separate dimension. Customer is directly joined to the fact table.

Nothing new by now, right?

### Data Vault

There are 3 types of tables in a Data Vault model: hub, link and satellite. Let's consider them using an example of transforming the 3NF model into a Data Vault model.

1) Instead of each master table in 3NF, we add a hub and a satellite.

Hub table – contains the business key and the retained key. The retained key is a key mapped one-to-one to the business key. In comparison, a surrogate key includes time, and there can be many surrogate keys corresponding to one business key. This explains the name: the retained key is retained when a new version of a row is inserted, while a new surrogate key is generated.

Why do we need the retained key? There can be many sources, so we cannot use the business key: it may have a different type or a different number of attributes in different source systems.

Link table – contains the retained keys of the hubs. E.g. Customer Order Link links Customer Hub and Order Hub by the corresponding keys – Customer RK and Order RK. Note that the one-to-many relationship was replaced with a many-to-many relationship.

Satellite table – contains attributes of the original tables.
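The three table types can be sketched as plain records. All names here are illustrative, not from any real Data Vault schema:

```java
public class DataVaultSketch {
    // Hub: business key + retained key
    record CustomerHub(long customerRk, String customerBk) {}
    // Link: retained keys of the connected hubs (many-to-many)
    record CustomerOrderLink(long customerRk, long orderRk) {}
    // Satellite: versioned attributes of the hub's entity
    record CustomerSat(long customerRk, String validFrom, String name) {}

    public static void main(String[] args) {
        CustomerHub hub = new CustomerHub(1L, "C-1001");
        CustomerSat v1 = new CustomerSat(1L, "2011-01-01", "Acme Ltd");
        CustomerSat v2 = new CustomerSat(1L, "2011-06-01", "Acme Corp");
        // The retained key stays the same across satellite versions
        System.out.println(hub.customerRk() == v2.customerRk()); // true
    }
}
```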

### Resilience to Change

We can add new attributes to Customer (e.g. Customer Demographics) or even a new parent table (e.g. Delivery) without any changes to the existing tables. We just need to add and populate new tables.

### References

Note that this was a simplified overview, just to give an idea of the methodology. You can find detailed information on the web; I would recommend starting from Wikipedia.

## Top 144 Kimball Design Tips by the Number of References in Google Search

Filed under Data Warehousing.