Information Lifecycle Management (ILM) in SAP BW

This article covers good housekeeping principles specifically related to a SAP business warehouse solution. Deploying housekeeping activities to a business intelligence solution ensures that the solution can run to the optimum level. The article will cover several aspects of housekeeping from PSA deletion to storing data in near line storage.

Prerequisites

Aspects of this article will cover housekeeping of a SAP BW solution deployed using the layered scalable architecture (LSA). The following is an excellent blog on this topic:

https://blogs.sap.com/2009/05/12/sap-netweaver-bw-what-is-the-bw-layered-scalable-architecture-lsa-all-about/

Nearline Storage (NLS)

In this section, we look at Near-Line Storage (NLS) which is a very mature product for managing system size and is very effective when applied to suitable data sets.

Information Lifecycle Management (ILM) is the practice of applying policies for the effective management of information throughout its useful life. Transactional data is allocated a category according to how often it is used or "touched" for read or write. These categories are named according to temperatures. Hot data is used a lot, warm data is used a little, and cold data is used rarely. Eventually data will reach its End of Life (EoL) and can be physically deleted from all stores, when it is gone forever. Data will generally follow a lifecycle over time of being hot, warm, cold and EoL. Not all data will pass through all stages; data can jump from one to another, for example from hot to cold.This section will only cover the storage of cold data. For cold data, NLS is recommended so long as there are appropriate retention periods. NLS can handle large volumes easily without risk to memory.

Near line storage is not archiving. it is a reallocation of existing historical data with permanent access to it. When moving data on to NLS, you need to ensure you follow the following guidelines:

Straggler Management – if you reallocating data for example in a DSO using fiscal year / period and you defined a retention period of 18 months, check the change logs for this DSO (or source system) to ensure no records from the last few month’s extraction has a fiscal year/period older than 18 months. If this is the case, then you need to define a longer retention period for this DSO or this DSO might not be suitable for NLS.

Changing dates – it’s critical that the date you use for reallocating your data can never change its value. If you take clearing date on a document or a purchase order closing date for example, then this is not suitable for reallocating data as the date will change. If you use one of these time characteristics, then you’ll end up slowly double counting as documents get their dates changed and end up in NLS and live BW system at the same time.

Reporting performance - make sure all queries select on NLS time characteristic and there could be potentially poor performance with navigational attributes.

Transaction Data Deletion Strategy

As mentioned above Information Lifecycle Management (ILM) is the practice of applying policies for the effective management of information throughout its useful life. This section covers when transactional data reaches its End of Life (EoL) and can be physically deleted from all stores.

When deploying a LSA model it’s extremely important to have a ILM defined for all of the LSA layers as data is duplicated between the layers.

Once the data moves into End of life (EoL) then it should be deleted from the BW or NLS system. From experience, a deletion strategy of a number of weeks has been applied to the propagation layer (in LSA model). The propagation layer for this client was seen as a data holder and had no relevance to the BI reporting solution.

PSA & Data Acquisition Deletion

The first two layers in a traditional SAP business warehouse extraction utilizing LSA architecture is the persistence stage area (PSA) and data acquisition DSO. Data needs to be deleted for both these staging areas once the data is loaded to the propagation DSO. The deletion steps should be added into the process chain for each extraction area (for example CO-PA extraction).

If you use the new operational data provisioning (ODP) functionality that is available, you can load the data directly into the BW InfoProviders, bypassing the Persistent Staging Area (PSA) layer by using a Data Transfer Processes (DTP) make the PSA redundant.

Change Log deletion

A standard DSO has three tables new, active and change log tables. DSO uses the change log table to manage delta records.

The change log deletion activity should be a weekly activity in a time window when few or no extraction schedules are executing. From my experience, the ideal time is Friday night / Saturday morning but this obviously depends on the client’s overall extraction schedule.

From experience, the ideal retention period for change log’s is 45 days. This allows the support organization enough time to review change logs for incidents that has been raised. If there is pressure to reduce the overall size of the change logs, a suggestion would be to reduce the data flows that are stable and don’t have many data reconciliation issues but keep the problem data flows to at least 45 days.

Infocube compression and partitioning

When you load data into a InfoCube, entire requests can be inserted at the same time. Each of these requests has its own request ID, which is included in the fact table in the packet dimension. This makes it possible to pay attention to individual requests. One advantage of the request ID concept is that you can subsequently delete complete requests from the InfoCube.

However, the request ID concept can also cause the same data record (all characteristics agree, with the exception of the request ID) to appear more than once in the fact table. This unnecessarily increases the volume of data, and reduces performance in reporting, as the system must perform aggregation using the request ID every time you execute a query.

Using compressing, you can eliminate these disadvantages, and bring data from different requests together into one single request (request ID 0). This function is critical, as the compressed data can no longer be deleted from the InfoCube using its request ID. You must be certain that the data loaded into the InfoCube is correct.

With non-cumulative InfoCubes, compression has an additional effect on query performance, the marker for non-cumulatives in non-cumulative InfoCubes is updated. This means that, overall, less data is read for a non-cumulative query, and the reply time is therefore reduced.

If you want to avoid the InfoCube containing entries whose key figures are zero values (in reverse posting for example) you can run a zero-elimination at the same time as the compression. In this case, the entries where all key figures are equal to 0 are deleted from the fact table. Zero-elimination is permitted only for InfoCubes, where key figures with the aggregation behavior ‘SUM’ appear exclusively. In particular, you are not permitted to run zero-elimination with non-cumulative values.

From my experience, compression of infocubes should be done once a week with a retention of 7 days and the ideal time is Friday night / Saturday morning but this activity can be done more frequently if required.

The fact table of the new HANA optimized InfoCube can no longer be partitioned semantically, i.e. using a time characteristic to split the data into disjoint partitions. Query performance is fast even without such partitions on very large fact tables.

However, overall 4 different partitions are created automatically for every fact table: Partition 1 for non-compressed requests, partition 2 for compressed requests, partition 3 for reference points of the inventory data, and Partition 4 for so - called historic movements of inventory data. Partitions 3 and 4 are created also if the InfoCube does not contain any inventory/non-cumulative keyfigure, they are then always empty.

Even so there are no two different fact tables anymore for HANA optimized InfoCubes, you still have the option to run InfoCube - compression, i.e. aggregate an interval of Requests into Request = 0 in the InfoCube. This can be especially helpful, if you expect a significant size reduction due to many inverse bookings of requests. But it is not required to run the compression for query performance reasons, since the impact of size difference before and after compression of the fact table is negligible for the read performance.

Compressing the InfoCube can however have a positive impact on the loadtime, or to be exact the MERGE time for an InfoCube fact table. As mentioned above, fact data for Request > 0 and Request = 0 are stored in different partitions, therefore the MERGE process after loading a Request to an InfoCube only runs on the “Request > 0” partition – and the smaller this partition, the faster the MERGE. But this becomes only relevant for large and very large InfoCubes. “Request > 0” partitions with 10 - 50 million records are not a problem, i.e. if your InfoCube is only of this size, there is no need to run compression at all.

Index Creation / Deletion

With an increasing number of data records in the InfoCube, not only the load, but also the query performance can be reduced. This is attributed to the increasing demands on the system for maintaining indexes.

The system generated indexes that are created in the fact table for each dimension allow you to easily find and select the data. These indexes displayed are the secondary indexes of the F and E fact tables for the InfoCube.

When initially loading data into the InfoCube, you should not create the indexes at the same time as constructing the InfoCube, rather only afterwards.

Using the Check Indexes button in the mange tab of the infocube, you can check whether indexes already exist and whether these existing indexes are of the correct type (bitmap indexes).

Yellow status display: There are indexes of the wrong type

Red status display: No indexes exist, or one or more indexes are faulty

You can also list missing indexes using transaction DB02, push button missing Indexes. If a lot of indexes are missing, it can be useful to run the ABAP reports SAP_UPDATE_DBDIFF and SAP_INFOCUBE_INDEXES_REPAIR.

For delta uploads with a large quantity of data (more than a million records), you should not align the database indexes of the InfoCube with every roll up, rather delete the indexes first, and then completely reconstruct them after rolling up. If the data extraction volume to the cube is less than a million then it’s recommended to delete and recreate the indexes once a week.

The infocube index creation and deletion are not needed when using SAP HANA database.

DB statistics

The database statistics are used by the system to optimize the query performance. You should keep the database statistics up-to-date for this reason. It is recommended that you always update the statistics if you have loaded more than a million new records into the InfoCube since the last update. If the weekly volume to an infocube is less than one million it is recommended that you refresh the DB statistics once a week.

In the manage tab of the infocube, using the pushbutton Check Statistics, you can check the InfoCube to see if statistics exist. If no statistics exist yet, the status display changes to red. Using Recalculate Statistics, you can add the missing InfoCube statistics in the background.

You can determine the percentage of InfoCube data that is used to create the statistics. The percentage is set to 10% by default. The larger the InfoCube, the smaller you should choose the percentage, since the demand on the system for creating the statistics increases with the change in size.

For up to 10 million entries in the InfoCube, you should set the percentage of InfoCube data, that is used for creating the statistics, to 100%.

The DB statistics are not needed when using SAP HANA database.

Housekeeping on System Tables

The following is a list of housekeeping tasks on system tables.

SAP_BW_HOUSEKEEPING

SAP_BW_HOUSEKEEPING is the housekeeping task available in STC01 transaction code. In this task, deletion takes place for the tables contains data related to RSIXWW, Bookmark ID’s, DTP error logs and process chains logs, instances and traces. It’s recommend to run this housekeeping task’s weekly

RSDDSTAT_DATA_DELETE

The RSDDSTAT_DATA_DELETE program will deletes data from BW statistics tables including query statistics, deletion takes for the objects like Statistic Logging Tables, Aggregates/BWA Index Process, Delete Data, BWA Access Counter, DTP Statistics, WHM Statistics, Workspace Proposal-Statistic, BW2HANA auth. Generation and Data Slice statistics with a retention of one year. It’s recommend to run this housekeeping task’s weekly

RSORAVDV

The RSORAVDV program will display all BW shadow tables for Alerting, cockpit information, Collector jobs, Web Reports, Test Data, Lookup Table, Memory Allocation, Job Information, Object Status, DB Optimization and Sub-operations. This can be executed on an adhoc basis.

RSBATCH_DEL_MSG_PARM_DTPTEMP

The RSBATCH_DEL_MSG_PARM_DTPTEMP program will delete Old Messages, Parameters and Temp DTP Data for DTP’s. It’s recommend to run this housekeeping task’s weekly

RSBK_REORG_RSSTATMANREQMAP_DEL

The RSBK_REORG_RSSTATMANREQMAP_DEL program will delete old entries from the RSSTATMANREQMAP table. It’s recommend to run this housekeeping task’s weekly

RSSTATMAN_DELETE_OLD_REQMDEL

The RSSTATMAN_DELETE_OLD_REQMDEL program will delete old entries from the RSSTATMANREQMDEL table. It’s recommend to run this housekeeping task’s weekly

RSRA_CLUSTER_TABLE_REORG

RSRA_CLUSTER_TABLE_REORG will clean up reporting agent and broadcaster bookmarks from BW Production system. This program will delete all the user created bookmarks. It’s recommend to run this housekeeping task’s weekly.

RSBM_ERRORLOG_DELETE

Program RSBM_ERRORLOG_DELETE will delete all the DTP Error Logs from BW Production system. It’s recommend to run this housekeeping task’s weekly

RSPC_INSTANCE_CLEANUP

RSPC_INSTANCE_CLEANUP will delete old logs for process chains from BW Production system. It’s recommend to run this housekeeping task’s weekly

Business Intelligence Solutions Support Survey

If you’re open to contributing, we’d love to hear your views in a survey that we’re conducting.

The survey has 17 questions but we’ve carefully designed and tested it to ensure it can be completed quickly, it should take no longer than 10 minutes.

In return for doing so, we’ll send you a report on the study’s full findings and we think you’ll find the results to be of value to your business.

Survey

https://docs.google.com/forms/d/e/1FAIpQLSf1_eKK36uS6fQ7LOUCX3zoauhiydE2Pzwm7bGFcPQhp1S_VQ/viewform