BW Data Retention / Archival and Space Management Guidelines
This post provides general guidelines for managing space in the SAP BW Production environment. It describes these guidelines in the context of generic business intelligence goals, principles, and architecture.
Principles of Data Warehousing Data Retention:
Data warehouses exist to serve as the “information memory” of an enterprise. Data warehouses record all the critical business transactions conducted by an enterprise, so decision makers and business strategists can leverage the information to meet their goals and objectives.
The Value of Historical Information:
The business intelligence team must ensure that information captured in the data warehouse supports not only current, known analysis requirements, but also future, unknown requirements.
Low level, historical transaction data captures information about past behavior of the enterprise that no other source of information provides. When we delete this data, we eliminate forever the ability of the enterprise to leverage the knowledge contained in the raw data. If we summarize data and delete the original details, we eliminate forever the ability to produce alternate summaries of that data.
As a general principle, then, the business intelligence group treats low level transaction data as a valuable company asset. It represents the irreplaceable raw material from which the enterprise derives most, if not all, of its information for strategic decision making.
Principle: Once changed or deleted, no process exists to recover the original detailed data. You cannot recreate details omitted from summaries.
The Paradox of Keeping Historical Data
We keep low level, historical data available because we know that some of it will prove valuable and useful in the future by fulfilling yet unknown critical information requirements. Our dilemma arises from the fact that we do not know which data about which transactions will prove the most valuable in the future.
Compared to the costs of bad strategic business decisions, the costs of online and nearline storage technologies look cheap. Most data warehousing practitioners, therefore, have adopted the position that the enterprise should store all of its detailed transaction data, as well as the corresponding master data and metadata, for all time. Data managers should find ways to retain this data that minimize both storage costs and the costs of retrieving data to meet future requirements.
Principle: Archive data no one presently uses, but only delete data permanently when it exceeds its legal retention period.
Behavioral Factors Affecting Data Retention:
Observations made by many enterprises and data warehousing practitioners show some common usage patterns across all business intelligence applications:
- Business analysts want more detail about events in the recent past. They can accept less detail (more summary) for older events. The farther back in time their analysis progresses, the less detail they usually need for that analysis.
- The dimensions by which organizations summarize transaction data tend to remain standard and static for a period of time, but not forever. These standard summaries reflect the areas of leverage upon which analysts have focused their attention.
- During a stable period with standard summaries of data, large amounts of detailed historical data go unused.
- When a major shift in business focus occurs, the standard summaries used in business intelligence applications change accordingly. This change in summary definitions makes some formerly useless historical data useful (and critical) again.
- No one can predict which new standard summaries of historical data will emerge in the future, or which ones will yield the most valuable business insights. Because of this, all currently unused historical data has equal potential value to the enterprise.
Principle: A portion of the historical data currently unused will become valuable in the future. No one knows which portion, however.
Data Retention and BW
This section describes policies and guidelines for managing disk space in BW through the use of various data retention strategies.
The BW Persistent Staging Area (PSA)
The BW Persistent Staging Area, or PSA, provides a temporary, reusable space for staging source data. It increases staging options available to a BW ETL process at the expense of increased load times.
Since BW stores every extracted copy of every record in the PSA (not just the most recent one), this space can fill up very quickly. As a result, BW developers and production applications must actively manage this space and take action to keep it from filling up.
Recommended Uses for the PSA
- Use the PSA to perform asynchronous loads if you require a separation in time between extraction of data and updating of the data targets. You can run an InfoPackage that extracts data only to the PSA, and then update data targets from the PSA at a later time.
- Use the PSA to enable restarts of long-running extractions that frequently fail due to missing or invalid master data.
Managing Data in the PSA
- Do not use the PSA for long term storage.
- The PSA exists to facilitate short-term restarts and reloading after system or process failures. It stores every instance of every record loaded through it, so it can grow very quickly if left unchecked.
- If you use the Data Acquisition layer, then the first level DataStore Objects in the data warehouse layer will always contain a full set of detailed, untransformed data. This layer provides a persistent resource for reloading objects in the Business Transformation layers, should that become necessary. You therefore do not need to preserve PSA data beyond the need for immediate restart of failed processes.
- Do not use the PSA when loading data into the data warehouse layer (first level Data Acquisition DataStore Objects), unless absolutely necessary.
- Eliminate use of the PSA from as many data loads as possible. Use of PSA slows down data loads and consumes large amounts of disk space. Do not use the PSA when you can use other methods to safely reload or restart. For example, in most cases, you can restart a master data load just by running the InfoPackage again.
- Delete PSA entries older than seven days.
- If you use the PSA to stage data, always include a step in your process chain that deletes PSA entries older than seven days.
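The seven-day rule above is, in effect, an age-based cutoff. The following Python sketch illustrates only the selection logic; in a real BW system the deletion runs as the standard PSA-deletion step of a process chain, and the `PsaRequest` structure and `requests_to_delete` function here are hypothetical, not a BW API.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class PsaRequest:
    """Hypothetical stand-in for a PSA load request."""
    request_id: str
    load_date: date

def requests_to_delete(requests, today, retention_days=7):
    """Return the PSA requests older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return [r for r in requests if r.load_date < cutoff]
```

With seven days of retention and a current date of 2024-01-10, a request loaded on 2024-01-01 falls past the cutoff and is deleted, while one loaded on 2024-01-09 is kept.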
Data Retention and Archiving Guidelines
The following guidelines support the standard BW data management architecture and its principles.
Managing Data in the Data Acquisition Layer:
- Never delete correct data from the data warehouse layer, unless it has exceeded its legal retention period.
- Archive unused detail data, but keep it available for quick reloading. The BW system permits archiving of data. Since most historical detail data remains unused for long periods of time, use the BW archiving features to remove the data from disk and place it in offline or near-line storage.
- As a rule of thumb, you can safely archive data more than 12 months old from the Data Acquisition layer.
- Do not use homegrown methods of archiving BW data from DataStore Objects. Only use the BW archiving facility to archive and reload historical data.
DataStore Objects provide built-in detection of changes to records whose keys already exist in the DSO. This feature allows data targets in downstream layers to receive deltas, or offsets, in amounts and quantities when a change appears in the original OLTP system.
If you remove a row from a DataStore Object, and a change for that record occurs in the original OLTP system, BW will not process the change correctly. All subsequent data targets will receive an erroneous value for the amounts and quantities that appear on the record.
If you use BW’s built-in archiving, the system will not permit the changed record to enter the DataStore Object until you have restored the original record from archive. This prevents erroneous amounts and quantities from appearing in the Application Integration and Reporting layers.
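To see why removing rows outside of BW archiving corrupts downstream deltas, consider a simplified model of the DSO change mechanism: when a record with an existing key changes, the DSO emits a before-image that reverses the old amounts and an after-image carrying the new ones. The Python sketch below is illustrative only; the function and field names are hypothetical, not a BW API.

```python
def change_log_delta(old_record, new_record, key_fields, amount_fields):
    """Build the pair of images a DSO would emit when a record with
    an existing key changes: a before-image reversing the old
    amounts, then an after-image with the new amounts."""
    before = {k: old_record[k] for k in key_fields}
    before.update({f: -old_record[f] for f in amount_fields})
    after = {k: new_record[k] for k in key_fields}
    after.update({f: new_record[f] for f in amount_fields})
    return [before, after]
```

If a record with amount 50 changes to 80, downstream targets receive -50 and +80, a net change of +30. If the original row had been removed by a homegrown deletion, the -50 before-image could not be built, and downstream targets would double-count the amount.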
Managing Data in the Business Transformation & Dimensional Reporting Layer:
If you use InfoSources for the Business Transformation layer, then the following items do not apply. For those cases where you use DataStore Objects in the Business Transformation layer, follow these guidelines:
- Archive data from the Business Transformation layer when no longer needed.
- As mentioned above, use only standard BW archiving functions to archive and reload data. Never attempt to archive data with homegrown, custom solutions.
- Typically, you can archive data from the Business Transformation layer that is more than three years old.
The data in this layer supports drill through from the Dimensional Reporting layer in the rare cases where analysts need more detail than the Dimensional Reporting layer provides. It also supports reloading and re-summarizing data in the dimensional reporting layer.
For most applications, three years of online data will meet most of the detailed drill through requirements of current users. In those rare cases where users want to see older data, you can reload it from the archive.
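Taken together, these guidelines reduce to a per-layer online-retention threshold: roughly 12 months in the Data Acquisition layer and three years in the Business Transformation and Dimensional Reporting layers. A minimal sketch of such a policy table follows; the layer names and function are hypothetical illustrations, not BW configuration.

```python
# Online-retention thresholds, in months, per the guidelines above.
RETENTION_MONTHS = {
    "data_acquisition": 12,         # archive after 12 months
    "business_transformation": 36,  # archive after 3 years
    "dimensional_reporting": 36,    # delete after 3 years (reloadable)
}

def is_past_retention(layer, record_age_months):
    """True if a record's age exceeds its layer's online-retention window."""
    return record_age_months > RETENTION_MONTHS[layer]
```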
Managing Data in the Dimensional Reporting Layer:
- Avoid storing detailed data in the Dimensional Reporting layer. Model the InfoCubes to provide the highest level of summary possible that will still support the most frequent, critical analyses. If you must report against transaction level detail in an InfoCube, then make sure your InfoCube includes aggregates that satisfy all but the most detailed requests. For example, you could create an aggregate that includes all characteristics in your cube except for document numbers and calendar day.
- Frequently delete unused data from InfoCubes in the Dimensional Reporting layer. As a rule of thumb, you can safely delete data greater than three years old. If necessary, you can always reload it from the Business Transformation layer.
- Use InfoCube compression as frequently as possible, to reduce the size of the InfoCubes and to speed up reporting performance.
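The aggregate suggested above, which keeps every characteristic except document number and calendar day, amounts to a pre-computed group-by over the remaining characteristics. The Python sketch below illustrates that summarization; the field names are hypothetical, and this is not how BW aggregate maintenance is actually invoked.

```python
from collections import defaultdict

def build_aggregate(fact_rows, drop_chars=("doc_number", "calendar_day"),
                    key_figure="amount"):
    """Summarize fact rows over all characteristics except the
    dropped ones, the way an InfoCube aggregate would."""
    totals = defaultdict(float)
    for row in fact_rows:
        # Group by every characteristic that survives the aggregate.
        key = tuple(sorted((c, v) for c, v in row.items()
                           if c not in drop_chars and c != key_figure))
        totals[key] += row[key_figure]
    return dict(totals)
```

Two detail rows for the same material on different days and documents collapse into one summarized row, which is exactly the detail that cannot be recovered if the underlying rows are later deleted without archiving.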
Courtesy: Project and SAP help documents