
WORKSHOP

BI
Fundamentals
Data Acquisition
Scope Part 2.
 The first lesson describes the flow of data between BI and the source systems that contain the data.
 The second lesson shows the procedure for loading master data (attributes and texts) from an SAP system.
 The third lesson discusses the data transfer process in more depth and detail. We will cover the available transformation rule types and more advanced start and end routines. In addition, we will visualize our data in the InfoCube upon completion.
Generic Data Warehouse: Positioning of the Data Flow

The ETL process, sometimes called the data flow, is the list of steps that raw (source) data must follow to be extracted, transformed, and loaded into targets in the BI system.
BI Architecture: Positioning of the ETL
Process
BI Data Flow Details
Source Systems and DataSource

 A source system is any system that is available to BI for data extraction and transfer purposes. Examples include mySAP ERP, mySAP CRM, custom systems based on an Oracle database, PeopleSoft, and many others.

 DataSources are BI objects used to extract and stage data from source systems. DataSources subdivide the data provided by a source system into self-contained business areas. Our cost center example includes cost center text, master data, and cost center transaction DataSources from two different source systems. A DataSource contains a number of logically related fields that are arranged in a flat structure and contain data to be transferred into BI.
Source System Types and Interfaces
Persistent Staging Area

 Persistent Staging Area (PSA) is an industry term, but not everyone agrees on an exact definition. In response to a posting on Ask the Experts at DMreview.com, Evan Levy defines a PSA as:
1. The storage and processing to support the transformation of data.
2. Typically temporary.
3. Not constructed to support end-user or tool access.
4. Specifically built to provide working (or scratch) space for ETL processing.
BI 7.0 Transformation

Once the data arrives in the PSA, you then cleanse and transform it prior to physical storage in your targets. These targets include InfoObjects (master data), InfoCubes, and DataStore Objects.
Optional BI InfoSources
InfoPackages and Data Transfer Processes 1

The design of the data flow uses metadata objects such as DataSources, Transformations, InfoSources, and InfoProviders. Once the data flow is designed, the InfoPackages and the Data Transfer Processes take over to manage the execution and scheduling of the actual data transfer. As you can see from the figure below, there are two processes that need to be scheduled.
InfoPackages and Data Transfer Processes 2

The first process is loading the data from the source system. This
involves multiple steps that differ depending on which source
system is involved. For example, if it is an SAP source system, a
function call must be made to the other system, and an extractor
program associated with the DataSource might be initiated. An
InfoPackage is the BI object that contains all the settings directing
exactly how this data should be uploaded from the source system.
The target of the InfoPackage is the PSA table tied to the specific
DataSource associated with the InfoPackage. In a production
environment, the same data in the same source system should only
be extracted once, with one InfoPackage; from there, as many data
transfer processes as necessary can push this data to as many
InfoProviders as necessary.
InfoPackages and Data Transfer Processes
Initiate the Data Flow
InfoPackages and Data Transfer Processes 3

The second process identified in the figure is the data transfer process. It is this object that controls the actual data flow (filters, and the update mode, delta or full) for a specific transformation. You might have more than one data transfer process if you have more than one transformation step or target in the ETL flow. This more complex situation is shown below. Note that if you involve more than one InfoProvider, you need more than one data transfer process. Sometimes necessity drives very complex architectures.
More Complex ETL: Multiple InfoProviders
and InfoSource Use
Loading SAP source system Master Data
Scenario
Global Transfer Routines

Cleansing or transforming the data is accomplished in a dedicated BI transformation. Each time you want to convert incoming fields from your source system to InfoObjects on your BI InfoProviders, you create a dedicated TRANSFORMATION, consisting of one transformation rule for each target object.
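Where a plain field-to-field assignment is not enough, a rule of type routine lets you derive the target value in ABAP. The lines below are a minimal, hypothetical sketch of such a rule-routine body; SOURCE_FIELDS and RESULT are the placeholders supplied by the generated BI 7.0 routine frame, and the field name COSTCENTER is only an assumption for this example.

* Hypothetical rule routine body for a single target InfoObject.
* SOURCE_FIELDS and RESULT come from the generated routine frame;
* the source field COSTCENTER is assumed for illustration only.
  DATA lv_costcenter TYPE c LENGTH 10.

  lv_costcenter = source_fields-costcenter.

* Normalize the incoming key: strip blanks and convert to upper case
  CONDENSE lv_costcenter NO-GAPS.
  TRANSLATE lv_costcenter TO UPPER CASE.

* Skip records that arrive without a cost center instead of loading blanks
  IF lv_costcenter IS INITIAL.
    RAISE EXCEPTION TYPE cx_rsrout_skip_record.
  ENDIF.

  result = lv_costcenter.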
SAP Source System Extraction
DataSource Creation Access and the Generic
Extractor
Replication

In order to access DataSources and map them to your InfoProviders in BI, you must inform BI of the name and fields provided by the DataSource. This process is called replication, or replicating the DataSource metadata. It is accomplished from the context menu on the folder where the DataSource is located. Once the DataSource has been replicated into BI, the final step is to activate it. As of the newest version of BI, you can activate Business Content data flows entirely from within the Data Warehousing Workbench. During this process, activation of the Business Content DataSource in the SAP source system and its replication to SAP NetWeaver BI take place using a Remote Function Call (RFC).
DataSource in BI After Replication
Access Path to Create a Transformation

In this first load process, we are trying to keep it simple. Since we added some custom global transfer logic directly to our InfoObject, we just need a field-to-field mapping for our third step: the Transformation.
Transformation GUI Master Data
InfoPackage: Loading Source Data to the
PSA
Creation and Monitoring of the Data
Transfer Process
Complete Scenario: Transaction Load from
mySAP ERP
Emulated DataSources
Issues Relating to 3.x DataSources
Using the Graphical Transformation GUI
The Transformation Process: Technical
Perspective
Start Routine 1
Start Routine 2
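A start routine runs once per data package before any individual rules are applied, so it is the natural place to drop or pre-process whole records. The following is a minimal sketch, assuming the standard generated frame in which SOURCE_PACKAGE is the internal table of source records; the field CO_AREA and the value '1000' are assumptions for illustration.

* Hypothetical start routine body. SOURCE_PACKAGE is the changing
* parameter provided by the generated frame; CO_AREA and '1000' are
* assumed values for this example.
  DELETE source_package WHERE co_area <> '1000'.

* cx_rsrout_abort can be raised here to terminate the whole request
* with an error if a precondition is not met.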
Transformation Rules: Rule Detail
Transformation Rules: Options and Features
Transformation: Rule Groups

A rule group is a group of transformation rules. It contains one transformation rule for each key field of the target. A transformation can contain multiple rule groups. Rule groups allow you to combine various rules. This means that you can create different rules for different key figures for a single characteristic.
Transformation Groups: Details
End Routine
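An end routine, by contrast, runs after all transformation rules and works on the records in the target format. The following is a minimal sketch, assuming the generated frame in which RESULT_PACKAGE is the internal table of transformed records; the field name CURRENCY and the default 'USD' are assumptions for illustration.

* Hypothetical end routine body. RESULT_PACKAGE is the changing
* parameter of the generated frame; CURRENCY and 'USD' are assumed.
  FIELD-SYMBOLS <ls_result> LIKE LINE OF result_package.

  LOOP AT result_package ASSIGNING <ls_result>.
*   Default a missing currency after all rules have run
    IF <ls_result>-currency IS INITIAL.
      <ls_result>-currency = 'USD'.
    ENDIF.
  ENDLOOP.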
Data Acquisition Layer
Extraction using DB Connect and UD
Connect
UD Connect Extraction Highlights
DB Connect Extraction
Technical View of DB Connect
XML Extraction
XML Purchase Order Example
XML Extraction Highlights
Loading Data from Flat Files: Complete
Scenario
Flat File Sources
Features of the BI File Adapter and File-
Based DataSources

Basically, a DataSource based on a flat file is an object that contains all the settings necessary to load and parse the file when it is initiated by the InfoPackage. Some of the features of the BI file adapter are listed below.
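For example, a comma-separated file of cost center texts for our scenario might look like the purely hypothetical extract below; the Extraction tab of the DataSource holds the settings BI needs to parse it, such as the data separator, the number of header rows to ignore, and the character set.

COSTCENTER,COAREA,LANGU,TXTMD
0001000,1000,EN,Corporate Services
0001010,1000,EN,Facilities Management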
File System DataSource: Extraction Tab
File System DataSource: Proposal Tab
File System DataSource: Fields tab
File System DataSource: Preview Tab
BI Flexible InfoSources
A New BI InfoSource in the Data Flow
Complex ETL: DataSource Objects and
InfoSources
DTP: Filtering Data
Error Handling

The data transfer process supports you in handling data records with errors. The data transfer process also supports error handling for DataStore objects. You can determine how the system responds if errors occur. At runtime, the incorrect data records are sorted and can be written to an error stack (a request-based database table). In addition, another feature, called temporary storage, supports debugging bad transformations.
Error Processing
Features of Error Processing
More Error Handling Features
DTP Temporary Storage Features
Access to the Error Stack and Temporary
Storage via the DTP Monitor
Loading and Activation in DataStore Objects

A standard DataStore Object has three tables. Previously, we described the three tables and the purpose of each, but we only explained that a data transfer process is used to load the first one. In the following section, we will examine the DataStore Object activation process, which is the technical term used to describe how these tables get their data. In addition, we will look at an example to illustrate exactly what happens when data is uploaded and subsequently activated in a DataStore Object.
Let us assume that two requests, REQU1 and REQU2, are loaded into the DataStore Object. This can occur sequentially or in parallel. The load process posts both requests into the activation queue.
Loading Data into the Activation Queue of a
Standard DataStore Object
Activation Example: First Load Activated
Activation Example: Offsetting Data Created
by Activation Process 1
Activation Example: Offsetting Data Created
by Activation Process 2
If the DataStore Object were not in the flow of data in this example, and the source data flowed directly to an InfoCube, the InfoCube would add the 10 to the 30 and get an incorrect value of 40. If, instead, we feed the change log data to the InfoCube, 10, -10, and 30 add up to the correct value of 30.
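Laid out side by side, the two paths give:
Direct load to the InfoCube: 10 (REQU1) + 30 (REQU2) = 40 (wrong)
Load via the change log: 10 (REQU1) + (-10) (before image) + 30 (after image) = 30 (correct)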
In this example, a DataStore Object was required in the data flow
before the InfoCube. It is not always required, but many times it is
desired
Integrating a New Target
MultiProviders

A MultiProvider is a special InfoProvider that combines data from several InfoProviders, providing it for reporting. The MultiProvider itself (like InfoSets and VirtualProviders) does not contain any data. Its data comes exclusively from the InfoProviders on which it is based. A MultiProvider can be made up of various combinations of the following InfoProviders:
 InfoCubes

 DataStore objects

 InfoObjects

 InfoSets

 Aggregation levels (slices of an InfoCube to support BI Integrated Planning)
MultiProvider Concept
Advantages of the MultiProvider

 Simplified design: The MultiProvider concept provides you with advanced analysis options, without you having to fill new and extremely large InfoCubes with data. You can construct simpler BasicCubes with smaller tables and with less redundancy.
 Individual InfoCubes and DataStore Objects can be partitioned separately. “Partitioned separately” can either relate to the concept of splitting cubes and DataStore Objects into smaller ones (for example, one per year) or to partitioning at the database level.
 Performance gains through parallel execution of subqueries.
MultiProviders Are Unions of Providers
Example: Plan And Actual Cost Center
Transactions
MultiProvider Queries
Selecting Relevant InfoProviders for a
MultiProvider
MultiProvider Design GUI
Characteristic Identification in a
MultiProvider
Key Figure Selection
Centralized Administration Tasks
Process Chains: Automating Warehouse
Tasks
Summary of Dedicated BI Task Monitors
Administration / Managing InfoCubes

The Manage function allows you to display the contents of the fact
table or the content with selected characteristic values (through a
view of the tables provided by the Data Browser). You can also
repair and reconstruct indexes, delete requests that have been loaded
with errors, roll up requests in the aggregates, and compress the
contents of the fact table. Select the InfoCube that you want to
manage and choose Manage from the context menu. Six tab pages
appear:
 Contents
 Performance
 Requests
 Roll-Up
 Compress
 Reconstruct (only valid with 3.x data flow objects)
Managing InfoCubes
Requests in InfoCubes
Compressing InfoCubes
Management Functions of DataStore Objects

The functions on the Manage tab are used to manage standard DataStore Objects. There are not as many tabs for managing DataStore Objects as there are in the equivalent task for InfoCubes, since the functions for InfoCubes are more complex. The three tabs under the Manage option for DataStore Objects are Contents, Requests, and Reconstruction.
DataStore Object Administration
Contents and Selective Deletion
DataStore Object Administration: Requests
Tab

The Query icon, indicating readability by BEx queries, is set when activation is started for a
request. The system does not check whether the data has been successfully activated.
DataStore Object Change Log: Maintenance
Required
