
History Preserving with precise timestamps

Posted by Martin Bernhardt 13-Jul-2015


SAP Data Services' HistoryPreserving transform does a
good job of reducing the complexity of history preservation
when loading data into a data warehouse. However, it has
the limitation that the ValidFrom and ValidTo columns can
only be dates, not timestamps. So to allow for history
preservation of intra-day changes, we need a workaround.
In this blog post I show how this can be achieved by
using the Map_Operation transform multiple times:

The dataflow shown above loads table CUSTOMER into
table CUSTOMERHIST. CUSTOMER has 3 columns (PK
INTEGER, FirstName VARCHAR(100), LastName
VARCHAR(100)) with PK being the primary key.
CUSTOMERHIST has two more columns, ValidFrom and ValidTo,
both of type TIMESTAMP; its primary key is (PK,
ValidFrom). We also need to set a variable $CurDateTime
at the beginning of the job so that the exact same
timestamp is used in UPDATEs and INSERTs:

$CurDateTime = concat_date_time(sysdate(), systime());

The TableComparison transform looks up incoming
records in CUSTOMERHIST that have the same value in
field PK and where ValidTo is null (compare to current
records only). In this example we also enable "Detect
deleted row(s) from comparison table".

TableComparison outputs rows of type INSERT, UPDATE
and DELETE. We multiply this output and send it to four
Map_Operation transforms: one for INSERTs (I2I), two for
UPDATEs (U2U, U2I) and one for DELETEs (D2U). In the
"Map Operation" tab of each Map_Operation transform we
configure the output type of the record; we discard the
records that are handled by the other Map_Operation
transforms:

Operation/Transform   NORMAL    UPDATE    INSERT    DELETE
I2I                   NORMAL    DISCARD   INSERT    DISCARD
U2U                   DISCARD   UPDATE    DISCARD   DISCARD
U2I                   DISCARD   INSERT    DISCARD   DISCARD
D2U                   DISCARD   DISCARD   DISCARD   UPDATE
Now we set the column mappings for each case:

I2I:
If TableComparison sends an INSERT, there's not much
to do. We keep the values as they are and just set the
ValidFrom column to $CurDateTime.

U2U:
If TableComparison sends an UPDATE, the first action is
to set ValidTo to $CurDateTime, since this record is no
longer the current one. We want to keep FirstName and
LastName as they are in CUSTOMERHIST and not overwrite
them with the new values from CUSTOMER, so we map
FirstName and LastName to before_image(FirstName) and
before_image(LastName), respectively.
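For reference, the mapping expressions in the update section could look like the sketch below (the column names are the ones from the example; before_image() returns the value of a column as it was before the update):

# Mapping expression for ValidTo (update rows): close the record
$CurDateTime
# Mapping expression for FirstName (update rows): keep the stored value
before_image(FirstName)
# Mapping expression for LastName (update rows): keep the stored value
before_image(LastName)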

U2I:
If TableComparison sends an UPDATE, the second action
is to insert a new record with the current values from
CUSTOMER. As with I2I above, we just need to set ValidFrom
to $CurDateTime. Note that we need to change the update
section (middle), not the insert/normal section (left).

D2U:
If TableComparison sends a DELETE, we need to update
the current record by setting ValidTo to $CurDateTime.

With this configuration, the four Map_Operation
transforms together replace one HistoryPreserving
transform. The example does not include an ISCURRENT
column, but it should be straightforward to add this
enhancement. If there is a generated key column in
CUSTOMERHIST, it could be populated using the
KeyGeneration transform after merging U2I and I2I. The
pictures below show the status of both tables after
inserting, updating and deleting a record in table CUSTOMER:
History preserving after INSERT of record 4711:

History preserving after UPDATE of FirstName to 'John':

History preserving after UPDATE of FirstName to 'John D.':

History preserving after DELETE of record 4711:

File path configuration in BO Data Services

Posted by Debapriya Mandal 19-Apr-2012
In this article I shall be describing how to configure
BODS for file access so that multiple teams can work
simultaneously.

In this particular situation there were 2 different
projects/releases being developed simultaneously: REL1
and REL2. For each of these releases, 2 different
teams were involved: the development team and the
testing team. These projects have a number of tab-delimited
flat files as input. After cleansing,
transformation and validation in BODS, the output is
again written to flat files so that they can be loaded into the
SAP system by LSMW transaction codes. We need to
maintain different paths for input files and output files
based on the release and the usage (DEV or TEST).
This can be achieved by maintaining the separate paths
to be used by the different teams on the BOBJ server and
then maintaining references to these paths via the database.
As shown in the image below, we have created a table
(TEST_BOBJ_PARAMETERS) which stores the file paths
configured on the BODS server in the column OBJ_TEXT.
The columns RELEASE, CONVERSION and OBJ_NAME can
be used as a key to determine the required folder path.
Image for table TEST_BOBJ_PARAMETERS

For example, if I am part of the development team and I
need to determine the input and output folder paths for
Release 1, then I shall be using the below key values:

For the input folder: RELEASE = REL1, CONVERSION = DEV, OBJ_NAME = INPUT_FOLDER
For the output folder: RELEASE = REL1, CONVERSION = DEV, OBJ_NAME = OUTPUT_FOLDER
In order to ensure uniform usage of these settings, the
following global variables need to be created in every job:

An initializing script can then set the input and output
folder paths depending on the key values. This script will
contain a SQL query against the table TEST_BOBJ_PARAMETERS
and will fetch the file path based on the keys set in the
global variables.
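For illustration, such an initializing script could look like the sketch below. It assumes the parameter table is exposed through a datastore called PARAM_DS and that the global variables are named $G_RELEASE, $G_CONVERSION, $G_INPUT_FOLDER and $G_OUTPUT_FOLDER; these names are placeholders, not the ones used in the actual project.

# Sketch: fetch the folder paths for the keys set in the global variables.
# Curly braces make Data Services substitute the variable value with quotes.
$G_INPUT_FOLDER = sql('PARAM_DS', 'select OBJ_TEXT from TEST_BOBJ_PARAMETERS
    where RELEASE = {$G_RELEASE} and CONVERSION = {$G_CONVERSION}
    and OBJ_NAME = \'INPUT_FOLDER\'');
$G_OUTPUT_FOLDER = sql('PARAM_DS', 'select OBJ_TEXT from TEST_BOBJ_PARAMETERS
    where RELEASE = {$G_RELEASE} and CONVERSION = {$G_CONVERSION}
    and OBJ_NAME = \'OUTPUT_FOLDER\'');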

In the actual case it is sufficient to set the folders once only
with the correct set of KEY values. For demo purposes I
have run the job after setting the KEY values for different
combinations of RELEASE and CONVERSION.

Data Services code migration & maintenance: simplify your lifecycle & improve your code

Posted by Scott Broadway 06-Nov-2012
For so many of my customers, SAP Data Services is a
relatively new tool. A typical DS project is mainly focused
on ensuring the solution works for the business and is
launched on time. Unfortunately, many of these projects
fail to utilize some of the built-in features of Data Services
to help simplify how code is managed in the solution. This
is an architecture gap that adds hidden costs to owning
and operating the solution.

In this article, I outline the framework for managing Data
Services code that I have taught to dozens of the largest
customers in the Americas. Ideally, you should implement
this during the blueprint phase so that you provide your
developers with the tools and processes to create better
code the first time. However, you can still benefit from
this framework even if you are approaching the testing
phase of a Go-Live.

The elements of this framework include:

1. Implement multiple central repositories
   1.1 Additional best practices for central repositories
2. Define substitution parameters & multiple substitution parameter configurations
3. Define datastore configurations
   3.1 Use aliases to map table owners (optional)
4. Define system configurations to map together combinations of substitution parameters & datastore configurations

1. Implement multiple central repositories

In Data Services, a "central repository" is a different type
of repository used only for version control of objects. This
is comparable to version control systems like CTS+,
Visual SourceSafe, Apache Subversion, BusinessObjects
LCM, etc. You can check in new code, check it out to work
on it, check in new versions, and get copies of specific
versions.

Many customers do not use central repositories. Instead,
they create their code in a local repository, export the
code to an ".atl" file, and import it into the test or
production local repository. You can save backups of
the .atl file and keep track of them in a number of
ways...even Apple Time Machine and Dropbox can keep
track of multiple versions of a file through time. However,
this is likely not a scalable or trustworthy solution for
enterprise IT.

If you want to learn how to work with a central repository,
the Data Services Tutorial Chapter 12 "Multi-user
Development" does a fantastic job at demonstrating all
the techniques. The "Using a Central Repo" Wiki page also
captures some of the basic techniques. But neither will
tell you why, or discuss how you should set up your
landscape or processes.

[Note: There are two different types of central
repositories: non-secure and secure. Secure central
repositories allow only specific users permissions on
specific objects and provide an audit trail of who changed
which objects. Non-secure central repositories lack these
features. Due to this gap, I never recommend the use of
non-secure central repositories. In this article, whenever I
refer to a central repository, I am talking about secure
central repositories. Chapters 23-24 in the Data Services
Designer Guide discuss these differences.]

This is how I recommend you configure your secure
central repositories.

Development Central -- a central repository that
can be accessed by developers and testers. Developers
create content in their local repositories and check this
content into the development central repository. Each
logical set of code should be checked in with the same
label (e.g. "0.1.2.1a") so that the objects can be easily
identified and grouped together.
During a test cycle, a tester logs into a local repository
dedicated to testing and connects to the development
central repository. The tester gets all objects to be tested
from the development central repository. The tester then
deactivates the connection to the development central
repository and connects to the test central repository.

Test Central -- a central repository that can be
accessed by testers and production administrators.
During the test cycle, testers check in development
objects before and after testing, labeling them
appropriately (e.g. "0.1.2.1pretest" and "0.1.2.1passed").
Thus, the test central repository contains only objects
that have been promoted from development to test and
have passed testing.

Production Central -- a central repository that can
be accessed only by production administrators. When
testers certify that the code can be migrated to
production, a production administrator logs into a
production local repository. The administrator activates a
connection to the test central repository and gets a copy
of all objects to be promoted to production (e.g.
"0.1.2.1passed"). The administrator deactivates the test
central repository and then activates the production
central repository. All objects that were promoted into
production are then checked into the production central
repository (e.g. "0.1.2.1prod"). Thus, the production
central repository contains only objects that have been
successfully put into production.

Remember, central repositories are only for version
control, storing your code, and helping you migrate it. You
never run batch jobs or launch real-time services from a
central repo -- only from a local repo.

This tiered approach looks like this:

The repositories themselves are just database schemas -- you can
put them in any supported database. Check
the Product Availability Matrix for Data Services to see
which databases are supported. However, I would
recommend grouping them together within the
same physical database within your specific tier. For
instance:

Dev database -- dev local repositories, dev central
repository, and dev CMS database. Co-located with the
dev Data Services hardware.

Test database -- test local repository, test central
repository, and test CMS database. Co-located with the test
Data Services hardware.

Prod database -- prod local repository, prod
central repository, and prod CMS database. Co-located
with the prod Data Services hardware.

1.1 Additional best practices for central repositories

Security -- Set up group-based permissions for
repository authentication and for individual objects. Refer
to the Designer Guide section 24.1.1, Management
Console Guide section 3.3.1, and Administrator's
Guide section 4.1.

Checking out datastores -- Using the security
features of secure central repositories, make sure that
only specific groups have read+write permissions on
datastores. Everyone always has permission to edit
datastores in their local repository, but it would be
disorganized to let all of them check in these datastore
changes to the central repository. Thus, you should have
administrators create your datastores and check them
into the central repository. Anyone can get them from the
central repo, but only administrators have permission to
check them out, modify them, and check in their
changes. For more info on defining datastores, see
"3. Define datastore configurations" below.

Backup -- These repositories contain most of your
investment in your DS solution! Make sure to back up
these databases regularly as you would any other
database. Too often I see no backups taken of the
development central repository because "it's not part of
the productive tier." This is a terrible business decision!
What if your development central repository database
crashes and your developers lose everything?

Designer performance -- DS Designer requires a
database connection to the local and central repositories.
I always meet people who complain about Designer being
too slow. Okay, but you are using Designer on your laptop
in the Toronto airport over a VPN connection to your
Seattle network hub, and the repo database is in your
Chicago datacenter. Designer performs numerous small
transactions that each require network round-trips -- if
the connection is slow, Designer is going to be slow to
save anything to your local repository or interact with a
central repository.
Are you regularly using a thick-client Windows app like
Designer from remote locations? Maybe you should think
about putting Designer on Citrix Presentation Server -- check
the Installation Guide for Windows section 6.6.
Additionally, Designer 4.1 introduces the ability to use DS
under multiple Windows Terminal Server users.

Concurrent usage -- I often hear issues about
developers connected to the same central repo who have
their Designer hang on them whenever their
colleagues do anything ("Get Latest Version", "Check-Out
by Label", etc.). To protect the code from being corrupted
by multiple people trying to do multiple things at the
same time, Designer performs table locking on certain
central repo tables. While one user has an exclusive table
lock on a central repo table, any other users trying to
interact with the same table will be queued until the first
user's exclusive table lock is released. How to work
around this? Simple -- don't keep your connection to the
central repo active all the time. There's a Designer option
that activates a central repo connection
automatically, and you should disable this option. Only
activate your central repo connection when you need to
get code from or check code into the central repo.

2. Define substitution parameters & multiple substitution parameter configurations

Substitution Parameters are such a handy feature, but I
seldom see them used to their full potential! If you know
C++, they are similar to compiler directives. They are
static values that never change during code execution (so
we don't call them variables). They are called
"substitution" parameters because their values
get substituted into the code by the optimizer when you
run the job. They can thus change the run-time behavior
of your code.
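As a quick illustration, once a substitution parameter is defined you reference it like a constant and its value is substituted at run time. The sketch below uses a made-up parameter name, $$STAGING_DIR, in a script:

# Sketch only: $$STAGING_DIR is an illustrative substitution parameter name.
# Inside a string, [$$STAGING_DIR] is replaced with the configured value.
print('Reading files from [$$STAGING_DIR]');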

Often I see many programmers use a script block at the
beginning of a job to set global variable values. These
global variables are then used to control the logic or
mappings later in the job. However, in 90% of these cases
the global variables NEVER CHANGE during runtime. So
now you have several problems in your code:

You hid your global variable declarations in a script
somewhere in your job. How do you expect other people
to understand what you did in your code?

A global variable is specific to one job only. Other
jobs do not inherit global variable names, types, or
values. So if you have 100 jobs that use a variable named
$START_DATE, you have to declare $START_DATE in every
one of those 100 jobs.

Global variables have no way of being set quickly en
masse. You can override them individually at run-time,
but this introduces the risk of human error.

Substitution parameters fix all of these global variable
shortcomings. They are defined for an entire repository,
not per individual job. Their values are controlled at a
repository level, so you don't have to include scripts to
set them. They cannot change at run-time, so they
don't run the risk of being modified erroneously. Lastly,
they don't just have one default value -- you can set up
multiple substitution parameter configurations for your
repository so that you have multiple different sets of
run-time values.

Here are some common uses for substitution parameters:

File paths and file names -- tell jobs where to find
files in a specific staging area or target location. If you
always set flat file and XML file sources and targets to use
substitution parameters instead of hard-coded paths, you
can change all file locations at once globally instead of
having to find every single object, drill into it, and change
the path. This is also used to specify reference data
locations.

Control logic -- tell the same job how to run
differently if a different substitution parameter value is
found. You can use this to set up one job that does both
initial loading and delta loading. You can have a
conditional block evaluate a parameter named [$$IS_DELTA]
and decide whether to process the "delta"
workflow or the "initial" workflow (see the sketch after
this list). This lets you have fewer jobs and simplifies
your life!

Transform options -- tell transforms to behave in a
specific way. This is often used in Data Quality transform
options to set country-specific options, engine options,
performance parameters, or rules. However, you can use
them in most of the transforms and mappings to override
hard-coded values with your own substitution
parameters.
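Written as a script, the [$$IS_DELTA] branch mentioned above could look like the sketch below (the parameter name comes from the text; the printed messages stand in for the delta and initial workflows, which in a real job would hang off the branches of a Conditional):

# Sketch: branch on the substitution parameter $$IS_DELTA ('Y' for delta).
if ('[$$IS_DELTA]' = 'Y')
begin
    print('Delta run: process the delta workflow');
end
else
begin
    print('Initial run: process the initial workflow');
end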

Substitution Parameter Configurations are helpful
because they let you define multiple different sets of
substitution parameter values. You can use this to set up
multiple configurations for:

Dev / Test / Prod

Initial vs. Delta

Enabling verbose debug code in your own script blocks or custom functions

Specifying multiple file paths, e.g. fileshares in Chicago, L.A., Shanghai, Wrocław, and San Leopoldo.

Substitution Parameters are not objects that can be
checked into a central repository, since they aren't
actually code objects. As such, there is a specific way to
move them between local repositories. You must export
them to an ATL file and import them into another local
repository. Please refer to the example below:

This is an additional step to include in your migration
plans from Dev -> Test -> Production. However, it is a
relatively quick procedure for an administrator.

3. Define datastore configurations

Datastore mistake 1: In many customer environments, I
log into a local repository and see several datastores
named similarly ("HANA_TARGET_DEV",
"HANA_TARGET_QA", and "HANA_TARGET_PROD"). Or
maybe I see many SAP datastores named after their SIDs
("BWD", "BWQ", "BWP"). If you make this mistake, you
need to go through the following unnecessary steps:

If you move a job from development to test, you
have to edit every single dataflow and delete every single
table object, replacing the table objects from datastore
"HANA_TARGET_DEV" with the ones from
"HANA_TARGET_QA".

This increases the risk of human error -- what if you
pick the wrong table by mistake?

This increases the number of table objects to
maintain -- you have to import the same table object 3
different times, once from each different datastore.

You risk having differences in the table metadata
from the different development/test/production
datastores. Don't you want to ensure that the code is
always the same?

Datastore mistake 2: Since this gets to be so time-consuming,
many developers realize that they can just
reuse one datastore from dev to test to production. So
you see a datastore named "HANA_TARGET_DEV" or
"BWD" in a production local repository. In this case, the
administrators just explain how they change the
hostname, username, and password of the datastore
when they move it to test or production. Though this
sounds simple, you still run the risk that you must change
more than just the username/password. In the case of an SAP
ECC source datastore, are the transport file paths the
same between your different ECC sources?

The solution to both of these mistakes? Datastore
configurations.

Datastore configurations are very powerful. They allow
you to have a single datastore that can connect to
multiple different sources. They work very similarly to
substitution parameter configurations: at run-time, the
optimizer selects a single configuration, and this
connection information is used for the entire execution of
the job and cannot be modified. You set them up in the
datastore editor...the Data Services Wiki shows a good
example.

I would strongly urge you to avoid the two mistakes
above by starting your project with the following
principles:

1. Give datastores meaningful names that describe
their data domain. Do NOT name them after a specific
tier (dev/test/prod) or a specific region (AMER/EMEA/APJ)
or a specific database ("DB2", "HANA", "SYBASEIQ") or a
specific SAP SID (ECD/ECQ/ECP). Just name them after
their data: "SALES", "VENDOR", "MATERIALS", "VERTEX",
"BANKING". This is important because you cannot
rename a datastore once it is defined.

2. Set up multiple datastore configurations inside of
each datastore. Multiple datastore configurations should
be used when the same metadata exists in multiple
systems. If the metadata is different between two
systems, they belong in separate datastores.

3. If you have Dev/Test/Prod tiers, make sure to set up
separate datastore configurations for Dev/Test/Prod in
your development local repositories. No, you don't have
to know the correct usernames/passwords for the test or
production systems (and in fact, this would be a serious
risk!). Get them set up anyway! When testers and
production administrators go into production, the only
thing they will need to change will be the username and
password. This helps avoid the risk of human error during
a critical Go-Live.

For advanced users, you can even use datastore
configurations to move from one database platform to
another without having to re-develop all your code.

3.1 Use aliases to map table owners (optional)

If you are using database sources or targets, these tables
always have an owner name or schema name (e.g.
"SCOTT"."TIGER"). In the Data Services Designer
interface, these owner names exist but are not usually
very obvious to the user.

This is usually a problem that manifests itself when you
migrate from Dev to Test or Test to Production. Let's say
you developed your dataflow and used a source table
named "SQLDEVUSR1"."EMPLOYEES". The username
"SQLDEVUSR1" is the table owner. You also set up a
second datastore configuration for the Test environment,
and the username is "SQLTESTUSR5". When you run the
job and set the Test datastore configuration to be the default,
the job crashes at this dataflow with a "TABLE NOT FOUND" error.
Why? It connected to the database specified in the Test
datastore configuration as username "SQLTESTUSR5" and
tried to find a table named "SQLDEVUSR1"."EMPLOYEES".
This is a design problem, not a Data Services error.

Instead, you need to tell Data Services how to interpret
the name "SQLDEVUSR1" differently depending on which
datastore configuration is active. There is a feature called
"Aliases" in each database datastore that lets you control
this!

You can create one or more aliases in each database
datastore to automatically replace the table owner name
defined in the dataflow with the table owner name of your
choice. At runtime, the optimizer does a search and
replace through the code for any objects from that
datastore and maps an alias named "SQLDEVUSR1" to
"SQLTESTUSR5".

Here's another example:

This is a little-known feature, but it saves you a ton of
time if you have many developers who connected to
various sandbox databases when developing the code.
You can simply set up multiple aliases to search for
various (and possibly incorrect) owner names and map
them to what their "real" owner names should be within
your official Dev/Test/Production datastore configurations.

4. Define system configurations to map together combinations of substitution parameters & datastore configurations

At this point, you have done the following:

Created substitution parameters

Created multiple substitution parameter configurations to control various aspects of run-time behavior

Created datastores

Created multiple datastore configurations to connect to different sources of data that have identical metadata

Your setup might look like this:

The final step is to create system configurations. These
are combinations of datastore configurations and
substitution parameter configurations that let you set up
job execution profiles that can be quickly and easily
selected at run-time. The optimizer then chooses only
that combination of configurations for the execution of
the entire job. If you have never defined a system
configuration in a specific local repository, you will never
see it as a drop-down option when you try to run a job.
However, after you define system configurations, you will
see a convenient drop-down box that shows the names of
your various system configurations:

If we use the example above, with the 3 datastores that
each have 3 different configurations and the 6 different
substitution parameter configurations, you can now create
system configurations as combinations of these. Here is how
you might set up your system configurations:

After this, when you run or schedule a job, you would see
a drop-down with your 6 different system configuration
names:

DevInit
DevDelta
TestInit
TestDelta
ProdInit
ProdDelta

To be honest, this isn't a very good example. Why would
you want your production local repository to have the
ability to easily execute jobs in your Dev landscape?
Thus, you would probably want to set up system
configurations that specifically correspond to the
substitution parameter configurations and datastore
configurations that you really want to use when you
actually run the jobs. So in this example you would
probably want to set up your production local repository
system configurations to include only "ProdInit" and
"ProdDelta", so you never make the mistake of selecting
one of the Dev or Test configs.

What if you don't select a system configuration at
run-time? Each datastore has a "default" datastore
configuration. Likewise, there is also a default substitution
parameter configuration. If no system configuration is
selected, the optimizer uses the default datastore
configuration for each datastore and the default
substitution parameter configuration for that repository.

Similar to substitution parameter configurations, system
configurations cannot be checked into a central
repository. They can be migrated in the same way you
saw above with exporting substitution parameters to an
ATL file. However, this is probably not necessary -- system
configurations are very quick to define, and you
probably only want to create them for the environment
that you want to run in (e.g. "ProdInit" and "ProdDelta").

Let me know if this framework makes sense. If you see
weird errors, search the KBase or file an SAP Support
Message on component EIM-DS.


What's new in Data Services 4.1

Posted by Louis de Gouveia 29-Oct-2012
In this blog I'm going to focus on some of the new
features in Data Services 4.1.

Let's jump straight into the new features.

When designing, developers can now preview the results
of a transform, so there is no need to execute the data flow
first to view the results.

There is a new transform for manipulating nested data.
This will be useful for working with XML, IDocs, BAPI calls
and web service calls.

In previous versions of Data Services, when using an ABAP
data flow, the data was transferred to a file in the background;
Data Services would then read this file to get the data. In
4.1 we can bypass that by making use of an RFC connection.
This allows us to stream the data directly from SAP to
Data Services. There is no need for staging data in files anymore.

Data Services 4.1 now supports Hadoop as a source and
target.

Within this release there is a new Eclipse-based user
interface tool known as Data Services Workbench. This
tool can be used to easily transfer tables and the data
within the tables into HANA, IQ or Teradata. The
advantage of this tool is that we do not need to create all the
data flows, mappings, etc. It will create them automatically in
the background. Data Services Workbench can also
monitor loads within the workbench.

The monitor sample rate is now time-based instead of
row-based. There have also been improvements to what is
written to the logs, which now contain CPU usage and
input row buffer utilization % for each transform.

Enhanced HANA support, including:

Repository support for HANA.

Bulk updates -- your updates can now be at the same
speed as your inserts. Updates are now pushed down to
the in-memory database.

Support for stored procedures -- you can call stored
procedures from within your DS jobs.

There are a few other features, but these were the main
features that caught my eye.

Hope you enjoyed reading the blog

Follow me on twitter @louisdegouveia for more updates

SAP Data Services

Running & Scheduling Data Services Jobs from Linux Command Line using Third-party Scheduler

Document Information

Document Title: SAP Data Services - Running & Scheduling Data Services Jobs from Linux Command Line
Document Purpose: The purpose of this document is to provide details on starting Data Services (BODS) jobs from the Linux/Unix command line and scheduling the jobs on the Unix platform.
File Name: SAP BODS - Starting Jobs from Linux Command Line.DOCX
Reference: Admin Guide: http://help.sap.com/businessobject/product_guides/boexir32SP1/en/xi321_ds_admin_en.pdf

Table of Contents
1. Introduction
2. Using a third-party scheduler
3. To export a job for scheduling
4. Setting up a cron job in UNIX-type operating systems
5. To execute a job with a third-party scheduler
6. About the job launcher
7. Job launcher error codes
1. Introduction
SAP BODS jobs can be started and scheduled from other
operating systems like Linux, HP-UX etc., in addition to
Windows, using third-party utilities.
This document provides information on running and
scheduling SAP BODS jobs from the UNIX command
prompt utility (crontab).
2. Using a third-party scheduler

When you schedule jobs using third-party software:

The job initiates outside of SAP BusinessObjects Data Services.
The job runs from an executable batch file (or shell script for UNIX) exported from SAP BusinessObjects Data Services.

Note:
When a third-party scheduler invokes a job, the corresponding Job Server must be running.
3. To export a job for scheduling
1. Select Batch > repository.
2. Click the Batch Job Configuration tab.
3. For the batch job to configure, click the Export Execution Command link.
4. On the Export Execution Command page, enter the desired options for the batch job command file that you want the Administrator to create:
File name -- The name of the batch file or script containing the job. The third-party scheduler executes this file. The Administrator automatically appends the appropriate extension: .sh for UNIX, .bat for Windows.

System configuration -- Select the system configuration to use when executing this job. A system configuration defines a set of Datastore configurations, which define the Datastore connections. If a system configuration is not specified, the software uses the default Datastore configuration for each Datastore. This option is a run-time property and is only available if there are system configurations defined in the repository.

Job Server or server group -- Select the Job Server or a server group to execute this schedule.

Enable auditing -- Select this option if you want to collect audit statistics for this specific job execution. The option is selected by default.

Disable data validation statistics collection -- Select this option if you do not want to collect data validation statistics for any validation transforms in this job. The option is not selected by default.

Enable recovery -- Select this option to enable the automatic recovery feature. When enabled, the software saves the results from completed steps and allows you to resume failed jobs.

Recover from last failed -- Select this option to resume a failed job. The software retrieves the results from any steps that were previously executed successfully and re-executes any other steps. This option is a run-time property. It is not available when a job has not yet been executed or when recovery mode was disabled during the previous run.

Use password file -- Select to create or update a password file that automatically updates job schedules after changes in database or repository parameters. Deselect the option to generate the batch file with a hard-coded repository user name and password.

Collect statistics for optimization -- Select this option to collect statistics that the optimizer will use to choose an optimal cache type (in-memory or pageable). This option is not selected by default.

Collect statistics for monitoring -- Select this option to display cache statistics in the Performance Monitor in the Administrator. The option is not selected by default.

Use collected statistics -- Select this check box if you want the optimizer to use the cache statistics collected on a previous execution of the job. The option is selected by default.

Export Data Quality reports -- Generates and exports all specified job reports to the location specified in the Management > Report Server Configuration node. By default, the reports are exported to $LINK_DIR\DataQuality\reports\repository\job.

Distribution level -- Select the level within a job that you want to distribute to multiple Job Servers for processing:
Job: The whole job will execute on one Job Server.
Data flow: Each data flow within the job will execute on a separate Job Server.
Sub data flow: Each sub data flow (which can be a separate transform or function) within a data flow can execute on a separate Job Server.

5. Click Export.
The Administrator creates a command file filename.txt (the default for filename is the job name) and a batch file for the job, and writes them to the local LINK_DIR\log directory.
(E.g. C:\Program Files\Business Objects\BusinessObjects Data Services\log)
Note:
You can relocate the password file from the LINK_DIR\conf
directory, but you must edit the filename.txt file so that it
refers to the new location of the password file. Open the
file in a text editor and add the relative or absolute file
path to the new location of the password file in the
argument -R "repositoryname.txt".
If you are exporting the job for a Unix environment, append
the .sh extension to the file name.

4. Setting up a cron job in UNIX-type operating systems

Cron jobs can be used for setting up scheduled job runs in
UNIX-type operating systems, e.g. UNIX, Linux, FreeBSD
and Darwin (Mac OS X).
Steps:
Note: The syntax may differ, depending on which version
of cron is present on your computer.
Open a root shell and type the following:
crontab -u root -e
to open a VI-style editor. Press 'i' to insert text.
A crontab entry comprises five fields indicating the
schedule time, followed by the name and path of the
program to be run. Use a space or a tab between each
field:
minute(0-59) hour(0-23) day_of_month(1-31) month(1-12) day_of_week(0-7) /path/Job_BODSJobNM.sh
NOTE: Job_BODSJobNM.sh is the name of the .sh
file exported from the BODS Administrator.
You can replace a field value with "*". So:
0 10 * * * /path/script.sh is the same as 0 10 1-31 1-12 0-7 /path/script.sh
The job would run at 10 each morning.
For example, to run a BODS job at 8 o'clock each night in
Linux, use:
0 20 * * * /usr/local/bin/directory/BODS
Names can be used (e.g. March) for month and day of
week. In day of week, Sunday can be 0 or 7.
When you have created the crontab, press 'Escape' to
leave insert mode.
Type 'ZZ' (upper case 'Z' twice). A message similar to the
following should be displayed:

/crontab.zUcAAFwPVp: 1 lines, 24 characters
crontab: installing new crontab
The crontab has now been set up. Cron will automatically
send an email to root to confirm that the job has run.
For more information on cron and crontab, read the
relevant manual pages. For example, type:
man crontab
to see the crontab manpage.
5. To execute a job with a third-party scheduler
1. Export the job's execution command to an executable
batch file (.bat file for Windows or .sh file for UNIX
environments).
2. Ensure that the Data Services Service is running (for
that job's Job Server) when the job begins to execute.
The Data Services Service automatically starts the Job
Server when you restart the computer on which you
installed the Job Server.
You can also verify whether a Job Server is running at
any given time using the Designer. Log in to the
repository that contains your job and view the Designer's
status bar to verify that the Job Server connected to this
repository is running.
You can verify whether all Job Servers in a server group
are running using the Administrator. In the navigation
tree select Server Groups > All Server Groups to view the
status of server groups and the Job Servers they contain.
3. Schedule the batch file from the third-party software.
Note:
To stop an SAP BusinessObjects Data Services job
launched by a third-party scheduling application, press
CTRL+C on the application's keyboard.
6. About the job launcher
SAP BusinessObjects Data Services exports job execution
command files as batch files on Windows or CRON files on
UNIX. These files pass parameters and call
AL_RWJobLauncher. Then, AL_RWJobLauncher executes
the job, sends it to the appropriate Job Server, and waits
for the job to complete.
Caution:
Do not modify the exported file without assistance from
SAP Technical Customer Assurance.
The following shows a sample Windows NT batch file
created when the software exports a job. ROBOT is the
host name of the Job Server computer. All lines after
inet:ROBOT:3513 are AL_Engine arguments, not
AL_RWJobLauncher arguments.
D:\Data Services\bin\AL_RWJobLauncher.exe
"inet:ROBOT:3513"
"-SrepositoryServer
-Uusername
-Ppassword
-G"b5751907_96c4_42be_a3b5_0aff44b8afc5"
-r100 -T14
-CTBatch -CmROBOT -CaROBOT
-CjROBOT -Cp3513"

The launcher flags and their values are:

-w -- The job launcher starts the job(s) and then waits before passing back the job status. If -w is not specified, the launcher exits immediately after starting a job.

-t -- The time, in milliseconds, that the Job Server waits before checking a job's status. This is a companion argument for -w.

-s -- Status or return code. 0 indicates successful completion, non-zero indicates an error condition. Combine -w, -t, and -s to execute the job, wait for completion, and return the status.

-C -- Name of the engine command file (path to a file which contains the command-line arguments to be sent to the engine).

-v -- Prints the AL_RWJobLauncher version number.

-S -- Lists the server group and the Job Servers it contains using the following syntax:
"SvrGroupName;inet:JobSvr1Name:JobSvr1Host:JobSvr1Port;inet:JobSvr2Name:JobSvr2Host:JobSvr2Port";
For example: "SG_DEV;inet:HPSVR1:3500;inet:WINSVR4:3505";

-R -- The location and name of the password file. Replaces the hard-coded repository connection values for -S, -N, -U, -P.

-xCR -- Generates and exports all specified job reports to the location specified in the Management > Report Server Configuration node. By default, the reports are exported to $LINK_DIR\DataQuality\reports\repository\job. In order to use this flag, you must disable the security for the Export_DQReport operation in the Administrator > Web Services > Web Services Configuration tab.

There are two arguments that do not use flags:

inet address: The host name and port number of the Job
Server. The string must be in quotes. For example:
"inet:HPSVR1:3500"
If you use a server group, inet addresses are
automatically rewritten using the -S flag arguments. On
execution, the first Job Server in the group checks with
the others and the Job Server with the lightest load
executes the job.

server log path: The fully qualified path to the location
of the log files. The server log path must be in quotes.
The server log path argument does not appear on an
exported batch job launch command file. It appears only
when SAP BusinessObjects Data Services generates a file
for an active job schedule and stores it in the following
directory:
LINK_DIR/Log/JobServerName/RepositoryName/JobInstanceName
You cannot manually edit server log paths.

7. Job launcher error codes

The job launcher also provides error codes to help debug
potential problems. The error messages are:

Error number -- Error message
180002 -- Network failure.
180003 -- The service that will run the schedule has not started.
180004 -- LINK_DIR is not defined.
180005 -- The trace message file could not be created.
180006 -- The error message file could not be created.
180007 -- The GUID could not be found. The status cannot be returned.
180008 -- No command line arguments were found.
180009 -- Invalid command line syntax.
180010 -- Cannot open the command file.

Steps to create Real time service

Posted by Sasikala Dhanapal 25-Jan-2013
If a real time job needs to be executed through a front
end (e.g. Enterprise Portal), a corresponding Real time
service and a web service need to be created. The job
shown below is a real time job in which the data flow is
encapsulated within the Real Time Process.

Steps to create Real time service

1. Login to the Management Console. Under Administrator, expand the Real Time tab.
2. Under the access server name, click on Real Time Services.
3. There are 2 tabs shown on the right side:
   Real Time Services Status
   Real Time Services Configuration
4. Click on Real Time Services Configuration. This allows you to add, edit or remove a service.
5. The real time services that are already created will be listed. If there is no real time service created, the list appears blank.
6. Click on the Add button to add a new service.
7. Under Service Configuration, provide a unique name for the service.
8. Click Browse Jobs to view a list of all the real-time jobs available in the repositories that are connected to the Administrator.
9. Select a job from the appropriate repository to map it to the real time service.
10. Under Service Provider, click on the check box to select a Job Server. Select the appropriate Job Server to control the service provider.
11. In the Min instances and Max instances fields, enter the minimum and maximum number of service providers that you want this Job Server to control for this service.
12. Then click Apply.

Steps to start a Real time service:

1. Under the access server name, click on Real Time Services.
2. There are 2 tabs shown on the right side:
   Real Time Services Status
   Real Time Services Configuration
3. Click on Real Time Services Status. This allows viewing the status of a service, applying an action, or selecting a service name to see statistics and service providers.
4. In the Real-Time Services Status tab, select the check box next to the service or services that you want to start.
5. Click Start. The Access Server starts the minimum number of service providers for this service.
Data Services Workbench Part 1
Posted by Louis de Gouveia 24-Jan-2013
In this blog I'm going to focus on introducing Data
Services Workbench. I will show how we can log on
and use the wizard to create a simple batch job to load
data from a source to a target.
Other blogs will follow where we will focus on other
aspects of Data Services Workbench.
Data Services Workbench is new and was released with
Data Services 4.1, so in order to use it you will need to
install Data Services 4.1.
After installing Data Services 4.1 SP1 you will see a
new menu item for Data Services Workbench.

When opening the Data Services Workbench you will be
prompted for logon details.

Once you have logged in you will see the below. In this
example I'm going to start the replication wizard to show
how we can create a simple job to load data from a
source to a target.

The wizard starts with the below screen. You need to
provide a project name.

We then have to select our source. As shown, the
following are supported as a source:

Database

SAP Applications

SAP BW Source

For this blog I'm going to use a database as a source. As
one can see, several databases are supported as a source.

You will need to enter all your connection details once you
have selected your database; in my example I will be using MS
SQL 2008 as my source. The connection details are
similar to those when creating a datastore in Data Services
Designer. You can also test the connection.

We can then select all the tables we would like to use;
you will have a list of tables to select from. These are the
tables that are available from the source.

Then you will need to select your target destination. For
target destinations only databases are supported. Not all
databases are supported yet, but a good portion to
start with.

We then need to complete the target destination
connection details. We can also test the connection.

Then in the final step we can choose whether we want to
execute immediately; we can also select the job server we
want to use.

The monitoring part then comes up and shows execution
details. We can see, per table, the status and that it
completed successfully.

Hope the above gives you a quick introduction to the
Data Services Workbench.
For more information follow me on twitter
@louisdegouveia
Data Transfer from XML File to Database by Using XSD Format
Posted by Rahul More 22-Jan-2013
Introduction: In this scenario we are transferring data from an XML file to a database by using an XML Schema Definition (XSD).
We are not covering the creation of the XSD file in this scenario.
I) Creating a DataStore for the SQL database.
1. Log on to the SAP BusinessObjects Data Services Designer.
2. In the Local Object Library, click on the Datastore tab.
3. Right click in the Local Object Library area & select "New". The window for creating a new datastore will open.

Give all the details:
DataStore Name: any name you want to give.
DataStore Type: Database.
Database Type: here we are using Microsoft SQL Server.
Database Version: Microsoft SQL Server 2008.
Database Server Name: server name.
User Name & Password: details of the user name & password.
4. Click "OK" & the datastore will be created & can be seen in the Local Object Library.

II) Creation of the XML Schema File Format
We are creating an xsd file for the following type of xml file.

The XSD file created for the above xml file is as follows.

1. Creating the XML Schema File Format.
Select the Format tab from the Local Object Library, right click on "XML Schemas" & select "New".
The "Import XML Schema Format" window appears. Give all the required details:
Format Name: name of the file format (any name).
File Name: full path of the xsd file.
Root Element Name: root element name of the xml file (here it is Employee).
After filling in all the information click "OK".
Now the XML Schema file format will be created & you can see it in the Local Object Library.

III) Creation of a job in Data Services Designer.

1. Create a project.
2. Create a batch job.
Right click on the project & click on "New Batch Job".
Give an appropriate name to the job.
3. Add a dataflow into the job.
Select the job, drag a dataflow from the palette into it & name it.
4. Build the scenario in the dataflow.
Double click on the dataflow.
Drag the XML Schema format created earlier into the dataflow & mark it as "Make XML File Source".

Add details to the XML File Format.
Double click on the File Format & provide the name of the source xml file.
Enable Validation to validate the xml file against the provided XSD format.
Drag a query into the dataflow & connect it to the XML file format.
5. Open the query & do the mapping.
Select the "Employee_nt_1" node & drag it to the RHS.
Right click on the "Employee_nt_1" node & UnNest it.
We are un-nesting the nodes because we don't want them to remain nested at the target side.
Right click on the "ID" field & mark it as a Primary Key.

6. Inserting the target table.
We can either import an already created table or we can use a template table, which afterwards will actually be created in the database. We are using a template table here.
Drag a template table from the palette into the dataflow.
Give a name to the table & select the appropriate datastore.
Click on "OK".
The template table can be seen in the dataflow.
Connect the template table to the Query.

7. Save the job & validate it.
Click the "Validate All" icon in the tool bar.
The following message is displayed if no error is found in the job.
8. Execute the job.
Select the job, right click & press "Execute".
9. Check the output in the table.
Click on the Magnifying Glass icon of the table.

Data Transfer from Excel File to Database
Posted by Rahul More 18-Jan-2013
Data transfer from an Excel file to a database (HANA)
Introduction:

In this scenario we are transferring data from an Excel file to a HANA database.
Here I am demonstrating the scenario in a step by step manner.
I) Creating a DataStore for the HANA database.
1. Log on to the SAP BusinessObjects Data Services Designer.
2. In the Local Object Library, click on the Datastore tab.
3. Right click in the Local Object Library area & select "New". The window for creating a new datastore will open.

Give all the details:
DataStore Name: any name you want to give.
DataStore Type: Database.
Database Type: HANA.
Database Version: the available version of the database, here it is "HANA 1.x".
Data Source: give the ODBC name, here it is "HANATEST".
User Name & Password: details of the user name & password.
4. Click "OK" & the datastore will be created & can be seen in the Local Object Library.

II) Creating the Excel Workbook File Format

We are creating an Excel Workbook File Format for the following file.

1. Creating the Excel Workbook File Format.
Select the Format tab from the Local Object Library, right click on "Excel Workbooks" & select "New".
The "Import Excel Workbook" window appears.
Give all the required details, i.e. Directory & File name.
Mark the "Worksheet" option & select the appropriate sheet name (here it is Article).
Range: select "All fields" to take all fields into consideration.
Mark "Use first row values as column names".
Click on "Import Schema".
The Excel file schema gets imported & can be seen in the window.
Click "OK".
Now the Excel workbook will be created & can be seen in the Local Object Library.

III) Creating a job in Data Services Designer.

1. Create a project.
2. Create a batch job.
Right click on the project & click on "New Batch Job".
Give an appropriate name to the job.
3. Add a dataflow into the job.
Select the job, drag a dataflow from the palette into it & name it.
4. Build the scenario in the dataflow.
Double click on the dataflow.
Drag the Excel workbook format created by us earlier into the dataflow.
Drag a query into the dataflow & connect it to the Excel file format.

5. Open the query & do the mapping.
Select all the fields on the LHS, right click & select "Map to Output".
Right click on the "DATAID" field & mark it as a Primary Key.

6. Inserting the target table.
We can either import an already created table or we can use a template table, which afterwards will actually be created in the database. We are using a template table here.
Drag a template table from the palette into the dataflow.
Give a name to the table & select the appropriate datastore (here HANA_TEST).
Click on "OK".
The template table can be seen in the dataflow.
Connect the template table to the Query.

7. Save the job & validate it.
Click the "Validate All" icon in the tool bar.
The following message is displayed if no error is found in the job.
8. Execute the job.
Select the job, right click & press "Execute".
9. Check the output in the table.
Click on the Magnifying Glass icon of the table.

View Design-Time Data in DS 4.1

Posted by Louis de Gouveia 10-Jan-2013
Last year I wrote a blog about the new features in Data
Services; this blog can be seen at this
link: http://t.co/iYqXUnbL
Today I'm going to focus specifically on one of the new
features in Data Services 4.1 called "View Design-Time
Data".

Anyone that has worked with Data Services will know that
you could previously only see the results of your transformations
by running or debugging the dataflow, so while designing it
was not possible to see what the result is.
Below is an example of a simple dataflow with a query
transform.

Now when you double click on the query transform the
below is viewed. As seen below, the transform has a
simple calculation to calculate profit. We can also see
the "Schema In" and what columns we want to be used in
"Schema Out".

Now we have the option to turn on View Design-Time
Data as shown below.

This then provides two new windows showing the data
coming into the schema and the data going out of the
schema based on our mappings. But more importantly, we
can see the results of the calculation done in our profit
column.

The above is a simple example, but anyone that has done
some complex Data Services data flows will know how
handy this new feature will be.
So now we can view the results of our transforms while
designing.
Follow me on twitter @louisdegouveia for more updates

BODS Audit on SAP Source Tables - Apply auditing on SAP source tables and write the source read count into any control table (the only way to capture an Audit Label value)
Posted by Prakash Kumar 11-Dec-2012

BODS Audit on SAP Source Tables

The Audit feature on SAP sources works in the same manner as on database source and target tables.
The source SAP table count and the target table count are computed in the Audit window.
The source table count should be applied on the ABAP Data Flow and not on the SAP table.
Source count and target count are checked as described below in the screenshots.
If the audit condition that was set is violated, an exception is raised.
We can give custom conditions apart from $Count_ABAP_DF = $Count_Target_DB_Table.

We can insert the source count and target count values
from the audit labels into any table using the below script.
This script is used in the Custom audit condition.

$Count_Source_Count = $Count_Target_Count
and
sql('Target_SQL', 'insert into dbo.Source_table_Count values ([$Count_Source_Count],[$Count_Target_Count])')
is NULL
## THIS CONDITION IS ALWAYS TRUE as the output of the insert query is NULL

After successful completion of the job, you will find the
source and target counts inserted into the control table.
select * from Source_table_Count

Source_Count    Target_Count
66              66

Note: This script can be used in any audit condition, regardless of the type of source/target table, to capture an Audit Label value.

Apply the above-mentioned script in the Custom audit condition.

Saving BODS Dataflow Audit statistics

Posted by Debapriya Mandal 03-Apr-2013
My previous blog (http://scn.sap.com/community/dataservices/blog/2013/04/02/dataflow-audit-feature-in-bods40) describes how to collect audit statistics on dataflows and display them as part of the job log during job execution. However, it would be more meaningful to insert these label values into a database table so that they can be used for analysis or for creating reconciliation reports.

Plan 1:
A script is placed after the dataflow on which the audit
functionality is implemented. An insert statement is
written in the script to insert the value of the audit label
into a database table. However, an error message is
generated because the audit label is not valid outside the
dataflow.

Plan 2:

The audit label values are saved by BODS in the BODS
repository table AL_AUDIT_INFO. A query can be written
on this table to select the latest value for the specified
label. The query would look something like this:

sql('<repo datastore>', 'SELECT * FROM AL_AUDIT_INFO
WHERE LABEL = \'<label>\'
AND AUDIT_KEY = (
SELECT MAX(AUDIT_KEY) FROM AL_AUDIT_INFO
WHERE LABEL = \'<label>\')');

This approach is not recommended as it involves
querying the BODS repository. Usually this is not made
available as a datastore to BODS developers.
Plan 3: (Recommended)
The third plan is to write the insert script within the dataflow, as the audit label values are valid there. It is a well-known fact that scripts cannot be placed within a dataflow. However, if we open the Audit window, we can find the means to write this insert script. The Audit window contains a second tab to define audit rules. These audit rules can consist of comparisons of different audit labels (say source label value = target label value). Alternatively, the rule can be written in the form of a script. This facility is used to write the insert statement as shown below. As the insert statement will always return NULL, the rule will always be satisfied. Hence the insert operation is achieved through the creation of a DUMMY audit rule.
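A minimal sketch of such a dummy audit rule, assuming a datastore named Target_DS, a statistics table AUDIT_STATS with two integer columns, and audit labels $Count_Source and $Count_Target (all of these names are placeholders for your own objects):

($Count_Source = $Count_Target)
and
sql('Target_DS', 'insert into AUDIT_STATS values ([$Count_Source], [$Count_Target])') is NULL
# the sql() call returns NULL for an insert, so the rule always evaluates to TRUE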

File Processing in BODI (Multiple Format File)


Posted by Debapriya Mandal 06-Mar-2013
Problem Synopsis: To process a Flat file containing
data with multiple formats.
Relevant Business Example: A Batch process is run to
stream data from a database. Streaming is where
multiple occurrences of the same program run in parallel
with each instance accessing a sub-set of the database.
In this way the overall time taken to extract data can be
substantially reduced, depending on the number of
streams used. The output file will contain data from
multiple tables, each table format being identified by the
first column: RECTYPE
Sample File: The diagram below shows a flat file with 2 types of record formats. The format with RECTYPE=REC1 has 5 columns, whereas the format with RECTYPE=REC2 has 7 columns.

Problem with normal processing in this scenario: A run-time error will be thrown by BODS if such a file is processed normally, because the row delimiter in such a file will be placed after 5 columns for RECTYPE=REC1 and after 7 columns for RECTYPE=REC2.

File Format Options to process a file with multiple formats: Create a file format with the maximum number of columns across all the record formats present in the input file. In this example this number is 7. In the file format, select Yes for the Adaptable Schema option. This indicates that the schema is adaptable. If a row contains fewer columns, as is the case for RECTYPE=REC1, the software loads NULL values into the columns missing data.

After using the above settings in the file format, data can be extracted from the input file without any errors. The screenshot below shows the extracted data.

The data for the 2 record formats can be easily separated using a Case transform. Two different cases are added to the Case transform: Case_RECTYPE_1 is created for RECTYPE=REC1 and Case_RECTYPE_2 is created for RECTYPE=REC2. The data for the 2 different record formats is hence successfully separated into two tables with the correct record structure.
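For illustration only, the two case expressions could be defined along these lines (RECTYPE and the case labels follow the example above; adapt them to your own file format):

Case_RECTYPE_1:   RECTYPE = 'REC1'
Case_RECTYPE_2:   RECTYPE = 'REC2'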

Data Services Workbench Part 3


Posted by Louis de Gouveia 24-Feb-2013
In Part 1 and Part 2 I focused on content created by the
Data Services Workbench wizard.
Now we all know that a wizard does not always
compensate for the more complex scenarios, or
sometimes we just prefer to create content without the
wizard.
So in this blog I'm going to focus on how you can create content more manually.
I have the same project created with the wizard open, we
will add extra tables manually and map them.

By double-clicking on the source datastore it will display the tables that are made available.

We can now add additional tables by pressing the import object button.

Now the tables are added to the source datastore. Save the settings.

You can now add the replication job; you will see that in the left window the newly added table does not have a green tick, which means it is not being used.

You can drag the table from the left window to the right
window.

By clicking on the table on the right you can then see a window that displays all the properties for that table. From here you can manage several things:
Add/Remove Columns
Create Primary Keys
Create Indexes
Apply Filters
Control how data is loaded (full or delta), etc.

On the columns tab I have added a column and provided it with a name.

Once the column is created you can map the column or provide a formula. By clicking on the edit button you will open the advanced formula editor.

You will notice that the advanced editor looks similar to the editor we first saw in Information Steward 4.0. From here we can add the formula; for my purposes I have used Sales - Cost in order to calculate profit. You can also validate it.

Then save the work and press execute.

It will then ask you which repository you would like to execute against. Select one and click Next.

You can then set further settings. We will keep it on initial load.

The job will then execute. You will then also be able to
view load status by table.

When you then view the data you will notice that the
table loaded all columns plus the additional one added.

Data Services Workbench Part 2


Posted by Louis de Gouveia 25-Jan-2013
In Part 1 of my blog, which can be found here http://t.co/WMIlAIpR, I focused on how to create content with the wizard. Today I'm going to focus on what that wizard created.
So once you have logged in again this will be the first
screen you see. You can just click the welcome tab
closed.

Once you have done that you will have a project explorer view. Note that all new tools are being delivered in the Eclipse-based look-and-feel shell, similar to Information Design Tool and Design Studio (AKA Zen).

The project explorer will have a project, with the name of the project we defined in the first step of the wizard. Then it has created two datastores, one for a source and one for a target. Once again we defined this in the wizard, and it is the same thing as a datastore in Data Services. Then there is a "Workbench_Example.rep"; this is new, but it basically has all the mappings of which tables we want to replicate.

So now let's have a look at what the workbench has done in the background. If we log into SAP BusinessObjects Data Services you will see a new project and a new job with workflows and dataflows. Datastores have also been created. You will notice all the names and settings match what we used in the workbench.

If we look at the detail you will notice it has done the following:
- Created a script to create all the tables. It uses SQL to create the tables.
- Created one dataflow per table.
- Created a script that creates all indexes on the tables and also creates a load status table.

Slowly Changing Dimensions

Posted by Richa Goel 31-May-2013
SCD - Slowly Changing Dimensions
SCDs are dimensions that have data that changes over time. The following methods of handling SCDs are available:
Type 1: No history preservation
- Natural consequence of normalization
Type 2: Unlimited history preservation and new rows
- New rows generated for significant changes
- Requires use of a unique key
- New fields are generated to store history data
- Requires an Effective_Date field
Type 3: Limited history preservation
- Two states of data are preserved: current and old
Slowly Changing Dimension Type 1(SCD Type1)
For SCD Type 1 change, you find and update the
appropriate attributes on a specific dimensional
record. The new information simply overwrites the
original information. In other words, no history is
kept.
Example:

Customer Key   Name        State
1001           Christina   Illinois

After Christina moved from Illinois to California, the new information replaces the original record, and we have the following table:

Customer Key   Name        State
1001           Christina   California
Advantages:
This is the easiest way to handle the Slowly Changing
Dimension problem, since there is no need to keep track
of the old information
Disadvantages:
All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the company would not be able to know that Christina lived in Illinois before.
Slowly Changing Dimension Type 2(SCD Type2)
With a Type 2 change, we don't make structural changes in the table. Instead we add a record. A new record is
added to the table to represent the new information.
Therefore, both the original and the new record will be
present. The new record gets its own primary key.
In our example, recall we originally have the following
table:
Customer Key   Name        State
1001           Christina   Illinois

After Christina moved from Illinois to California, we add the new information as a new row into the table.

Customer Key   Name        State
1001           Christina   Illinois
1005           Christina   California
Advantages:
This allows us to accurately keep all historical
information.
Disadvantages:
This will cause the size of the table to grow fast. In cases
where the number of rows for the table is very high to
start with, storage and performance can become a
concern.
This necessarily complicates the ETL process.
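Purely as an illustration of what a Type 2 change does to the data (CUSTOMER_DIM and the EFFECTIVE_DATE column are assumed names; in Data Services this is normally handled by the Table Comparison and Key Generation transforms rather than hand-written SQL), the change above boils down to:

-- keep the old row (key 1001) untouched; add the new version under its own surrogate key
insert into CUSTOMER_DIM (CUSTOMER_KEY, NAME, STATE, EFFECTIVE_DATE)
values (1005, 'Christina', 'California', current_date);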
Slowly Changing Dimension Type 3 (SCD Type 3)
With a Type 3 change, we change the dimension structure so that it renames the existing attribute and adds two attributes, one to record the new value and one to record the date of change.
In our example, recall we originally have the following table:

Customer Key   Name        State
1001           Christina   Illinois

After Christina moved from Illinois to California, the original information gets updated, and we have the following table (assuming the effective date of change is January 15, 2003):
Customer Key   Name        Original State   Current State   Effective Date
1001           Christina   Illinois         California      15-JAN-2003
Advantages:
This does not increase the size of the table, since new
information is updated.
This allows us to keep some part of history.
Disadvantages:
Type 3 will not be able to keep all history where an
attribute is changed more than once. For example, if
Christina later moves to Texas on December 15, 2003,
the California information will be lost.
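Again purely as an illustration (CUSTOMER_DIM is an assumed table name; the columns follow the example above), a Type 3 change is a simple in-place update:

update CUSTOMER_DIM
set    ORIGINAL_STATE = 'Illinois',      -- old value preserved in its own column
       CURRENT_STATE  = 'California',    -- latest value
       EFFECTIVE_DATE = '2003-01-15'
where  CUSTOMER_KEY   = 1001;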


Idoc Status Simplified


Posted by Mayank Mehta 18-Sep-2013
Hi, this blog is meant to help ABAP programmers and developers with all the status messages encountered while posting an IDoc in SAP. I have also mentioned the error reasons and possible solutions. Hope this is helpful to all.
Sequence of Inbound and Outbound statuses
Starting statuses may be: 01 (outbound), 50 (inbound),
42 (outbound test), 74 (inbound test)
Status - Type - Description - Next statuses / error reason / solution (where given):

Outbound statuses:
01 - Success - Outbound IDoc created. Next status: 30 on success, 29 on error.
02 - Error - Error passing data to port. Solution: correct the error and execute program RSEOUT00 again.
03 - Success - Outbound IDoc successfully sent to port.
04 - Error - Error within control information on EDI subsystem.
05 - Error - Error during translation.
12 - Success - Dispatch OK. Changed from status 03 by the BD75 transaction.
25 - Success - Processing outbound IDoc despite syntax errors.
26 - Error - Error during syntax check of outbound IDoc. Error reason: a missing mandatory segment, for example. Solution: you may edit the IDoc or force it to be processed.
29 - Error - Error in ALE service.
30 - Success - Outbound IDoc ready for dispatch (ALE service). Next success status: 03. Reason for staying in this status: the partner profile is customized not to process the IDoc immediately. Solution: execute program RSEOUT00.
31 - Error - No further processing.
32 - Success - Outbound IDoc was edited. There was a manual update of the IDoc in SAP tables; the original was saved to a new IDoc with status 33.
33 - Success - Original of an IDoc which was edited. It is not possible to post this IDoc. Next statuses: none. This is the backup of another IDoc that was manually updated, see status 32.
35 - Success - Outbound IDoc reloaded from archive. Can't be processed.
37 - Error - Erroneous control record (for example, the "reference" field should be blank for outbound IDocs).
42 - Success - Outbound IDoc manually created by the WE19 test tool. Next success status: 01.

Inbound statuses:
50 - Success - Inbound IDoc created. Next success status: 64.
51 - Error - Inbound IDoc data contains errors. Next success status: 53. Error reason: error triggered by the SAP application, incorrect values in the IDoc data. Solution: ask the functional people, modify the erroneous values in the IDoc (for example with WE02) and run it again using BD87.
53 - Success - Inbound IDoc posted.
56 - Error - IDoc with errors added (you should never see this error code).
60 - Error - Error during syntax check of inbound IDoc.
61 - Error - Processing inbound IDoc despite syntax error.
62 - Success - Inbound IDoc passed to application. Next success status: 53.
63 - Error - Error passing IDoc to application.
64 - Success - Inbound IDoc ready to be passed to application. Next success status: 62. Solution if the IDoc stays in this status: execute transaction BD20 (program RBDAPP01).
65 - Error - Error in ALE service, for example incorrect partner profiles.
66 - Waiting - Waiting for predecessor IDoc (serialization).
68 - Success - No further processing.
69 - Success - Inbound IDoc was edited. There was a manual update of the IDoc in SAP tables; the original was saved to a new IDoc with status 70.
70 - Success - Original of an IDoc which was edited. It is not possible to post this IDoc. This is the backup of another IDoc that was manually updated, see status 69.
71 - Success - Inbound IDoc reloaded from archive. Can't be processed.
74 - Success - Inbound IDoc manually created by the WE19 test tool. Next success status: 50 or 56. The IDoc is created using the inbound test tool (WE19) and written to a file to do a file inbound test; another IDoc is created if immediate processing is chosen.

Thanks,
Mayank Mehta
DS : Things In & Out
Posted by Rishabh Awasthi 17-Sep-2013
Just to have fun with DS..

If you want to stop staring at the trace log waiting for it to show "JOB COMPLETED SUCCESSFULLY", you can use an easy aid in DS:
Go to Tools --> Options...
Then break out Designer and click on General.
Then click the box that says: Show Dialog when job is
completed.

Now whenever your job completes, you'll get a little dialog box popping up to let you know.

One of the annoying defaults in Data Services is that all the names in Workflows or Dataflows are cut off after 17 characters.

So to fix this, go to Tools -> Options, then break out Designer and click on General. Where it says "Number of characters in workspace name", change the number 17 to 100.
Click OK when it asks you if you want to overwrite
the job server parameters.

Enjoy...
Missing internal datastore in Designer
Posted by Karol Frankiewicz 05-Sep-2013
I will show how to make the internal datastores visible, as note 1618486 does not give the method for making the internal datastores visible. The method is as follows:
You need to add the string DisplayDIInternalJobs=TRUE in DSConfig.txt under the [string] section, like in the screenshot:

[string]
DisplayDIInternalJobs=TRUE

How to find the real DSConfig.txt file?

First, check whether a folder named Documents and Settings or ProgramData exists on the C drive.

For Documents and Settings the path may be: C:\Documents and Settings\All Users\Application Data\SAP BusinessObjects\Data Services\conf

For ProgramData the path may be: C:\ProgramData\Application Data\SAP BusinessObjects\Data Services or C:\ProgramData\SAP BusinessObjects\Data Services\conf

If both exist, check ProgramData first.

If neither exists, please make sure that hidden files and folders are set to "Show hidden files, folders, and drives" (set in Folder Options --> View). If the configuration is OK, then you can go to the directory where your DS is installed, for example on the D drive; the install directory may be D:\SAP BusinessObjects\Data Services\conf. Moreover, if your DS version is DS 4.0, in some cases you may have to go to D:\SAP BusinessObjects\Data Services\bin to find the DSConfig.txt file.
After you change DSConfig.txt, restart your DS Designer; restarting the DS services is also advisable.
Then you will find the internal datastore CD_DS_d0cafae2.
Finally, note 1618486 just wants to describe that you need to make the property information of the internal datastore CD_DS_d0cafae2 the same as your DB logon information, as the default user name etc. may not be correct if your own DB logon information is different. The table owner also needs to be changed, because the default table owner set in DS is DBO, and this may cause errors if the table owner in your own DB is not DBO. In note 1618486 you can find the method for changing it.

Steps for executing BODS job from Unix Script with user
defined global parameters
Posted by Lijo Joseph 02-Sep-2013
Steps for executing BODS job from Unix Script

This will help you understand how to change the global parameters used in a job during execution when the job is invoked via a Unix script.
When you export the .sh file for job execution, the default parameter values (or the last used parameter values) are embedded within the .sh file. Whenever you execute that .sh file, the job starts with the same parameters every time, so you would need to modify the .sh file whenever you need to change the user (global) parameter values. Go through the simple steps involved in resolving this issue effectively with minor modifications and a simple Unix script to pass new user-defined values and execute the BODS job.
Log in to the Data Services Management Console.
Go to Administrator -> Batch (choose the repository).
Click on the Batch Job Configuration tab to choose the job which needs to be invoked through Unix.
Click on the Export Execution Command option against the job.
Click on Export.
Two files will then get generated and placed in the Unix box: **
One .txt file named Reponame.txt in /proj/sap/SBOP_INFO_PLAT_SVCS_40_LNX64/dataservices/conf
One .sh file named jobname.sh in /proj/sap/SBOP_INFO_PLAT_SVCS_40_LNX64/dataservices/log
** The location will change according to the setup.

1. For a job with no user-entry parameters required, we can directly call the generated .sh file for job execution:
. ./Job_Name.sh
2. For a job which has parameters, the script will look like this:
/proj/sap/SBOP_INFO_PLAT_SVCS_40_LNX64/dataservices/bin/AL_RWJobLauncher
"/proj/sap/SBOP_INFO_PLAT_SVCS_40_LNX64/dataservices/log/DEV_JS_1/"
-w "inet:acas183.fmq.abcd.com:3500" " -PLocaleUTF8
-R\"REPO_NAME.txt\"
-G"1142378d_784a_45cd_94d7_4a8411a9441b"
-r1000 -T14 -LocaleGV -GV\"\$Character_123=MqreatwvssQ;\$Integer_One=Qdasgsssrdd;\" -GV\"DMuMDEn;\"
-CtBatch -Cmacas183.fmq.abcd.com -CaAdministrator
-Cjacas183.fmq.abcd.com -Cp3500 "
The highlighted items are the parameter default values provided in the job. While executing the job, if the user wants to change these default values to user-defined entries, we have to make the following changes in the script:
/proj/sap/SBOP_INFO_PLAT_SVCS_40_LNX64/dataservices/bin/AL_RWJobLauncher
"/proj/sap/SBOP_INFO_PLAT_SVCS_40_LNX64/dataservices/log/DEV_JS_1/"
-w "inet:acas183.fmq.abcd.com:3500" " -PLocaleUTF8
-R\"REPO_NAME.txt\"
-G"1142378d_784a_45cd_94d7_4a8411a9441b"
-r1000 -T14 -LocaleGV -GV\"\$Character_123=$1;\$Integer_One=$2;\"
-GV\"DMuMDEn;\" -CtBatch -Cmacas183.fmq.abcd.com
-CaAdministrator -Cjacas183.fmq.abcd.com -Cp3500 "
Where $1 and $2 are the parameters passed by the user
replacing the default values.
Thus the job should execute in the following way
. ./Job_Name.sh $1 $2
Areas of difficulty:
The user entries should be fed to the script as encrypted data. For this encryption, the value should be encrypted using the AL_Encrypt utility.
That means if the user needs to pass an integer value 10 for a parameter variable, say $Integer_One in the job, then he cannot use $Integer_One=10; instead he has to pass MQ, which is the result of the AL_Encrypt utility:
AL_Encrypt 10
Result: MQ
I have created a script that can resolve the issue to a very good extent.
Logic of the Custom Script

Name of the Script: Unix_JOB_CALL.sh
Pre-requisite: keep a parameter file (keep the entries line by line in the parameter order under a name such as Job_Parm_File).
(As we have different scripts for different jobs, we can keep different param files as well. Whenever a change in value is needed, the user can simply go and modify the value without changing the order of the parameters.)
Sample Script Code
# remove any leftover temp file and create a fresh, writable one
rm -f Unix_Param_Temp_File;
touch Unix_Param_Temp_File;
chmod 777 Unix_Param_Temp_File;
# encrypt each parameter value from the param file with AL_Encrypt
cat Job_Parm_File | while read a
do
AL_Encrypt $a >> Unix_Param_Temp_File
done
# join the encrypted values into one space-separated argument list
JOB_PARAMS=`tr '\n' ' '<Unix_Param_Temp_File`
# call the exported job script with the encrypted parameter values
. ./Test_Unix.sh $JOB_PARAMS
rm Unix_Param_Temp_File;
Through this you can execute the job with user defined
parameter values using unix.

Introduction to Operation-Codes & the Behavior of the Map_Operation Transform when used individually
Posted by Manprabodh Singh 11-Aug-2013
Although it is not a very complex transform, let us go somewhat deeper into its basics and find out the basic needs of this transform, with a view of operation-codes for beginners.


The purpose of Map_Operation transform is to
map/change the Operation-Code of incoming/input row to
the desired Operation-Code/Operation to be performed on
the target table.
Why do we need it at all? It might be the case that the incoming/input row needs to be updated in the target table (because some data has been changed since the target was populated last time), but instead of updating it in the target table, you are willing to insert a new row and keep the older one as well, OR you might be willing to delete the changed row from the target table, OR you might be willing to do nothing at all for the rows which are changed. Hence, the operation-codes are the arms which help in doing all these things.
But what are these Operation-Codes? : Let us
suppose you have one source table named "source_table"
and one target table named "target_table" . Both have
exactly same schema. Now , in the first run you populate
the target_table with all the data in source_table , hence
both tables have got exactly the same data as well. Now
some changes are done on the Source_table, few new
rows are added , few rows got updated , few rows are
deleted . Now if we compare the Source_table with the
Target_table and try to set the status of each row in the
Source_table relative to the Target_table then we'll have
one of the following statuses for each input row :
- The rows which are there in the Source_table and not in the Target_table, basically the new rows: these rows need to be inserted into the Target_table, hence the operation-code "insert" will be associated with these rows coming from the Source_table.
- The rows found in both tables, but updated in the Source_table: these rows need to be updated in the Target_table, hence the operation-code "update" will be associated with these rows coming from the Source_table.
- The rows which are there in the Target_table and deleted from the Source_table after the last run: these rows need to be deleted from the Target_table (although we hardly perform deletion in a data warehouse), hence the operation-code "delete" will be associated with each row of this kind.
- The rows which are there in both of the tables: these rows ideally don't need any operation, hence the operation-code "normal" is associated with such rows.
Well, how do we perform this comparison of the source and target tables? This can be done by the Table_Comparison transform. It compares the input table (Source_table in our example) with another table, called the comparison table in BODS jargon (Target_table in our example), and after comparing each row it associates an operation-code with each row of the input table. If we choose, it also detects the rows which were deleted from the input table (so we can choose whether we need to perform deletion on the Target_table or not). But we are not going into the details of the Table_Comparison transform here, as I was going to play with the Map_Operation transform alone, and I know it looks crazy to do so. Because, as in the figure given below, if I connect the source table directly to the Map_Operation transform, "by default" the operation-code associated with all rows is "normal", until we change it using the Table_Comparison transform.

Playing with the Basics of the Map_Operation Transform: So, as said earlier, the job of the Map_Operation transform is to change/map the incoming op-code (operation-code) to the desired outgoing op-code. In the picture below we have the options "Input row type" and "Output row type". But in the above mapping, because we have connected the source directly to the Map_Operation transform, all incoming rows have the op-code "normal" associated with them. Hence, the second, third and fourth options don't matter for now, because there are no input rows with the operation-code "update", "insert" or "delete" associated with them.

So, let us see what will happen if we use one by one the
provided options in the drop-down menu given for the
"Output row type". The interesting ones are "update" and
"delete". let us see why:

First, let us suppose there is no primary key defined on the target table:
- "normal": if we choose "normal" as the output row type op-code, all input rows will be inserted into the target table.
- "insert": same as "normal", all input rows will be inserted into the target table.
- "discard": nothing will be done.
- "delete": it will delete the matching rows from the target table. Even if the primary key is not defined on the target, the delete operation is performed by matching the entire row, i.e. the value of each field in the input row is matched with the value of each field in the target table row; if matched, the target row is deleted. There is no need for a key column for the delete operation.
-"update": This time, when no primary key is defined on
the target , the update operation will not do anything.
This is actually logical, I have an input row with a key field
(say EmpNo) , but how can i ask my software to update
the this row in target table if i haven't mentioned the
same key in the target table as well? How my software
will find the row which needs to be updated ? If i say ,
match the entire row , then it'll find the exact match of
entire row if found, and that would mean that there is
nothing to be updated as the entire row matches. So, i

need to mention something (some column) in the target


using which i can find the row to be updated.
So, there are two ways to do this:
First is to define a Primary key on the target table and
re-import the table.
Second (assuming that input table has a primary key
defined) is open the table by double-clicking on it, in the
option tab we have the option to "Use Input keys" as
shown below. By choosing yes here, It'll use the Primary
key mentioned in the input table, provided that the
column with same name and datatype is present in the
Target table as well.

Secondly, if a primary key is defined on the target table as well, then the "normal" and "insert" operations will fail if a row with the same primary-key value is inserted again; the job will fail and stop, whereas the "update" and "delete" operations would work as before.
The behavior of the Map_Operation transform changes somewhat when it is preceded by the Table_Comparison transform. The need for a primary key on the target, or for the "use input keys" option (as shown above), is eliminated; it updates the target rows without it as well. This might be because we mention the primary key column of the input table in the Table_Comparison transform along with the comparison columns, and the rows might carry this information, along with the associated op-code, to the Map_Operation transform. I checked and experimented with this practically, but can only guess about the internal engineering of it.

Data Services uses the Central Management Server (CMS) for user and rights management. In a stand-alone DS environment, the same functionality is supplied by the Information Platform Services (IPS). Setting up user security is a rather cumbersome process. The procedure for granting access to a DS developer consists of four steps:

Create the user
Grant access to the DS Designer application
Grant access to one or more (or all) repositories
Allow automatic retrieving of the DS repository password from the CMS

1. Creating the user
By default, the DS installation program does not create any user accounts. Use the Users and Groups management area of the CMC to create users.

Figure 1: User List

Right click on the User List entry, select New > New User and
specify the required details.

Figure 2: Create New User

Select the Create & Close button to finalize this step.

2. Granting access to DS Designer
User name and password are entered in the DS Designer Repository Logon window.

Figure 3: DS Repository logon

2.1. User management
Unfortunately, the newly created user only has a limited number of access rights by default. More specifically, authorization to run DS Designer is not granted automatically.
When trying to start the application with this user and password, access is denied:

Figure 4: Access Denied

Access can be granted to an individual user in the Applications area of the CMC. Right-click Data Services Application and select User Security.

Figure 5: Applications area in CMC

Select the Add Principals button:

Figure 6: User security

Select the user from the User List in the Available users/groups panel and select the > button to move it to the Selected users/groups panel.

Figure 7: Add Principals

Select the Advanced tab and then the Add/Remove Rights link.

Figure 8: Assign Security

Grant access to Designer and select OK.

Figure 9: Add/remove Rights

2.2. Group management


As mentioned above, the DS installation program does not create any default user accounts. But it does create several default group accounts. One of these groups is called Data Services Designer. Members of this group automatically have access to the DS Designer.
After creating a new user, assign it to this group account. That will grant the user access to DS Designer, the same result as with the explicit user-level grant, but achieved in a much simpler way.
Return to the Users and Groups management area of the CMC.
Right-click on the user and select Join Group.

Figure 10: Users and Groups

Select the group from the Group List in the Available groups
panel and select the > button to move it to the Destination
Group(s) panel and hit OK.

Figure 11: Join Group

3. Granting access to the repositories

When an authorized user connects to the DS Designer application, the following error message is displayed:

Figure 12: No repositories are associated with the user

That is because a user in the Data Services Designer Users group has no default access to any of the DS repositories:

Figure 13: Access control list: No access by default

If a user needs access to a given repository, that access has to be explicitly granted to him.

Navigate to the Data Services area in the CMC. Right-click on the name of the repository and select User Security.

Figure 14: Data Services

The "User Security" dialog box appears and displays the access
control list for the repository. The access control list specifies the
users and groups that are granted or denied rights to the
repository.

Figure 15: User Security

Select the Add Principals button. Then select the users or groups from the User List or Group List respectively in the Available users/groups panel and select the > button to move them to the Selected users/groups panel. Finally, select Add and Assign Security.

Figure 16: Add principals

Select the access level to be granted to the user or group:

To grant read-only access to the repository, select View.
To grant full read and write access to the repository, select Full Control.
To deny all access to the repository, select No Access.

Select the > button to move it from the Available Access Levels to the Assigned Access Levels panel. And hit OK.

Figure 17: Assign security

Note: By applying the same method at the level of the Repositories folder in the Data Services area in the CMC, the user will be granted the same access level to all repositories at once. Both mechanisms can be combined to give the developers full control over their own repository and read access to anybody else's:

Grant View access to every individual developer (or to the Data Services Designer Users group or to a special dedicated group, for that matter) at the level of the Repositories folder. Make sure that, when using the default group for this, it comes with the default settings. If it doesn't, simply reset the security settings (on object repositories and on all children and descendants of object repositories) on the default group before attempting this operation.
Grant Full Control access to every individual developer for his own repository.

When logging in to DS, developers see the full list of repositories they are granted access to. A value of No in the second column means full access, Yes means read-only.

Figure 18: Typical DS Designer logon screen

Don't make the list too long. The logon screen is not resizable, and scrolling down may become very tedious!

4. Retrieving the DS repository password from the CMS
The users can now connect to the repositories from within DS Designer. When a user starts the application, as an extra security feature, he is prompted for the (database) password of the repository:

Figure 19: Repository password

If this extra check is not wanted, it can be explicitly removed.

Return to the "User Security" dialog box that displays the access
control list for the repository. Select the User, then the Assign
Security button.

In the Assign Security dialog box, select the Advanced tab and
then the Add/Remove Rights link.

Figure 20: Assign Security

Grant both the Allow user to retrieve password and Allow user to retrieve password that user owns privileges and hit OK.

Figure 21: Add/remove Rights

DS Designer will not prompt for a database password anymore when the user tries to connect to this repository.

Note: By applying the same method at the level of the Repositories folder in the Data Services area in the CMC, this extra check will be removed from all repositories accessible by this user at once.

BODS alternatives and considerations

Posted by Kiran Bodla 09-Jan-2014


1. SQL transform
When the underlying table is altered (columns added or deleted) in the database, the SQL transform should be refreshed via "UPDATE SCHEMA". If not, it will not pull any records from the table, and it neither errors nor warns when we validate the job from the Designer.
2. Date format
to_date() function
The 'YYYYDDD' format is not supported by BODS. The documentation manual doesn't provide any information on converting 7-digit Julian dates (legacy COBOL dates). We may need to write a custom function to convert these dates, or get the date from the underlying database if the database supports this date format.
------ Sample function ------
IF (substr($julian_dt, 1, 4) = 2000 or substr($julian_dt, 1, 2) = 18)
begin
    RETURN(sql('DS_Legacy', 'select to_char(to_date(' || $julian_dt || ',\'YYYYDDD\'),\'YYYY.MM.DD\') from dual'));
END
return decode((substr($julian_dt, 1, 2) = 19), jde_date(substr($julian_dt, 3, 5)),
              (substr($julian_dt, 1, 2) = 20), add_months(jde_date(substr($julian_dt, 3, 5)), 1200),
              NULL);
------ sample function END ------
3. Getting timestamp column from SQL transform
In the SQL transform a timestamp field is not pulled directly; instead we can convert it to text or a custom format, pull it, and convert it back to the desired date-time format.
4. When a character field is mapped to a numeric field, if the value has no numeric equivalent then the value is converted to NULL; if the value is numeric-equivalent it is typecast to numeric.
Alternative: add nvl() after you map it to the numeric field, if you don't want to populate NULL for that field in the target (see the sketch below).
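A minimal sketch following the article's advice (CHAR_AMOUNT is a hypothetical source column mapped to a numeric target column):

# non-numeric source values arrive as NULL after the implicit conversion; nvl() replaces them with 0
nvl(SRC.CHAR_AMOUNT, 0)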
5. For character fields, while mapping a higher-length field to a lower-length field, the value will be truncated and propagated. If the source value is all blanks, NULL will be propagated to the next transform (something similar to the above).
6. When using the gen_row_num() function: if this function is used in a query transform in which there is a join operation, then there is a possibility of generating duplicate values. The issue is not with the function but with the query transform functionality in combination with a join and gen_row_num().
Reason: for every transform, BODI generates a SQL query and pushes it to the database to fetch/compute the result data. When joining, BODI caches one table, then fetches the other table, joins, and returns the data. The row numbers are generated while caching, and here is the issue: for example, when joining a table with 100 records (cached) with a table of 200 records (assuming all 200 match the join criteria), the output volume of the join is 200 records. Since the row numbers were already generated against the 100-record table, there will be 100 duplicate values in the output.
7. BODS version 14 allows multiple users to operate simultaneously on a single local repository. This leads to code inconsistency: if the same object (datastore/job/workflow/dataflow) is being modified by two different users at the same time, the last saved version is stored in the repository.
Solution: it is mandatory to use the central repository concept to check out / check in code safely.
8. "Enable recovery" is one of the best feature of BODS,
when we use Try-Catch approach in the job automatic
recovery option will not recover in case of job failure.
- Must be careful to choose try-catch blocks, when used
this BODS expects developer to handle exceptions.
9. Bulk loader option for target tables: data is written directly to the database data files (skipping the SQL layer). When the constraints are enabled again, even the PK may not be valid because of duplicate values in the column, since data is not validated while loading. This error is shown at the end of the job, and the job completes successfully with an error saying "UNIQUE constraint is in unusable state".
9a. While enabling/rebuilding a UNIQUE index on the table, if there is any Oracle error enabling the index, the error in the BODS log is still shown as "duplicate values found, cannot enable UNIQUE index"; actually the issue is not with the data but with the Oracle database temp segment.
When the API bulk load option is used, the data load will be faster, and all the constraints on the table are automatically disabled and enabled again after the data is loaded to the table.
10. Lookup, target and source objects from a datastore have hard-coded schema names. Updating the schema name at the datastore level is not sufficient to point these objects to the updated schema; instead we need to use an "Alias" in the datastore.
11. Static repository instance: multiple users can log in to the same local repository and work simultaneously. When any user updates a repository object, those changes are not visible immediately to the other logged-in users; to see them, the other users should log in to the repository again.
12. BODI-1270032
This error shows when we want to execute the job; the job will not start and does not even get to the "Execution Properties" window. It simply says it cannot execute. If you validate the job, it validates successfully without any errors or issues.
This may be caused by the following issues:
1. Invalid Merge transform (go to the Merge transform, validate, and take care of the warnings)
2. Invalid Validation transform columns (check each validation rule)
Best alternative: go to Tools > Options > Designer > General and uncheck the option "Perform complete validation before job execution", then start the job. Now the job fails with a proper error message in the error log.
13. How to use global variables in the SQL transform:
You can use global variables in the SQL transform's SQL statement. You will not be able to import the schema with a reference to a global variable in the SQL statement, so when importing the schema use constant values instead of the global variable. Once the schema is imported, you can replace the constant with the global variable; it will be replaced with the value you set for that variable when the job is executed.
Another thing: I don't think you will be able to retain the value of a global variable outside the DF. To verify this, add a script after the first DF and print the value of the variable - is it the same as the value set inside the DF?
If the data type of the column is VARCHAR then enclose the variable in { }, for example: WHERE BATCH_ID = {$CurrentBatchID}. If it is NUMERIC then use WHERE BATCH_ID = $CurrentBatchID.
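Putting it together, the SQL text of the transform might look something like this (SALES_STAGE and BATCH_ID are hypothetical names, and $CurrentBatchID is assumed to be a global variable of the job, of VARCHAR type):

SELECT * FROM SALES_STAGE WHERE BATCH_ID = {$CurrentBatchID}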
14. Expression value type cast:
Example 1: decode(1<>1, 2, 1.23) evaluates to 1
Example 2: decode(1<>1, 2.0, 1.23) evaluates to 1.23
Example 3: decode(1<>1, 2.0, 1) evaluates to 1
What is happening here? BODS casts from left to right, i.e. whatever the datatype of the left-most operand is becomes the final datatype of the expression.
In example 1 the first operand is 1, so the entire expression casts/converts to INTEGER.
In example 2 the first operand is a floating point, so the entire expression casts/converts to a float/double.
In example 3 the first operand is a floating point, so the entire expression casts/converts to a float/double.
This issue is applicable wherever a mathematical expression is possible, such as in the Query transform, lookup, etc.
Source Versus Target Based Change Data Capture
Posted by Barry Dodds 07-Jan-2014

My original plan for this post was to wrap up source and target based changed data capture in a single instalment; unfortunately, I seem to have got carried away and will post a follow-up on target based CDC in the future.

Once the method of data delivery has been established (push or pull), the next area of consideration is how change data capture (CDC) can be applied within Data Services. More often than not, when a project demands data to be extracted many times from the same source system using Data Services, the area of change data capture will be discussed. As the name suggests, CDC is all about identifying what has changed in the source system since the data was previously extracted and then only pulling the new or modified records in the next extraction process. The net result is that effective CDC enables users to build efficient data processing routines within Data Services, reducing batch windows and the overall processing power.
Source Based CDC
With Source based change data capture any record
changes are identified at the source and only those
records are processed by the Data Services engine. These
changes can be pre-identified using techniques such as
SAP IDOCs or Sybase Replication Server (now integrated
in the latest version of Data Services) or dynamically
identified by using pushdown logic and timestamps etc.
There are various methods available with Data Services, and I have used a number of these in many different scenarios. With Data Services there are nearly always two or more ways to get where you need to be to achieve the required result, and this comes down to the ETL developer's level of creativity and caffeine on the day. The following are just rules of thumb that I use; they don't always work in all scenarios, as there are usually many variables that need to be taken into consideration, but as far as my thought processes go, these are the different stages I go through when trying to identify the best methods for change data capture.
So the customer has said that they want to do CDC. The first questions I always ask are:
What is the source database/application?
How much data is in the source system tables we are extracting from?
How often do we need to extract, and what is the batch window?

If the source table has low data volumes and the batch window is large, then usually I will go for the easiest path, especially in a proof of concept, which for me will be reading all of the data every time and applying auto correct load in Data Services to carry out updates and inserts.

If the source data is changing often and is of high volume, but there is a reasonable overnight batch window, I would typically ask if the source table that I am extracting data out of has a valid timestamp. A valid, trustworthy timestamp is key; some systems don't always update the timestamp (or update it only on insert, for example). If this were available then I would consider looking at timestamp pushdown in Data Services.
Timestamp push down requires a number of job steps to be configured (see the sketch after this list):
- Firstly, a global variable to hold the last run date-time would need to be defined for the job.
- A database table would need to be created with a date-time field and a JOBID field in it.
- The last run date-time variable would be populated using a script at the start of the job workflow to get the last run date-time value from the lookup table.
- Within the Query transform WHERE clause you would set the following: source datetime field > last run date variable.
- Check that the WHERE clause is being pushed down by viewing the SQL or ABAP.
- The last step is back at the workflow level: use either a script or a dataflow (I prefer a dataflow) to update the lookup table with a new datetime value.
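As a rough sketch only (the datastore DS_Control, the control table JOB_RUN_CONTROL and the column names are hypothetical), the script at the start of the workflow and the Query WHERE clause could look like this:

# script at the start of the workflow: read the last run date-time into a global variable
$G_LastRunDateTime = sql('DS_Control', 'select LAST_RUN_DATETIME from JOB_RUN_CONTROL where JOB_ID = 1');

# WHERE clause of the Query transform (gets pushed down to the source database):
SOURCE_TABLE.LAST_CHANGED_ON > $G_LastRunDateTime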
In the latest version of Data Services (4.2), within the Workbench, the timestamp example above can be configured as part of the replication wizard. If the source system is SAP, I would also look at using the CDC functions available within the content extractors, as this is preconfigured functionality and doesn't require any of the above job configuration steps.
If data needs to be extracted at various points throughout the day then the pushdown method could still be an option; however, I am always very cautious about impacting performance on the source systems, and if there is a chance that performance degradation is going to affect business transactions then I would opt for a different approach where possible.

If the source system is changing regularly, has high data volumes, the data needs to be transferred intraday, and the extract should have little or no impact, I would look at either using IDocs for SAP or using the database-native CDC mechanisms supported by Data Services. Configuration of these methods is fully documented within the Data Services manuals, but typically they require the customer to have some database functions enabled, which is not always possible. Also, depending on the database type, a slightly different mechanism is used to identify changes. This has in the past limited when I have been able to take advantage of this approach. Within the latest version of Data Services 4.2, configuring database CDC is made easier, as this can be done through the wizard within the Workbench, or users can simply define a CDC method based on their datastore configuration. If the option is greyed out, then that datastore type does not support native application CDC.

If the source data changes frequently, needs to be processed nearly instantaneously, and must have little or no impact on the source systems, I would consider using a log interrogation based approach or a message queue which has changes pre-identified within the messages (e.g. IDoc/ALE). For non-invasive log based CDC, Sybase Replication working with Data Services enables data to be identified as a change using the database's native logs, flagged with its status (insert/update) and shipped to Data Services for processing. If this real-time, non-invasive approach to data movement is something that is key to a project, then I would recommend complementing Data Services with Sybase Replication Server.
When working on site with customer data, the source systems and infrastructure will nearly always determine what methods of change data capture can be put to best advantage with Data Services. However, given free rein, the best and most efficient approach is without doubt to carry out the data identification process as close to the source system as possible; however, that isn't always an available option. In the next blog post I will dig a little deeper into the target based change data capture method.

New features added in the SAP Business Objects Data Services Management Console 4.2:
Object Promotion:

Import Configuration

Export Configuration

Export Configuration using two ways:

FTP

Shared Directory

Substitution Parameter: Now we can also change the "Substitution Parameter" settings through the SAP Business Objects Data Services Management Console.
New Features added in Adapter like "Hive Adapter &
VCF Adapter"
Changes in "Query and Validation" transform in SAP
Business Objects Data Services Designer 4.2
Changes in "Architecture" of SAP Business Objects Data
Services please refer below upgrade guide for more
details.

New Features added like "REST web


services": Representational State Transfer (REST or

RESTful) web service is a design pattern for the World


Wide Web. Data Services now allows you to call the REST
server and then browse through and use the data the
server returns
Relevant Systems
These enhancements were successfully implemented in
the following systems:

SAP Business Objects Information Platform Services 4.1 SP2 / SAP Business Objects Enterprise Information Management 4.1 SP2

SAP Business Objects Data Services 4.1 SP2


This document is relevant for:

SAP Business Objects Data Services Administrator

This blog does not cover:

SAP Business Objects Data Quality Management UpGradation

Executing a job by another job in BODS 4.1 using a simple script
Posted by Balamurugan SM 04-Dec-2013
Step 1:
In the Data Services Management Console, go to the Batch Job Configuration tab and click on Export Execution Command.

This will create a .bat file with the job name (Job_TBS.bat) in the following path:
D:\Program Files (x86)\SAP BusinessObjects\Data Services\Common\log\

Step 2:
Use the script below to check whether the respective .bat file exists in the path above.
exec('cmd','dir "D:\\Program Files (x86)\SAP
BusinessObjects\Data
Services\Common\log\"*>\\D:\\file.txt');
Step 3:
Create a new job (J_Scheduling) to trigger the job which needs to be executed (Job_TBS).

Use the script below to trigger the job:
exec('cmd','D:\\"Program Files (x86)"\"SAP BusinessObjects"\"Data Services"\Common\log\Job_TBS.bat');
Now the job J_Scheduling will trigger the job Job_TBS using a simple script.
Some tips for fine tuning the BODS job for faster and
efficient executions with optimal resource utilizations.
Posted by Santosh Vonteddu 28-Oct-2013
Hello Guys,
Often we skip or ignore some of the minimal things which may make your jobs execute faster. For that very reason, I have consolidated some key points by which we can make BODS jobs more efficient, with optimal consumption of resources. This discussion might be most helpful to beginners in this area.
1. Increase the monitor sample rate, e.g. to 50K in a production environment.
2. Exclude virus scanning on Data Integrator job logs.
3. While executing the job for the first time, or when changes occur and you re-run, select the option COLLECT STATISTICS FOR OPTIMIZATION (this is not selected by default).
4. While executing the job from the second time onwards, use the collected statistics (this is selected by default).


5. If you set the Degree of parallelism (DOP) option for your data flow to a value greater than one, the thread count per transform will increase. For example, a DOP of 5 allows
five concurrent threads for a Query transform. To run
objects within data flows in parallel, use the following
Data Integrator features:
Table partitioning
File multithreading
Degree of parallelism for data flows
6. Use the Run as a separate process option to split a
data flow or use the Data Transfer transform to create two
sub data flows to execute sequentially. Since each sub
data flow is executed by a different Data Integrator
al_engine process, the number of threads needed for
each will be 50% less
7. If you are using the Degree of parallelism option in
your data flow, reduce the number for this option in the
data flow Properties window.
8. Design your data flow to run memory-consuming
operations in separate sub data flows that each use a
smaller amount of memory, and distribute the sub data
flows over different Job Servers to access memory on
multiple machines.
9. Design your data flow to push down memory-consuming operations to the database.
10. Push-down memory-intensive operations to the
database server so that less memory is used on the Job
Server computer.
11. Use the power of the database server to execute
SELECT operations (such as joins, Group By, and common
functions such as decode and string functions). Often the
database is optimized for these operations
12. You can also do a full push down from the source to the target, which means Data Integrator sends SQL INSERT INTO... SELECT statements to the target database.
13. Minimize the amount of data sent over the network.
Fewer rows can be retrieved when the SQL statements
include filters or aggregations.
14. Using the following Data Integrator features to
improve throughput:
a) Using caches for faster access to data
b) Bulk loading to the target.
15. Always view the SQL that Data Integrator generates and adjust your design to maximize the SQL that is pushed down, to improve performance.
16. Data Integrator does a full push-down operation to
the source and target databases when the following
conditions are met:
All of the operations between the source table and
target table can be pushed down.
The source and target tables are from the same data
store or they are in data stores that have a database link
defined between them.
A full push-down operation is when all Data Integrator transform operations can be pushed down to the databases and the data streams directly from the source database to the target database. Data Integrator sends SQL INSERT INTO... SELECT statements to the target database, where the SELECT retrieves the data from the source.
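For illustration, a fully pushed-down dataflow ends up as a single statement of this shape on the target database (the table and column names here are invented):

INSERT INTO TARGET_SALES (REGION, TOTAL_AMOUNT)
SELECT REGION, SUM(AMOUNT)
FROM   SOURCE_SALES
GROUP BY REGION;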
17. Auto correct loading ensures that the same row is not
duplicated in a target table, which is useful for data
recovery operations. However, an auto correct load
prevents a full push-down operation from the source to
the target when the source and target are in different
data stores.

18. For large loads where auto-correct is required, you can put a Data Transfer transform before the target to enable a full push down from the source to the target.
Data Integrator generates an SQL MERGE INTO target
statement that implements the Ignore columns with value
and Ignore columns with null options if they are selected
on the target.
19. The lookup and lookup_ext functions have cache
options. Caching lookup sources improves performance
because Data Integrator avoids the expensive task of
creating a database query or full file scan on each row.
20. You can control the maximum number of parallel Data
Integrator engine processes using the Job Server options
(Tools > Options> Job Server > Environment). Note that if
you have more than eight CPUs on your Job Server
computer, you can increase Maximum number of engine
processes to improve performance.

What is better Table Comparison or AutoCorrect Load?
Added by Vicky Bolster, last edited by Vicky Bolster on Mar 01, 2012
Summary
Whenever there are two options the question asked is "What is
better?". So in what cases is Autocorrect Load preferred over
Table Comparison?

Well, first of all we need to know the technology behind each. The idea of autocorrect load is to push down as much as possible, thus the decision of "Should the row be inserted/updated or skipped" is left to the database.

With Table Comparison, the decision is made within the Data Integrator engine, thus you have many more options, but data has to be read into the engine and written to the target database.

So there are cases where you have no choice other than using
Table Comparison:

Your target table does have a surrogate key. In this case you
would use Table Comparison Transform and Key Generation.

You want to have a LOAD_DATE column, a column where you can see which record has been inserted/updated when. With Table Comparison you can select columns to compare - all columns but this one - and only if at least one of the compare columns has changed will the row be output. With AutoCorrect Load, all records would get the new date, even if nothing actually changed.
Source rows might get deleted by the application.
For all simple cases we cannot give a clear answer, just
some guidelines.
Databases in general do not like "update" statements and do not
run them as fast as they deal with inserts. The reason is simply the
overhead involved: just imagine a text column had a string of 5
characters before, and now we update it with a string of 50 characters.
The entire row has to be migrated somewhere else to allow for
the additional space required! Second, databases simply process
updates, they do not check if the update actually changed a
value. With autocorrect load, typically all rows are inserted or
updated, never ignored like the Table Comparison does if nothing
really changed. So when you have a delta load and almost all
rows will remain unchanged, Table Comparison is the way to go.
Updating the primary key column is even more expensive, since then
you update not only the row value but also the index. Therefore,
by default we do not include the primary key columns in the
update statement, so instead of saying "update table set key=1,
name='ABC' where key=1" we remove the key from the set list
like in "update table set name='ABC' where key=1". That's no
problem, as the key for the row to be updated is very likely the
same value as before, but there are exceptions. Unlikely exceptions,
but possible. Most common is reading from a CDC table where
we get the information that the key was changed from 1 to 2. In
that case you have to check the update-keys flag. In all other
cases we save time by removing the unneeded update of the key.

But there are also reasons to go for autocorrect load: some
databases support a so-called "upsert" statement in case the
entire dataflow can be pushed down. So if the source-to-target
movement can be pushed down, you should check if Data
Integrator can execute that with an upsert in case of autocorrect
load. For example, if the database type is Oracle and the
version 9i or higher, you will find in this case a "merge into
target...insert when ... update when ..." kind of SQL statement
pushed down to the database. And then autocorrect load will be
executed with database performance...
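Purely as an illustration (table and column names are made up, and the exact statement Data Integrator generates depends on the target options), such a pushed-down upsert on Oracle has roughly this shape:

MERGE INTO CUSTOMER_TGT t
USING (SELECT CUSTOMER_ID, NAME, CITY FROM CUSTOMER_SRC) s
ON (t.CUSTOMER_ID = s.CUSTOMER_ID)
WHEN MATCHED THEN
  UPDATE SET t.NAME = s.NAME, t.CITY = s.CITY
WHEN NOT MATCHED THEN
  INSERT (CUSTOMER_ID, NAME, CITY)
  VALUES (s.CUSTOMER_ID, s.NAME, s.CITY);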
For DB2, you should consider using the bulk loader, as there you
have the option to choose "insert/update" - a capability unique to
DB2.
And a last thought: we have spent lots of resources optimizing
Table Comparison. So in general you will find that to be fast, but
that might change in future releases (valid at least for DI 12.0).
So what is faster? I don't know. Give it a try, but keep the above
in mind.

Autocorrect Load Pushdown Example


Added by Vicky Bolster, Mar 01, 2012
The biggest advantage of autocorrect load is the full pushdown, similar
to an insert..select. To demo that, let us have a look at the
dataflow attached. We read the table CUSTOMER_MASTER, the
Query does just a 1:1 mapping of most columns, and we load the
result into the target table.

As the source and target table are either in the same datastore or
there is a database link defined between both (or DI implicitly
treats both datastores as linked as they point to the same
database), the entire task can be pushed down to the database.
And the query does not use functions that can be executed by the
engine only.
In the above case, if the target table is Oracle in version 9i or higher
and autocorrect load is checked, we generate a merge statement.

Now the entire task is handled by the database and DI is not
involved at all. A potential problem could be the transaction size
this SQL command has to execute at once. Therefore the table
loader has the additional option to allow merge statements or not.

But this flag "Use Merge" is meant as "Use Merge if possible". It
does not mean we force a merge pushdown, even if a pushdown
is not possible as such. If a pushdown is possible and the flag is not
checked, then we create a PL/SQL block doing the same thing
manually with intermediate commits.
Degree of Parallelism
Added by Vicky Bolster, Feb 29, 2012
Summary
Imagine a case where one transform is doing most of the
processing. In DI, one transform (if not merged with others or
even pushed down) is always one CPU thread. So in the end, in
such a case the entire multiprocessor server is idle except for the
one CPU. Although this is a rather unlikely case, as modern CPUs
are many times faster than the database can actually read or write,
especially for high-end servers, we need a solution, and it is called
Degree of Parallelism, Number of Loaders and Partitioning, the
latter discussed in the sub-chapter DoP and Partitions.
With those flags the optimizer is able to split transforms into
multiple instances and merge the data later, if required.
Take this dataflow: It consists of a Query calling a custom script
function mysleep() and a map operation. Obviously the query will
take a long time to process each row - it does a sleep to simulate
the point - and we want to increase the throughput somehow.

Assuming we had to do that manually, what would the above
dataflow look like? We would add a Case transform to split the
data into multiple sets, copy the query_heavy_processing a
couple of times and finally merge the data. This way each query will
still take the same time per row, but as we have multiple query
instances, each has to process only a quarter of the data, so it will be
four times faster.

You can get exactly the same kind of processing if you take the
original dataflow with the single query and set the dataflow
property "Degree of Parallelism" (short: DoP) to 4.

Additionally, we need to let DI know if our custom function can
actually be executed in parallel. For the DI internal functions, this
information is provided internally. A substr() function can obviously
be parallelized, a gen_row_num() function cannot, hence all
parallel streams of data ahead of that function are merged and fed
into the query transform containing the function. For custom functions,
there is a flag...

When executing, you can get a hint of what the optimizer is doing
by looking at the thread names of the monitor log.

We have a round robin split that acts as our case transform, the
Query and three additional AlViews being the generated copies of
the query, the Map_Operation plus three more, and the final
merge. (Since 11.7 the thread names are even more meaningful.)
But actually, why do we merge all the data at the end? We could
also have four instances of the target table. Like in this diagram:

To do that automatically, there is an additional parameter on the
table loader object, the "Number of Loaders". If it is set to 4 as well,
the optimizer automatically creates a flow like the manual version
above.

And what if we even want to read the data in parallel streams? We
cannot simply add the source table four times, since then we
would read the entire table four times. We need to filter the data
with some constraints, thus effectively partitioning the entire
source table into four (hopefully) equally sized parts.

To get the same result with DoP, the source table has to be
partitioned according to the query where clause. If the table is
partitioned in the database, DI imports the partition information
already and would read in parallel if the reader flag "enable
partitioning" is turned on. If it is a plain table, you can create the
partitions manually via the table properties.

On a side note, why did we use a custom function that simply
calls the sleep() function for this exercise? The sleep() function
is marked as a singularization point, as we assume that waiting
is meant literally. Therefore we have embedded the sleep into a
parallel-aware custom function.
But not only functions are singularization points, some transforms
are as well. Some transforms obviously are, like the Key_Generation
transform. Other transforms like a Query with order by or group by
are singularization points too, but internally some steps are
processed in parallel. The sort, for example, is processed as an
individual sort for each stream of data, and then all streams are
fed into an internal merge-sort transform that puts the pre-sorted
data into a global order. And then there are transforms that are
singularization points which do not need to be. Table Comparison
in sorted mode is an example of that, one that will get addressed
soon.
When you have many transforms in a dataflow and execute it
with DoP, and some transforms are singularization points and some
are not, you will find lots of round robin splits and merges. The
logic of DoP is to execute everything as parallel as possible - so
splits are added again after each singularization point. Only if two
transforms that are singularization points follow each other are
they not split. On the other hand, we have already seen that the
Case transform and Merge effectively process millions of rows per
second.
The fun starts if the partitions, number of loaders and DoP do not
match. Then you will find even more splits and merges after the
readers and ahead of the loaders to re-balance the data. So this
should be avoided as much as possible, just to minimize the
overhead. But actually, it is no real problem unless you are on a
very, very big server.
In the previous chapter we said the source table has to be
partitioned in order to allow for parallel reading. So either the table
is partitioned already or we edit the table via the object library and
maintain the partition information ourselves. The result will be similar
but not identical.
If the table is physically partitioned, each reader will add a clause
to read just the partition of the given name. In Oracle that would
look like "select * from table partition (partitionname)". That has
three effects. First, the database will read the data of just the one
partition; no access to the other partitions. Second, it works
with hash partitions as well. And third, if the partition information is
not current in the DI repo, e.g. somebody added another partition
and had not re-imported that table, DI would not read data from
this new partition. And to make it worse, DI would not even
know it did not read the entire table although it should. In order to
minimize the impact, the engine does check if the partition
information is still current and raises a warning(!) if it is not.
Another problem with physical partitions is that the data might not be
distributed equally. Imagine a table that is partitioned by year. If
you read the entire table, there will be more or less equal row
numbers in each partition. But what if I am interested in last year's
data only? Then I have 10 readers, one per partition, and each
reader will have the where clause YEAR >= 2007. Nine of them
will not return much data, hmm? In that case it would be a good
idea to delete the partition information of that table in the
repository and add another, e.g. partition by sales region or
whatever.
Something that is not possible yet is having two kinds of
partitions. In the above example you might have an initial load that
reads all years and a delta load where you read just the changed
data, most of which is in the current year, obviously. So for
the initial load using the physical partition information would make
sense, for the delta the manual partition. That cannot be done yet
with DI. On the other hand, a delta load deals with smaller volumes
anyway, so one can hope that parallel reading is not that
important; just the transformations like Table Comparison should
be parallelized. So the delta load dataflow would have DoP set but
not the enable-partitions flag in the reader.
Manual partitions have an impact as well. Each reader has
to read distinct data, so each one will have a where clause
according to the partition information. In the worst case, each reader
will read the entire table to find the rows matching the where
condition. So for ten partitions we created manually, we will have
ten readers each scanning the entire source table. Even if there is
an index on the column we used as manual partition, the
database optimizer might find that reading index plus table would
take longer than just scanning quickly through the table. This is
something to be very careful with. In the perfect world, the source
table would be partitioned by one clause we use for the initial load
and subpartitioned by another clause, one we can use as manual
partition for the delta load. And to deal with the two partitions
independently, the delta load is done reading from a database
view instead of the table, so we have two objects in the object
library, each with its own partitioning scheme.
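To make that concrete, assume a manual partition on a hypothetical REGION column with four values; each of the four readers would then issue a query along these lines, and without a usable index each one may scan the whole table:

SELECT * FROM CUSTOMER WHERE REGION = 'NORTH';  -- reader 1
SELECT * FROM CUSTOMER WHERE REGION = 'SOUTH';  -- reader 2
SELECT * FROM CUSTOMER WHERE REGION = 'EAST';   -- reader 3
SELECT * FROM CUSTOMER WHERE REGION = 'WEST';   -- reader 4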
As said, DoP is used whenever the database throughput is so
high that one DI transform cannot provide the data fast enough. In
some cases that would be more than a million rows per second if
just simple queries are used; with other transforms like Table
Comparison in row-by-row mode it is just in the 10'000 rows per
second area.
But normally you will find the table loader to be the bottleneck, with
all the overhead for the database: parse the SQL, find empty
space to write the row, evaluate constraints, save the data in the
redo log, copy the old database block to the rollback segment so
other select queries can still find the old values if they were
started before the insert,... So when we aim for high performance
loading, very quickly you have no choice other than using the API
bulkloaders. They bypass all the SQL overhead, redo log,
everything, and write into the database file directly instead. For
that, the table needs to be locked for writing. And how do you
support number of loaders if the table is locked by the first loader
already? You can't. The only option is to use API
bulkloaders loading multiple physical tables in parallel, and that
would mean loading partitioned tables. Each API bulkloader will load
one partition of the table only and hence lock the partition
exclusively for writing, but not the entire table. The impact for the
DI dataflow is that, as soon as enable partitions on the loader is
checked, the optimizer has to redesign the dataflow to make sure
each loader gets the data of its partition only.

Each stream of data has a Case transform that routes the data
according to the target table partition information into one loader
instance. This target table partition obviously has to be a physical
partition and it has to be current, or the API bulkloader will raise an
error saying that this physical partition does not exist.
Using enable partitions on the loader is useful for API
bulkloaders only. If regular insert statements are to be created, the
number of loaders parameter is probably the better choice.

Slow Changing Dimension Type 2


Added by Robbie Young, Mar 01, 2012
The goal of a slowly changing dimension of type 2 is to keep the
old versions of records and just insert the new ones.
Like in this example, the three input rows are compared with the
current values in the target, and for CUSTOMER_ID = 2001 the
city changed to Los Angeles. Therefore, in the target table we
have two rows for this customer: one with the old city name, which is not current anymore (CURRENT_IND = N) and has a
VALID_TO date of today, plus the new row with the current data and
today as its start date.

All of this is done using common transforms like Query etc. They
all have specific tasks and each collects the information required
for the downstream objects.

The first interesting object is the Query. Since the Table
Comparison transform is configured to use sorted input mode, an
order by clause got added.

In addition, at this point we add a default VALID_FROM date, which is mapped to sysdate().

The Table Comparison now compares the input dataset with the
current values of the compare (=target) table based on the key
specified in the "input primary key columns" list (CUSTOMER_ID).

This primary key list is important as it tells the transform which row
we want to compare with. Of course, in most cases it will be the
primary key column of the source table, but by specifying it manually
we just have more options. But it will be the primary key of the
source table, not the target table's primary key. Keep in mind, one
CUSTOMER_ID will have multiple versions in the target! (In case
you ask yourself why the transform is grayed out: the flow was
being debugged at the time the screenshot was made.)

With this primary key we can identify the target table row we want
to compare with. But actually, we just said there can be multiple
rows in the target table. Which one should we compare with? That
is easy: the latest version. And how can we identify the latest
version? We could use the CURRENT_IND = 'Y' information or
the VALID_TO = '9000.12.31' date. But we know neither
the column names storing this information nor the values. And
who said that those columns have to exist! We can use another
trick: as the generated key is truly ascending, we know that the
higher the key value is, the more current the row will be. And this is what
Table Comparison does: it reads all rows for our "input primary
key" with an additional order by on the column identified as
generated key, descending, so it will get the most current record
first.
Next is the compare step. All columns from the input schema are
compared with the current values of the target table. If anything
changed, the row is sent to the output with the OP code
Update; the current values of the table will be in the before
image, the new values of the input in the after image of the
update row. If the row is entirely new, it will be an Insert row, and if
nothing changed, it gets discarded.
In our example, there is always at least one change: the
FROM_DATE. In the upstream query we set that to sysdate!
To deal with that, the Table Comparison transform has an
additional column list for the columns to be compared. There, we
pulled in all columns except the FROM_DATE. Hence, it will be
ignored in the comparison and the row will be discarded if
everything else is still current.
Also, look at the output structure of the Table Comparison: it is
the compare table schema. The logic is that the transform
performs the lookup against the compare table and copies all
values into the before image and after image buffer of this row.
Then the input columns overwrite the after image values.

Therefore, columns like the KEY_ID that do not yet exist in the
input schema will contain the compare table value.
The next transform in the row is History Preserving. In the most
simple case, all this transform does is send insert rows to the
output as is, and for update rows, change the OP code to
insert as well. This way, records that did not change at all are
filtered away by the Table Comparison transform, new records are
added, and changed records are added as a new version as well.
However, the transform has more options.

First of all, who said we want to create a new version record if
anything changed? Very often you are interested in e.g. a
change of the region, but just because the first name was
misspelled we should add a new row? No, in such cases the
existing record should be updated only. And how does the
transform know which columns changed? By comparing the
before image and after image value of the update row. Only if
anything important changed is the OP code flag modified to
insert. All other update rows are forwarded as is.
The second optional kind of flags are the valid_from/to dates and
the current indicator. Both are optional but used almost all the
time, at least the dates. If we have many versions in the target
table, it would be nice to have an idea when each version was valid
or which one is today's version. For insert rows, this is no big deal:
they will be marked as current, the from-date is taken from the
input, the to-date is the default chosen. For a new version record,
the values are the same, but the row in the target table is still
marked as current with a valid-to date of 9000.12.31. Therefore, this
transform has to generate an update statement as well, leaving all
the values the same as they are right now in the target table (these are the before image values); only the valid-to date is
overwritten with the valid-from date and the current indicator is
obviously set to not-current.

As you can see here, the Table Comparison identified a change in
the city and sent the row flagged as Update. And since this
record did exist, we now know the KEY_ID in the after image of
the update. The old from-date is stored in the before image
(which unfortunately cannot be viewed), but the after image contains the
new from-date. History Preserving now outputs the new version
as an insert row with all the after image values. And in addition the
update row, where city, from-date, etc. - all columns contain
the before image value, so they will be updated with the same value as
they have right now, except the current indicator and the valid-to
date, which are changed to make the record not-current.
If you look carefully at the KEY_ID column, you will find that both
the update and the new version have the same key value of 1;
the new customer - the insert row - has a NULL as KEY_ID. And
what do we need?
For the update row we do need the KEY_ID=1, otherwise the
wrong row will be updated - keep in mind there might be many old
versions already. And both insert rows require new key values. So
we put this dataset through the Key Generation transform. This
transform passes all update rows through as is, and overwrites
the KEY_ID with new values for insert rows, regardless of their
current value.

And finally, the target table has a primary key defined,
therefore the table loader will generate an update .... where
KEY_ID = :before_image_value for updates; insert rows are just
inserted.
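As a rough sketch of what the loader ends up executing for one changed customer (the dimension table name is invented, the column names are taken from the example above, and the bind values come from the before/after images):

-- update row: close the old version, identified by the before-image key
UPDATE CUSTOMER_DIM
SET CURRENT_IND = 'N', VALID_TO = :valid_from
WHERE KEY_ID = :before_image_key_id;

-- insert row: add the new version with a freshly generated key
INSERT INTO CUSTOMER_DIM (KEY_ID, CUSTOMER_ID, CITY, VALID_FROM, VALID_TO, CURRENT_IND)
VALUES (:new_key_id, :customer_id, :city, :valid_from, '9000.12.31', 'Y');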
The performance of that entire dataflow is the same as for Table
Comparison in its respective mode, as this transform has the most
overhead: it looks up the row in the table or inside the cache.
The other transforms are executed purely inside the engine, just
checking if something changed. The table loader will be slower than
before too, simply because it has more rows to process
- insert new version, update old version. On the other hand, in
many cases Table Comparison will find that no change occurred
at all, so the loader has to process fewer rows...
One thing the transforms do not support are separate columns for
insert and last_update. The information is there - the valid_from date
of the oldest version is the insert date, for all other versions it is
the update date - however you cannot have this information in
separate columns. If you need that, you will likely have to use
database triggers to fill the additional columns.

How to create a Database link in Data Services using
SQL Server
Posted by Mohammad Shahanshah Ansari 12-Mar-2014
Sometimes you need to use multiple databases in a
project, where source tables may be stored in one
database and target tables in another database. The
drawback of using two different databases in BODS is that
you cannot perform a full pushdown operation in the dataflow,
which may slow down the job execution and create
performance issues. To overcome this we can create a
database link and achieve a full pushdown operation. Here
is a step by step procedure to create a database link in
BODS using SQL Server on your local machine.
Pre-requisites to create a database link:
1.
You should have two different datastores created in
your Local repository which are connected to two
different databases in SQL Server (Ex: Local Server).
Note: You may have these databases on a single server
or on two different servers. It is up to you.
2.
These two different databases shall exist in your
local SQL Server.
How to create a Database Link:
Step 1: Create two databases named DB_Source and
DB_Target in your local SQL Server.
SQL Server Code to create databases. (Execute this in
your query browser)
CREATE Database DB_Source;
CREATE Database DB_Target;
Step 2: Create two datastores in your local repository:
name one DS_Source and connect it to the DB_Source
database; create another datastore named DS_Target and
connect it to the DB_Target database.
Now, I want to link the DS_Target datastore with the DS_Source
datastore so that they behave as a single datastore in Data
Services.
Use the details in the screenshots below to create your
datastores:
a) Create DS_Source Datastore as shown under

b) Create DS_Target Datastore as shown under

Before we go to the third step, let's create a Job and see what
will happen without a database link when we use
the tables from these datastores in a dataflow. Will it
perform a full pushdown?
Step 3:
Follow the screenshots below to create your Project, Job
and Dataflow in the Designer.

Now go to your SQL Server database, open a query
browser and use the SQL code below to create a table with
some sample data in the DB_Source database.
a)
--Create a Sample Table in SQL Server
Create table EMP_Details(EmpID int identity, Name
nvarchar(255));
--Inserting some sample records (EmpID is an identity column, so only Name is supplied)
Insert into EMP_Details (Name) values ('Mohd Shahanshah Ansari');
Insert into EMP_Details (Name) values ('Kailash Singh');
Insert into EMP_Details (Name) values ('John');
b) Once the table is created, import this
table EMP_Details into your DS_Source datastore.
c) Drag the table from the datastore into your dataflow and
use it as the source table. Use a Query transform, then drag a
template table and fill in the details as shown in the
screenshot below. This creates the target table in the
DS_Target datastore.

Once the target table is created, your dataflow will look as
under.

d) Map the columns in the Q_Map transform as under.

Now you have the source table coming from one database, i.e.
DB_Source, and the target table stored in another
database, i.e. DB_Target. Let's see if the dataflow is
performing a full pushdown or not.
How to see whether a full pushdown is happening or
not?
Go to the Validation tab in your Designer and select the Display
Optimized SQL option. Below is the screenshot for the
same.

The window below will pop up once you select the above option.

If the optimized SQL code starts with a SELECT
clause, that means a full pushdown is NOT being performed. To
get a full pushdown, the generated SQL query has to start
with an INSERT command.
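For reference, and only as an illustration with a made-up target table name, the two shapes of optimized SQL look roughly like this:

-- no full pushdown: DS fetches the rows and loads the target itself
SELECT EmpID, Name
FROM DB_Source.dbo.EMP_Details;

-- full pushdown: the whole dataflow becomes one INSERT ... SELECT
INSERT INTO DB_Target.dbo.EMP_DETAILS_TARGET (EmpID, Name)
SELECT EmpID, Name
FROM DB_Source.dbo.EMP_Details;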
Step 4:
How to create a Linked Server in SQL Server
Now go to the SQL Server database and create a linked
server as shown in the screen below.

Fill in the details as shown in the screen below for the General
tab.

Now go to the Security tab and choose the option as shown in
the dialog box below.

Click on the OK button. Your linked server is created
successfully.
Step 5:
Now it is time to create a datastore link and then see
what optimized SQL it will generate.
Go to the advanced mode of your DS_Target datastore
properties, click on Linked Datastore, choose the
DS_Source datastore from the list and then click the OK
button.

The dialog box below will appear. Choose DS_Source as the
datastore and click OK.

Then click on the browse button as shown below.

Then select the option as shown in the dialog box below and
click the OK button.

Now you have successfully established a database link
between the two datastores, i.e. between DS_Source and
DS_Target.

Now save the BODS Job and check the Optimized
SQL from the Validation tab as done earlier.
Go to the dataflow and see what code is generated in the
Optimized SQL window.

The optimized code below will be shown.

You can see that the SQL has an INSERT command now, which
means a full pushdown is happening for your dataflow.
This is the way we can create a database link for SQL
Server in DS and use more than one database in a Job
while still performing full pushdown operations.

Data Services 4.2 Workbench


Posted by Louis de Gouveia 01-Mar-2014
A while ago I posted about the new Data Services 4.2
features. The post can be found here: Data Services 4.2
What's New Overview.
There are obviously several new features. But one of the
new features that SAP will add more and more capabilities to
with each release is the workbench.
In Data Services 4.1 the workbench was first released but
had limited functionality. I posted about the 4.1 workbench
in Data Services Workbench Part 1 and Data Services
Workbench Part 2.
In 4.2 SAP has extended the workbench functionality. This
blog will focus on the new functionality.
One of the biggest changes is that you can now
design the dataflow in the workbench; the first release
did not have this functionality yet. In comparison with
Data Services Designer, the concept is to be able to do
most of the dataflow in one window. So in this version,
when you click on the Query transform, all the mapping
will be shown in the windows below. This is illustrated in
Figure 1 below.

Figure 1

Unfortunately not all transforms are available yet in the
workbench. Figure 2 shows the transforms that are
available in the workbench in this version.

Figure 2

A nice little feature I noticed is that when you click on a
column it shows the path of where that column came
from. This could be very handy for complex dataflows.

Figure 3

As shown in Figure 4, you can now go to the advanced editor
when doing your mappings if needed. The functions have
also been arranged in a similar manner as in Information
Steward.

Figure 4

The workbench still makes use of projects. However, in the
workbench a project shows everything related. So the
example below shows the datastore called
STSSouthEastDemo and also shows two dataflows. You can
also create folders to arrange the content.

Figure 5

As shown in Figure 6 below, the log is slightly different -
shown in a table - but it still shows the same info.

Figure 6

In Data Services you have always been able to view the
data. But now that the workbench uses the Eclipse-based
shell, we can view data as in other Eclipse-based SAP
products. Figure 7 illustrates this. You will notice it has the
same look and feel as HANA Studio and IDT. Unfortunately
it doesn't allow you to view two tables of data next to each
other, a feature that is available in the Designer and is
useful.

Figure 7

So I have shown you some of the new features in the
workbench. Many of them are replications of the
Data Services Designer functionality in the Eclipse look and feel,
in some instances with some new little features or end-user
experience improvements.
But I'm still missing a lot before I will switch from the
Designer to the workbench.
Here is a list of what is missing:

No workflows, so you can't link multiple dataflows to make
one workflow.

No Jobs. Every dataflow basically creates a job in
the background, which limits how many dataflows and
workflows you can string together.
No debug or breakpoint options.
No scripts yet.
Not all the transforms are available yet.
No COBOL, Excel or XML support yet.
Let the database do the hard work! Better performance in
SAP Data Services thanks to full SQL-Pushdown
Posted by Dirk Venken 13-Feb-2014
SAP Data Services (DS) provides connections to data
sources and targets of different categories. It supports a
wide range of relational database types (HANA, Sybase
IQ, Sybase ASE, SQL Anywhere, DB2, Microsoft SQL
Server, Teradata, Oracle). It can also read and write
data into files (text, Excel, XML), adapters (WebServices,
salesforce.com) and applications (SAP, BW et al.).
Typically, to enable transformations during an ETL
process, non-database data are temporarily stored
(staged, cached) in databases, too. When interfacing with
relational databases, DS generates SQL-statements for
selecting, inserting, updating and deleting data records.

When processing database data, DS can leverage the
power of the database engine. That may be very
important for performance reasons. The mechanism
applied is called SQL-Pushdown: (part of) the
transformation logic is pushed down to the database in
the form of generated SQL statements. That is because,
although DS itself is a very powerful tool, databases are
often able to process data much faster. On top of that,
internal processing within the database layer avoids or
significantly reduces costly, time-consuming data
transfers between database server memory and DS
memory and vice versa.

In many cases, the DS engine is smart enough to take the
right decisions at this level. But it is obvious that a good
dataflow (DF) design will help. The overall principle
should consist in minimizing processing capacity and
memory usage by the DS engine. In fact, the following are
the most important factors influencing the performance
of a DS dataflow:

Maximize the number of operations that can be
performed by the database.

Minimize the number of records processed by the DS
engine.

Minimize the number of columns processed by the
DS engine (a bit less important, because often with lower
impact).

During development of a DS dataflow, it is always
possible to view the code as it will be executed by the DS
engine at runtime. In particular, when reading from
a relational database, one can always see the SQL that
will be generated from the dataflow. When a dataflow is
open in the DS Designer, select Validation > Display
Optimized SQL from the menu:

Figure 1: Display Optimised SQL

It will show the SQL code that will be generated and
pushed down by the DS engine:

Figure 2: Optimised SQL

Make sure that the dataflow has not been modified after
it has last been saved to the repository. If the dataflow is
modified, it must be saved before displaying the
generated SQL. The Optimized SQL popup window will
always show the code corresponding to the saved version
and not to the one displayed in DS Designer.

When all sources and targets in a flow are relational
database tables, the complete operation will be pushed
to the database under the following conditions:

All tables exist in the same database, or in linked
databases.
The dataflow contains Query transforms only. (Bear
with me! In a next blog I will describe some powerful new
features. When connected to HANA, DS 4.2 is able
to push down additional transforms such as
Validation, Merge and Table_Comparison.)
For every DS function used there's an equivalent
function at database level. This has to be true for any
implicitly generated functions, too. For instance, when
data types of source and target columns are different, DS
will include a conversion function, for which possibly no
equivalent function exists at database level!
There are no substitution parameters in the where-clause (replace
them by global variables if necessary).
Bulk loading is not enabled.
The source sets are distinct for every target.

This functionality is commonly called full SQL-Pushdown. Without any doubt, a full pushdown often
gives the best performance, because the generated code will
completely bypass any operations in DS memory. As a
matter of fact, that constitutes the best possible
application of the main principle to let the database do
the hard work!

Don't bother applying the performance improvements
described here if your applications are already
performing well. If that's the case, you can stop reading
here.

Don't fix it if it's not broken. Check the overall performance
of your job. Concentrate on the few dataflows that take
most of the processing time. Then try and apply the tips
and tricks outlined below on those.

1. Pushdown_sql function

DS functions for which there is no database equivalent (or
DS does not know it!) prevent the SQL-Pushdown. Check
out the AL_FUNCINFO table in the DS repository to find
out which DS functions can be pushed down:

SELECT NAME, FUNC_DBNAME FROM AL_FUNCINFO
WHERE SOURCE = '<your_database_type>';

Figure 3: DS does not know an equivalent database function

There is a solution, though, when the culprit function is
used in the where-clause of a Query transform. Using the
DS built-in pushdown_sql function, this code can be
isolated from DS processing and pushed down to the
database so that the complete statement can be
executed at database level again.

Figure 4: Use of pushdown_sql
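A minimal sketch of that technique, assuming a datastore named DS_SRC and a database-specific expression DS cannot translate itself (both names and the expression are purely illustrative): put the call into the where-clause of the Query transform, and the quoted text is passed through to the database unchanged.

pushdown_sql('DS_SRC', 'TRUNC(ORDER_DATE) = TRUNC(SYSDATE)')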

2. Use global variables

There is not always a database equivalent for all DS date
functions. As a result, such a function is not pushed down to
the database.

Figure 5: Date function prevents pushdown

Whenever a system timestamp or a derivation thereof
(current year, previous month, today) is needed in a
mapping or a where-clause of a Query transform, use a
global variable instead. Initialize the variable and give it the
desired value in a script before the dataflow. Then use it
in the mapping. The database will treat the value as a
constant that can be pushed down to the database.

Figure 6: Use of a global variable
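A minimal sketch, assuming a global variable $G_CurrentYear defined in the job and a hypothetical ORDER_YEAR column: the script runs before the dataflow, and the where-clause of the Query then only contains a constant value that the database can handle.

# script before the dataflow
$G_CurrentYear = to_char(sysdate(), 'YYYY');

# where-clause in the Query transform
ORDERS.ORDER_YEAR = $G_CurrentYear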

3. Single target table

Best practice is to have one single target table only in a
dataflow.

Figure 7: Single target table

For an extract dataflow that always means a single
driving table, possibly in combination with one or more
lookup sources. For transform, load and aggregate flows,
the columns of the target table are typically sourced from
multiple tables that have to be included as sources in the
dataflow.

By definition, a full SQL-Pushdown cannot be achieved
when there's more than one target table sharing some of
the source tables. With multiple target tables it is
impossible to generate a single SQL insert statement with
a sub-select clause.

Figure 8: More than one target table

Whenever the dataflow functionality requires multiple
target tables, adding a Data_Transfer transform (with
transfer_type = Table) between the Query transform and
the target tables might help in solving performance
issues. The full table scan (followed by further DS
processing and database insert operations) is then
replaced by three inserts (with sub-select) that are
completely pushed down to the database.

Figure 9: Data_Transfer transform

Figure 10: Data_Transfer Table type

4. Avoid auto-joins

When multiple data streams flow out of a single
source table, DS is not able to generate the most optimal
SQL code. To that extent, best practice is to include
additional copies of the source table in the flow.

Figure 11: Auto-join

When designing the flow as shown below, DS will
generate a full SQL-Pushdown.

Figure 12: Without auto-join

5. Another application of the Data_Transfer transform

When joining a source table with a Query transform (e.g.
containing a distinct-clause or a group by), DS does not
generate a full pushdown.

Figure 13: Sub-optimal DS dataflow

An obvious correction to that problem consists in
removing the leftmost Query transform from the dataflow
by including its column mappings in the Join.

When that's not possible, the Data_Transfer transform
may bring the solution. By using a Data_Transfer
transform, with transfer_type = Table, between the two
Query transforms, performance may be significantly
improved. For the dataflow below, DS will generate 2 full
pushdown SQL statements. The first will insert the Query
results into a temporary table. The second will insert the
Join results into the target.

Figure 14: Optimization with Data_Transfer transform

6. The Validation transform

In a non-HANA environment, when using transforms
other than the Query transform, processing control will
pass to the DS engine, preventing it from generating a full
pushdown. There exists a workaround for Validation
transforms, though.

Figure 15: Validation transform

Replacing the Validation by two or more Query
transforms, each with one of the validation conditions in
its where-clause, will allow DS to generate a (separate)
insert with sub-select for every data stream.

Figure 16: Parallel queries

What is a substitution parameter?

Substitution parameters are used to store constant
values and are defined at repository level.

Substitution parameters are accessible to all jobs in a
repository.

Substitution parameters are useful when you want to
export and run a job containing constant values in a
specific environment.
Scenario to use Substitution Parameters:
For instance, suppose you create multiple jobs in a repository
and they reference a directory on your local computer
to read the source files. Instead of creating global
variables in each job to store this path, you can use a
substitution parameter instead. You can easily assign a new
value for the original, constant value in order to run the
job in the new environment. After creating a substitution
parameter value for the directory in your environment,
you can run the job in a different environment and all the
objects that reference the original directory will
automatically use the new value. This means that you only
need to change the constant value (the original directory
name) in one place (the substitution parameter) and its
value will automatically propagate to all objects in the job
when it runs in the new environment.
Key differences between substitution parameters
and global variables:

You would use a global variable when you do not
know the value prior to execution and it needs to be
calculated in the job.

You would use a substitution parameter for constants
that do not change during execution. Using a
substitution parameter means you do not need to define
a global variable in each job to parameterize a constant
value.

Global Variables: defined at Job level; cannot be shared across Jobs; data-type specific; value can change during job execution.
Substitution Parameters: defined at Repository level; available to all Jobs in a repository; no data type (all strings); fixed value set prior to job execution (constants).
How to define Substitution Parameters?
Open the Substitution Parameter Editor from the
Designer by selecting
Tools > Substitution Parameter Configurations....
You can either add another substitution parameter to an
existing configuration or you may add a new
configuration by clicking the Create New Substitution
Parameter Configuration icon in the toolbar.
The name prefix is two dollar signs $$ (global variables
are prefixed with one dollar sign). When
adding new substitution parameters in the Substitution
Parameter Editor, the editor automatically
adds the prefix.
The maximum length of a name is 64 characters.
In the following example, the substitution parameter
$$SourceFilesPath has the value D:/Data/Staging in the
configuration named Dev_Subst_Param_Conf and the
value C:/data/staging in the Quality_Subst_Param_Conf
configuration.

This substitution parameter can be used in more than one
Job in a repository. You can use substitution parameters
in all places where global variables are supported, like
Query transform WHERE clauses, scripts, mappings, the SQL
transform, flat-file options, Address Cleanse transform
options etc. The script below will print the source files path
as defined above.
Print ('Source Files Path: [$$SourceFilesPath]');
Associating a substitution parameter configuration
with a system configuration:
A system configuration groups together a set of datastore
configurations and a substitution parameter
configuration. For example, you might create one system
configuration for your DEV environment and a different
system configuration for the Quality environment. Depending
on your environment, both system configurations might
point to the same substitution parameter configuration or
each system configuration might require a different
substitution parameter configuration. In the example below,
we are using different substitution parameters for the DEV and
Quality systems.
To associate a substitution parameter configuration with a
new or existing system configuration:
In the Designer, open the System Configuration Editor by
selecting
Tools > System Configurations.
You may refer to this blog to create the system
configuration.
The following example shows two system configurations,
DEV and Quality. In this case, there are substitution
parameter configurations for each environment. Each
substitution parameter configuration defines where the
data source files are located. Select the appropriate
substitution parameter configuration and datastore
configurations for each system configuration.

At job execution time, you can set the system
configuration and the job will execute with the values for
the associated substitution parameter configuration.
Exporting and importing substitution parameters:
Substitution parameters are stored in a local repository
along with their configured values. DS does not
include substitution parameters as part of a regular
export. Therefore, you need to export substitution
parameters and configurations to other repositories by
exporting them to a file and then importing the file into
another repository.
Exporting substitution parameters
1.
Right-click in the local object library and
select Repository > Export Substitution Parameter
Configurations.
2.
Select the check box in the Export column for the
substitution parameter configurations to export.
3.
Save the file.
The software saves it as a text file with an .atl extension.

Importing substitution parameters

The substitution parameters must first have been
exported to an ATL file.
1.
In the Designer, right-click in the object library and
select Repository > Import from file.
2.
Browse to the file to import.
3.
Click OK.

How to use Pre-Load and Post-Load commands in Data
Services
Posted by Ramesh Murugan 28-Mar-2014
In this article we will discuss how to use Pre-Load and
Post-Load commands in Data Services.
Business Requirement: We need to execute two programs,
one before and one after the transformation. The first program will
create or update a status to allow data to be received from the
source into the target system, and the second program will publish
the transformed data in the target system. These two
programs need to execute before and after the
transformation.
For this scenario, we can use the Pre-Load and Post-Load
commands. Below are the details.
What are Pre-Load and Post-Load commands?
They specify SQL commands that the software executes before
starting a load or after finishing a load. When a data flow
is called, the software opens all the objects (queries,
transforms, sources, and targets) in the data flow. Next,
the software executes the Pre-Load SQL commands before
processing any transform. The Post-Load commands are
processed after the transforms.
How to use this for our business requirement?
We can use both the Pre-Load and Post-Load commands to
execute a program before and after the transform; the
steps below explain the details.
Right-click on the target object in the dataflow and select Open.

The target object options will be shown as below.

Both the Pre Load Commands tab and the Post Load Commands
tab contain a SQL Commands box and a Value box. The SQL
Commands box contains command lines. To edit/write a line,
select the line in the SQL Commands box. The text for the SQL
command appears in the Value box. Edit the text in that box.

To add a new line, determine the desired position for the new line,
select the existing line immediately before or after the desired
position, right-click, and choose Insert Before to insert a new line
before the selected line, or choose Insert After to insert a new line
after the selected line. Finally, type the SQL command in the
Value box. You can include variables and parameters in pre-load
or post-load SQL statements. Put the variables and parameters in
either brackets, braces, or quotes.
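A hedged example of such command lines (the stored procedure names and the global variable are purely hypothetical, and the call syntax depends on the target database):

-- Pre-Load: tell the target system a load is starting
EXEC dbo.usp_SetLoadStatus {$G_JobName}, 'LOADING'

-- Post-Load: publish the transformed data
EXEC dbo.usp_PublishData {$G_JobName}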

To delete a line, select the line in the SQL Commands box,
right-click, and choose Delete.

Open the Post-Load tab and write the post-transformation
command in the same way as the Pre-Load command.

Save and execute. The job will execute the Pre-Load commands,
the transforms and the Post-Load commands in sequence.

Data processing completed successfully as per the business
requirement.
Note:
Because the software executes the SQL commands as a unit of
transaction, you should not include transaction commands in Pre-Load or Post-Load SQL statements.
How to capture the error log in a table in BODS

Posted by Mohammad Shahanshah Ansari 19-Mar-2014

I will be walking you through (a step by step procedure)
how we can capture error messages if any dataflow fails
in a Job. I have taken a simple example with a few columns
to demonstrate.
Step 1: Create a Job and name it ERROR_LOG_JOB.
Step 2: Declare the following four global variables at the Job
level. Refer to the screenshot below for the names and data
types.

Step 3: Drag a Try block, a Dataflow and a Catch block into the
work area and connect them as shown in the diagram below.
Inside the dataflow you can drag any existing table in your
repository as a source and populate a few columns to a
target table. Make sure the target table is a permanent table.
This is just for the demo.

Step 4: Open the Catch block, drag one script inside the
Catch block and name it as shown in the diagram below.

Step 5: Open the script and write the code below inside it, as
shown in the diagram.

The above script populates the values of the global
variables using some built-in BODS functions and
calls a custom function to log the errors into a
permanent table. This function does not exist at this
moment. We will be creating this function in the later steps.
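Since the script itself is only visible in the screenshot, here is a minimal sketch of what it could contain; the variable and function names, and the use of the built-in error functions inside the catch block, are illustrative assumptions:

# capture the error details exposed inside the catch block
$G_ERROR_NUMBER = error_number();
$G_ERROR_CONTEXT = error_context();
$G_ERROR_MESSAGE = error_message();
$G_ERROR_TIMESTAMP = to_char(sysdate(), 'YYYY.MM.DD HH24:MI:SS');

# hand the values over to the custom logging function created in step 6
CF_Error_Log($G_ERROR_NUMBER, $G_ERROR_CONTEXT, $G_ERROR_MESSAGE, $G_ERROR_TIMESTAMP);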
Step 6: Go to the Custom Functions section in your repository,
create a new custom function and name it as under.

Step 7: Click Next in the above dialog box and write the
code below inside the function. You need to declare the
parameters and local variables as shown in the editor
below. Keep the data types of these parameters and local
variables the same as the global variables in step 2.
Validate the function and save it.
Step 8: Now your function is ready to use. Assuming
you have SQL Server as the database where you want
to capture these errors, create a table to store
the information.
CREATE TABLE [dbo].[ERROR_LOG](
[SEQ_NO] [int] IDENTITY(1,1) NOT NULL,
[ERROR_NUMBER] [int] NULL,
[ERROR_CONTEXT] [varchar](512) NULL,
[ERROR_MESSAGE] [varchar](512) NULL,
[ERROR_TIMESTAMP] [VARCHAR] (512) NULL
)
You may change the datastore as per your requirement. I
have taken ETL_CTRL as the datastore in the above function,
which is connected to a SQL Server database where the
above table is created.
Step 9: Just to make sure that the dataflow fails, we will
force it to throw an error at run time. Inside your
dataflow use a permanent target table. Now double click
the target table and add one line of text below the existing
comment under the Load Triggers tab. Refer to the screenshot
below. This is one way to throw an error in a dataflow at
run time.

Step 10: Now your Job is ready to execute. Save and
execute your Job. You should get an error message in the
monitor log. Open the table in your database and check if
the error log information is populated. The error log shall look
as shown below.

The ERROR_LOG table shall capture the same error message
as under.

Hope this helps. In case you face any issue, do let me
know.
Advantage of Join Ranks in BODS
Posted by Mohammad Shahanshah Ansari 18-Mar-2014
What is Join Rank?
You can use join rank to control the order in which sources (tables
or files) are joined in a dataflow. The highest ranked source is
accessed first to construct the join.
Best Practices for Join Ranks:

Define the join rank in the Query editor.

For an inner join between two tables, in the Query editor
assign a higher join rank value to the larger table and, if possible,
cache the smaller table.
Default, Max and Min values for Join Rank:
The default value for Join Rank is 0. The max and min values can be any
non-negative number.
Consider you have tables T1, T2 and T3 with join ranks of 10,
20 and 30; table T3 has the highest join rank and therefore T3
will act as the driving table.
Performance Improvement:

Controlling the join order can often have a huge effect on the
performance of producing the join result. Join ordering is relevant
only in cases where the Data Services engine performs the join.
In cases where the code is pushed down to the database, the
database server determines how a join is performed.
Where should Join Rank be used?
When the code is not fully pushed down and the sources have huge
numbers of records, join rank may be considered. The Data Services
Optimizer considers join rank and uses the source with the
highest join rank as the left source. Join rank is very useful in
cases where the DS Optimizer is not able to resolve the
most efficient execution plan automatically. A higher join rank
value means that particular table drives the join.
You can print a trace message to the Monitor log file
which allows you to see the order in which the Data
Services Optimizer performs the joins. This information
may help you to identify ways to improve the
performance. To add the trace, select Optimized
Data Flow in the Trace tab of the "Execution
Properties" dialog.
This article shall continue with a real-time example on join rank
soon.
Some cool options in BODS
Posted by Mohammad Shahanshah Ansari 16-Mar-2014

I have found a couple of cool options in BODS and have used them in almost
all the projects I have been doing. You may also give them a try if you
have not done so yet. Hope you like these. You can see all these
options in the Designer.
Monitor Sample Rate:
Right-click the Job > click Properties > then click Execution Options.
You can change the value of the monitor sample rate here,
and every time you execute the Job it shall take the
latest value set.
Setting this value to a higher number improves performance,
and you also need not enter this value every time you execute
the Job. The frequency at which the Monitor log
refreshes the statistics is based on this monitor sample rate. With
a higher monitor sample rate, Data Services collects more data
before calling the operating system to open the file, and
performance improves. Increase the monitor sample rate to reduce
the number of calls to the operating system to write to the log
file. The default value is set to 5. The maximum value you can set is
64000.
Refer to the screenshot below for reference.

Click on the Designer menu bar and select Tools >
Options (see the diagram below). There are a couple of cool
options available here which can be used in your project.
Note that if you change any option from here, it shall
apply to the whole environment.

Once selected, go to:

Designer > General > View data sampling size (rows)
Refer to the screenshot below. You can increase this value to a
higher number if you want to see more records while
viewing the data in BODS. The sample size can be controlled from
here.

Designer > General > Perform complete validation
before Job execution
Refer to the screenshot below. I prefer to set this from here, as then I
need not worry about validating the Job manually before
executing it. If you are testing the Job and there is a chance
of some syntax errors, then I would recommend setting this
beforehand. This will save some time. Check this option if you want to
enable it.

Designer > General > Show dialog when job is
completed
Refer to the screenshot below. This is also one of the cool
options available in the Designer. This option makes the
program open a dialog box when a Job completes. This
way you need not check the monitor log manually for
each Job when it completes. I love this option.

Designer > Graphics
Refer to the screenshot below. Using this option you can change the
line type as you like. I personally like Horizontal/Vertical,
as all transforms look cleaner inside the dataflow. You can also
change the color scheme, background etc.

Designer > Fonts

See the dialog box below. Using this option, you can change the
font size.

Do feel free to add to this list if you have come across more cool
stuff in BODS.
Quick Tips for Job Performance Optimization in BODS
Posted by Mohammad Shahanshah Ansari 15-Mar-2014

Ensure that most of the dataflows are optimized.
Maximize the push-down operations to the database as
much as possible. You can check the optimized SQL using
the option below inside a dataflow; the SQL should start with
an INSERT INTO ... SELECT statement.

Split complex logic in a single dataflow into multiple
dataflows if possible. This will be much easier to
maintain in future, and most of the dataflows can be
pushed down.

If a full pushdown is not possible in a dataflow, then
enable the bulk loader on the target table. Double click the
target table to enable the bulk loader as shown in the
diagram below. The bulk loader is much faster than using direct
load.

Right-click the datastore, select Edit, then go to
Advanced Options and edit it. Change the Ifthenelse
Support to Yes. Note that by default this is set to No in
BODS. This will push down all the decode and ifthenelse
functions used in the Job.

Index creation on key columns: If you are joining
more than one table, then ensure that the tables have
indexes created on the columns used in the where clause.
This drastically improves performance (see the sketch
after this list). Define primary keys while creating the
target tables in DS. In most databases, indexes are created
automatically if you define the keys in your Query transforms.
Therefore, define primary keys in the query transform itself
when you first create the target table. This way you can
avoid manual index creation on a table.

Select Distinct: In BODS, Select Distinct is not pushed down unless the Select Distinct option is checked in the query transform just before the target table. So if you need Select Distinct, use it in the last query transform.

Order By and Group By are not pushed down in BODS unless there is a single query transform in the dataflow.

Avoid data type conversions, as they prevent a full push-down. Validate the dataflow and ensure there are no warnings.

Parallel execution of dataflows or workflows: Ensure that workflows and dataflows are not executed in sequence unnecessarily. Run them in parallel wherever possible.

Avoid parallel execution of query transforms in a dataflow, as it prevents a full push-down. If the same set of data is required from a source table, use another instance of the same table as a source.

Join rank: Assign a higher join rank value to the larger table. Open the query editor where the tables are joined. In the diagram below, the second table has millions of records, so it has been assigned the higher join rank (the highest number gets the highest rank). This improves performance.

Database links and linked datastores: Create database links if you are using more than one database for source and target tables (multiple datastores) or different database servers. You can refer to my other article on how to create a DB link. Click URL

Use of joins in place of lookup functions: Use the lookup table as a source table with an outer join in the dataflow instead of using lookup functions. This technique has an advantage over lookup functions because it pushes the execution of the join down to the underlying database. It also makes the dataflow much easier to maintain.
Hope this will be useful.
How to Create System Configuration in Data Services
Posted by Mohammad Shahanshah Ansari 14-Mar-2014
Why do we need a system configuration in the first place? The advantage of having a system configuration is that you can use it for the lifetime of a project. In general, all projects have multiple environments to load data into as the project progresses over time, for example DEV, Quality and Production environments.
There are two ways to execute your jobs in multiple environments:
 Edit the datastore configurations manually for executing jobs in a different environment, and default them to the latest environment.
 Create the system configuration once and select the appropriate environment in the Execution Properties window when executing the job. This is the option we are going to discuss in this blog.
Following are the steps to create a system configuration in Data Services.
Prerequisite to set up the system configuration:
 You need at least two configurations ready in any of your datastores, pointing to two different databases, for example one for staged data and another for target data. This can be done easily by editing the datastore: right-click the datastore and select Edit.
Step 1: Execute any existing job to check whether your repository already has a system configuration. The dialog box below appears once you execute a job. Do not click the OK button to execute; this is just to check the execution properties. If you look at the dialog box below, there is no system configuration to select.

Step 2: Cancel the above job execution, click the Tools menu as shown below and select System Configurations.

Step 3: You can now see the dialog box below. Click on the icon (red circle) shown in the dialog box to Create New Configuration. This dialog box shows all the datastores available in your repository.

Step 4: Once you click the above button, it shows the dialog box below with default configuration details for all datastores. You can now rename the system configuration (by default it is System_Config_1, System_Config_2, etc.). Select an appropriate configuration name against each datastore for your system configuration. I have taken the DEV and History DB as examples; note that these configurations must already exist in your datastores. See in the dialog box below how they are selected. You can create more than one configuration (say, one for DEV and another for History).
Once done, click the OK button. Your system configuration is now ready to use.

Step 5: Now execute any existing job again. You can see that System Configuration has been added to the Execution Properties window, which was not available before. From the drop-down list you can select the appropriate environment in which to execute your job.

Transfer data to SAP system using RFC from SAP Data Services
Posted by Ananda Theerthan 13-Mar-2014
This is just a sample to demonstrate data transfer to SAP systems using RFC from Data Services. To serve the purpose of this blog, I am going to transfer data to an SAP BW system from Data Services. Sometimes we may need to load lookup or reference data into the SAP BW system from external sources. Instead of creating a data source, this method pushes data directly into the database table using RFC.

Below I will explain the steps I used to test the sample.
1) Create a transparent table in SE11.
2) Create a function module in SE37 with import and export parameters.
3) The source code for the FM is given below.
FUNCTION ZBODS_DATE.
*"----------------------------------------------------------------------
*"*"Local Interface:
*"  IMPORTING
*"     VALUE(I_DATE) TYPE CHAR10
*"     VALUE(I_FLAG) TYPE CHAR10
*"  EXPORTING
*"     VALUE(E_STATUS) TYPE CHAR2
*"----------------------------------------------------------------------
  data: wa type zlk_date.

  if not i_date is initial.
    clear wa.
    CALL FUNCTION 'CONVERT_DATE_TO_INTERNAL'
      EXPORTING
        date_external            = i_date
*       accept_initial_date      =
      IMPORTING
        date_internal            = wa-l_date
*     EXCEPTIONS
*       date_external_is_invalid = 1
*       others                   = 2
      .
    IF sy-subrc <> 0.
*     Implement suitable error handling here
    ENDIF.
    wa-flag = i_flag.
    insert zlk_date from wa.
    if sy-subrc ne 0.
      update zlk_date from wa.
    endif.
    e_status = 'S'.
  endif.
ENDFUNCTION.
4) Remember to set the attribute of the FM to RFC
enabled, otherwise it will not be accessible from Data
Services.

5) Make sure both the custom table and the function module are activated in the system.
6) Log in to DS Designer and create a new datastore of type "SAP Applications" using the required details.
7) In the object library, you will see an option for Functions. Right-click on it, choose "Import By Name" and provide the name of the function module you just created in the BW system.

8) Now build the job with the source data, a query transform and an output table to store the result of the function call.
9) Open the query transform editor, do not add any columns, right-click and choose "New Function Call".
10) The imported function will be available in the list of available objects. Choose the required function and provide the input parameters.

11) Note that, for some reason, Data Services does not recognize the DATS data type from SAP. Instead, you have to use CHAR and do the conversion later. Hence, I am using the to_char function to convert to character format.
12) Now save the job and execute it. Once completed, check the newly created table in the BW system to see the transferred data.
As this is just a sample, an RFC-enabled function module can be designed appropriately to transfer data to any SAP system. The procedure is similar for BAPIs and IDocs: you just need to provide the required parameters in the correct format and it works.
In this blog I would like to explain an approach to build the target file in the desired Excel format using an XSL style sheet. As we are aware, SAP BusinessObjects Data Services accesses Excel workbooks as sources only (not as targets). To overcome this limitation, we can adopt this approach to display our output in the desired Excel format with the help of XSL.
Details on the approach
In this approach we will build an XML file using BODS and display the XML content in the desired tabular format with the help of XSL.
 First we have to create a batch job that creates an XML file containing the required data. Special care must be taken while designing the XML structure that holds the data to be displayed in tabular form. Consider the Excel structure in the example below.
In this example we have two tabular structures, one to hold the header part and a second to hold the category part. So when we define the XML structure in BODS, we need to create two schemas, one for the header information and one for the category information. These schemas hold the records that need to be populated in the target. For our sample scenario, the XML structure will be as follows.

Next we have to build the XSL that describes how to display the XML document. An XSL style sheet is, like a CSS file, a file that describes how to display an XML document of a given type. XML does not use predefined tags (we can use any tag names we like), so the meaning of each tag is not self-evident, and without an XSL sheet a browser does not know how to display the XML document.
XSL consists of three parts:
XSLT - a language for transforming XML documents
XPath - a language for navigating in XML documents
XSL-FO - a language for formatting XML documents

The root element that declares the document to be an XSL style sheet is <xsl:stylesheet>.
An XSL style sheet consists of one or more sets of rules called templates. A template contains rules to apply when a specified node is matched. The <xsl:template> element is used to build templates, and the match attribute is used to associate a template with an XML element (match="/" defines the whole document, i.e. it associates the template with the root of the XML source document).
The <xsl:for-each> element can be used to select every XML element of a specified node-set, so we can specify how to display the values coming in that node-set. In our sample scenario we select every element in the Header and Category schemas to specify how to display the values inside those node-sets. The <xsl:value-of> element can be used to extract the value of an XML element and add it to the output stream of the transformation.

After building the XSL file we need to place it in the target folder where BODS will build the target file. We also need to alter the XML header in the target XML structure inside the job. The default header defined in the XML header is <?xml version="1.0" encoding = "UTF-8" ?>; we need to change it to <?xml version="1.0" encoding = "UTF-8" ?><?xml-stylesheet type="text/xsl" href="<xsl_fileName>"?>

In our target XML, the header will then look like this.
The target XML generated after the execution of the job can be opened with Excel, where you will be prompted with an option to open the XML after applying a style sheet. There we need to select our style sheet to get the output in the desired Excel format.

And our output in Excel will be displayed as given below

Note: Both the XSL file and the xml target file should be available
in the same folder for getting the desired output.
Attaching the sample xsl and xml file for reference

Calling RFC from BODS

Introduction:
In this scenario I am demonstrating how to call a remote-enabled function module from BODS.
1) Create an SAP Applications datastore.
In this example I am using SAP_BI as the SAP Applications datastore. As I created the FM in the BI system, I have created the datastore for that system.
2) Import the RFC from the SAP system.
 In the Local Object Library, expand the SAP datastore.
 Right-click on Functions and click "Import By Name".
 Enter the name of the RFC to import and click "Import". Here I am using ZBAPI_GET_EMPLOYEE_DETAILS as the RFC.
 The RFC will be imported and can be seen in the Local Object Library.
Note: This RFC takes an employee ID as input and returns the employee details.
I have stored the employee IDs in a text file, so to read the text file I am using a file format as the source.
3) Create a file format for the flat (text) file. This file format (here "Emp_Id_Format") has the list of employee IDs.
4) Create the job, workflow and dataflow as usual.
5) Drag the file format into the dataflow and mark it as a source.
6) Drag a query transform into the dataflow as well and name it (here "Query_fcn_call").

7) Assign the RFC call from the query.
 Double-click on the query.
 Right-click on "Query_fcn_call" and click "New Function Call".
 The Select Function window will open. Choose the appropriate function and click "Next".
 In the window below, click on the button and define an input parameter.
 Select the file format that we created earlier in the "Input Parameter" window and press OK.
 Select the column name from the input file format and press "OK". Here the file format has only one column, named Id.
 Click "Next" and select the output parameters.
 Select the required output parameters and click "Finish". Here I am selecting all the fields.
The query editor for the query transform "Query_fcn_call" now looks as follows.

8) Add another query transform to the dataflow for mapping and name it (here "Query_Mapping").
9) Add a template table as well.
10) Mapping: double-click on the query "Query_Mapping" and do the necessary mappings.
11) Save the job, validate and execute it.
12) During execution, the employee ID is taken as input to the RFC and the output of the RFC is stored in the table. The output can be seen as follows after execution.
Here the employee IDs are taken from the file format and given to the RFC as input. The output of the RFC is given as input to the query "Query_Mapping", where it is mapped to the target table fields.


Demo on Real time job


Posted by Ravi Kashyap 29-Jul-2014
REAL TIME JOB DEMO

A real-time job is created in the Designer and then configured in the Administrator (Management Console) as a real-time service associated with an Access Server. This demo will briefly explain the Management Console settings.
We can execute the real-time job from any third-party tool; here we use SoapUI to demonstrate our real-time job.
Below is the screenshot of the batch job used to create a sample table in the database (first dataflow) and create the XML target file (second dataflow). The XML target file (created in the second dataflow) can be used to create the XML message source in the real-time job.

Below is the screenshot of the transformation logic of the dataflow DF_REAL_Data.

Below is the screenshot of the transformation logic of the dataflow DF_XML_STRUCTURE.

Below is the screenshot of the transformation logic of the first query transform "Query" used in DF_XML_STRUCTURE.

Below is the screenshot of the transformation logic of the second query transform used in DF_XML_STRUCTURE.

In the second query transform below, we nest the data: select the complete Query from the Schema In pane and import it under the Query in the Schema Out pane.

Creation of the XML schema from the Local Object Library:

Go to the second query again and make the query name the same as in the XML schema (Query_nt_1). Note: if we do not change the query name, it gives an error.

In the image below the query has been renamed to the same name displayed in the XML schema.

The image below shows the creation of the real-time job.

To test and validate the job:
In this demo, the end user passes the EMP_ID (1.000000) using the third-party tool, which triggers the real-time job. The job takes the input as the XML message source, obtains the other details from the database table based on the EMP_ID value, and returns them to the end user in the XML message target. Below is the output of the XML file.

FINALLY RUN THE REAL-TIME JOB USING THE SOAPUI TOOL:
1. Run the SoapUI tool.
2. Create the project and browse to the WSDL file.
3. Under the project's real-time services, check the project name and send the request.
4. A request window will open; enter the search string in it.
5. Finally, the record is returned.

DS Standard Recovery Mechanism

Posted by Samatha Mallarapu 04-Jul-2014
Introduction:
This document gives an overview of the standard recovery mechanism in Data Services.
Overview: Data Services provides one of the best built-in features to recover a job from a failed state. By enabling recovery, the job restarts from the failed instance.
DS provides two types of recovery:
Recovery: By default, recovery is enabled at dataflow level, i.e. the job will always restart from the dataflow which raised the exception.
Recovery unit: If you want to enable recovery for a set of actions, you can achieve this with the recovery unit option. Define all your actions in a workflow and enable the recovery unit under the workflow properties. In recovery mode this workflow then runs from the beginning instead of from the failed point.

When recovery is enabled, the software stores results


from the following types of steps:
Work flows
Batch data flows
Script statements
Custom functions (stateless type only)
SQL function
exec function
get_env function
rand function
sysdate function
systime function

Example:
This job loads data from a flat file into a temporary table. (I am repeating the same load to raise a primary key exception.)

Running the job:
To recover the job from a failed instance, the job must first be executed with recovery enabled. We can enable this under execution properties.

The trace log below shows that recovery is enabled for this job.

The job failed at the 3rd DF in the 1st WF. Now I am running the job in recovery mode.

The trace log shows that the job is running in recovery mode, using the recovery information from the previous run and starting from dataflow 3, where the exception was raised.

DS provides default recovery at dataflow level.

Recovery unit:
With standard recovery, the job always restarts at the failed DF in the recovery run, irrespective of the dependent actions.
Example: Workflow WF_RECOVERY_UNIT has two dataflows loading data from a flat file. If either of the DFs fails, both DFs have to run again. To achieve this kind of requirement, we can define all the activities in a workflow and make it a recovery unit. When we run the job in recovery mode, if any of the activities fails, the whole unit starts from the beginning.
To make a workflow a recovery unit, check the Recovery Unit option under the workflow properties.

Once this option is selected, on the workspace diagram the black "x" and green arrow symbol indicate that the workflow is a recovery unit.

Two dataflows under WF_RECOVERY_UNIT:

Running the job with recovery enabled, an exception is encountered at DF5.

Now running in recovery mode, the job uses the recovery information of the previous run. As per my requirement, the job should run all the activities defined under the workflow WF_RECOVERY_UNIT instead of only the failed dataflow.

Now the job starts from the beginning of WF_RECOVERY_UNIT, and all the activities defined inside the workflow run from the beginning instead of starting from the failed DF (DF_RECOVERY_5).
Exceptions:
when you specify a work flow or a data flow should only
execute once, a job will never re-execute that work flow
or data flow after it completes successfully, except if that
work flow or data flow is contained within a recovery unit
work flow that re-executes and has not completed
successfully elsewhere outside the recovery unit.
It is recommended that you not mark a work flow or data
flow as Execute only once when the work flow or a parent
work flow is a recovery unit.

How to improve performance while using auto correct load

Posted by Sivaprasad Sudhir 27-Jun-2014
Using the auto correct load option on a target table degrades the performance of BODS jobs, because it prevents a full push-down operation from the source to the target when the source and target are in different datastores. But the auto correct load option is sometimes unavoidable when no duplicate rows are allowed in the target, and it is very useful for data recovery operations.
So when we deal with large data volumes, how do we improve performance?
Using a Data_Transfer transform can improve the performance of such a job. Let's see how it works :-)
Merits:
 The Data_Transfer transform can push operations down to the database server.
 It enables a full push-down operation even if the source and target are in different datastores.
 It can be used after query transforms with GROUP BY, DISTINCT or ORDER BY clauses, which do not allow a push-down.
The idea here is to improve performance by pushing the work down to the database level. Add a Data_Transfer transform before the target to enable a full push-down from the source to the target. For a merge operation there should not be any duplicates in the source data. The Data_Transfer transform stages the data in the database, and the records are then updated or inserted into the target table as long as no duplicates are encountered in the source.

This example may help us to understand the usage of SCD Type 1 and how to handle the error messages.

A brief note on Slowly Changing Dimensions: Slowly Changing Dimensions are dimensions whose data changes over time. There are three methods of handling Slowly Changing Dimensions; here we concentrate only on SCD Type 1.

Type 1 - No history preservation - a natural consequence of normalization.

For an SCD Type 1 change, you find and update the appropriate attributes on a specific dimensional record. For example, to update a record in the SALES_PERSON_DIMENSION table to show a change to an individual's SALES_PERSON_NAME field, you simply update that one record in the SALES_PERSON_DIMENSION table. This action updates or corrects that record for all fact records across time. In a dimensional model, facts have no meaning until you link them with their dimensions. If you change a dimensional attribute without appropriately accounting for the time dimension, the change becomes global across all fact records.

This is the data before the change:

SALES_PERSON_KEY   SALES_PERSON_ID   NAME
15                 00120             Doe, John B

This is the same table after the salesperson's name has been changed:

SALES_PERSON_KEY   SALES_PERSON_ID   NAME
15                 00120             Smith, John B

However, suppose a salesperson transfers to a new sales team. Updating the salesperson's dimensional record would update all previous facts so that the salesperson would appear to have always belonged to the new sales team. This may cause issues when reporting sales numbers for both teams. If you want to preserve an accurate history of who was on which sales team, Type 1 is not appropriate.

Below is the step-by-step batch job creation using SCD Type 1 with error handling.

 Create a new job.
 Add Try and Script objects from the palette and drag them to the work area.
 Create a global variable for SYSDATE.
 Add the script below in the script section:

# SET TODAY'S DATE
$SYSDATE = cast( sysdate( ), 'date');
print( 'Today\'s date:' || cast( $SYSDATE, 'varchar(10)' ) );

 Add a dataflow.
 Now double-click on the DF and add the source table.
 Add a query transform.
 Add a new column LOAD_DATE in Query_Extract and map the created global variable $SYSDATE to it. If we used sysdate() here, the function would be called for every row, which may hurt performance.

 Add another query transform for the lookup table.
 Create a new function call for the lookup table.
 The required column is added via the lookup table.
 Add another query transform. This query will decide whether a source record is inserted or updated.
 Now remove the primary key from the target fields.
 Create a new column FLAG to mark the record for update or insert.
 Now write an ifthenelse function: if LKP_PROD_KEY is null, set FLAG to 'INS', otherwise to 'UPD'.

ifthenelse(Query_LOOKUP_PRODUCT_TIM.LKP_PROD_KEY is null, 'INS', 'UPD')
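As an aside, for readers who find it easier to read the insert/update decision as code, here is a rough Python sketch of the same SCD Type 1 flagging logic; the key values and column names are made up for illustration and are not taken from the job above.

# Illustrative sketch of the SCD Type 1 flagging decision:
# a key not found in the dimension means insert, a known key means update.
existing_keys = {"P100": 1, "P200": 2}   # natural key -> surrogate key (hypothetical)

def flag_row(row):
    # row is a source record, e.g. {"PROD_ID": "P300", "PROD_NAME": "Bolt"}
    if row["PROD_ID"] not in existing_keys:
        row["FLAG"] = "INS"   # new product: insert with a new surrogate key
    else:
        row["FLAG"] = "UPD"   # known product: overwrite attributes, no history kept
    return row

print(flag_row({"PROD_ID": "P300", "PROD_NAME": "Bolt"}))
print(flag_row({"PROD_ID": "P100", "PROD_NAME": "Nut"}))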

 Now create a Case transform.
 Create two rules on the FLAG field for INS and UPD.
 Create Insert and Update queries to align the fields. Change LKP_PROD_KEY to PROD_KEY and PROD_ID to SOURCE_PROD_ID for better understanding in the target table.
 Now create a Key_Generation transform to generate the surrogate key. Select the target dimension table and the surrogate key column (PROD_KEY), and set the target instance.
 Add the Key_Generation transform after Query_Insert to generate the key values for the new records.
 For Query_Update we need the surrogate key and the other attributes; use a Map_Operation transform to update the records.

By default the operation mapping is Normal to Normal; we want to update records in normal mode, so map the surrogate key, product key and the other attributes accordingly.
 Go back to the insert target table --> Options --> and update the error handling settings as below.

 Go back to the job screen and create a catch block.
 Select the required exceptions you want to catch, and create a script to display the error messages.
 Compose your message to print the errors in Script_ErrorLogs as below:

print( 'Error Handling');
print( error_message() || ' at ' || cast( error_timestamp(), 'varchar(24)'));
raise_exception( 'Job Failed');

Now validate the script before proceeding further. These messages will catch errors along with the job completion status.
Now create a script to print an error message if there are any database rejections:

# print ( ' DB Error Handling');
if( get_file_attribute( '[$$LOG_DIR]/VENKYBODS_TRG_dbo_Product_dim.txt', 'SIZE') > 0 )
raise_exception( 'Job Failed Check Rejection File');

Note: VENKYBODS_TRG_dbo_Product_dim.txt is the file name which we mentioned in the target table error handling section.

Before executing, note the source and target table data for Last_Updated_Date.
Now execute the job and we can see the updated Last_Updated_Date values.
Now try to generate an error to see that the error log captures it through our error handling.
Try to implement the same, and let me know if you need any further explanation on this.

Thanks
Venky

Better Python Development for BODS: How and Why


Posted by Jake Bouma 23-Apr-2014
Not enough love: The Python User-Defined
Transform

In my opinion, the python user-defined transform (UDT)


included in Data Services (Data Quality -> UserDefined)
bridges several gaps in the functionality of Data
Services. This little transform allows you to access
records individually and perform any manipulation of

those records. This post has two aims: (1) to encourage


readers to consider the Python transform the next time
things get tricky and (2) to give experienced developers
an explanation on how to speed up their Python
development in BODS.

Currently, if you want to apply some manipulation or transformation record by record you have two options:
1. Write a custom function in the BODS scripting language and apply this function as a mapping in a query.
2. Insert a UDT and write some Python code to manipulate each record.

How to choose? Well, I would be all for keeping things


within Data Services, but the built-in scripting language is
a bit dry of functionality and doesn't give you direct
access to records simply because it is not in a data flow.
In favour of going the python route are the ease and
readability of the language, the richness of standard
functionality and the ability to import any module that
you could need. Furthermore with Python data can be
loaded into memory in lists, tuples or hash-table like
dictionaries. This enables cross-record comparisons,
aggregations, remapping, transposes and any
manipulation that you can imagine! I hope to explain
how useful this transform is in BODS and how nicely it
beefs up the functionality.

For reference, the UDT is documented in chapter 11 of http://help.sap.com/businessobject/product_guides/sbods42/en/ds_42_reference_en.pdf
The best way to learn python is perhaps just to dive in,
keeping a decent tutorial and reference close at hand. I
won't recommend a specific tutorial; rather google and
find one that is on the correct level for your programming
ability!

Making Python development easier


When developing I like to be able to code, run, check
(repeat). Writing Python code in the Python Smart Editor
of the UDT is cumbersome and ugly if you are used to a
richer editor. Though it is a good place to start with
learning to use the Python in BODS because of the "I/O
Fields" and "Python API" tabs, clicking through to the
editor every time you want to test will likely drive you
mad. So how about developing and testing your
validation function or data structure transform on your
local machine, using your favourite editor or IDE
(personally I choose Vim for Python)? The following two
tips show how to achieve this.

Tip#1: Importing Python modules


Standard Python modules installed on the server can be
imported as per usual using import. This allows the
developer to leverage datetime, string manipulation, file
IO and various other useful built-in modules. Developers
can also write their own modules, with functions and
classes as needed. Custom modules must be set up on
the server, which isn't normally accessible to Data
Services Designers.

The alternative is to dynamically import custom modules


given their path on the server using the imp module. Say
you wrote a custom module to process some records
called mymodule.py containing a function myfunction.
After placing this module on the file server at an
accessible location you can access its classes and
functions in the following way

import imp
mymodule = imp.load_source('mymodule', '/path/to/mymodule.py')
mymodule.myfunction()

This enables encapsulation and code reuse. You can


either edit the file directly on the server, or re-upload it
with updates, using your preferred editor. What I find
particularly useful is that as a data
analyst/scientist/consultant/guy (who knows these days) I
can build up an arsenal of useful classes and functions in
a python module that I can reuse where needed.

Tip#2: Developing and testing from the comfort of


your own environment
To do this you just need to write a module that will mimic
the functionality of the BODS classes. I have written a
module "fakeBODS.py" that uses a csv file to mimic the
data that comes into a data transform (see attached).
Csv input was useful because the transforms I was
building were working mostly with flat files. The code
may need to be adapted slightly as needed.
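The attachment itself is not reproduced here, but to give an idea of the approach, a very small csv-backed stand-in might look like the sketch below. It only imitates the pieces used by the examples in this post (an iterable collection and a data manager); the real fakeBODS.py attached to the original post is more complete, and the method surface of the genuine BODS classes differs.

# Minimal sketch of a csv-backed stand-in for the BODS UDT objects,
# for local testing only. The class names follow the example further below;
# the real BODS transform API is richer than this.
import csv

class FLDataCollection:
    def __init__(self, csv_path):
        with open(csv_path) as f:
            # Each record becomes a plain dict keyed by the csv header.
            self.records = [row for row in csv.DictReader(f)]
    def __iter__(self):
        return iter(self.records)

class FLDataManager:
    # Placeholder for the transform's DataManager; extend as needed.
    pass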

Declaring instances of these classes outside of BODS


allows you to compile and run your BODS Python code on
your local machine. Below is an example of a wrapping
function that I have used to run "RunValidations", a
function that uses the DataManager and Collection,
outside of BODS. It uses the same flat file input and
achieves the same result! This has sped up my
development time, and has allowed me to thoroughly test

implementations of new requirements on a fast changing


project.

def test_wrapper():
    import fakeBODS
    Collection = fakeBODS.FLDataCollection('csv_dump/tmeta.csv')
    DataManager = fakeBODS.FLDataManager()
    RunValidations(DataManager, Collection, 'validationFunctions.py', 'Lookups/')

Limitations of UDT
There are some disappointing limitations that I have
come across that you should be aware of before setting
off:

The size of an output column (as of BODS 4.1) is


limited to 255 characters. Workaround can be done using
flat files.

You can only access data passed as input fields to


the transform. Variables for example have to be mapped
to an input column before the UDT if you want to use
them in your code.

There is no built-in functionality to do lookups in


tables or execute sql through datastore connections from
the transform.

How a powerful coding language complements a


rich ETL tool
Python code is so quick and powerful that I am starting to
draw all my solutions out of Data Services into custom
python modules. It is faster, clearer for me to
understand, and more adaptable. However, this is
something to be careful of. SAP BODS is a great ETL tool,
and is a brilliant cockpit from which to direct your data
flows because of its high-level features such as
authorizations, database connections and graphical job
and workflow building. The combination of the two, in my
opinion, makes for an ideal ETL tool.

This is possibly best demonstrated by example. On a


recent project (my first really) with the help of Python
transforms and modules that I wrote I was able to solve
the following:

Dynamic table creation and loading
 Executable metadata (functions contained in Excel spreadsheets)
 Complicated data quality analysis and reporting (made easy)
 Reliable Unicode character and formatting export from Excel

Data Services 4.1 on the other hand was indispensable in


solving the following requirements

Multi-user support with protected data (aliases for


schemas)

Maintainable centralized processes in a central object


library with limited access for certain users

A framework for users to build their own Jobs using


centralized processes.
The two complemented each other brilliantly to reach a
solid solution.

Going forward
With the rise of large amounts of unstructured data and
the non-trivial data manipulations that come with it, I
believe that every Data analyst/scientist should have a
go-to language in their back pocket. As a trained
physicist with a background in C/C++ (ROOT) I found
Python incredibly easy to master and put it forward as
one to consider first.

I do not know what the plan is for this transform going


forward into the Data Services Eclipse workbench, but
hopefully the merits of allowing a rich language to
interact with your data inside of BODS are obvious
enough to keep it around. I plan to research this a bit
more and follow up this post with another article.

about me...
This is my first post on SCN. I am new to SAP and have a
fresh perspective of the products and look forward to
contributing on this topic if there is interest. When I get
the chance I plan to blog about the use of Vim for a data
analyst and the manipulation of data structures using
Python.

More security features in SAP Data Services


Posted by Dirk Venken 21-Jan-2015
This message contains some internal system
details which have been hidden for security. If you
need to see the full contents of the original
message, ask your administrator to assign
additional privileges to your account.
Have you ever run into this error message before? And you were
curious to see the original message? Here's how to get it.
Start the Central Management Console. Navigate to Data
Services Application:

Select User Security:

Select the user or group you want to authorise and select "Assign
Security":

Select the Advanced tab, then "Add/remove Rights":

Grant "View internal information in log" and apply changes in both


panels.
Next time your DS job runs into an error, you'll see the complete
original error message.
Pre-requisites for connecting SAP BODS with ECC system
Posted by ANNESHA BHATTACHARYA 08-Dec-2014
For connecting SAP BODS with an ECC system, we need to create an SAP Applications datastore in Data Services. For this we need to specify the data transfer method. This method defines how data that is extracted by the ABAP running on the SAP application server becomes available to the Data Services server.
The options are:
 RFC: Use to stream data from the source SAP system directly to the Data Services data flow process using RFC.
o Direct download: The SAP server transfers the data directly to the Local directory using the SAP-provided function GUI_DOWNLOAD or WS_DOWNLOAD.
o Shared directory: Default method. The SAP server loads the transport file into the Working directory on the SAP server. The file is read using the Application path to the shared directory from the Job Server computer.
o FTP: The SAP server loads the Working directory on the SAP server with the transport file. Then the Job Server calls an FTP program and connects to the SAP server to download the file to the Local directory.
o Custom Transfer: The SAP server loads the Working directory on the SAP server with the transport file. The file is read by a third-party file transfer (custom transfer) program and loaded to the Custom transfer local directory.
Prerequisites:
1. Need to define an SAP Applications datastore, which includes the following information:
o Connection information, including the application server name, the language used by the SAP client application, and the client and system numbers.
o Data transfer method used to exchange information between Data Services and the SAP application.
o Security information, specifically the SAP security profile to be used by all connections instigated from this datastore between Data Services and the SAP application.

2. In case the data transfer method is Direct Download, the following checks should be made:
o Check whether direct download is the right method for us, as it actually calls the GUI_DOWNLOAD ABAP function, which is very unreliable with larger amounts of data.
o Transport of the data takes about 40 times longer than with the other protocols.
o We cannot use 'execute in background' with this option.
o Configuring it is simple; we just specify a directory on the Job Server in the field Client Download Directory.
o But we need to ensure that this directory actually exists.
3. In case the data transfer method is Shared Directory, the following checks should be made:
o While the 'Working directory on SAP server' is the point where the ABAP will write the file to, the 'Application path to the shared directory' is the path to access this same directory from the Job Server.
o Whatever we specify as the working directory, SAP must have write access to it.
o The BODS user must have read permission for the files generated by the SAP account. Typically, this is done by placing the BODS user in the same group as SAP.
4. In case the data transfer method is FTP, the following checks should be made:
o Ensure that through the command prompt we are able to log in using the hostname the FTP server is running on, the username to log in to FTP, and the password (in the command prompt, call ftp 'hostname' and type the username and password).
o Next, check what 'cd' (change directory) command we have to issue in order to get to the working directory on the SAP server. Copy this path as the 'FTP relative path' in the datastore properties.
o The next step is to check permissions on the files. In general, SAP should create the files with read permission for its main group; the FTP user should be part of that SAP group so it can read the files.
o Ensure that the directory the file will be downloaded to is a directory on the Job Server computer.
5. In case the data transfer method is Custom Transfer, we need to ensure that a batch file is specified that does all the downloading.
6. The execution mode should be generate_and_execute.

To define the SAP Applications datastore:
a) In the Datastore tab of the object library, right-click and select New.
b) Enter a unique name for the datastore in the Datastore name box.
c) The name can contain alphanumeric characters and underscores. It cannot contain spaces.
d) For Datastore type, select SAP Applications.
e) Enter the Application server name.
f) Enter the User name and Password information.
g) To add more parameters, click Advanced, enter the information as below and click OK to successfully create the SAP Applications datastore.

Here the Working directory on SAP server is the location where the ABAP writes the file, and the Generated ABAP directory is the path to access this same directory from the Job Server.

Use of History Preserving Transform

Introduction: The History_Preserving transform is used to preserve the history of the source records. If the source row has an operation code of INSERT or UPDATE, it inserts a new record into the target table.
Scenario:
We want to insert the updated record into the target table to preserve the history of the source records.
1) Create the project, job, workflow and dataflow as usual.
2) Drag a source table into the dataflow. Its contents are as follows.

3) Drag a target table to dataflow. Its contents are as follows.

4) Drag query, Table-Comparison, History_Preserving transform


as shown in the figure.

5) Open the Query and do the mappings as you normally would.
6) Open the Table_Comparison block and enter all the properties.
 Table name: select the target table from the drop-down box.
 Generated key column: specify the key column.
 Select the "EMP_ID" node from the tree on the left-hand side and drag it into the "Input primary key columns" list box. The comparison with the target table will now take place based on whether the source EMP_ID is present in the target or not, and the comparison will be made on the columns given under the "Compare columns" list box.
 Similarly, select the columns that are to be compared while transferring the data and drag them into the "Compare columns" list box.
 Select the "Cached comparison table" radio button.

7) Similarly, provide the details for the History_Preserving block.
 In Compare columns, select the columns as specified in the Table_Comparison transform.
 Specify the date columns as shown.
 Here we use 9000.12.31 as the valid-to date.
 In the target table we have maintained a "Flag" column; on an UPDATE operation the flag of the original record is changed from Y to N, and the new record is inserted with the status 'Y'.
8) After this we update the first 3 rows of the source records, and the 4th row is deleted. The fields where changes were made are circled in red in the figure above.
9) Validate and execute the job.
10) 3 new records are added to the target table, as shown below.

You can see that a new entry for each updated record is made in the target table with the 'Y' flag and the new END_DATE of '9000.12.31', and the flag of the original records is changed to 'N'.
Summary:
In this way the History_Preserving block is useful for preserving the history of the source records.
Capture Killed job status in BODS
Posted by Tanvi Ojha 11-Nov-2014
Error handling and recovery mechanisms are very important aspects of any ETL tool. BO Data Services has built-in error handling and automatic recovery mechanisms in place. Also, by using different dataflow designs, we can manually recover a job from a failed execution and ensure proper data in the target.
In manual recovery, each dataflow/workflow's execution status is captured in a table (we call it the control table), which helps to execute only the failed dataflows/workflows in the next run.
But if we get a scenario where the job is stuck and we have to kill it manually, the status of the killed job will not be automatically updated from 'Running' to 'Killed'/'Failed' in the control table: when a job is killed, it terminates there and then, and the flow never reaches the catch block where we put the script or dataflow that captures the job status.

In this scenario, we can put a script at the start of the job which first checks the previous execution status of the job in the control table; if it shows 'Running', we update the previous instance's status in the control table to 'Failed'/'Completed' using the AL_HISTORY table (this metadata table captures the status of all jobs, with job name, job run ID, and start and end dates):
$G_PREV_RUNID = sql('<DATASTORE_NAME>','select max(JOB_RUN_ID) from JOB_CONTROL where JOB_NAME = {$G_JOB_NAME} and JOB_STATUS = \'R\'');
$G_ERR_STATUS = sql('DS_DBH','select STATUS from AL_HISTORY where SERVICE = {$G_JOB_NAME} and END_TIME = (select max(END_TIME) from JOB_CONTROL where JOB_NAME = {$G_JOB_NAME})');
if ($G_ERR_STATUS = 'E')
    sql('DS_DBH','UPDATE JOB_CONTROL SET JOB_STATUS = \'F\' WHERE JOB_RUN_ID = [$G_PREV_RUNID]');
The AL_HISTORY table contains the following columns:

NOTE: We need SELECT access to the database on which the BODS repository is created.
Efficient extraction of most recent data from a history table
Posted by Dirk Venken 01-Oct-2014
You have a table that contains multiple time-stamped records for a given primary key:
Key  Att  Timestamp
03   747  2012.11.11 04:17:30
01   ABC  2014.09.30 17:45:54
02   UVW  2014.04.16 17:45:23
01   DEF  2014.08.17 16:16:27
02   XYZ  2014.08.25 18:15:45
01   JKL  2012.04.30 04:00:00
03   777  2014.07.15 12:45:12
01   GHI  2013.06.08 23:11:26
03   737  2010.12.06 06:43:52

Output required is the most recent record for every key value:

Key  Att  Timestamp
01   ABC  2014.09.30 17:45:54
02   XYZ  2014.08.25 18:15:45
03   777  2014.07.15 12:45:12

Solution #1: Use the gen_row_num_by_group function
Build a dataflow as such:

In the first query transform, sort the input stream according to Key and Timestamp desc(ending). The sort will be pushed to the underlying database, which is often good for performance.
Key  Att  Timestamp
01   ABC  2014.09.30 17:45:54
01   DEF  2014.08.17 16:16:27
01   GHI  2013.06.08 23:11:26
01   JKL  2012.04.30 04:00:00
02   XYZ  2014.08.25 18:15:45
02   UVW  2014.04.16 17:45:23
03   777  2014.07.15 12:45:12
03   747  2012.11.11 04:17:30
03   737  2010.12.06 06:43:52

In the second query transform, add a column Seqno and map it to gen_row_num_by_group(Key).
Key  Att  Timestamp            Seqno
01   ABC  2014.09.30 17:45:54  1
01   DEF  2014.08.17 16:16:27  2
01   GHI  2013.06.08 23:11:26  3
01   JKL  2012.04.30 04:00:00  4
02   XYZ  2014.08.25 18:15:45  1
02   UVW  2014.04.16 17:45:23  2
03   777  2014.07.15 12:45:12  1
03   747  2012.11.11 04:17:30  2
03   737  2010.12.06 06:43:52  3

In the third query transform, add a where-clause Seqno = 1 (and don't map the Seqno column).

Key  Att  Timestamp
01   ABC  2014.09.30 17:45:54
02   XYZ  2014.08.25 18:15:45
03   777  2014.07.15 12:45:12

Solution #2: use a join
Suppose we're talking Big Data here: there are millions of records in the source table. On HANA. Obviously. Although the sort is pushed down to the database, the built-in function is not. Therefore every single record has to be pulled into DS memory and then eventually written back to the database.
Now consider this approach:

The first query transform selects only two columns from the source table: Key and Timestamp. Define a group by on Key and set the mapping for Timestamp to max(Timestamp).

Key  Timestamp
01   2014.09.30 17:45:54
02   2014.08.25 18:15:45
03   2014.07.15 12:45:12
In the second transform, (inner) join on Key and Timestamp and map all columns from the source table to the output.

Key  Att  Timestamp
01   ABC  2014.09.30 17:45:54
02   XYZ  2014.08.25 18:15:45
03   777  2014.07.15 12:45:12
If you uncheck bulk loading on the target table, you'll notice that the full SQL (read and write) is pushed to the underlying database. And your job will run so much faster!
Note: This second approach produces correct results only if there are no duplicate most recent timestamps within a given primary key.
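Outside of Data Services, the two approaches can be illustrated in a few lines of Python (pandas) on the sample data, which may help to see what each dataflow is doing; this is only an illustration and not part of the original solution.

# Both approaches on the sample data, using pandas.
import pandas as pd

df = pd.DataFrame({
    "Key": ["03", "01", "02", "01", "02", "01", "03", "01", "03"],
    "Att": ["747", "ABC", "UVW", "DEF", "XYZ", "JKL", "777", "GHI", "737"],
    "Timestamp": pd.to_datetime([
        "2012-11-11 04:17:30", "2014-09-30 17:45:54", "2014-04-16 17:45:23",
        "2014-08-17 16:16:27", "2014-08-25 18:15:45", "2012-04-30 04:00:00",
        "2014-07-15 12:45:12", "2013-06-08 23:11:26", "2010-12-06 06:43:52"]),
})

# Solution #1: sort per key, number the rows, keep row 1 of each group.
s = df.sort_values(["Key", "Timestamp"], ascending=[True, False])
s["Seqno"] = s.groupby("Key").cumcount() + 1
latest_1 = s[s["Seqno"] == 1].drop(columns="Seqno")

# Solution #2: aggregate max(Timestamp) per key, then inner-join back to the detail rows.
max_ts = df.groupby("Key", as_index=False)["Timestamp"].max()
latest_2 = df.merge(max_ts, on=["Key", "Timestamp"], how="inner")

print(latest_1.sort_values("Key"))
print(latest_2.sort_values("Key"))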
The Validation transform is used to filter or replace the source dataset based on criteria or validation rules to produce the desired output dataset. It enables you to create validation rules on the input dataset and generate the output based on whether records have passed or failed the validation condition.
In this scenario we validate data from a database table against the correct format of the zip code: if the zip code has fewer than 5 digits, we filter that record out and pass it to another table.
The Validation transform can generate three output datasets: Pass, Fail, and RuleViolation.
1. The Pass output schema is identical to the input schema.
2. The Fail output schema has three more columns: DI_ERRORACTION, DI_ERRORCOLUMNS, and DI_ROWID.
3. The RuleViolation output has three columns: DI_ROWID, DI_RULENAME, and DI_COLUMNNAME.
Steps:
1) Create the project, job, workflow and dataflow as usual.
2) Drag in the source table and the Validation transform and provide the details.
 Double-click on the Validation transform to provide the details. You can see the three types of output dataset described above.
 Add a validation rule.
 Click Add and fill in the details of the rule as follows.

Action on Fail:
1) Send to Fail: on failure of the rule, the record is sent to the target holding the "Fail" records.
2) Send to Pass: even on failure, the record is passed to the normal target.
3) Send to Both: sends the record to both targets.
Column validation: select the column to be validated, then decide the condition. We have selected "Match Pattern" as the condition, with the pattern '99999', so it checks whether the zip code consists of 5 digits or not.
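As a side note, the match pattern '99999' simply means "exactly five digits"; a rough Python equivalent of this single rule, for illustration only, is:

# Rough equivalent of the Match Pattern '99999' rule used above.
import re

def zip_is_valid(zip_code):
    # Passes only when the value consists of exactly five digits.
    return re.fullmatch(r"\d{5}", str(zip_code)) is not None

print(zip_is_valid("56001"))   # True  -> Pass output
print(zip_is_valid("5600"))    # False -> Fail output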

Press OK. Then you can see the entry get added as
follows.

3) Add a Target table to the dataflow & link the Validate Transform
to it.

Choose the validate condition as "Pass"

Similarly do the connection for "Fail" & "Rule


Violation" condition.

4) Validate the job & execute it.


5) Check the input & output.

Input:-

You can see in the input figure above that the last row has a zip code of fewer than 5 digits. Now view the output.

Output for Pass condition:-

Output for Fail condition

You can see that the invalid record from the input has been transferred to the "CUST_Fail" table, as shown above.

The three extra columns "DI_ERRORACTION", "DI_ERRORCOLUMNS" and "DI_ROWID" can also be seen.

Output of the "RuleViolation" condition:

Summary:
In this way the Validation transform is useful for validating records based on rules and categorising the bad records into a different target, which can be analysed later.
Thanks & Regards,
Rahul More
(Project Lead)

I often see lookup functions being used when performing value mapping. There are some disadvantages of using lookup functions compared to joins:

1. Visibility. When you review a job to fix or change a mapping rule, it is hard to identify where the lookup has been used. If the lookup is done using joins, it is easy for programmers to locate the mapping.

2. Handling duplicate data. When there are duplicates in the lookup table, it is not safe to use a lookup function: it simply returns one of the values. Say you are looking up a new material type from the lookup table; what happens if it contains two different new material types for an old material type? The lookup returns one of the new material types based on the return policy you specified (Max/Min). When a join is used and a duplicate is found in the same scenario, both values are returned and the duplicate can be identified by looking at the result set.

3. Picking more than one column from the lookup table. The value returned by a lookup function can be mapped to only one column, whereas a join can return more than one column and can be mapped to more than one column in the same query transform.

4. Slower performance. There is a greater possibility that a join can be pushed down than a lookup function used within a query transform.
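The duplicate-handling difference from point 2 is easy to see in a small Python sketch; the material-type mapping below is made up purely for illustration. A dictionary-style lookup can keep only one value per key, whereas a join exposes every matching row.

# Made-up value-mapping table with a duplicate key, to show the difference.
import pandas as pd

mapping = pd.DataFrame({
    "OLD_MAT_TYPE": ["A", "A", "B"],
    "NEW_MAT_TYPE": ["A1", "A2", "B1"],
})
source = pd.DataFrame({"MATERIAL": ["M1", "M2"], "OLD_MAT_TYPE": ["A", "B"]})

# Lookup-style: building a dict silently keeps just one value per key.
lookup = dict(zip(mapping["OLD_MAT_TYPE"], mapping["NEW_MAT_TYPE"]))
print(source["OLD_MAT_TYPE"].map(lookup))        # M1 gets only one of A1/A2

# Join-style: the duplicate becomes visible as an extra row in the result set.
print(source.merge(mapping, on="OLD_MAT_TYPE", how="left"))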

Why do we need substitution parameters in Excel?
In Designer, we see substitution parameters in a grid view.

However, when we have to export them, we only have XML and ATL as options, and neither is straightforward for humans to read. If there is a wider audience who want to take a look at the substitution parameters, instead of letting everyone log in to Designer, you can email them the substitution parameters in an Excel file.

Idea behind the approach

The plan was to create a utility to export substitution parameters from the repository to a CSV file. VB-Script was the easiest way we could think of, as we were using a Windows machine, and the repository databases are hosted on SQL Server. The idea was to read the repository database credentials from the user, export the substitution parameters to an XML file through al_engine.exe, and then convert that to a CSV file.

Issues with comma-separated values:
o If there is a comma in an SP value, the cell value gets split and spans multiple columns in Excel. A tab separator was therefore the better choice.
o Date values automatically undergo a format change in Excel upon opening the file, so cell values are formatted as text.
VB-Script Code:
' Don't worry if you don't understand. Just copy
paste the code in notepad, save it with vbs as
extension and double click
' Or download it from attachment.

Option Explicit

Dim SQLHost, SQLDB, SQLUN, SQLPWD
SQLHost = InputBox("Enter target SQL Host,port:", "Export SP to tab delimited text file", "")
SQLDB = InputBox("Enter target SQL database:", "Export SP to tab delimited text file", "")
SQLUN = InputBox("Enter target SQL username:", "Export SP to tab delimited text file", "")
SQLPWD = InputBox("Enter target SQL password:", "Export SP to tab delimited text file", "")

build_and_execute_command
SP_XML_to_CSV "SP.xml", "SP.txt"
Msgbox "Open generated tab delimited text file SP.txt in Excel." & vbCrLf & "If required, format it as table with header.", vbInformation, "Export SP to tab delimited text file"

Function build_and_execute_command()
    Dim command, objShell, filesys
    Set filesys = CreateObject("Scripting.FileSystemObject")
    Set objShell = WScript.CreateObject("WScript.shell")
    command = """%LINK_DIR%\bin\al_engine.exe"" -NMicrosoft_SQL_Server -passphraseATL -z""" & "SP_error.log"" -U" & SQLUN & " -P" & SQLPWD & " -S" & SQLHost & " -Q" & SQLDB & " -XX@" & "v" & "@""" & "SP.xml"""
    export_execution_command "%LINK_DIR%\log\", "SP", command
    'objShell.run "%LINK_DIR%\log\" & "SP" & ".bat",0,true
    objShell.run "SP.bat", 0, true
    filesys.DeleteFile "SP.bat", true
    If filesys.FileExists("SP_error.log") Then
        Msgbox ("Encountered issue while exporting SP from repo")
        build_and_execute_command = -1
    End If
    Set filesys = Nothing
End Function

Function export_execution_command(FilePath, FileName, FileContent)
    Dim objFSO, objFile, outFile
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    'outFile = FilePath & FileName & ".bat"
    outFile = FileName & ".bat"
    Set objFile = objFSO.CreateTextFile(outFile, True)
    objFile.Write FileContent & vbCrLf
    objFile.Close
    export_execution_command = 0
End Function

Function SP_XML_to_CSV(xmlFile, csvFile)
    Dim ConfigList, SubParamList, objXMLDoc, Root, Config, SubParam, Matrix(1000, 50)
    Dim i, j, iMax, jMax, Text, sessionFSO, OutFile, objShell
    Set sessionFSO = CreateObject("Scripting.FileSystemObject")
    Set OutFile = sessionFSO.CreateTextFile(csvFile, 1)
    Set objShell = WScript.CreateObject("WScript.shell")
    Set objXMLDoc = CreateObject("Microsoft.XMLDOM")
    objXMLDoc.async = False
    objXMLDoc.load(xmlFile)
    Set ConfigList = objXMLDoc.documentElement.getElementsByTagName("SVConfiguration")
    i = 1
    Matrix(0, 0) = "Substitution Parameter"
    For Each Config In ConfigList
        Set SubParamList = Config.getElementsByTagName("SubVar")
        j = 1
        Matrix(0, i) = Config.getAttribute("name")
        For Each SubParam In SubParamList
            If i = 1 Then Matrix(j, 0) = SubParam.getAttribute("name")
            Matrix(j, i) = "=""" & SubParam.text & """"
            j = j + 1
        Next
        i = i + 1
    Next
    iMax = i
    jMax = j
    For i = 0 To jMax - 1
        Text = ""
        For j = 0 To iMax - 1
            Text = Text & Matrix(i, j) & vbTab
        Next
        OutFile.WriteLine Text
    Next
    OutFile.Close
End Function

Usage screenshots:

In Excel, open the text file:

Select all the data cells and format them as a table.

Finally, the data looks like this:

If you don't have access to the repository database or the Job Server, you can export the substitution parameters to an XML file manually from Designer and use the SP_XML_to_CSV function from the given VB-Script.
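If VB-Script is not an option on your machine, the XML-to-tab-delimited conversion described above can also be sketched in a few lines of Python. This is only a rough equivalent of the SP_XML_to_CSV function, assuming the substitution parameters were exported from Designer to SP.xml as described:

# Rough Python equivalent of SP_XML_to_CSV: read the exported substitution
# parameter XML and write a tab-delimited text file, one column per configuration.
import xml.etree.ElementTree as ET

def sp_xml_to_txt(xml_file, txt_file):
    root = ET.parse(xml_file).getroot()
    configs = root.findall(".//SVConfiguration")
    names = []        # substitution parameter names, in first-seen order
    values = {}       # (parameter name, configuration name) -> value
    for config in configs:
        for sub in config.findall(".//SubVar"):
            name = sub.get("name")
            if name not in names:
                names.append(name)
            values[(name, config.get("name"))] = sub.text or ""
    with open(txt_file, "w", encoding="utf-8") as out:
        out.write("\t".join(["Substitution Parameter"] + [c.get("name") for c in configs]) + "\n")
        for name in names:
            row = [name] + [values.get((name, c.get("name")), "") for c in configs]
            out.write("\t".join(row) + "\n")

sp_xml_to_txt("SP.xml", "SP.txt")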
