Operation/Transform | NORMAL  | UPDATE  | INSERT  | DELETE
I2I                 | NORMAL  | DISCARD | INSERT  | DISCARD
U2U                 | DISCARD | UPDATE  | DISCARD | DISCARD
U2I                 | DISCARD | INSERT  | DISCARD | DISCARD
D2U                 | DISCARD | DISCARD | DISCARD | UPDATE
U2U:
If Table Comparison sends an UPDATE, the first action is to set ValidTo to $CurDateTime, since this record is no longer the current one. We want to keep FirstName and LastName as they are in CUSTOMERHIST and not overwrite them with the new values from CUSTOMER, so we need to map FirstName and LastName to before_image(FirstName) and before_image(LastName), respectively.
U2I:
If Table Comparison sends an UPDATE, the second action is to insert a new record with the current values from CUSTOMER. As with I2I above, we just need to set ValidFrom to $CurDateTime. Note that we need to change the update section (middle), not the insert/normal section (left).
D2U:
If Table Comparison sends a DELETE, we do not want to remove the record from CUSTOMERHIST; instead the delete is turned into an UPDATE so the record can be closed, for example by setting ValidTo to $CurDateTime.
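The net effect of the U2U/U2I pair can be sketched in SQL: one statement closes the old version, one inserts the new version. This is only an illustration; CustomerID as the key column and '9000.12.31' as the open-end date are assumptions, and $CurDateTime is shown as a bind value.

-- Close the current version (what the U2U path produces)
UPDATE CUSTOMERHIST
   SET ValidTo = :CurDateTime
 WHERE CustomerID = :CustomerID
   AND ValidTo = '9000.12.31';   -- assumed open-end date

-- Insert the new version (what the U2I path produces)
INSERT INTO CUSTOMERHIST (CustomerID, FirstName, LastName, ValidFrom, ValidTo)
VALUES (:CustomerID, :NewFirstName, :NewLastName, :CurDateTime, '9000.12.31');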
The repositories themselves are just database schemas - you can put them in any supported database. Check the Product Availability Matrix for Data Services to see which databases are supported. However, I would recommend grouping them together within the same physical database within your specific tier. For instance:
Datastore mistake 2: Since this gets to be so time-consuming, many developers realize that they can just reuse one datastore from dev to test to production. So you see a datastore named "HANA_TARGET_DEV" or "BWD" in a production local repository. In this case, the administrators just explain how they change the hostname, username, and password of the datastore
1.
Created datastores
After this, when you run or schedule a job, you would see
a drop-down with your 6 different system configuration
names:
DevInit
DevDelta
TestInit
TestDelta
ProdInit
ProdDelta
What if you don't select a system configuration at runtime? Each datastore has a "default" datastore configuration. Likewise, there is also a default substitution parameter configuration. If no system configuration is selected, the default datastore configuration for each datastore and the default substitution parameter configuration for that repository are used.
There are a few other features, but these were the main
features that caught my eye.
Document Information
Document Title: SAP Data Services - Running & Scheduling Data Services Jobs from Linux Command Line
Document Purpose:
File Name:
Reference:
Table of Contents
1. Introduction
2. Using a third-party scheduler
3. To export a job for scheduling
4. Setting up a cron job in UNIX-type operating systems
5. To execute a job with a third-party scheduler
6. About the job launcher
7. Job launcher error codes
1. Introduction
SAP BODS jobs can be started and scheduled from other operating systems such as Linux and HP-UX, in addition to Windows, using third-party utilities.
This document provides information on running and scheduling SAP BODS jobs from the UNIX command prompt utility (crontab).
2. Using a third-party scheduler
Option: Description

File name: The name of the batch file or script containing the job. The third-party scheduler executes this file. The Administrator automatically appends the appropriate extension (.sh for UNIX, .bat for Windows).

System configuration: Select the system configuration to use when executing this job. A system

Export Data Quality reports: Generates and exports all specified job reports to the location specified in the Management > Report Server Configuration node. By default, the reports are exported to $LINK_DIR\DataQuality\reports\repository\job.

Further options on this page: Job Server or server group, Enable auditing, Disable data validation statistics collection, Enable recovery, Recover from last failed, Use password file, Collect statistics for optimization, Collect statistics for monitoring, Distribution level.
1.
Click Export.
The Administrator creates command files filename.txt
(the default for filename is the job name) and a batch file
Flag
Value
-w
-t
-s
-C
-v
-S
-R
-xCR
Error message
Network failure.
The service that will run the
schedule has not started.
LINK_DIR is not defined.
The trace message file could
not be created.
The error message file could
not be created.
The GUID could not be
7. Under Service Configuration, provide a unique name for the service.
8. Click Browse Jobs to view a list of all the real-time jobs available in the repositories that are connected to the Administrator.
9. Select a job from the appropriate repository to map it to the real-time service.
10. Under Service Provider, select the check box for a Job Server. Select the appropriate Job Server to control the service provider.
11. In the Min instances and Max instances fields, enter the minimum and maximum number of service providers that you want this Job Server to control for this service.
12.
1. Under the access server name, click Real-time services.
2.
Once you have logged in you will see the below. In this example I'm going to start the replication wizard to show how we can create a simple job to load data from a source to a target.
Database
SAP Applications
SAP BW Source
3. Right-click in the Local Object Library area and select "New". A window for creating a new datastore will open.
II) Creation of XML Schemas File Format: We are creating an xsd file for the following type of xml file.
5. Open the query & do Mapping.
Click on "OK".
Template table can be seen in the dataflow.
Connect Template table to Query.
3. Right-click in the Local Object Library area and select "New". A window for creating a new datastore will open.
1.
Click "OK".
Now the excel workbook will be created & can be seen in the
Local Object Library.
1.
2.
Create a project.
Create a Batch Job.
Right click on the project & click on "New Batch job".
Give appropriate name to the job.
3. Add a dataflow into the job.
Select the job, drag dataflow from palette into it & name it.
Select all the fields on LHS, right click & select "Map
to Output".
Click on "OK".
Template table can be seen in the dataflow.
Connect Template table to Query.
Anyone that has worked with Data Services will know that
you can only see the results of your transformations by
running or debugging the dataflow. So while designing it
is not possible to see what the result is.
So below is an example of a simple dataflow with a query
transform.
Target_Count
66
66
2.
3.
Plan 1:
A script is placed after the Dataflow on which Audit
functionality is implemented. An insert statement is
written in the script to insert the value in the Audit Label
to a database table. However an error message is
generated because the Audit label is not valid outside the
Dataflow.
Plan 2 :
You can now add the replication job, you will see that on
the left window the new table added does not have a
green tick, this means it is not being used.
You can drag the table from the left window to the right
window.
The job will then execute. You will then also be able to
view load status by table.
When you then view the data you will notice that the
table loaded all columns plus the additional one added.
Once you have done that you will have a project explorer
view. Note that the look and feel of all new tools are being
delivered in the eclipse base look and feel shell. Similar to
Information Design Tool and Design Studio (AKA ZEN).
Customer record before the change:

Customer Key | Name      | State
1001         | Christina | Illinois

After the change (Type 1: the old value is overwritten):

Customer Key | Name      | State
1001         | Christina | California
Advantages:
This is the easiest way to handle the Slowly Changing
Dimension problem, since there is no need to keep track
of the old information
Disadvantages:
All history is lost. By applying this methodology, it is not
possible to trace back in history. For example, in this
Original record:

Customer Key | Name      | State
1001         | Christina | Illinois

After adding the new record (Type 2):

Customer Key | Name      | State
1001         | Christina | Illinois
1005         | Christina | California
Advantages:
This allows us to accurately keep all historical
information.
Disadvantages:
This will cause the size of the table to grow fast. In cases
where the number of rows for the table is very high to
start with, storage and performance can become a
concern.
This necessarily complicates the ETL process.
Slowly Changing Dimension Type 3(SCD Type3)
Original record:

Customer Key | Name      | State
1001         | Christina | Illinois

After the change (Type 3):

Customer Key | Name      | Original State | Current State | Effective Date
1001         | Christina | Illinois       | California    | 15-JAN-2003

Advantages:
This does not increase the size of the table, since new information is updated.
This allows us to keep some part of history.
Disadvantages:
Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Christina later moves to Texas on December 15, 2003, the California information will be lost.
IDoc status codes:

Status | Type | Description | Next status | Error reason | Solution to error
02 | Error | Error passing data to port | | | Correct the error and execute RSEOUT00 program again
03 | Success | Outbound IDoc successfully sent to port | None, 32 | |
04 | Error | Error within control information on EDI subsystem | | |
05 | Error | Error during translation | | |
12 | Success | Dispatch OK | | Changed from status 03 by BD75 transaction (see below) |
25 | Success | Processing outbound IDoc despite syntax errors | | |
26 | Error | Error during syntax check of outbound IDoc | | Erroneous segment, for example | Force it to be processed
29 | Error | Error in ALE service | 31 | |
30 | Success | Outbound IDoc ready for dispatch (ALE service) | 03 | Partner profile customized to not run immediately | Execute RSEOUT00 program
31 | Error | No further processing | | |
32 | Success | Outbound IDoc was edited | 33 | There was a manual update of the IDoc in SAP tables; the original was saved to a new IDoc with status 33 |
33 | Success | Original of an IDoc which was edited. It is not possible to post this IDoc | None | Backup of another IDoc manually updated, see status 32 | None
35 | Success | Outbound IDoc reloaded from archive. Can't be processed | | |
37 | Error | Erroneous control record (for example, "reference" field should be blank for outbound IDocs) | None | |
42 | Success | Outbound IDoc manually created by WE19 test tool | 01 | |
50 | Success | Inbound IDoc created | 64 | |
51 | Error | Inbound IDoc data contains errors | | Error triggered by SAP application, incorrect values in the IDoc data | Ask functional people, modify erroneous values in the IDoc (WE02 for example) and run it again using BD87
53 | Success | Inbound IDoc posted | None | |
56 | Error | IDoc with errors added (you should never see this error code) | | |
60 | Error | Error during syntax check of inbound IDoc | | |
61 | Success | Processing inbound IDoc despite syntax error | | |
62 | Success | Inbound IDoc passed to application | 53 | |
63 | Error | Error passing IDoc to application | | |
64 | Success | Inbound IDoc ready to be passed to application | 62 | | Execute BD20 transaction (RBDAPP01 program)
65 | Error | Error in ALE service | | Incorrect partner profiles |
66 | Waiting | Waiting for predecessor IDoc (serialization) | | |
68 | Success | No further processing | | |
69 | Success | Inbound IDoc was edited | 51, 68, 64 | There was a manual update of the IDoc in SAP tables; the original was saved to a new IDoc with status 70 |
70 | Success | Original of an IDoc which was edited. It is not possible to post this IDoc | None | Backup of another IDoc manually updated, see status 69 | None
71 | Success | Inbound IDoc reloaded from archive. Can't be processed | | |
74 | Success | Inbound IDoc manually created by WE19 test tool | 50, 56 | The IDoc was created using the inbound test tool (WE19) and written to file to do a file inbound test. Another IDoc is created if immediate processing is chosen |
Thanks,
Mayank Mehta
DS : Things In & Out
Posted by Rishabh Awasthi 17-Sep-2013
Just to have fun with DS..
If you want to stop staring at the trace log while waiting for it to show "JOB COMPLETED SUCCESSFULLY", you can use an easy aid in DS.
Go to Tools --> Options...
Then expand Designer and click on General.
Then check the box that says: Show Dialog when job is completed.
Njjoy...
Missing internal datastore in Designer
Posted by Karol Frankiewicz 05-Sep-2013
I will show how to make the internal datastores visible. Since note 1618486 does not give the method to display the internal datastores, the method is as follows:
You need to add the string DisplayDIInternalJobs=TRUE in DSConfig.txt under the [string] section, as in the screenshot:
Click on Export.
Two files will then be generated and placed on the Unix box. **
One .TXT file named Reponame.Txt in /proj/sap/SBOP_INFO_PLAT_SVCS_40_LNX64/dataservices/conf
One .sh file named jobname.sh in /proj/sap/SBOP_INFO_PLAT_SVCS_40_LNX64/dataservices/log
** The location will change according to the setup
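Once the .sh file exists on the Linux box, it can be scheduled with cron. A minimal, illustrative crontab entry is shown below; the path comes from the example above and the schedule and log location are assumptions to adjust for your installation:

# Run the exported job script every day at 02:00 and keep a log of its output
0 2 * * * /proj/sap/SBOP_INFO_PLAT_SVCS_40_LNX64/dataservices/log/jobname.sh > /tmp/jobname_cron.log 2>&1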
to be inserted to the Target_table, hence the Operation-Code "insert" will be associated with these rows coming from the Source_table.
- The rows found in both tables, but updated in the Source_Table: these rows need to be updated in the Target_table, hence the Operation-Code "update" will be associated with these rows coming from the Source_table.
- The rows which are in the Target_Table but were deleted from the Source_Table after the last run: these rows need to be deleted from the Target_table (although we hardly perform deletion in a data warehouse), hence the Operation-Code "delete" will be associated with each row of this kind.
- The rows which are in both of the tables and unchanged: these rows ideally don't need any operation, hence the Operation-Code "normal" is associated with such rows.
Well, how do we perform this comparison of the Source and Target tables? This can be done with the Table_Comparison transform. It compares the input table (Source_table in our example) with another table, called the comparison table in BODS jargon (Target_table in our example), and after comparing each row it associates an Operation-Code with each row of the input table. If we choose, it also detects the rows which were deleted from the input table (so we can decide whether we need to perform deletion on the Target_table or not). But we are not going into the details of the Table_Comparison transform here, as I was going to play with the map_operation transform alone, and I know it looks crazy to do so. Because, like in the figure given below, if I
So, let us see what will happen if we use one by one the provided options in the drop-down menu given for the "Output row type". The interesting ones are "update" and "delete". Let us see why:
Right click on the User List entry, select New > New User and
specify the required details.
Select the Advanced tab and then the Add/Remove Rights link.
Select the group from the Group List in the Available groups
panel and select the > button to move it to the Destination
Group(s) panel and hit OK.
The "User Security" dialog box appears and displays the access
control list for the repository. The access control list specifies the
users and groups that are granted or denied rights to the
repository.
Select the > button to move it from the Available Access Levels
to the Assigned Access Levels panel. And hit OK.
Don't make the list too long. The logon screen is not resizable, and scrolling down may become very tedious!
Return to the "User Security" dialog box that displays the access
control list for the repository. Select the User, then the Assign
Security button.
In the Assign Security dialog box, select the Advanced tab and
then the Add/Remove Rights link.
What is the source
Import Configuration
Export Configuration
FTP
Shared Directory
Step 2:
Use the below script to check whether the respective .bat file exists in the below path.
exec('cmd','dir "D:\\Program Files (x86)\SAP BusinessObjects\Data Services\Common\log\"*>\\D:\\file.txt');
Step3:
Summary
Whenever there are two options the question asked is "What is
better?". So in what cases is Autocorrect Load preferred over
Table Comparison?
So there are cases where you have no choice other than using
Table Comparison:
Your target table does have a surrogate key. In this case you
would use Table Comparison Transform and Key Generation.
Biggest advantage of autocorrect load is the full pushdown, similar to an insert..select. To demo that, let us have a look at the dataflow attached. We read the table CUSTOMER_MASTER, the query does just a 1:1 mapping of most columns, and we load the result into the target table.
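What such a pushed-down auto correct load looks like depends on the target database; on many databases it ends up as a single MERGE (upsert) statement. The sketch below is only an illustration - the target table name and the column list are assumptions, not taken from the attached dataflow.

-- Illustrative MERGE produced by an auto correct load (names are assumed)
MERGE INTO CUSTOMER_TARGET t
USING (SELECT CUSTOMER_ID, NAME, CITY FROM CUSTOMER_MASTER) s
   ON (t.CUSTOMER_ID = s.CUSTOMER_ID)
WHEN MATCHED THEN
  UPDATE SET t.NAME = s.NAME, t.CITY = s.CITY
WHEN NOT MATCHED THEN
  INSERT (CUSTOMER_ID, NAME, CITY)
  VALUES (s.CUSTOMER_ID, s.NAME, s.CITY);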
Summary
Imagine a case where one transform is doing most of the
processing. In DI, one transform (if not merged with others or
even pushed down) is always one CPU thread. So at the end, in
such a case the entire multiprocessor server is idle except for the
one CPU. Although this is a rather unlikely case, as modern CPUs are many times faster than the database can actually read or write, especially for high-end servers, we need a solution, and it is called Degree of Parallelism, Number of Loaders and Partitioning, the latter discussed in the sub-chapter DoP and Partitions.
With those flags the optimizer is able to split transforms into
multiple instances and merge the data later, if required.
Take this dataflow: It consists of a Query calling a custom script
function mysleep() and a map operation. Obviously the query will
take a long time to process each row - it does a sleep to simulate
the point - and we want to increase the throughput somehow.
You can get exactly the same kind of processing if you take the
original dataflow with the single query and set the dataflow
property "Degree of Parallelism" (short: DoP) to 4.
When executing, you can get a hint of what the optimizer is doing
by looking at the thread names of the monitor log.
We have a round robin split that acts as our case transform, the
Query and three additional AlViews being the generated copies of
the query, the Map_Operation plus three more and the final
merge. (Since 11.7 the threadnames are even more meaningful)
But actually, why do we merge all the data at the end? We could
also have four instances of the target table. Like in this diagram:
To get to the same result with DOP, the source table has to be
partitioned according to the query where clause. If that table is
partitioned in the database, DI imports the partition information
already and would read in parallel if the reader flag "enable
partitioning" is turned on. If this is a plain table, you can create the
partitions manually via the table properties.
in sorted mode is an example for that, one that will get addressed
soon.
When you have many transforms in a dataflow and execute that
with DOP, and some transforms are singularization points while some are not, you will find lots of round robin splits and merges. The
logic of DOP is to execute everything as parallel as possible - so
add splits after each singularization point again. And only if two
transforms that have singularization points follow each other, then
they are not split. On the other hand, we have seen already that
Case transform and Merge effectively process millions of rows per
second.
The fun starts if the partitions, number of loaders and DOP do not
match. Then you will find even more splits and merges after the
readers and in front of the loaders to re-balance the data. So this
should be avoided as much as possible just to minimize the
overhead. But actually, it is no real problem unless you are on a
very very big server.
In the previous chapter we said the source table has to be partitioned in order to allow for parallel reading. So either the table is partitioned already or we edit the table via the object library and maintain partition information ourselves. The result will be similar but not identical.
If the table is physically partitioned, each reader will add a clause
to read just the partition of the given name. In Oracle that would
look like "select * from table partition (partitonname)". That has
three effects. First, the database will read the data of just the one
partition. No access the other partitions. Second, that does work
with hash partitions as well. And third, if the partition information is
not current in the DI repo, e.g. somebody did add another partition
and had not re-imported that table, DI would not read data from
that this new partition. And to make it worse, DI would not even
know it did not read the entire table although it should. In order to
minimize the impact, the engine does check if the partition
information is still current and raise a warning(!) if it is not.
Another problem with physical partitions is that the data might not be distributed equally. Imagine a table that is partitioned by year. If you read the entire table, there will be more or less equal row numbers in each partition. But what if I am interested in last year's data only? So I have 10 readers, one per partition, and each reader will have the where clause YEAR >= 2007. Nine of them will not return much data, hmm? In that case it would be a good idea to delete the partition information of that table in the repository and add another, e.g. partition by sales region or whatever.
Something that is not possible yet, is having two kinds of
partitions. In above example you might have an initial load that
reads all years and a delta load where you read just the changed
data and most of them are in the current year obviously. So for
the initial load using the physical partition information would make
sense, for the delta the manual partition. That cannot be done yet
with DI. On the other hand, a delta load deals with lower volumes anyway, so one can hope that parallel reading is not that important; just the transformations like Table Comparison should be parallelized. So the delta-load dataflow would have DoP set but not the enable-partitions flag in the reader.
Manual partitions do have an impact as well. Each reader will
have to read distinct data, so each one will have a where clause
according to the partition information. In worst case, each reader
will read the entire table to find the rows matching the where
condition. So for ten partitions we created manually, we will have
ten readers each scanning the entire source table. Even if there is
an index on the column we used as manual partition, the
database optimizer might find that reading index plus table would
take longer than just scanning quickly through the table. This is
something to be very careful with. In the perfect world, the source
table would be partitioned by one clause we use for the initial load
and subpartitioned by another clause, one we can use as manual
partition for the delta load. And to deal with the two partitions
independently, the delta load is done reading from a database
view instead of the table, so we have two objects in the object library, each with its own partitioning scheme.
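A database view is enough to give the delta load its own object, and hence its own manual partition information, while the base table keeps its physical partitions for the initial load. A minimal sketch; all names here are assumptions:

-- View over the same source table; import it as a separate object in the
-- object library and maintain the manual partitions on the view
CREATE VIEW SALES_DELTA_V AS
SELECT * FROM SALES;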
As said, DoP is used whenever the database throughput is so
high, one transform of DI cannot provide the data fast enough. In
some cases that would be more than a million rows per second if
just simple queries are used, with other transforms like Table
Comparison in row-by-row mode it is just in the 10'000 rows per
second area.
But normally you will find the table loader to be the bottleneck with
all the overhead for the database. Parse the SQL, find empty
space to write the row, evaluate constraints, save the data in the
redo log, copy the old database block to the rollback segment so
other select queries can still find the old values if they were
started before the insert,... So when we aim for high performance
loading, very quickly you have no choice other than using the API
bulkloaders. They bypass all the SQL overhead, redo log,
everything and write into the database file directly instead. For
that, the table needs to be locked for writing. And how do you
support number of loaders if the table is locked by the first loader
already? You can't. The only option for that is to use API
bulkloaders loading multiple physical tables in parallel, and that
would be loading partitioned tables. Each API bulkloader will load
one partition of the table only and hence lock the partition
exclusively for writing, but not the entire table. The impact for the
DI dataflow is, as soon as the enable partitions on the loader is
checked, the optimizer has to redesign the dataflow to make sure
each loader gets the data of its partition only.
Each stream of data has a Case transform that routes the data
according to the target table partition information into one loader
instance. This target table partition obviously has to be a physical
partition and it has to be current or the API bulkloader will raise an
error saying that this physical partition does not exist.
Using the enable partitions option on the loader is useful for API bulkloaders only. If regular insert statements are to be created, the number of loaders parameter is probably the better choice.
The goal of a slow changing dimension of type two is to keep the
old versions of records and just insert the new ones.
Like in this example, the three input rows are compared with the
current values in the target and for CUSTOMER_ID = 2001 the
city did change to Los Angeles. Therefore, in the target table we
have two rows for this customer, one with the old city name which is not current anymore (CURRENT_IND = N) and has a
VALID_TO date of today - plus the new row with current data as
start date.
All of this is done using common transforms like Query etc. They
all have specific tasks and each collects the information required
for the downstream objects.
In addition, at this point we add a default VALID_FROM date which shall be mapped to sysdate().
The Table Comparison now compares the input dataset with the
current values of the compare (=target) table based on the key
specified as "input primary key columns" list (CUSTOMER_ID).
This primary key list is important as it tells the transform what row
we want to compare with. Of course, in most cases it will be the
primary key column of the source table but by manually specifying
it we just have more options. But it will be the primary key of the
source table, not the target table's primary key. Keep in mind, one
CUSTOMER_ID will have multiple versions in the target! (In case
you ask yourself why the transform is grayed out: the flow was
debugged at the time the screenshot was made)
With this primary key we can identify the target table row we want
to compare with. But actually, in the target table we just said there
can be multiple rows. Which one should we compare with?? That
is easy, with the latest version. And how can we identify the latest
version? We could use the CURRENT_IND = 'Y' information or
the VALID_TO = '9000.12.31' date. But we neither know the column names storing this information nor the values. And
who said that those columns have to exist! We can use another
trick: As the generated key is truly ascending, we know that the
higher the key value is, the more current it will be. And this is what
table comparison does, it reads all rows for our "input primary
key" with an additional order by on the column identified as
generated key descending, so it will get the most current record
first.
Next is the compare step. All columns from the input schema are
compared with the current values of the target table. If anything
changed here, the row will be sent to the output with the OP code
Update and the current values of the table will be in the before
image, the new values of the input in the after image of the
update row. If the row is entirely new it will be an Insert row and if
nothing did change, it will get discarded.
In our example, there is always at least one change: the FROM_DATE. In the upstream query we did set that to sysdate! To deal with that, the Table Comparison transform has an additional column list for the columns to be compared. There, we pulled in all columns except the FROM_DATE. Hence, it will be ignored in the comparison and the row will be discarded if everything else is still current.
Also, look at the output structure of the Table Comparison: it is
the compare table schema. The logic is, the transform is
performing the lookup against the compare table and copies all
values into the before image and after image buffer of this row.
Then the input columns overwrite the after image values.
Therefore, columns like the KEY_ID that do not yet exist in the
input schema will contain the compare table value.
The next transform in the row is History Preserving. In the simplest case, all this transform does is send insert rows to the output as is and, for update rows, change the OP code to insert as well. This way, records that did not change at all will be
filtered away by the Table Comparison transform, new records are
added and changed records are added as a new version as well.
However, the transform does have more options.
And finally, the target table does have a primary key defined,
therefore the table loader will generate an update .... where
KEY_ID = :before_image_value for updates, insert rows are just
inserted.
The performance of that entire dataflow is the same as for Table
Comparison in its respective mode as this transform has the most
overhead. It does lookup the row in the table or inside the cache.
The other transforms are purely executed inside the engine, just
checking if something changed. The table loader will be slower as
before too, simply because it will have more rows to process
- insert new version - update old version. On the other hand, in
many cases Table Comparison will find that no change occurred
at all, so the loader has to process less rows...
One thing the transforms do not support are separate columns for
insert, last_update. The information is there, the valid_from date
of the oldest version is the insert_date, for all other versions it is
the update date. However you cannot have this information in
each column. If you need that, you likely will have to use
database triggers to fill the additional columns.
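If separate INSERT_DATE and LAST_UPDATE columns are really required, a database trigger is one way to fill them outside the dataflow. A minimal sketch in Oracle-style syntax; the table and column names are assumptions:

-- Stamp LAST_UPDATE whenever a row of the history table is updated
CREATE OR REPLACE TRIGGER trg_customer_hist_upd
BEFORE UPDATE ON CUSTOMER_HIST
FOR EACH ROW
BEGIN
  :NEW.LAST_UPDATE := SYSDATE;
END;
/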
Before we go to the third step, let's create a Job and see what will happen without using a database link when we use the tables from these datastores in a dataflow. Will it perform full pushdown?
Step 3:
Follow the below screen shot to create your Project, Job
and Dataflow in Designer.
Now you have the source table coming from one database, i.e. DB_Source, and the target table stored in another database, i.e. DB_Target. Let's see if the dataflow is performing full pushdown or not.
How to see whether full pushdown is happening or
not?
Go to the Validation tab in your Designer and select the Display Optimized SQL option. Below is the screen shot for the same.
http://2.bp.blogspot.com/EbP7mLrxp4U/UdkiBhd1iVI/AAAAAAAABxM/VruWWofGQ88/s1600/6.png
You can see that the SQL has an insert command now, which means full pushdown is happening for your dataflow. This is the way we can create a database link for SQL Server in DS and use more than one database in a Job while still performing full pushdown operations.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Make sure that the dataflow has not been modified after
it has last been saved to the repository. If the dataflow is
modified, it must be saved before displaying the
generated SQL. The Optimized SQL popup window will
always show the code corresponding to the saved version
and not to the one displayed in DS Designer.
This functionality is commonly called full SQL pushdown. Without any doubt, a full pushdown often gives the best performance, because the generated code completely bypasses any operations in DS memory. As a matter of fact, that constitutes the best possible application of the main principle: let the database do the hard work!
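For a simple dataflow, the Optimized SQL window then shows a single INSERT ... SELECT statement instead of a plain SELECT. The sketch below shows what that typically looks like; the table and column names are assumptions:

-- Fully pushed-down dataflow: one statement, no rows travel through DS memory
INSERT INTO TARGET_TABLE (CUSTOMER_ID, NAME, CITY)
SELECT CUSTOMER_ID, NAME, CITY
FROM SOURCE_TABLE;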
1.
Pushdown_sql function
where
2.
4.
Avoid auto-joins
5.
6.
Substitution Parameters vs. Global Variables:

Substitution Parameters: defined at repository level; available to all jobs in a repository; no data type (all strings); fixed value set prior to execution of the job (constants).
Global Variables: defined at job level; cannot be shared across jobs; data-type specific; value can change during job execution.
How to define the Substitution Parameters?
Open the Substitution Parameter Editor from the
Designer by selecting
Tools > Substitution Parameter Configurations....
You can either add another substitution parameter in
existing configuration or you may add a new
configuration by clicking the Create New Substitution
Parameter Configuration icon in the toolbar.
The name prefix is two dollar signs $$ (global variables
are prefixed with one dollar sign). When
adding new substitution parameters in the Substitution
Parameter Editor, the editor automatically
adds the prefix.
The maximum length of a name is 64 characters.
In the following example, the substitution parameter $$SourceFilesPath has the value D:/Data/Staging in the configuration named Dev_Subst_Param_Conf and the value C:/data/staging in the Quality_Subst_Param_Conf configuration.
Both the Pre Load Commands tab and the Post Load Commands
tab contain a SQL Commands box and a Value box. The SQL
Commands box contains command lines. To edit/write a line,
select the line in the SQL Commands box. The text for the SQL
command appears in the Value box. Edit the text in that box.
To add a new line, determine the desired position for the new line,
select the existing line immediately before or after the desired
position, right-click, and choose Insert Before to insert a new line
before the selected line, or choose Insert After to insert a new line
after the selected line. Finally, type the SQL command in the
Value box. You can include variables and parameters in pre-load
or post-load SQL statements. Put the variables and parameters in
either brackets, braces, or quotes.
Save and execute. The job will execute Pre-Load, Transform and
Post-Load in a sequence.
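As an illustration of a pre-load command, the statement below clears the slice being reloaded before the dataflow runs. The table name and the global variable $G_LOAD_DATE are assumptions; the square brackets make DS substitute the variable's value into the SQL text:

-- Assumed pre-load command: remove today's slice before reloading it
DELETE FROM SALES_FACT WHERE LOAD_DATE = [$G_LOAD_DATE]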
Step 4: Open the Catch block and Drag one script inside
Catch Block and name it as shown in below diagram.
Designer > Graphics
Refer to the screen shot below. Using this option you can change the line type as per your likes. I personally like Horizontal/Vertical, as all transforms look cleaner inside the dataflow. You can also change the color scheme, background etc.
Do feel free to add to this list if you have come across more cool stuff in BODS.
Quick Tips for Job Performance Optimization in BODS
Posted by Mohammad Shahanshah Ansari 15-Mar-2014
Parallel Execution of Dataflows or Workflows: Ensure that workflows and dataflows are not executing in sequence unnecessarily. Make the execution parallel wherever possible.
Step 2:
Cancel the above Job execution, click on the Tools menu as shown below, and select System Configurations.
Step 3: You can see the below dialog box now. Click on the icon (red circle) as shown in the below dialog box to Create New Configuration. This dialog box will show all the data stores available in your repository.
.
IF SY-SUBRC <> 0.
* Implement suitable error handling here
ENDIF.
wa-flag = i_flag.
insert zlk_date from wa.
if sy-subrc ne 0.
update zlk_date from wa.
endif.
e_status = 'S'.
endif.
ENDFUNCTION.
4) Remember to set the attribute of the FM to RFC
enabled, otherwise it will not be accessible from Data
Services.
In this we have two tabular structures, one to hold the header part and a second to hold the category part. So when we define the xml structure in BODS we need to create two schemas to hold the Header tabular information and the Category tabular information. These schemas will hold the records that need to be populated in the target. So for our sample scenario the xml structure will be as follows
An XSL style sheet consists of one or more sets of rules that are called templates. A template contains rules to apply when a specified node is matched.
The <xsl:template> element is used to build templates. The match attribute is used to associate a template with an XML element. (match="/" defines the whole document, i.e. the match="/" attribute associates the template with the root of the XML source document.)
After building the xsl file we need to place that file in the target folder where BODS will be building the target file. We also need to alter the XML header in the target XML structure inside the job. The default header defined in the XML header will be <?xml version="1.0" encoding = "UTF-8" ?> and we need to change that to <?xml version="1.0" encoding = "UTF-8" ?><?xml-stylesheet type="text/xsl" href="<xsl_fileName>"?>
The target xml generated after the execution of the job can be opened with Excel, where you will be prompted with an option to open the xml after applying the stylesheet. There we need to select our stylesheet to get the output in the desired Excel format.
Note: Both the XSL file and the xml target file should be available in the same folder for getting the desired output.
Attaching the sample xsl and xml file for reference
In below window click on
parameter.
10) Mapping.
3.
In the second query transform below, to nest the data, select the complete Query from schema IN and import it under the Query of schema OUT.
Go to the second Query again and make the Query name the same as in the XML schema (Query_nt_1).
Note: If we do not change the Query name it gives an ERROR.
The image below shows the creation of the Real time job.
Example:
This job will load data from Flat file to Temporary Table. (I
am repeating the same to raise Primary Key exception)
Recovery Unit:
With recovery, the job always starts at the failed DF in the recovery run, irrespective of the dependent actions.
Example: Workflow WF_RECOVERY_UNIT has two Dataflows loading data from a Flat file. If any of the DFs fails, then both the DFs have to run again.
To achieve this kind of requirement, we can define all the activities and make them a recovery unit. When we run the job in recovery mode, if any of the activities fails, it starts from the beginning.
To make a workflow a recovery unit, check the recovery unit option under the workflow properties.
SALES_PERSON_KEY | SALES_PERSON_ID | NAME
15               | 00120           | Doe, John B

SALES_PERSON_KEY | SALES_PERSON_ID | NAME
15               | 00120           | Smith, John B
Add Try and "Script" controls from the pallet and drag to
the work area
Add DataFlow.
ifthenelse(Query_LOOKUP_PRODUCT_TIM.LKP_PROD_KEY
is null, 'INS', 'UP')
Thanks
Venky
import imp
mymodule = imp.load_source('mymodule', '/path/to/mymodule.py')
mymodule.myfunction()
def test_wrapper():
    import fakeBODS
    Collection = fakeBODS.FLDataCollection('csv_dump/tmeta.csv')
    DataManager = fakeBODS.FLDataManager()
    RunValidations(DataManager, Collection, 'validationFunctions.py', 'Lookups/')
Limitations of UDT
There are some disappointing limitations that I have
come across that you should be aware of before setting
off:
Going forward
With the rise of large amounts of unstructured data and
the non-trivial data manipulations that come with it, I
believe that every Data analyst/scientist should have a
go-to language in their back pocket. As a trained
physicist with a background in C/C++ (ROOT) I found
Python incredibly easy to master and put it forward as
one to consider first.
about me...
This is my first post on SCN. I am new to SAP and have a
fresh perspective of the products and look forward to
contributing on this topic if there is interest. When I get
the chance I plan to blog about the use of Vim for a data
analyst and the manipulation of data structures using
Python.
Select the user or group you want to authorise and select "Assign
Security":
2.
g)
Fields where changes are made are circled with the red marks
as seen in the above figure.
9) Validate & Execute the job.
10) 3 new records got added in the target table as shown
below.
You can see that a new entry for the updated record is made in the target table, with the 'Y' flag and a new END_DATE of '9000.12.31', and the flag of the original record is changed to 'N'.
Summary:-
Input data:

Att | Timestamp
747 | 2012.11.11 04:17:30
ABC | 2014.09.30 17:45:54
UVW | 2014.04.16 17:45:23
DEF | 2014.08.17 16:16:27
XYZ | 2014.08.25 18:15:45
JKL | 2012.04.30 04:00:00
777 | 2014.07.15 12:45:12
GHI | 2013.06.08 23:11:26
737 | 2010.12.06 06:43:52
After sorting:

Att | Timestamp
ABC | 2014.09.30 17:45:54
DEF | 2014.08.17 16:16:27
GHI | 2013.06.08 23:11:26
JKL | 2012.04.30 04:00:00
XYZ | 2014.08.25 18:15:45
UVW | 2014.04.16 17:45:23
777 | 2014.07.15 12:45:12
747 | 2012.11.11 04:17:30
737 | 2010.12.06 06:43:52
With a sequence number per key:

Att | Timestamp           | Seqno
ABC | 2014.09.30 17:45:54 | 1
DEF | 2014.08.17 16:16:27 | 2
GHI | 2013.06.08 23:11:26 | 3
JKL | 2012.04.30 04:00:00 | 4
XYZ | 2014.08.25 18:15:45 | 1
UVW | 2014.04.16 17:45:23 | 2
777 | 2014.07.15 12:45:12 | 1
747 | 2012.11.11 04:17:30 | 2
737 | 2010.12.06 06:43:52 | 3
Output (most recent record per key):

Key | Att | Timestamp
03  | 777 | 2014.07.15 12:45:12
If you uncheck bulk loading of the target table, you'll
notice that the full sql (read and write) will be pushed to
the underlying database. And your job will run so much
faster!
Note: This second approach produces correct results only
if there are no duplicate most recent timestamps within a
given primary key.
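The same "most recent row per key" logic can also be expressed as a single database-side query, which is one way to get it pushed down. A sketch only: the source table name and the key column (KEY_COL) are assumptions, Att and Timestamp are taken from the example above, and column names may need quoting depending on the database:

-- Keep only the most recent Timestamp per key
SELECT KEY_COL, Att, Timestamp
FROM (
    SELECT KEY_COL, Att, Timestamp,
           ROW_NUMBER() OVER (PARTITION BY KEY_COL ORDER BY Timestamp DESC) AS Seqno
    FROM SRC_TABLE
) t
WHERE Seqno = 1;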
The Validation transform is used to filter or replace the source dataset based on criteria or validation rules to produce the desired output dataset.
It enables you to create validation rules on the input dataset and generate the output based on whether records have passed or failed the validation condition.
In this Scenario we are validating the data from the database
table with correct format of the zip code.
If the zip code is less than 5 digit then we will filter that data &
pass it to another table.
The Validation transform can generate three output datasets: Pass, Fail, and RuleViolation.
1.
3.
Click Add & fill the details about the rule as follows.
Action on Fail:1) Send to Fail:- on failure of the rule the record will
sent to another target with "Fail" records.
2) Send to Pass:- even on failure pass the record to the
normal target
3) Send to Both:- sends to both the targets.
Column Validation: Select the column to be validated, then decide the condition.
We have selected "Match Pattern" as the condition, with pattern '99999'.
So it will check whether the Zip code is of 5 digits or not.
Press OK. Then you can see the entry get added as
follows.
3) Add a Target table to the dataflow & link the Validate Transform
to it.
Input:-
You can see that the invalid record from input is transferred to
the "CUST_Fail" table as shown above.
3. Picking more than one column from the lookup table. The value returned by the lookup function can be mapped to only one column, but a join can return more than one column and can be mapped to more than one column in the same query transform on SQL Server.
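To make the difference concrete, the sketch below shows a join pulling several columns from the lookup table in one pass; all table and column names here are made up for illustration:

-- One outer join returns PROD_KEY, PROD_NAME and PROD_CATEGORY at once,
-- whereas a lookup() call would return only a single column
SELECT o.ORDER_ID,
       p.PROD_KEY,
       p.PROD_NAME,
       p.PROD_CATEGORY
FROM ORDERS o
LEFT OUTER JOIN PRODUCT p
    ON p.PROD_ID = o.PROD_ID;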
The idea was to read the repository database credentials from the user, export the substitution parameters to an XML file through al_engine.exe, and then convert that to a CSV file.
VB-Script Code:
' Don't worry if you don't understand. Just copy paste the code in notepad, save it with vbs as extension and double click
' Or download it from attachment.
Option Explicit

Dim SQLHost, SQLDB, SQLUN, SQLPWD

SQLHost = InputBox ("Enter target SQL Host,port:", "Export SP to tab delimited text file","")
SQLDB   = InputBox ("Enter target SQL database:", "Export SP to tab delimited text file","")
SQLUN   = InputBox ("Enter target SQL username:", "Export SP to tab delimited text file","")
SQLPWD  = InputBox ("Enter target SQL password:", "Export SP to tab delimited text file","")

build_and_execute_command
SP_XML_to_CSV "SP.xml", "SP.txt"
Msgbox "Open generated tab delimited text file SP.txt in Excel." & vbCrLf & "If required, format it as table with header.", vbInformation, "Export SP to tab delimited text file"

Function build_and_execute_command()
    Dim command, objShell, filesys
    Set filesys = CreateObject("Scripting.FileSystemObject")
    Set objShell = WScript.CreateObject("WScript.shell")
    command = """%LINK_DIR%\bin\al_engine.exe"" -NMicrosoft_SQL_Server -passphraseATL -z""" & "SP_error.log"" -U" & SQLUN & " -P" & SQLPWD & " -S" & SQLHost & " -Q" & SQLDB & " -XX@" & "v" & "@""" & "SP.xml"""
    export_execution_command "%LINK_DIR%\log\", "SP", command
    'objShell.run "%LINK_DIR%\log\" & "SP" & ".bat",0,true
    objShell.run "SP.bat",0,true
    filesys.DeleteFile "SP.bat", true
    if filesys.FileExists("SP_error.log") then
        msgbox ("Encountered issue while exporting SP from repo")
        build_and_execute_command = -1
    End if
    Set filesys = Nothing
End Function

Function export_execution_command(FilePath, FileName, FileContent)
    Dim objFSO, objFile, outFile
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    'outFile = FilePath & FileName & ".bat"
    outFile = FileName & ".bat"
    Set objFile = objFSO.CreateTextFile(outFile, True)
    objFile.Write FileContent & vbCrLf
    objFile.Close
    export_execution_command = 0
End Function

Function SP_XML_to_CSV (xmlFile, csvFile)
    Dim ConfigList, SubParamList, objXMLDoc, Root, Config, SubParam, Matrix(1000,50)
    Dim i, j, iMax, jMax, Text, sessionFSO, OutFile, objShell
    Set sessionFSO = CreateObject("Scripting.FileSystemObject")
    Set OutFile = sessionFSO.CreateTextFile(csvFile, 1)
    Set objShell = WScript.CreateObject("WScript.shell")
    Set objXMLDoc = CreateObject("Microsoft.XMLDOM")
    objXMLDoc.async = False
    objXMLDoc.load(xmlFile)
    Set ConfigList = objXMLDoc.documentElement.getElementsByTagName("SVConfiguration")
    i = 1
    Matrix(0,0) = "Substitution Parameter"
    For Each Config In ConfigList
        Set SubParamList = Config.getElementsByTagName("SubVar")
        j = 1
        Matrix(0,i) = Config.getAttribute("name")
        For Each SubParam In SubParamList
            If i = 1 Then Matrix(j,0) = SubParam.getAttribute("name")
            Matrix(j,i) = "=""" & SubParam.text & """"
            j = j + 1
        Next
        i = i + 1
    Next
    iMax = i
    jMax = j
    For i = 0 to jMax-1
        Text = ""
        For j = 0 to iMax-1
            Text = Text & Matrix(i,j) & vbTab
        Next
        OutFile.WriteLine Text
    Next
    OutFile.Close
End Function
Usage screenshots: