Ebook850 pages5 hours

SAP Data Services 4.x Cookbook

Name: SAP Data Services 4.x Cookbook
Author: Shomnikov Ivan
ISBN: 9781782176572

By Shomnikov Ivan

Rating: 0 out of 5 stars

()

Read preview

About this ebook

Delve into the SAP Data Services environment to efficiently prepare, implement, and develop ETL processes

About This Book

- Install and configure the SAP Data Services environment
- Develop ETL techniques in the Data Services environment
- Implement real-life examples of Data Services uses through step-by-step instructions to perform specific ETL development tasks

Who This Book Is For

This book is for IT technical engineers who want to get familiar with the EIM solutions provided by SAP for ETL development and data quality management. The book requires familiarity with basic programming concepts and basic knowledge of the SQL language.

What You Will Learn

- Install, configure, and administer the SAP Data Services components
- Run through the ETL design basics
- Maximize the performance of your ETL with the advanced patterns in Data Services
- Extract methods from various databases and systems
- Get familiar with the transformation methods available in SAP Data Services
- Load methods into various databases and systems
- Code with the Data Services scripting language
- Validate and cleanse your data, applying the Data quality methods of the Information Steward

In Detail

Want to cost effectively deliver trusted information to all of your crucial business functions? SAP Data Services delivers one enterprise-class solution for data integration, data quality, data profiling, and text data processing. It boosts productivity with a single solution for data quality and data integration. SAP Data Services also enables you to move, improve, govern, and unlock big data.
This book will lead you through the SAP Data Services environment to efficiently develop ETL processes. To begin with, you’ll learn to install, configure, and prepare the ETL development environment. You will get familiarized with the concepts of developing ETL processes with SAP Data Services. Starting from smallest unit of work- the data flow, the chapters will lead you to the highest organizational unit—the Data Services job, revealing the advanced techniques of ETL design.
You will learn to import XML files by creating and implementing real-time jobs. It will then guide you through the ETL development patterns that enable the most effective performance when extracting, transforming, and loading data. You will also find out how to create validation functions and transforms.
Finally, the book will show you the benefits of data quality management with the help of another SAP solution—Information Steward.

Style and approach

This book is an easy-to-follow guide with step-by-step instructions to perform specific ETL development tasks.

Skip carousel

LanguageEnglish

PublisherPackt Publishing

Release dateDec 3, 2015

ISBN9781782176572

Author

Shomnikov Ivan

Related authors

Skip carousel

Related to SAP Data Services 4.x Cookbook

Related ebooks

Skip carousel

SAP BusinessObjects Dashboards 4.0 Cookbook
Ebook
SAP BusinessObjects Dashboards 4.0 Cookbook
byXavier Hacking
Rating: 0 out of 5 stars
0 ratings
SAP BusinessObjects Reporting Cookbook
Ebook
SAP BusinessObjects Reporting Cookbook
byYoav Yahav
Rating: 5 out of 5 stars
5/5
Apache Hive Cookbook
Ebook
Apache Hive Cookbook
byShrey Mehrotra
Rating: 0 out of 5 stars
0 ratings
Oracle Warehouse Builder 11g: Getting Started
Ebook
Oracle Warehouse Builder 11g: Getting Started
byBob Griesemer
Rating: 0 out of 5 stars
0 ratings
Business Intelligence with MicroStrategy Cookbook
Ebook
Business Intelligence with MicroStrategy Cookbook
byDavide Moraschi
Rating: 0 out of 5 stars
0 ratings
Microsoft SQL Server 2014 Business Intelligence Development Beginner’s Guide
Ebook
Microsoft SQL Server 2014 Business Intelligence Development Beginner’s Guide
byReza Rad
Rating: 0 out of 5 stars
0 ratings
Tableau 10 Business Intelligence Cookbook
Ebook
Tableau 10 Business Intelligence Cookbook
bySantos Donabel
Rating: 0 out of 5 stars
0 ratings
Tableau Cookbook – Recipes for Data Visualization
Ebook
Tableau Cookbook – Recipes for Data Visualization
byShweta Sankhe-Savale
Rating: 0 out of 5 stars
0 ratings
Tabular Modeling with SQL Server 2016 Analysis Services Cookbook
Ebook
Tabular Modeling with SQL Server 2016 Analysis Services Cookbook
byDerek Wilson
Rating: 4 out of 5 stars
4/5
OData Programming Cookbook for .NET Developers
Ebook
OData Programming Cookbook for .NET Developers
bySteven Cheng
Rating: 0 out of 5 stars
0 ratings
MDX with SSAS 2012 Cookbook
Ebook
MDX with SSAS 2012 Cookbook
bySherry Li
Rating: 0 out of 5 stars
0 ratings
Talend Open Studio Cookbook
Ebook
Talend Open Studio Cookbook
byRick Barton
Rating: 2 out of 5 stars
2/5
QlikView for Developers Cookbook
Ebook
QlikView for Developers Cookbook
byStephen Redmond
Rating: 0 out of 5 stars
0 ratings
Microsoft Tabular Modeling Cookbook
Ebook
Microsoft Tabular Modeling Cookbook
byPaul te Braak
Rating: 0 out of 5 stars
0 ratings
SAP BusinessObjects Dashboards 4.1 Cookbook
Ebook
SAP BusinessObjects Dashboards 4.1 Cookbook
byXavier Hacking
Rating: 0 out of 5 stars
0 ratings
PostgreSQL 9 High Availability Cookbook
Ebook
PostgreSQL 9 High Availability Cookbook
byShaun M. Thomas
Rating: 5 out of 5 stars
5/5
SQL Server 2016 Reporting Services Cookbook
Ebook
SQL Server 2016 Reporting Services Cookbook
byDinesh Priyankara
Rating: 5 out of 5 stars
5/5
Hadoop Real-World Solutions Cookbook - Second Edition
Ebook
Hadoop Real-World Solutions Cookbook - Second Edition
byDeshpande Tanmay
Rating: 0 out of 5 stars
0 ratings
Amazon S3 Cookbook
Ebook
Amazon S3 Cookbook
byNaoya Hashimoto
Rating: 0 out of 5 stars
0 ratings
Software Development on the SAP HANA Platform
Ebook
Software Development on the SAP HANA Platform
byMark Walker
Rating: 5 out of 5 stars
5/5
SAP XI Exchange Infrastructure
Ebook
SAP XI Exchange Infrastructure
byEquity Press
Rating: 1 out of 5 stars
1/5
Real Time Analytics with SAP HANA
Ebook
Real Time Analytics with SAP HANA
byVinay Singh
Rating: 0 out of 5 stars
0 ratings
SAP Cloud Platform Complete Self-Assessment Guide
Ebook
SAP Cloud Platform Complete Self-Assessment Guide
byGerardus Blokdyk
Rating: 0 out of 5 stars
0 ratings
SAP MDG-M A Complete Guide
Ebook
SAP MDG-M A Complete Guide
byGerardus Blokdyk
Rating: 0 out of 5 stars
0 ratings
BI and Big Data Management
Ebook
BI and Big Data Management
byUlrich Hambuch
Rating: 0 out of 5 stars
0 ratings
Implementing SAP BPC Embedded 2nd Edition
Ebook
Implementing SAP BPC Embedded 2nd Edition
byMarc Schindler
Rating: 0 out of 5 stars
0 ratings
Unofficial SAP WebDynpro for ABAP
Ebook
Unofficial SAP WebDynpro for ABAP
byEquity Press
Rating: 5 out of 5 stars
5/5
SAP Collateral Management System (CMS): Configuration Guide & User Manual
Ebook
SAP Collateral Management System (CMS): Configuration Guide & User Manual
byArjan Hogenes
Rating: 4 out of 5 stars
4/5
Learning SAP BusinessObjects Dashboards
Ebook
Learning SAP BusinessObjects Dashboards
byTaha M. Mahmoud
Rating: 0 out of 5 stars
0 ratings
UI5 User Guide: How to develop responsive data-centric client web applications
Ebook
UI5 User Guide: How to develop responsive data-centric client web applications
byCarsten Heinrigs
Rating: 0 out of 5 stars
0 ratings

Data Modeling & Design For You

Skip carousel

Data Analytics for Beginners: Introduction to Data Analytics
Ebook
Data Analytics for Beginners: Introduction to Data Analytics
byAnthony S. Williams
Rating: 4 out of 5 stars
4/5
Python Data Analysis - Second Edition
Ebook
Python Data Analysis - Second Edition
byArmando Fandango
Rating: 0 out of 5 stars
0 ratings
Advanced Deep Learning with Python: Design and implement advanced next-generation AI solutions using TensorFlow and PyTorch
Ebook
Advanced Deep Learning with Python: Design and implement advanced next-generation AI solutions using TensorFlow and PyTorch
byIvan Vasilev
Rating: 0 out of 5 stars
0 ratings
Python: Master the Art of Design Patterns
Ebook
Python: Master the Art of Design Patterns
byDusty Phillips
Rating: 4 out of 5 stars
4/5
Raspberry Pi :Raspberry Pi Guide On Python & Projects Programming In Easy Steps
Ebook
Raspberry Pi :Raspberry Pi Guide On Python & Projects Programming In Easy Steps
byJason Scotts
Rating: 3 out of 5 stars
3/5
Quality metrics for semantic interoperability in Health Informatics
Ebook
Quality metrics for semantic interoperability in Health Informatics
byAlberto Moreno Conde
Rating: 0 out of 5 stars
0 ratings
Programmable Logic Controllers
Ebook
Programmable Logic Controllers
byWilliam Bolton
Rating: 4 out of 5 stars
4/5
AI and UX: Why Artificial Intelligence Needs User Experience
Ebook
AI and UX: Why Artificial Intelligence Needs User Experience
byGavin Lew
Rating: 0 out of 5 stars
0 ratings
Mastering VB.NET: A Comprehensive Guide to Visual Basic .NET Programming
Ebook
Mastering VB.NET: A Comprehensive Guide to Visual Basic .NET Programming
byKameron Hussain
Rating: 0 out of 5 stars
0 ratings
Supercharge Power BI: Power BI is Better When You Learn To Write DAX
Ebook
Supercharge Power BI: Power BI is Better When You Learn To Write DAX
byMatt Allington
Rating: 5 out of 5 stars
5/5
Data Analytics with Python: Data Analytics in Python Using Pandas
Ebook
Data Analytics with Python: Data Analytics in Python Using Pandas
byFrank Millstein
Rating: 3 out of 5 stars
3/5
DAX Patterns: Second Edition
Ebook
DAX Patterns: Second Edition
byMarco Russo
Rating: 5 out of 5 stars
5/5
Living in Data: A Citizen's Guide to a Better Information Future
Ebook
Living in Data: A Citizen's Guide to a Better Information Future
byJer Thorp
Rating: 4 out of 5 stars
4/5
The Secrets of ChatGPT Prompt Engineering for Non-Developers
Ebook
The Secrets of ChatGPT Prompt Engineering for Non-Developers
byCea West
Rating: 5 out of 5 stars
5/5
Learning Cypher
Ebook
Learning Cypher
byOnofrio Panzarino
Rating: 0 out of 5 stars
0 ratings
Hacks To Crush Plc Program Fast & Efficiently Everytime... : Coding, Simulating & Testing Programmable Logic Controller With Examples
Ebook
Hacks To Crush Plc Program Fast & Efficiently Everytime... : Coding, Simulating & Testing Programmable Logic Controller With Examples
byMichael Blake
Rating: 5 out of 5 stars
5/5
Python Data Analysis
Ebook
Python Data Analysis
byIvan Idris
Rating: 4 out of 5 stars
4/5
Minding the Machines: Building and Leading Data Science and Analytics Teams
Ebook
Minding the Machines: Building and Leading Data Science and Analytics Teams
byJeremy Adamson
Rating: 0 out of 5 stars
0 ratings
Data Visualization: a successful design process
Ebook
Data Visualization: a successful design process
byAndy Kirk
Rating: 4 out of 5 stars
4/5
Power Pivot and Power BI: The Excel User's Guide to DAX, Power Query, Power BI & Power Pivot in Excel 2010-2016
Ebook
Power Pivot and Power BI: The Excel User's Guide to DAX, Power Query, Power BI & Power Pivot in Excel 2010-2016
byRob Collie
Rating: 4 out of 5 stars
4/5
What Makes Us Smart: The Computational Logic of Human Cognition
Ebook
What Makes Us Smart: The Computational Logic of Human Cognition
bySamuel J. Gershman
Rating: 0 out of 5 stars
0 ratings
The Esri Guide to GIS Analysis, Volume 3: Modeling Suitability, Movement, and Interaction
Ebook
The Esri Guide to GIS Analysis, Volume 3: Modeling Suitability, Movement, and Interaction
byAndy Mitchell
Rating: 0 out of 5 stars
0 ratings
Graph Databases in Action: Examples in Gremlin
Ebook
Graph Databases in Action: Examples in Gremlin
byJosh Perryman
Rating: 0 out of 5 stars
0 ratings
Kafka in Action
Ebook
Kafka in Action
byDylan Scott
Rating: 0 out of 5 stars
0 ratings
Principles of Data Science
Ebook
Principles of Data Science
bySinan Ozdemir
Rating: 4 out of 5 stars
4/5
MDX with Microsoft SQL Server 2016 Analysis Services Cookbook - Third Edition
Ebook
MDX with Microsoft SQL Server 2016 Analysis Services Cookbook - Third Edition
bySherry Li
Rating: 0 out of 5 stars
0 ratings
Learn T-SQL Querying: A guide to developing efficient and elegant T-SQL code
Ebook
Learn T-SQL Querying: A guide to developing efficient and elegant T-SQL code
byPedro Lopes
Rating: 0 out of 5 stars
0 ratings
Deep Learning: An Essential Guide to Deep Learning for Beginners Who Want to Understand How Deep Neural Networks Work and Relate to Machine Learning and Artificial Intelligence
Ebook
Deep Learning: An Essential Guide to Deep Learning for Beginners Who Want to Understand How Deep Neural Networks Work and Relate to Machine Learning and Artificial Intelligence
byHerbert Jones
Rating: 5 out of 5 stars
5/5
Think Like a Data Scientist: Tackle the data science process step-by-step
Ebook
Think Like a Data Scientist: Tackle the data science process step-by-step
byBrian Godsey
Rating: 0 out of 5 stars
0 ratings
Neural Networks: Neural Networks Tools and Techniques for Beginners
Ebook
Neural Networks: Neural Networks Tools and Techniques for Beginners
byJohn Slavio
Rating: 5 out of 5 stars
5/5

Related podcast episodes

Skip carousel

Managing SAP to the Cloud
Podcast episode
Managing SAP to the Cloud
byThe Cloudcast
0 ratings
0% found this document useful
#456: Data Architectures with AWS Hero Elliott Cordo: AWS Data Hero and Head of Data at Capsule, Elliott Cordo, has built many ground-up data architecture
Podcast episode
#456: Data Architectures with AWS Hero Elliott Cordo: AWS Data Hero and Head of Data at Capsule, Elliott Cordo, has built many ground-up data architecture
byAWS Podcast
0 ratings
0% found this document useful
SAP with Thomas Jung and Lucia Subatin: Brian Dorsey and Mark Mirchandani team up this week to speak with Thomas Jung and Lucia Subatin about SAP.
Podcast episode
SAP with Thomas Jung and Lucia Subatin: Brian Dorsey and Mark Mirchandani team up this week to speak with Thomas Jung and Lucia Subatin about SAP.
byGoogle Cloud Platform Podcast
0 ratings
0% found this document useful
Reduce The Overhead In Your Pipelines With Agile Data Engine's DataOps Service: A significant portion of the time spent by data engineering teams is on managing the workflows and operations of their pipelines. DataOps has arisen as a parallel set of practices to that of DevOps teams as a means of reducing wasted effort. Agile Data Engine is a platform designed to handle the infrastructure side of the DataOps equation, as well as providing the insights that you need to manage the human side of the workflow. In this episode Tevje Olin explains how the platform is implemented, the features that it provides to reduce the amount of effort required to keep your pipelines running, and how you can start using it in your own team.
Podcast episode
Reduce The Overhead In Your Pipelines With Agile Data Engine's DataOps Service: A significant portion of the time spent by data engineering teams is on managing the workflows and operations of their pipelines. DataOps has arisen as a parallel set of practices to that of DevOps teams as a means of reducing wasted effort. Agile Data Engine is a platform designed to handle the infrastructure side of the DataOps equation, as well as providing the insights that you need to manage the human side of the workflow. In this episode Tevje Olin explains how the platform is implemented, the features that it provides to reduce the amount of effort required to keep your pipelines running, and how you can start using it in your own team.
byData Engineering Podcast
0 ratings
0% found this document useful
Dataprep with Eric Anderson: Eric Anderson joins the podcast to talk about how Dataprep is simplifying data wrangling!
Podcast episode
Dataprep with Eric Anderson: Eric Anderson joins the podcast to talk about how Dataprep is simplifying data wrangling!
byGoogle Cloud Platform Podcast
0 ratings
0% found this document useful
How Data Platforms Affect ML & AI // Jake Watson // #207
Podcast episode
How Data Platforms Affect ML & AI // Jake Watson // #207
byMLOps.community
0 ratings
0% found this document useful
Storytime for DataOps - Christopher Bergh
Podcast episode
Storytime for DataOps - Christopher Bergh
byDataTalks.Club
0 ratings
0% found this document useful
396: Build vs. Buy: Joël has been fighting a frustrating bug where he's integrating with a third-party database, and some queries just crash. Stephanie shares her own debugging story about a leaky stub that caused flaky tests. Additionally, they discuss the build vs. buy decision when integrating with third-party systems. They consider the time and cost implications of building their own integration versus using off-the-shelf components and conclude that the decision often depends on the specific needs and priorities of the project, including how quickly a solution is needed and whether the integration is core to the business's value proposition.
Podcast episode
396: Build vs. Buy: Joël has been fighting a frustrating bug where he's integrating with a third-party database, and some queries just crash. Stephanie shares her own debugging story about a leaky stub that caused flaky tests. Additionally, they discuss the build vs. buy decision when integrating with third-party systems. They consider the time and cost implications of building their own integration versus using off-the-shelf components and conclude that the decision often depends on the specific needs and priorities of the project, including how quickly a solution is needed and whether the integration is core to the business's value proposition.
byThe Bike Shed
0 ratings
0% found this document useful
Privacy-aware Data Pipelines with Skyflow’s Piper Keyes: A data analytics pipeline is important to modern businesses because it allows them to extract valuable insights from the large amounts of data they generate and collect on a daily basis. This leads to better decision making, improved efficiency, and ...
Podcast episode
Privacy-aware Data Pipelines with Skyflow’s Piper Keyes: A data analytics pipeline is important to modern businesses because it allows them to extract valuable insights from the large amounts of data they generate and collect on a daily basis. This leads to better decision making, improved efficiency, and ...
byPartially Redacted: Data Privacy, Security & Compliance
0 ratings
0% found this document useful
Gitting After It with Katie Sylor-Miller: Katie Sylor-Miller is a frontend architect at Etsy, a company she joined in November 2015. Prior to this position, Katie worked as a senior front end developer at Constant Contact, a technical lead at EF Education, a front end web developer at Miller Syst
Podcast episode
Gitting After It with Katie Sylor-Miller: Katie Sylor-Miller is a frontend architect at Etsy, a company she joined in November 2015. Prior to this position, Katie worked as a senior front end developer at Constant Contact, a technical lead at EF Education, a front end web developer at Miller Syst
byScreaming in the Cloud
0 ratings
0% found this document useful
How to Get Better Traffic, Free with Semantic Data: Everyone wants better ranking but that’s not really what they want. Stores want more traffic. They want that because it leads to sales. Sales is what really matters. All of SEO is just a marketing channel to get more visitors who turn into customers. The main FIX for SEO is better rankings What if there was a way to get more sales with the existing rankings you already have? How? By convincing more searchers to click on your search listings. More clicks == more traffic == more sales. What we’re talking about today is a search enhancement in Google called Rich Snippets, specifically Product Rich Snippets. They enhance your existing search results with more data and pixels. Eric Davis joins us today to walk us through it in easy to understand terms. Eric is founder of Little Stream Software, which helps Shopify entrepreneurs customize their Shopify stores using public and private Shopify Apps.
Podcast episode
How to Get Better Traffic, Free with Semantic Data: Everyone wants better ranking but that’s not really what they want. Stores want more traffic. They want that because it leads to sales. Sales is what really matters. All of SEO is just a marketing channel to get more visitors who turn into customers. The main FIX for SEO is better rankings What if there was a way to get more sales with the existing rankings you already have? How? By convincing more searchers to click on your search listings. More clicks == more traffic == more sales. What we’re talking about today is a search enhancement in Google called Rich Snippets, specifically Product Rich Snippets. They enhance your existing search results with more data and pixels. Eric Davis joins us today to walk us through it in easy to understand terms. Eric is founder of Little Stream Software, which helps Shopify entrepreneurs customize their Shopify stores using public and private Shopify Apps.
byThe Unofficial Shopify Podcast: Entrepreneur Tales
0 ratings
0% found this document useful
Automate Your Pipeline Creation For Streaming Data Transformations With SQLake: Managing end-to-end data flows becomes complex and unwieldy as the scale of data and its variety of applications in an organization grows. Part of this complexity is due to the transformation and orchestration of data living in disparate systems. The team at Upsolver is taking aim at this problem with the latest iteration of their platform in the form of SQLake. In this episode Ori Rafael explains how they are automating the creation and scheduling of orchestration flows and their related transforations in a unified SQL interface.
Podcast episode
Automate Your Pipeline Creation For Streaming Data Transformations With SQLake: Managing end-to-end data flows becomes complex and unwieldy as the scale of data and its variety of applications in an organization grows. Part of this complexity is due to the transformation and orchestration of data living in disparate systems. The team at Upsolver is taking aim at this problem with the latest iteration of their platform in the form of SQLake. In this episode Ori Rafael explains how they are automating the creation and scheduling of orchestration flows and their related transforations in a unified SQL interface.
byData Engineering Podcast
0 ratings
0% found this document useful
Listener Questions 3 - How to Get Rid of Your Oracle Addiction: Join Pete and Jesse as they talk about the Herculean effort that is lifting and shifting a system to AWS; why your work is just getting started after you do a lift-and-shift; how technical debt piles up when you don’t modernize applications to take advant
Podcast episode
Listener Questions 3 - How to Get Rid of Your Oracle Addiction: Join Pete and Jesse as they talk about the Herculean effort that is lifting and shifting a system to AWS; why your work is just getting started after you do a lift-and-shift; how technical debt piles up when you don’t modernize applications to take advant
byAWS Morning Brief
0 ratings
0% found this document useful
Bringing DevOps to the Database with Automation
Podcast episode
Bringing DevOps to the Database with Automation
byThe Cloudcast
0 ratings
0% found this document useful
The Value of Analysts and Observability with Nick Heudecker: Nick Heudecker, who leads Market Strategy and Competitive Intelligence at Cirbl, joins Corey who, as it turns out, has some similarities with Corey. Nick also spent some time in Maine, as a cryptologist for the Navy, and also spent the months of deep wint
Podcast episode
The Value of Analysts and Observability with Nick Heudecker: Nick Heudecker, who leads Market Strategy and Competitive Intelligence at Cirbl, joins Corey who, as it turns out, has some similarities with Corey. Nick also spent some time in Maine, as a cryptologist for the Navy, and also spent the months of deep wint
byScreaming in the Cloud
0 ratings
0% found this document useful
Eliminate The Overhead In Your Data Integration With The Open Source dlt Library: Cloud data warehouses and the introduction of the ELT paradigm has led to the creation of multiple options for flexible data integration, with a roughly equal distribution of commercial and open source options. The challenge is that most of those options are complex to operate and exist in their own silo. The dlt project was created to eliminate overhead and bring data integration into your full control as a library component of your overall data system. In this episode Adrian Brudaru explains how it works, the benefits that it provides over other data integration solutions, and how you can start building pipelines today.
Podcast episode
Eliminate The Overhead In Your Data Integration With The Open Source dlt Library: Cloud data warehouses and the introduction of the ELT paradigm has led to the creation of multiple options for flexible data integration, with a roughly equal distribution of commercial and open source options. The challenge is that most of those options are complex to operate and exist in their own silo. The dlt project was created to eliminate overhead and bring data integration into your full control as a library component of your overall data system. In this episode Adrian Brudaru explains how it works, the benefits that it provides over other data integration solutions, and how you can start building pipelines today.
byData Engineering Podcast
0 ratings
0% found this document useful
Whiteboard Confessional: Everything's a Database Except SQLite: Join me as I continue a new series called Whiteboard Confessional with a look at the awesomeness that is SQLite, including how it wasn’t designed to work in a client-server fashion, when you should use it and when you absolutely shouldn’t, how deciding to
Podcast episode
Whiteboard Confessional: Everything's a Database Except SQLite: Join me as I continue a new series called Whiteboard Confessional with a look at the awesomeness that is SQLite, including how it wasn’t designed to work in a client-server fashion, when you should use it and when you absolutely shouldn’t, how deciding to
byAWS Morning Brief
0 ratings
0% found this document useful
Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary: Working with data is a complicated process, with numerous chances for something to go wrong. Identifying and accounting for those errors is a critical piece of building trust in the organization that your data is accurate and up to date. While there are numerous products available to provide that visibility, they all have different technologies and workflows that they focus on. To bring observability to dbt projects the team at Elementary embedded themselves into the workflow. In this episode Maayan Salom explores the approach that she has taken to bring observability, enhanced testing capabilities, and anomaly detection into every step of the dbt developer experience.
Podcast episode
Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary: Working with data is a complicated process, with numerous chances for something to go wrong. Identifying and accounting for those errors is a critical piece of building trust in the organization that your data is accurate and up to date. While there are numerous products available to provide that visibility, they all have different technologies and workflows that they focus on. To bring observability to dbt projects the team at Elementary embedded themselves into the workflow. In this episode Maayan Salom explores the approach that she has taken to bring observability, enhanced testing capabilities, and anomaly detection into every step of the dbt developer experience.
byData Engineering Podcast
0 ratings
0% found this document useful
129: How to Test Anything - David Lord: I asked people on twitter to fill in "How do I test _____?" to find out what people want to know how to test. Lots of responses. David Lord agreed to answer them with me. In the process, we come up with lots of great general advice on how to test just about anything.
Podcast episode
129: How to Test Anything - David Lord: I asked people on twitter to fill in "How do I test _____?" to find out what people want to know how to test. Lots of responses. David Lord agreed to answer them with me. In the process, we come up with lots of great general advice on how to test just about anything.
byTest and Code
0 ratings
0% found this document useful
Harnessing Generative AI For Creating Educational Content With Illumidesk: Generative AI has unlocked a massive opportunity for content creation. There is also an unfulfilled need for experts to be able to share their knowledge and build communities. Illumidesk was built to take advantage of this intersection. In this episode Greg Werner explains how they are using generative AI as an assistive tool for creating educational material, as well as building a data driven experience for learners.
Podcast episode
Harnessing Generative AI For Creating Educational Content With Illumidesk: Generative AI has unlocked a massive opportunity for content creation. There is also an unfulfilled need for experts to be able to share their knowledge and build communities. Illumidesk was built to take advantage of this intersection. In this episode Greg Werner explains how they are using generative AI as an assistive tool for creating educational material, as well as building a data driven experience for learners.
byData Engineering Podcast
0 ratings
0% found this document useful
From MLOps to DataOps - Santona Tuli
Podcast episode
From MLOps to DataOps - Santona Tuli
byDataTalks.Club
0 ratings
0% found this document useful
Understanding Machine Learning Features and Platforms
Podcast episode
Understanding Machine Learning Features and Platforms
byThe Cloudcast
0 ratings
0% found this document useful
Bringing Feature Stores and MLOps to the Enterprise at Tecton: An interview with Kevin Stumpf, CTO of Tecton, about his work building an enterprise grade feature store and how it functions as the core element of an MLOps strategy.
Podcast episode
Bringing Feature Stores and MLOps to the Enterprise at Tecton: An interview with Kevin Stumpf, CTO of Tecton, about his work building an enterprise grade feature store and how it functions as the core element of an MLOps strategy.
byData Engineering Podcast
0 ratings
0% found this document useful
416: Multi-Dimensional Numbers: Joël discusses the challenges he encountered while optimizing slow SQL queries in a non-Rails application. Stephanie shares her experience with canary deploys in a Rails upgrade. Together, Stephanie and Joël address a listener's question about replacing the wkhtml2pdf tool, which is no longer maintained. The episode's main topic revolves around the concept of multidimensional numbers and their applications in software development. Joël introduces the idea of treating objects containing multiple numbers as single entities, using the example of 2D points in space to illustrate how custom classes can define mathematical operations like addition and subtraction for complex data types. They explore how this approach can simplify operations on data structures, such as inventories of T-shirt sizes, by treating them as mathematical objects.
Podcast episode
416: Multi-Dimensional Numbers: Joël discusses the challenges he encountered while optimizing slow SQL queries in a non-Rails application. Stephanie shares her experience with canary deploys in a Rails upgrade. Together, Stephanie and Joël address a listener's question about replacing the wkhtml2pdf tool, which is no longer maintained. The episode's main topic revolves around the concept of multidimensional numbers and their applications in software development. Joël introduces the idea of treating objects containing multiple numbers as single entities, using the example of 2D points in space to illustrate how custom classes can define mathematical operations like addition and subtraction for complex data types. They explore how this approach can simplify operations on data structures, such as inventories of T-shirt sizes, by treating them as mathematical objects.
byThe Bike Shed
0 ratings
0% found this document useful
Designing A Non-Relational Database Engine: Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. In this episode Oren Eini, CEO and creator of RavenDB, explores the nuances of relational vs. non-relational engines, and the strategies for designing a non-relational database.
Podcast episode
Designing A Non-Relational Database Engine: Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. In this episode Oren Eini, CEO and creator of RavenDB, explores the nuances of relational vs. non-relational engines, and the strategies for designing a non-relational database.
byData Engineering Podcast
0 ratings
0% found this document useful
Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer: Maintaining a single source of truth for your data is the biggest challenge in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization. In order to enable this use case, while maintaining a single point of access, the semantic layer has evolved as a technological solution to the problem. In this episode Artyom Keydunov, creator of Cube, discusses the evolution and applications of the semantic layer as a component of your data platform, and how Cube provides speed and cost optimization for your data consumers.
Podcast episode
Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer: Maintaining a single source of truth for your data is the biggest challenge in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization. In order to enable this use case, while maintaining a single point of access, the semantic layer has evolved as a technological solution to the problem. In this episode Artyom Keydunov, creator of Cube, discusses the evolution and applications of the semantic layer as a component of your data platform, and how Cube provides speed and cost optimization for your data consumers.
byData Engineering Podcast
0 ratings
0% found this document useful
Information Extraction from Natural Document Formats with David Rosenberg - TWiML Talk #126: In this episode, I’m joined by David Rosenberg, d…
Podcast episode
Information Extraction from Natural Document Formats with David Rosenberg - TWiML Talk #126: In this episode, I’m joined by David Rosenberg, d…
byThe TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
0 ratings
0% found this document useful
An Exploration Of Tobias' Experience In Building A Data Lakehouse From Scratch: Five years of hosting the Data Engineering Podcast has provided Tobias Macey with a wealth of insight into the work of building and operating data systems at a variety of scales and for myriad purposes. In order to condense that acquired knowledge into a format that is useful to everyone Scott Hirleman turns the tables in this episode and asks Tobias about the tactical and strategic aspects of his experiences applying those lessons to the work of building a data platform from scratch.
Podcast episode
An Exploration Of Tobias' Experience In Building A Data Lakehouse From Scratch: Five years of hosting the Data Engineering Podcast has provided Tobias Macey with a wealth of insight into the work of building and operating data systems at a variety of scales and for myriad purposes. In order to condense that acquired knowledge into a format that is useful to everyone Scott Hirleman turns the tables in this episode and asks Tobias about the tactical and strategic aspects of his experiences applying those lessons to the work of building a data platform from scratch.
byData Engineering Podcast
0 ratings
0% found this document useful
Becoming a Data-led Professional - Arpit Choudhury
Podcast episode
Becoming a Data-led Professional - Arpit Choudhury
byDataTalks.Club
0 ratings
0% found this document useful
Seamless SQL And Python Transformations For Data Engineers And Analysts With SQLMesh: Data transformation is a key activity for all of the organizational roles that interact with data. Because of its importance and outsized impact on what is possible for downstream data consumers it is critical that everyone is able to collaborate seamlessly. SQLMesh was designed as a unifying tool that is simple to work with but powerful enough for large-scale transformations and complex projects. In this episode Toby Mao explains how it works, the importance of automatic column-level lineage tracking, and how you can start using it today.
Podcast episode
Seamless SQL And Python Transformations For Data Engineers And Analysts With SQLMesh: Data transformation is a key activity for all of the organizational roles that interact with data. Because of its importance and outsized impact on what is possible for downstream data consumers it is critical that everyone is able to collaborate seamlessly. SQLMesh was designed as a unifying tool that is simple to work with but powerful enough for large-scale transformations and complex projects. In this episode Toby Mao explains how it works, the importance of automatic column-level lineage tracking, and how you can start using it today.
byData Engineering Podcast
0 ratings
0% found this document useful

Skip carousel

Elasticsearch And Kibana Basics
Linux Format
Article
Elasticsearch And Kibana Basics
Dec 15, 2020
1 min read
Grafana, Telegraf And Influxdb
Linux Format
Article
Grafana, Telegraf And Influxdb
Jun 30, 2020
If you don’t like Netdata or if you want to try something else, you can give Grafana (https://grafana.com), Telegraf (www.influxdata.com/time-series-platform/telegraf) and InfluxDB (www.influxdata.com/products/influxdb-overview) a try. Grafana can’t
1 min read
MARIADB Optimise And Control Your Databases
Linux Format
Article
MARIADB Optimise And Control Your Databases
Jul 30, 2019
9 min read
Understanding ELT & ETL
Techfastly
Article
Understanding ELT & ETL
Apr 1, 2021
8 min read
Scan And Scrape Websites Using Python
Linux Format
Article
Scan And Scrape Websites Using Python
Nov 14, 2023
David Bolton once accidentally boosted the traffic for his firm’s website by 25% in one day by running a web scraper on it. Luckily, they never found out! Ever since the web made an appearance back in the mid-’90s, programmers have been writing softw
6 min read
Google Answer Box Strategy
Techfastly
Article
Google Answer Box Strategy
Sep 21, 2020
Leveraging the Google PAA (People Also Ask) element on a Search Results Page for Targeted Content Creation with a Python Scraper All businesses that are online today are creating content at a furious pace. According to Technavio, a research firm, con
7 min read
What is ELT?
Techfastly
Article
What is ELT?
Apr 1, 2021
It stands for extract, load, and transform- the processes a data pipeline uses for replicating the data from a source system into a target system such as a cloud data warehouse. 1. Extraction is the first step in which data is copied from the source
6 min read
“When Something Goes Wrong, You Realise You’re Like That Cartoon Character That Has Run Off The Edge Of The Cliff”
PC Pro Magazine
Article
“When Something Goes Wrong, You Realise You’re Like That Cartoon Character That Has Run Off The Edge Of The Cliff”
Feb 9, 2023
We need to talk about data. Specifically, your data and my data. The stuff we use on a day-to-day basis, from where we store it to what our expectations are for its safe handling. Now let me get one thing clear from the beginning: I am going to sugge
9 min read
Mac 911
MacWorld
Article
Mac 911
Sep 18, 2018
5 min read
Enterprise-grade Monitoring Made Easy
Linux Format
Article
Enterprise-grade Monitoring Made Easy
Mar 10, 2020
9 min read
Finish Your Cataloguing App
Linux Format
Article
Finish Your Cataloguing App
Jan 10, 2023
Matt Holder has been a fan of the open source methodology for over two decades and uses Linux and other tools where possible. In his spare time, Matt enjoys listening to music and reading. More featurepacked source code for this project can be downlo
7 min read
ETL OR ELT: Build vs Buy
Techfastly
Article
ETL OR ELT: Build vs Buy
Apr 1, 2021
2 min read
Embed An Excel File On Your Site
Computeractive
Article
Embed An Excel File On Your Site
Jul 20, 2022
When you need to share data in an Excel spreadsheet, you could choose to extract it and then send it to members. An easier way is to embed it on your site for everyone to see. In our example, our local history club wants to share details of its yearl
2 min read
Why Is ELT Better For Cloud Data Warehousing?
Techfastly
Article
Why Is ELT Better For Cloud Data Warehousing?
Apr 1, 2021
2 min read
Observability Of The Kernel And Containers
Linux Format
Article
Observability Of The Kernel And Containers
Apr 4, 2023
Mihalis Tsoukalos is currently working on Time Series. You can reach him at: @mactsouk. For our final delve into eBPF, we’re tackling applications, the kernel and Docker containers. At the end of the day, all Linux machines execute code for applicat
10 min read
Code An Admin Back-end In Django
Linux Format
Article
Code An Admin Back-end In Django
Dec 13, 2022
Credit: www.djangoproject.com OUR EXPERT Matt Holder has been a fan of the open source methodology for over two decades and uses Linux and other tools where possible. More featurepacked source code for this project can be downloaded from https://
6 min read
“There’s No Single ‘Best’ Language To Learn. I Think The Real Key Is To Learn How To Write Code”
PC Pro Magazine
Article
“There’s No Single ‘Best’ Language To Learn. I Think The Real Key Is To Learn How To Write Code”
Oct 8, 2022
9 min read
Create Visualisations And Cool Dashboards
Linux Format
Article
Create Visualisations And Cool Dashboards
Jan 14, 2020
8 min read
Thanks Admins!
Linux Format
Article
Thanks Admins!
Jul 26, 2022
Every year, on the first Friday in July, Database Administrator Appreciation Day takes place, which is about saying thank you to the people who are responsible for organising data. They are the ones who set up and run databases so they can manage and
1 min read
Manipulate Data Like A Pro With Pandas
Linux Format
Article
Manipulate Data Like A Pro With Pandas
Jul 27, 2021
7 min read
Disrupting Databases
APC
Article
Disrupting Databases
Apr 19, 2021
3 min read
Accurate, Open Source IP-based Localisation
Linux Format
Article
Accurate, Open Source IP-based Localisation
Dec 14, 2021
8 min read
Mining Actionable Information with Smart Capture
The European Business Review
Article
Mining Actionable Information with Smart Capture
May 22, 2018
4 min read
Scikit-Learn: The Ultimate Python Library
APC
Article
Scikit-Learn: The Ultimate Python Library
Jul 15, 2019
4 min read
Manage And Read Your System Logs
Linux Format
Article
Manage And Read Your System Logs
Mar 10, 2020
10 min read
Real World Computing
PC Pro Magazine
Article
Real World Computing
May 11, 2023
Migrating to Azure isn’t necessarily the toughest part of a successful cloud migration, explains our guest columnist Many organisations succeed at deploying resources in or migrating to Microsoft Azure. But many of those same organisations fail to en
6 min read
“We’re Learning As We Go And Accepting Any False Starts As Being A Part Of The Process”
PC Pro Magazine
Article
“We’re Learning As We Go And Accepting Any False Starts As Being A Part Of The Process”
Jul 8, 2021
6 min read
Top 10 Excel Functions That Everyone Should Know
Techfastly
Article
Top 10 Excel Functions That Everyone Should Know
Feb 4, 2021
5 min read
Buying The Tool
Techfastly
Article
Buying The Tool
Apr 1, 2021
3 min read
Machine Learning – With Zero Programming
APC
Article
Machine Learning – With Zero Programming
Aug 12, 2019
6 min read

Related categories

Skip carousel

Reviews for SAP Data Services 4.x Cookbook

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

SAP Data Services 4.x Cookbook - Shomnikov Ivan

SAP Data Services 4.x Cookbook

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Instant updates on new Packt books

Preface

What this book covers

What you need for this book

Who this book is for

Sections

Getting ready

How to do it…

How it works…

There's more…

See also

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Introduction to ETL Development

Introduction

Preparing a database environment

Getting ready

How to do it…

How it works…

Creating a source system database

How to do it…

How it works…

There's more…

Defining and creating staging area structures

How to do it…

Flat files

RDBMS tables

How it works…

Creating a target data warehouse

Getting ready

How to do it…

How it works…

There's more…

2. Configuring the Data Services Environment

Introduction

Creating IPS and Data Services repositories

Getting ready…

How to do it…

How it works…

See also

Installing and configuring Information Platform Services

Getting ready…

How to do it…

How it works…

Installing and configuring Data Services

Getting ready…

How to do it…

How it works…

Configuring user access

Getting ready…

How to do it…

How it works…

Starting and stopping services

How to do it…

How it works…

See also

Administering tasks

How to do it…

How it works…

See also

Understanding the Designer tool

Getting ready…

How to do it…

How it works…

Executing ETL code in Data Services

Validating ETL code

Template tables

Query transform basics

The HelloWorld example

3. Data Services Basics – Data Types, Scripting Language, and Functions

Introduction

Creating variables and parameters

Getting ready

How to do it…

How it works…

There's more…

Creating a script

How to do it…

How it works…

Using string functions

How to do it…

Using string functions in the script

How it works…

There's more…

Using date functions

How to do it…

Generating current date and time

Extracting parts from dates

How it works…

There's more…

Using conversion functions

How to do it…

How it works…

There's more…

Using database functions

How to do it…

key_generation()

total_rows()

sql()

How it works…

Using aggregate functions

How to do it…

How it works…

Using math functions

How to do it…

How it works…

There's more…

Using miscellaneous functions

How to do it…

How it works…

Creating custom functions

How to do it…

How it works…

There's more…

4. Dataflow – Extract, Transform, and Load

Introduction

Creating a source data object

How to do it…

How it works…

There's more…

Creating a target data object

Getting ready

How to do it…

How it works…

There's more…

Loading data into a flat file

How to do it…

How it works…

There's more…

Loading data from a flat file

How to do it…

How it works…

There's more…

Loading data from table to table – lookups and joins

How to do it…

How it works…

Using the Map_Operation transform

How to do it…

How it works…

Using the Table_Comparison transform

Getting ready

How to do it…

How it works…

Exploring the Auto correct load option

Getting ready

How to do it…

How it works…

Splitting the flow of data with the Case transform

Getting ready

How to do it…

How it works…

Monitoring and analyzing dataflow execution

Getting ready

How to do it…

How it works…

There's more…

5. Workflow – Controlling Execution Order

Introduction

Creating a workflow object

How to do it…

How it works…

Nesting workflows to control the execution order

Getting ready

How to do it

How it works…

Using conditional and while loop objects to control the execution order

Getting ready

How to do it…

How it works…

There is more…

Using the bypassing feature

Getting ready…

How to do it…

How it works…

There is more…

Controlling failures – try-catch objects

How to do it…

How it works…

Use case example – populating dimension tables

Getting ready

How to do it…

How it works…

Mapping

Dependencies

Development

Execution order

Testing ETL

Preparing test data to populate DimSalesTerritory

Preparing test data to populate DimGeography

Using a continuous workflow

How to do it…

How it works…

There is more…

Peeking inside the repository – parent-child relationships between Data Services objects

Getting ready

How to do it…

How it works…

Get a list of object types and their codes in the Data Services repository

Display information about the DF_Transform_DimGeography dataflow

Display information about the SalesTerritory table object

See the contents of the script object

6. Job – Building the ETL Architecture

Introduction

Projects and jobs – organizing ETL

Getting ready

How to do it…

How it works…

Hierarchical object view

History execution log files

Executing/scheduling jobs from the Management Console

Using object replication

How to do it…

How it works…

Migrating ETL code through the central repository

Getting ready

How to do it…

How it works…

Adding objects to and from the Central Object Library

Comparing objects between the Local and Central repositories

There is more…

Migrating ETL code with export/import

Getting ready

How to do it…

Import/Export using ATL files

Direct export to another local repository

How it works…

Debugging job execution

Getting ready…

How to do it…

How it works…

Monitoring job execution

Getting ready

How to do it…

How it works…

Building an external ETL audit and audit reporting

Getting ready…

How to do it…

How it works…

Using built-in Data Services ETL audit and reporting functionality

Getting ready

How to do it…

How it works…

Auto Documentation in Data Services

How to do it…

How it works…

7. Validating and Cleansing Data

Introduction

Creating validation functions

Getting ready

How to do it…

How it works…

Using validation functions with the Validation transform

Getting ready

How to do it…

How it works…

Reporting data validation results

Getting ready

How to do it…

How it works…

Using regular expression support to validate data

Getting ready

How to do it…

How it works…

Enabling dataflow audit

Getting ready

How to do it…

How it works…

There's more…

Data Quality transforms – cleansing your data

Getting ready

How to do it…

How it works…

There's more…

8. Optimizing ETL Performance

Introduction

Optimizing dataflow execution – push-down techniques

Getting ready

How to do it…

How it works…

Optimizing dataflow execution – the SQL transform

How to do it…

How it works…

Optimizing dataflow execution – the Data_Transfer transform

Getting ready

How to do it…

How it works…

Why we used a second Data_Transfer transform object

When to use Data_Transfer transform

There's more…

Optimizing dataflow readers – lookup methods

Getting ready

How to do it…

Lookup with the Query transform join

Lookup with the lookup_ext() function

Lookup with the sql() function

How it works…

Query transform joins

lookup_ext()

sql()

Performance review

Optimizing dataflow loaders – bulk-loading methods

How to do it…

How it works…

When to enable bulk loading?

Optimizing dataflow execution – performance options

Getting ready

How to do it…

Dataflow performance options

Source table performance options

Query transform performance options

lookup_ext() performance options

Target table performance options

9. Advanced Design Techniques

Introduction

Change Data Capture techniques

Getting ready

No history SCD (Type 1)

Limited history SCD (Type 3)

Unlimited history SCD (Type 2)

How to do it…

How it works…

Source-based ETL CDC

Target-based ETL CDC

Native CDC

Automatic job recovery in Data Services

Getting ready

How to do it…

How it works…

There's more…

Simplifying ETL execution with system configurations

Getting ready

How to do it…

How it works…

Transforming data with the Pivot transform

Getting ready

How to do it…

How it works…

10. Developing Real-time Jobs

Introduction

Working with nested structures

Getting ready

How to do it…

How it works…

There is more…

The XML_Map transform

Getting ready

How to do it…

How it works…

The Hierarchy_Flattening transform

Getting ready

How to do it…

Horizontal hierarchy flattening

Vertical hierarchy flattening

How it works…

Querying result tables

Configuring Access Server

Getting ready

How to do it…

How it works…

Creating real-time jobs

Getting ready

Installing SoapUI

How to do it…

How it works…

11. Working with SAP Applications

Introduction

Loading data into SAP ERP

Getting ready

How to do it…

How it works…

IDoc

Monitoring IDoc load on the SAP side

Post-load validation of loaded data

There is more…

12. Introduction to Information Steward

Introduction

Exploring Data Insight capabilities

Getting ready

How to do it…

Creating a connection object

Profiling the data

Viewing profiling results

Creating a validation rule

Creating a scorecard

How it works…

Profiling

Rules

Scorecards

There is more…

Performing Metadata Management tasks

Getting ready

How to do it…

How it works…

Working with the Metapedia functionality

How to do it…

How it works…

Creating a custom cleansing package with Cleansing Package Builder

Getting ready

How to do it…

How it works…

There is more…

Index

SAP Data Services 4.x Cookbook

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: November 2015

Production reference: 1261115

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78217-656-5

www.packtpub.com

Credits

Author

Ivan Shomnikov

Reviewers

Andrés Aguado Aranda

Dick Groenhof

Bernard Timbal Duclaux de Martin

Sridhar Sunkaraneni

Meenakshi Verma

Commissioning Editor

Vinay Argekar

Acquisition Editors

Shaon Basu

Kevin Colaco

Content Development Editor

Merint Mathew

Technical Editor

Humera Shaikh

Copy Editors

Brandt D'mello

Shruti Iyer

Karuna Narayanan

Sameen Siddiqui

Project Coordinator

Francina Pinto

Proofreader

Safis Editing

Indexer

Monica Ajmera Mehta

Production Coordinator

Nilesh Mohite

Cover Work

Nilesh Mohite

About the Author

Ivan Shomnikov is an SAP analytics consultant specializing in the area of Extract, Transform, and Load (ETL). He has in-depth knowledge of the data warehouse life cycle processes (DWH design and ETL development) and extensive hands-on experience with both the SAP Enterprise Information Management (Data Services) technology stack and the SAP BusinessObjects reporting products stack (Web Intelligence, Designer, Dashboards).

Ivan has been involved in the implementation of complex BI solutions on the SAP BusinessObjects Enterprise platform in major New Zealand companies across different industries. He also has a strong background as an Oracle database administrator and developer.

This is my first experience of writing a book, and I would like to thank my partner and my son for their patience and support.

About the Reviewers

Andrés Aguado Aranda is a 26-year-old computer engineer from Spain. His experience has given him a really technical background in databases, data warehouse, and business intelligence.

Andrés has worked in different business sectors, such as banking, public administrations, and energy, since 2012 in data-related positions.

This book is my first stint as a reviewer, and it has been really interesting and valuable to me, both personally and professionally.

I would like to thank my family and friends for always being willing to help me when I needed. Also, I would like to thank to my former coworker and currently friend, Antonio Martín-Cobos, a BI reporting analyst who really helped me get this opportunity.

Dick Groenhof started his professional career in 1990 after finishing his studies in business information science at Vrije Universiteit Amsterdam. Having worked as a software developer and service management consultant for the first part of his career, he became active as a consultant in the business intelligence arena since 2005.

Dick has been a lead consultant on numerous SAP BI projects, designing and implementing successful solutions for his customers, who regard him as a trusted advisor. His core competences include both frontend (such as Web Intelligence, Crystal Reports, and SAP Design Studio) and backend tools (such as SAP Data Services and Information Steward). Dick is an early adopter of the SAP HANA platform, creating innovative solutions using HANA Information Views, Predictive Analysis Library, and SQLScript.

He is a Certified Application Associate in SAP HANA and SAP BusinessObjects Web Intelligence 4.1. Currently, Dick works as senior HANA and big data consultant for a highly respected and innovative SAP partner in the Netherlands.

He is a strong believer in sharing his knowledge with regard to SAP HANA and SAP Data Services by writing blogs (at http://www.dickgroenhof.com and http://www.thenextview.nl/blog) and speaking at seminars.

Dick is happily married to Emma and is a very proud father of his son, Christiaan, and daughter, Myrthe.

Bernard Timbal Duclaux de Martin is a business intelligence architect and technical expert with more than 15 years of experience. He has been involved in several large business intelligence system deployments and administration in banking and insurance companies. In addition, Bernard has skills in modeling, data extraction, transformation, loading, and reporting design. He has authored four books, including two regarding SAP BusinessObjects Enterprise administration.

Meenakshi Verma has been a part of the IT industry since 1998. She is an experienced business systems specialist having the CBAP and TOGAF certifications. Meenakshi is well-versed with a variety of tools and techniques used for business analysis, such as SAP BI, SAP BusinessObjects, Java/J2EE technologies, and others. She is currently based in Toronto, Canada, and works with a leading utility company.

Meenakshi has helped technically review many books published by Packt Publishing across various enterprise solutions. Her earlier works include JasperReports for Java Developers, Java EE 5 Development using GlassFish Application Server, Practical Data Analysis and Reporting with BIRT, EJB 3 Developer Guide, Learning Dojo, and IBM WebSphere Application Server 8.0 Administration Guide.

I'd like to thank my father, Mr. Bhopal Singh, and mother, Mrs. Raj Bala, for laying a strong foundation in me and giving me their unconditional love and support. I also owe thanks and gratitude to my husband, Atul Verma, for his encouragement and support throughout the reviewing of this book and many others; my ten-year-old son, Prieyaansh Verma, for giving me the warmth of his love despite my hectic schedules; and my brother, Sachin Singh, for always being there for me.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Instant updates on new Packt books

Get notified! Find out when new books are published by following @PacktEnterprise on Twitter or the Packt Enterprise Facebook page.

Preface

SAP Data Services delivers an enterprise-class solution to build data integration processes as well as perform data quality and data profiling tasks, allowing you to govern your data in a highly-efficient way.

Some of the tasks that Data Services helps accomplish include: migration of the data between databases or applications, extracting data from various source systems into flat files, data cleansing, data transformation using either common database-like functions or complex custom-built functions that are created using an internal scripting language, and of course, loading data into your data warehouse or external systems. SAP Data Services has an intuitive user-friendly graphical interface, allowing you to access all its powerful Extract, Transform, and Load (ETL) capabilities from the single Designer tool. However, getting started with SAP Data Services can be difficult, especially for people who have little or no experience in ETL development. The goal of this book is to guide you through easy-to-understand examples of building your own ETL architecture. The book can also be used as a reference to perform specific tasks as it provides real-world examples of using the tool to solve data integration problems.

What this book covers

Chapter 1, Introduction to ETL Development, explains what Extract, Transform, and Load (ETL) processes are, and what role Data Services plays in ETL development. It includes the steps to configure the database environment used in recipes of the book.

Chapter 2, Configuring the Data Services Environment, explains how to install and configure all Data Services components and applications. It introduces the Data Services development GUI—the Designer tool—with the simple example of Hello World ETL code.

Chapter 3, Data Services Basics – Data Types, Scripting Language, and Functions, introduces the reader to Data Services internal scripting language. It explains various categories of functions that are available in Data Services, and gives the reader an example of how scripting language can be used to create custom functions.

Chapter 4, Dataflow – Extract, Transform, and Load, introduces the most important processing unit in Data Service, dataflow object, and the most useful types of transformations that can be performed inside a dataflow. It gives the reader examples of extracting data from source systems and loading data into target data structures.

Chapter 5, Workflow – Controlling Execution Order, introduces another Data Services object, workflow, which is used to group other workflows, dataflows, and script objects into execution units. It explains the conditional and loop structures available in Data Services.

Chapter 6, Job – Building the ETL Architecture, brings the reader to the job object level and reviews the steps used in the development process to make a successful and robust ETL solution. It covers the monitoring and debugging functionality available in Data Services and embedded audit features.

Chapter 7, Validating and Cleansing Data, introduces the concepts of validating methods, which can be applied to the data passing through the ETL processes in order to cleanse and conform it according to the defined Data Quality standards.

Chapter 8, Optimizing ETL Performance, is one of the first advanced chapters, which starts explaining complex ETL development techniques. This particular chapter helps the user understand how the existing processes can be optimized further in Data Services in order to make sure that they run quickly and efficiently, consuming as less computer resources as possible with the least amount of execution time.

Chapter 9, Advanced Design Techniques, guides the reader through advanced data transformation techniques. It introduces concepts of Change Data Capture methods that are available in Data Services, pivoting transformations, and automatic recovery concepts.

Chapter 10, Developing Real-time Jobs, introduces the concept of nested structures and the transforms that work with nested structures. It covers the mains aspects of how they can be created and used in Data Services real-time jobs. It also introduces new a Data Services component—Access Server.

Chapter 11, Working with SAP Applications, is dedicated to the topic of reading and loading data from SAP systems with the example of the SAP ERP system. It presents the real-life use case of loading data into the SAP ERP system module.

Chapter 12, Introduction to Information Steward, covers another SAP product, Information Steward, which accompanies Data Services and provides a comprehensive view of the organization's data, and helps validate and cleanse it by applying Data Quality methods.

What you need for this book

To use the examples given in this book, you will need to download and make sure that you are licensed to use the following software products:

SQL Server Express 2012

SAP Data Services 4.2 SP4 or higher

SAP Information Steward 4.2 SP4 or higher

SAP ERP (ECC)

SoapUI—5.2.0

Who this book is for

The book will be useful to application developers and database administrators who want to get familiar with ETL development using SAP Data Services. It can also be useful to ETL developers or consultants who want to improve and extend their knowledge of this tool. The book can also be useful to data and business analysts who want to take a peek at the backend of BI development. The only requirement of this book is that you are familiar with the SQL language and general database concepts. Knowledge of any kind of programming language will be a benefit as well.

Sections

In this book, you will find several headings that appear frequently (Getting ready, How to do it, How it works, There's more, and See also).

To give clear instructions on how to complete a recipe, we use these sections as follows:

Getting ready

This section tells you what to expect in the recipe, and describes how to set up any software or any preliminary settings required for the recipe.

How to do it…

This section contains the steps required to follow the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

There's more…

This section consists of additional information about the recipe in order to make the reader more knowledgeable about the recipe.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: We can include other contexts through the use of the include directive.

A block of code is set as follows:

select *

from dbo.al_langtext txt

JOIN dbo.al_parent_child pc

on txt.parent_objid = pc.descen_obj_key

where

pc.descen_obj = 'WF_continuous';

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

AlGUIComment (ActaName_1 = 'RSavedAfterCheckOut', ActaName_2 = 'RDate_created', ActaName_3 = 'RDate_modified', ActaValue_1 = 'YES', ActaValue_2 = 'Sat Jul 04 16:52:33 2015', ActaValue_3 = 'Sun Jul 05 11:18:02 2015', x = '-1', y = '-1')

CREATE PLAN WF_continuous::'7bb26cd4-3e0c-412a-81f3-b5fdd687f507'( )

DECLARE

$l_Directory VARCHAR(255) ; $l_File VARCHAR(255) ;

BEGIN

AlGUIComment (UI_DATA_XML = '00216-1790-1902002000

IDATA>',

ui_display_name = 'script', ui_script_text = '$l_Directory = \'C:\\\\AW\\\\Files\\\\\'; $l_File = \'flag.txt\';

$g_count = $g_count + 1;

print(\'Execution #\'||$g_count); print(\'Starting \'||workflow_name()||\' ...\'); sleep(10000); print(\'Finishing \'||workflow_name()||\' ...\');'

, x = '116', y = '-175')

BEGIN_SCRIPT

$l_Directory = 'C:\\AW\\Files\\';$l_File = 'flag.txt';$g_count = ($g_count + 1);print(('Execution #' || $g_count));print((('Starting ' || workflow_name()) || ' ...'));sleep(10000);print((('Finishing ' || workflow_name()) || ' ...'));END

END

SET

(loop_exit = 'fn_check_flag($l_Directory, $l_File)', loop_exit _option = 'yes', restart_condition = 'no', restart_count = '10', restart_count_option = 'yes', workflow_type = 'Continuous')

Any command-line input or output is written as follows:

setup.exe SERVERINSTALL=Yes

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: Open the workflow properties again to edit the continuous options using the Continuous Options tab.

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <feedback@packtpub.com>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from: https://www.packtpub.com/sites/default/files/downloads/6565EN_Graphics.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <questions@packtpub.com>, and we will do our best to address the problem.

Chapter 1. Introduction to ETL Development

In this chapter, we will cover:

Preparing a database environment

Creating a source system database

Defining and creating staging area structures

Creating a target data warehouse

Introduction

Simply put, Extract-Transform-Load (ETL) is an engine of any data warehouse. The nature of the ETL system is straightforward:

Extract data from operational databases/systems

Transform data according to the requirements of your data warehouse so that the different pieces of data can be used together

Apply data quality transformation methods in order to cleanse data and ensure that it is reliable before it gets loaded into a data warehouse

Load conformed data into a data warehouse so that end users can access it via reporting tools, using client applications directly, or with the help of SQL-based query tools

While

Enjoying the preview?

Page 1 of 1

SAP Data Services 4.x Cookbook

About this ebook

Shomnikov Ivan

Related authors

Related to SAP Data Services 4.x Cookbook

Related ebooks

Data Modeling & Design For You

Related podcast episodes

Related articles

Related categories

Reviews for SAP Data Services 4.x Cookbook

What did you think?

Book preview

SAP Data Services 4.x Cookbook - Shomnikov Ivan

Table of Contents

SAP Data Services 4.x Cookbook

SAP Data Services 4.x Cookbook

Credits

About the Author

About the Reviewers

Support files, eBooks, discount offers, and more

Why subscribe?

Preface

What this book covers

What you need for this book

Who this book is for

Sections

Getting ready

There's more…

See also

Conventions

Note

Tip

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

Chapter 1. Introduction to ETL Development

Introduction