A Guide to Db2 Performance for Application Developers: Code for Performance from the Beginning
Ebook, 339 pages, 5 hours

About this ebook

A Guide to Db2 Performance for Application Developers will make you a better programmer by teaching you how to write more efficient application code to access Db2 databases. Whether you write applications on the mainframe or distributed systems, this book will teach you practices, methods, and techniques for optimizing your SQL and applications as you build them.

Write efficient applications and become your DBA's favorite developer by learning the techniques outlined in this book!
Language: English
Publisher: BookBaby
Release date: Oct 1, 2018
ISBN: 9781543949162

    A Guide to Db2 Performance for Application Developers - Craig S. Mullins

    Chapter 1

    Db2 Performance Essentials

    We must first understand what performance means before discussing optimal development techniques

    If you work with Db2 (or indeed, any DBMS), you most likely deal with performance-related issues. But have you ever stopped for a moment and tried to define what database performance means? Doing so can be a worthwhile exercise to help organize your thinking and behavior.

    Defining Database Performance

    To better understand the concept of database performance, consider the familiar concepts of supply and demand. Users demand information from Db2 and Db2 supplies information to those requesting it. The rate at which Db2 supplies the demand for information can be loosely thought of as database performance.

    But let’s investigate at a deeper level. There are five factors that influence database performance: workload, throughput, resources, optimization, and contention.

    The workload requested of the DBMS defines the demand. It is a combination of transactions, web requests, batch jobs, ad hoc queries, business intelligence queries, analytics requests, utilities, and system commands directed through the DBMS at any given time. Workload can fluctuate drastically from day to day, hour to hour, minute to minute, and even second to second. Sometimes workload can be predicted (such as heavy month-end payroll processing, or very light access after 6 pm, when most users have left for the day). At other times it is unpredictable. The overall workload can have a major impact on database performance.

    Throughput defines the overall capability of the computer to process data. It is a composite of I/O speed, CPU speed, parallel capabilities, and the efficiency of the operating system and system software. Do not just base your throughput assumptions on hardware capacity figures (e.g., MHz for Wintel boxes, MSUs and MIPS for mainframes). Perhaps you have installed hard or soft capping on your box, which can impact throughput. And don't forget about those specialty processors (such as zIIPs) if you program on IBM mainframes.

    The hardware and software tools at the system’s disposal are called the resources of the system. Examples include memory (such as that allocated to buffer pools or address spaces), disk, cache controllers, and microcode.

    The fourth defining element of database performance is optimization. All types of systems can be optimized, but relational database systems are unique in that query optimization is primarily accomplished internal to the DBMS. Ensuring that you have provided up-to-date and accurate database statistics for the query optimizer is of the utmost importance in achieving optimal SQL queries. Keep in mind, though, that there are other factors that need to be optimized (SQL, database parameters, system parameters, etc.) to enable the relational optimizer to create the most efficient access paths. And there are optimization aspects that are outside the scope and control of the relational optimizer, too, such as efficient program and script coding, proper application design, coding efficient utility options, and so on.

    When the demand (workload) for a resource is high, contention can result. Contention is the condition in which two or more components of the workload are attempting to use a single resource simultaneously in a conflicting way (for example, dual updates to the same piece of data). When one program tries to read data that is in the process of being changed by another, the DBMS must prohibit access until the modification is complete to ensure the integrity and accuracy of that data. Db2 uses a locking mechanism to enable multiple, concurrent users to access and modify data in the database. Using locks, Db2 automatically guarantees the integrity of data (at least in terms of what you have requested). DBMS locking strategies permit multiple users from multiple environments to access and modify data in the database at the same time. As contention increases, locks are taken and throughput can decrease.

    So, putting these factors together: the definition of database performance is the optimization of resource usage to maximize throughput, minimize contention, and process the largest possible workload.

    In addition, database applications regularly communicate with other system software, which must also be factored into performance planning. Many factors influence not only the performance of the DBMS and the applications accessing its databases, but also the performance of the other system components (e.g., transaction processors, network software, and application servers).

    The Three Aspects of Database Performance

    Now that we understand the high-level definition of database performance, let’s discuss the three aspects of database systems where performance must be managed:

    •     the application,

    •     the database structures, and

    •     the system.

    The Application

    Application code must be designed and coded appropriately and efficiently. Many performance problems are caused by improperly coded applications. SQL is the primary culprit; coding efficient SQL statements can be complicated. Developers need to be taught how to properly formulate, monitor, and tune SQL statements.

    Not all application problems are due to improperly coded SQL. The host language application code in which the SQL has been embedded (for example, Java, COBOL, C++, Python, or any of the many other languages supported by Db2) may be causing the problem. The host language code may be inefficient, causing database application performance to suffer.

    Techniques for improving your host language and SQL coding are the primary focus of this book.

    Database Structures

    The physical design of your database structures can also have a significant impact on performance. Important factors in this category include normalization, disk storage, number of tables, index design, and use of DDL and its associated parameters.

    Design is not the only component of database performance. The organization of the database will change over time. As data is inserted, updated, and deleted from the database, the efficiency of the database will degrade. Moreover, the files that hold the data may need to expand as more data is added. Perhaps additional files, or file extents, will need to be allocated. Both disorganization and file growth can degrade performance.

    Indexes also need to be monitored, analyzed, and tuned to optimize data access and to ensure that they are not having a negative impact on data modification.

    It is important to understand these issues, but controlling and managing the performance of database structures is a task for DBAs, not application programmers and developers.

    The System

    System tuning occurs at the highest level and has the greatest impact on the overall health of database applications because every application depends on the system. For the purposes of this discussion, we will define the system as comprising the DBMS itself and all the related components on which it relies. No amount of tuning is going to help a database or application when the server it is running on is short on resources or improperly installed.

    The Db2 system and its environment can and must be tuned to assure optimum performance. The way in which the DBMS software is installed, its memory, disk, CPU, other resources, and any configuration options can impact database application performance.

    The other systems software with which the DBMS interacts includes the operating system, networking software, message queueing systems, middleware, and transaction processors. System tuning comprises installation, configuration, and integration issues, as well as ensuring connectivity of the software to the DBMS and database applications.

    Again, though it is important to acknowledge and understand the importance of tuning the DBMS system, this task typically falls to a system administrator or DBA, not the application development staff.

    Three Primary Performance Indicators

    When it comes down to it, though, there are three primary things that impact the performance of Db2 applications: CPU, I/O, and concurrency.

    Figure 1. The Three Primary Performance Indicators

    CPU is the amount of machine processor power that is consumed to perform an activity. Simple activities generally consume less CPU than more complex activities. The fewer CPU cycles required to achieve a task, the more efficient the process will be.

    I/O involves reading data from and writing data to disk. Disk drives are mechanical, and therefore it takes time for the physical parts to move to the proper location to retrieve or write the requested data. This latency (the time between when the data is requested and when it is obtained) contributes to performance degradation. Therefore, the fewer disk reads required to achieve a task, the better performance will be.

    Concurrency is the ability to perform more than one task at the same time. In the context of database processing, concurrency requires a lock manager. The lock manager controls which processes are accessing and updating which pieces of data. This allows multiple tasks to access the same database table at the same time without corrupting the data. The more tasks that can be accomplished in the same time window, the better throughput will be.

    I/O and CPU are computing resources to be optimized. The transactions and SQL of your applications constitute the workload that needs to be processed. Throughput defines the capacity of the system to process workload. When the demand (workload) for a resource is high, contention can result. Contention is the condition in which two or more components of the workload are attempting to use a single resource at the same time in a conflicting way.

    As Figure 1 shows, the goal is for your programs to minimize the amount of CPU and I/O they consume, while also maximizing the concurrency of all programs. These concepts are somewhat inter-related, too. When the same amount of work is performed using less I/O, CPU savings occur, because many system-level processes are required to perform an I/O operation. Furthermore, note that how you code your programs will impact the concurrency of other programs, so, just like in grade school, it is important to write code that works well with others.

    Keep these high-level concepts in mind as you read through the rest of this book. We will regularly reference these three performance indicators as we discuss the various aspects of efficient Db2 application development.

    The Rest of the Book

    As you progress through the rest of this book there will be three broad sections.

    In the first section, general application development guidelines and techniques will be presented. These chapters address very little actual code; instead, the content focuses on approaches, mindsets, and philosophies of development that result in efficient Db2 applications.

    The second section moves into discussing data access methods and techniques and how Db2 accomplishes them. It will give guidance on how many types of application development requirements translate into Db2 and how to tackle them effectively.

    Finally, the third part will give guidance on Db2 SQL development for performance. There will be a lot of SQL code and examples in this section.

    When you understand each of these areas, you will be well on your way to becoming a Db2 application developer who writes efficient programs… and who earns the respect of peers and managers.

    Chapter 2

    Code Relationally

    It is necessary to develop a relational mindset to become a successful Db2 programmer who codes with performance in mind. But what does this mean? First, we must understand what a relational database system is and how that differs from other types of data storage.

    What is a Database?

    Before we talk about relational databases, let’s first answer the question: What is a database? A database is a large structured set of persistent data. So, a phone book is a database. But within the world of IT, a database usually is associated with software. A simple database might be a single file containing many records, each of which contains the same fields having the same data type and length. In short, a database is an organized store of data wherein the data is accessible by named data elements.

    A Database Management System (DBMS) is a software package designed to create, store, and manage databases. The DBMS software enables end users or application programmers to share data. It provides a systematic method of creating, updating, retrieving and storing information in a database. DBMSs are generally responsible for data integrity, data access control, and automated rollback, restart and recovery.

    In layman’s terms, you can think of a database as a file of information. You can think of the filing cabinet itself along with the file folders and labels as the DBMS. A DBMS manages databases. You implement and access database instances using the capabilities of the DBMS.

    Db2 is a database management system. Your payroll application uses the payroll database, which may be implemented using Db2 (or some other DBMS). It is important to understand this distinction to avoid confusion as we move forward.

    Relational Database Systems

    Relational database systems became the norm in IT in the 1980s as low-cost servers became powerful enough to make them widely practical and relatively affordable. There are other types of database systems available (such as NoSQL, hierarchical, and network) but the RDBMS, of which Db2 is one of the leading offerings, continues to be the leader in terms of usage, revenue, and installed base.

    Relational technology is based on the mathematics of set theory. Relational databases provide data storage, access and protection with reasonable performance for most applications, whether operational or analytical in nature. The RDBMS is adaptable to most use cases in a reliable and efficient way.

    The term relational comes from the mathematical term relation. In set theory, a relation is a set of unordered elements — all of the same type. A relational DBMS is based on relations. This overview of relational theory offers a quick introduction.

    It is important to note that today’s database systems that are referred to as relational do not conform to all the requirements and definition of relational theory. For additional references that can offer more details, consult the Bibliography at the end of this chapter.

    How to Think About Data in a Db2 Database

    Working with a relational mindset is an important requirement for writing efficient Db2 application programs. Doing so requires an understanding of how data is stored and referenced by a relational DBMS like Db2.

    A database is a complex set of inter-related data designed for a specific intent. Do not think of the database as a set of files. Files have no relationships set within and among them, whereas your database does.

    Furthermore, do not think of tables as files, because tables are based on sets: sets are not ordered, whereas files have a specific order to them. Although there are performance-specific physical storage details that are important to learn (and we will cover them later), your relational mindset should be that tables are unordered sets of data. And members of each set are all of the same type. That means that every row in a table has the same number of columns, and each column has the same data type in every row.

    When you perform an operation on a set, the action happens all at once to all the members of the set. Programmers tend to think in terms of sequential operations such as: Read x, multiply it by 2, save it to a new location, read another x until there are no more. We can accomplish all of this in one SQL statement with something like this:
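
    A minimal sketch of this idea follows. It assumes two hypothetical tables, source_values and doubled_values, and uses Python's built-in sqlite3 module with an in-memory database as a stand-in for Db2, so it illustrates the set-at-a-time approach rather than any Db2-specific syntax.

```python
import sqlite3

# SQLite (in memory) stands in for Db2; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_values (x INTEGER)")
conn.execute("CREATE TABLE doubled_values (x2 INTEGER)")
conn.executemany("INSERT INTO source_values (x) VALUES (?)", [(1,), (2,), (3,)])

# One statement reads every x, multiplies it by 2, and saves the result to the
# new location. The program itself contains no loop over the rows.
conn.execute("INSERT INTO doubled_values (x2) SELECT x * 2 FROM source_values")

doubled = sorted(row[0] for row in conn.execute("SELECT x2 FROM doubled_values"))
print(doubled)
```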

    There is no looping and all the actions are contained in the single SQL statement. It is imperative to be able to think this way to write Db2 application programs that perform well.

    Additionally, rows are not records. Records contained in files or data sets are sequential, stored in the order they were written. Db2 rows have no specific physical order and can be accessed by coding the appropriate SQL WHERE clauses.

    Finally, columns are not fields. Columns are typed and can be null. This is not so for fields. Without a program, a field has no meaning.

    How This Should Impact Your Coding

    Application developers accustomed to processing data a record-at-a-time will make very poor Db2 programmers without some training in the set-at-a-time nature of accessing data in a relational database.

    If you have experience programming with flat files you must unlearn the flat file mentality. Forget sequentially accessing data record by record. Access what you need using the features of SQL.

    Master file processing is not appropriate for optimal Db2 applications. With master file processing two or more files are read with one driving reads to the other. For example, consider a program designed to send offers to all customers who purchased dairy items in November.

    The master file approach would read the customer purchase history file looking for dairy items purchased in November. When it finds one, it takes the customer id read from the history file and uses it to read the customer file to gather the customer address.

    The SQL approach simply joins the two tables (customer purchase history and customer) using the customer id with where conditions to limit the output to dairy items in November. Here is what the SQL solution looks like:
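
    As a hedged sketch of such a join, the example below assumes hypothetical tables customer and purchase_history, a hypothetical category code 'DAIRY', and November 2018 as the month of interest. It runs against an in-memory SQLite database through Python's sqlite3 module as a stand-in for Db2.

```python
import sqlite3

# SQLite (in memory) stands in for Db2; all names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER, name TEXT, address TEXT)")
conn.execute(
    "CREATE TABLE purchase_history (cust_id INTEGER, item_category TEXT, purchase_date TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ann", "1 Main St"), (2, "Bob", "2 Oak Ave"), (3, "Cy", "3 Elm Rd")],
)
conn.executemany(
    "INSERT INTO purchase_history VALUES (?, ?, ?)",
    [
        (1, "DAIRY", "2018-11-05"),
        (1, "DAIRY", "2018-11-20"),    # second purchase by the same customer
        (2, "DAIRY", "2018-10-30"),    # dairy, but not in November
        (3, "PRODUCE", "2018-11-12"),  # November, but not dairy
    ],
)

# The join and the WHERE conditions replace the "one file drives reads to the
# other" logic; DISTINCT yields one row per qualifying customer.
offer_list = conn.execute(
    """
    SELECT DISTINCT c.cust_id, c.name, c.address
    FROM customer c
    JOIN purchase_history p ON p.cust_id = c.cust_id
    WHERE p.item_category = 'DAIRY'
      AND p.purchase_date BETWEEN '2018-11-01' AND '2018-11-30'
    ORDER BY c.cust_id
    """
).fetchall()
print(offer_list)
```

    The program never reads the purchase history row by row; it only receives the customers that qualify.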

    Another aspect of coding relationally is to understand cursors. Remember, a SQL select statement can return multiple rows. A cursor is used to enable application programs to access individual rows. The select statement is assigned to a cursor, which is opened by the program, and then rows are fetched from the cursor one by one. For example:
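
    The pattern can be sketched with Python's DB-API cursor, which is only roughly analogous to a cursor declared in an embedded SQL program. The employee table is hypothetical, and an in-memory SQLite database stands in for Db2.

```python
import sqlite3

# SQLite (in memory) stands in for Db2; the employee table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, last_name TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(10, "Adams"), (20, "Baker"), (30, "Chen")],
)

cursor = conn.cursor()
# Associating the SELECT with the cursor and executing it plays the role of
# declaring and opening the cursor.
cursor.execute("SELECT emp_id, last_name FROM employee ORDER BY emp_id")

fetched = []
while True:
    row = cursor.fetchone()  # fetch one row of the result set at a time
    if row is None:          # no more rows to fetch
        break
    fetched.append(row)

cursor.close()
print(fetched)
```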

    When you open a cursor in your program to process a SQL statement, it is not the same as opening a file. Opening a cursor can cause a lot of activity to occur (e.g., sorting), whereas opening a file is a benign operation.

    Set-at-a-Time Processing and Relational Closure

    Every operation performed on a relational database operates on a table (or set of tables) and results in another table. This feature of relational databases is called relational closure.
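
    One way to see relational closure in practice is that the result of one query can be used wherever a table is expected. The sketch below assumes a hypothetical sales table and uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for Db2.

```python
import sqlite3

# SQLite (in memory) stands in for Db2; the sales table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EAST", 100), ("EAST", 50), ("WEST", 30), ("NORTH", 200)],
)

# The inner SELECT produces a table of regional totals; because its result is
# itself a table, the outer SELECT can query it like any other table.
large_regions = conn.execute(
    """
    SELECT t.region, t.total
    FROM (SELECT region, SUM(amount) AS total
          FROM sales
          GROUP BY region) AS t
    WHERE t.total >= 100
    ORDER BY t.region
    """
).fetchall()
print(large_regions)
```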

    All SQL data manipulation operations—Select, Insert, Update, Delete—are performed at a set level. One retrieval statement can return multiple rows; one modification can modify multiple rows.

    Application developers must learn a different way of interacting with data because of the set-at-a-time nature of SQL. Most programming languages operate on data one record at a time. When a program requires relational data, though, it must request the data using SQL, creating an impedance mismatch. The program expects data to be returned a single row at a time, but SQL returns data a set at a time. Db2 provides a feature called a cursor that accepts the input from a SQL request and provides a mechanism to fetch individual rows of the result set. Some programming tools automatically transform multi-row sets to single rows when communicating with Db2.

    Furthermore, many programmers are accustomed to hard-wiring data-navigational instructions into their programs. SQL specifies what to retrieve but not how to retrieve it. Db2 determines how best to retrieve the data based on the request. Programmers unaccustomed to database processing are unlikely to grasp this concept without some training.

    Relational Optimization

    Db2 determines the best method for accessing and modifying data based on your SQL statements, information about your system and database statistics. The same SQL statement can be optimized by Db2 to do the same work in many ways. This is a key benefit of using SQL instead of writing host language code.
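
    The effect can be illustrated with a small experiment in which the SQL text stays the same while the access path chosen for it changes once an index exists. This sketch uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module purely as an analogous stand-in; Db2 has its own optimizer and EXPLAIN facility, and the orders table, index name, and helper function here are hypothetical.

```python
import sqlite3

# SQLite (in memory) stands in for Db2; names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, cust_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 100, i) for i in range(1000)],
)

query = "SELECT order_id, amount FROM orders WHERE cust_id = 42"


def access_path(connection, sql):
    """Return SQLite's description of how it plans to execute the statement."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


plan_before = access_path(conn, query)  # no index yet: the table is scanned
conn.execute("CREATE INDEX ix_orders_cust ON orders (cust_id)")
plan_after = access_path(conn, query)   # same SQL, now resolved through the index

result = conn.execute(query).fetchall()
print(plan_before)
print(plan_after)
```

    The query text is never edited; only the information available to the optimizer changes, which is the point of specifying what to retrieve rather than how.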

    There is a component of Db2, known as the Optimizer, that processes SQL and creates executable code for it. The Optimizer is very complex and understanding all the nuances of how it works is not something most Db2 programmers need to know. There are important aspects of optimization, such as types of access paths, filter factors, and indexing, that will be covered later in this text.

    Let Db2 Do the Work

    An important guideline for coding relationally is to let Db2 do as much work as possible by coding as many of your requirements into the SQL as you can. The more work that can be done in Db2, the more efficient your program will tend to be.
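
    A small sketch of the difference follows, assuming a hypothetical employee table and department code, with an in-memory SQLite database (via Python's sqlite3 module) standing in for Db2. Both approaches produce the same answer, but the first moves every row into the program before discarding most of them.

```python
import sqlite3

# SQLite (in memory) stands in for Db2; names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(i, "D01" if i % 10 == 0 else "D99", 50000 + i) for i in range(1, 101)],
)

# Filtering in the program: every row is returned to the application, which
# then throws most of them away.
all_rows = conn.execute("SELECT emp_id, dept, salary FROM employee").fetchall()
filtered_in_program = [row for row in all_rows if row[1] == "D01"]

# Filtering in SQL: the WHERE clause lets the DBMS return only qualifying rows.
filtered_in_sql = conn.execute(
    "SELECT emp_id, dept, salary FROM employee WHERE dept = ?", ("D01",)
).fetchall()

print(len(all_rows), len(filtered_in_sql))
```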

    Why is this so? Remember from Chapter 1 the three primary performance indicators: CPU, I/O, and concurrency. The more work that Db2 can do without moving data to your program, the more we can reduce I/O operations. Many programmers without Db2 experience tend to revert to their earlier programming practices.

    For example, instead of coding appropriate SQL WHERE clauses a novice programmer may
