Agile Data Warehousing Project Management: Business Intelligence Systems Using Scrum

About this ebook

You have to make sense of enormous amounts of data, and while the notion of “agile data warehousing” might sound tricky, it can yield as much as a 3-to-1 speed advantage while cutting project costs in half. Bring this highly effective technique to your organization with the wisdom of agile data warehousing expert Ralph Hughes.

Agile Data Warehousing Project Management will give you a thorough introduction to the method as you would practice it in the project room to build a serious “data mart.” Regardless of where you are today, this step-by-step implementation guide will prepare you to join or even lead a team in visualizing, building, and validating a single component of an enterprise data warehouse.

  • Provides a thorough grounding on the mechanics of Scrum as well as practical advice on keeping your team on track
  • Includes strategies for getting accurate and actionable requirements from a team’s business partner
  • Introduces revolutionary estimating techniques that make forecasting labor far more understandable and accurate
  • Demonstrates a blend of Agile methods to simplify team management and synchronize inputs across IT specialties
  • Enables you and your teams to start simple and progress steadily to world-class performance levels
Language: English
Release date: Dec 28, 2012
ISBN: 9780123965172
Author

Ralph Hughes

Ralph Hughes, former DW/BI practice manager for a leading global systems integrator, has led numerous BI programs and projects for Fortune 500 companies in aerospace, government, telecom, and pharmaceuticals. A certified Scrum Master and a PMI Project Management Professional, he began developing an agile method for data warehousing 15 years ago and was the first to publish books on iterative solutions for business intelligence projects. He is a veteran trainer with the world's leading data warehouse institute and has instructed or coached over 1,000 BI professionals worldwide in the discipline of incremental delivery of large data management systems. A frequent keynote speaker at business intelligence and data management events, he serves as a judge on emerging technologies award panels and program advisory committees of advanced technology conferences. He holds BA and MA degrees from Stanford University, where he studied computer modeling and econometric forecasting. A co-inventor of Zuzena, the automated testing engine for data warehouses, he serves as Chief Systems Architect for Ceregenics and consults on agile projects internationally.

    Book preview

    Agile Data Warehousing Project Management - Ralph Hughes

    Part 1

    An Introduction to Iterative Development

    Chapter 1 What Is Agile Data Warehousing?

    Chapter 2 Iterative Development in a Nutshell

    Chapter 3 Streamlining Project Management

    Chapter 1

    What Is Agile Data Warehousing?

    How are agile approaches different than traditional methods?

    What does it take to make agile succeed for data warehousing projects?

    Where do we have to be careful with this new approach?

    Faster, better, cheaper. That’s the promise that agile methods have been delivering upon during the past decade for general application development. Although they are an increasingly popular style of programming for transaction-processing or data-capture applications, they have not been employed nearly as much for data warehousing/business intelligence (DWBI) applications. It usually takes a combination of project disasters and existential threats to inspire corporate DWBI departments to consider replacing their traditional development methods. Typically, the impetus to change begins after several projects seriously overrun their go-live dates and/or forecasted delivery budgets. Next, the project sponsors in the company’s business departments start to grumble: “Our corporate data warehousing department is way too slow and far too expensive.” Eventually, one end-user department finds a way to build an analytical application using an outside services vendor and reports to the other business leads that this new system required a fraction of corporate DWBI’s typical time and cost brackets. “Should we send all of our business intelligence (BI) work to outside vendors?” the business departments begin asking themselves. It does not matter to them that outside vendors trying to deliver fast frequently neglect to program the many hidden attributes needed to make a system manageable, scalable, and extensible over the long run. All the business sponsors saw was fast delivery of what they asked for. If the business units are frustrated over time and cost, corporate DWBI will survive only if it finds a way to accelerate its deliveries and lower programming expense, restoring its rapport with the customer. This type of predicament is exactly why corporate DWBI should consider agile data warehousing techniques.

    On the one hand, it is easy to understand why the popularity of agile for data warehousing lags 10 years behind its usage for general applications. It is hard to envision delivering any DWBI capabilities quickly. For data capture applications, creating a new element requires simply creating a column for it in the database and then dropping an entry field for it on the screen. To deliver a new warehousing attribute, however, a team has to create several distinct programs to extract, scrub, integrate, and dimensionalize the data sets containing the element before it can be placed on the end user’s console. Compared to the single transaction application challenge that agile methods originally focused on, data warehousing projects are trying to deliver a half-dozen new applications at once. They have too many architectural layers to manage for a team to update the data transform logic quickly in order to satisfy a program sponsor’s latest functional whim.
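
    To make the layering concrete, the sketch below traces one hypothetical attribute, customer_segment, through the kind of pipeline just described. It is only an illustration; the function names, source tables, and warehouse schema are invented, not drawn from any particular toolset or from the book itself.

        # Illustrative sketch: delivering a single new warehouse attribute touches
        # a distinct program in every architectural layer. All names are hypothetical.

        def extract(source_conn):
            # Layer 1: pull the raw rows that carry the new attribute
            return source_conn.execute(
                "SELECT cust_id, segment_code FROM crm_customers").fetchall()

        def scrub(rows):
            # Layer 2: cleanse - drop empty codes, standardize their format
            return [(cid, code.strip().upper()) for cid, code in rows if code]

        def integrate(clean_rows, enterprise_key_lookup):
            # Layer 3: conform the source key to the enterprise customer key
            return [(enterprise_key_lookup[cid], code)
                    for cid, code in clean_rows if cid in enterprise_key_lookup]

        def dimensionalize(conformed_rows, dw_conn):
            # Layer 4: apply the change to the customer dimension (naive overwrite here)
            for customer_key, code in conformed_rows:
                dw_conn.execute(
                    "UPDATE dim_customer SET customer_segment = ? WHERE customer_key = ?",
                    (code, customer_key))

    A data-capture team would stop after adding one column and one entry field; the warehouse team must build, test, and schedule every one of these steps before the attribute ever reaches the end user’s console.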

    On the other hand, data warehousing professionals need to be discussing agile methods intently, because every year more business intelligence departments large and small are experimenting with rapid delivery techniques for analytic and reporting applications. To succeed, they are adapting the generic agile approaches somewhat, but not beyond recognition. These adaptations make the resulting methods one notch more complex than agile for transaction-capture systems, but they are no less effective. In practice, agile methods applied properly to large data integration and information visualization projects have lowered the development hours needed and driven coding defects to zero. All this is accomplished while placing a steady stream of new features before the development team’s business partner. By saving the customer time and money while steadily delivering increments of business value, agile methods for BI projects go a long way toward solving the challenges many DWBI departments have with pleasing their business customers.

    For those readers who are new to agile concepts, this chapter begins with a sketch of the method to be followed throughout most of this book. The next sections provide a high-level contrast between traditional development methods and the agile approach, and a listing of the key innovative techniques that give agile methods much of their delivery speed. After surveying evidence that agile methods accelerate general application development, the presentation introduces a key set of adaptations that will make agile a productive approach for data warehousing. Next, the chapter outlines two fundamental challenges unique to data warehousing that any development method must address in order to succeed. It then closes with a guide to the remainder of the book and a second volume that will follow it.

    A quick peek at an agile method

    The practice of agile data warehousing is the application of several styles of iterative and incremental development to the specific challenges of integrating and presenting data for analytics and decision support. By adopting techniques such as colocating programmers in a single workspace and embedding a business representative in the team to guide them, companies can build DWBI applications without a large portion of the time-consuming procedures and artifacts typically required by formal software development methods. Working intently on deliverables without investing time in a full suite of formal specifications necessarily requires that developers focus on only a few deliverables at a time. Building only small pieces at a time, in turn, repeats the delivery process many times. These repeated deliveries of small scopes place agile methods in the category of iterative and incremental development methods for project management.

    When following agile methods, DWBI developers essentially roll up their sleeves and work like they have only a few weeks before the system is due. They concentrate on the most important features first and perform only those activities that directly generate fully shippable code, thus realizing a tremendous boost in delivery speed. Achieving breakthrough programming speeds on a BI project will require developers to work differently than most of them are trained, including the way they define requirements, estimate work, design and code their systems, and communicate results to stakeholders, plus the way they test and document the resulting system modules. To make iterative and incremental delivery work, they will also need to change the physical environment in which they work and the role of the project manager. Most traditional DWBI departments will find these changes disorienting for a while, but their disruption will be more than compensated for by the increased programmer productivity they unleash.

    Depending on how one counts, there are at least a dozen agile development styles to choose from (see sidebar). They differ by the level of ongoing ceremonies they follow during development and the amount of project planning they invest in before coding begins. By far the most popular flavor of agile is Scrum, first introduced in 1995 by Dr. Jeff Sutherland and Ken Schwaber. [Schwaber 2004] Scrum involves a small amount of both ceremony and planning, making it fast for teams to learn and easy for them to follow dependably. It has many other advantages, among them being that it

    • Adroitly organizes a team of 6 to 10 developers

    • Intuitively synchronizes coding efforts with repeated time boxes

    • Embeds a business partner in the team to maximize customer engagement

    • Appeals to business partners with its lightweight requirements artifacts

    • Double estimates the work for accuracy using two units of measure

    • Forecasts overall project duration and cost when necessary

    • Includes regular self-optimizing efforts in every time box

    • Readily absorbs techniques from other methods

    Agile Development Methods

    Scrum has such mindshare that, unless one clarifies he is speaking of another approach, Scrum is generally assumed to be the base method whenever one says agile. Even if that assumption is right, however, the listener still has to interpret the situation with care. Scrum teams are constantly optimizing their practices and borrowing techniques from other sources so that they all quickly arrive at their own particular development method. Over time Scrum teams can vary their practice considerably, to the point of even dropping a key component or two such as the time box. Given this diversity in implementations, this book refers to Scrum when speaking of the precise method as defined by Sutherland and Schwaber. It employs the more general term agile when the context involves an ongoing project that may well have started with Scrum but then customized the method to better meet the situation at hand.

    Figure 1.1 depicts the simple, five-step structure of an iteration with which Scrum teams build their applications. A team of 6 to 10 individuals—including an embedded partner from the customer organization that will own the applications—repeats this cycle every 2 to 8 weeks. The next chapter presents the iteration cycle in detail. Here, the objective is to provide the reader with enough understanding of an agile approach to contrast it with a traditional method.

    Figure 1.1 Structure of Scrum development iteration and duration of its phases.

    As shown in Figure 1.1, a list of requirements drives the Scrum process. Typically this list is described as a backlog of user stories. User stories are single sentences authored by the business stating one of their functional needs. The embedded business partner owns this list, keeping it sorted by each story’s importance to the business. With this backlog available, Scrum teams repeatedly pull from the top as many stories as they can manage in one time box, turning them into shippable software modules that satisfy the stated needs. In practice, a minority of the stories on a backlog include nonfunctional features, often stipulated for the application by the project architect. These architectural stories call for reusable submodules and features supporting quality attributes such as performance and scalability. Scrum does not provide a lot of guidance on where the original backlog of stories comes from. For that reason, project planners need to situate the Scrum development process in a larger project life cycle that will provide important engineering and project management notions such as scope and funding, as well as data and process architecture.
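
    As a concrete illustration of such a backlog (the stories and priorities below are invented for this sketch, not taken from any real project), user stories and the occasional architectural story sit in one list that the embedded business partner keeps sorted by business importance:

        # Illustrative backlog sketch; story wording and priorities are invented.
        from dataclasses import dataclass

        @dataclass
        class Story:
            text: str              # single sentence stating a need
            priority: int          # set by the embedded business partner; 1 = most important
            architectural: bool = False

        backlog = [
            Story("As a sales manager, I can see monthly revenue by region.", 1),
            Story("As an analyst, I can filter customers by market segment.", 2),
            Story("The nightly load of the customer dimension completes within one hour.", 3,
                  architectural=True),
            Story("As a planner, I can compare forecast to actual sales.", 4),
        ]

        # The business partner owns the ordering; the team always pulls from the top.
        backlog.sort(key=lambda story: story.priority)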

    The standard development iteration begins with a story conference where the developers apply a top-down estimating technique based on what are called story points to identify the handful of user stories at the top of the project’s backlog that they can convert into shippable code during the iteration.
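
    The selection itself amounts to simple arithmetic against the team’s demonstrated velocity. A minimal sketch, with invented story-point sizes and an assumed velocity of 20 points per iteration:

        # Hypothetical story-point sizing from a story conference.
        velocity = 20  # points the team has historically completed per iteration

        # (story, points) pairs estimated top-down during the story conference
        estimated = [("revenue by region", 8), ("filter by segment", 5),
                     ("nightly load within one hour", 5), ("forecast vs. actuals", 8)]

        committed, total = [], 0
        for story, points in estimated:      # walk the backlog from the top
            if total + points > velocity:
                break                        # stop once the time box is full
            committed.append(story)
            total += points

        # committed -> the first three stories, 18 of 20 available points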

    Next, the team performs task planning where it decomposes the targeted user stories into development tasks, this time estimating the work bottom-up in terms of labor hours in order to confirm that they have not taken on too much work for one iteration.
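
    Pictured as arithmetic, the bottom-up check decomposes each committed story into tasks, sums the hour estimates, and compares the total with the labor the team can actually supply during the time box. All task names and numbers in this sketch are invented:

        # Hypothetical bottom-up task estimate for one two-week iteration.
        tasks = {
            "revenue by region": [("extract sales rows", 12), ("build fact table", 16),
                                  ("draft report", 8)],
            "filter by segment": [("add segment to dimension", 10), ("update semantic layer", 6)],
            "nightly load within one hour": [("partition staging table", 8), ("tune load job", 12)],
        }
        estimated_hours = sum(hours for story in tasks.values() for _, hours in story)

        developers, workdays, focus_hours_per_day = 6, 10, 5
        capacity = developers * workdays * focus_hours_per_day   # 300 labor hours

        # If estimated_hours exceeded capacity, the team would hand a story back to
        # the backlog before the development phase begins.
        print(estimated_hours, "of", capacity, "hours committed")   # 72 of 300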

    After confirming they have targeted just the right amount of work, the teammates now dive into the development phase, where they are asked to self-organize and, over the next couple of weeks, create the promised enhancement to the application, working in the most productive way they can devise. The primary ceremony that Scrum places upon them during this phase is a short stand-up meeting each morning where they check in with one another; that is, it asks them to hold a daily scrum.

    At the end of the cycle, the team conducts a user demo where the business partner on the team operates the portions of the application that the developers have just completed, often with other business stakeholders looking on. For data integration projects that have not yet delivered the information to a presentation layer, the team will typically provide a simple front end (perhaps a quickly built, provisional BI module) so that the business partner can independently explore the newly loaded data tables. The business partner evaluates the enhanced application by considering each user story targeted during the story conference, deciding whether the team has delivered the functionality requested.
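
    The provisional front end mentioned above can be as modest as a canned query that the business partner runs against the newly loaded tables. A minimal sketch, assuming a hypothetical dim_customer/fact_sales star schema in SQLite:

        # Throwaway demo query: lets the business partner explore the new load
        # without waiting for the permanent BI layer. Schema names are hypothetical.
        import sqlite3

        def revenue_by_segment(db_path):
            conn = sqlite3.connect(db_path)
            rows = conn.execute(
                """
                SELECT d.customer_segment, SUM(f.revenue) AS revenue
                FROM fact_sales f
                JOIN dim_customer d ON d.customer_key = f.customer_key
                GROUP BY d.customer_segment
                ORDER BY revenue DESC
                """).fetchall()
            conn.close()
            return rows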

    Finally, before beginning the cycle anew, the developers meet for a sprint retrospective, where they discuss the good and bad aspects of the development cycle they just completed and brainstorm new ways to work together during the next cycle in order to smooth out any rough spots they may have encountered.

    At this point, the team is ready to start another cycle. These iterations progress as long as there are user stories on the project’s backlog and the sponsors continue funding the project. During an iteration’s development phase, the team’s embedded business partner may well have reshuffled the order of the stories in the backlog, added some new ones, and even discarded others. Such requirements churn does not bother the developers because they are always working within the near-term planning horizon defined by the iteration’s time box. Because Scrum has the developers constantly focused on only the top of the backlog, the business can steer the team in a completely new direction every few weeks, heading to wherever the project needs to go next. Such flexibility often makes business partners very fond of Scrum because it allows the developers from the information technology (IT) department to become very flexible and responsive.

    The disappointment cycle of many traditional projects

    In contrast to Scrum’s iterative approach to delivering systems, traditional software engineering operates on a single-pass model. The most widely cited definition of this approach can be found in the first half of a 1970 white paper entitled “Managing the Development of Large Software Systems” by a TRW researcher named Dr. Winston Royce. [Royce 1970] This paper has been commonly interpreted to suggest that, in order to avoid conceptual errors and extensive reprogramming of an application, all requirements should be gathered before design begins, all design should be completed before programmers begin coding, and the bulk of coding should be completed before serious testing can get underway. In this process, each work phase should fill up with specifications before that information spills over into the next phase—a notion that led many to call this approach a cascade or a waterfall method.

    Many people describe this waterfall process as the big design up-front strategy because it requires enormous design specifications to be drafted and approved before programming work can begin. [Ambler 2011] It has also been called a plan-driven or command and control approach because the big design results in enormous project plans, with possibly thousands of separate tasks, that project managers use to drive the daily activities of the development teams they command. A further name for this style of organizing development is the big bang approach because all the value is theoretically dropped upon end users at the conclusion of the project.

    Waterfall-style project organization can seem to be a safe approach for large applications, especially those with multiple, intersecting data layers found in data warehouses, because the engineers supposedly think out all aspects of the project thoroughly ahead of time. Scrum, however, simply takes a few requirements off the top of a list and converts them into code before the next set of features is even considered. In contrast to a waterfall method, Scrum can seem very tactical and ad hoc.

    For these reasons, when software professionals first learn of agile, they often decide that a waterfall method must be far more robust. Such conclusions are ironic because waterfall methods have had over 40 years to prove themselves, but statistics show that they struggle to deliver applications reliably. The Standish Group’s seminal Chaos reports detailed the software industry’s track record in delivering large systems using traditional methods. [Standish Group 1999] A survey of 8380 projects conducted by 365 major American companies revealed that even small projects below $750,000 delivered applications on time, on budget, and with all the promised features no more than 55% of the time. As the size of the applications grew, the success rate fell steadily to 25% for efforts over $3M and down to zero for projects over $10M (1999 dollars).

    Data warehousing projects fall easily in the middle to upper reaches of the range documented in the Standish study. Not surprisingly, they, too, have demonstrated trouble reliably delivering value under traditional project management methods. A survey performed in 1994—before agile methods were common—by the data management industry’s primary trade magazine revealed that its readers’ data warehouse projects averaged above $12M in cost and failed 65% of the time. [Cited in Ericson 2006] Such statistics do not indicate that every waterfall-based data warehousing project is destined to fail. However, if the approach were as robust as people often assume, plan-driven, big-bang project methods should have achieved a much higher success rate in the 40 years since Royce’s paper first defined the approach.

    Unfortunately, there is good reason to believe that the waterfall will remain a very risky way to build large systems: the specifications flowing into any one of its phases will always contain flaws. Being only human, the individuals preparing these enormous artifacts will have less than perfect foresight, especially for companies situated in a world market that is constantly changing. Moreover, in this age of increasing global competition, the engineers producing these specifications are also frequently overworked and given too little time to thoroughly research a project’s problem domain. In this reality, developers working from these specifications will steadily encounter the unexpected. Lamentably, waterfall methods contain no back step that allows the developers to substantially revisit requirements if they encounter a major oversight while coding. Testing, compressed to fit into the very last phase of development, cannot call for a major reconsideration of a system’s design. With plan-driven project management, the team is locked into a schedule. They must hurriedly span the gaps between their specifications and the actual situation because, according to plan, work must move steadily downstream so that the project meets its promised delivery date.

    Rather than mitigating project risk, the waterfall approach actually inflates it. Aiming for a big design up front, waterfall processes are heavy with documentation and encumbered with numerous reviews in an effort to avoid oversights. However, large specification documents acquire their own inertia by virtue of the long procedures needed to update and reapprove them. With this inertia, errors committed during requirements or design phases get baked into the application because it is either too expensive or too late to substantively rework specifications when flaws are discovered in them. The phases cannot validate nor contribute to one another’s success, making each phase a single point of failure. The entire project will either succeed or fail based on the quality of the requirements package taken in isolation, then the quality of the design taken alone, and then finally the quality of the coding as a whole. Because little flaws cannot be simply corrected when they are discovered, their consequences get amplified to where any one of them can threaten the entire undertaking.

    In this light, an iterative approach mitigates risk in an important way. Scrum validates the engineering of each small slice of the project by pushing it through requirements, design, coding, and testing before work on the next set of stories from the backlog begins. Errors detected in any of these engineering steps can provide important guidance on how the team should pursue the remainder of the project. Scrum practitioners note that the method pushes errors to the surface early in the process while there is still time to correct assumptions and the foundations of the project’s coding. (See, for example, [Vingrys 2011].) In contrast to the tendency of a waterfall to allow errors to undermine a whole project, the agile approach works to contain failures to just one project slice at a time.

    Being large undertakings by nature, data warehousing initiatives pursued with a single-delivery paradigm demonstrate waterfall’s whole project failure pattern all too often. A quick sampling of horror stories shared with the author by prospective customers seeking a better approach to building business intelligence systems illustrates the point
