IBM InfoSphere: A Platform for Big Data Governance and Process Data Governance
By Sunil Soares
About this ebook
Written by a leading expert in the field of information governance, this title is the only book available that focuses specifically on IBM solutions and the convergence of two major trends in information management today: big data and information governance. Nontechnical in nature and geared toward business audiences responsible for the analytics of big data, the book provides a big data governance framework with three dimensions: big data types, information governance disciplines, and industries and functions. Practitioners can use the knowledge presented in this book to understand big data support across the IBM InfoSphere portfolio, including DataStage, Streams, QualityStage, MDM, Guardium, Optim, and Data Explorer. They will also learn about the integration between IBM Business Process Manager Express and IBM InfoSphere in terms of integrating information governance, big data, and business process management.
CHAPTER 1
AN INTRODUCTION TO BIG DATA GOVERNANCE
We are drowning in data today. This data comes from social media, telephone GPS signals, utility smart meters, RFID tags, digital pictures, and online videos, among other sources. IDC estimates that the amount of information in the digital universe exceeded 1.8 zettabytes (1.8 trillion gigabytes) in 2011 and is doubling every two years.¹ Much of this data can be characterized as big data. Big data is generally referred to in the context of the three Vs: volume, velocity, and variety. We add another V for value. Let’s consider each of these terms:
Volume (data at rest)—Big data is generally large. Enterprises are awash with data, easily amassing terabytes and petabytes of information, and even zettabytes in the future.
Velocity (data in motion)—Often time-sensitive, streaming data must be analyzed with millisecond response times to bolster real-time decisions.
Variety (data in many formats)—Big data includes structured, semi-structured, and unstructured data such as email, audio, video, clickstreams, log files, and biometrics.
Value (cost effectiveness)—Organizations are looking to gain insights from big data in a cost-effective manner. This is where open-source technologies such as Apache Hadoop have become extremely popular. Hadoop is software that processes large data sets across clusters of hundreds or thousands of computers in a cost-effective way.
Organizations must govern all this big data, which brings us to the subject of this book. We define big data governance as follows:
Big data governance is part of a broader information governance program that formulates policy relating to the optimization, privacy, and monetization of big data by aligning the objectives of multiple functions.
Let’s decompose this definition into its main parts:
Big data is part of a broader information governance program.
Information governance organizations should incorporate big data into their existing information governance frameworks by doing the following:
Extend the scope of the information governance charter to include big data governance.
Broaden the membership of the information governance council to include power users of big data such as data scientists.
Appoint stewards for specific categories of big data such as social media.
Align big data with information governance disciplines such as metadata, privacy, data quality, and master data.
Big data governance is about policy formulation.
Policy includes the written or unwritten declarations of how people should behave in a given situation. For example, a big data governance policy might state that an organization will not integrate a customer’s Facebook profile into his or her master data record without that customer’s informed consent.
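To make the consent example concrete, such a policy could be enforced at the point where a social media profile is merged into master data. The following is only a minimal sketch under stated assumptions: the `MasterRecord` class, the `integrate_social_profile` function, and the consent flag are hypothetical names invented for illustration and do not correspond to any IBM InfoSphere API.

```python
from dataclasses import dataclass, field


@dataclass
class MasterRecord:
    """Hypothetical master data record for one customer."""
    customer_id: str
    name: str
    social_profiles: dict = field(default_factory=dict)


def integrate_social_profile(record, source, profile, has_informed_consent):
    """Attach a social media profile only when the customer has consented.

    Returns True if the profile was integrated, False if policy blocked it.
    """
    if not has_informed_consent:
        # Policy: without informed consent the record stays unchanged.
        return False
    record.social_profiles[source] = profile
    return True


record = MasterRecord(customer_id="C-001", name="Jane Doe")
blocked = integrate_social_profile(record, "facebook", {"handle": "jane.doe"}, False)
allowed = integrate_social_profile(record, "facebook", {"handle": "jane.doe"}, True)
print(blocked, allowed, record.social_profiles)
```

The point of the sketch is that the policy lives in one place, so every process that touches master data applies the same rule.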
Big data must be optimized.
Consider how organizations might apply the principles of the physical world to their big data. Companies have well-defined enterprise asset management programs to care for their machinery, aircraft, vehicles, and other assets. Just as they catalog and maintain physical assets, organizations need to optimize their big data as follows:
Metadata—Build information about inventories of big data.
Data quality management—Cleanse big data just as companies conduct preventive maintenance on physical assets.
Information lifecycle management—Archive and retire big data when it no longer makes sense to retain these massive volumes.
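The inventory and lifecycle ideas above can be pictured as a small catalog of data sets, each carrying metadata and a retention period. This is an illustrative sketch only: the `DatasetEntry` fields, the sample data sets, and the retention periods are assumptions made up for the example, not values taken from the book or from any product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DatasetEntry:
    """Hypothetical inventory (metadata) entry for one big data set."""
    name: str
    steward: str
    last_used: date
    retention_days: int  # assumed retention period set by policy


def lifecycle_action(entry, today):
    """Return 'archive' once the data set is idle past its retention period."""
    age = today - entry.last_used
    return "archive" if age > timedelta(days=entry.retention_days) else "retain"


inventory = [
    DatasetEntry("clickstream_2011", "web analytics", date(2011, 6, 30), 365),
    DatasetEntry("smart_meter_readings", "operations", date(2012, 11, 1), 730),
]
today = date(2013, 1, 1)
actions = {entry.name: lifecycle_action(entry, today) for entry in inventory}
print(actions)
```

In practice the retention rule would come from the governance council rather than from a hard-coded number, but the structure is the same: metadata first, then a policy applied to it.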
Privacy of big data is important.
Organizations also need to establish the appropriate policies to prevent the misuse of big data. Organizations need to consider the reputational, regulatory, and legal risks involved when handling social media, geolocation, biometric, and other forms of personally identifiable