
GLOSSARY OF DATA WAREHOUSE TERMS THE OHIO STATE UNIVERSITY

Glossary of Data Warehouse Terms

Ad-Hoc Query
One-time-only access of data using parameters never before used and perhaps never used
again. An ad-hoc query consists of dynamically constructed SQL that is often created by the
knowledge worker using a desktop query tool.

After Image
The snapshot of data placed on a log upon the completion of a transaction.

Aggregate Data
Data that has been summarized or totaled; the rollup of amounts or counts to a summary
level.
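
Example: a minimal sketch of a rollup, assuming hypothetical enrollment rows (the college
names, student numbers and credit hours are invented for illustration). Detail rows are
summarized to one count and one total per college.

```python
from collections import defaultdict

# Hypothetical atomic rows: (college, student_id, credit_hours)
enrollments = [
    ("Engineering", 101, 15),
    ("Engineering", 102, 12),
    ("Arts", 201, 9),
    ("Arts", 202, 16),
    ("Arts", 203, 12),
]

def roll_up(rows):
    """Summarize detail rows to one row per college (student count, total hours)."""
    summary = defaultdict(lambda: {"students": 0, "credit_hours": 0})
    for college, _student_id, hours in rows:
        summary[college]["students"] += 1
        summary[college]["credit_hours"] += hours
    return dict(summary)

aggregate = roll_up(enrollments)
```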

Alerts
A system-generated notification resulting from an event that has exceeded a pre-defined
threshold.

Algorithm
A set of statements or a formula to calculate a result or solve a problem in a defined set of
steps.

Alias
A secondary and non-standard synonym or alternate name of a standard business term or
name, used to cross-reference the official name to a legacy data name.

Application
A group of computer programs, computer hardware, procedures, data, and knowledge
workers interlinked to provide support for an organizational function or requirement.

Architecture
The structure and/or design of a system, application or database. In architected systems, all
components share the same structure. This approach makes it easier to integrate the
components into a single system.

Architect
The term applied to an individual or group that is responsible for developing the overall
structure for the Operational or Decision Support System. Sometimes called a Data
Architect. The architect should have experience in logical data modeling and decision
support systems.

Archived Data
Data of a historical nature that is relevant to a moment in time, which has now passed; saved
in its exact form for historical, recovery or restoration purposes. Archived data cannot be
updated.

Atomic Data
Data that represents the lowest level of detail or granularity of a meaningful fact.

Attribute
A property or characteristic that describes an entity or object. It is a conceptual
representation of a fact that is implemented as a field in a record or file, or as a column in a
table or database.

Covansys PAGE 1
10/14/2004

Audit Trail
Data that can be used to trace system activity, such as add, update or delete transactions;
often accomplished using before and after images.
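
Example: a minimal sketch of an audit trail built from before and after images. The
in-memory log, the `update_with_audit` helper and the sample student row are all
hypothetical; a real DBMS would write these images to its own log.

```python
audit_trail = []  # hypothetical in-memory log

def update_with_audit(table, key, changes):
    """Apply an update and log the row's before and after images."""
    before = dict(table[key])      # before image: copy taken prior to the change
    table[key].update(changes)
    after = dict(table[key])       # after image: copy taken once the change is applied
    audit_trail.append({"op": "update", "key": key, "before": before, "after": after})

students = {7: {"last_name": "Smith", "status": "PT"}}
update_with_audit(students, 7, {"status": "FT"})
```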

Back Up
To create a copy of a database so that it can later be restored to its state as of a previous
point in time.

Backup
A file that represents a snapshot of a database as of some previous moment in time.

Batch Processing
A computer environment in which programs access data exclusively, often prohibiting any
other access to the data; usually a long-running, sequential process.

Batch Window
The time when an application schedules batch processing to take place, often in a block of
time during non-peak processing hours (overnight).

Before Image
The snapshot of data placed on a log prior to the execution of a transaction.

Bi-directional Extracts
The ability to extract, cleanse and transfer data in two directions among different types of
databases including hierarchical, networked and relational databases.

Business Drivers
The people, information, and tasks that support the fulfillment of a business objective.

Business Model
A view of the business at any given point in time. The view can be from a process, data,
event or resource perspective and can be the past, present, or future state of business.

Business Intelligence (BI)
Knowledge gained about a business through the use of various hardware/software
technologies, which enable organizations to turn data into information.

Business Rule
A type of metadata that documents constraints applied to some aspect of the business. A
statement expressing a policy or condition that governs business actions and establishing
data integrity guidelines.

Candidate Key
A value that can serve as a unique identifier for an occurrence of an entity; no two
records may have the same value.

Cardinality
The number of unique values that occur for an attribute (for example, gender code (Male or
Female) has low cardinality with 2 values, but state code has a higher cardinality with 50
values); also the number of occurrences between a pair of related entities (one to one, one to
many, many to many).
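
Example: a minimal sketch of measuring the first kind of cardinality (distinct values in a
column). The sample rows and the `column_cardinality` helper are hypothetical.

```python
# Hypothetical sample rows: (student_id, gender_code, state_code)
rows = [
    (1, "M", "OH"),
    (2, "F", "OH"),
    (3, "F", "MI"),
    (4, "M", "PA"),
    (5, "F", "KY"),
]

def column_cardinality(records, position):
    """Return the number of distinct values found in one column."""
    return len({record[position] for record in records})

gender_cardinality = column_cardinality(rows, 1)  # low: few distinct codes
state_cardinality = column_cardinality(rows, 2)   # higher: more distinct codes
```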

Central Data Warehouse
A database created from operational extracts that adheres to a single, consistent, enterprise
data model to ensure consistency of decision support data across the organization.


Change Data Capture
The capture of changes made to a production data source. The changes are captured
through reading the source DBMS log. Change Data Capture is used to reduce data volume
in a data warehousing environment by consolidating units of work.

Class Word
See Domain.

Code
A shorthand representation or abbreviation of a specific value for an attribute or column. If a
code exists, there must be a code table that allows the encoding and decoding of valid
values.
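
Example: a minimal sketch of a code table used for encoding and decoding. The
enrollment-status codes and the helper names are hypothetical, chosen only to illustrate
the idea.

```python
# Hypothetical code table mapping enrollment-status codes to their meanings.
ENROLLMENT_STATUS = {"FT": "Full-time", "PT": "Part-time", "WD": "Withdrawn"}

def decode(code):
    """Translate a code into its value; reject codes missing from the table."""
    if code not in ENROLLMENT_STATUS:
        raise ValueError(f"unknown code: {code!r}")
    return ENROLLMENT_STATUS[code]

def encode(value):
    """Translate a value back into its code."""
    for code, meaning in ENROLLMENT_STATUS.items():
        if meaning == value:
            return code
    raise ValueError(f"no code for value: {value!r}")
```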

Column
A vertical set of values in a table, in which all values represent the same type of information
(such as a list of student Social Security Numbers). Columns represent elements in a database.

Configuration Management
The process of identifying and defining configurable items in an environment by controlling
their release, use and subsequent changes during the development life cycle of an
application or data warehouse. This includes verifying their completeness and correctness,
recording and reporting the status of those items and change requests.

Concurrency
A characteristic of information quality measuring the degree to which data stored in
redundant or distributed database files is equivalent at the same point in time.

Cost of Ownership
The total cost of owning products, such as software packages, and services, including
planning, acquiring, process redesign, implementation, and support required for the
successful use of the product or service.

Cube
The concept that data can be represented in multiple dimensions in a construct called a
cube. This allows one to visualize the location of data in a multi-dimensional database as an
intersection of several dimensions.

Customer Relationship Management
The idea of establishing relationships with customers on an individual basis, then using that
information to treat different customers differently. Customer buying profiles and churn
analysis are examples of decision support activities that can affect the success of customer
relationships.

Database Approach
Designing and constructing a system around a central database. Applications are developed
and implemented to load, update and retrieve data from the database.

Database Schema
The logical and physical structural design of a database.

Data Cleansing
A process to correct data errors during extraction and transformation, prior to
loading the data into the warehouse.


Data Definition
The specification of the meaning, domain values, and business integrity rules for an entity
type or attribute. Data definition includes name, definition, and relationships, as well as
domain value definition and business rules that govern business actions that are reflected in
data.

Data Dictionary
A database about data and database structures. A catalog of all data elements, containing
their names, structures, and information about their usage. A central location for metadata.
Normally, data dictionaries are designed to store a limited set of available metadata,
concentrating on the information relating to the data elements, databases, files and programs
of implemented systems.

Data Definition Language (DDL)
The language used to describe database schemas or designs.

Data Flow Diagram
A diagram that shows the normal flow of data between services as well as the flow of data
between data stores and services.

Data Mapping
The process of assigning a source data element to a target data element.

Data Marts
Data Marts are built to support information retrieval within a specific subject area and to
answer a limited set of questions. The Data Mart architecture supports fast retrieval of the
data needed to answer those questions. Data Marts are often designed around a Star Schema.

- Dependent Data Mart
The dependent Data Mart is populated from a central data warehouse. Since each
dependent data mart is populated from the same data warehouse, the data in these data
marts share common business rules, semantics, and definitions. This approach ensures
that all components of the architecture are consistent and conformed through common
definitions and business rules. When the same question is asked against different
dependent data marts, the same answer will be received because their data comes from
a single source of truth. (Sometimes referred to as an architected data mart.)

- Independent Data Mart
The independent Data Mart is populated directly from the source system(s). Since each
independent data mart is populated directly from the source system(s), the data in these
data marts may not share common business rules, semantics, or definitions. This
approach creates the risk that many components of the architecture will be inconsistent
and contradictory because of the application of different definitions and business rules.

- Virtual Data Mart
Virtual data marts are created to simulate data mart structures. A virtual data mart is
created by defining views over existing tables, without creating new physical tables. The
views give an impression of the data that is different from its physical organization; for
example, the views can make the data appear to be organized in a star or snowflake schema.
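
Example: a minimal sketch of a virtual data mart using Python's built-in `sqlite3` module.
The `payment` table, its rows and the view names `dim_county` and `fact_payment` are
hypothetical. No new physical tables are created; the views only present the existing rows
in a dimension-and-fact shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Physical table (hypothetical): one wide table of payments.
    CREATE TABLE payment (
        payment_id INTEGER PRIMARY KEY,
        county     TEXT,
        pay_month  TEXT,
        amount     REAL
    );
    INSERT INTO payment VALUES
        (1, 'Franklin', '2004-09', 100.0),
        (2, 'Franklin', '2004-10', 150.0),
        (3, 'Delaware', '2004-10', 80.0);

    -- Views that present the same rows as a dimension and a fact table.
    CREATE VIEW dim_county AS
        SELECT DISTINCT county FROM payment;
    CREATE VIEW fact_payment AS
        SELECT county, pay_month, amount FROM payment;
""")

# Query the views as if they were a small star schema.
totals = conn.execute("""
    SELECT d.county, SUM(f.amount)
    FROM fact_payment AS f
    JOIN dim_county AS d ON d.county = f.county
    GROUP BY d.county
    ORDER BY d.county
""").fetchall()
```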

Data Mining
Data mining is the process of sifting through large amounts of data to produce data content
relationships. Data mining allows the knowledge worker to understand the factors that have
created trends.


Data Model
A data model is a graphical representation of data objects that describe a business, its data,
and some of its business rules.

Data Modeling
A method used to define and analyze data requirements needed to support the business
functions of an enterprise. These data requirements are recorded as a conceptual data model
with associated data definitions. Data modeling defines the relationships between data
elements and structures.

Data Partitioning
The process of logically and/or physically partitioning data into segments that are more
easily maintained or accessed. Current RDBMSs provide this kind of distribution
functionality. Partitioning of data aids in performance and utility processing.

Data Pivot
A process of rotating the view of data.
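
Example: a minimal sketch of a pivot, assuming hypothetical sales rows. Rows stored as
(region, quarter, amount) are rotated so that each region becomes a row and each quarter a
column.

```python
# Hypothetical long-format rows: (region, quarter, amount)
sales = [
    ("East", "Q1", 10),
    ("East", "Q2", 20),
    ("West", "Q1", 5),
    ("West", "Q2", 15),
]

def pivot(rows):
    """Rotate rows so each region becomes a row and each quarter a column."""
    table = {}
    for region, quarter, amount in rows:
        table.setdefault(region, {})[quarter] = amount
    return table

by_region = pivot(sales)
```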

Data Staging
A process to store the data temporarily during the loading of data to a data warehouse.
Performed when data is extracted from the operational systems, including integrating
dissimilar data types and processing calculations.

Data Transformation
Creating information from data. This includes decoding production data and merging of
records from multiple DBMS formats. A part of the process also involves data scrubbing or
data cleansing.

Data Warehouse
"A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data
in support of management's decision making process." - Bill Inmon. The data warehouse is a
normalized data structure that represents standardized, consistent, cleansed data.

Data Warehouse Management Layer
A level of software used to manage the Data Warehouse. Management includes security,
access rights, monitoring performance and tracking the rows and columns that are accessed
by queries.

Decision Support System (DSS)
A system designed to help management and business analysts make decisions based on
information derived from data. Decision Support Systems contain historical data grouped by
subject areas. The structure is flexible, making it easy to ask a variety of questions about the
data. There are many types of Decision Support Systems such as Data Warehouses and
Data Marts. It is common for a Decision Support System to contain other Decision Support
Systems.

Decision Support System Data
Decision Support System Data represents the data used in the decision support process.
This data is at many levels of detail, is integrated, cannot be changed (is read only), and
has been assessed for data quality and cleansed. This data is historical in nature.

Derived Data
Data that is the result of calculations in reference to source data.


Designer's View
The perspective on the Decision Support Systems development taken by the person
responsible for applying technology to the solution of business problems.

Dimension
An index into a set of facts (payments, geography, time, etc.). A dimension acts as an
index for identifying values within a multi-dimensional array. For example, all months,
quarters, years, etc., make up a time dimension; likewise all cities, regions, countries, etc.,
make up a geography dimension.

Domain
A set of valid values from which actual values are derived for an attribute or a data element.

Drill Down
A method of exploring detailed data that was used in creating a summary level of data. Drill
down levels depend on the granularity of the data in the data warehouse.
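
Example: a minimal sketch of drilling down from a summary to the detail behind it. The
county payment rows and the helper names are hypothetical. The drill-down can go no deeper
than the most detailed rows that were kept.

```python
# Hypothetical atomic rows: (county, client_id, payment)
detail = [
    ("Franklin", "C1", 200),
    ("Franklin", "C2", 150),
    ("Delaware", "C3", 300),
]

def summarize(rows):
    """Summary level: total payments per county."""
    totals = {}
    for county, _client, payment in rows:
        totals[county] = totals.get(county, 0) + payment
    return totals

def drill_down(rows, county):
    """Detail level: the rows that were rolled up into one county total."""
    return [row for row in rows if row[0] == county]

summary = summarize(detail)
franklin_detail = drill_down(detail, "Franklin")
```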

Drill Through
The process of accessing the original source data from a replicated or transformed copy to
verify equivalence to the record of origin data.

EIS
EIS stands for Executive Information System. These systems are used to produce
information for the top level of management. The information created by an EIS is
aggregated and summarized at a high level (not detail level) and is global in nature.

Entity
An entity represents a Person, Place, Thing or an Event about which the organization wants
to keep data.

Enterprise Data Warehouse
A centralized data warehouse that supports the business information needs of the entire
enterprise. The enterprise data warehouse is not biased towards any business functional
subject area.

Entity Relationship Diagramming
A process that visually identifies the relationships between data elements.

ETL
Extract, Transform, and Load. ETL is the process of moving data from source systems to
the data warehouse. In the ETL process, after the data has been extracted it is transformed
(filtering, merging, decoding, and translating source data to create validated data for the
data warehouse). After the transformation, the data is loaded into the target database.
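
Example: a minimal sketch of the three ETL steps, using an in-memory SQLite database as a
stand-in for the warehouse. The source rows, the `GENDER_CODES` table and the `payment`
target table are hypothetical.

```python
import sqlite3

# Hypothetical source extract: raw rows with coded, inconsistent values.
source_rows = [
    {"id": "1", "gender": "m", "amount": "100.50"},
    {"id": "2", "gender": "F", "amount": "n/a"},   # invalid amount, filtered out
    {"id": "3", "gender": "f", "amount": "75"},
]
GENDER_CODES = {"M": "Male", "F": "Female"}  # assumed code table

def extract():
    """Extract: read the rows from the (simulated) source system."""
    return list(source_rows)

def transform(rows):
    """Transform: filter invalid rows, decode gender codes, convert types."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount is not numeric
        clean.append((int(row["id"]), GENDER_CODES[row["gender"].upper()], amount))
    return clean

def load(conn, rows):
    """Load: insert the validated rows into the target table."""
    conn.execute("CREATE TABLE payment (id INTEGER, gender TEXT, amount REAL)")
    conn.executemany("INSERT INTO payment VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
loaded = warehouse.execute("SELECT * FROM payment ORDER BY id").fetchall()
```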

Extract Frequency
The latency of data extracts, such as daily versus weekly, monthly, quarterly, etc. The
frequency that data extracts are needed in the data warehouse is determined by the shortest
frequency requested through an order, or by the frequency required to maintain consistency
of the other associated data types in the source data warehouse.

Fact
A piece of information that is the target of a typical retrieval.


Fastload
A technology that typically replaces a specific DBMS load function. A fastload technology
obtains significantly faster load times by preprocessing data and bypassing data integrity
checks and logging.

Feedback Loop
A formal mechanism for communicating information about process performance and
information quality to the process owner and information producers.

Filters
Saved sets of chosen criteria that specify a subset of information in a data warehouse.

Foreign Key
A data element in one entity (or relation) that is the primary key of another entity and that
serves to implement a relationship between the entities.
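
Example: a minimal sketch of a foreign key using Python's built-in `sqlite3` module. The
`college` and `student` tables are hypothetical. Note that SQLite enforces foreign keys
only after `PRAGMA foreign_keys = ON`; other DBMSs typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE college (college_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        college_id INTEGER NOT NULL REFERENCES college (college_id)
    )
""")
conn.execute("INSERT INTO college VALUES (1, 'Engineering')")
conn.execute("INSERT INTO student VALUES (100, 1)")  # parent row exists

try:
    conn.execute("INSERT INTO student VALUES (101, 99)")  # there is no college 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```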

Granularity
The level of detail. The lower the granularity, the greater the detail. A file containing data on
every welfare client would represent a low level of granularity. A file containing summaries of
student applications on a county by county basis would be considered to represent a high
level of granularity.

Hierarchical Relationships
Any dimension's members may be organized based on parent-child relationships, typically
where a parent member represents the consolidation of the members which are its children.
The result is a hierarchy, and the parent/child relationships are hierarchical relationships.

Householding
A process of identifying a group of related parties.

Impact Analysis
The identification of the effects of change to the data warehouse environment.

Incremental Load
The propagation of changed data to a target database or data warehouse in which only the
data that has been changed since the last load is loaded or updated in the target.
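
Example: a minimal sketch of an incremental load driven by a last-modified timestamp. The
source rows, the target rows and the `last_load` date are hypothetical; a timestamp column
is only one of several ways to detect changed data.

```python
from datetime import datetime

# Hypothetical source rows keyed by id, each carrying a last-modified timestamp.
source = {
    1: {"name": "Smith", "modified": datetime(2004, 10, 1)},
    2: {"name": "Jones", "modified": datetime(2004, 10, 13)},
    3: {"name": "Lee", "modified": datetime(2004, 10, 14)},
}
target = {1: {"name": "Smith"}, 2: {"name": "Jonse"}}  # state after the previous load
last_load = datetime(2004, 10, 10)

def incremental_load(source_rows, target_rows, since):
    """Insert or update only the rows modified after the previous load."""
    changed = 0
    for key, row in source_rows.items():
        if row["modified"] > since:
            target_rows[key] = {"name": row["name"]}
            changed += 1
    return changed

rows_applied = incremental_load(source, target, last_load)
```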

Information
Data that a person has processed and evaluated to solve a problem or to make a decision.

Information Architecture
A blueprint of an enterprise expressed in terms of a business process model, showing what
the enterprise does; an enterprise information model, showing what information resources
are required; and a business information model, showing the relationships of the processes
and information.

Integrated
Data that is gathered into the data warehouse from a variety of sources and merged into a
coherent whole.

Joins
An operation performed on tables of data in a relational DBMS in which the data from two
tables is combined in a larger, more detailed joined table.
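
Example: a minimal sketch of a join using Python's built-in `sqlite3` module. The
`student` and `application` tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE application (app_id INTEGER PRIMARY KEY, student_id INTEGER, term TEXT);
    INSERT INTO student VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO application VALUES (10, 1, 'AU04'), (11, 1, 'WI05'), (12, 2, 'AU04');
""")

# Combine each application with the matching student row.
joined = conn.execute("""
    SELECT s.last_name, a.term
    FROM student AS s
    JOIN application AS a ON a.student_id = s.student_id
    ORDER BY a.app_id
""").fetchall()
```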

Knowledge base
A database where the codification of knowledge is kept.


Knowledge Worker
The role of individuals in which they use information in any form as part of their job function or
in the course of performing a process, whether operational or strategic.

Legacy Data
Data that comes from files and/or databases developed without using an enterprise data
architecture approach.

Load
The insertion of data into a database.

Logical Data Model
A logical model represents the business at the fully attributed, normalized, atomic level. The
logical model is not biased towards specific application requirements and does not reflect the
constraints of specific technology limitations or issues. The logical model serves as a source
for designing physical models of the system.

Mapping
Identifying that a column in a database (or a data element from a file) means the same thing
as a column (or a combination of columns) in another database (or a data element from
another file).

Metadata
Data that describes data in a system. Metadata includes physical information (this attribute is
numeric, contains two decimal places, etc.), definitions (this attribute represents the monthly
amount of assistance payments given to the client), business rules, etc. Metadata
allows the business user to understand data in the system. Metadata defines the availability,
content and context of the information in the Decision Support System. Metadata is not just
data about data: it also explains the business rules and constraints that determine how
the organization does business. This type of metadata, Informational Metadata, describes the
data in business terms. Such metadata simplifies data exploration, enabling business
users to comprehend and discover new ways to use the data. Besides Informational
Metadata there is also Technical Metadata, which describes the physical structure of the
object being defined, such as data type, format, length, etc.

Metadata Drift
The fact that data changes over time in ways that can affect the quality of data in the warehouse.

MOLAP
Multidimensional Online Analytical Processing. A set of applications and proprietary
databases that have a strong dimensional flavor.

MPP
Massively Parallel Processing. The shared-nothing approach to parallel computing.

Multidimensional Database
A database organized in terms of facts and dimensions (Star Schema and Snowflake
Schema).

Non-volatile
Data is stable in a data warehouse. More data is added but data is never removed. This
enables management to gain a consistent picture of the business.

Normalization
Organizing data in a manner that minimizes or eliminates redundancy.


ODBC
Open Database Connectivity. A standard for database access co-opted by Microsoft from the
SQL Access Group consortium.

OLAP
Online Analytical Processing. Technology that transforms data into multidimensional views
and that supports multidimensional data interaction, exploration, and analysis.

OLTP
Online Transaction Processing. OLTP describes the requirements for a system that is used in
an operational environment.

Operational Data Store (ODS)
Current data used for decision support by knowledge workers required to operate at the
tactical level. This data is extracted from the operational data, is transformed, and is stored
in a relational structure to support DSS analysis. The ODS system is often rebuilt on a daily
basis though some areas require rebuilds of the systems in shorter time intervals. The ODS
is a read-only system. An ODS may contain 30 to 60 days of information, while a data
warehouse typically contains years of data.

Operational Data
Operational data represents the data used in the daily business process. This data is at the
lowest level of detail, is often transaction oriented, and can be (and probably will be) changed
throughout the day. This data is current and not historical in nature.

Operational System
An operational system is a data system used to carry out daily business operations for the
organization. These systems tend to be application (function) oriented.

Owner's View
The perspective on systems development held by the person(s) who will use the system to
run the business (business users).

Parallel Processing Support
Computers that use multiple CPUs, where each processor can perform a part of the task, all
at the same time.

Primary Key
A column or combination of columns whose values uniquely identify a row or record in the
table. The primary key(s) will have a unique value for each record or row in the table.

Physical Data Model
A physical model defines the explicit structure of the physical database using a specific
database technology. The physical model may reflect the needs of specific applications. One
or more physical models will be generated from a logical model.

Query Governor
A facility that terminates a database query when it has exceeded a predefined threshold.
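
Example: a minimal sketch of a query governor built on the progress handler of Python's
built-in `sqlite3` module. The `run_governed` helper, the callback budget and the sample
queries are hypothetical; production DBMSs usually provide their own governor settings
(for example, limits on elapsed time or rows).

```python
import sqlite3

def run_governed(conn, sql, max_callbacks):
    """Run a query, aborting it once the work budget is exceeded.

    SQLite calls the handler every 1000 virtual-machine instructions;
    returning a non-zero value from the handler terminates the query.
    """
    calls = 0

    def governor():
        nonlocal calls
        calls += 1
        return 1 if calls > max_callbacks else 0

    conn.set_progress_handler(governor, 1000)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        return None  # the governor terminated the query
    finally:
        conn.set_progress_handler(None, 0)  # remove the handler

conn = sqlite3.connect(":memory:")
RUNAWAY_SQL = """
    WITH RECURSIVE n(i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM n WHERE i < 1000000)
    SELECT COUNT(*) FROM n
"""
quick = run_governed(conn, "SELECT 1", max_callbacks=10)
runaway = run_governed(conn, RUNAWAY_SQL, max_callbacks=10)
```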

Rationalization
A process of consolidating names and addresses.

Repository
A database containing data. Often used as a shorthand term for metadata repository
(a database containing all metadata).


Referential Integrity
Integrity constraints that govern the relationship of an occurrence of one entity type to one or
more occurrences of another entity type.

ROI
Return On Investment. A statement of the relative profitability generated as a result of a given
investment.

ROLAP (Relational OLAP)
A product that provides multidimensional analysis of data, aggregates and metadata stored in
an RDBMS. The multidimensional processing may be done within the RDBMS, a mid-tier
server or the client. A 'merchant' ROLAP is one from an independent vendor, which can work
with any standard RDBMS.

Rollback
The process of restoring data in a database to the state at its last commit point.
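
Example: a minimal sketch of a rollback using Python's built-in `sqlite3` module. The
`account` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()  # commit point: balance is 100

conn.execute("UPDATE account SET balance = balance - 500 WHERE id = 1")
uncommitted = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]

conn.rollback()  # discard every change made since the last commit
restored = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
```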

Scalability
The ability for a system to grow with increasing demands for storage space, processing
speed and network capacity.

Schema
The complete description of a database in terms of its entity types, attributes and
relationships or structure, or of an object base in terms of the definitions of types, classes,
attributes, operations and interfaces, or protocols.

SME
Subject Matter Expert. A business person who has knowledge and experience of a given
business subject or function. Also TSME (Technical Subject Matter Expert) and BSME
(Business Subject Matter Expert).

SMP
Symmetric Multiprocessing.

Slowly-changing dimensions
A dimension that changes very slowly. An example of a slowly changing dimension at OSU
would be student last name, which would change only in special circumstances.

Snapshot
A database dump or archiving of data as of some past moment in time.

Snowflake Schema
A snowflake schema is a set of tables comprised of a single, central fact table surrounded by
normalized dimension hierarchies. Each dimension level is represented in a table. A snowflake
schema implements dimensional data structures with fully normalized dimensions. Star
schemas are an alternative to snowflake schemas.

Source Database
An operational, production database or a centralized warehouse that feeds into a target
database.

Staging Area
The use of flat files and/or relational tables to store data extracted from the source system
during the transformation portion of the ETL process. The staging area is solely a work area
and is not to be used for queries or report generation.


Star Schema
A star schema is a set of tables comprised of a single, central fact table surrounded by
denormalized dimensions. Each dimension is represented in a single table. A star schema
implements dimensional data structures with denormalized dimensions. Snowflake schemas
are an alternative to star schemas. More generally, a star schema is a relational database
schema for representing multidimensional data. The data is stored in a central fact table,
with one or more tables holding information on each dimension. Dimensions have levels, and
all levels are usually shown as columns in each dimension table.
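
Example: a minimal sketch of a star schema using Python's built-in `sqlite3` module. The
table names (`dim_time`, `dim_geography`, `fact_payment`) and all rows are hypothetical.
Each dimension keeps its levels (month, quarter, year; city, region) as columns of one
table, and the fact table refers to the dimensions by key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized dimensions: every level is a column in one table.
    CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, month TEXT, quarter TEXT, year INTEGER);
    CREATE TABLE dim_geography (geo_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
    -- Central fact table keyed by the dimensions.
    CREATE TABLE fact_payment (
        time_id INTEGER REFERENCES dim_time (time_id),
        geo_id  INTEGER REFERENCES dim_geography (geo_id),
        amount  REAL
    );
    INSERT INTO dim_time VALUES (1, 'Sep', 'Q3', 2004), (2, 'Oct', 'Q4', 2004);
    INSERT INTO dim_geography VALUES (1, 'Columbus', 'Central'), (2, 'Cleveland', 'Northeast');
    INSERT INTO fact_payment VALUES (1, 1, 100.0), (2, 1, 50.0), (2, 2, 75.0);
""")

# Total the facts by two dimension levels: quarter and region.
by_quarter_region = conn.execute("""
    SELECT t.quarter, g.region, SUM(f.amount)
    FROM fact_payment AS f
    JOIN dim_time AS t ON t.time_id = f.time_id
    JOIN dim_geography AS g ON g.geo_id = f.geo_id
    GROUP BY t.quarter, g.region
    ORDER BY t.quarter, g.region
""").fetchall()
```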

Stress Test
A test to determine how many resources are consumed during different levels of input
processing.

Subject Area Data Model
The conceptual enterprise data model is the highest level of data model used in the
organization. This model is logical in nature, identifies the subject areas of the business and
identifies the major entities within each subject area. This model is used to manage and
control the scope and iterations of a DSS project.

Subject Oriented
Data that gives information about a particular subject instead of about a company's ongoing
operations.

Target Database
The database in which data will be loaded or inserted.

Third Normal Form (3NF)
A relation R is in third normal form (3NF) if and only if it is in second normal form (2NF) and
every non-key attribute is non-transitively dependent upon the primary key.
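
Example: a minimal sketch of removing a transitive dependency. The student and department
rows are hypothetical. In the flat rows, the department name depends on the department
code, which in turn depends on the student ID, so the flat relation is not in 3NF;
splitting it into two relations removes the transitive dependency.

```python
# Hypothetical rows: (student_id, dept_code, dept_name).
# dept_name depends on dept_code, which depends on student_id (the key):
# a transitive dependency, so this single relation is not in 3NF.
students_flat = [
    (1, "CSE", "Computer Science"),
    (2, "CSE", "Computer Science"),
    (3, "MATH", "Mathematics"),
]

def decompose(rows):
    """Split into two relations so each non-key attribute depends only on its key."""
    student = {sid: dept for sid, dept, _name in rows}      # student_id -> dept_code
    department = {dept: name for _sid, dept, name in rows}  # dept_code -> dept_name
    return student, department

student, department = decompose(students_flat)

# Rejoining the two relations reproduces the original rows (nothing is lost).
rejoined = [(sid, dept, department[dept]) for sid, dept in sorted(student.items())]
```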

Time-variant
All data in the data warehouse is identified with a particular time period.

Versioning
The ability for a single definition to maintain information about multiple physical instantiations.

View
Presentation of data from one or more tables.

Visualization
Presenting data visually to aid the knowledge worker in the recognition of patterns and
trends.
