
1. Which one among Star and snowflake schemas will occupy more space?

The star schema will occupy more space, since its tables are in denormalized form and hold many historical records with redundant values in some columns. It therefore requires more storage space.
2. What are Materialized Views?

In data warehousing, summaries are created in Oracle using a schema object called a materialized view. Materialized views can perform a number of roles, such as improving query performance or providing replicated data, as described below.

In data warehouses, materialized views can be used to precompute and store aggregated data such as the sum of sales. Materialized views in these environments are typically referred to as summaries, since they store summarized data.
They can also be used to precompute joins, with or without aggregations. So a materialized view is used to eliminate the overhead associated with expensive joins or aggregations for a large or important class of queries.
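
A minimal sketch (the sales and times tables and their column names are assumed for illustration):

-- Summary materialized view that precomputes monthly sales totals.
CREATE MATERIALIZED VIEW sales_by_month_mv
  BUILD IMMEDIATE
  REFRESH ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT t.calendar_month, SUM(s.amount_sold) AS total_sales
FROM   sales s, times t
WHERE  s.time_id = t.time_id
GROUP BY t.calendar_month;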

3. How many ways we test the universe & Report?

There are some basic ways of testing a universe:

1. The built-in Check Integrity feature, which offers several check options for parsing, loops, contexts, etc.

2. Creating a dummy report from the universe and verifying it against the back-end data.

The methods of testing reports are:

1. Data checking: tally the report data with the database data using a database client (such as a SQL Server query tool or TOAD) that lets you check the underlying data, whether the source is JDE, Teradata, DB2 or Oracle.

2. Integration testing: insert or update data through the UI and then check the impact on the report.

4. What is hybrid schema?

A combination of the star schema and the snowflake schema.


5. Can we have more than one time dimensions in a Schema (Star or Snowflake?)

Yes, it is possible in both star and snowflake schemas.

6. Tell me the top 10 features of Oracle that support data warehousing?

1. Materialized views
2. Bitmap indexes
3. Partitioning
4. Ranking (analytic functions)

7. Is it possible to have more than one row on the same table with the same foreign key?

Yes, it is possible.

E.g., the EMP table consists of many employee records, where each employee is assigned to one department number. Here the department number is the foreign key in the EMP table, and many rows can share the same value.

8. Display the dname wise maximum salary

select d.dname, max(e.sal)
from emp e, dept d
where e.deptno = d.deptno
group by d.dname;
(SAL is a column of EMP, so EMP must be joined to DEPT.)

9. What are conformed dimensions?

We always give date as an example of a conformed dimension, but what if it has a different format for different countries, say YYMMDD for Italy and MM-DD-YYYY for France? Are they then not conformed?

Conformed dimensions mean exactly the same thing with every possible fact table to which they are joined.

Dimension tables are not conformed if the attributes are labeled differently or contain different values.

The date dimension is still a conformed dimension even though it is presented in different formats, because the underlying values and meaning are the same.


10. What is Grain & Granularity in Data Warehouse?

Grain: the length of time (or other unit of meaning) associated with each record in the table.

Granularity is the lowest level of information stored in the fact table; the depth of the data level is known as granularity. In a date dimension the level of granularity could be year, quarter, month, period, week or day.

Determining the grain consists of the following two steps:

- Determining the dimensions that are to be included
- Determining the level within each dimension's hierarchy at which the information is placed

11. Are facts same as metrics and dimensions same as measures? If not, please tell the
difference?

Facts are usually metrics, i.e. facts can be used in metric reports since they are the means of measuring the data. For example: sales revenue, profit, margin, total count, etc.

Dimensions are not measures. They are, for example, objects like customer name, employee name, age, phone number, sales ID, etc.

Dimensions are the means of analysis; analysis is done on them. So they are entirely different from measures.

12. What is the difference between an Independent Data Mart and a Dependent Data Mart?

Independent data mart: it does not get its data from the data warehouse.
Dependent data mart: it gets its data from the data warehouse.

Independent Data Mart: it won't have shared dimensions; it is a stand-alone data mart.
Dependent Data Mart: it shares the common dimensions; it is part of an enterprise data warehouse.

13. What are aggregate fact tables? Why they are needed? Give an example


In aggregate Fact tables data is aggregated at some level.

For example, we have a fact table for sales in which data is stored at a daily level. We can build another sales fact table which holds the same data aggregated to the weekly level. If we have to report on a weekly basis we can use the aggregated fact table directly, which is more efficient.
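
As a sketch (sales_fact_daily, date_dim, sales_fact_weekly and their columns are assumed names), the weekly aggregate fact could be loaded like this:

-- Roll the daily-grain fact up to a weekly grain.
INSERT INTO sales_fact_weekly (week_key, product_key, store_key, sales_amount, units_sold)
SELECT d.week_key,          -- week-level key taken from the date dimension
       f.product_key,
       f.store_key,
       SUM(f.sales_amount),
       SUM(f.units_sold)
FROM   sales_fact_daily f,
       date_dim d
WHERE  f.date_key = d.date_key
GROUP BY d.week_key, f.product_key, f.store_key;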

14. Capital of Tamil Nadu?

This is a trick question: "capital" here refers to capital letters.
select upper('tamilnadu') from dual; (there is no TOUPPER function in Oracle; UPPER is used instead)
15. What's a Cube in Rolap?

A ROLAP cube is a data structure used for faster analysis of data. ROLAP is slower at analysis but large in capacity (multi-billion rows can be stored); good indexes can be used to improve its performance.

16. Which is the optimized method of extracting the data?

Delta loading is an optimized way to extract data. Doing an incremental load by tracking the last update date or modified date makes the ETL processing faster, since only new or changed rows are pulled. A sketch of such an extract is shown below.
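
As a sketch (source_orders, its last_update_dt column and the etl_control table are assumed names), a delta extract driven by the last successful load time might look like this:

-- Pull only the rows changed since the previous extract.
SELECT o.*
FROM   source_orders o
WHERE  o.last_update_dt > (SELECT last_extract_dt
                           FROM   etl_control
                           WHERE  table_name = 'SOURCE_ORDERS');

-- After a successful load, move the watermark forward.
UPDATE etl_control
SET    last_extract_dt = SYSDATE
WHERE  table_name = 'SOURCE_ORDERS';
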
17. If we increase the number of dimensions will the rows in fact table increase or decrease?

The number of rows in a fact table depends upon the grain of the fact. Dimensions only describe the data in the fact table. Adding more dimensions will not increase the number of rows unless it changes the grain of the fact. For example, if you have a fact at day granularity and you now add a time-of-day (time interval) dimension, this will increase the number of rows in the fact.

18. Where exactly in a data warehouse can a conformed dimension be created?

A conformed dimension, by definition, is a single dimension table that is connected to more than one fact table in your dimensional model.

So, if a dimension table is shared across two fact tables, for example, that dimension is called a conformed dimension.


19. What is staging area?

The staging area is nothing but a storage or parking area used to hold data from multiple sources. In it we transform, combine and cleanse the data, apply simple business logic and prepare the source data for the data warehouse.
20. What is a Data Mart?

A data mart is like a small data warehouse. It contains a subset of the data warehouse, focused on one particular subject area.
21. How Materialized Views are linked with Oracle Data Warehouse?

Materialized views are schema objects that can be used to summarize, precompute, replicate, and
distribute data. E.g. to construct a data warehouse.

A materialized view provides indirect access to table data by storing the results of a query in a separate schema object, unlike an ordinary view, which does not take up any storage space or contain any data.

The existence of a materialized view is transparent to SQL, but when it is used for query rewrite it improves the performance of SQL execution. An updatable materialized view lets you insert, update, and delete rows.

You can define a materialized view on a base table, partitioned table or view and you can define indexes
on a materialized view.

A materialized view can be stored in the same database as its base table(s) or in a different database.
Materialized views stored in the same database as their base tables can improve query performance
through query rewrites.
Query rewrites are particularly useful in a data warehouse environment.
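
For example, with a summary materialized view such as the sales_by_month_mv sketched under question 2 in place and rewrite enabled, a query written against the base tables can be answered from the materialized view transparently:

-- The user queries the (hypothetical) detail tables as usual...
SELECT t.calendar_month, SUM(s.amount_sold)
FROM   sales s, times t
WHERE  s.time_id = t.time_id
GROUP BY t.calendar_month;
-- ...and with QUERY_REWRITE_ENABLED set to TRUE, the optimizer may rewrite this
-- statement to scan the much smaller sales_by_month_mv instead of the detail rows.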

22. What is code review? What is an SRS, and how do we create an HLD and LLD?

Well, this question is not specific to data warehousing. In general, these are practices followed in software firms when starting any new project. Let's go over each topic:

SRS (Software Requirement Specification): this is the initial phase where the customer specifies its requirements to the software team. It is essentially a study of what the customer is asking for, capturing all the details of the issues faced by the customer and the proposed solutions.

HLD (High Level Design): this is the next design phase, where the inputs from the SRS are collected and transformed into a development approach. It lays down the roadmap for code development before actually jumping into development.

LLD (Low Level Design): this is the detailed design document. It gives all the code-level details and all issues pertaining to development: the development tools, how to resolve issues on the given platform, and so on.

Code Review: this is the final phase of development, where the developer's work and solution are reviewed. Usually managers, senior developers and product managers sit together, get a feel of the product and try to find any loopholes. If any are found, the developers are asked to revisit the code; otherwise the product is ready for the QA team to test and certify.
23. What is a hot file? Why do we need it in data warehousing, and where is it used in general?

A hot file is a data source from which we can develop .IMR (Cognos Impromptu) reports.

.IMR reports can be developed in two ways:
1. from a catalog
2. from hot files

With hot files we can generate reports using two data sources at a time, whereas a catalog does not support two data sources at a time.

24. What is the "junk dimension"? Give with examples?

Junk dimension: a junk dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. Simply put, a dimension of miscellaneous low-cardinality attributes is a junk dimension.

Consider a company whose fact table records trades taking place in a share-trading firm. There may be attributes like mode of trade (which indicates whether the user is trading by phone or online) that are not related to any of the dimensions such as account, date, indices or amount of shares.

These unrelated flags are removed from the fact table and stored in a separate dimension, the junk dimension, which is useful for providing extra information. A sketch is shown below.
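
A minimal sketch (table and column names are assumed): the junk dimension holds one row per combination of flag values, and the fact table references it through a single surrogate key.

-- Hypothetical junk dimension collecting unrelated low-cardinality flags.
CREATE TABLE trade_junk_dim (
  junk_key        NUMBER PRIMARY KEY,
  trade_mode      VARCHAR2(10),    -- e.g. 'PHONE' or 'ONLINE'
  is_intraday     CHAR(1),         -- 'Y' / 'N'
  settlement_flag CHAR(1)          -- 'Y' / 'N'
);

-- The trade fact table then carries a single junk_key column (a foreign key to
-- trade_junk_dim) instead of several unrelated flag columns.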

25. Tell me one example for multi valued dimension or bridge table?

A bridge table is nothing but a table with a multipart key. It maintains the relationship between a fact table and a dimension table when a single fact row is associated with multiple dimension values (a multi-valued dimension). A sketch is shown below.
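
As a sketch (all names are hypothetical), a classic case is a bank account held jointly by several customers; the bridge table resolves the many-to-many relationship, optionally with a weighting factor:

-- Bridge table between an account dimension and a customer dimension.
CREATE TABLE account_customer_bridge (
  account_key   NUMBER NOT NULL,   -- part of the multipart key, FK to account_dim
  customer_key  NUMBER NOT NULL,   -- part of the multipart key, FK to customer_dim
  weight_factor NUMBER(5,4),       -- e.g. 0.5 each for two joint holders
  PRIMARY KEY (account_key, customer_key)
);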


26. Could anyone tell me which schema (star and snow flake schema) will take much space?

The star schema takes more space than the snowflake schema, because the star schema is denormalized while the snowflake schema is normalized.
27. Difference between for each and for all

FOR EACH is treated as a "group by" in the generated SQL, while FOR ALL generates a summary on the basis of the element.

28. Explain the ROLLUP, CUBE, RANK and DENSE_RANK functions of Oracle 8i.

ROLLUP is used to calculate subtotals and a grand total based on the GROUP BY columns; it is used as an extension of the GROUP BY clause.

CUBE generates a result set that contains aggregates for all combinations of values in the selected columns.

RANK is used to assign ranks, but it skips ranks after a tie (tied rows get the same rank and the following rank is skipped).

DENSE_RANK does not leave gaps between the ranks. Examples are sketched below.
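
A short sketch against the classic EMP table (assuming its usual DEPTNO, JOB, ENAME and SAL columns) shows all four:

-- ROLLUP: subtotal per department plus a grand total.
SELECT deptno, SUM(sal)
FROM   emp
GROUP BY ROLLUP (deptno);

-- CUBE: aggregates for every combination of deptno and job, including totals.
SELECT deptno, job, SUM(sal)
FROM   emp
GROUP BY CUBE (deptno, job);

-- With salaries 5000, 3000, 3000, 2975:
--   RANK gives 1, 2, 2, 4 (a gap after the tie); DENSE_RANK gives 1, 2, 2, 3 (no gap).
SELECT ename, sal,
       RANK()       OVER (ORDER BY sal DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY sal DESC) AS drnk
FROM   emp;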

29. What is query optimizer? What are different types of optimizer s supported by oracle?

The query optimizer is the component of a database management system that optimizes queries.
Query optimization is a function of many relational database management systems in which multiple query plans for satisfying a query are examined and a good plan is chosen.

Oracle supports two query optimizers: a rule-based optimizer and a cost-based optimizer. As of Oracle Database 10g, the rule-based optimizer is desupported.

30. What are different types of partitioning in oracle?

In Oracle you can partition a table by the following methods (a range-partitioning sketch follows the list):


Range Partitioning
Hash Partitioning
List Partitioning
Composite Partitioning
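
A minimal range-partitioning sketch (table and column names are assumed):

-- Hypothetical sales fact partitioned by range on the sale date.
CREATE TABLE sales_fact_part (
  sale_date   DATE,
  product_key NUMBER,
  amount_sold NUMBER(12,2)
)
PARTITION BY RANGE (sale_date) (
  PARTITION p_2023 VALUES LESS THAN (DATE '2024-01-01'),
  PARTITION p_2024 VALUES LESS THAN (DATE '2025-01-01'),
  PARTITION p_max  VALUES LESS THAN (MAXVALUE)
);
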
31. Why oracle is the best database for data warehousing?

Oracle follows all 12 of E. F. Codd's rules, so it is considered a complete relational database, whereas SQL Server follows only about 10 of Codd's rules.

32. What is the difference between DSS and ODS?

ODS:
1. Designed to support operational monitoring.
2. Data is volatile.
3. Current data.
4. Detailed data.

DSS:
1. Designed to support the decision-making process.
2. Data is non-volatile.
3. Historical data.
4. Summary data.

33. What is degenerated dimension? Its uses?

Degenerate dimension: a column in the key section of the fact table that does not have an associated dimension table but is used for reporting and analysis. Such a column is called a degenerate dimension or line-item dimension.

For example, we have a fact table with customer_id, product_id, branch_id, employee_id, bill_no and date in the key section and price, quantity and amount in the measure section.
In this fact table, bill_no in the key section has no associated dimension table. Instead of creating a separate dimension table for that single value, we include it in the fact table to improve performance.

So here the column bill_no is a degenerate dimension or line-item dimension, as sketched below.
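
A minimal sketch of such a fact table (the names come from the example above; data types are assumed):

CREATE TABLE sales_fact_bill (
  customer_id NUMBER,        -- FK to the customer dimension
  product_id  NUMBER,        -- FK to the product dimension
  branch_id   NUMBER,        -- FK to the branch dimension
  employee_id NUMBER,        -- FK to the employee dimension
  bill_no     VARCHAR2(20),  -- degenerate dimension: no dimension table of its own
  sale_date   DATE,          -- FK to the date dimension
  price       NUMBER(12,2),
  quantity    NUMBER,
  amount      NUMBER(12,2)
);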

34. Difference between ODS and Staging

An operational data store (ODS) is a type of database often used as an interim area for a data
warehouse. Unlike a data warehouse, which contains static data, the contents of the ODS are updated
through the course of business operations.



An ODS is designed to quickly perform relatively simple queries on small amounts of data (such as
finding the status of a customer order), rather than the complex queries on large amounts of data typical
of the data warehouse.
An ODS is similar to your short term memory in that it stores only very recent information; in
comparison, the data warehouse is more like long term memory in that it stores relatively permanent
information.

In staging, on the other hand, we store current as well as historical data. This data may be raw and typically needs cleansing and transformation before it is loaded into the data warehouse.

35. Suppose data are coming from different locations and those data will not change. Is there any need to
use surrogate key?

Yes, we should use a surrogate key. Here we are getting data from different locations, and each source has its own primary key; when the data is transformed into the target, those source keys cannot be relied on as a single identifier, so a surrogate key is used to uniquely identify each row and to detect duplicate records in the dimension table. A sketch is shown below.
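
A minimal sketch (names assumed): the dimension keeps a warehouse-generated surrogate key while retaining each source's natural key, so rows arriving from different locations can still be matched and duplicates detected.

CREATE SEQUENCE customer_dim_seq;

CREATE TABLE customer_dim (
  customer_key   NUMBER PRIMARY KEY,   -- surrogate key generated in the warehouse
  source_system  VARCHAR2(20),         -- which location the row came from
  source_cust_id VARCHAR2(30),         -- natural key from that source
  customer_name  VARCHAR2(100),
  CONSTRAINT customer_dim_uk UNIQUE (source_system, source_cust_id)
);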

36. What is the difference between aggregate table and fact table? How do you load these two
tables?

A fact table typically has two types of columns: those that contain numeric facts (often called
measurements), and those that are foreign keys to dimension tables. A fact table contains either detail-
level facts or facts that have been aggregated.
Fact tables that contain aggregated facts are often called summary tables or aggregated fact. A fact table
usually contains facts with the same level of aggregation.
Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be aggregated by simple arithmetic addition; a common example is sales amount. Non-additive facts cannot be added at all; an example is an average. Semi-additive facts can be aggregated along some dimensions and not along others; an example is inventory level, which can be summed across products or stores but not across time.

37. What is source qualifier?

Source qualifier is a basic transformation which helps us to extract data from different sources


38. What Oracle features can be used to optimize my Warehouse system?


Partitioned tables, bitmap indexes, sequences, table functions, SQL*Loader, and aggregate extensions such as CUBE and ROLLUP, among others.

39. How can Oracle Materialized Views be used to speed up data warehouse queries?

A materialized view is a database object that contains the results of a query. Materialized views are local copies of data located remotely, or are used to create summary tables based on aggregations of a table's data. Materialized views that store data based on remote tables are also known as snapshots.

A materialized view can query tables, views, and other materialized views. Collectively these are called
master tables (a replication term) or detail tables (a data warehouse term).

For data warehousing purposes, the materialized views commonly created are aggregate views, single-
table aggregate views, and join views.

40. What is the difference between Oracle Express and Oracle Discoverer?

Oracle Express is a multidimensional database, whereas Oracle Discoverer is a reporting tool.

41. When should you use a STAR and when a SNOW-FLAKE schema?

The snowflake and star schema are methods of storing data which are multidimensional in nature (i.e.
which can be analyzed by any or all of a number of independent factors) in a relational database.

The snowflake schema (sometimes called snowflake join schema) is a more complex schema than the
star schema because the tables which describe the dimensions are normalized.

A snowflake schema is nothing but a schema in which one dimension table is connected to another dimension table, and so on.
------------
Snowflake
------------
- If a dimension is very sparse (i.e. most of the possible values for the dimension have no data) and/or a dimension has a very long list of attributes which may be used in a query, the dimension table may occupy a significant proportion of the database and snowflaking may be appropriate.

- A multidimensional view is sometimes added to an existing transactional database to aid reporting. In this case, the tables which describe the dimensions will already exist and will typically be normalized. A snowflake schema will hence be easier to implement.


- A snowflake schema can sometimes reflect the way in which users think about data. Users may prefer to generate queries using a star schema in some cases, although this may or may not be reflected in the underlying organization of the database.

- Some users may wish to submit queries to the database which, using conventional multidimensional reporting tools, cannot be expressed within a simple star schema. This is particularly common in data mining of customer databases, where a common requirement is to locate common factors between customers who bought products meeting complex criteria. Some snowflaking would typically be required to permit simple query tools such as Cognos PowerPlay to form such a query, especially if provision for these forms of query wasn't anticipated when the data warehouse was first designed.

---------
Star
----------
The star schema (sometimes referenced as star join schema) is the simplest data warehouse schema,
consisting of a single "fact table" with a compound primary key, with one segment for each "dimension"
and with additional columns of additive, numeric facts.

The star schema makes multi-dimensional database (MDDB) functionality possible using a traditional
relational database. Because relational databases are the most common data management system in
organizations today, implementing multi-dimensional views of data using a relational database
is very appealing.
Even if you are using a specific MDDB solution, its sources likely are relational databases.
Another reason for using star schema is its ease of understanding.
Fact tables in star schema are mostly in third normal form (3NF), but dimensional tables are in de-
normalized second normal form (2NF). If you want to normalize dimensional tables, they look like
snowflakes (see snowflake schema) and the same problems of relational databases arise - you need
complex queries and business users cannot easily understand the meaning of data.

Although query performance may be improved by advanced DBMS technology and hardware, highly
normalized tables make reporting difficult and applications complex.

42. What is a star schema? Why does one design this way?

A star schema consists of a fact table surrounded by one or more dimension tables, forming a star-like structure. The primary keys of the dimension tables connect to the fact table as foreign keys. Data is stored in a denormalized manner. We go for a star schema for better query performance; a minimal sketch is shown below.
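
A minimal star schema sketch (all table and column names are illustrative):

-- One dimension plus a fact table whose foreign key points at it.
CREATE TABLE product_dim (
  product_key  NUMBER PRIMARY KEY,
  product_name VARCHAR2(100),
  category     VARCHAR2(50)
);

CREATE TABLE sales_fact_simple (
  product_key  NUMBER REFERENCES product_dim (product_key),
  date_key     NUMBER,          -- FK to a date dimension (omitted here)
  sales_amount NUMBER(12,2),
  units_sold   NUMBER
);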

43. When should one use an MD-database (multi-dimensional database) and not a relational one?


1. More than one dimension can be shared with other departments.

2. The physical load will be less.

3. The fact is less complex.

44. What is the difference between an ODS and a W/H?

An ODS is an environment that pulls together, validates, cleanses and integrates data from disparate
source application systems.
This becomes the foundation for providing the end-user community with an integrated view of enterprise
data to enable users anywhere in the organization to access information for strategic and/or tactical
decision support, day-to-day operations and management reporting.

The definition of Data Warehouse is as follows.

- Subject-oriented, meaning that the data in the database is organized so that all the data elements relating to the same real-world event or object are linked together;

- Time-variant, meaning that changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time;

- Non-volatile, meaning that data in the database is never over-written or deleted, but retained for future reporting;

- Integrated, meaning that the database contains data from most or all of an organization's operational applications, and that this data is made consistent.

Difference
------------
ODS: Transactions similar to those of an online transaction processing system.
Data Warehouse: Queries process larger volumes of data.

ODS: Contains current and near-current data.
Data Warehouse: Contains historical data.

ODS: Typically detailed data only, often resulting in very large data volumes.
Data Warehouse: Contains summarised and detailed data, generally smaller in size than an ODS.

ODS: Real-time and near real-time data loads.
Data Warehouse: Typically batch data loads.

ODS: Generally modeled to support rapid data updates.
Data Warehouse: Generally dimensionally modeled and tuned to optimize query performance.

ODS: Updated at the data field level.
Data Warehouse: Data is appended, not updated.

ODS: Used for detailed decision making and operational reporting.
Data Warehouse: Used for long-term decision making and management reporting.

ODS: Audience of knowledge workers (customer service representatives, line managers).
Data Warehouse: Strategic audience (executives, business unit management).

45. What is the difference between a data warehouse and a data mart?

Data warehouse: it is a collection of data marts and represents historical data.

Data mart: it is a subset of the data warehouse. It provides data for query, reporting and analysis on a particular subject.

46. What is an ETL/ how does Oracle support the ETL process?

ETL stands for Extract, Transform and Load.
In many Oracle-based warehouses the entire ETL process consists of SQL scripts executed against the database; Oracle supports this style of ETL with features such as external tables, MERGE and multi-table INSERT. A small sketch follows.
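
As a sketch (the staging table, dimension table and sequence names are assumed), a typical SQL-based "transform and load" step uses MERGE to insert new rows and update changed ones:

-- Upsert customer rows from a staging table into a warehouse dimension.
MERGE INTO customer_dim d
USING stg_customers s
ON (d.source_cust_id = s.cust_id)
WHEN MATCHED THEN
  UPDATE SET d.customer_name = s.cust_name
WHEN NOT MATCHED THEN
  INSERT (customer_key, source_cust_id, customer_name)
  VALUES (customer_dim_seq.NEXTVAL, s.cust_id, s.cust_name);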

47. What is a Data Warehouse?

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.

48. Explain in simple terms, the concept of Data Mining.

Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both.
Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze
data from many different dimensions or angles, categorize it, and summarize the relationships identified.
Technically, data mining is the process of finding correlations or patterns among dozens of fields in large
relational databases.

49. What is a Data Cube?

Data cube is the logical representation of multidimensional data. The edge of the cube contains
dimensions and the body of the cube contains data.

50. How does data mining and data warehousing work together?

Data warehousing is used to store historical data; using the data warehouse, business users can analyze their business.

Data mining is used to predict the future, and the data warehouse acts as the source for data mining.

51. What are the different stages of data mining?

- Exploration
- Pattern identification
- Deployment

52. What is MDX?

MDX stands for Multidimensional Expressions. MDX is a query language for multidimensional (OLAP) databases.
53. What is META DATA information in Data warehousing projects?

META-DATA-> DATA-ABOUT-DATA IN THE SENSE ETL REPOSITORIES WILL STORE METADDATA
OF ALL THE TABLES/SCHEMAS TO MPROCESS DATA FROM SOURCE TO TARGET.
SO IT WILL STORE THE TABLE NAME, COLUMN TYPE/LENGTH/NULL/KEY OR NOT IN SUCH A
WAY MATADATA WILL BE STORD IN ETL REPOSITORIES./

54. What are CUBES?

Cubes are also called as business process. Cube is the logical representation of multidimensional data.
The edge of the cube contains dimension and the body of the cube contain data.

55. What is Data purging?

Data purging is nothing but deleting data from the data warehouse.

Sometimes, while loading data into a staging or target table, you may need to load fresh data every time (called a full load). In this case you need to purge the complete stage/target table before loading it with fresh data.

56. What is Conceptual, Logical and Physical model?

Conceptual Data Model
During the Planning phase of the project, the conceptual data model is created to capture the high-level
data requirements for the project.


Since the model captures the highlights of the client's information needs, it is the only model that effectively reflects the enterprise level.

Depending on the requirements, the enterprise-wide vision may need to be emphasized to help guide the client in the development of an overall data warehousing strategy. Detail models that reflect the project's scope will be created during logical and physical data modeling.

The conceptual data model is the precursor to the logical data model; it is not tied to any particular
solution or technology.

Entities, relationships, major attributes, and metadata across functional areas are included. During
successive releases, the conceptual data model should be validated and updated if necessary. An
enterprise should have only one conceptual data model.

Logical Data Model
During the design phase of the project, the logical data model is created for the scope of the complete
project. A portion of the conceptual data model will be fully attributed and completed as the logical data
model.

The logical data model reflects the technology to be used. In today's environment, this typically means either a relational DBMS or a multidimensional tool. But if the client is using an older DBMS such as IMS or IDMS, the logical model will be quite different from what it would be if an RDBMS were used.

The logical data model reflects a logical data design that can be used by the developers on the project.
For an RDBMS, that means logical tables (views) and columns.

Physical Data Model
Like the logical data model, the physical data model is created during the design phase. This modeling activity should reflect the scope of the specific release of the project. The model's final design will be highly dependent on the technical solution chosen for the data warehouse.

The purpose of this model is to capture all the technical details required to produce the final tables, and
physical constructs such as indexes and table partitions. The logical data model will serve as a blueprint
to the project team while the physical data model is a blueprint for the DBAs.

All the functionality reflected in the logical data model should be preserved while creating the physical
data model. The generated table schemas will be identical to the physical data model.


57. Why do we use DSS database for OLAP tools?

OLAP or reporting tools can be connected to only one database. If we need a report combining data from more than one database, such as Oracle, SQL Server and DB2, then all the data is put into a data warehouse database; the OLAP tool can then be connected to this data warehouse database and the necessary reports can be generated.



58. What is subject area?

Subject area in a data model is a layer where we depict the business process using the entities and
relationships. So each subject area will capture a business process.


59. What is update strategy and what are the options for update strategy?

Update strategy allows us to decide what to do with the data (INSERT, DELETE, UPDATE or REJECT) based on defined conditions. We can achieve this through the mapping designer using the Update Strategy transformation, or at the session level by using the "Treat source rows as" option.

60. What is data merging, data cleansing and sampling?

Data cleansing means removing duplicate data; data merging means combining data from SQL Server, Oracle and other sources and loading it into the data warehouse; sampling means taking a representative subset of the data for analysis or testing.

61. What is staging area?

A staging area is where data is processed before entering into the data warehouse. You need to clean
and process the data before putting it into a warehouse. Cleansing means removing extra spaces,
deleting null values from primary key columns, etc.

62. What is difference between a connected look up and unconnected look up?

Connected lookup:
- Receives input values directly from the pipeline.
- Can use a dynamic or a static cache.
- The cache includes all lookup columns used in the mapping.
- Supports user-defined default values.

Unconnected lookup:
- Receives input values from the result of a :LKP expression in another transformation.
- Can use only a static cache.
- The cache includes all lookup/output ports in the lookup condition and the lookup/return port.
- Does not support user-defined default values.



63. What is query panel?

The query panel is a window in Business Objects where you pull in the objects and measures that you need to work with for your reports.

64. What is mapplet?

A mapplet is a group of transformations that is reusable across different mappings. It is created in the Designer tool using the Mapplet Designer.

65. What is DTM session?

DTM stands for Data Transformation Manager.

It is a process that is started for every session run. It creates the main thread for running the session, and the main thread creates separate threads for reading the source, transforming the data, writing to the target, and running pre- and post-session tasks.

66. Advantages of de normalized data?

Denormalization is used for fast data retrieval, as it reduces the number of joins to be performed in a select query.

It is the reverse of normalization: a normalized database is optimized for DML operations (insert, update, delete), whereas a denormalized one favors reads.

67. What are cursors?

A cursor is a handle or name for a SQL area. There are two types of cursors:

Explicit cursor: created by the user.
Implicit cursor: created by Oracle automatically whenever DML is performed.

A short explicit-cursor sketch is shown below.
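
A minimal PL/SQL sketch of an explicit cursor over the standard EMP table (assuming its usual ENAME and SAL columns):

DECLARE
  CURSOR c_emp IS
    SELECT ename, sal FROM emp ORDER BY sal DESC;   -- explicit cursor declared by the user
BEGIN
  FOR r IN c_emp LOOP                               -- the FOR loop opens, fetches and closes the cursor
    DBMS_OUTPUT.PUT_LINE(r.ename || ': ' || r.sal);
  END LOOP;
END;
/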

68. What is aggregate awareness?


Aggregate awareness is a function in Business Objects with which we can get information from aggregate tables in the database instead of the detail tables, improving query performance.



69. What is normalization?
What is a trigger? (A special type of stored procedure, executed when a certain event occurs.)
What does the FROM clause do?
Can you use an UPDATE clause and a SELECT clause together in the same query?
What is the difference between a unique key and a primary key?
What is referential integrity?
What is the use of the AS clause?
What does locking refer to?
Why is the ORDER BY clause used?
What are clustered and non-clustered indexes?
What is the difference between DELETE, DROP and TRUNCATE? (DELETE deletes rows based on a WHERE clause; DROP deletes the table structure; TRUNCATE deletes all the rows in a table without affecting the table structure.)
