International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org  Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 3, Issue 2, March-April 2014  ISSN 2278-6856

A Comparative Study of Data Analysis Techniques

Prateek Bihani and S. T. Patil
Pune University, Vishwakarma Institute of Technology, Bibewadi, Pune 411037, India

Abstract: Data analysis provides the critical link between good decision making and success. It is used primarily for prediction and identification, guided by the rules of evidence of falsifiability, validity and parsimony. Existing data analysis tools are collections of data analysis methodologies that require expert users to choose the correct methodology. However, many business users want to apply data analysis to business data to understand trends, make predictions and improve their business decisions. It is therefore important to know which data analysis technique should be applied when, and to which kind of data. This paper presents a comparative study of data analysis techniques and highlights the advantages, disadvantages and applications of each technique. It also describes some of the key mistakes that should be avoided when applying data analysis techniques.

Keywords: Data Analysis, Conjoint Analysis, Factor Analysis, Discriminant Analysis, Cluster Analysis, Structural Equation Modeling, Regression Analysis, Decision Science

1. INTRODUCTION
Analysis of data is a process of inspecting, cleaning,
transforming, and modelling data with the goal of
highlighting useful information, suggesting conclusions,
and supporting decision making. Data analysis has
multiple facets and approaches, encompassing diverse
techniques under a variety of names, in different business,
science, and social science domains.
Data mining is a particular data analysis technique that
focuses on modelling and knowledge discovery for
predictive rather than purely descriptive purposes.
Business intelligence covers data analysis that relies
heavily on aggregation, focusing on business information.
In statistical applications, some people divide data
analysis into descriptive statistics, exploratory data
analysis (EDA), and confirmatory data analysis (CDA).
EDA focuses on discovering new features in the data and
CDA on confirming or falsifying existing hypotheses.
Predictive analytics focuses on application of statistical or
structural models for predictive forecasting or
classification, while text analytics applies statistical,
linguistic, and structural techniques to extract and classify
information from textual sources, a species of
unstructured data. All are varieties of data analysis. Data
integration is a precursor to data analysis, and data
analysis is closely linked to data visualization and data
dissemination.
The available data analysis tools are mostly a collection of
data analysis methods that require experts as users. The
users need domain knowledge and also need to know
which data analysis methods have to be applied to a given
problem and which technique meets the requirements for
the solution. The expert should also know how the data
has to be prepared for the chosen technique and finally,
how the technique needs to be configured.
Business users require a much more user- or problem-oriented approach to data analysis. Rather than knowing analysis methods, they are experts in the data domain and they know what they want to achieve with data analysis, if only they knew how. They might know, for example, that they want to classify insurance claims as fraudulent or non-fraudulent, given historical information about the customer and the current case. They might want to understand how the analysis method actually classifies customers (e.g. with a rule set), they might require a certain classification accuracy, or they might require that the algorithm be simple enough to be implemented as an SQL query. Ideally, such users would simply like to feed all these high-level requirements and the data into a tool that would then automatically find the best algorithm in terms of the requirements, configure it, run it and create a software module that can be plugged into the business application [1].
In this paper, we focus on a way to select the most
appropriate data analysis algorithm given a problem
definition, a set of requirements and a data file.
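
As a toy illustration of such a requirement-driven selection step, the sketch below maps a problem type and a few solution preferences onto candidate techniques from Section 4. The requirement names and the mapping itself are illustrative assumptions, not the method of [1].

```python
# Illustrative sketch (not the method of [1]): map high-level requirements
# onto candidate data analysis techniques discussed in this paper.

def suggest_techniques(problem_type, needs_explanation=False, has_latent_variables=False):
    """Return a list of candidate techniques for a given analysis problem."""
    candidates = {
        "classification": ["Discriminant Analysis", "Logistic Regression"],
        "function_approximation": ["Regression Analysis"],
        "clustering": ["Cluster Analysis (e.g. K-Means)"],
        "dimension_reduction": ["Factor Analysis"],
        "preference_analysis": ["Conjoint Analysis"],
        "causal_modeling": ["Structural Equation Modeling"],
    }.get(problem_type, [])

    # Preferences further filter or re-rank the candidate list.
    if needs_explanation:
        # Functional or rule-like models are easier to explain to business users.
        candidates = [c for c in candidates if "Regression" in c or "Discriminant" in c] or candidates
    if has_latent_variables:
        candidates = ["Structural Equation Modeling"] + candidates
    return candidates


if __name__ == "__main__":
    print(suggest_techniques("classification", needs_explanation=True))
```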

2. NEED FOR DATA ANALYSIS MODELS
As companies adopt analytics as the new science of winning, organizations will need to focus both on the creation and consumption of insights to enable better decisions. There is a need for data analysis models due to the following reasons:
a. The business problem is not clear: In a rush to jump on the analytics bandwagon, business practitioners often forget that the business problem needs to be well-defined for the analytics solution to be relevant to the problem at hand.
b. Appropriate stakeholder(s) are not involved: If a firm is using analytics to design a promotion campaign for a certain product, the demand planning teams need to know what's changing to get the product on the shelves. Like any project team, the right stakeholders need to be involved at the right time. This is especially true when multiple functional groups are involved in a specific business problem.
c. Mystery math: With the explosion in data and the availability of technologies that bring applied math to the analytics workbench, analytics practitioners begin to regard the technical analysis as an end in itself. Mathematical techniques are only tools for solving the business problem at hand, not ends in themselves.
d. The right expectations are not set: Sophisticated mathematical techniques are often expected to act as magic wands, solving any and every problem at hand. More often than not, this creates unreasonable expectations. As the key sponsor of a failed forecasting project famously asked, "Why should there be any error in the forecast if you have used sophisticated mathematical techniques?" This was clearly a case of mismatched expectations: it was never communicated to the executive that no mathematical technique, however sophisticated, could accurately predict the future.
e. Lack of continuity: As basic a management principle as
it may sound, the best analytics ideas tend to lose
advantage and diminish in value, due to a variety of
reasons ranging from internal organization changes to
getting lost in the shuffle of organizational initiatives.
f. Losing relevance: Analytics needs to be extremely agile
to keep up with changing business priorities. Quite
often, the quest for the perfect mathematical technique
delays the solution to an extent that it is rendered
irrelevant. For example, a launch pricing analysis is
irrelevant after the product launch has already
happened.
g. Bridging the Chasm: It is becoming increasingly
apparent that investing in the creation of analytics
alone does not guarantee effective consumption of
analytics by businesses. To truly leverage analytics as a
competitive differentiator, companies will need to
ensure that the consumption cycle is tightly integrated
with the creation of analytics.
The creation of insights requires a holistic perspective of
Descriptive Analytics, Inquisitive Analytics, Predictive
Analytics and Prescriptive Analytics:
a. Descriptive analytics answers the question "What happened in the business?" It looks at data and information to describe the current business situation in a way that trends, patterns and exceptions become apparent.
b. Inquisitive analytics answers the question "Why is something happening in the business?" It is the study of data to validate or reject business hypotheses.
c. Predictive analytics answers the question "What is likely to happen in the future?" It is data modeling to determine future possibilities.
d. Prescriptive analytics is the combination of the above to provide answers to the "so what?" and "now what?" questions. For example, what should I do to retain my key customers? How do I improve my supply chain to enhance service levels while reducing my costs?

The type of analysis problem restricts the list of applicable data analysis techniques. By the term analysis problem, we mean whether it is a classification problem, a function approximation problem such as time series prediction, a clustering problem, a problem of finding dependencies or associations, and so on. The second category of requirements is concerned with preferences regarding the solution. These comprise properties such as the accuracy and simplicity of the solution, whether the method is adaptable to new data, whether it offers an explanation facility (such as rule-based systems or functional models like linear regression) and how simple the explanation should be. Finally, the data itself might constrain the applicability of methods. The number of data records, for example, might be too small for some statistical methods, or, more generally, some methods might cope better with certain types of data than others. Depending on the type of user, the level at which the requirements of the problem are defined will vary considerably. Some users may understand the difference between function approximation and classification, while others may not. Thus, there is a need for hierarchical approaches where requirements are iteratively mapped onto lower-level requirements until the lowest level is reached.


Figure 1 Types of Data Analysis

To choose the analysis methods, the requirements have to
be mapped onto properties of the methods and the various
stages or steps of data analysis have to be followed.

3. STAGES OF DATA ANALYSIS
Typically, there are five stages of data analysis viz.
Narrative, Coding, Interpretation, Confirmation, and
Presentation.
a. Narrative: Review research questions; Write some history; Describe a social process; Create summaries of interviews; Describe functions/structures of a group; Write up critical events chronologically; Make a list of important facts. [2] Connect to your own experience. [3] Read written descriptions. [4] Relate participants' stories to your own experience; Locate self in the story as related to participant(s); Look at how participants speak about self and their world. [5] Make metaphors; Note reflections on collected data. [6]
b. Coding: Create vignettes; Create a conceptual framework. [2] Identify data patterns; Extend analysis by asking questions derived from the data. [3] Develop meaning from the statements; Organize meanings into clusters of themes. [4] Break down text transcripts into overlapping themes and sub-themes; Organize data in different ways to tap into different dimensions of data sets. [5] Note patterns and themes; Cluster; Partition variables; Subsume particulars into the general; Factor; Note relations between variables; Find intervening variables; Follow up surprises; Develop codes and apply them to textual data; Identify patterns, themes and relationships between themes; Conduct an investigation of common/different aspects; Categorize and sort data; Order and reorder data by chronology, importance, frequency. [6]
c. Interpretation: Develop a metaphor; Look at the theoretical framework; Review relevant theories; Engage in speculation; Look for relevance to program/policy; Evaluate the project. [2] Relate to theory; Refocus on the basis of your tradition/discipline; Evaluate against a standard or against participants' interpretations; Position results in a broader analytic framework; Make inferences using inductive reasoning; Flesh out the analytical framework. [3] Extract significant statements related to the phenomenon under study. [4] Shift focus from individual cases to groups. [5] Look for plausibility; Build a logical chain of evidence; Make conceptual/theoretical coherence; Weight evidence; Check the meaning of outliers; Use extreme cases; Make if-then tests; Develop interpretation of findings; Contrast data to determine what fits your assumptions or others' findings; Develop hunches; Re-state questions to fit the data. [6]
d. Confirmation: Contrast insider views with outsider views. [2] Critique the research process; Report systematic fieldwork procedures; Propose a redesign of the study; Stop when you come to the end, asking what needs to be done next; Compare to a known case; Analyze the interpretive process. [3] Use member checks to validate the written description; Use numbers to document, verify, and test interpretations. [4] Look at one's assumptions. [5] Triangulate; Count; Make contrasts and comparisons; Check for representativeness; Check for researcher effects; Look for negative evidence; Replicate a finding; Check out rival explanations; Get feedback from participants; Verify interpretations by member checks, peer review and triangulation; Constantly compare earlier data with later data using different bases for comparison. [6]
e. Presentation: Consider the audience; Draw visual displays; Write in narrative form, borrowing form from participants. [2] Emphasize important data; Take suggestions from editors/committee/colleagues; Display findings graphically; Explore alternative formats for presentation. [3] Use data analysis results to write an exhaustive description. [4] Write up results in case study form. [5] Use visual displays. [6]


Figure 2 Levels of Data Analysis

To solve a data analysis problem, one should follow each stage or level. It is important to understand what is required as an output from the data analysis; based on the desired output, one can decide up to which stage or level one has to go.

4. TECHNIQUES OF DATA ANALYSIS
A. Conjoint Analysis / Choice Modeling:
Definition - Allows consumers' preferences for a product or service to be broken down into trade-offs among its individual attributes for the context in which overall judgments are made. Conjoint analysis, a popular multi-attribute preference assessment technique used in market research, is a well-suited tool to evaluate a multitude of gamut mapping algorithms simultaneously [7]. The objective of conjoint analysis is to determine what combination of a limited number of attributes is most preferred by consumers. Conjoint analysis is a multi-attribute decompositional model.
Application - Optimizing product configurations; studying price elasticities of demand; simulating market response to new or modified offerings; diagnosing competitive strengths and weaknesses.
Pros - Of all survey research techniques, this most closely replicates the real-world purchase process. It is flexible, as it can run "what if" scenarios, including scenarios not explicitly tested. Great for new product development and pricing. Conjoint analysis helps determine the optimal features of a product or service.
Cons - Models preference share rather than market share. There are limits to the number of features that can be included in a study.
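
As a hedged illustration of part-worth estimation, the sketch below fits dummy-coded ordinary least squares to invented profile ratings, assuming pandas and statsmodels; the attributes, levels and data are hypothetical and this is only one of several estimation approaches used in practice.

```python
# Illustrative part-worth estimation for a tiny conjoint study (hypothetical data).
import pandas as pd
import statsmodels.api as sm

# Each row is one product profile rated by a respondent (invented data).
profiles = pd.DataFrame({
    "brand":  ["A", "A", "B", "B", "A", "B"],
    "price":  ["low", "high", "low", "high", "high", "low"],
    "rating": [9, 6, 7, 3, 5, 8],
})

# Dummy-code the attribute levels; the dropped level of each attribute is the baseline.
X = pd.get_dummies(profiles[["brand", "price"]], drop_first=True).astype(float)
X = sm.add_constant(X)

# OLS coefficients are interpreted as part-worth utilities relative to the baseline levels.
model = sm.OLS(profiles["rating"], X).fit()
print(model.params)
```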

B. Factor Analysis:
Definition - Identifies a set of underlying dimensions
(Factors) within a set of variables, revealing unobserved
structure in the data. Factor analysis is applied as a data
reduction or structure detection method (the term factor
analysis was first introduced by Thurstone [8]). Factor
Analysis is an exploratory multivariate statistical method.
It is used to summarize the information contained in a
International Journal of EmergingTrends & Technology in Computer Science(IJETTCS)
Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 3, Issue 2, March April 2014 ISSN 2278-6856


Volume 3, Issue 2 March April 2014 Page 98


large set of variables in terms of a smaller set of
composite variables, called FACTORS.


Figure 3 Percentage of Variation explained by the factors

Application - Reducing the number of variables for
analysis; Identifying conceptual or benefit dimensions
underlying expressed product perceptions and
preferences.
Pros - Simplifies large or complex sets of
variables/attributes. Can be used to understand how the
customer thinks. Commonly used on subjective measures
such as attitudes, beliefs, and product attribute ratings.
Cons - Subjective interpretation of the results is a component. It is often a companion to other analyses such as segmentation, rather than an end in itself.
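
A minimal sketch of data reduction with factor analysis, assuming scikit-learn and invented attribute-rating data; the number of factors is fixed arbitrarily here, whereas in practice it would be guided by explained variance or a scree plot such as Figure 3.

```python
# Minimal factor analysis sketch on invented attribute-rating data.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# 200 respondents rating 6 product attributes, generated from 2 latent drivers.
latent = rng.normal(size=(200, 2))
loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],
                     [0.1, 0.9], [0.0, 0.8], [0.2, 0.7]])
ratings = latent @ loadings.T + 0.3 * rng.normal(size=(200, 6))

fa = FactorAnalysis(n_components=2, random_state=0)
scores = fa.fit_transform(ratings)   # factor scores per respondent
print(fa.components_)                # loadings of each attribute on the two factors
```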

C. Discriminant Analysis:
Definition - Examines how two or more groups
(generally respondents) differ from one another on the
basis of a number of predictor variables. Discriminant
Analysis works by combining variables in such a way that
the differences between predefined groups are maximized.
Linear discriminant analysis (LDA) and biased
discriminant analysis (BDA) are two effective techniques
for dimension reduction, which pay attention to different
roles of the positive and negative samples in finding
discriminating subspace. [9]


Figure 4 Discriminant Analysis Plot

Application - Understanding and modeling differences
between / among groups (e.g., buyers vs. non-buyers of
different brands); Predicting market behavior based on
demographic and psychographic variables.
Pros - Can be thought of as regression for categorical
dependent variables. Can include variables of differing
scales. Prediction is a powerful tool for finding segments
in databases for sales and direct marketing efforts.
Cons - Without careful implementation, models will not
perform as well on new data as they do on initial data.
LDA has limited efficiency in classifying sample data
from subclasses with different distributions, and BDA
does not account for the underlying distribution of
negative samples. [9]
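
A small sketch of linear discriminant analysis for a buyer versus non-buyer style classification, assuming scikit-learn; the demographic-style predictors and group labels below are synthetic stand-ins.

```python
# Linear discriminant analysis sketch on synthetic buyer / non-buyer data.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Two synthetic groups (e.g. buyers vs. non-buyers) described by two predictors.
buyers = rng.normal(loc=[35, 60], scale=5, size=(100, 2))      # e.g. age, spend index
non_buyers = rng.normal(loc=[28, 45], scale=5, size=(100, 2))
X = np.vstack([buyers, non_buyers])
y = np.array([1] * 100 + [0] * 100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
print("holdout accuracy:", lda.score(X_test, y_test))
```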

D. Cluster Analysis:
Definition - Cluster analysis is an exploratory data
analysis tool for solving classification problems. Its object
is to sort cases (people, things, events, etc.) into groups,
or clusters, so that the degree of association is strong
between members of the same cluster and weak between
members of different clusters. A cluster is a group of
relatively homogenous cases or observations. Each cluster
thus describes, in terms of the data collected, the class to
which its members belong; and this description may be
abstracted through use from the particular to the general
class or type. Uses any of several techniques (viz. Nearest
Neighbors, K-Means etc.) to classify people, objects, or
variables into more homogeneous groups.
Application - Identifying / describing market segments;
developing typological findings and describing target
markets.
Pros - Allows a deeper understanding of the market. Can
greatly aid messaging and new product development by
targeting homogeneous groups.
Cons - Subjective interpretation of the results is a component. The technique is purely algorithmic and therefore has no underlying statistical model against which to test hypotheses. K-means is a fast cluster analysis method, but its accuracy depends on initialization algorithms that are usually serial and slow. [10]
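
A short K-means sketch for market segmentation, assuming scikit-learn; the customer features are synthetic and the number of clusters is an assumption that would normally be chosen with elbow or silhouette analysis.

```python
# K-means clustering sketch for market segmentation on synthetic data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Synthetic customers described by (annual spend, visit frequency), in three loose groups.
customers = np.vstack([
    rng.normal(loc=[200, 2],  scale=[30, 0.5], size=(100, 2)),
    rng.normal(loc=[800, 10], scale=[80, 1.5], size=(100, 2)),
    rng.normal(loc=[450, 5],  scale=[50, 1.0], size=(100, 2)),
])

# n_clusters=3 is an assumption made for this sketch.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=2).fit(customers)
print("cluster sizes:", np.bincount(kmeans.labels_))
print("cluster centres:\n", kmeans.cluster_centers_)
```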

E. Structural Equation Modeling (SEM):
Definition - Also called causal modeling, it hypothesizes
causal relationships among variables and tests the causal
models with a linear equation system. It allows the
inclusion of latent variables (which are intangible
concepts such as intelligence, loyalty, or satisfaction, and
are difficult to measure). The basic idea of SEM differs
from the usual statistical approach of modeling individual
observations, since SEM considers the covariance
structure of the data [11]. In SEM, the parameters are
estimated by minimizing the difference between the
observed covariance and those implied by a structural or
path model. The Structural Equation Model consists of a
set of linear structural equations containing observed
variables and parameters defining causal relationships
among the variables. Variables in the equation system can be endogenous (i.e., dependent on other variables in the model) or exogenous (independent of the model itself). The structural equation model specifies the causal relationships among the variables, describes the causal
effects and assigns the explained and the unexplained
variance. [12]
Application - Customer satisfaction and loyalty studies;
Driver analysis.
Pros - Can model latent variables. Can utilize data of
differing scales. Path diagrams make the results easier to
understand and communicate to management.
Cons - Models can be complex. Tends to need large
sample sizes.
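
To make the covariance-fitting idea concrete, the sketch below estimates a one-factor measurement model by minimizing the squared difference between the observed covariance matrix and the model-implied covariance matrix. This is a didactic least-squares sketch on synthetic data, not the maximum-likelihood estimation used by full SEM packages.

```python
# Didactic SEM-style sketch: fit a one-factor model by matching covariances (synthetic data).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
# Three observed indicators driven by one latent variable (e.g. "satisfaction").
latent = rng.normal(size=1000)
true_loadings = np.array([0.9, 0.7, 0.5])
observed = np.outer(latent, true_loadings) + rng.normal(scale=0.4, size=(1000, 3))
S = np.cov(observed, rowvar=False)                        # observed covariance matrix

def discrepancy(params):
    loadings, resid_var = params[:3], np.exp(params[3:])  # exp keeps variances positive
    implied = np.outer(loadings, loadings) + np.diag(resid_var)
    return np.sum((S - implied) ** 2)                     # unweighted least-squares fit function

result = minimize(discrepancy, x0=np.array([0.5, 0.5, 0.5, 0.0, 0.0, 0.0]))
print("estimated loadings (up to sign):", result.x[:3])
print("estimated residual variances:", np.exp(result.x[3:]))
```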

F. Regression Analysis:
Definition - Studies the dependence of a single, interval-scaled variable (such as market share) on one (simple regression) or more (multiple regression) predictor variables. The aim of regression analysis is to find a crisp relationship between the dependent and independent variables and to estimate the variance of the measurement error [13]. Logistic regression is an important statistical technique for analyzing, modeling and predicting data with categorical attributes [14].

Figure 5 Linear Regression Graphs indicating the type
of relationships

Application - Forecasting sales, market share,
profitability; Modeling buying patterns and impact of
market programs; Estimating elasticity and response
functions.
Pros - Tremendous predictive modeling tool. A tried and
true methodology. Diagnostics can be used to evaluate the
success of the model.
Cons - Susceptible to outliers and highly correlated data. Slow convergence speed and premature convergence are two key problems in some iterative regression analysis techniques [15]. When conducting logistic regression analysis in real-world data mining applications, we often encounter the difficulty of not having the complete set of data in advance [14].
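
The sketch below shows both flavors on synthetic data, assuming scikit-learn: linear regression for a sales-style forecast and logistic regression for a categorical (buy / no-buy) outcome. The predictors, coefficients and labels are invented for illustration.

```python
# Regression sketches on synthetic data: linear regression for sales forecasting
# and logistic regression for a categorical (buy / no-buy) outcome.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(4)

# Linear regression: sales as a function of advertising spend and price (synthetic).
ad_spend = rng.uniform(10, 100, size=200)
price = rng.uniform(5, 20, size=200)
sales = 50 + 3.0 * ad_spend - 4.0 * price + rng.normal(scale=10, size=200)
X = np.column_stack([ad_spend, price])
lin = LinearRegression().fit(X, sales)
print("coefficients (ad spend, price):", lin.coef_)

# Logistic regression: probability of purchase from the same predictors (synthetic labels).
buy = (sales + rng.normal(scale=20, size=200) > sales.mean()).astype(int)
logit = LogisticRegression().fit(X, buy)
print("predicted purchase probability for one customer:",
      logit.predict_proba([[60, 12]])[0, 1])
```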

5. LIMITATIONS IN DATA ANALYSIS
A first limitation arises from the belief that sophistication in statistics can compensate for a lack of data and/or business understanding. Increased understanding and acceptance of sophisticated statistical techniques in business has resulted in enhanced availability of packaged solutions. These solutions have the twin advantages of increasing the usage of statistical tools by business users and reducing the lead time required to gain results and insights. However, this convenience has led to an added temptation to supplement a lack of data or business understanding with sophisticated statistics. This has resulted in overreliance on algorithmic approaches to analytics problem solving. In this approach, business understanding is used to validate the outcome of analytics, not necessarily the analytics process. A common symptom of this problem is the prevalence of esoteric modeling and data mining techniques without enough inquiry into their appropriateness and applicability for the problem at hand. Unfortunately, it is model accuracy that often becomes the final arbiter. This results in the classic trap of choosing the technique that gives maximum accuracy over the one that makes the most business sense. This can best be avoided by striking a balance between algorithmic and heuristic approaches, which is essentially a balance between a highly accurate model and a model that makes business sense. Tilting to either extreme is dangerous.
A second pitfall is trying to extract meaning out of randomness. Any data that you encounter has a non-zero ratio of meaningful pattern to noise, and the art is in being able to isolate and explain the meaningful pattern while tolerating the unexplained noise as error. But this is easier said than done. Suppose you are relying on a sales-forecasting model to help validate the quarterly targets you received from finance. Wouldn't you want your model to be as accurate as possible so that you can set realistic goals? This is a reasonable expectation, but overemphasis on model accuracy might land you in uncharted waters. Ask any statistician or expert modeler and you will hear that statistics provides enough tools and implements to make the data say whatever you want to hear.
A common outcome of this problem is a model that is able to explain minute variations in the data on which it was built but fails miserably on any new data. This is called the problem of overfitting in statistics. It happens because noise is a random phenomenon that is beyond the control of your business or even known external factors (that is why it is noise!), and the part of your model that tries to explain this noise fails when it looks at new data.
The way to control for this problem is to always compare in-sample validation (model accuracy on the data on which the model is built) with out-of-sample validation (model accuracy on data the model has not seen). In data mining parlance, these are called the training and test data sets, respectively. For the model to be stable, meaning it explains only systematic patterns, the in-sample error should be reasonably close to the out-of-sample error. What is reasonable depends on the particular context of the problem, but typically if the difference between the two errors is greater than 10 percent, you have reason to worry.
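
A brief sketch of this in-sample versus out-of-sample comparison, assuming scikit-learn and synthetic data; the deliberately over-flexible polynomial model illustrates how training error can look far better than holdout error.

```python
# Overfitting check: compare in-sample (training) error with out-of-sample (test) error.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=60).reshape(-1, 1)
y = 2.0 * x.ravel() + rng.normal(scale=3, size=60)      # linear signal plus noise

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=5)

# A deliberately flexible model (high-degree polynomial) that can chase the noise.
model = make_pipeline(PolynomialFeatures(degree=12), LinearRegression()).fit(X_train, y_train)

in_sample = mean_squared_error(y_train, model.predict(X_train))
out_sample = mean_squared_error(y_test, model.predict(X_test))
print(f"in-sample MSE: {in_sample:.2f}, out-of-sample MSE: {out_sample:.2f}")
# A large gap between the two errors is the overfitting warning sign described above.
```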
A third misconception is that correlation-based modeling will uncover causal relationships. This problem can be illustrated with two simple examples. Suppose you are a meteorologist who has a poor track record of predicting
rain. Your model has the usual predictors such as
weather, temperature, extent of cloud formation, wind
speed, etc., but still is not reliable. It is clear that you are
missing some significant predictor. One day, your analyst
comes to you with a breakthrough: a variable that is a
very significant predictor in the model and is able to
validate historical data very accurately. Unfortunately the
variable is weekly sales of umbrellas. This example shows
that modeling does not establish direction of causality; in
this case we all know that rains cause sales of umbrellas
and not vice versa, but there is no way for the model to
know this. Another example: Suppose you are a regional
sales head of a leading home appliance manufacturer and
you want to figure out what drives sales of air
conditioners in a region. Your insights team comes back
to you with an excellent driver model that is able to
explain the sales of air conditioners fairly accurately. The
only problem is that the most significant variable in the
model is the sale of aerated drinks. This example shows
that modeling does not correct for the presence of
confounding factors. We know that the sale of aerated drinks does not cause the sale of air conditioners, or vice versa; ambient temperature is the common factor that impacts both. Because the impact is directionally the same, the two are highly correlated and the driver model will naturally show this variable as a significant predictor. In this case, the sale of aerated drinks is a confounding variable that should not be present in the model. The recommended way to overcome this problem is to start with a hypothesis matrix. A hypothesis matrix lays down hypotheses connecting every predictor with the predicted variable and also records the direction of impact. For example, in the above problem, a hypothesis connecting price with sales would read, "As price increases, the sales of air conditioners drop." No predictor should go into the model unless there is a well thought-out business hypothesis.
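
The layout of such a hypothesis matrix can be sketched as a small table; the predictors, hypotheses and expected directions below are invented for the air conditioner example and assume pandas.

```python
# Illustrative hypothesis matrix for the air conditioner driver model (invented entries).
import pandas as pd

hypothesis_matrix = pd.DataFrame([
    {"predictor": "price",               "hypothesis": "As price increases, sales drop",  "expected_direction": "-"},
    {"predictor": "ambient_temperature", "hypothesis": "Hotter weather increases sales",  "expected_direction": "+"},
    {"predictor": "marketing_spend",     "hypothesis": "Higher spend increases sales",    "expected_direction": "+"},
    # "sales of aerated drinks" is deliberately absent: no business hypothesis links it causally to sales.
])
print(hypothesis_matrix)
```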
A fourth pitfall is extrapolating models far beyond their permissible limits. Well-designed statistical models can answer a lot of business questions, but one has to realize that there is no perfect model that is free from all constraints. Judicious use of statistical models can aid business decision-making, but not being aware of a model's limits can be counterproductive. A statistical model is based on
underlying data and is subject to the limitations of the
data captured. For instance, a model to predict sales
cannot take into account the impact of an earthquake on
sales if the historical data has never captured earthquakes.
Hence, this model will not be able to predict sales
accurately in the event of an earthquake. These Black
Swan events are often the cause of considerable distress;
the subprime crisis being the most recent example.
Another example is marketing mix models that are used
to assess the impact of marketing on sales. These models
are often used as tools for scenario planning where the
business user aims to estimate the sales based on different
spend scenarios. It is important to realize that any model
is only accurate in the range of data it has seen and,
therefore, if the scenario is drastically different from
history, there is a very high chance of error. For example, if the marketing division decides to increase spend by 5x, the same model might not be as accurate, since it was developed on the basis of historical spend levels. Intuitively, any model is just an interpreter that translates data into a language we can understand. If the data does not speak about earthquakes or high marketing spends, the model will not be able to interpret such scenarios accurately.
A final misconception is that imputing missing values with the mean or median is the best way of treating them. Any real-life data used for statistical analysis is likely to have quality issues, and missing values in variables are one of the most recurring issues. It therefore becomes imperative for an analyst or statistician to impute missing values to avoid loss of data and retain maximum information. Often we encounter scenarios where 5-10 percent of the values in a variable are missing and we are inclined to impute them with the mean or median value. While this does the job in certain cases, extreme caution needs to be taken before imputing missing values, as it might have significant consequences for model behavior and the interpretation of parameter estimates.
It is important to realize that missing values can tell a
story and help us better understand the business dynamics
in many cases. Hence, it is necessary to look deeper
whenever a variable has missing values before coming up
with an imputation. For example, while conducting an
analysis on premiums for a large health insurer, it was
observed that 5 to 6 percent of values were missing.
Further analysis revealed that the missing values were
only for one state in the U.S. for a certain time period.
Research revealed that the company was temporarily
banned from operating in that state due to a legal issue. It
is, therefore, recommended to look for the cause of the
missing values before jumping into imputation.
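
A short sketch of this kind of check, assuming pandas and an invented premiums dataset: before imputing, the missingness is tabulated by state and period to see whether it is concentrated, as in the insurer example above.

```python
# Before imputing, inspect where the missing values actually occur (invented data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
premiums = pd.DataFrame({
    "state":   rng.choice(["NY", "CA", "TX"], size=300),
    "quarter": rng.choice(["2013Q3", "2013Q4", "2014Q1"], size=300),
    "premium": rng.normal(1000, 150, size=300),
})
# Simulate the situation described above: one state is missing for one period.
mask = (premiums["state"] == "TX") & (premiums["quarter"] == "2013Q4")
premiums.loc[mask, "premium"] = np.nan

# Share of missing premiums broken down by state and quarter.
missing_by_group = premiums.groupby(["state", "quarter"])["premium"].apply(lambda s: s.isna().mean())
print(missing_by_group.unstack())
# A concentrated block of missingness like this signals a business cause, not random noise,
# so blind mean/median imputation would be misleading.
```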
This is by no means an exhaustive list but is certainly a
representative one of the types of errors encountered in
application of statistics to business. Some of these
mistakes stem from incomplete understanding of
statistics, some from the incomplete understanding of
underlying business and the rest from the inability to
marry the two together. With the advent of data analytics and decision sciences, our decisions are increasingly impacted by these errors, which can have major implications for our businesses; business executives therefore need to appreciate, sense and avoid these common pitfalls.

6. CONCLUSION
Across disciplines, companies that invest in analytics often find that the business does not consume the outputs, for a variety of reasons. As companies realize that one of the many factors separating success from failure is their ability to use analytics effectively to make better decisions, it becomes necessary for the key stakeholders to ensure that the right investments are made in the process, technology and people dimensions to bridge the gap between the creation and consumption
of analytics. The sooner businesses can get this done, the
better their chances are of leveraging the potential
competitive advantage offered by analytics.
There are various techniques that can be used to achieve the desired results through data analysis. However, it is important to choose the correct data analysis technique for the problem at hand. Through this paper we have tried to provide a comparative study of a few of the available data analysis techniques. We have also highlighted common mistakes that are made in the name of data or statistical analysis. This paper should enable readers to select an appropriate data analysis method for their problem.

References
[1] M. Spott and D. Nauck, "On Choosing an Appropriate Data Analysis Algorithm," The 2005 IEEE International Conference on Fuzzy Systems, 2005.
[2] M. D. LeCompte and J. J. Schensul, Analyzing and Interpreting Ethnographic Data, Walnut Creek, CA: AltaMira Press, 1999.
[3] H. F. Wolcott, Transforming Qualitative Data: Description, Analysis, Interpretation, Thousand Oaks, CA: Sage, 1998.
[4] C. T. Beck, "Initiation into qualitative data analysis," Journal of Nursing Education, 42 (5), 231, 2003.
[5] A. Doucet and N. Mauthner, "Voice, reflexivity, and relationships in qualitative data analysis," background paper for the workshop on Voice in Qualitative Data Analysis, 1998. Retrieved August 15, 2001.
[6] M. B. Miles and A. M. Huberman, Qualitative Data Analysis: An Expanded Sourcebook, 2nd ed., London: Sage, 1994.
[7] P. Zolliker, Z. Baranczuk, I. Sprow, and J. Giesen, "Conjoint Analysis for Evaluating Parameterized Gamut Mapping Algorithms," IEEE Transactions on Image Processing, vol. 19, no. 3, March 2010.
[8] Z. Yi, M. Ye, J. C. Lv, and K. K. Tan, "Convergence analysis of a deterministic discrete time system of Oja's PCA learning algorithm," IEEE Transactions on Neural Networks, vol. 16, no. 6, pp. 1318-1328, Nov. 2005.
[9] Y. Lu and Q. Tian, "Discriminant Subspace Analysis: An Adaptive Approach for Image Classification," IEEE Transactions on Multimedia, vol. 11, no. 7, November 2009.
[10] R. M. Esteves, T. Hacker, and C. Rong, "Cluster analysis for the cloud: Parallel competitive fitness and parallel K-means++ for large dataset analysis," 2012 IEEE 4th International Conference on Cloud Computing Technology and Science (CloudCom), 2012.
[11] K. A. Bollen, Structural Equations with Latent Variables, New York: Wiley, 1989.
[12] L. Astolfi, F. Cincotti, C. Babiloni, F. Carducci, A. Basilisco, P. M. Rossini, S. Salinari, D. Mattia, S. Cerutti, D. Ben Dayan, L. Ding, Y. Ni, B. He, and F. Babiloni, "Estimation of the Cortical Connectivity by High-Resolution EEG and Structural Equation Modeling: Simulations and Application to Finger Tapping Data," IEEE Transactions on Biomedical Engineering, vol. 52, no. 5, May 2005.
[13] H. Tanaka and H. Lee, "Interval Regression Analysis by Quadratic Programming Approach," IEEE Transactions on Fuzzy Systems, vol. 6, no. 4, November 1998.
[14] R. Xi, N. Lin, and Y. Chen, "Compression and Aggregation for Logistic Regression Analysis in Data Cubes," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 4, April 2009.
[15] X. Cheng, L. Sun, and P. Liu, "Application of regression analysis based on genetic particle swarm algorithm in financial analysis," 2010 International Conference on Computer Design and Applications (ICCDA), 2010.
