Prescriptive Analytics.
A. Descriptive Analytics: This is the first step. It consists of gathering data from an experiment and checking it initially. "Experiment" is meant in both a literal and a broader sense, e.g. a promotional campaign or website visit data. Data is collected from such a source and checked for the following points:
1. Detecting mistakes made during data entry.
2. Checking assumptions and constraints.
3. Recognizing patterns such as correlation, linear (regression) relationships, autoregressive behaviour, etc.
4. Determining relationships between explanatory variables.
5. Setting a rough direction for how the subsequent analysis will be designed.
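As a minimal sketch of the first two checks (entry mistakes and constraints), the Python snippet below validates a few website-visit records. The records, field names, allowed channels and the age range are all hypothetical assumptions chosen for illustration, not part of the original notes.

```python
# Hypothetical raw records from a website-visit log (illustrative data only).
RECORDS = [
    {"visitor_id": "V001", "channel": "email", "pages_viewed": "5", "age": "34"},
    {"visitor_id": "V002", "channel": "Email", "pages_viewed": "-2", "age": "41"},
    {"visitor_id": "V003", "channel": "social", "pages_viewed": "7", "age": "abc"},
    {"visitor_id": "V001", "channel": "search", "pages_viewed": "3", "age": "34"},
]

# Assumed constraints: these categories and ranges are hypothetical.
ALLOWED_CHANNELS = {"email", "social", "search"}
AGE_RANGE = (0, 120)


def find_entry_errors(records):
    """Return (row_index, message) pairs describing likely data-entry mistakes."""
    errors = []
    seen_ids = set()
    for i, rec in enumerate(records):
        # A repeated identifier may mean the same record was entered twice.
        if rec["visitor_id"] in seen_ids:
            errors.append((i, "duplicate visitor_id"))
        seen_ids.add(rec["visitor_id"])

        # Categorical field must match one of the allowed (case-sensitive) labels.
        if rec["channel"] not in ALLOWED_CHANNELS:
            errors.append((i, f"unknown channel {rec['channel']!r}"))

        # Count field must be a non-negative integer.
        try:
            if int(rec["pages_viewed"]) < 0:
                errors.append((i, "negative pages_viewed"))
        except ValueError:
            errors.append((i, "pages_viewed is not an integer"))

        # Age must parse as a number inside the assumed plausible range.
        try:
            age = float(rec["age"])
            if not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
                errors.append((i, "age out of range"))
        except ValueError:
            errors.append((i, "age is not numeric"))
    return errors


for row, msg in find_entry_errors(RECORDS):
    print(row, msg)
```

Whether a repeated ID is an error or a genuine repeat visit depends on the data source, so flagged rows should be reviewed rather than deleted automatically.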
The first step is arranging unstructured data into structured data. Unstructured data is data that is not properly arranged in well-defined tables (e.g. in an Excel sheet). It must be arranged in tables in such a way that any mismatch can be found easily by visual inspection. The type of each variable (categorical, ordinal, binary, etc.) must also be taken into account. Other steps consist of:
1. Finding central tendency: This includes identifying the type of distribution and finding potential outliers. The most common measure of central tendency is the mean. For skewed distributions, or when outliers cannot be removed, the median is preferred. The mode is used for grouping purposes, e.g. to find the most frequent category of a categorical variable.
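2. Skewness and kurtosis: Skewness measures the asymmetry of a distribution; kurtosis measures how heavy its tails are compared with a normal distribution.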
3. Histogram: The challenge is choosing the bin size. With a large bin width the data may appear normally distributed, while the same data with a smaller bin width may not. The bin size must therefore be selected carefully; one approach is to use clustering so that similar values fall into the same bin.
4. Q-Q plot: Used to check whether the data follows a normal distribution and to interpret any deviations from it.
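5. Correlation and covariance: $\mathrm{Cor}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\mathrm{SD}_X \, \mathrm{SD}_Y}$. Covariance is difficult to interpret because its magnitude depends on the units of X and Y. Correlation is unit-free and always lies between -1 and 1, which makes it much easier to interpret.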
6. Box plot: A very handy tool for data exploration, especially for non-normally distributed data, since it summarizes the median, the quartiles and potential outliers.
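A minimal Python sketch of items 1, 2, 3, 5 and 6 above, using only the standard-library `statistics` module (Python 3.8+ for `fmean` and `quantiles`). The visit and spend/sales figures are hypothetical. Skewness and kurtosis use simple population-moment formulas, and outliers use the conventional 1.5 × IQR box-plot whisker rule; other estimators and rules exist.

```python
import statistics


def describe(data):
    """Summarize central tendency, shape and box-plot outliers of a numeric sample."""
    n = len(data)
    mean = statistics.fmean(data)
    # Population (biased) central moments; sample-adjusted formulas differ slightly.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n

    # Box-plot rule: values more than 1.5 * IQR beyond the quartiles are flagged.
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # 0 for a normal distribution
        "outliers": [x for x in data if x < low or x > high],
    }


def histogram(data, bin_width):
    """Count values per bin; bins are [k * w, (k + 1) * w) for integer k."""
    counts = {}
    for x in data:
        start = (x // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


def covariance(xs, ys):
    """Population covariance of two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Pearson correlation: Cov(X, Y) / (SD_X * SD_Y), population formulas."""
    return covariance(xs, ys) / (statistics.pstdev(xs) * statistics.pstdev(ys))


visits = [12, 15, 14, 13, 15, 16, 14, 15, 60]  # hypothetical daily visits
print(describe(visits))
# The same data looks quite different with narrow and wide bins.
print(histogram(visits, 2), histogram(visits, 10))

spend = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]
sales_in_cents = [100 * s for s in sales]
print(covariance(spend, sales), covariance(spend, sales_in_cents))  # 4.0 400.0
print(correlation(spend, sales), correlation(spend, sales_in_cents))  # both ~1.0
```

Rescaling `sales` by 100 multiplies the covariance by 100 but leaves the correlation unchanged, which illustrates why correlation is the easier of the two to interpret. The single value of 60 pulls the mean above the median and is the only point flagged by the box-plot rule.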
Further analysis, such as t-tests and ANOVA, can be used to produce a detailed initial report.
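A minimal standard-library sketch of the two tests mentioned above, assuming hypothetical daily sign-ups before and during a promotional campaign. It computes only the test statistics; converting them to p-values requires the t and F distributions (from a statistics package or tables), which this sketch omits. The t-test shown is Student's pooled version, which assumes equal variances; Welch's t-test relaxes that assumption.

```python
import statistics


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic, assuming equal population variances."""
    na, nb = len(a), len(b)
    # Pooled variance combines both sample variances (n - 1 denominators).
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = (sp2 * (1 / na + 1 / nb)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - statistics.fmean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical daily sign-ups before and during a promotional campaign.
before = [20, 22, 19, 24, 21]
during = [27, 25, 29, 26, 28]
t = pooled_t_statistic(before, during)
f = one_way_anova_f(before, during)
print(round(t, 3), round(f, 3))
```

For two groups the ANOVA F statistic equals the square of the pooled t statistic, which is a handy consistency check; ANOVA becomes useful when there are three or more groups.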
B. Predictive Analytics: This consists of:
1. Regression (simple linear/multiple): Can be used for forecasting and election prediction.
2. Logistic regression: Used when the outcome is categorical. Highly useful in the medical and insurance industries, loan-default prediction, and sports such as cricket and basketball.
3. CART/random forest: Can be used for both categorical and continuous (numerical) data. Useful because it produces explicit rules, and new data can be assigned to the groups those rules define. Examples: the medical field (D2Hawkeye, a medical analytics company, uses such models), Sensex data, vote prediction, etc.
4. Text analytics: Used for sentiment analysis, understanding trends, etc. It is supported by specialized libraries in R and can be used for Twitter and Facebook analytics. For example, I was able to extract the Facebook IDs of the users who commented most on the posts of The Hindu's page. Such lists can be used to create focus groups (connect with marketing terminology) and loyalty programmes (connect with IMC), helping us understand what customers want (connect with consumer behaviour). Further, Google Trends, which gives the scaled search interest for a particular word or group of words on Google, can be accessed from R. Text analytics can also be used for stock price prediction from news, since news affects the sentiment of shareholders for a very short time.
5. Clustering: Can be hierarchical or k-means. Used for market segmentation, and by IMDb and Netflix to suggest movies.
6. Time series: Used in finance and economics, especially for stock price prediction. It captures autoregression, i.e. how previous values affect new values. The effect of historical data may be exponential in nature. Similarly, how past errors affect future values must be studied too. Examples are moving