
Course Title : Introduction to R in Business Applications
Ram Mohan Dhara
IMTG/ PGDM/ Term-V / 2017-2019
Session 1 (Part-1): Introduction to Business Analytics
Course Objectives
• To understand ‘Business Analytics’
• To understand ‘Broad Applications of Business Analytics’
• To learn ‘R programming language’
Course Description (Session No., Topic to be covered)

Session 1: Introduction to Business Analytics (Part-1); Introduction to R and R-installation (Part-2)
• Analytics as a career choice
• Analytics and Data Science lexicon
• Analytics maturity cycle – Descriptive, Diagnostic, Predictive and Prescriptive
• Intro to Domain specific analytics – Marketing, Finance, HR and Operations
• Intro to Business specific analytics – Healthcare, Insurance, Retail, Credit Cards
• R-installation – R and RStudio

Session 2: Basic R programming
• Basic operators – Arithmetic, Relational, Logical and Assignment
• Commonly used control structures in R – if-then-else, switch, different types of loops, next, break etc.
• Other commonly used functions – append(), range(), rep(), seq(), grep(), summary() etc.

Session 3: Data Structures in R (Part-1)
• Data types – scalar, vector, matrices, arrays, data frames and list
• Working with datasets – calling, editing, sorting, slicing, merging etc.

Session 4: Data Structures in R (Part-2)
• Working with text data
• Working with date and time data
• Import/ export in R
• Apply functions

Session 5: Data Visualization (Part-1)
• Basic graphics – Bar chart, Kernel density plot, Pie chart, Histogram, Line chart, Box plot, Heat map

Session 6: Data Visualization (Part-2)
• Faceted graphics using lattice
• Creating graphics using ggplot2

Session 7: Basic statistics and distributions
• Basic concepts – population and sample, data types, scales of measurement, central limit theorem
• Sampling distribution
• Normal distribution and functions in R – rnorm(), dnorm(), pnorm(), qnorm()

Session 8: Hypothesis Testing
• Basic concepts – Null and Alternate hypotheses, Type-1 and Type-2 errors
• t-test, z-test, chi^2 test, F-test

Session 9: Introduction to Multivariate analysis and machine learning
• What is ML, different types of ML techniques, applications of ML
• Regression analysis and correlation analysis

Session 10: Dimensionality reduction (Factor Analysis)
• Basic concepts in factor analysis – Scree plot, Communality, Eigenvalue, Factor loading, Factor rotation, Factor score

Session 11: Unsupervised machine learning – clustering techniques
• Basic concepts in cluster analysis – distance, splitting algorithms, dendrograms, stopping rules
• K-means clusters
• Hierarchical cluster

Session 12: Supervised machine learning – classification techniques (Part-1)
• Logistic regression

Session 13: Supervised machine learning – classification techniques (Part-2)
• Decision tree analysis

Session 14: Introduction to Rattle (a menu-driven IDE for R)
• Solve a case-study using Rattle

Session 15: Project work (1)

Session 16: Project work (2)

Instructional Methods
• Lectures (Presentations)
• Hands-on practice in R
• Case studies
• Video clips

Evaluation Methods
• Class participation/ attendance
• Quizzes
• Assignments
• Group project
• End-term exam
Analytics : career choice
Published by analyticsindiamag.com in June 2017 – one of the popular online analytics resources
• Exponential growth
• Engineering degree, followed by any post-grad
• Banking and financial services, followed by e-commerce
• Bangalore, Delhi NCR, followed by Mumbai
• Captive clients > 50%
• R and Python leading programming languages
• Experience 2-5 years
• 10-15 lacs INR starting salary


Investigating Data Scientists, their Skills and Team Makeup
• A new survey of 490 data professionals from small to large companies, conducted by Analytics Week in partnership with Business Over Broadway, provides a look into the field of data science.
• Solving problems with data requires expertise across different skill areas: 1) Business, 2) Technology, 3) Programming, 4) Math & Modelling and 5) Statistics.
[Chart legend: Orange - Business, Blue - Developer, Red - Creative, Green - Researcher]

http://businessoverbroadway.com/investigating-data-scientists-their-skills-and-team-makeup
What is Data Science/ Business Analytics?
• Big data
• Data science
• Business analytics
• Machine learning
• Artificial intelligence
• Deep learning
• Neural networks
• Data mining
• Tableau
• Hadoop / Apache Spark
• R, Python and SAS
Data Science/ Business Analytics Lexicon
Big Data
• Volume - The quantity of data
• Variety - The type and nature of the data.
• Velocity - The speed at which the data is generated and processed to
meet demand.
• Variability - Inconsistency in the data set, which can hamper the
processes that handle and manage it.
• Veracity - The quality of captured data, which can vary greatly and
affect the accuracy of the analysis.
Data Science/ Business Analytics Lexicon
• Data science - also known as data-driven science, is an interdisciplinary
field about scientific methods, processes, and systems to extract
knowledge or insights from data in various forms, either structured or
unstructured.
• Business Analytics – Analytics is the science of analysis. Analytics is the
discovery, interpretation, and communication of meaningful patterns
in data. Application of analytics in various fields of business is Business
Analytics.
• Machine Learning - is a "field of study that gives computers the
ability to learn without being explicitly programmed". It covers the
study and construction of algorithms that can learn from data and make
predictions on it.
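
As a rough illustration of "learning from data" rather than hand-written rules, the sketch below classifies a new customer by copying the label of the most similar past customer (a 1-nearest-neighbour rule). It is written in Python, one of the two languages this deck names; the course itself goes on to use R. The feature names, the toy records and the `predict` helper are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical training data (an assumption for illustration):
# (monthly_visits, avg_basket_value) -> 1 if the customer bought, else 0
training = [
    ((2, 10.0), 0),
    ((3, 12.0), 0),
    ((1, 8.0), 0),
    ((9, 40.0), 1),
    ((8, 35.0), 1),
    ((10, 42.0), 1),
]

def predict(features, examples=training):
    """Return the label of the closest training example (1-nearest neighbour)."""
    nearest = min(examples, key=lambda ex: math.dist(ex[0], features))
    return nearest[1]

# No "if visits > 5" rule is written anywhere: the behaviour comes from the examples.
print(predict((7, 30.0)))
print(predict((2, 9.0)))
```

Changing the training records changes the predictions without touching the code, which is the sense in which the program is not "explicitly programmed".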
Data Science/ Business Analytics Lexicon
• Artificial intelligence (AI) - is the ability of a computer program or
a machine to think and learn. In general use, the term "artificial
intelligence" means a machine which mimics human cognition.
Examples - understanding human speech, game systems (such as chess),
self-driving cars, and interpreting complex data.

• Deep learning - is a class of machine learning algorithms that use a
cascade of multiple layers. Each successive layer uses the output from
the previous layer as input. These algorithms can learn in supervised
(e.g., classification) and/or unsupervised (e.g., pattern analysis)
manners.
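
The "cascade of layers" idea can be sketched in a few lines of Python (the course itself uses R). Each layer below takes the previous layer's output as its input. The layer sizes, the weights and the `tanh` activation are hand-picked assumptions for illustration; a real network would learn its weights from data.

```python
import math

def dense_layer(inputs, weights, biases):
    """One layer: a weighted sum of the inputs per unit, then a tanh activation."""
    return [
        math.tanh(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

# Hypothetical, hand-picked parameters (not learned): 2 inputs -> 3 -> 2 -> 1 output
layers = [
    {"weights": [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]], "biases": [0.0, 0.1, -0.1]},
    {"weights": [[0.7, -0.5, 0.2], [0.3, 0.9, -0.4]], "biases": [0.05, 0.0]},
    {"weights": [[1.0, -1.0]], "biases": [0.0]},
]

def forward(inputs):
    """Pass the inputs through every layer in order."""
    activation = inputs
    for layer in layers:
        # The output of the previous layer becomes the input of this one.
        activation = dense_layer(activation, layer["weights"], layer["biases"])
    return activation

output = forward([1.0, 2.0])
print(output)
```

Only the forward pass is shown; training (adjusting the weights from examples) is left out of this sketch.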
Data Science/ Business Analytics Lexicon
• Artificial neural networks (ANNs) - are computing systems inspired by the
biological neural networks that constitute human brains. Such systems learn
progressively, improving their performance on tasks by considering
examples. They have found most use in applications that are difficult to
express in a traditional computer algorithm using rule-based programming.
Application - image identification.

• Data mining - is the computing process of discovering patterns in large
data sets, involving methods at the intersection of machine learning,
statistics, and database systems. The overall goal of the data mining
process is to extract information from a data set and transform it into an
understandable structure for further use.
Data Science/ Business Analytics Lexicon
• Hadoop - is an open source, Java-based programming framework that
supports the processing and storage of extremely large data sets in a
distributed computing environment. It is part of the Apache project
sponsored by the Apache Software Foundation. Hadoop quickly
emerged as a foundation for big data processing tasks, such as
processing enormous volumes of sensor data, including data from
internet of things sensors.
• Apache Spark - is a fast and general engine for large-scale data
processing.
• Tableau - is a data visualization application that can help anyone see
and understand their data. Connect to almost any database, drag and
drop to create visualizations, and share with a click.
Data Science/ Business Analytics Lexicon
• R - is a programming language and software environment for statistical
analysis, graphics representation and reporting. R was created by Ross
Ihaka and Robert Gentleman at the University of Auckland, New Zealand,
and is currently developed by the R Development Core Team.

• Python - is a widely used high-level, general-purpose programming
language that is popular for scientific programming. It was created by
Guido van Rossum and first released in 1991. When he began implementing
Python, Guido van Rossum was also reading the published scripts from
“Monty Python's Flying Circus”, a BBC comedy series from the 1970s.
Van Rossum thought he needed a name that was short, unique, and slightly
mysterious, so he decided to call the language Python.
Data Science/ Business Analytics Lexicon
• The SAS language - is a computer programming language used for
statistical analysis, created by Anthony James Barr at North Carolina
State University. It can read in data from common spreadsheets and
databases and output the results of statistical analyses in tables,
graphs, and as RTF, HTML and PDF documents.
Data Scientist’s toolkit
• Big data management – Hadoop / Apache Spark
• Programming language – R or Python
• Data visualization applications – Tableau, Google Data Studio
Session objectives
After completing this session, you will be able to –
• Explain the need for Business Analytics
• Understand the Business Analytics maturity
• Understand the application of Analytics in domains / businesses
Need for Business Analytics
A business needs to take several decisions, such as:
• What is the possibility that a customer will buy a product?
• Which should be the next recommended product?
• Does any customer segment exist in which there is substantial untapped potential?
• Which sellers may miss their target orders?
• How to align sellers with customer opportunities to target maximum revenue impact?
• Which factors can influence the new version of a product in the marketplace?
• Which customers are likely to go to its competitors?
• Which type of talent is required to achieve its targets?
• Which business segments are not performing as expected?
• What would be the optimal marketing strategy?
• Which employees can possibly leave the company voluntarily?
• How many employees are required to be hired to achieve its production goals in the next six months?
Business Analytics Maturity
• Descriptive
  • Descriptive Statistics
  • Data Query
  • Data Visualization
• Diagnostic
  • Identifying causes leading to effects
• Predictive
  • Predictive Modelling
  • Forecasting
• Prescriptive
  • Optimization
  • Simulation
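
The four maturity stages can be walked through on one toy data set. The sketch below is in Python (the course itself uses R) and rests on loud assumptions: the ad-spend and sales figures, the budget cap and the candidate budgets are all invented for illustration, and the "diagnostic" step only measures association, since correlation alone does not establish cause.

```python
from statistics import mean

# Hypothetical toy data: monthly ad spend and sales, both in thousands.
ad_spend = [10, 20, 30, 40, 50]
sales = [27, 44, 66, 84, 106]

# Descriptive: what happened?
avg_sales = mean(sales)

# Diagnostic: is ad spend associated with sales? (Pearson correlation)
mx, my = mean(ad_spend), mean(sales)
sxy = sum((x - mx) * (y - my) for x, y in zip(ad_spend, sales))
sxx = sum((x - mx) ** 2 for x in ad_spend)
syy = sum((y - my) ** 2 for y in sales)
r = sxy / (sxx * syy) ** 0.5

# Predictive: what is likely to happen? (least-squares line through the data)
slope = sxy / sxx
intercept = my - slope * mx

def predict_sales(spend):
    """Predicted sales for a given ad spend, from the fitted line."""
    return intercept + slope * spend

# Prescriptive: what should we do? Choose the feasible budget with the
# highest predicted sales net of spend (a deliberately simple optimisation).
budget_cap = 60                      # assumed constraint
candidates = [30, 40, 50, 60, 70]    # assumed options
feasible = [c for c in candidates if c <= budget_cap]
best_budget = max(feasible, key=lambda c: predict_sales(c) - c)

print(avg_sales, round(r, 3), round(predict_sales(60), 1), best_budget)
```

Each stage reuses the output of the one before it: the fitted line from the predictive step is what the prescriptive step optimises over.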
Domain specific analytics : Marketing Analytics
Domain specific analytics : Supply Chain Analytics
Domain specific analytics : HR Analytics
Domain specific analytics : Financial Analytics
Business Specific Analytics : Healthcare
Business Specific Analytics : Insurance
Business Specific Analytics : Retail
Business Specific Analytics : Credit Cards
Summary : what we have learnt
• What is Business Analytics / Data Science?
• Lexicon in Business Analytics/ Data Science
• Data Science : the most sought after career choice and highest paid
• What skill sets are required to be a Data Scientist
• Analytics maturity graph
• Application of business analytics in various domains
• Application of business analytics in various business areas
This concludes the session : Introduction to Business Analytics

Next session : Introduction to R and Installation of R – package
