
BIG DATA ANALYTICS

Every day, we create 2.5 quintillion bytes of data, so much that 90% of the
data in the world today has been created in the last two years alone.
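To put the daily figure in perspective, the arithmetic below converts it into exabytes per day and an average rate per second. It is only a rough sketch: it assumes "quintillion" means 10^18 (short scale) and uses decimal units (1 EB = 10^18 bytes, 1 TB = 10^12 bytes). The variable names are illustrative.

```python
# Assumptions: "quintillion" = 10**18 (short scale); decimal (SI) units.
BYTES_PER_DAY = 2.5e18
SECONDS_PER_DAY = 24 * 60 * 60

exabytes_per_day = BYTES_PER_DAY / 1e18
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 1e12

print(f"{exabytes_per_day:.1f} EB per day")
print(f"about {terabytes_per_second:.1f} TB per second on average")
```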

The amount of data created keeps growing at a rapid pace.

This data comes from everywhere: sensors used to gather climate
information, posts to social media sites, digital pictures and videos, purchase
transaction records, cell phone GPS signals, and many other
sources.

This data is called BIG DATA.


Big data is data that exceeds the processing capacity of conventional
database systems.
The data is too big, moves too fast, or does not fit the structures of
traditional database architectures.
Big data is often divided into a few categories, including social data, machine
data, and transactional data.
Social data, for example, includes 230 million tweets posted on Twitter per day, 2.7
billion Likes and comments added to Facebook every day, and 60 hours of
video uploaded to YouTube every minute.
Big data is the term for a collection of data sets so large and complex that it
becomes difficult to process using on-hand database management tools or
traditional data processing applications.
Big Data Analytics

Big data analytics is the process of examining data from a variety of
sources, and of varying types, volumes, and complexities, to uncover hidden patterns,
unknown correlations, and other useful information.
The goal is to find business insights or opportunities that were previously
out of reach or were missed, so that better decisions can be made.
Big data is commonly characterized by four Vs:
1. VOLUME
2. VELOCITY
3. VARIETY
4. VERACITY
4 Vs

VOLUME- It refers to the vast amount of data generated every
second.

VELOCITY- It refers to the speed at which new data is generated and the
speed at which data moves around.

VARIETY- It refers to the different types of data that we can now use.

VERACITY- It refers to the trustworthiness of data, much of which is
inaccurate and messy.


WHY BIG DATA?

Understanding and Targeting Customers


This is one of the biggest and most publicized areas of big data use today. Here, big
data is used to better understand customers and their behaviors and preferences.
Understanding and Optimizing Business Processes
Big data is also increasingly used to optimize business processes. Retailers can
optimize their stock based on predictions generated from social media data, web
search trends, and weather forecasts.
Personal Quantification and Performance Optimization
Big data is not just for companies and governments but also for all of us individually.
We can now benefit from the data generated by wearable devices such as smart
watches or smart bracelets. Take the Up band from Jawbone as an example: the
armband collects data on our calorie consumption, activity levels, and sleep
patterns.
Improving Healthcare and Public Health
The computing power of big data analytics enables us to decode entire DNA
strings in minutes and will allow us to find new cures and to better understand and
predict disease patterns.
Improving Sports Performance
Most elite sports have now embraced big data analytics.
Improving Science and Research
Science and research are currently being transformed by the new possibilities big
data brings.
Optimizing Machine and Device Performance
Big data analytics helps machines and devices become smarter and more
autonomous. For example, big data tools are used to operate Google's self-driving
car.
SOURCES OF BIG DATA

Just as data storage formats have evolved, the sources of data have also
evolved and are ever expanding.

There is a need to store data in a wide variety of formats. With the
evolution and advancement of technology, the amount of data being
generated is ever increasing.

Sources of Big Data can be broadly classified into many different categories.
NETWORKING
A network is any collection of independent computers that communicate with
one another over a shared network medium.
A computer network is a collection of two or more connected computers.
When these computers are joined in a network, people can share files and
peripherals such as modems, printers, tape backup drives, or CD-ROM drives.
When networks at multiple locations are connected using services available
from phone companies, people can send e-mail, share links to the global
Internet, or conduct video conferences in real time with other remote users.
Types of Networks

LANs (Local Area Networks)


LANs are networks usually confined to a limited geographic area, such as a single
building or a college campus. LANs can be small, linking as few as three
computers, but often link hundreds of computers used by thousands of people.

WANs (Wide Area Networks)


Wide area networking combines multiple LANs that are geographically
separate. A network spread over a large geographic area (a country or across the
globe) is called a WAN (Wide Area Network).
MANs (Metropolitan Area Networks)
This refers to a network of computers within a city.

VPN (Virtual Private Network)


A VPN uses a technique known as tunneling to transfer data securely over the
Internet to a remote access server on your workplace network. Using a VPN saves
money because you use the public Internet instead of making long-distance
phone calls to connect securely to your private network. There are two ways
to create a VPN connection: by dialing an Internet service provider (ISP), or by
connecting directly to the Internet.
CATEGORIES OF NETWORK
Networks can be divided into two main categories:
Peer-to-peer.
In peer-to-peer networking there are no dedicated servers or hierarchy among
the computers. All of the computers are equal and therefore known as peers.
Normally each computer serves as both client and server, and no one is assigned to
be an administrator responsible for the entire network.

Peer-to-peer networks are a good choice for small organizations where
the users are located in the same general area, security is not an issue, and the
organization and the network will have limited growth in the foreseeable
future.
Server based.
The term client/server refers to the concept of sharing the work involved
in processing data between the client computer and a more powerful server
computer.
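The division of work between client and server can be sketched in a few lines of Python. This is a minimal illustration, not a real server: it uses a connected socket pair inside one process instead of an actual network, and the function names (serve, ask_server) and the "add up the numbers" task are assumptions made for the example.

```python
import socket
import threading


def serve(conn):
    """Server side: receive a request, do the processing, send the result."""
    with conn:
        request = conn.recv(1024).decode()
        numbers = [int(token) for token in request.split()]
        conn.sendall(str(sum(numbers)).encode())


def ask_server(numbers):
    """Client side: send the raw data and wait for the processed answer."""
    client_sock, server_sock = socket.socketpair()  # stands in for a network link
    worker = threading.Thread(target=serve, args=(server_sock,))
    worker.start()
    with client_sock:
        client_sock.sendall(" ".join(str(n) for n in numbers).encode())
        reply = client_sock.recv(1024).decode()
    worker.join()
    return int(reply)


print(ask_server([10, 20, 30]))
```

The client only formats the request and reads the reply; the calculation itself happens on the server side.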
OSI MODEL

The Open Systems Interconnection (OSI) reference model has become an
international standard and serves as a guide for networking.
This model is the best known and most widely used guide to describe
networking environments.
Vendors design network products based on the specifications of the OSI
model.
It provides a description of how network hardware and software work
together in a layered fashion to make communications possible.
It also helps with troubleshooting by providing a frame of reference that
describes how components are supposed to function.
There are seven layers to get familiar with: the physical layer, data link
layer, network layer, transport layer, session layer, presentation layer, and
application layer.

Physical Layer: the physical parts of the network, such as wires,
cables, and other media, along with their lengths. This layer also deals with the
electrical signals that transmit data throughout the system.

Data Link Layer: this layer is where meaning is assigned to the
electrical signals on the network. It determines the size and format
of the data sent to printers and other devices, and it defines the error
detection and correction schemes that ensure data was sent and received
correctly.
Network Layer: this layer defines how two dissimilar networks are
connected.

Transport Layer: this layer breaks data into smaller packets so that it
can be distributed and addressed to other nodes (workstations).

Session Layer: this layer helps carry information from
one node (workstation) to another. A session has to be
established before information can be transported to another computer.

Presentation Layer: this layer is responsible for encoding and decoding data
sent to the node.

Application Layer: this layer lets you use an application that
communicates with, say, the operating system of a server. A good example
is using your web browser to interact with the operating system on a
server such as Windows NT, which in turn retrieves the data you requested.
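The layered idea can be illustrated with a toy sketch in which each layer wraps the data on the way down and unwraps it on the way up. This is a simplification for illustration only: real protocols use binary headers rather than text tags, not every layer adds a header of its own, and the function names send and receive are assumptions.

```python
# The seven layers named above, listed here from top (application) to bottom.
OSI_LAYERS = [
    "application", "presentation", "session",
    "transport", "network", "data link", "physical",
]


def send(payload):
    """Wrap the payload with one illustrative tag per layer, top to bottom."""
    for layer in OSI_LAYERS:
        payload = f"[{layer}]{payload}"
    return payload


def receive(frame):
    """Strip the tags in the opposite order, bottom to top."""
    for layer in reversed(OSI_LAYERS):
        tag = f"[{layer}]"
        if not frame.startswith(tag):
            raise ValueError(f"expected a {layer} tag")
        frame = frame[len(tag):]
    return frame


wire = send("hello")
print(wire)
print(receive(wire))
```

The receiving side only works if it removes the tags in the reverse of the order in which the sender added them, which is why a shared reference model is useful.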
NETWORK TOPOLOGIES
A network topology is the geometric arrangement of nodes and cable links in a
LAN.
There are four types of topologies:
Star- in a star topology each node has a dedicated set of wires connecting it to
a central network hub. Since all traffic passes through the hub, the hub
becomes a central point for isolating network problems and gathering network
statistics.
Ring- a ring topology features a logically closed loop. Data packets travel in a
single direction around the ring from one network device to the next. Each
network device acts as a repeater, meaning it regenerates the signal.
Bus- in a bus topology, each node attaches directly to a common cable. This
topology most often serves as the backbone for a network. In some instances,
such as in classrooms or labs, a bus will connect small workgroups.
Mesh- in a mesh topology, nodes have point-to-point connections to other nodes
or devices. In a full mesh, all the network nodes are connected to each other.
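One way to compare the four topologies is to count the cable links each one needs for the same number of nodes. The sketch below is a rough model under stated assumptions: the star counts one run from each node to the hub, the bus counts only the single shared cable (not the taps), and the mesh is assumed to be a full mesh. The function name link_count is illustrative.

```python
def link_count(topology, nodes):
    """Return the number of cable links needed to connect `nodes` nodes."""
    if nodes < 2:
        raise ValueError("a network needs at least two nodes")
    if topology == "star":
        return nodes                      # one dedicated run from each node to the hub
    if topology == "ring":
        return nodes                      # each node links to the next, closing the loop
    if topology == "bus":
        return 1                          # one shared cable that every node taps into
    if topology == "mesh":
        return nodes * (nodes - 1) // 2   # every pair of nodes has its own link
    raise ValueError(f"unknown topology: {topology}")


for name in ("star", "ring", "bus", "mesh"):
    print(name, link_count(name, 10))
```

The mesh count grows with the square of the number of nodes, which is why full meshes are usually limited to small groups of devices.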
IP ADDRESSING
An IP (Internet Protocol) address is a unique identifier for a node or host
connection on an IP network.

An IP address is a 32-bit binary number, usually represented as 4 decimal
values, each representing 8 bits (an octet) in the range 0 to 255,
separated by periods. This is known as "dotted decimal" notation.

Example: 192.16.4.1
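The relationship between dotted decimal notation and the underlying 32-bit number can be shown with a short sketch. The helper names dotted_to_int and to_binary_octets are illustrative, not part of any library.

```python
def dotted_to_int(address):
    """Convert dotted decimal notation to the 32-bit integer it represents."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # shift earlier octets left, append 8 bits
    return value


def to_binary_octets(address):
    """Show each octet of the address as 8 binary digits."""
    return ".".join(f"{int(part):08b}" for part in address.split("."))


print(dotted_to_int("192.16.4.1"))
print(to_binary_octets("192.16.4.1"))
```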

Every IP address consists of two parts, one identifying the network and one
identifying the node. The Class of the address and the subnet mask determine
which part belongs to the network address and which part belongs to the
node address.
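As a sketch of how a subnet mask splits an address into its two parts, the example below uses Python's standard ipaddress module. The mask 255.255.255.0 is an assumption chosen only for illustration; the text does not specify one.

```python
import ipaddress

# Assumption: a 255.255.255.0 subnet mask, picked only for this example.
iface = ipaddress.ip_interface("192.16.4.1/255.255.255.0")

# Bits covered by the mask identify the network.
network_part = iface.network.network_address

# The remaining bits identify the node within that network.
node_part = int(iface.ip) & int(iface.network.hostmask)

print(network_part)
print(node_part)
```

With this mask the first three octets identify the network and the last octet identifies the node.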
ADDRESS CLASSES
There are 5 different address classes. You can determine which class any IP
address is in by examining the first 4 bits of the IP address.
Class A addresses begin with 0xxx, or 1 to 126 decimal.
Class B addresses begin with 10xx, or 128 to 191 decimal.
Class C addresses begin with 110x, or 192 to 223 decimal.
Class D addresses begin with 1110, or 224 to 239 decimal.
Class E addresses begin with 1111, or 240 to 254 decimal.
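The leading-bit test described above can be sketched as follows. The function name address_class is illustrative. Note one assumption: the function classifies purely by bit pattern, so first-octet values outside the decimal ranges listed (0, 127, and 255) are still assigned the class their leading bits imply.

```python
def address_class(address):
    """Return the class letter implied by the leading bits of the first octet."""
    first_octet = int(address.split(".")[0])
    if not 0 <= first_octet <= 255:
        raise ValueError(f"first octet out of range: {first_octet}")
    if (first_octet & 0b10000000) == 0b00000000:   # 0xxx
        return "A"
    if (first_octet & 0b11000000) == 0b10000000:   # 10xx
        return "B"
    if (first_octet & 0b11100000) == 0b11000000:   # 110x
        return "C"
    if (first_octet & 0b11110000) == 0b11100000:   # 1110
        return "D"
    return "E"                                     # 1111


for example in ("10.0.0.1", "172.16.0.1", "192.16.4.1", "224.0.0.1", "240.0.0.1"):
    print(example, address_class(example))
```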
CLOUD COMPUTING

Cloud computing means storing and accessing data and programs over the
Internet instead of your computer's hard drive.
The cloud makes it possible for users to access information from anywhere
anytime.
It removes the need for users to be in the same location as the hardware that
stores the data. Once an Internet connection is established, whether wireless
or broadband, the user can access cloud computing services from a variety of
devices.
This hardware could be a desktop, laptop, tablet or phone.
Cloud offers a reliable online storage space.
It moves the processing required by web applications away from the browser,
since the processing is done on the cloud's servers.

Cloud computing comprises two components: the front end and the back end.
The front end includes the client devices and applications required to
access the cloud.
The back end refers to the cloud itself.
The whole cloud is administered by a central server that monitors
clients' demands.
Cloud computing systems must keep a copy of all clients' data so that service
can be restored after a system breakdown.
PROPERTIES OF CLOUD

Cloud computing is user-centric.
Once you as a user are connected to the cloud, whatever is stored there
(documents, messages, images, applications) becomes yours. In addition, not only
is the data yours, but you can also share it with others.
Cloud computing is task-centric.
Instead of focusing on the application and what it can do, the focus is on what
has to be done and how the application can do it for you.
Cloud computing is powerful.
Connecting hundreds or thousands of computers together in a cloud creates a
wealth of computing power which is impossible with a single desktop PC.
Cloud computing is accessible.
Because data is stored in the cloud, users can instantly retrieve more information
from multiple repositories. You are not limited to a single source of data, as you
are with a desktop PC.
Cloud computing is intelligent.
With all the various data stored on the computers in a cloud, data mining and
analysis are necessary to access that information in an intelligent manner.
Cloud computing is programmable.
Many of the tasks necessary with cloud computing must be automated. For
example, to protect the integrity of the data, information stored on a single
computer in the cloud must be replicated on other computers in the cloud. If that
one computer goes offline, the cloud's programming automatically redistributes
that computer's data to a new computer in the cloud.
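The replication behaviour described above can be sketched with a toy model. Everything here is hypothetical and greatly simplified: the class TinyCloud, its method names, the default of two copies per item, and the "least loaded machine" placement rule are assumptions made for illustration, not how any particular cloud works.

```python
class TinyCloud:
    """Hypothetical cluster that keeps each item on `copies` different machines."""

    def __init__(self, machines, copies=2):
        self.copies = copies
        self.storage = {name: set() for name in machines}

    def holders(self, item):
        """Return the machines that currently hold a copy of the item."""
        return {name for name, items in self.storage.items() if item in items}

    def _least_loaded(self, exclude):
        candidates = [name for name in self.storage if name not in exclude]
        if not candidates:
            raise RuntimeError("not enough machines online")
        # Prefer the machine holding the fewest items; break ties by name.
        return min(candidates, key=lambda name: (len(self.storage[name]), name))

    def put(self, item):
        """Store the item until the required number of copies exists."""
        while len(self.holders(item)) < self.copies:
            target = self._least_loaded(exclude=self.holders(item))
            self.storage[target].add(item)

    def go_offline(self, machine):
        """Remove a machine and re-replicate whatever it was holding."""
        lost_items = self.storage.pop(machine)
        for item in lost_items:
            self.put(item)


cloud = TinyCloud(["a", "b", "c"])
cloud.put("report.doc")
print(sorted(cloud.holders("report.doc")))
cloud.go_offline("a")
print(sorted(cloud.holders("report.doc")))
```

After machine "a" goes offline, the item is copied to another machine so that two copies exist again.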
SERVICE MODELS
Infrastructure as a Service (IaaS)
It means you're buying access to raw computing hardware over the Net, such as
servers or storage. Since you buy what you need and pay-as-you-go, this is often
referred to as utility computing. Ordinary web hosting is a simple example of IaaS:
you pay a monthly subscription or a per-megabyte/gigabyte fee to have a hosting
company serve up files for your website from their servers.
Software as a Service (SaaS)
Customers don't pay to own the software; rather, they pay to use it. Web-based
email and Google Documents are perhaps the best-known examples. Zoho is
another well-known SaaS provider offering a variety of office applications online.
Platform as a Service (PaaS)
It means you develop applications using Web-based tools so they run on systems
software and hardware provided by another company. So, for example, you might
develop your own ecommerce website but have the whole thing, including the
shopping cart, checkout, and payment mechanism running on a merchant's server. App
Cloud (from salesforce.com) and the Google App Engine are examples of PaaS.
DEPLOYMENT MODELS
Private cloud: The cloud infrastructure is operated solely for an organization. It
may be managed by the organization or a third party and may exist on premise or
off premise.
Community cloud: The cloud infrastructure is shared by several organizations and
supports a specific community that has shared concerns (e.g., mission, security
requirements, policy, and compliance considerations). It may be managed by the
organizations or a third party and may exist on premise or off premise.
Public cloud: The cloud infrastructure is made available to the general public or a
large industry group and is owned by an organization selling cloud services.
Hybrid cloud: The cloud infrastructure is a composition of two or more clouds
(private, community, or public) that remain unique entities but are bound
together by standardized or proprietary technology that enables data and
application portability (e.g., cloud bursting for load-balancing between clouds).
PROS OF CLOUD COMPUTING

Lower-Cost Computers for Users


Improved Performance
Fewer Maintenance Issues
Pay per use
Unlimited Storage Capacity
Increased Computing Power
Increased Data Safety
Universal Access to Documents
CONS OF CLOUD COMPUTING

Requires a Constant Internet Connection


Doesn't Work Well with Low-Speed Connections
Can Be Slow
Features Might Be Limited
Stored Data Might Not Be Secure
If the Cloud Loses Your Data, It Is Lost for Good Unless You Have a Backup
THANK YOU
