
1. Big data analytics is the process of examining large and varied data sets -- i.e., big data -- to uncover
hidden patterns, unknown correlations, market trends, customer preferences and other useful information
that can help organizations make more-informed business decisions.
Challenges:

Big data can be described by the following characteristics:[28][29]


Volume
The quantity of generated and stored data. The size of the data determines the value and
potential insight, and whether it can be considered big data or not.
Variety
The type and nature of the data. This helps people who analyze it to effectively use the
resulting insight. Big data draws from text, images, audio, video; plus it completes missing
pieces through data fusion.
Velocity
In this context, the speed at which the data is generated and processed to meet the
demands and challenges that lie in the path of growth and development. Big data is often
available in real-time.
Veracity
The quality of captured data can vary greatly, affecting the accuracy of analysis.[37]
Factory work and Cyber-physical systems may have a 6C system:

• Connection (sensor and networks)
• Cloud (computing and data on demand)[38][39]
• Cyber (model and memory)
• Content/context (meaning and correlation)
• Community (sharing and collaboration)
• Customization (personalization and value)
Data must be processed with advanced tools (analytics and algorithms) to reveal
meaningful information. For example, to manage a factory one must consider both
visible and invisible issues with various components. Information generation
algorithms must detect and address invisible issues such as machine degradation,
component wear, etc. on the factory floor.[40][41]
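
As a minimal sketch of such an "information generation algorithm", assuming a hypothetical list of vibration
readings sampled from one machine, even a rolling-baseline comparison can surface gradual degradation that
operators cannot see directly; real factory systems would use much richer models.

from collections import deque
from statistics import mean, stdev

def degradation_alerts(readings, window=50, threshold=3.0):
    """Flag readings that drift beyond `threshold` standard deviations
    from a rolling baseline built over the previous `window` samples."""
    baseline = deque(maxlen=window)
    alerts = []
    for i, value in enumerate(readings):
        if len(baseline) == window:
            mu, sigma = mean(baseline), stdev(baseline)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                alerts.append((i, value))   # possible wear/degradation
        baseline.append(value)
    return alerts

# Hypothetical sensor values; returns indices of samples that look anomalous.
# print(degradation_alerts(sensor_values))
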
Analysis and reporting:

Reporting: The process of organizing data into informational summaries in order to monitor
how different areas of a business are performing.

Analysis: The process of exploring data and reports in order to extract meaningful insights,
which can be used to better understand and improve business performance.

Reporting translates raw data into information. Analysis transforms data and information
into insights. Reporting helps companies monitor their online business and be alerted when
data falls outside of expected ranges. Good reporting should raise questions about the business
from its end users. The goal of analysis is to answer questions by interpreting the data at a
deeper level and providing actionable recommendations. Through the process of performing
analysis you may raise additional questions, but the goal is to identify answers, or at least
potential answers that can be tested. In summary, reporting shows you what is happening while
analysis focuses on explaining why it is happening and what you can do about it.
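
As a rough illustration of the reporting side, assuming hypothetical daily revenue figures per business area
and an expected range for each, a report simply summarizes and alerts; the analysis step would then dig into
why a flagged area is off.

def daily_report(revenue_by_area, expected_ranges):
    """Summarize each area's revenue and flag totals outside the expected range.
    `revenue_by_area` maps area -> list of daily figures (hypothetical data)."""
    report = {}
    for area, figures in revenue_by_area.items():
        total = sum(figures)
        low, high = expected_ranges[area]
        report[area] = {
            "total": total,
            "average": total / len(figures),
            "alert": not (low <= total <= high),   # reporting: what is happening
        }
    return report

# Analysis would start from the alerted areas and ask *why* (e.g., segment by
# product or region), which this sketch deliberately leaves to the analyst.
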
2. a. Big data streaming is a process in which big data is quickly processed in order to extract real-time
insights from it. The data on which processing is done is the data in motion. Big data streaming is ideally a
speed-focused approach wherein a continuous stream of data is processed.

Big data streaming is a process in which large streams of real-time data are processed with the sole
aim of extracting insights and useful trends out of it. A continuous stream of unstructured data is sent
into memory for analysis before it is stored on disk. This happens across a cluster of servers. Speed
matters the most in big data streaming. The value of data, if not processed quickly, decreases with
time.
Real-time streaming data analysis is a single-pass analysis. Analysts cannot choose to reanalyze the
data once it is streamed.
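
Because streamed data is seen only once, any statistic has to be maintained incrementally. A minimal
single-pass sketch, assuming events arrive as an iterator of numeric values:

def single_pass_stats(stream):
    """Maintain count and mean in one pass (incremental update);
    the raw events are never stored, so they cannot be reanalyzed."""
    count, mean = 0, 0.0
    for value in stream:
        count += 1
        mean += (value - mean) / count   # running mean, no second pass needed
    return count, mean

# Works on any iterator, e.g. a generator reading from a socket or message queue:
# count, mean = single_pass_stats(read_events())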

The plan shown in Figure 2-1 isn’t a bad design, and with the right choice of tools to carry out the queuing
and analytics and to build the dashboard, you’d be in fairly good shape for this one goal. But you’d be
missing out on a much better way to design your system in order to take full advantage of the data and to
improve your overall administration, operations, and development activities.
Instead, we recommend a radical change in how a system is designed.
The idea is to use data streams throughout your overall architecture—data streaming becomes the default
way to handle data rather than a specialty. The goal is to streamline (pun not intended) your whole operation
such that data is more readily available to those who need it, when they need it, for real-time analytics and
much more, without a great deal of inconvenient administrative burden.
Key Aspects of a Universal Stream-based Architecture
The idea that you can build applications to draw real-time insights from data before it is persisted is in itself a
big change from traditional ways of handling data. Even machine learning models are being developed with
streaming algorithms that can make decisions about data in real time and learn at the same time. Fast
performance is important in these systems, so in-memory processing methods and technologies are
attracting a lot of attention.
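
A toy illustration of "decide and learn at the same time", assuming a stream of (features, label) pairs and a
simple linear model updated by stochastic gradient descent; this is a sketch of the idea, not a production
streaming framework.

def online_linear_learner(stream, lr=0.01, n_features=3):
    """For each event: predict first (the real-time decision), then update the
    model with the observed label (learning on the same pass)."""
    weights = [0.0] * n_features
    for features, label in stream:
        prediction = sum(w * x for w, x in zip(weights, features))
        yield prediction                 # decision made before anything is persisted
        error = prediction - label
        weights = [w - lr * error * x for w, x in zip(weights, features)]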

2. b. Real Time Analytics Platform:


A real-time analytics platform enables organizations to make the most of real-time data by helping them
extract valuable information and trends from it. Such platforms measure data from a business point of view
in real time, making the best use of it.

• In CRM (customer relationship management), real-time analytics can provide up-to-the-minute
information about an enterprise's customers and present it so that better and quicker business decisions can
be made -- perhaps even within the time span of a customer interaction.
• Real-time analytics can support instant refreshes to corporate dashboards to reflect business
changes throughout the day (a minimal sketch of such a refresh follows this list).
• In a data warehouse context, real-time analytics supports unpredictable, ad hoc queries against large
data sets.
• Another application is in scientific analysis, such as tracking a hurricane's path, intensity, and
wind field, with the intent of predicting these parameters hours or days in advance.
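
For the dashboard-refresh case above, a minimal sketch of a sliding-window aggregate, assuming events are
(timestamp, amount) tuples; a real platform would push this running total to the dashboard on every refresh.

from collections import deque

class SlidingWindowTotal:
    """Keep a running total over the last `window_seconds` of events,
    so a dashboard can be refreshed instantly at any moment."""
    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.events = deque()          # (timestamp, amount)
        self.total = 0.0

    def add(self, timestamp, amount):
        self.events.append((timestamp, amount))
        self.total += amount
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have fallen out of the time window.
        while self.events and self.events[0][0] < now - self.window:
            _, old = self.events.popleft()
            self.total -= old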

3. Hadoop:

Hadoop is an open source distributed processing framework that manages data processing and storage for
big data applications running in clustered systems. It is at the center of a growing ecosystem of big
data technologies that are primarily used to support advanced analytics initiatives, including predictive
analytics, data mining and machine learning applications. Hadoop can handle various forms of structured and
unstructured data, giving users more flexibility for collecting, processing and analyzing data than relational
databases and data warehouses provide.
Hadoop Distributed File System:
HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master
server that manages the file system namespace and regulates access to files by clients. In addition,
there are a number of DataNodes, usually one per node in the cluster, which manage storage attached
to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be
stored in files. Internally, a file is split into one or more blocks and these blocks are stored in a set of
DataNodes. The NameNode executes file system namespace operations like opening, closing, and
renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes
are responsible for serving read and write requests from the file system’s clients. The DataNodes also
perform block creation, deletion, and replication upon instruction from the NameNode.
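
A toy model of the NameNode bookkeeping described above; this is not the real HDFS API, just a sketch of
splitting a file into blocks and mapping each block to a set of DataNodes with a replication factor of 3.

import itertools

BLOCK_SIZE = 128 * 1024 * 1024   # HDFS commonly uses 128 MB blocks
REPLICATION = 3

def place_blocks(file_size, datanodes):
    """Split a file into blocks and assign each block to REPLICATION DataNodes,
    loosely mimicking the NameNode's block-to-DataNode mapping."""
    n_blocks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE
    rotation = itertools.cycle(datanodes)
    mapping = {}
    for block_id in range(n_blocks):
        mapping[block_id] = [next(rotation) for _ in range(REPLICATION)]
    return mapping

# Example: a 300 MB file over four DataNodes -> 3 blocks, each stored on 3 nodes.
# print(place_blocks(300 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"]))
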
Components:

• Hadoop Common – The common module contains libraries and utilities which are required by
other modules of Hadoop.
• Hadoop Distributed File System (HDFS) – This is the distributed file system which stores data
on commodity machines. It is the core of the Hadoop framework and provides very high
aggregate bandwidth across the cluster.
• Hadoop YARN – This is the resource-management platform responsible for managing
computing resources across the cluster and using them to schedule users' applications.
• Hadoop MapReduce – This is the programming model used for large-scale data processing
(a word-count sketch follows this list).
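
To make the MapReduce bullet concrete, here is a self-contained word-count sketch that mimics the
map -> shuffle -> reduce phases in memory; a real Hadoop job would distribute these same steps across the
cluster (e.g. as a Java job or via Hadoop Streaming).

from collections import defaultdict

def map_phase(lines):
    # Map: emit (word, 1) for every word in every input line.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

# counts = reduce_phase(shuffle(map_phase(["hello hadoop", "hello world"])))
# -> {"hello": 2, "hadoop": 1, "world": 1}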
