
MARKET NOTE

FinTech Research
Larry Tabb, Founder & CEO
June 2016

The Big Data Levee: Simplifying Big Data Ownership
Introduction
Capital markets data streams have turned into a flood.
External and internal pressures around oversight and much-needed new revenue are mounting. Global regulatory changes stemming from Dodd-Frank, MiFID I and II, Basel III, and the Comprehensive Capital Analysis and Review (CCAR), along with rules from local authorities, have forced firms to capture, store, analyze and leverage more data than ever before. And this is only a sampling of the wholesale changes and new requirements stemming from the recent financial crisis.
Regulators, politicians, managers and investors expect firms to have an in-depth understanding of, and innovative uses for, all generated and captured data. Regulators are not finished passing new rules, and capital restrictions will not be eased anytime soon. Capital markets are dealing with new norms and new data requirements.

Drowning in Data
When data demands first began to skyrocket, capital markets companies attempted to keep up with traditional enterprise data management systems. Transactional, relational and hierarchical data stores are useful, but they have turned out to be insufficient for the new data challenges.
Companies also tried homegrown, ad hoc data stores, which often consisted of poorly documented flat files that contained structured and unstructured data. These files were coupled with badly supported scripts that existed for on-demand access to and processing of the data. Worse than insufficient, this approach was dangerously unorganized.

5 Big Data Use Cases in Capital Markets

Trade Creation
Market data, economic releases, market-moving news, social media, internally calculated signals, risk scenarios, etc., used to develop models and run back tests.

Customer Data
Storing, change tracking, reporting and auditing of all customer interactions, know-your-customer-required forms and background data, portfolio and investment preferences, etc.

Regulatory Reporting
All communications associated with trading activity, including email, voice, instant messaging, voice recordings, market data, smart order routing, etc., need to be stored, reported on and auditable.

Market Risk Analysis
All the same data requirements as Trade Creation, plus the layered complexity of mass compute for scenario analysis such as Monte Carlo simulations.

Operational Risk Analysis
All the same data requirements as Trade Creation, plus the need for systems performance data, application log files, network health, configurations used, etc.

Enter Hadoop
Firms have turned to big data frameworks such as Hadoop to keep up with new data demands while keeping costs low. Hadoop is built from the ground up to handle vast amounts of structured and unstructured data distributed across low-cost, commodity hardware. This cost-friendly approach is incredibly helpful, as many IT and data warehousing departments are being asked to provide more with shrinking budgets. Hadoop is proving to be a necessary first step in addressing many firms' big data needs.
But make no mistake: Hadoop is not a panacea. It succeeds with data types and volumes not suitable for previous tools, but it cannot stand alone. A successful Hadoop deployment is one in which Hadoop is surrounded by the right toolset.
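
To ground this in something concrete: Apache Spark is one widely deployed example of a processing layer that runs on Hadoop, and the Python sketch below uses it to read both structured and unstructured data straight out of HDFS. This is an illustration only; the cluster address, paths and file layout are hypothetical, and the note does not prescribe any particular tool.

    # Minimal sketch (PySpark): one Hadoop store holding both
    # structured and unstructured data. Paths are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hdfs-read-sketch").getOrCreate()

    # Structured data: a CSV of trades, schema inferred on read.
    trades = spark.read.csv(
        "hdfs://namenode:8020/data/trades.csv",
        header=True,
        inferSchema=True,
    )

    # Unstructured data: raw news text, one line per row.
    news = spark.read.text("hdfs://namenode:8020/data/news/*.txt")

    print(trades.count(), news.count())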


The Unified Theory of Tools
Hadoop does not have all the necessary tools baked in. It cannot replace the existing relational and hierarchical data stores that analysts and business intelligence systems rely on. Nor is Hadoop an integration tool that plays nicely with pre-existing enterprise data systems out of the box. The Hadoop framework allows for distributed compute, but in a way that is often insufficient for time-series data in capital markets.
That is why it seems like every week new open source
projects are announced. Each new tool has its own
niche, and due to the distributed nature of their
development and the different biases and needs of
their stakeholders, these tools often do not play well
together. At a time when budgets have been cut
dramatically but investment is clearly needed, CIOs,
chief architects and engineering leads must establish a
toolset that fulfills diverse yet connected requirements.
The biggest challenge for a CIO or data architect is to
find a unified solution. A hodge-podge collection of
one-off solutions will not provide the value needed.
Rather than having one solution for machine learning, one for SQL queries, one for OLAP, one for MapReduce, and so on, a unified framework is best. For operational efficiency, cost management and user experience, all the tools in the toolbox must work together seamlessly.
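
As a hedged sketch of what "unified" can look like in practice, the Python example below uses Apache Spark, one common such framework, to serve a SQL query and a machine-learning fit from the same session and the same copy of the data. The paths, table and columns are invented for illustration.

    # One engine, several workloads: SQL and machine learning
    # against a single copy of the data. Names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    spark = SparkSession.builder.appName("unified-toolset-sketch").getOrCreate()

    trades = spark.read.parquet("hdfs://namenode:8020/warehouse/trades")
    trades.createOrReplaceTempView("trades")

    # Familiar SQL...
    daily = spark.sql(
        "SELECT trade_date, AVG(price) AS avg_price, SUM(quantity) AS volume "
        "FROM trades GROUP BY trade_date"
    )

    # ...and machine learning on the same data, with no second system.
    features = VectorAssembler(
        inputCols=["volume"], outputCol="features"
    ).transform(daily)
    model = LinearRegression(
        featuresCol="features", labelCol="avg_price"
    ).fit(features)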

The Toolset Makes the System
Companies need a processing and analysis toolset that lets them fully unlock the value of the data they store. The right toolset helps Hadoop with the challenges of data management and helps a company achieve its needed data-derived benefits (Exhibit 1).
In choosing and implementing this toolset, one should
focus on four primary areas of value and functionality.
It is crucial to find a toolset that simplifies big data
ownership by:
1. Easily leveraging distributed compute, not just
distributed storage;
2. Bridging the data divide between enterprise
and big data;
3. Providing familiarity to users; and
4. Allowing for multiple applications to have automated access.
If a toolset that fulfills all four of these requirements is
applied, it becomes possible to manage costs, realize
savings and gain company-wide efficiencies.

Take the Compute to the Data
The Hadoop framework comes with Hadoop MapReduce. However, MapReduce is not always the best tool for processing large amounts of data, especially when dealing with time-series data (as much capital markets data is). Still, some Hadoop users move large amounts of data to where the compute is performed, which is a waste of time, effort and money. The right tools should make it easier and more cost-efficient to take the compute to the data rather than the data to the compute.
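
A minimal sketch of the difference, again using Apache Spark as a stand-in for "the right tools" (paths and fields are hypothetical): the filter and aggregation below are shipped to the nodes holding the data, and only a handful of result rows travel back, rather than the raw ticks themselves.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import avg, col

    spark = SparkSession.builder.appName("compute-to-data-sketch").getOrCreate()

    # Potentially billions of rows, resident in Hadoop.
    ticks = spark.read.parquet("hdfs://namenode:8020/market/ticks")

    # The filter and aggregation execute on the storage nodes;
    # only the small per-day averages are collected back.
    result = (
        ticks.filter(col("symbol") == "XYZ")
             .groupBy("trade_date")
             .agg(avg("price").alias("avg_price"))
             .collect()
    )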

Exhibit 1:
Challenges and Benefits of Data Management

Source: TABB Group



Hadoop has risen to prominence in other industries and is becoming widely adopted in capital markets because it is fantastic at storing large amounts of data at lower cost. But data at rest is only part of the capital markets data story.
In capital markets, value is in using data now. Data is
not being hoarded at such great expense and effort
with the thought that one day it will be useful. Instead,
there are immediate, pressing uses. This is why the
best tools allow for layering in-memory operations over
Hadoop, successfully marrying big data storage with
live, streaming processing.
A successful use of Hadoop's distributed storage will layer processing on top, effectively creating high-performance distributed compute. This can have an immediate and positive impact, turning the system from data at rest to data at work and fully leveraging the otherwise idle compute of the storage servers.
The right tools will work directly on Hadoop, taking the
processing to where the data lives. This alleviates the
need for every application to have its own copy of the
data, which is also part of the second important value
add that a toolset must provide.
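
One hedged illustration of layering in-memory processing over Hadoop's storage, using Spark's cache as the in-memory layer (paths, dates and columns are hypothetical):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("in-memory-layer-sketch").getOrCreate()

    # Data at rest in Hadoop becomes data at work: pin the hot subset
    # in cluster memory once, then let many queries reuse it without
    # re-reading from disk or making per-application copies.
    quotes = spark.read.parquet("hdfs://namenode:8020/market/quotes")
    hot = quotes.filter("quote_date >= '2016-01-01'").cache()

    hot.groupBy("symbol").count().show()  # first query materializes the cache
    hot.groupBy("venue").count().show()   # later queries hit memory, not disk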

Bridging the Digital Divide

In most enterprises, data is spread across multiple storage systems and data types: transactional, relational, hierarchical and unstructured (or less-structured) big data. Capital markets participants need hot, in-memory data to be enriched with older, colder data that can provide context and trends. Moreover, different uses of data place different requirements on the underlying data stores, which leads to costly and confusing replication. The result is more of a data marsh than a data lake.
Due to the different structural requirements of the data, not all data types can be combined. In fact, combining all silos into a single repository is, more than likely, undesirable. Enterprise data found in relational databases, as well as hierarchical data familiar to business analysts, often needs to be used in conjunction with data in Hadoop.
The right tools integrate data types and storage systems. They break down barricades, remove the need for costly data duplication, and create a unified foundation of data for all of an enterprise's uses. With the right, tightly integrated tools, data in Hadoop can easily be accessed and merged with other enterprise data, adding the contextual insight and enrichment that capital markets participants need.
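
A sketch of what such bridging might look like, assuming Spark as the integration layer, a hypothetical Postgres customer database, and the appropriate JDBC driver available on the cluster:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("bridge-sketch").getOrCreate()

    # Big data side: order flow stored in Hadoop.
    orders = spark.read.parquet("hdfs://namenode:8020/flow/orders")

    # Enterprise side: customer reference data in an existing
    # relational database, read in place over JDBC -- no bulk copy.
    customers = (
        spark.read.format("jdbc")
        .option("url", "jdbc:postgresql://dbhost:5432/crm")
        .option("dbtable", "customers")
        .option("user", "analyst")
        .load()
    )

    # One logical view across both stores.
    enriched = orders.join(customers, on="customer_id", how="left")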

Democratize Data Access

More often than not, the right tool for the job is the one the craftsman is most familiar with. This logic holds true in capital markets. The same tools deployed to unify data across type and storage system should be ones users are familiar with. That does not mean the very same tools they have always used, but ones that work similarly and present paradigms that data analysts are accustomed to.
Allowing for interaction with the vast amount of data that can be stored in Hadoop in familiar ways is not easy, but it is extremely important: it democratizes data access and allows for easier exploration and discovery within Hadoop.

Exhibit 2:
Data-to-Wisdom Pyramid

Source: TABB Group

When deploying new systems, the cost can quickly be outpaced by the expense of training the workforce and/or hiring people with the required skill sets and experience. The inefficiency of a skilled and valuable workforce unable to work at top capacity while learning a new system can heap cost upon cost.
The right toolset minimizes training and hiring
expenses by providing current users ways to quickly
and efficiently use the data being stored in Hadoop to
add context and improve their understanding of
traditional enterprise data.


Comprehensive tools will provide users with the ability to create enriched, interactive analytics on Hadoop. The foundation of data is built upon, and it becomes easier for users to obtain actionable intelligence (Exhibit 2). This results in stakeholders having a more complete view of the business and arriving at new insights sooner.
The right tools also allow stakeholders to integrate data from across their enterprise in familiar ways, giving them a more complete view of their business and helping them arrive at the insights they need.
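
As a brief illustration of "familiar ways": the sketch below exposes Hadoop-resident data through plain SQL, the paradigm most analysts already know. The view name, dates and columns are hypothetical, and Spark is again just one possible engine.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("familiar-sql-sketch").getOrCreate()

    # Analysts keep the interface they already know: plain SQL,
    # now running over data that happens to live in Hadoop.
    spark.read.parquet("hdfs://namenode:8020/flow/executions") \
         .createOrReplaceTempView("executions")

    report = spark.sql("""
        SELECT trader_id, COUNT(*) AS fills, SUM(quantity) AS shares
        FROM executions
        WHERE trade_date = '2016-05-02'
        GROUP BY trader_id
        ORDER BY shares DESC
    """)
    report.show()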

Enable Precision Decisions
Data is not stored in Hadoop solely for human-driven discovery. It should also be accessible for automated uses, including applications that build upon the work of data analysts and insight discovery driven by machine learning and optimization.
Once business analysts have gained an insight or established a workflow, they need to be able to automate their process so it can be applied at the scale and speed the business demands. The best tools will utilize the work that analysts have already done and make it easy to port that work to a more automated environment.
The tools chosen to work with Hadoop should also
allow more advanced data scientists to build
applications geared toward automated discovery and
exploration. This use requires the tools to provide
extensive programming support so data scientists can
use their language of choice, whether it be C, C++,
Scala, Java, Python or R.
In both cases, the tools layered over Hadoop should
allow for programmatic access to the data and the
relatively quick creation of applications that access,
process and manipulate the data stored therein.
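
A hedged sketch of that porting step: an analyst's ad-hoc query wrapped in a small Python function so a scheduler or another application can run it unattended. The names, paths and dates are invented for illustration.

    from pyspark.sql import SparkSession

    def position_report(spark, as_of_date):
        # An analyst's ad-hoc query, wrapped so other applications can
        # run it on a schedule. (Illustrative only; in production the
        # date should be bound as a parameter, not formatted into SQL.)
        spark.read.parquet("hdfs://namenode:8020/flow/executions") \
             .createOrReplaceTempView("executions")
        return spark.sql(
            "SELECT account, SUM(quantity) AS net_position "
            "FROM executions WHERE trade_date = '{}' "
            "GROUP BY account".format(as_of_date)
        )

    if __name__ == "__main__":
        spark = SparkSession.builder.appName("automated-report").getOrCreate()
        report = position_report(spark, "2016-05-02")
        report.write.mode("overwrite").parquet(
            "hdfs://namenode:8020/reports/positions")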

Driving Toward New Revenue
The financial crisis and the resulting political reaction not only raised the requirements of government regulators, compliance and risk; they also hit profits hard. Regulatory demands have stifled familiar revenue lines, and unless new ways of driving revenue are developed, profits will not return to their pre-crisis levels.
To begin, companies need to ensure that nothing is
left on the table and wring every last drop from
remaining revenue lines. Second, they must explore
new lines of revenue, whether market or service
oriented. Like many data-driven startups, capital
markets firms need to innovate rapidly and cast off
ideas that do not have staying power. The most
promising innovations must be driven forward quickly
so firms can realize new profits sooner rather than
later.
For firms improving current revenue lines or exploring and building new ones, data management is the key. Data and its analysis allow for the rapid exploration and discovery needed to bring new ideas to market. Firms that have the right tools in place will find that data is the fuel for future revenue growth.

Conclusion
Hadoop is a valuable tool for many capital markets
participants. It can handle the vast amounts of data
needed to satisfy new governance, compliance, risk,
and business requirements. However, Hadoop needs to be surrounded by the right toolset to fully and efficiently meet a company's needs.
The tools one chooses to use with Hadoop must make distributed compute easier and more applicable to financial data, aid in integration across data sources, provide business users with familiar interfaces and functionality, and allow applications to easily automate data workflows for real-time insight.
Even as data continues to grow by leaps and bounds, the right toolset will allow users easier exploration and discovery, leading to innovation and profits. While Hadoop might be the right storage system for capital markets' new data demands, only the right tools will turn the cost of data infrastructure into an active investment.

About TABB Group
TABB Group is a financial markets research and strategic advisory firm focused exclusively on capital markets. Founded in
2003 and based on the methodology of first-person knowledge, TABB Group analyzes and quantifies the investing value
chain, from the fiduciary, investment manager and broker to the exchange and custodian. Our goal is to help senior
business leaders gain a truer understanding of financial market issues and trends so they can grow their businesses. The
press regularly cites TABB Group members, and analysts routinely speak at industry conferences and gatherings. For more
information about TABB Group, visit www.tabbgroup.com.
© 2016 The TABB Group, LLC. All Rights Reserved. May not be reproduced by any means without express permission.
