
BIG DATA IN HEALTHCARE



GROUP PROJECT
FINAL REPORT

BY

STEVE DENSON
CHASE BLAND
ADNAN DOHADWALA

SUBMITTED TO: PROF. JORDAN MITCHELL


NOVEMBER 28TH, 2016

UHCL Honesty Code: Submission of this exam/assignment
certifies my compliance with the UHCL Honesty Code that I signed at the start of the
semester. I pledge on my honor that I have complied with this policy, inclusive of
not acquiring unauthorized information or assistance, not providing others with
unauthorized information or assistance, avoiding plagiarism, avoiding conspiracy,
avoiding fabrication/falsification, avoiding abuse of resources and materials, and
reporting the academic dishonesty of others.

11/28/2016

Prof. Jordan Mitchell


Healthcare Administration Program
University of Houston Clear-Lake, Suite 120
Houston, TX 77204-0301
Cover Letter - HADM 5431 - Healthcare Information Management Group
Project Report
Dear Sir,
We are submitting this report on Big Data in Healthcare. The report explains
the importance of Big Data in healthcare, the challenges to its adoption
today, the healthcare outcomes it can improve, and the steps to be taken in
the near future to make the transition to Big Data. As such, this paper
should interest a broad readership, including anyone who wants to
understand more about Big Data in healthcare. We give the instructor
complete permission to use this project as an example or case study for
future students and classes.
Thank you for considering our work. Your feedback on the submitted paper
will be greatly appreciated.

Sincerely,
Steve Denson
Chase Bland
Adnan Dohadwala

The healthcare industry is at a turning point in how it provides care to
patients. Doctor-patient interactions are changing because the technology
used in care has evolved rapidly over time. To develop and use this new
technology properly, large amounts of data, referred to as Big Data, have
been collected, analyzed, and used to advance the healthcare industry in
ways never seen before.
In 2001, Doug Laney introduced the concept of the three Vs (Volume,
Velocity, and Variety). He concluded that these three aspects of Big Data
are the most important facets to focus on in order to apply data to
real-world solutions. In healthcare, the three Vs and Big Data have never
been more important than they are now. Emerging technologies such as AI
(Artificial Intelligence), blockchain, and IoT (Internet of Things) devices
use Big Data in new ways and help to solve the problems that the three Vs
present in an ever-growing field.
Every aspect of healthcare is affected by the use of Big Data. Patients,
providers, and payors all stand to benefit if new technologies can collect
and analyze this data more efficiently and accurately. The cost of this
ongoing process keeps rising, but it is likely to be offset by the value
Big Data already delivers and will continue to deliver.
Big Data plays a role in healthcare that will transform the way we
perceive the industry. New technologies that use Big Data are on the cutting
edge of reinventing healthcare. It all comes back to how we optimize Big
Data in relation to the three Vs: Volume, Velocity, and Variety.

For technologies such as AI, blockchain, and IoT to use Big Data to forge
a leaner healthcare system with cost savings and optimal healthcare
outcomes, funding for research and development (R&D) in the three Vs and
the accompanying technologies will be necessary. To help lay the foundation
for investment in Big Data, CMS developed a three-stage Meaningful Use
incentive program, running from 2011 to 2016, to help expand Electronic
Medical Record (EMR) systems throughout hospitals and healthcare systems.
The ever-expanding cost of developing new EMR systems requires
organizational and strategic capital planning.
For example, Adventist HealthCare, which operates four hospitals and
other healthcare services around the Gaithersburg, Maryland area,
generated $750 million in operating revenues in 2014. James G. Lee, CFO of
Adventist HealthCare, developed a distinctive blend of capital allocations
for large investment projects such as Big Data/EMR infrastructure. His plan
allocates 50% of each hospital's earnings before interest, taxes,
depreciation, and amortization (EBITDA) toward capital projects such as
renovating facilities. The remaining 50% is retained as a pooled fund:
25% of earnings is allocated to long-term projects like IT Big Data/EMR
implementation, and the other 25% provides additional "balance sheet
support," as Lee puts it. This capital allocation strategy has allowed
Adventist HealthCare to plan capital projects around financial performance
over time, which made possible a massive capital investment in data
analytics over a three-year period.
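The allocation rule described above is simple arithmetic. The minimal Python sketch below assumes that both 25% shares are measured against total EBITDA (so that 50% + 25% + 25% = 100%); the function name `allocate_ebitda` and the $40 million EBITDA figure are hypothetical and do not come from Adventist HealthCare.

```python
def allocate_ebitda(ebitda: float) -> dict[str, float]:
    """Split one hospital's EBITDA using the 50/25/25 rule described above.

    Assumption: both 25% shares are fractions of total EBITDA, so the
    pooled fund is simply the retained 50%.
    """
    capital_projects = ebitda * 0.50          # e.g. facility renovations
    pooled_fund = ebitda - capital_projects   # the retained 50%
    return {
        "capital_projects": capital_projects,
        "long_term_it": pooled_fund * 0.50,   # 25% of EBITDA: Big Data/EMR
        "balance_sheet": pooled_fund * 0.50,  # 25% of EBITDA: balance sheet support
    }


# Hypothetical hospital with $40 million of EBITDA.
for bucket, amount in allocate_ebitda(40_000_000).items():
    print(f"{bucket:>16}: ${amount:,.0f}")
```

For that hypothetical hospital the rule yields $20 million for capital projects and $10 million each for long-term IT and balance sheet support.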
Capitalization and implementation of EMR/Big Data has been a high-level
priority for hospitals: according to the February 2015 report to Congress
on the adoption of health IT, nearly all hospitals (97%) and 75% of
physicians had certified EMR technology as of 2014. One of the biggest
objectives for most healthcare providers will be making their Big Data
infrastructure future-proof and sustainable. As the use of AI, IoT,
digitization, and similar technologies continues to expand, the real-time
data they generate is expected to explode from 4.4 zettabytes in 2015 to
44 zettabytes in 2025, as predicted at the IDC Directions 2016 conference.
Despite this potential 10-fold increase in generated data over the next 10
years, there is a silver lining: Big Data technologies are emerging that
address future cost, velocity, and storage concerns. One of them is Apache
Flume, a community-driven open-source project developed to efficiently
collect, aggregate, and move massive amounts of data. Apache Flume is
expected to decrease the cost of managing Big Data in two ways:
1. Flexibility: it can be customized to work with various types of systems,
such as batch computation systems like Hadoop and non-relational databases
like MongoDB, and it integrates with different data pipelines.
2. Reduced processing time: compared with older data collectors, Apache
Flume's performance in collecting, transferring, and storing Big Data is
currently among the fastest in the industry.
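Taking the IDC figures above at face value, a 10-fold increase over 10 years corresponds to a compound annual growth rate of roughly 26%, because the rate r must satisfy 4.4 * (1 + r)^10 = 44. The short Python sketch below makes that calculation explicit; the function name `cagr` is our own.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


# Zettabytes of data, 2015 -> 2025, as cited from IDC Directions 2016.
rate = cagr(4.4, 44.0, 2025 - 2015)
print(f"Implied growth: {rate:.1%} per year")  # about 25.9% per year
```

At that rate the volume of data roughly doubles every three years (1.259^3 is about 2.0).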
Major hospitals and healthcare systems are implementing IT technologies
such as IBM Watson to develop predictive modeling and analytics for
precision medicine and other innovative programs. These institutions will
need cost-efficient ways to handle the continuous and increasing quantities
of real-time data being gathered, transmitted, categorized, and stored.
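Handling a continuous stream of real-time data is the job Apache Flume was built for. A Flume agent is organized as a source that receives events, a channel that buffers them, and a sink that writes them to a destination such as HDFS. The sketch below is a toy, in-process Python analogy of that source, channel, sink flow, not Flume's own API (Flume itself is a Java service configured through properties files). The class and function names, the routing by record type, and the sample readings are all hypothetical.

```python
from collections import deque
from collections.abc import Callable, Iterable


class Channel:
    """Bounded FIFO buffer that decouples a source from its sinks."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._events: deque[dict] = deque()

    def put(self, event: dict) -> bool:
        """Accept an event unless the buffer is full (back-pressure)."""
        if len(self._events) >= self.capacity:
            return False
        self._events.append(event)
        return True

    def take(self, max_batch: int) -> list[dict]:
        """Remove up to `max_batch` events in arrival order."""
        batch = []
        while self._events and len(batch) < max_batch:
            batch.append(self._events.popleft())
        return batch

    def __len__(self) -> int:
        return len(self._events)


def run_agent(events: Iterable[dict], route: Callable[[dict], str],
              capacity: int = 4, batch_size: int = 2) -> dict[str, list[dict]]:
    """Move events source -> channel -> sinks, routing each event by a key."""
    channel = Channel(capacity)
    sinks: dict[str, list[dict]] = {}

    def drain_one_batch() -> None:
        for event in channel.take(batch_size):
            sinks.setdefault(route(event), []).append(event)

    for event in events:               # the "source": one event at a time
        while not channel.put(event):  # channel full: let the sinks catch up
            drain_one_batch()
    while len(channel):                # flush anything still buffered
        drain_one_batch()
    return sinks


# Hypothetical real-time readings from bedside monitors and a lab system.
readings = [
    {"id": 1, "type": "vitals", "hr": 72},
    {"id": 2, "type": "labs", "glucose": 95},
    {"id": 3, "type": "vitals", "hr": 88},
    {"id": 4, "type": "vitals", "hr": 64},
    {"id": 5, "type": "labs", "glucose": 130},
]
stored = run_agent(readings, route=lambda e: e["type"])
print({k: [e["id"] for e in v] for k, v in stored.items()})
```

The bounded channel is the design point: when the sinks fall behind, the source is forced to wait instead of silently dropping data, which is the same reason Flume places a channel between its sources and sinks.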

Technologies using Big Data to improve Healthcare Outcomes
Artificial Intelligence:
Big Data offers nearly limitless possibilities in achieving far-reaching
goals. Lowering costs, improving outcomes, and achieving quality goals are
only a few advantages that can be attained with the adoption of future
technologies. Health care is not the only industry that is reaping the benefits
of Big Data; transportation, the environment, criminal justice, and economic
inclusion have benefited from public and private sector investments in AI. (1)
AI and machine learning come together to give smart machines the
appearance of intelligence. (2) The main driving force behind smart
machines is their ability to learn from experience and produce
unanticipated results. (3) One can easily see how these hyper-intelligent
machines could improve the quality, speed, efficiency, and precision of
patient care. Smart machines may be able to reduce error rates by up to 30%
for tasks that require high precision, which could result in lower costs. (1)
Every patient is unique, down to their medical history, symptoms, and the
effects a treatment will have. AI can access Big Data stored in the cloud to
assist physicians by comparing an individual patient's characteristics
against billions of patient histories and the latest findings in world
medical research. (6) AI will be able to analyze a patient, recommend
optimal care, and continuously monitor the patient's response to that care.
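One simple way to compare a patient against past cases is a nearest-neighbour search: represent each patient as a vector of features and retrieve the most similar histories. The Python sketch below is only an illustration under stated assumptions (three hypothetical, already-scaled features and a three-record history); it is not how IBM Watson or any specific product works, and real systems would use far more data, careful feature scaling, and validated models.

```python
import math


def most_similar(patient: dict, history: list[dict], features: list[str],
                 k: int = 2) -> list[dict]:
    """Return the k past cases closest to `patient` by Euclidean distance.

    Assumes every feature has already been scaled to a comparable range.
    """
    target = [patient[f] for f in features]
    return sorted(
        history,
        key=lambda case: math.dist(target, [case[f] for f in features]),
    )[:k]


features = ["age", "bmi", "a1c"]  # hypothetical features, pre-scaled to 0..1
new_patient = {"age": 0.6, "bmi": 0.5, "a1c": 0.7}
past_cases = [
    {"id": "P1", "age": 0.6, "bmi": 0.5, "a1c": 0.8, "outcome": "responded"},
    {"id": "P2", "age": 0.1, "bmi": 0.2, "a1c": 0.3, "outcome": "no response"},
    {"id": "P3", "age": 0.5, "bmi": 0.6, "a1c": 0.7, "outcome": "responded"},
]
for case in most_similar(new_patient, past_cases, features):
    print(case["id"], case["outcome"])
```

The outcomes of the closest past cases (here P1 and P3) are what a clinician would then weigh, alongside clinical judgment, when choosing a treatment.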

IBM Watson Health is one of the pioneering AI systems being used in health
care. Watson uses natural language processing and semantic computing to
support clinical decision-making. Watson can ingest millions of pages of
academic literature and other health care data. It then offers
recommendations to providers, along with confidence scores that show how
applicable each course of action may be. (8) The outcome is a computer that
can learn at a rate humans cannot match and whose output can be more
accurate because it consumes massive amounts of data.
Additional benefits from AI in the health care industry include
increased patient safety, earlier and more accurate diagnoses, and fewer
missed opportunities to deliver care based on recommended protocols. (1)
Safety, governance, and regulation are among the challenges standing
in the way of achieving the potential of AI. Guidelines will need to be
implemented that require AI to be thoroughly tested and vetted before being
exposed to the public. (1)
Blockchain:
An interconnected health care industry, where data and records are
shared among patients and physicians, is an achievable goal when Big
Data's advantages are harnessed. This is easier said than done when the
amount of data is overwhelming and more is pouring in every day.
Blockchain is an emerging technology that can help create the
interconnected network of data that health care needs. In simplest terms, a
blockchain is a series of records (blocks) that link to each other: each
block stores a cryptographic hash of the block before it, so altering an
earlier record breaks its link to the next block and is easy to detect.
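The linking idea can be shown in a few lines. In this minimal Python sketch, each block stores the SHA-256 hash of the previous block, so changing any earlier record is detected when the chain is re-verified. It deliberately leaves out everything that makes a real blockchain such as Bitcoin work at scale (distribution across many nodes, consensus, digital signatures); the record contents are hypothetical.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index: int, data: dict, prev_hash: str) -> str:
    """SHA-256 over the block's contents, including the previous block's hash."""
    payload = json.dumps({"index": index, "data": data, "prev_hash": prev_hash},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: list[dict], data: dict) -> None:
    """Add a block that points at the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({"index": index, "data": data, "prev_hash": prev_hash,
                  "hash": block_hash(index, data, prev_hash)})


def is_valid(chain: list[dict]) -> bool:
    """Recompute every hash and check each block points at its predecessor."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_PREV
        if block["prev_hash"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["data"],
                                       block["prev_hash"]):
            return False
    return True


# Hypothetical record events for one patient.
chain: list[dict] = []
append_block(chain, {"patient": "A-17", "event": "office visit"})
append_block(chain, {"patient": "A-17", "event": "lab result", "detail": "A1c 7.1%"})
append_block(chain, {"patient": "A-17", "event": "prescription"})
print(is_valid(chain))  # True

chain[1]["data"]["detail"] = "A1c 5.0%"  # tamper with an earlier record
print(is_valid(chain))  # False
```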
Bitcoin is one of the few systems to have applied blockchains successfully.
The hurdles that Bitcoin has overcome, and is still facing, in its use of
blockchains are analogous to those in health care: insufficient data
standards, a lack of interoperability, questions over how to achieve scale,
and concerns about security. (1)
MIT graduate students and a senior research scientist are developing a
system called MedRec that uses blockchains to manage medical records. (4)
MedRec is designed to give patients control over their medical data by
acting as an interface between institutions' siloed health records. These
dispersed health records are a major obstacle to properly implementing EHR
systems. Once again, the amount of data siloed in separate systems is
overwhelming, and research into new technologies like MedRec could finally
bring this data together so that it can be accessed and analyzed to provide
better care.
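According to its designers' published description, MedRec does not put the medical records themselves on the blockchain; it records pointers to records that stay in each provider's database, together with the access permissions the patient has granted. The Python sketch below models only that pointer-and-permission idea with an ordinary in-memory object (no blockchain, no smart contracts); the class names, institutions, and record locations are hypothetical and are not MedRec's code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RecordPointer:
    institution: str  # provider that still holds the actual record
    location: str     # hypothetical locator for that record at the provider


@dataclass
class PatientRegistry:
    """Patient-controlled index of where records live and who may see them."""
    pointers: dict[str, list[RecordPointer]] = field(default_factory=dict)
    grants: dict[str, set[str]] = field(default_factory=dict)

    def add_pointer(self, patient: str, pointer: RecordPointer) -> None:
        self.pointers.setdefault(patient, []).append(pointer)

    def grant(self, patient: str, viewer: str) -> None:
        self.grants.setdefault(patient, set()).add(viewer)

    def revoke(self, patient: str, viewer: str) -> None:
        self.grants.get(patient, set()).discard(viewer)

    def lookup(self, patient: str, viewer: str) -> list[RecordPointer]:
        """Return pointers (not records) only if the patient granted access."""
        if viewer not in self.grants.get(patient, set()):
            raise PermissionError(f"{viewer} may not view records of {patient}")
        return list(self.pointers.get(patient, []))


registry = PatientRegistry()
registry.add_pointer("patient-42", RecordPointer("Hospital A", "ehr/encounters/9001"))
registry.add_pointer("patient-42", RecordPointer("Clinic B", "labs/results/311"))
registry.grant("patient-42", "dr-lee")
print([p.institution for p in registry.lookup("patient-42", "dr-lee")])
```

Keeping only pointers in the shared index means the siloed systems stay where they are, while the patient decides who can follow the pointers to them.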
Internet of Things (IoT):
The growing use of wearable fitness-tracking devices has created a more
health-conscious populace. These devices create a massive amount of data
that can be mined to provide insight into health patterns and outcomes.
This data is also problematic for providers because it is poorly
standardized and voluminous. In a poll conducted by Strategy Analytics,
respondents said the Internet of Things produced too much data to analyze
efficiently. (5) Respondents also said it was difficult to capture data
reliably and that a lack of analytical capabilities was an issue when
attempting to extract insights from the data.
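Part of the standardization problem can be illustrated concretely: two devices may report the same measurements with different field names, nesting, and timestamp formats. The minimal Python sketch below maps two hypothetical device payloads (the field names are invented for illustration, not taken from any real vendor's API) onto one common row format with UTC timestamps and explicit units, the kind of step that has to happen before the data can be analyzed together.

```python
from datetime import datetime, timezone


def make_row(time: datetime, metric: str, value: float, unit: str,
             source: str) -> dict:
    """One normalized measurement in a shared, device-independent shape."""
    return {"time": time, "metric": metric, "value": float(value),
            "unit": unit, "source": source}


def from_device_a(raw: dict) -> list[dict]:
    """Hypothetical device A: epoch seconds and flat fields."""
    time = datetime.fromtimestamp(raw["ts"], tz=timezone.utc)
    return [make_row(time, "heart_rate", raw["hr_bpm"], "bpm", "device_a"),
            make_row(time, "steps", raw["steps"], "count", "device_a")]


def from_device_b(raw: dict) -> list[dict]:
    """Hypothetical device B: ISO 8601 timestamp and a nested heart-rate object."""
    time = datetime.fromisoformat(raw["timestamp"]).astimezone(timezone.utc)
    return [make_row(time, "heart_rate", raw["heartRate"]["value"], "bpm", "device_b"),
            make_row(time, "steps", raw["stepCount"], "count", "device_b")]


# One reading from each device, taken at the same moment (2016-11-28 08:00 UTC).
a = from_device_a({"ts": 1480320000, "hr_bpm": 72, "steps": 5400})
b = from_device_b({"timestamp": "2016-11-28T08:00:00+00:00",
                   "heartRate": {"value": 70}, "stepCount": 3000})
for row in a + b:
    print(row["time"].isoformat(), row["source"], row["metric"],
          row["value"], row["unit"])
```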
Wearable devices will continue to provide data directly to patients and
physicians. A survey by Quest Diagnostics and Inovalon found that 87% of
respondents acknowledged the importance of having access to a patient's
entire medical record. (7) Providers are dissatisfied with how much data
they can access when sitting in front of a patient: 64% said they do not
routinely have all the data they need about their patients at the point of
care. Wearable devices and the IoT can rectify this problem if the
analytical potential of their data is achieved. Ultimately, having data
available quickly benefits both sides of health care: providers can make
better decisions, while patients will be more satisfied with the level of
care.

Challenges that exist today for using BIG DATA


In the healthcare sector today, at the micro scale of hospitals and
clinical facilities, several challenges with big data have yet to be
addressed by current big data distributions. Presently, the two major
roadblocks to the regular use of big data in healthcare are the technical
expertise required to use it and the lack of a robust, integrated security
system surrounding it.

Expertise:
In healthcare systems today, the value of big data is largely limited to
research and development, because using big data requires a very
specialized set of skills. Hospital IT experts who are familiar with SQL and
traditional relational databases are often not prepared for the steep
learning curve and other complexities associated with big data.
(Adamson, 2014)
In fact, most organizations need a data expert to manipulate big data
and get data out of a big data environment. These data experts are usually
Ph.D.-level thinkers with significant expertise, and they are very
difficult to find; they're not just floating around an average health
system. Usually only research institutions have access to them, because
they are scarce and expensive. They are also known as data scientists, and
they are in huge demand across industries with deep pockets, such as
banking and internet companies. (Adamson, 2014)
The good news is that, thanks to improvements in tooling, people with
less-specialized skills will be able to work easily with big data in the
near future. Big data is coming to embrace SQL as the lingua franca for
querying, and when this happens, it will become useful in a health system
setting.
Microsoft's PolyBase is one example of such a query tool: it enables users
to query both Hadoop Distributed File System (HDFS) data and SQL relational
databases using an extended SQL syntax. Other tools, such as Impala, enable
the use of SQL over data stored in Hadoop. With the help of these tools, a
larger group of users will be able to work with big data. (Fennessy, 2016)

Security:
In a healthcare system, HIPAA compliance is non-negotiable. The privacy and
security of patient data matter most of all, but in fact there are not many
effective, integrated ways to manage security in big data. Although
security is coming along, it has been an afterthought up to this point, and
for good reason: if a hospital only has to give a few data scientists
access to its data, there is not much to worry about. But when it opens
access to such sensitive data to a large, diverse group of users, security
cannot be an afterthought.
To secure big data, healthcare organizations will have to take some major
steps. Big data runs on open-source technology with inconsistent security,
so to avoid security threats, organizations should be very selective about
big data vendors and should not assume that any big data distribution they
select will be secure. (Adamson, 2014)
A well-supported, commercially distributed implementation of big data is by
far the best option for an organization planning to adopt big data, rather
than a new, raw Apache distribution. An example of a company with a
well-supported, secure distribution is Cloudera, which has created a Payment
Card Industry (PCI) compliant Hadoop environment supporting authentication,
authorization, data protection, and auditing. Another option is to select a
cloud-based solution like Azure HDInsight to get started right away.
Certainly, other commercial distributions are working hard to provide
sophisticated security that will be well suited to HIPAA compliance and
other special security requirements of the healthcare industry.
(Carpo, 2014)
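Two of the controls named above, authorization and auditing, matter most once access widens beyond a few data scientists. The minimal Python sketch below shows the idea with role-based access control: each request is checked against the permissions of the user's role, and every decision, allowed or denied, is written to an audit log. The roles, permission names, and users are hypothetical, and this is not how Cloudera or Azure HDInsight implement security; production systems rely on the platform's own authentication and authorization services.

```python
from datetime import datetime, timezone

# Hypothetical roles and permissions; a real policy would come from the
# organization's HIPAA risk analysis, not from a hard-coded table.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "physician": {"read_phi", "read_deidentified"},
    "data_scientist": {"read_deidentified"},
    "billing": {"read_billing"},
}

audit_log: list[dict] = []


def authorize(user: str, role: str, action: str) -> bool:
    """Check the role's permissions and record every decision for auditing."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "allowed": allowed,
    })
    return allowed


print(authorize("alice", "physician", "read_phi"))     # True
print(authorize("bob", "data_scientist", "read_phi"))  # False
print(len(audit_log), "decisions logged")              # 2 decisions logged
```

Logging denials as well as grants is deliberate: an auditor reviewing access to patient data needs to see attempted access, not only successful access.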

'The Future': Transition from Traditional Database to BIG DATA Infrastructure
We have discussed the present limitations of big data in healthcare and
the truly fascinating future outcomes and possibilities that big data
enables. But there is an urgent need for data-driven quality and cost
improvement in the healthcare sector in general. Healthcare organizations
cannot afford to wait for big data technology to mature before diving into
analytics. The important factor will be choosing a data warehousing
solution that can easily adapt to the future of big data. To understand
this, we need to know the difference between big data and traditional
data warehousing. With big data techniques, data is stored in its raw form,
with little or no structure, until it is required. The lack of predefined
structure makes a big data environment cheaper and simpler to create, but
retrieving the desired data is more complicated and requires a specialist.
(Adamson, 2014)
Architectures such as the Late-Binding enterprise data warehouse (EDW)
ease the transition from relational databases to unstructured big data.
Data from source systems (EHRs, financial systems, etc.) is placed into
source marts. Following the principles of big data, the data is kept as raw
as possible, relying on the natural data models of the source systems. As
much as possible, late-binding methods avoid remodeling data in the source
marts until an analytic use case requires it. The data remains in its raw
state until someone needs it. At that point, analysts package the data into
a separate data mart and apply meaning and semantic context so that
effective analysis can occur. (Barlow, 2013)
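The late-binding sequence above (raw source marts first, meaning applied only when a use case needs it) can be sketched in Python. The source systems, field names, and the diabetes-cost use case are hypothetical; in particular, the assumption that the EHR's `mrn` and billing's `patient_id` hold the same identifier is made up for illustration. The point is that the source marts keep each system's raw format, and the mapping between systems is only decided when the data mart is built.

```python
import json

# Source marts: payloads kept exactly as each hypothetical source system sent them.
source_marts = {
    "ehr": [
        json.dumps({"mrn": "001", "dx_code": "E11.9", "enc_date": "2016-03-02"}),
        json.dumps({"mrn": "002", "dx_code": "I10", "enc_date": "2016-03-05"}),
    ],
    "billing": [
        json.dumps({"patient_id": "001", "charge": 240.0}),
        json.dumps({"patient_id": "001", "charge": 60.0}),
        json.dumps({"patient_id": "002", "charge": 100.0}),
    ],
}


def build_diabetes_cost_mart(marts: dict[str, list[str]]) -> dict[str, float]:
    """Late binding: parse, interpret, and join raw rows only for this use case."""
    ehr = [json.loads(row) for row in marts["ehr"]]
    billing = [json.loads(row) for row in marts["billing"]]
    # Meaning is applied here, not at load time: ICD-10 codes starting with
    # E11 denote type 2 diabetes, and "mrn" is assumed to match "patient_id".
    diabetic = {rec["mrn"] for rec in ehr if rec["dx_code"].startswith("E11")}
    totals: dict[str, float] = {}
    for rec in billing:
        if rec["patient_id"] in diabetic:
            pid = rec["patient_id"]
            totals[pid] = totals.get(pid, 0.0) + rec["charge"]
    return totals


print(build_diabetes_cost_mart(source_marts))  # {'001': 300.0}
```

A different question (for example, hypertension costs) would bind the same raw source marts differently, without remodeling them first.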
In reality, many projects are underway today to create massively parallel
data warehouses that can run a traditional relational database and a big
data cluster in parallel and store data in both simultaneously, which would
improve data processing power significantly. So the progression from
today's symmetric multiprocessing (SMP) relational databases to massively
parallel processing (MPP) databases to big data in healthcare is underway.
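The core MPP idea, splitting data into partitions, processing each partition independently, and merging the partial results, can be sketched in Python. This is a minimal illustration under stated assumptions: the stay records are hypothetical, hash partitioning by patient ID is one common choice among several, and Python threads stand in for the separate nodes of a real MPP database, so the sketch shows the scatter-gather structure rather than an actual speedup.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition(rows: list[dict], n: int) -> list[list[dict]]:
    """Hash-distribute rows across n partitions by patient ID."""
    parts: list[list[dict]] = [[] for _ in range(n)]
    for row in rows:
        parts[zlib.crc32(row["patient_id"].encode()) % n].append(row)
    return parts


def local_totals(rows: list[dict]) -> dict[str, int]:
    """Work done independently on each partition: sum length of stay per unit."""
    totals: dict[str, int] = {}
    for row in rows:
        totals[row["unit"]] = totals.get(row["unit"], 0) + row["los_days"]
    return totals


def parallel_totals(rows: list[dict], workers: int = 4) -> dict[str, int]:
    """Scatter partitions to workers, then gather and merge partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_totals, partition(rows, workers)))
    merged: dict[str, int] = {}
    for part in partials:
        for unit, days in part.items():
            merged[unit] = merged.get(unit, 0) + days
    return merged


stays = [  # hypothetical inpatient stays
    {"patient_id": "001", "unit": "ICU", "los_days": 3},
    {"patient_id": "002", "unit": "Cardiology", "los_days": 5},
    {"patient_id": "003", "unit": "ICU", "los_days": 2},
    {"patient_id": "004", "unit": "Oncology", "los_days": 7},
    {"patient_id": "005", "unit": "Cardiology", "los_days": 1},
]
print(parallel_totals(stays))
```

The merged result totals 5 days for ICU, 6 for Cardiology, and 7 for Oncology (key order may vary), the same answer a single-node query would give.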

References:

Adamson, Doug (2014). "Big Data in Healthcare Made Simple: Where It Stands Today and Where It's Going." Health Catalyst. Retrieved from https://www.healthcatalyst.com/big-data-in-healthcare-made-simple

Barlow, Steve (2013). "Healthcare Data Warehouse Models Explained." Health Catalyst. Retrieved from https://www.healthcatalyst.com/besthealthcare-data-warehouse-model

Carpo, Jared (2014). "Hadoop in Healthcare: A No-nonsense Q and A." Health Catalyst. Retrieved from https://www.healthcatalyst.com/Hadoopin-healthcare

Fennessy, Josh (2016). "5 Reasons to Get Excited About SQL Server 2016 and Big Data." Blue Granite. Retrieved from https://www.bluegranite.com/blog/5-reasons-to-get-excited-about-sql-server-2016-and-bigdata
