Repurposing Legacy Data: Innovative Case Studies
Ebook · 280 pages · 3 hours

About this ebook

Repurposing Legacy Data: Innovative Case Studies examines how data scientists have repurposed legacy data, whether their own or legacy data donated to the public domain.

Most of the data stored worldwide is legacy data—data created some time in the past, for a particular purpose, and left in obsolete formats. As with keepsakes in an attic, we retain this information thinking it may have value in the future, though we have no current use for it.

The case studies in this book, from such diverse fields as cosmology, quantum physics, high-energy physics, microbiology, psychiatry, medicine, and hospital administration, all serve to demonstrate how innovative people draw value from legacy data. By following the case examples, readers will learn how legacy data is restored, merged, and analyzed for purposes that were never imagined by the original data creators.

  • Discusses how combining existing data with other data sets of the same kind can produce an aggregate data set that serves to answer questions that could not be answered with any of the original data
  • Presents a method for re-analyzing original data sets using alternate or improved methods that can provide outcomes more precise and reliable than those produced in the original analysis
  • Explains how to integrate heterogeneous data sets for the purpose of answering questions or developing concepts that span several different scientific fields
Language: English
Release date: Mar 13, 2015
ISBN: 9780128029152
Author

Jules J. Berman

Jules Berman holds two Bachelor of Science degrees from MIT (in Mathematics and in Earth and Planetary Sciences), a PhD from Temple University, and an MD from the University of Miami. He was a graduate researcher at the Fels Cancer Research Institute (Temple University) and at the American Health Foundation in Valhalla, New York. He completed his postdoctoral studies at the US National Institutes of Health, and his residency at the George Washington University Medical Center in Washington, DC. Dr. Berman served as Chief of anatomic pathology, surgical pathology, and cytopathology at the Veterans Administration Medical Center in Baltimore, Maryland, where he held joint appointments at the University of Maryland Medical Center and at the Johns Hopkins Medical Institutions. In 1998, he transferred to the US National Institutes of Health as a Medical Officer and as the Program Director for Pathology Informatics in the Cancer Diagnosis Program at the National Cancer Institute. Dr. Berman is a past President of the Association for Pathology Informatics and is the 2011 recipient of the Association’s Lifetime Achievement Award. He is a listed author of more than 200 scientific publications and has written more than a dozen books in his three areas of expertise: informatics, computer programming, and pathology. Dr. Berman is currently a freelance writer.

    Book preview

    Repurposing Legacy Data - Jules J. Berman

    Chapter 1

    Introduction

    This chapter lists and explains the goals of data repurposing, namely: employing preexisting data to ask and answer questions that were not contemplated when the data was collected; combining preexisting data with other data sets of the same kind, producing an aggregate data set that is more useful than any single component data set; reanalyzing the original data set using alternate or improved methods to attain outcomes of greater precision or reliability than the outcomes produced in the original analysis; integrating heterogeneous data sets (i.e., data sets with seemingly unrelated types of information) for the purpose of answering questions or developing concepts that span several different disciplines; finding subsets in a population once thought to be homogeneous; finding new relationships among data; retesting assertions, theories, or conclusions that had been drawn from the original data; creating new concepts or metaphors from the data; fine-tuning existing data models; and starting over and remodeling the system. The remaining chapters of the book provide case studies that show how data repurposing accomplishes these goals.

    Keywords

    Reanalysis; modeling; testing assertions; hypothesis generation; data integration

    1.1 Why Bother?

    We waste a lot of time waiting for spectacular new material.

    We haven’t sat down and taken a very close look at the material we have.

    Bettina Stangneth, historian [1]

    This book demonstrates how old data can be used in new ways that were not foreseen by the people who originally collected the data. All data scientists understand that there is much more old data than there is new data, and that the proportion of old data to new data will always be increasing (see Glossary item, New data). Two reasons account for this situation: (i) with time, all new data becomes old data and (ii) old data accumulates without limit. If we are currently being flooded by new data, then we are surely being glaciated by old data.

    Old data has enormous value, but we must be very smart if we hope to learn what the data is telling us. Data scientists interested in resurrecting old data must be prepared to run a gauntlet of obstacles. At the end of the gauntlet lies a set of unavoidable and important questions:

    1. Can I actually use abandoned and ignored data to solve a problem that is worth solving?

    2. Can old data be integrated into current information systems (databases, information systems, network resources) and made accessible, along with new data?

    3. Will the availability of this old data support unknowable future efforts by other data scientists?

    4. Can old data tell us anything useful about the world today and tomorrow?

    A credible field of science cannot consist exclusively of promises for a bright future. As a field matures, it must develop its own written history, replete with successes and failures and a thoughtful answer to the question, "Was it worth the bother?" This book examines highlights in the history of data repurposing. In the process, you will encounter the general attitudes, skill sets, and approaches that have been the core, essential ingredients for successful repurposing projects. Specifically, this book provides answers to the following questions:

    1. What are the kinds of data that are most suitable for data repurposing projects?

    2. What are the fundamental chores of the data scientists who need to understand the content and potential value of old data?

    3. How must data scientists prepare old data so that it can be understood and analyzed by future generations of data scientists?

    4. How do innovators receive intellectual inspiration from old data?

    5. What are the essential analytic techniques that are used in every data repurposing project?

    6. What are the professional responsibilities of data scientists who use repurposed data?

    The premise of this book is that data repurposing creates value where none was expected (see Glossary item, Data vs. datum). In some of the most successful data repurposing projects, analysts crossed scientific disciplines, to search for relationships that were unanticipated when the data was originally collected. We shall discover that the field of data repurposing attracts individuals with a wide range of interests and professional credentials. In the course of the book, we will encounter archeologists, astronomers, epigraphers, ontologists, cosmologists, entrepreneurs, anthropologists, forensic scientists, biomedical researchers, and many more (see Glossary item, Epigrapher). Anyone who needs to draw upon the general methods of data science will find this book useful. Although some of the case studies use advanced analytical techniques, most do not. It is surprising, but the majority of innovative data repurposing projects primarily involve simple counts of things. The innovation lies in the questions asked and the ability of the data repurposer to resurrect and organize information contained in old files. Accordingly, the book is written to be accessible to a diverse group of readers. Technical jargon is kept to a minimum, but unavoidable terminology is explained, at length, in an extensive Glossary.
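
    By way of illustration, here is a minimal sketch of the "simple counts" style of analysis, written in Python; the file name (legacy_records.csv) and column name (diagnosis) are hypothetical placeholders, not taken from any case study in this book.

        import csv
        from collections import Counter

        # Tally how often each category appears in one column of a legacy
        # file. The file name and column name are hypothetical placeholders.
        counts = Counter()
        with open("legacy_records.csv", newline="") as f:
            for row in csv.DictReader(f):
                counts[row["diagnosis"]] += 1

        # Report the ten most frequent categories.
        for category, n in counts.most_common(10):
            print(f"{n:8d}  {category}")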

    1.2 What Is Data Repurposing?

    If you want to make an apple pie from scratch, you must first create the universe.

    Carl Sagan

    Data repurposing involves taking preexisting data and performing any of the following:

    1. Using the preexisting data to ask and answer questions that were not contemplated by the people who designed and collected the data (see Section 2.1)

    2. Combining preexisting data with additional data, of the same kind, to produce aggregate data that suits a new set of questions that could not have been answered with any one of the component data sources (see Section 4.2)

    3. Reanalyzing data to validate assertions, theories, or conclusions drawn from the original studies (see Section 5.1)

    4. Reanalyzing the original data set using alternate or improved methods to attain outcomes of greater precision or reliability than the outcomes produced in the original analysis (see Section 5.3)

    5. Integrating heterogeneous data sets (i.e., data sets with seemingly unrelated types of information), for the purpose of answering questions, or developing concepts, that span diverse scientific disciplines (see Section 4.3; Glossary item, Heterogeneous data)

    6. Finding subsets in a population once thought to be homogeneous (see Section 5.4)

    7. Seeking new relationships among data objects (see Section 3.1)

    8. Creating, on-the-fly, novel data sets through data file linkages, as sketched in the code following this list (see Section 2.5)

    9. Creating new concepts or ways of thinking about old concepts, based on a re-examination of data (see Section 2.1)

    10. Fine-tuning existing data models (see Section 4.2)

    11. Starting over and remodeling systems (see Section 2.3).
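
    As a minimal, hypothetical sketch of item 8 (and, implicitly, of item 2), the Python fragment below links two comma-delimited files on a shared key to create a merged data set on the fly; every file and field name is invented for illustration, and real legacy files would first demand cleaning and normalization.

        import csv

        # Index the first hypothetical file by its key field.
        with open("hospital_stays.csv", newline="") as f:
            stays = {row["patient_id"]: row for row in csv.DictReader(f)}

        # Stream the second hypothetical file, emitting a merged record
        # whenever the key appears in both sources.
        linked = []
        with open("lab_results.csv", newline="") as f:
            for row in csv.DictReader(f):
                stay = stays.get(row["patient_id"])
                if stay is not None:
                    linked.append({**stay, **row})  # record spans both files

        print(len(linked), "linked records created")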

    Most of the listed types of data repurposing efforts are self-explanatory, and all of them are illustrated with examples throughout this book. Sticklers may object to the inclusion of one of the items on the list, namely, reanalyzing data to validate assertions, theories, or conclusions drawn from the original studies (see Glossary item, Reanalysis). It can be argued that a reanalysis of data, for the purposes of validating a study, is an obligatory step in any well-designed project, done in conformance with the original purpose of the data; hence, reanalysis is not a form of data repurposing. If you believe that data reanalysis is a normal and usual process, you may wish to try a little experiment. Approach any scientist who has published his data analysis results in a journal. Indicate that you would like to reanalyze his results and conclusions, using your own preferred analytic techniques. Ask him to provide you with all of the data he used in his published study. Do not be surprised if your request is met with astonishment, horror, and a quick rebuff.

    In general, scientists believe that their data is their personal property. Scientists may choose to share their data with bona fide collaborators, under their own terms, for a restricted purpose. In many cases, data sharing comes at a steep price (see Glossary item, Data sharing). A scientist who shares his data may stipulate that any future publications based, in whole or in part, on his data must meet with his approval and must list him among the coauthors.

    Because third-party reanalysis is seldom contemplated when scientists are preparing their data, I include it here as a type of data repurposing (i.e., an unexpected way to use old data). Furthermore, data reanalysis often involves considerably more work than the original analysis, because data repurposers must closely analyze the manner in which the data was originally collected, must review the methods by which the data was prepared (i.e., imputing missing values, deleting outliers), and must develop various alternate methods of data analysis (see Glossary item, Missing values). A data reanalysis project whose aim was to validate the original results will often lead to new questions that were not entertained in the original study. Hence, data reanalysis and data repurposing are tightly linked concepts.
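
    A small, invented example shows why those preparation choices matter to a reanalysis: the same summary statistic shifts depending on how missing values are imputed and whether outliers are deleted. The values below are fabricated solely for illustration.

        from statistics import mean, median, stdev

        # Hypothetical measurements; None marks a missing value and 19.7
        # is a suspiciously large reading.
        raw = [4.1, 3.9, None, 4.3, 19.7, 4.0, None, 4.2]
        observed = [x for x in raw if x is not None]

        # Alternate preparation 1: impute missing values with the
        # observed median.
        imputed = [x if x is not None else median(observed) for x in raw]

        # Alternate preparation 2: delete values more than two standard
        # deviations from the mean.
        mu, sd = mean(observed), stdev(observed)
        trimmed = [x for x in observed if abs(x - mu) <= 2 * sd]

        # The three treatments yield three different "results."
        print(mean(observed), mean(imputed), mean(trimmed))

    With these invented values, the three treatments yield three different answers (a mean of 6.7, about 6.1, and 4.1, respectively), which is precisely why a reanalysis must revisit the original preparation steps.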

    1.3 Data Worth Preserving

    The first lesson of Web-scale learning is to use available large-scale data rather than hoping for annotated data that isn’t available.

    Peter Norvig, Alon Halevy, and Fernando Pereira [2]

    Despite the preponderance of old data, most data scientists concentrate their efforts on newly acquired data or on nonexistent data that may emerge in the unknowable future. Why does old data get so little respect? The reasons are manifold.

    1. Much of old data is proprietary and cannot be accessed by anyone other than its owners.

    2. The owners of proprietary data, in many cases, are barely aware of the contents, or even the existence, of their own data, and have no understanding of the value of their holdings, to themselves or to others.

    3. Old data is typically stored in formats that are inscrutable to young data scientists. The technical expertise required to use the data intelligibly is long-forgotten (a minimal decoding sketch follows this list).

    4. Much of old data lacks proper annotation (see Glossary item, Annotation). There simply is not sufficient information about the data (e.g., how it was collected and what the data means) to support useful analysis.

    5. Much of old data, annotated or not, has not been indexed in any serious way. There is no easy method of searching the contents of old data.

    6. Much of old data is poor data, collected without the kinds of quality assurances that would be required to support any useful analysis of its contents.

    7. Old data is orphaned data. When data has no guardianship, the tendency is to ignore the data or to greatly underestimate its value.
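
    To make the format problem in item 3 concrete, the sketch below decodes a fixed-width record layout in Python; the layout, file name, and field positions are all hypothetical, standing in for the kind of rediscovered file specification a data repurposer must track down.

        # Hypothetical fixed-width layout: (field name, start col, end col).
        FIELDS = [("patient_id", 0, 8),
                  ("visit_date", 8, 16),
                  ("dx_code", 16, 21)]

        def parse_record(line):
            """Slice one fixed-width line into a dictionary of named fields."""
            return {name: line[start:end].strip()
                    for name, start, end in FIELDS}

        # Decode every record in a hypothetical 1987-era archive file.
        with open("archive_1987.dat", encoding="ascii") as f:
            records = [parse_record(line) for line in f]

        print(len(records), "records decoded")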

    The sheer messiness of old data is conveyed by the gritty jargon that permeates the field of data repurposing (see Glossary items, Data cleaning, Data mining, Data munging, Data scraping, Data scrubbing, Data wrangling). Anything that requires munging, scraping, and scrubbing can’t be too clean.

    Data sources are referred to as "old" or "legacy"; neither term calls to mind vitality or robustness (see Glossary item, Legacy data). A helpful way of thinking about the subject is to recognize that new data is just updated old data. New data, without old data, cannot reveal long-term trends.

    It may seem that nobody puts much value on legacy data, that nobody pays for legacy data, and that nobody invests in preserving legacy data. It is not surprising that nobody puts much effort into preserving data that has no apparent societal value. The stalwart data scientist must not be discouraged. We shall see that preserving old data is worth the
