Getting Started with Greenplum for Big Data Analytics

Ebook · 341 pages · 2 hours

Name: Getting Started with Greenplum for Big Data Analytics
Author: Gollapudi Sunila
ISBN: 9781782177050

About this ebook

A standard tutorial-based approach. "Getting Started with Greenplum for Big Data Analytics" is great for data scientists and data analysts who have a basic knowledge of Data Warehousing and Business Intelligence platforms, who are new to Big Data, and who are looking to get a good grounding in how to use the Greenplum platform. It is assumed that you have some experience with database design and programming and are familiar with analytics tools such as R and Weka.

Language: English

Publisher: Packt Publishing

Release date: Oct 23, 2013

Related to Getting Started with Greenplum for Big Data Analytics

Related ebooks

Oracle Warehouse Builder 11g: Getting Started, by Bob Griesemer
HDInsight Essentials - Second Edition, by Rajesh Nadipalli
Learning Tableau 10 - Second Edition, by Joshua N. Milligan
Data Fluency: Empowering Your Organization with Effective Data Communication, by Zach Gemignani
Tableau Desktop Certified Associate: Exam Guide: Develop your Tableau skills and prepare for Tableau certification with tips from industry experts, by Dmitry Anoshin
Learning Tableau 2019 - Third Edition: Tools for Business Intelligence, data prep, and visual analytics, 3rd Edition, by Joshua N. Milligan
Scalable Big Data Architecture: A practitioners guide to choosing relevant Big Data architecture, by Bahaaldine Azarmi
Data Lake Development with Big Data, by Pasupuleti Pradeep
IBM Cognos 10 Framework Manager, by Terry Curran
Real-Time Big Data Analytics, by Shilpi
Concept Based Practice Questions for Tableau Desktop Specialist Certification Latest Edition 2023, by Exam OG
Learning Tableau, by Joshua N. Milligan
Expert T-SQL Window Functions in SQL Server 2019: The Hidden Secret to Fast Analytic and Reporting Queries, by Kathi Kellenberger
Introduction to Data Science Using R, by Prema Alla
Microsoft Azure Machine Learning, by Sumit Mund
Monitoring Hadoop, by Gurmukh Singh
Data Analytics with Google Cloud Platform, by Murari Ramuka
Mastering Azure Synapse Analytics: Learn how to develop end-to-end analytics solutions with Azure Synapse Analytics (English Edition), by Debananda Ghosh
SAS Viya: The Python Perspective, by Kevin D. Smith
My Part-Time Study Notes on Mssql Server, by Morris Sebenzile Mntoninzi
Ultimate Data Engineering with Databricks: Develop Scalable Data Pipelines Using Data Engineering's Core Tenets Such as Delta Tables, Ingestion, Transformation, Security, and Scalability, by Mayank Malhotra
Data Governance and Data Management: Contextualizing Data Governance Drivers, Technologies, and Tools, by Rupa Mahanti
Tableau Training Manual 9.0 Basic Version: This Via Tableau Training Manual Was Created for Both New and Intermediate, by Larry Keller
Data Modeling A Complete Guide - 2021 Edition, by Gerardus Blokdyk
Azure Data Lake A Complete Guide - 2019 Edition, by Gerardus Blokdyk
Core architecture data model A Clear and Concise Reference, by Gerardus Blokdyk
Professional Hadoop Solutions, by Boris Lublinsky
IBM InfoSphere DataStage A Complete Guide - 2021 Edition, by Gerardus Blokdyk
Spark SQL A Complete Guide, by Gerardus Blokdyk
Data Visualization Strategy Standard Requirements, by Gerardus Blokdyk

Data Visualization For You

Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals, by Brent Dykes
Excel Essentials: A Step-by-Step Guide with Pictures for Absolute Beginners to Master the Basics and Start Using Excel with Confidence, by Nigel Tillery
Machine Learning for Beginners: An Introduction for Beginners, Why Machine Learning Matters Today and How Machine Learning Networks, Algorithms, Concepts and Neural Networks Really Work, by Steven Cooper
Data Analytics for Beginners: Introduction to Data Analytics, by Anthony S. Williams
Data Visualization: A Practical Introduction, by Kieran Healy
How to Lie with Maps, by Mark Monmonier
The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios, by Steve Wexler
Financial Reporting with Dashboards in Power BI, by MONICA SCHEIANU
NumPy Recipes, by Martin McBride
Data Science: What the Best Data Scientists Know About Data Analytics, Data Mining, Statistics, Machine Learning, and Big Data – That You Don't, by Herbert Jones
Python For Beginners. Learn Data Science in 5 Days the Smart Way and Remember it Longer. With Easy Step by Step Guidance & Hands on Examples. (Python Crash Course-Programming for Beginners): Python for Beginners, by Arthur T. Brooks
Learning pandas - Second Edition, by Heydt Michael
The Applied SQL Data Analytics Workshop - Second Edition: Develop your practical skills and prepare to become a professional data analyst, 2nd Edition, by Matt Goldwasser
Excel for Beginners 2023: A Step-by-Step and Comprehensive Guide to Master the Basics of Excel, with Formulas, Functions, & Charts, by Gerald Stroud
Hands-On Data Analysis with Pandas: Efficiently perform data collection, wrangling, analysis, and visualization using Python, by Stefanie Molin
Teach Yourself VISUALLY Power BI, by Alexander Loth
Mastering Excel: Excel Apps, by Mark Moore
DAX Patterns: Second Edition, by Marco Russo
Advanced Analytics with Power BI and Excel: Learn powerful visualization and data analysis techniques using Microsoft BI tools along with Python and R, by Dejan Sarka
Tableau For Dummies, by Molly Monsey
Visualizing Graph Data, by Corey Lanum
Fieldwork Handbook: A Practical Guide on the Go, by Marika Vertzonis
Getting to Know ArcGIS Desktop 10.8, by Michael Law
Visual Analytics with Tableau, by Alexander Loth
How to Become a Data Analyst: My Low-Cost, No Code Roadmap for Breaking into Tech, by Annie Nelson
Cool Infographics: Effective Communication with Data Visualization and Design, by Randy Krum
No-Code Data Science: Mastering Advanced Analytics, Machine Learning, and Artificial Intelligence, by David Patrishkoff
Mastering Data Analysis with Python: A Comprehensive Guide to NumPy, Pandas, and Matplotlib, by Rajender Kumar
R for Data Science, by Dan Toomey
D3.js in Action: Data visualization with JavaScript, by Elijah Meeks

Related podcast episodes

Ali Ghodsi – The Past, Present, and Future of Big Data – [Founder’s Field Guide, EP.18], by Invest Like the Best with Patrick O'Shaughnessy
Building A Cost Effective Data Catalog With Tree Schema - Episode 158, by Data Engineering Podcast
An Agile Approach To Master Data Management with Mark Marinelli - Episode 46, by Data Engineering Podcast
Automate Your Pipeline Creation For Streaming Data Transformations With SQLake, by Data Engineering Podcast
A Multipurpose Database For Transactions And Analytics To Simplify Your Data Architecture With Singlestore, by Data Engineering Podcast
#122 How Organizations Can Bridge the Data Literacy Gap, by DataFramed
Renee M. P. Teate, "SQL for Data Scientists: A Beginner's Guide for Building Datasets for Analysis" (John Wiley & Sons, 2021), by New Books in Science, Technology, and Society
40: Should data visualization work be outsourced? w/ Mustafa Mustafa, by Analytics on Fire
#121 — ChatGPT and How Generative AI is Augmenting Workflows, by DataFramed
2155: Databricks - The Story Behind the Lakehouse Company, by The Tech Talks Daily Podcast
#54 Women in Data Science, by DataFramed
Simplifying Data Integration Through Eventual Connectivity - Episode 91, by Data Engineering Podcast
Unlocking The Power of Data Lineage In Your Platform with OpenLineage, by Data Engineering Podcast
[DataFramed Careers Series #2] What Makes a Great Data Science Portfolio, by DataFramed
Revisit The Fundamental Principles Of Working With Data To Avoid Getting Caught In The Hype Cycle, by Data Engineering Podcast
Analyze Massive Data At Interactive Speeds With The Power Of Bitmaps Using FeatureBase, by Data Engineering Podcast
Streaming Data Pipelines Made SQL With Decodable, by Data Engineering Podcast
Accelerated data science with a Kaggle grandmaster: featuring Christof Henkel, by Practical AI: Machine Learning, Data Science
Reflections On Designing A Data Platform From Scratch, by Data Engineering Podcast
Low Friction Data Governance With Immuta, by Data Engineering Podcast
Hasty Treat - Webhooks, by Syntax - Tasty Web Development Treats
New DataFramed Episodes, by DataFramed
#608: Generative AI Roundup - August 2023, by AWS Podcast
Spanner Myths Busted with Pritam Shah and Vaibhav Govil, by Google Cloud Platform Podcast
#124 Using AI to Improve Data Quality in Healthcare, by DataFramed
#515: [Right Now at AWS] Episode 15 - Future of Payments Dominated by AI / ML & Emerging Payments Use Cases, by AWS Podcast
A Practical Introduction To Graph Data Applications - Episode 144, by Data Engineering Podcast
Build Your Data Analytics Like An Engineer - Episode 81, by Data Engineering Podcast
#92 Democratizing Data in Large Enterprises, by DataFramed
#55 Getting Your First Data Science Job, by DataFramed

Related articles

Understanding ELT & ETL (Techfastly, Apr 1, 2021)
01 Giving Data Collectors—and Donors—a Real-Time Rush (Fast Company, Mar 20, 2017)
What is ELT? (Techfastly, Apr 1, 2021)
Types Of Databases (Linux Format, Aug 27, 2019)
Rokoko Studio 2.0 (3D World, Feb 23, 2021)
Build A Search And Analytic Engine (Linux Format, Mar 10, 2020)
Grafana Terminology (Linux Format, Jan 14, 2020)
CSV Handling (Linux Format, Mar 10, 2020)
01 Ready Or Not, AI Is Here To Assist You (HWM Singapore, Jul 11, 2023)
2 The Use of Python in AI and ML (Techfastly, Nov 30, 2020)
Getting The edge (The European Business Review, Feb 25, 2021)
Buying The Tool (Techfastly, Apr 1, 2021)
Inform And Enhance Your Business With Open Data (PC Pro Magazine, Jun 10, 2021)
Building Trends, Building Momentum (Facility Management, Oct 14, 2019)
Machine-learning On Your Android Phone? (APC, Dec 30, 2019)
Data-driven Decision Making That Uses Data, Mind And Heart (The European Business Review, Jan 31, 2020)
Decoding The Impact Of AI (Her World Singapore, May 5, 2023)
Mining Actionable Information with Smart Capture (The European Business Review, May 22, 2018)
The Machine Learning Revolution (APC, Sep 6, 2021)
Quantum Leap (Marketing, Jul 11, 2019)
Leadership Forum: Investing in Disruption (Rotman Management, Jan 1, 2019)
Cloudy With No Chance Of Erp (Architectural Review Asia Pacific, Nov 11, 2019)
ARTIFICIAL INTELLIGENCE (AI) IN SUPPLY CHAIN PLANNING THE Future is Here & Now (The European Business Review, Dec 3, 2019)
How Google Is Making The AI That Powers Its Products Better. (HWM Singapore, Jun 3, 2019)
The Machine Learning Revolution (Maximum PC, Aug 17, 2021)
Pragmatic Parametricism (Architectural Review Asia Pacific, Nov 13, 2020)
How Can AI Help Your Business? (PC Pro Magazine, Jun 8, 2023)
Salesforce Adding Einstein Analytics Al To Tableau Platform (Techfastly, Feb 4, 2021)
Why We Need To Fear The Risk Of AI Model Collapse (Evening Standard, Dec 17, 2023)
Will Generative AI Disrupt Your Company And Your need For Workers? (The European Business Review, Jul 31, 2023)

Related categories

Skip carousel

Reviews for Getting Started with Greenplum for Big Data Analytics

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Getting Started with Greenplum for Big Data Analytics - Gollapudi Sunila

Getting Started with Greenplum for Big Data Analytics

Credits

Foreword

About the Author

Acknowledgement

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers and more

Why Subscribe?

Free Access for Packt account holders

Instant Updates on New Packt Books

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Errata

Piracy

Questions

1. Big Data, Analytics, and Data Science Life Cycle

Enterprise data

Classification

Features

Big Data

So, what is Big Data?

Multi-structured data

Data analytics

Data science

Data science life cycle

Phase 1 – state business problem

Phase 2 – set up data

Phase 3 – explore/transform data

Phase 4 – model

Phase 5 – publish insights

Phase 6 – measure effectiveness

References/Further reading

Summary

2. Greenplum Unified Analytics Platform (UAP)

Big Data analytics – platform requirements

Greenplum Unified Analytics Platform (UAP)

Core components

Greenplum Database

Hadoop (HD)

Chorus

Command Center

Modules

Database modules

HD modules

Data Integration Accelerator (DIA) modules

Core architecture concepts

Data warehousing

Column-oriented databases

Parallel versus distributed computing/processing

Shared nothing, massive parallel processing (MPP) systems, and elastic scalability

Shared disk data architecture

Shared memory data architecture

Shared nothing data architecture

Data loading patterns

Greenplum UAP components

Greenplum Database

The Greenplum Database physical architecture

The Greenplum high-availability architecture

High-speed data loading using external tables

External table types

Polymorphic data storage and historic data management

Data distribution

Hadoop (HD)

Hadoop Distributed File System (HDFS)

Hadoop MapReduce

Chorus

Greenplum Data Computing Appliance (DCA)

Greenplum Data Integration Accelerator (DIA)

References/Further reading

Summary

3. Advanced Analytics – Paradigms, Tools, and Techniques

Analytic paradigms

Descriptive analytics

Predictive analytics

Prescriptive analytics

Analytics classified

Classification

Forecasting or prediction or regression

Clustering

Optimization

Simulations

Modeling methods

Decision trees

Association rules

The Apriori algorithm

Linear regression

Logistic regression

The Naive Bayesian classifier

K-means clustering

Text analysis

R programming

Weka

In-database analytics using MADlib

References/Further reading

Summary

4. Implementing Analytics with Greenplum UAP

Data loading for Greenplum Database and HD

Greenplum data loading options

External tables

gpfdist

gpload

Hadoop (HD) data loading options

Sqoop 2

Greenplum BulkLoader for Hadoop

Using external ETL to load data into Greenplum

Extraction, Load, and Transformation (ELT) and Extraction, Transformation, Load, and Transformation (ETLT)

Greenplum target configuration

Sourcing large volumes of data from Greenplum

Unsupported Greenplum data types

Push Down Optimization (PDO)

Greenplum table distribution and partitioning

Distribution

Data skew and performance

Optimizing the broadcast or redistribution motion for data co-location

Partitioning

Querying Greenplum Database and HD

Querying Greenplum Database

Analyzing and optimizing queries

The ANALYZE function

The EXPLAIN function

Dynamic Pipelining in Greenplum

Querying HDFS

Hive

Pig

Data communication between Greenplum Database and Hadoop (using external tables)

Data Computing Appliance (DCA)

Storage design, disk protection, and fault tolerance

Master server RAID configurations

Segment server RAID configurations

Monitoring DCA

Greenplum Database management

In-database analytics options (Greenplum-specific)

Window functions

The PARTITION BY clause

The ORDER BY clause

The OVER (ORDER BY…) clause

Creating, modifying, and dropping functions

User-defined aggregates

Using R with Greenplum

DBI Connector for R

PL/R

Using Weka with Greenplum

Using MADlib with Greenplum

Using Greenplum Chorus

Pivotal

References/Further reading

Summary

Index

Getting Started with Greenplum for Big Data Analytics

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: October 2013

Production Reference: 1171013

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78217-704-3

www.packtpub.com

Cover Image by Aniket Sawant (<aniket_sawant_photography@hotmail.com>)

Credits

Author

Sunila Gollapudi

Reviewers

Brian Feeny

Scott Kahler

Alan Koskelin

Tuomas Nevanranta

Acquisition Editor

Kevin Colaco

Commissioning Editor

Deepika Singh

Technical Editors

Kanhucharan Panda

Vivek Pillai

Project Coordinator

Amey Sawant

Proofreader

Bridget Braund

Indexer

Mariammal Chettiyar

Graphics

Valentina D'silva

Ronak Dhruv

Abhinash Sahu

Production Coordinator

Adonia Jones

Cover Work

Adonia Jones

Foreword

In the last decade, we have seen the impact of exponential advances in technology on the way we work, shop, communicate, and think. At the heart of this change is our ability to collect data and gain insights from it, and phrases such as "Data is the new oil" or "we have a Data Revolution" only amplify the importance of data in our lives.

Tim Berners-Lee, inventor of the World Wide Web, said, "Data is a precious thing and will last longer than the systems themselves." IBM recently stated that people create a staggering 2.5 quintillion bytes of data every day (roughly equivalent to over half a billion HD movie downloads). This information is generated from a huge variety of sources, including social media posts, digital pictures, videos, retail transactions, and even the GPS tracking functions of mobile phones.
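The movie comparison implies an average HD movie size. A quick sanity check follows, assuming decimal SI units (1 quintillion = 10^18 bytes, 1 GB = 10^9 bytes) and taking "half a billion" as exactly 5 x 10^8 downloads; the function name is illustrative only.

```python
# Sanity check of the "2.5 quintillion bytes ~ half a billion HD movies" comparison.
# Assumes decimal SI units: 1 quintillion = 10**18 bytes, 1 GB = 10**9 bytes.
DAILY_BYTES = 2.5e18   # daily data volume quoted from IBM
MOVIES = 0.5e9         # "half a billion" HD movie downloads


def implied_movie_size_gb(daily_bytes: float, movies: float) -> float:
    """Return the average movie size, in GB, implied by the comparison."""
    return daily_bytes / movies / 1e9


size = implied_movie_size_gb(DAILY_BYTES, MOVIES)
print(f"Implied average HD movie size: {size:.1f} GB")
```

Because the text says "over" half a billion downloads, the implied average is at most about 5 GB per movie, which is a plausible size for an HD film.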

This data explosion has very rapidly moved the term Big Data from an industry buzzword to practically a household term. Harnessing Big Data to extract insights is not an easy task; the potential rewards for finding these patterns are huge, but it will require technologists and data scientists to work together to solve these problems.

The book written by Sunila Gollapudi, Getting Started with Greenplum for Big Data Analytics, has been carefully crafted to address the needs of both technologists and data scientists.

Sunila starts by providing an excellent background to the Big Data problem and why new thinking and skills are required. Along with a deep dive into advanced analytic techniques, she brings out the difference in thinking between the new Big Data science and traditional Business Intelligence; this is especially useful for understanding and bridging the skill gap.

She moves on to discuss the computing side of the equation: handling scale, the complexity of data sets, and rapid response times. The key here is to eliminate the noise in data early in the data science life cycle. Here, she shows how to use one of the industry's leading platforms, Greenplum, to build Big Data solutions, and explains the need for a unified platform that brings essential software components (commercial and open source) together, backed by a hardware appliance.

She then puts the two together to get the desired result: how to get meaning out of Big Data. In the process, she also brings out the capabilities of the R programming language, which is mainly used for statistical computing, graphics, and advanced analytics.

Her easy-to-read, practical style of writing, with real examples, shows her depth of understanding of this subject. The book will be very useful both for data scientists (who need to learn the computing side and its technologies) and for those who aspire to learn data science.

V. Laxmikanth

Managing Director

Broadridge Financial Solutions (India) Private Limited

www.broadridge.com

About the Author

Sunila Gollapudi works as a Technology Architect for Broadridge Financial Solutions Private Limited. She has over 13 years of experience developing, designing, and architecting data-driven solutions, including around eight years focused on the banking and financial services domain. She drives the Big Data and data science practice for Broadridge. Her key roles have been Solutions Architect, Technical Leader, Big Data Evangelist, and Mentor.

Sunila has a Master's degree in Computer Applications, and her passion for mathematics drew her to data and analytics. She worked on Java and distributed architecture, and was an SOA consultant and integration specialist before she embarked on her data journey. She is a strong follower of open source technologies and believes in the innovation that the open source revolution brings.

She has been a speaker at various conferences and meetups on Java and Big Data. Her current Big Data and data science specialties include Hadoop, Greenplum, R, Weka, MADlib, advanced analytics, machine learning, and data integration tools such as Pentaho and Informatica.

With a unique blend of technology and domain expertise, Sunila has been instrumental in conceptualizing architectural patterns and providing reference architecture for Big Data problems in the financial services domain.

Acknowledgement

It was a pleasure to work with Packt Publishing on this project. Packt has been most accommodating, extremely quick, and responsive to all requests.

I am deeply grateful to Broadridge for providing me with the platform to explore and build expertise in Big Data technologies. My deepest gratitude goes to Laxmikanth V. (Managing Director, Broadridge) and Niladri Ray (Executive Vice President, Broadridge) for all their trust, freedom, and confidence in me.

Thanks to my parents for having relentlessly encouraged me to explore any and every subject that interested me.

Authors usually thank their spouses for their patience and support, or words to that effect. Unless one has lived through the actual experience, one cannot fully comprehend how true this is. Over the last ten years, Kalyan has endured what must have seemed like a nearly continuous stream of whining punctuated by occasional outbursts of exhilaration and grandiosity, all of it against the background of the self-absorbed attitude of a typical author. His patience and support were unfailing.

Last but not least, my love, my daughter, my angel, Nikita, who has been my continuous drive. Without her being as accommodative as she was, this book wouldn't have been possible.

About the Reviewers

Brian Feeny is a technologist/evangelist working with many Big Data technologies such as analytics, visualization, data mining, machine learning, and statistics. He is a graduate student in Software Engineering at Harvard University, primarily focused on data science, where he gets to work on interesting data problems using some of the latest methods and technology.

Brian works for Presidio Networked Solutions, where he helps businesses with their Big Data challenges and helps them understand how to make best use of their data.

I would like to thank my wife, Scarlett, for her tolerance of my busy schedule. I would like to thank Presidio, my employer, for investing in our Big Data practice. Lastly, I would like to thank EMC and Pivotal for the excellent training and support they have given Presidio and me.

Scott Kahler started down the path in the mid-80s when he disconnected the power LED on his Commodore 64. In this fashion he could run his handwritten Dungeons and Dragons random character generator, and his parents wouldn't complain about the computer being
