Ebook, 714 pages, 3 hours

Pig Design Patterns

Name: Pig Design Patterns
Author: Pasupuleti Pradeep
ISBN: 9781783285563

About this ebook

A comprehensive, practical guide that walks you through the multiple stages of data management in the enterprise and gives you numerous design patterns, with appropriate code examples, to solve frequent problems in each of these stages. The chapters are organized to mimic the sequential data flow seen in analytics platforms, but they can also be read independently to solve a particular group of problems in the Big Data life cycle.

This book is for experienced developers who are already familiar with Pig and are looking for a use-case perspective on the problems of data ingestion, profiling, cleansing, transformation, and egress encountered in enterprises. Knowledge of Hadoop and Pig is necessary to grasp the intricacies of Pig design patterns.

Language: English

Publisher: Packt Publishing

Release date: Apr 17, 2014

Book preview

Pig Design Patterns

Credits

Foreword

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

Motivation for this book

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Third-party libraries

Datasets

Errata

Piracy

Questions

1. Setting the Context for Design Patterns in Pig

Understanding design patterns

The scope of design patterns in Pig

Hadoop demystified – a quick reckoner

The enterprise context

Common challenges of distributed systems

The advent of Hadoop

Hadoop under the covers

Understanding the Hadoop Distributed File System

HDFS design goals

Working of HDFS

Understanding MapReduce

Understanding how MapReduce works

The MapReduce internals

Pig – a quick intro

Understanding the rationale of Pig

Understanding the relevance of Pig in the enterprise

Working of Pig – an overview

Firing up Pig

The use case

Code listing

The dataset

Understanding Pig through the code

Pig's extensibility

Operators used in code

The EXPLAIN operator

Understanding Pig's data model

Primitive types

Complex types

The relevance of schemas

Summary

2. Data Ingest and Egress Patterns

The context of data ingest and egress

Types of data in the enterprise

Ingest and egress patterns for multistructured data

Considerations for log ingestion

The Apache log ingestion pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Code for the CommonLogLoader class

Code for the CombinedLogLoader class

Results

Additional information

The Custom log ingestion pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The image ingress and egress pattern

Background

Motivation

Use cases

Pattern implementation

The image Ingress Implementation

The image egress implementation

Code snippets

The image ingress

Pig script

Image to a sequence UDF snippet

The image egress

Pig script

Sequence to an image UDF

Results

Additional information

The ingress and egress patterns for the NoSQL data

MongoDB ingress and egress patterns

Background

Motivation

Use cases

Pattern implementation

The ingress implementation

The egress implementation

Code snippets

The ingress code

The egress code

Results

Additional information

The HBase ingress and egress pattern

Background

Motivation

Use cases

Pattern implementation

The ingress implementation

The egress implementation

Code snippets

The ingress code

The egress code

Results

Additional information

The ingress and egress patterns for structured data

The Hive ingress and egress patterns

Background

Motivation

Use cases

Pattern implementation

The ingress implementation

The egress implementation

Code snippets

The ingress Code

Importing data using RCFile

Importing data using HCatalog

The egress code

Results

Additional information

The ingress and egress patterns for semi-structured data

The mainframe ingestion pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

XML ingest and egress patterns

Background

Motivation

Motivation for ingesting raw XML

Motivation for ingesting binary XML

Motivation for egression of XML

Use cases

Pattern implementation

The implementation of the XML raw ingestion

The implementation of the XML binary ingestion

Code snippets

The XML raw ingestion code

The XML binary ingestion code

The XML egress code

Pig script

The XML storage

Results

Additional information

JSON ingress and egress patterns

Background

Motivation

Use cases

Pattern implementation

The ingress implementation

The egress implementation

Code snippets

The ingress code

The code for simple JSON

The code for nested JSON

The egress code

Results

Additional information

Summary

3. Data Profiling Patterns

Data profiling for Big Data

Big Data profiling dimensions

Sampling considerations for profiling Big Data

Sampling support in Pig

Rationale for using Pig in data profiling

The data type inference pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Pig script

Java UDF

Results

Additional information

The basic statistical profiling pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Pig script

Macro

Results

Additional information

The pattern-matching pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Pig script

Macro

Results

Additional information

The string profiling pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Pig script

Macro

Results

Additional information

The unstructured text profiling pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Pig script

Java UDF for stemming

Java UDF for generating TF-IDF

Results

Additional information

Summary

4. Data Validation and Cleansing Patterns

Data validation and cleansing for Big Data

Choosing Pig for validation and cleansing

The constraint validation and cleansing design pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The regex validation and cleansing design pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The corrupt data validation and cleansing design pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The unstructured text data validation and cleansing design pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

Summary

5. Data Transformation Patterns

Data transformation processes

The structured-to-hierarchical transformation pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The data normalization pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The data integration pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The aggregation pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The data generalization pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

Summary

6. Understanding Data Reduction Patterns

Data reduction – a quick introduction

Data reduction considerations for Big Data

Dimensionality reduction – the Principal Component Analysis design pattern

Background

Motivation

Use cases

Pattern implementation

Limitations of PCA implementation

Code snippets

Results

Additional information

Numerosity reduction – the histogram design pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

Numerosity reduction – sampling design pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

Numerosity reduction – clustering design pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

Summary

7. Advanced Patterns and Future Work

The clustering pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The topic discovery pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The natural language processing pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

The classification pattern

Background

Motivation

Use cases

Pattern implementation

Code snippets

Results

Additional information

Future trends

Emergence of data-driven patterns

The emergence of solution-driven patterns

Patterns addressing programmability constraints

Summary

Index

Pig Design Patterns

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: April 2014

Production Reference: 1100414

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78328-555-6

www.packtpub.com

Cover Image by Pradeep Pasupuleti (<pasupuleti.pradeepkumar@gmail.com>)

Credits

Author

Pradeep Pasupuleti

Reviewers

Aaron Binns

Shingo Furuyama

Shashwat Shriparv

Fábio Uechi

Acquisition Editor

Owen Roberts

Content Development Editor

Priya Singh

Technical Editors

Aparna Kumar

Pooja Nair

Nikhil Potdukhe

Copy Editors

Alisha Aranha

Brandt D'Mello

Gladson Monteiro

Adithi Shetty

Project Coordinator

Wendell Palmer

Proofreaders

Ting Baker

Elinor Perry-Smith

Indexer

Hemangini Bari

Graphics

Sheetal Aute

Ronak Dhruv

Yuvraj Mannari

Abhinash Sahu

Production Coordinator

Aditi Gajjar Patel

Cover Work

Aditi Gajjar Patel

Foreword

Nearly 30 years ago, when I started my career, a 10 MB upgrade on a hard-disk drive was a big purchase and had to go through many approvals in the enterprise. The drawing office of a medium-sized engineering enterprise stored its drawings in this extra-large storage! Over the years, storage became cheaper and bigger. The supply side proved Moore's law and its variations accurate.

Much more has happened on the demand side though. User organizations have realized the potential of data and analytics. So, the amount of data generated at each level in the enterprise has gone up much more steeply. Some of this data comes through well-defined processes; a large majority of it, however, arrives in numerous unstructured forms and, as a result, ends up as unstructured data. Analytics tried to keep pace and mostly succeeded. However, the diversity of both the data and the desired analytics demands newer and smarter methods for working with the data. The Pig platform surely is one of these methods. Nevertheless, the power of such a platform is best tapped by extending it efficiently. Extending requires great familiarity with the platform. More importantly, extending is fun when the process of building such extensions is easy.

The Pig Latin platform offers great simplicity. However, a practitioner's advice is immensely valuable in leveraging this simplicity to an enterprise's own requirement. This is where I find this book to be very apt. It makes you productive with the platform pretty quickly through very well-researched design patterns. This helps simplify programming in Hadoop and create complex end-to-end enterprise-grade Big Data solutions through a building block and best-pattern approach.

This book covers the journey of Big Data from the time it enters the enterprise to its eventual use in analytics, either in the form of a dashboard or a predictive model.

I particularly liked the presentation of the content. You need not go sequentially through the book; you can go straight to the pattern of your interest, skipping some of the preceding content. The fact that every pattern you see in this book will be relevant to you at some point in your journey with Big Data should be a good reason to spend time with those patterns as well. The simplicity of the quoted examples puts the subject in the right perspective, in case you already browsed through some pages and felt that the examples were not exactly from your domain.

Most likely, you will find a few patterns that exactly fit your requirement. So go ahead, adopt them, and gain productivity right away.

As of writing this foreword, the world is still struggling with analyzing incomprehensibly large data, which is like trying to locate a passenger plane that went missing in the sky! This is the way things seem to work. Just when we think we have all the tools and technologies, we realize that we need much more power beyond what we have available today. Extending this, one would realize that data (creation, collection, and so on) and analytics will both play an extremely important role in our future. A knowledge tool that helps us move toward this future should always be welcomed, and what could be a better tool than a good book like this!

I had a very enriching experience while working with Pradeep earlier in my career. I spotted talent in him that was beyond the ordinary. However, in an environment that is driven primarily by a customer project and where technologies and platforms are defined by the customer, I must admit that we did not give sufficient room for him to show his creativity in designing new technologies. Even here, I fondly recollect a very creative work of distributed processing of a huge vector map data by Pradeep and his colleagues. This monster of a job would run overnight on many desktop systems that were otherwise lying unused in our organization. A consolidation engine would later stitch up the results from individual systems to make one seamless large dataset. This might look very trivial today, but more than a decade ago, it was a big innovation that helped greatly compress our release cycles.

Throughout the years, he continued this passion of using machine learning on Big Data to solve complex problems and find answers that touch human lives. Possessing a streak of hard-to-hide innovativeness, Pradeep is bold enough to think beyond what is possible. His work on computational linguistics (NLP) and on deep-learning techniques to build expert systems exemplifies this.

That he made a transition from being the lead of a development-focused team to an established technology author makes me immensely pleased. His constant and unlimited appetite for knowledge is something to emulate for people like me, who are in the technology space! Although not directly related to this book, it is appropriate that I also mention his strong value system as an individual. This quality is what makes him a successful professional, a great leader, and a guru to learn from!

He was kind enough to ask me to review this book. However, the boss in me jumped out and tried to grill him as I often did when he worked in my team. He responded very positively to my critique, which at times was harsh when I look back at it! For you see, both of us share a common belief that it is better to realize the existing errors and potential improvements in processes ourselves, and not simply leave them to reach our customers or you, the audience of this book.

I always felt that a good book can be authored only with a specific end user profile in mind. A book written for beginners may not appeal to a professional at all. The opposite of this is even truer. However, this work by Pradeep benefits both beginners and professionals equally well. This is the biggest difference that I found in this book.

An initiation, a book, or a training program are all meant to give you the essentials and point you in the right direction. There is no substitute for practicing what you learn. I encourage you to practice what you learn from this book and push up your efficiency in Big Data development!

Srinivas Uppuluri

Founder Director, Valueware Technologies

www.valueware.co.in

<srinivas.uppuluri@valueware.co.in>

About the Author

Pradeep Pasupuleti has over 16 years of experience in architecting and developing distributed and real-time data-driven systems. Currently, his focus is on developing robust data platforms and data products that are fuelled by scalable machine-learning algorithms, and delivering value to customers by addressing business problems by juxtaposing his deep technical insights into Big Data technologies with future data management and analytical needs. He is extremely passionate about Big Data and believes that it will be the cradle of many innovations that will save humans their time, money, and lives.

He has built solid data product teams with experience spanning every aspect of data science, thus successfully helping clients to build an end-to-end strategy around how their current data architecture can evolve into a hybrid pattern that is capable of supporting analytics in both batch and real time; all of this is done using the lambda architecture. He has created COEs (Centers of Excellence) to provide quick wins with data products that analyze high-dimensional multistructured data using scalable natural language processing and deep learning techniques.

He has performed roles in technology consulting, advising Fortune 500 companies on their Big Data strategy, product management, systems architecture, social network analysis, negotiations, conflict resolution, chaos and nonlinear dynamics, international policy, high-performance computing, advanced statistical techniques, risk management, marketing, visualization of high-dimensional data, human-computer interaction, machine learning, information retrieval, and data mining. He has strong experience of working amid ambiguity to solve complex problems through innovation by bringing smart people together.

His other interests include writing and reading poetry, enjoying the expressive delights of ghazals, spending time with kids discussing impossible inventions, and searching for archeological sites.

You can reach him at http://www.linkedin.com/in/pradeeppasupuleti and .

Acknowledgments

Writing a technical book takes an unpredictable amount of sacrifice every single day. I sincerely believe that nobody could ever complete writing a book alone without the willing sacrifices of family, friends, and coworkers. It is an honor to give credit where credit is due. I am truly blessed to have been in the company of some of the consistently bright people in the world while working on this book.

I owe a deep sense of gratitude to my parents, Prabhakar and Sumathy, who have constantly guided, encouraged, and blessed me; I am sure mere words can never express the magnitude of my gratitude to them. On the home front, I gave up more time with my wife, Sushma, and sons, Sresht and Samvruth, than I'm proud to admit. Thanks most of all to you for your support, love, and patience while I researched, wrote, reviewed, and rewrote the book by stealing your valuable time.

More than anything else, this book has been a team effort right from the beginning. Every member of my team has contributed in one way or another, whether they realize it or not. I am grateful to Salome, Vasundhara Boga, and Pratap for their extraordinary efforts and endless fortitude to help put together the environment, develop the code, and test the output. Without their stellar performances, this book would be incomplete. Their effort reinforces my faith in teamwork—the key ingredient for the success of any endeavor.

Srinivas Uppuluri has been an inspiration right from the beginning of my career, and I am extremely proud to be associated with him. I would like to profusely thank him for reviewing this book at every step and allowing me to be exposed to many great ideas, points of view, and zealous inspiration.

I would also like to thank Dr. Dakshina Murthy who eased me into the world of Big Data analytics and is my mentor and role model in the field of data sciences.

I would like to express my appreciation to all the staff of Packt Publishing for assisting me while editing this book. It was a marvelous effort on their part to shape its outcome for the best. They also made writing my first book an enjoyable experience. I thank everyone involved with Apache Pig. This includes committers, contributors, as well as end users for documenting so much in so little time.

I also want to show appreciation to an e-mail by my previous manager, Sandeep Athavale, which was sent to me a few years ago. In that e-mail, he reposed faith in my writing abilities and encouraged me to write a book one day, thus sowing the seed that culminated in the writing of this book—thank you Sandeep for that action-provoking mail. Through this, I want to let you know that little words of encouragement definitely leave an indelible impression to make improvements to both your personal and professional life.

Thanks to the readers for giving this book a chance. I hope you will definitely find something that can enrich your ideas and trigger new thoughts in you.

Above all, I want to thank all the folks who have helped me in some way or the other to write this book. These are a few of them who happen to be on the top of my mind: Pallavi P, Praveen P, Srini Mannava, Sunil Sana, Ravi Jordan, Haribabu T, Syam A, Robin H, Roopa, Satish B and his family, and so on.

This book is dedicated to the beloved memory of my teammate: Subramanyam Pagadala

About the Reviewers

Aaron Binns spent over five years at the Internet Archive where he designed and built a petabyte-scale Hadoop cluster supporting full-text search and Big Data analytics, the majority of which was implemented in Pig. He was responsible for the construction and deployment of full-text search of domain-scale web archives of hundreds of millions of archived web pages, as well as the over two billion web pages indexed for full-text search in the Archive-It service. He also developed custom software, built on Lucene, to provide special functionality required for full-text search of archival web documents.

He currently works at TaskRabbit as a data scientist. He holds a Bachelor of Science degree in Computer Science from Case Western Reserve University.

Shingo Furuyama is a software engineer, who has specialized in domain logic implementation to realize the value of software in the financial industry. At weekends, he enjoys cycling, scuba diving, wind surfing, and coding. Currently, he is studying English in the Philippines to expand his career opportunities.

He started his career as a software engineer at Simplex Technology, taking major responsibility in developing interest rate derivatives and a Forex option management system for a Japanese mega bank. Before going to the Philippines, he was working for Nautilus Technologies, a Japanese start-up that specializes in Big Data technologies and cloud-related enterprise solutions.

You can get more information from his blog (http://marblejenka.blogspot.jp/) or LinkedIn (http://jp.linkedin.com/in/shingofuruyama). You can also follow him on Twitter (@marblejenka).

Shashwat Shriparv holds a master's degree in Computer Application from Cochin University of Science and Technology and is currently working as Senior System Engineer HPC with Cognilytics. With a total IT experience of six years, he spent three and a half years working on core Big Data technologies, such as Hadoop, Hive, HBase, Pig, Sqoop, Flume, and Mongo in the field of development and management, and the rest of his time handling projects in technologies such as .Net, Java, web programming languages, and mobile development.

He has worked with companies, such as HCL, C-DAC, PointCross, and Genilok. He actively participates and contributes to online Big Data forums and groups. He has also contributed to Big Data technologies by creating and uploading several videos for Big Data enthusiasts and practitioners on YouTube free of cost.

He likes writing articles, poems, and technology blogs, and also enjoys photography. More information about him can be found at https://github.com/shriparv and http://helpmetocode.blogspot.com. You can connect to him on LinkedIn at http://www.linkedin.com/pub/shashwat-shriparv/19/214/2a9 and can mail him at .

Fábio Franco Uechi has a bachelor's degree in Computer Science and is a Senior Software Engineer at CI&T Inc. He has been the architect of enterprise-grade solutions in the software industry for around 11 years and has been using Big Data and cloud technologies over the past four to five years to solve complex business problems.

He is highly interested in machine learning and Big Data technologies, such as R, Hadoop, Mahout, Pig, Hive, and related distributed processing platforms to analyze datasets to achieve informative insights.

Other than programming, he enjoys playing pinball, slacklining, and wakeboarding. You can learn more from his blog (http://fabiouechi.blogspot.com) and GitHub (https://github.com/fabito).

www.PacktPub.com

Support files, eBooks, discount offers and more

You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

http://PacktLib.PacktPub.com

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.

Why Subscribe?

Fully searchable across every book published by Packt

Copy and paste, print and bookmark content

On demand and accessible via web browser

Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your
