Ebook, 904 pages, 10 hours

Machine Learning and Data Mining

Name: Machine Learning and Data Mining
Brand: Elsevier Science
Rating: 3.0 (2 reviews)

By Igor Kononenko and Matjaz Kukar

About this ebook

Data mining is often referred to by real-time users and software solutions providers as knowledge discovery in databases (KDD). Good data mining practice for business intelligence (the art of turning raw data into meaningful information) is demonstrated by the many new techniques and developments in the conversion of fresh scientific discovery into widely accessible software solutions. This book has been written as an introduction to the main issues associated with the basics of machine learning and the algorithms used in data mining. It is suitable for advanced undergraduates and their tutors at postgraduate level in a wide area of computer science and technology topics, as well as for researchers looking to adapt various algorithms for particular data mining tasks. It is also a valuable addition to the libraries and bookshelves of the many companies who are using the principles of data mining (or KDD) to effectively deliver solid business and industry solutions.

Provides an introduction to the main issues associated with the basics of machine learning and the algorithms used in data mining
A valuable addition to the libraries and bookshelves of companies using the principles of data mining (or KDD) to effectively deliver solid business and industry solutions

Language: English

Publisher: Elsevier Science

Release date: Apr 30, 2007

ISBN: 9780857099440

Author

Igor Kononenko

Igor Kononenko studied computer science at the University of Ljubljana, Slovenia, receiving his BSc in 1982, MSc in 1985 and PhD in 1990. He is now a professor at the Faculty of Computer and Information Science there, teaching courses in Programming Languages, Algorithms and Data Structures; Introduction to Algorithms and Data Structures; Knowledge Engineering, Machine Learning and Knowledge Discovery in Databases. He is the head of the Laboratory for Cognitive Modelling and a member of the Artificial Intelligence Department at the same faculty. His research interests include artificial intelligence, machine learning, neural networks and cognitive modelling. He is the (co-)author of 170 scientific papers in these fields and 10 textbooks. Professor Kononenko is a member of the editorial boards of the journals Applied Intelligence and Informatica and has twice chaired the programme committee of the International Cognitive Conference in Ljubljana.

Nov 3, 2023
What is artificial intelligence? The term was coined in 1955 by a team including Harvard computer scientist Marvin Minsky. With no strict definition of the phrase, almost anything more complex than a calculator has been called artificial intelligence
3 min read
Who Decides Who Decides What Conversations Are Allowed About Artificial Intelligence?
AQ: Australian Quarterly
Article
Who Decides Who Decides What Conversations Are Allowed About Artificial Intelligence?
Mar 31, 2023
6 min read
Can’t Ignore The Emerging Economy Of The Internet-of-Things In Africa
Forbes Africa
Article
Can’t Ignore The Emerging Economy Of The Internet-of-Things In Africa
Oct 1, 2020
MORE recently, on August 28 to be precise, the South African-born, Silicon Valley-based engineer Elon Musk unveiled a brain implant that measures the activities of the brain and sends the information to a central database where it can be analyzed. Th
3 min read


Machine Learning and Data Mining

Introduction to Principles and Algorithms

First Edition

Igor Kononenko

Professor of Computer and Information Science, University of Ljubljana, Slovenia

Matjaž Kukar

Assistant Professor of Computer and Information Science, University of Ljubljana, Slovenia

Horwood Publishing

Chichester, UK

Cover image

Title page

Copyright page

Foreword

Preface

Acknowledgements

Chapter 1: Introduction

1.1 THE NAME OF THE GAME

1.2 OVERVIEW OF MACHINE LEARNING METHODS

1.3 HISTORY OF MACHINE LEARNING

1.4 SOME EARLY SUCCESSES

1.5 APPLICATIONS OF MACHINE LEARNING

1.6 DATA MINING TOOLS AND STANDARDS

1.7 SUMMARY AND FURTHER READING

Chapter 2: Learning and Intelligence

2.1 WHAT IS LEARNING

2.2 NATURAL LEARNING

2.3 LEARNING, INTELLIGENCE, CONSCIOUSNESS

2.4 WHY MACHINE LEARNING

2.5 SUMMARY AND FURTHER READING

Chapter 3: Machine Learning Basics

3.1 BASIC PRINCIPLES

3.2 MEASURES FOR PERFORMANCE EVALUATION

3.3 ESTIMATING PERFORMANCE

3.4 *COMPARING PERFORMANCE OF MACHINE LEARNING ALGORITHMS

3.5 COMBINING SEVERAL MACHINE LEARNING ALGORITHMS

3.6 SUMMARY AND FURTHER READING

Chapter 4: Knowledge Representation

4.1 PROPOSITIONAL CALCULUS

4.2 *FIRST ORDER PREDICATE CALCULUS

4.3 DISCRIMINANT AND REGRESSION FUNCTIONS

4.4 PROBABILITY DISTRIBUTIONS

4.5 SUMMARY AND FURTHER READING

Chapter 5: Learning as Search

5.1 EXHAUSTIVE SEARCH

5.2 BOUNDED EXHAUSTIVE SEARCH (BRANCH AND BOUND)

5.3 BEST-FIRST SEARCH

5.4 GREEDY SEARCH

5.5 BEAM SEARCH

5.6 LOCAL OPTIMIZATION

5.7 GRADIENT SEARCH

5.8 SIMULATED ANNEALING

5.9 GENETIC ALGORITHMS

5.10 SUMMARY AND FURTHER READING

Chapter 6: Measures for Evaluating the Quality of Attributes

6.1 MEASURES FOR CLASSIFICATION AND RELATIONAL PROBLEMS

6.2 MEASURES FOR REGRESSION

6.3 **FORMAL DERIVATIONS AND PROOFS

6.4 SUMMARY AND FURTHER READING

Chapter 7: Data Preprocessing

7.1 REPRESENTATION OF COMPLEX STRUCTURES

7.2 DISCRETIZATION OF CONTINUOUS ATTRIBUTES

7.3 ATTRIBUTE BINARIZATION

7.4 TRANSFORMING DISCRETE ATTRIBUTES INTO CONTINUOUS

7.5 DEALING WITH MISSING VALUES

7.6 VISUALIZATION

7.7 DIMENSIONALITY REDUCTION

7.8 **FORMAL DERIVATIONS AND PROOFS

7.9 SUMMARY AND FURTHER READING

Chapter 8: *Constructive Induction

8.1 DEPENDENCE OF ATTRIBUTES

8.2 CONSTRUCTIVE INDUCTION WITH PRE-DEFINED OPERATORS

8.3 CONSTRUCTIVE INDUCTION WITHOUT PRE-DEFINED OPERATORS

8.4 SUMMARY AND FURTHER READING

Chapter 9: Symbolic Learning

9.1 LEARNING OF DECISION TREES

9.2 LEARNING OF DECISION RULES

9.3 LEARNING OF ASSOCIATION RULES

9.4 LEARNING OF REGRESSION TREES

9.5 *INDUCTIVE LOGIC PROGRAMMING

9.6 NAIVE AND SEMI-NAIVE BAYESIAN CLASSIFIER

9.7 BAYESIAN BELIEF NETWORKS

9.8 SUMMARY AND FURTHER READING

Chapter 10: Statistical Learning

10.1 NEAREST NEIGHBORS

10.2 DISCRIMINANT ANALYSIS

10.3 LINEAR REGRESSION

10.4 LOGISTIC REGRESSION

10.5 *SUPPORT VECTOR MACHINES

10.6 SUMMARY AND FURTHER READING

Chapter 11: Artificial Neural Networks

11.1 INTRODUCTION

11.2 TYPES OF ARTIFICIAL NEURAL NETWORKS

11.3 *HOPFIELD’S NEURAL NETWORK

11.4 *BAYESIAN NEURAL NETWORK

11.5 PERCEPTRON

11.6 RADIAL BASIS FUNCTION NETWORKS

11.7 **FORMAL DERIVATIONS AND PROOFS

11.8 SUMMARY AND FURTHER READING

Chapter 12: Cluster Analysis

12.1 INTRODUCTION

12.2 MEASURES OF DISSIMILARITY

12.3 HIERARCHICAL CLUSTERING

12.4 PARTITIONAL CLUSTERING

12.5 MODEL-BASED CLUSTERING

12.6 OTHER CLUSTERING METHODS

12.7 SUMMARY AND FURTHER READING

Chapter 13: **Learning Theory

13.1 COMPUTABILITY THEORY AND RECURSIVE FUNCTIONS

13.2 FORMAL LEARNING THEORY

13.3 PROPERTIES OF LEARNING FUNCTIONS

13.4 PROPERTIES OF INPUT DATA

13.5 CONVERGENCE CRITERIA

13.6 IMPLICATIONS FOR MACHINE LEARNING

13.7 SUMMARY AND FURTHER READING

Chapter 14: **Computational Learning Theory

14.1 INTRODUCTION

14.2 GENERAL FRAMEWORK FOR CONCEPT LEARNING

14.3 PAC LEARNING MODEL

14.4 VAPNIK-CHERVONENKIS DIMENSION

14.5 LEARNING IN THE PRESENCE OF NOISE

14.6 EXACT AND MISTAKE BOUNDED LEARNING MODELS

14.7 INHERENT UNPREDICTABILITY AND PAC-REDUCTIONS

14.8 WEAK AND STRONG LEARNING

14.9 SUMMARY AND FURTHER READING

Appendix A: *Definitions of some lesser known terms

A.1 COMPUTATIONAL COMPLEXITY CLASSES

A.2 ASYMPTOTIC NOTATION

A.3 SOME BOUNDS FOR PROBABILISTIC ANALYSIS

A.4 COVARIANCE MATRIX

References

Index

Copyright

HORWOOD PUBLISHING LIMITED

International Publishers in Science and Technology

Coll House, Westergate, Chichester, West Sussex, PO20 3QL, UK

First published in 2007

COPYRIGHT NOTICE

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the permission of Horwood Publishing, Coll House, Westergate, Chichester, West Sussex, PO20 3QL, UK.

British Library Cataloguing in Publication Data

A catalogue record of this book is available from the British Library

ISBN-10: 1-904275-21-4

ISBN-13: 978-1-904275-21-3

Cover design by Jim Wilkie.

Printed and bound in the UK by Antony Rowe Limited.

Foreword

Ivan Bratko

In the past two decades, Machine Learning (ML) has made, among the many areas of Artificial Intelligence (AI), probably the most significant progress, and has grown into a field of its own. In this period, many of its methods found their way into practical applications, and in many application areas are now routinely used. The wide spread and visibility of ML was accelerated by powerful new techniques, and also by the appearance of effective and user-friendly implementations of ML tools, including some excellent freely available platforms, such as Weka and Orange. All this has resulted in ML, together with its related areas of Data Mining, Knowledge Discovery in Databases, and Intelligent Data Analysis, becoming an effective tool for many other disciplines as diverse as engineering, medicine, biology, ecology, biochemistry and finance. In science, it is becoming part of general scientific methodology and is now used by scientists in interpreting experimental data and in formulating new scientific theories. ML is beginning to be taught outside Computer Science and is on the way to attaining a similar status as some other subjects of general importance, in the same way as it is, for example, useful to students of medicine and biology to learn the basics of statistics.

This book is written by two scientists who have made their own original contributions to the methods and applications of ML, one of the co-authors (I. K.) having been involved since its early stages. In my view, the main strength of this book is its breadth of coverage and wealth of material presented, without loss of depth. In this respect, the book is rather unique among the books on machine learning. The book covers a large majority of key methods, concepts and issues. The style of writing is terse and directly to the point.

In comparison with other books on ML, the following chapters seem to be particularly strong. Chapter 3 systematically introduces a repertoire of basic elements that are universally used as components of ML methodology. Chapter 6 gives an outstanding review of measures for attribute selection. Chapter 8 introduces constructive induction, a topic that is rarely covered in books. In Chapters 13 and 14 on the theory of ML, the book addresses deep theoretical questions of learnability.

This book will be useful reading both for beginners to the field, and even more so to people who have already done some work in ML, but would like to broaden their knowledge and are looking for reference to learn about yet unfamiliar topics.

Preface

Igor Kononenko; Matjaž Kukar, Ljubljana, December 2006

In learning, each day something is gained. In following the Tao, each day something is lost.

— Lao Tse

The book describes the basics of machine learning principles and algorithms that are used in data mining and is intended for students of computer science, researchers who need to adapt various algorithms for particular data mining tasks, and for education of sophisticated users of machine learning and data mining tools.

Machine learning has exhibited tremendous progress in the last two decades. Numerous systems for machine learning are nowadays available and the number of their applications in various fields, such as industry, medicine, economics, ecology, finance, and many others, is rapidly growing. Machine learning is used for data analysis and knowledge discovery in databases, data mining, automatic generation of knowledge bases for expert systems, game playing, text classification and text mining, automatic recognition of speech, handwriting, images, etc.

The basic principle of machine learning is modeling of available data. Results of various machine learning systems may be rules, functions, relations, equation systems, probability distributions and other knowledge representations. The obtained knowledge aims to explain the data and can be used for forecasting, diagnostics, control, validation, and simulations.

Several excellent books have already been written on machine learning and data mining, most notably (Mitchell, 1997; Witten and Frank, 2000, 2005; Hand et al., 2001; Alpaydin, 2005). The purpose of our book is to provide a thorough yet comprehensible general overview of the field. Our approach differs from other textbooks by providing several different points of view on machine learning and data mining, as reflected in the structure of the book:

• Besides defining the basic notions and overviewing the machine learning methods, Chapter 1 provides the historical overview of the field and an overview of the state-of-the-art data mining tools as well as established and emerging standards.

• Chapter 2 is the least formal and introduces the psychological and philosophical issues related to machine and human learning, intelligence, and consciousness.

• Basic principles of machine learning, described in Chapter 3, are the central issue and can be considered as the core of the book.

• The next three chapters divide the machine learning process into three basic components: the representation of knowledge (Chapter 4) defines the hypothesis space, Chapter 5 describes basic search algorithms that can be used to search the hypothesis space, and the attribute quality measures (Chapter 6) are used to guide the search.

• Chapters 7 and 8 add formal and practical guidelines for preparing, cleansing and transforming the input data in order to get the most out of it with the available machine learning algorithms. Constructive induction, described in Chapter 8, has great potential; however, few researchers or implementations have so far considered it worthwhile. We believe that in the future, when more research is devoted to this area, the advantages of constructive induction will be revealed and implemented in useful practical tools.

• The following four chapters use the principles and notions, described in preceding chapters, as building blocks in order to describe particular learning algorithms. Four chapters divide the field into four main approaches. Symbolic learning (Chapter 9) includes all approaches that result in symbolic and transparent knowledge representations. Statistical learning (Chapter 10), on the other hand, results in knowledge representations that are typically hard to understand and interpret by the end user. Artificial neural networks (Chapter 11) are inspired by biological neural networks. With respect to transparency of knowledge representation, artificial neural networks are much closer to statistical learning than to symbolic learning. Cluster analysis deserves a separate chapter (Chapter 12) as the task of clustering algorithms differs a lot from other machine learning tasks.

• The last two chapters introduce formal approaches to machine learning. Chapter 13 describes a mathematically and philosophically inspired problem of identification in the limit. The results of this formal theory provide the ultimate answers to philosophical questions which, however, are of modest practical use. On the other hand, the computational learning theory, described in Chapter 14, aims to provide theoretical answers to questions of more practical value.

In order to improve readability, we provide all the references in separate sections on further reading, which also contain chapter summaries. Obvious exceptions that naturally include references are, besides this preface, also the historical sections. A special mark in the text indicates that the proof of the assertion is omitted from the text and can be found at the end of the chapter in the corresponding section on formal derivations and proofs. Obvious exceptions are the last two chapters, where proofs can be found in the literature indicated in further reading. Those sections and chapters are intended for advanced readers. In the table of contents, advanced topics are indicated with * (moderately advanced) or ** (advanced). Appendix A defines some lesser-known formal notions, which are occasionally used in the text.

Both authors have experience in teaching various parts of the material presented in this book, to undergraduate and postgraduate students. For the undergraduate course we suggest excluding sections and chapters marked with *: Section 3.4 (comparing performance of machine learning algorithms), sections related to inductive logic programming (Sections 4.2 and 9.5), Chapter 8 (constructive induction), Section 10.5 (support vector machines), Sections 11.3 and 11.4 (Hopfield’s and Bayesian neural networks), as well as all sections and chapters marked with **. Optionally, Sections 9.3 (association rules) and 9.7 (Bayesian belief networks), and/or Chapter 12 (cluster analysis) can also be kept for the postgraduate course.

The book web page can be found at the address mldmbook.fri.uni-lj.si, where all information and post-press corrections will be available. The readers' comments, suggestions and indications about errors in the text are welcome, and should be sent to either author's e-mail address: igor.kononenko@fri.uni-lj.si or matjaz.kukar@fri.uni-lj.si.

Acknowledgements

The book is the result of numerous years of active research in the field of machine learning. We are grateful to Professor Ivan Bratko for his pioneering efforts in establishing an outstanding machine learning research group in Ljubljana. We are grateful to the Faculty of Computer and Information Science for support, both during research and teaching. For long term collaboration in machine learning research we are particularly grateful to Matjaž Bevk, Zoran Bosnić, Ivan Bratko, Bojan Cestnik, Janez Demšar, Aleks Jakulin, Matevž Kovačič, Nada Lavrač, Uroš Pompe, Marko Robnik Šikonja, Luka Šajn, and Blaž Zupan. We thank Luka Šajn and Alan McConnell-Duff for proofreading the text, Jožica Robnik Šikonja for proofreading Chapter 2, Ivan Bratko and the anonymous reviewer for their comments and suggestions, and Bogdan Filipič, Marko Robnik Šikonja, and Blaž Zupan for their comments on earlier drafts of the book. Useful comments on various parts of the book were provided by several people. In particular, we thank Janez Demšar for his comments on Chapter 3, Aleks Jakulin for his comments on Chapter 8, Aristidis Likas and Eduardo Hruschka for their comments on Chapter 12, and France Dacar, Matevž Kovačič and Boštjan Vilfan for their comments on Chapter 13. The author of the spider net graphics from the book cover is a young artist and Igor’s daughter Nina Kononenko. We thank also Francesca Bonner and Ellis Horwood from Horwood Publishing for their kind assistance and patience. Finally, we are grateful to our wives and children for their loving support and patience and to whom with all our love we dedicate this book.

Chapter 1 Introduction

The time will come when we will have to forget everything we have learnt.

— Ramana Maharshi

Machine learning, a subfield of artificial intelligence, has shown tremendous improvements in the last 20 years. This is reflected in numerous commercial and open source systems for machine learning and their applications in industry, medicine, economics, natural and technical sciences, ecology, finance, and many others. Machine learning is used for data analysis and knowledge discovery in databases (data mining), automatic generation of knowledge bases for expert systems, learning to plan, game playing, construction of numerical and qualitative models, text classification and text mining (e.g., on the world wide web), for automatic knowledge acquisition to control dynamic processes, automatic recognition of speech, handwriting, and images, etc.

The basic principle of machine learning is the automatic modeling of underlying processes that have generated the collected data. Learning from data results in rules, functions, relations, equation systems, probability distributions and other knowledge representations such as decision rules, decision and regression trees, Bayesian nets, neural nets, etc. Models explain the data and can be used for supporting decisions concerning the same underlying process (e.g., forecasting, diagnostics, control, validation, and simulations).

We start with an overview and definitions of overlapping and frequently (mis)used buzzwords and acronyms, such as machine learning (ML), data mining (DM), knowledge discovery in databases (KDD), and intelligent data analysis (IDA). We continue with a short overview of machine learning methods and historical development of different directions in machine learning research: symbolic rule learning, neural nets and reinforcement learning, numerical learning methods, and formal learning theory. We describe a few interesting early machine learning systems, review typical applications and finish with an overview of data mining standards and tools.

1.1 THE NAME OF THE GAME

Why do we not, since the phenomena are well known, build a knowledge refinery as the basis of a new industry, comparable in some ways to the industry of petroleum refining, only more important in the long run? The product to be refined is codified human knowledge.

— Donald Michie

The main problem of contemporary data analysis is the huge amount of data. In the last decade, computers have provided inexpensive and capable means to collect and store the data. The increase in data volume causes difficulties in extracting useful information. The traditional manual data analysis has become insufficient, and methods for efficient computer-based analysis are now needed. Therefore, a new interdisciplinary field has emerged, that encompasses statistical, pattern recognition, machine learning, and visualization tools to support the analysis of data and discovery of principles hidden within the data. In different communities this field is known by different names: knowledge discovery in databases (KDD), data mining (DM), or intelligent data analysis (IDA).

The terms data mining (in the database community) and knowledge discovery in databases (in machine learning and artificial intelligence communities) appeared around 1990, but the term data mining became more popular in the business community and in the press. Frequently, data mining and knowledge discovery are used interchangeably.

KDD focuses on the knowledge discovery from data (Figure 1.1), including how the data are stored and accessed; how algorithms can be scaled to massive data sets and still run efficiently; how results can be interpreted and visualized; and how the overall man-machine interaction can usefully be modeled and supported. The KDD process can be viewed as a multidisciplinary activity that encompasses techniques beyond the scope of any particular discipline such as machine learning. Note that the KDD process uses data mining as one of its steps.

Figure 1.1 The process of knowledge discovery in databases.

Data mining (DM) is an interactive and iterative process in which many steps need to be repeatedly refined in order to provide an appropriate solution to the data analysis problem. The data mining standard CRISP-DM defines a process model that reflects the life cycle of a data mining project. It contains the phases (see Figure 1.2) of a data mining project and identifies relationships between them. The phases are cyclically iterated until some desired goal is reached. Comparing Figures 1.1 and 1.2 makes it clear that they describe essentially the same thing.

Figure 1.2 The cycle of a data mining process in the CRISP-DM standard.

The ambiguity whether KDD = DM arises from the fact that, strictly speaking, data mining (DM) covers the methods used in the data modeling step. This includes both visual methods and methods that (semi-)automatically extract knowledge from large masses of data, thus describing the data in terms of the interesting discovered regularities. Most automatic data mining methods have their origins in machine learning (ML). However, ML cannot be seen as a true subset of data mining, as it also encompasses other fields not utilized for data mining (e.g. theory of learning, computational learning theory, and reinforcement learning).

A closely related field of intelligent data analysis (IDA) is similar to KDD and DM. It also refers to the interactive and iterative process of data analysis, with the distinguishing feature that the architectures, methodologies and techniques used in this process are those of artificial intelligence (AI) and machine learning. Figure 1.3 shows the relation of IDA to KDD, DM and ML in the strictest sense. In practice (and in the following text) KDD and DM are treated as synonyms.

Figure 1.3 Relation between ML, DM, KDD, and IDA in the stricter sense.

There is a large intersection between KDD/DM and IDA. Both fields investigate data analysis, and they share many common methods. The main difference is that IDA uses AI methods and tools, while KDD/DM employ both AI and non-AI methods. Therefore, ML methods are in the intersection of the two fields, whereas classical statistical methods and on-line analytical processing (OLAP) belong to KDD/DM but not to IDA. Typically, KDD and DM are concerned with the extraction of knowledge from large datasets (databases), whereas in IDA the datasets are smaller.

1.2 OVERVIEW OF MACHINE LEARNING METHODS

A learning machine, broadly defined, is any device whose actions are influenced by past experiences.

— Nils J. Nilsson

This overview is intended to give an impression of the field of machine learning. Therefore detailed explanations are omitted. The meaning of various terms mentioned in this section will become more familiar later in the book.

We differentiate machine learning methods with respect to how the obtained (induced) knowledge is used (see Figure 1.4): classification, regression, clustering, learning of associations, relations, and (differential) equations. Besides, machine learning is used in reinforcement learning within an iterative approach for optimization of value function approximation. Equation and reinforcement learning are not thoroughly discussed in this book, and are only described here for completeness.

Figure 1.4 A taxonomy of machine learning methods. The methods not thoroughly discussed in the book are grayed.

1.2.1 Classification

Machine learning methods are most frequently used for classification. Let us suppose that we have an object, described with many attributes (features, properties). Each object can be assigned to exactly one class from the finite set of possible classes. Attributes are independent observable variables, either continuous or discrete. The class is a dependent unobservable discrete variable and its value is determined from values of respective independent variables. Machine learning methods are, among others, used for creation of classifiers. The task of the classifier is to determine the class to which the object in question should be assigned.

A typical classification task is medical diagnosis: a patient is described with continuous (e.g. age, height, weight, body temperature, heart rate, blood pressure) and discrete attributes (e.g. sex, skin discoloration, location of pain). The classifier’s task is to produce a diagnosis – to label the patient with one of several possible diagnoses (e.g. healthy, influenza, pneumonia).

To determine the class, a classifier needs to describe a discrete function, a mapping from the attribute space to the class space. This function may be given in advance, or can be learnt from data. Data consists of learning examples (sometimes also called training instances) that describe solved past problems. For illustration let us again consider medical diagnostics. Here, solved past problems are medical records, including diagnoses, for all patients that had been treated in a hospital. The learning algorithm’s task is therefore to determine the mapping by learning from the set of patients with known diagnoses. This mapping (represented as a function, a rule, … ) can later be used for diagnosing new patients.

Different classifiers represent mapping functions in many different ways. Most common classifiers are decision trees, decision rules, naive Bayesian classifiers, Bayesian belief networks, nearest neighbor classifiers, linear discriminant functions, logistic regression, support vector machines, and artificial neural networks.

Decision trees and rules. Algorithms for building decision trees and rules can select attributes and suitable subsets of their values according to their quality, and use them as building blocks of a conjunctive rule’s antecedent. A consequent consists of one or more classes. When a new example arrives it is classified by applying a suitable rule.

For a given attribute A and its values V1 … Vm, decision trees and rules form conjunctive terms such as A = Vj. Continuous attributes are either discretized in advance or, even better, terms like A > x, A ≤ x, or x ≥ A > y, where x and y are real-valued thresholds, are formed during learning.
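As a rough illustration of how a term such as A ≤ x might be formed during learning, the following Python sketch tries each midpoint between adjacent sorted values of a continuous attribute and keeps the one that misclassifies the fewest learning examples. The function name best_threshold, the toy body-temperature data, and the use of plain classification error as the quality measure are assumptions made for illustration only; proper attribute quality measures are the subject of Chapter 6.

```python
from collections import Counter


def best_threshold(values, labels):
    """Return (x, errors) for the cut x whose terms 'A <= x' and 'A > x'
    misclassify the fewest examples when each side predicts its majority class."""
    pairs = sorted(zip(values, labels))
    best = (None, len(pairs) + 1)
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        if a == b:
            continue  # no cut can be placed between equal values
        x = (a + b) / 2  # candidate threshold: midpoint of adjacent values
        left = Counter(label for v, label in pairs if v <= x)
        right = Counter(label for v, label in pairs if v > x)
        # Examples outside the majority class on each side count as errors.
        errors = (sum(left.values()) - max(left.values())
                  + sum(right.values()) - max(right.values()))
        if errors < best[1]:
            best = (x, errors)
    return best


# Hypothetical toy data: body temperature (°C) and diagnosis.
temperatures = [36.5, 36.8, 37.0, 38.4, 38.9, 39.3]
diagnoses = ["healthy", "healthy", "healthy",
             "influenza", "influenza", "influenza"]
print(best_threshold(temperatures, diagnoses))
```

The exhaustive scan is quadratic in the number of examples; it is kept that way here only because it makes the idea easy to follow.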

Bayesian classifier. Given the attribute values of a new example, the task of the Bayesian classifier is to calculate conditional probabilities for all possible classes. The exact Bayesian classifier minimizes the expected error rate and is therefore optimal in this sense. As exact Bayesian classifiers are hard to come by (unless the learning data cover the complete attribute space and there is no noise), certain approximations and assumptions have to be used when calculating conditional probabilities.

The naive Bayesian classifier assumes conditional independence of attributes given the class. This (often very strong) assumption usually ensures reliable conditional probability estimations needed for classification, even from very small datasets. Implementations of the naive Bayesian classifier often assume that only discrete attributes are used; therefore continuous attributes need to be discretized in advance.
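The conditional independence assumption can be sketched in a few lines of Python: the probability of each class is taken as the class prior multiplied by one conditional probability per attribute. This is a minimal sketch for discrete attributes only. The function names, the toy data, and the Laplace (add-one) correction of the probability estimates are assumptions introduced here, not details taken from the text.

```python
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (attribute_tuple, class_label), discrete attributes only."""
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values = defaultdict(set)            # attribute index -> distinct values seen
    for attrs, label in examples:
        for i, v in enumerate(attrs):
            value_counts[(i, label)][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values


def class_probabilities(model, attrs):
    """Return normalized class probabilities for one new example."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        p = n_c / total  # prior probability of class c
        for i, v in enumerate(attrs):
            # Estimate of P(attribute i = v | c) with add-one smoothing (assumed).
            p *= (value_counts[(i, c)][v] + 1) / (n_c + len(values[i]))
        scores[c] = p
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}


# Hypothetical toy data: (fever, cough) -> diagnosis.
examples = [
    (("yes", "yes"), "influenza"), (("yes", "no"), "influenza"),
    (("no", "no"), "healthy"), (("no", "yes"), "healthy"), (("no", "no"), "healthy"),
]
model = train_naive_bayes(examples)
print(class_probabilities(model, ("yes", "yes")))
```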

Bayesian belief networks are a generalization of the naive Bayesian classifier. They use acyclic graphs for modeling dependencies between attributes and the class. Conditional independence is implicitly assumed between nodes (attributes) that are not directly connected. A Bayesian network's topology can be given in advance (as a part of background knowledge for the problem), or can be learnt from data. Conditional dependencies are also learnt from data. Acyclic graph topology ensures efficient calculation of conditional probabilities for all classes. Bayesian belief networks can also be used for modeling probability distributions.

Nearest neighbors classifiers (instance-based learning). In its simplest form, the k-nearest neighbors classifier (k-NN) generates its knowledge base by storing all learning data. When a new example is presented for classification, the stored data are used to find a certain number (k) of most similar learning examples (nearest neighbors), according to some distance metric. The new example is labelled with a class, prevalent among its nearest neighbors. The number k is typically a small odd number (e.g. five or seven).
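The simplest form described above can be sketched as follows. The function name knn_classify, the toy data, and the choice of Euclidean distance as the metric are assumptions for illustration; any other distance metric could be substituted.

```python
import math
from collections import Counter


def knn_classify(train, query, k=5):
    """Label `query` with the class prevalent among its k nearest learning examples.

    `train` is a list of (attribute_vector, class_label) pairs.
    """
    # The "knowledge base" is just the stored data: sort it by distance to the query.
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (body temperature in °C, heart rate) -> diagnosis.
train = [
    ((36.6, 70), "healthy"), ((36.8, 72), "healthy"), ((37.0, 68), "healthy"),
    ((38.9, 95), "influenza"), ((39.2, 100), "influenza"), ((38.7, 98), "influenza"),
]
print(knn_classify(train, (39.0, 97), k=3))
```

In this toy example the heart rate dominates the distance because its numeric range is much larger than that of the temperature; in practice attributes are usually rescaled before distances are computed.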

Discriminant functions. The task of the learning algorithm is to calculate coefficients for a discriminant function whose structure is fixed in advance. A discriminant function is actually a hypersurface, dichotomizing between two classes in the attribute space. As a hypersurface can be defined only in a continuous hyperspace, all attributes need to be continuous. If we have more than two classes, a separate hypersurface is necessary for each pair of classes. Discriminant functions can be linear, quadratic, polynomial, etc. In case of a linear discriminant function the corresponding hypersurface is a hyperplane dichotomizing between two classes. The hyperplane’s (hypersurface’s) coefficients are determined so as to minimize the classification error rate.

Frequently, Fisher’s linear discriminant function is used. It assumes normal distribution of learning examples within each class. It maximizes the (Euclidean) distance between averaged examples from both classes. It accounts for potentially different class variances and finds an optimal classification boundary between classes.
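For illustration, here is a small two-attribute sketch of a Fisher-style linear discriminant. It uses the commonly cited closed form in which the projection direction is proportional to the inverse of the pooled within-class scatter matrix applied to the difference of the class means. The function names, the midpoint threshold, and the toy data are assumptions, and the code assumes the scatter matrix is invertible.

```python
def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, m):
    """2x2 scatter matrix of one class around its mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def fisher_discriminant(class0, class1):
    """Return (w, threshold); an example x is assigned class 1 if dot(w, x) > threshold."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]  # assumed non-zero
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # w = Sw^{-1} (m1 - m0), using the explicit inverse of a 2x2 matrix
    w = [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
         (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]
    # Threshold halfway between the projected class means (a simplifying assumption)
    threshold = 0.5 * (dot(w, m0) + dot(w, m1))
    return w, threshold

def classify(x, w, threshold):
    return 1 if dot(w, x) > threshold else 0

# Hypothetical toy data
class0 = [(1, 2), (2, 1), (2, 3), (3, 2)]
class1 = [(6, 5), (7, 6), (7, 4), (8, 5)]
w, threshold = fisher_discriminant(class0, class1)
```

With this data the scatter matrices are diagonal, so the direction is simply the scaled difference of the class means; with correlated attributes the inverse scatter matrix rotates the direction accordingly.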

Support Vector Machines (SVMs). SVMs were developed in the 1990s and are among the most accurate approaches to discriminant function classification. Due to their importance we discuss them separately. SVM is a multi-criterion optimization method:

• It maximizes the distance between support vectors and a dichotomizing hyperplane. Support vectors are learning examples that lie closest to the hyperplane and therefore define its margin. The method thus maximizes the margin as well.

• Since many classification problems are not linearly separable, SVMs implicitly transform the original attribute set to the much larger (possibly even infinite) attribute set by using kernel functions. The success depends mostly on a suitably chosen kernel function.

• Since every classification problem can be exactly solved by using arbitrarily complex discriminant functions, SVMs attempt to minimize the size of the solution (the sum of attributes’ weights).

• Although the transformation of attributes is nonlinear, some examples still get misclassified. SVMs therefore tend to minimize the number of misclassifications.

Completely satisfying all the above criteria is usually impossible, so a compromise needs to be found. Although SVMs use several optimization algorithms to optimize all criteria simultaneously, one still needs to appropriately weight the above criteria in order to obtain the best possible classification.
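One common way to express this compromise is the soft-margin objective of a linear SVM, sketched below. The squared weight norm stands in for the size of the solution, the hinge loss penalizes misclassified or too-close examples, and a constant C weights one against the other. The name C and the exact form of the objective follow the standard textbook formulation and are assumptions here, not something the text specifies; kernels are left out for brevity.

```python
def svm_objective(w, b, examples, labels, C=1.0):
    """Soft-margin objective 0.5*||w||^2 + C * sum(hinge losses); labels are +1 or -1."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    hinge = 0.0
    for x, y in zip(examples, labels):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        # Examples with margin >= 1 cost nothing; others are penalized linearly.
        hinge += max(0.0, 1.0 - margin)
    return regularizer + C * hinge

# Hypothetical toy data: the third example lies on the wrong side of the hyperplane
examples = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
labels = [1, -1, -1]
w, b = (1.0, 0.0), 0.0
loose = svm_objective(w, b, examples, labels, C=1.0)    # tolerant of errors
strict = svm_objective(w, b, examples, labels, C=10.0)  # errors dominate the objective
```

A larger C makes the same misclassification far more costly relative to the weight norm, which is the kind of weighting of criteria the paragraph above refers to.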

Artificial neural networks. Algorithms for artificial neural networks (ANNs) mimic biological neural networks by abstracting neural functions to a simple element (neuron), only able to sum its inputs and normalize its output. Neurons are interconnected into arbitrarily complex artificial neural networks. For classification, feedforward multi-layered neural networks are used most frequently. Neurons are organized in a layered cascade: an input layer (corresponding to attributes), one or more hidden layers, and an output layer (corresponding to classes). The task of the learning algorithm is to determine weights on connections between neurons (used for calculating weighted sums in each neuron) in order to minimize the classification error rate.

For classification of a new example its attribute values are presented to the ANN’s input neurons. These values are weighted according to the connections between neurons, and their weighted sums are calculated in each neuron in the subsequent layer of neurons. Normalized results in the output neurons determine the classification outcome (class).
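The classification step described above can be sketched as a forward pass through the layers. The sigmoid is used here as the normalizing function, and the weights are set by hand rather than learnt; both choices, like the function names, are assumptions for illustration.

```python
import math

def sigmoid(z):
    """Normalize a weighted sum to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_output(inputs, weights, biases):
    """Each neuron computes a weighted sum of its inputs and normalizes it."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Propagate attribute values through all layers; layers is a list of (weights, biases)."""
    activation = list(x)
    for weights, biases in layers:
        activation = layer_output(activation, weights, biases)
    return activation

def classify(x, layers):
    """The class is the index of the output neuron with the largest value."""
    outputs = forward(x, layers)
    return max(range(len(outputs)), key=outputs.__getitem__)

# Hypothetical hand-set network: 2 inputs, 2 hidden neurons, 2 output neurons
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),    # hidden layer
    ([[2.0, -2.0], [-2.0, 2.0]], [0.0, 0.0]),  # output layer
]
class_a = classify((3.0, -3.0), layers)
class_b = classify((-3.0, 3.0), layers)
```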

Hybrid algorithms. Hybrid algorithms or their parts combine several different approaches by using their advantages and thus creating possibly superior learning algorithms. An example is the naive Bayesian classifier that has many nice features (speed, robustness, comprehensibility), and a sole problem – its naivety. It is often combined with other algorithms in order to – at least partially – avoid the effects of naivety:

• A locally naive (k-NN) Bayesian classifier is based on learning a naive Bayesian classifier on a set of k nearest neighbors of a new example. This reduces the naivety with respect to the original globally naive Bayesian classifier. Local classifiers are adapted to local distributions which implicitly account for attribute dependencies. The parameter k must be selected much larger than for ordinary nearest neighbor classifiers (e.g. k > 30) in order to provide sufficiently reliable conditional probability estimations.

• Clustering and the naive Bayesian classifier. For each class the learning examples are clustered (described later in this section) into several clusters. Each of them serves as a learning set for a separate new class. The number of obtained clusters is larger than the number of classes. The clustering algorithm should account for attribute dependencies and detect locally coherent subspaces, such that the naive Bayesian classifier can discriminate between them.

• The naive Bayesian classifier in the leaves of a decision tree. We may assume that the decision tree will account for dependencies among attributes, and split the original attribute space into subspaces where the remaining attributes are relatively independent. Such decision trees should be made smaller (by pruning) in order to obtain sufficiently reliable conditional probability estimations for the final naive Bayesian classification.

• The naive Bayesian classifier in internal nodes of decision trees. Instead of using original attributes to form splitting criteria, we can use a simple classifier in each internal node. Each node has for each possible class a corresponding subtree. A classifier learns from the training examples that belong to its node and assigns them to respective subtrees. The task of the subtree is to correctly classify learning examples that were misclassified in the previous node. A decision tree is built recursively until all examples are correctly classified or some other stopping criterion is fulfilled (e.g., number of examples becomes too small).

1.2.2 Regression

As in classification problems, in regression we have a set of objects (learning examples), described with several attributes (features, properties). Attributes are independent observable variables (either continuous or discrete). The dependent (target, regression) variable is continuous and its value is determined as a function of independent variables. The task of the regressional predictor is to determine the value of the dependent unobservable continuous variable for the object in question.

Similarly to classifiers, a regressional predictor implements a continuous function mapping from the attribute space to prediction values. This function can be given in advance or learnt from previously solved problems (a learning set). The task of the learning algorithm is therefore to determine a continuous function by learning from the learning set. This function can later be used for predicting values for new, previously unseen examples.

Regressional predictors differ with respect to the representation of the regressional function. The most common regressional predictors (regressors) are linear regression, regression trees, locally weighted regression, support vector machines for regression, and multi-layered feedforward neural networks for regression.

Regression trees. Algorithms for building regression trees are similar to algorithms for constructing decision trees. According to their quality, they choose attributes and subsets of their values for building the node-splitting criteria. Tree nodes stand for attributes, tree branches (edges) for attribute values (conjunctively combined), and leaves for consequents of the prediction rules that map from the remaining (non-used) attributes to the prediction value. Discrete and continuous attributes are handled within internal nodes much in the same way as in decision (classification) trees. In terminal nodes (leaves) we have a continuous function. It is usually assumed that all the remaining attributes are continuous. Most frequently, single point functions (the mean value of dependent variable of the corresponding subset of learning examples), or linear regression functions are used.
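To make the node-splitting step concrete, the sketch below searches for the best threshold on a single continuous attribute and uses the mean of the dependent variable as the prediction in each resulting leaf. Judging split quality by the total squared error is a common choice but an assumption here, as are the function names and the toy data.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimizing the total squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        error = sse(left) + sse(right)
        if best is None or error < best[0]:
            best = (error, threshold, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("attribute has a single distinct value; no split possible")
    return best[1:]

# Hypothetical toy data with an obvious jump between x = 3 and x = 10
xs = [1, 2, 3, 10, 11, 12]
ys = [5, 6, 7, 20, 21, 22]
threshold, left_mean, right_mean = best_split(xs, ys)
```

A full tree would apply the same search recursively to each subset and to every attribute, stopping when a subset becomes too small.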

Linear regression. A linear regression function assumes all attributes to be continuously-valued and suitably normalized. It also assumes the existence of a linear relationship between dependent and independent variables. The learning algorithm needs to determine the coefficients of the linear function in order to minimize the sum of squared (or absolute) errors of regressional predictions evaluated on learning examples. Either analytical solution or iterative optimization may be used. The result of learning is a hyperplane that for each example determines the value of the dependent variable. The error for a given example is its distance from the hyperplane in the dependent variable’s axis.
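For a single attribute, the analytical least-squares solution mentioned above has a short closed form, sketched here. The function names and the toy data are illustrative assumptions; the general case with several attributes solves the corresponding normal equations instead.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one continuous attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # assumed non-zero
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_errors(xs, ys, slope, intercept):
    """Errors are measured along the dependent variable's axis."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
error = sum_squared_errors(xs, ys, slope, intercept)
```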

For non-linear problems a series of linear functions or nonlinear parts included in the functional expression can be used.

Locally weighted regression. Locally weighted regression is the nearest neighbors algorithm adapted for regression, such that a linear regression function is fitted to the new example’s nearest neighbors. Linear regression is thus adapted to the local properties. Besides linear regression, the average value of the nearest neighbors or some other simple function can be used; complex functions cannot be afforded because the number of neighbors is small.
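A minimal sketch of this idea for one attribute follows: a line is fitted to the k nearest neighbors of the query and evaluated at the query point. Distance-based weighting of the neighbors is omitted for brevity, and the function name, the fallback to the neighbors' mean, and the toy data are assumptions.

```python
def local_linear_predict(xs, ys, query, k=3):
    """Fit a line to the k nearest neighbors of query and evaluate it there."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - query))[:k]
    n = len(nearest)
    nx = [x for x, _ in nearest]
    ny = [y for _, y in nearest]
    mean_x = sum(nx) / n
    mean_y = sum(ny) / n
    sxx = sum((x - mean_x) ** 2 for x in nx)
    if sxx == 0:
        return mean_y  # all neighbors share one x value: fall back to their average
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(nx, ny)) / sxx
    return mean_y + slope * (query - mean_x)

# Hypothetical toy data: a non-linear target y = x^2
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x * x for x in xs]
prediction = local_linear_predict(xs, ys, 2.0, k=3)
```

The global function is never modelled; each prediction only needs a line that fits well in the neighborhood of the query.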

Support vector machines for regression. Since SVMs perform implicit transformation of the attribute space by utilizing kernel functions, there is no need to introduce nonlinearities for description of the hypersurface. SVMs can be easily adapted for solving regressional problems. A margin around the regressional hyperplane is defined such that correctly predicted learning examples lie inside it. Support vectors determine the margin. SVMs aim to minimize the margin as well as minimize the prediction error on learning examples. As is the case with SVMs for classification, the criterion function needs to be optimized with respect to the prediction error of the regressional variable as well as with respect to the complexity of the criterion function (measured with magnitudes of weights for implicitly generated attributes).

Artificial neural networks. For regression, as well as for classification, multilayered feedforward networks are commonly used. Simple calculation elements (neurons) are organized in a layered cascade: an input layer (corresponding to attributes), one or more hidden layers, and an output layer with a single neuron (corresponding to the regressional variable). The task of the learning algorithm is to determine weights on connections between neurons (used for calculating weighted sums in each neuron) such that the regressional error is minimal.

Hybrid algorithms. As for classification, hybrid algorithms combine several different approaches by using their advantages and thus creating possibly superior learning algorithms. Strictly speaking, locally weighted regression is of this kind, combining linear regression with nearest neighbors. The same holds for the use of linear regression (or other simple methods) in the leaves of regression trees. The rationale here is that regression trees should partition the attribute space finely enough for a simple regressor to work well. Such regression trees should be made smaller (by pruning) in order to allow enough examples for simple regressors.

1.2.3 Logical relations

Logical relations can be considered a generalization of discrete functions. As opposed to functions, there is no single dependent discrete variable (class), and all variables (attributes) are treated equivalently. Sometimes, some variables have unknown values, and we want to predict them. In other cases, all the variables have known values and we want to check if the relation holds for this particular set of values. According to the expressive power of the language used for describing learning examples and logical relations, we differentiate between learning associations and inductive logic programming.

Associations. We speak of associations when the propositional (with attributes) data description is used, such as in classification and regression, the only difference being the absence of the dependent variable. Each variable may in turn be considered as independent (its value is given), or dependent (its value is not given). Because of the absence of an explicitly defined dependent variable, association learning is often considered unsupervised, together with clustering (to be described later).

A straightforward approach to association learning is to break the problem into a learning problems (one per attribute), where a is the number of attributes. In each new problem, one attribute is taken as a target (dependent) variable, whereas the others are treated as independent variables. The type of the learning problem (classification or regression) depends on the type of the attribute (discrete or continuous). Two approaches to association learning are associative neural networks and association rules.

• Associative neural networks are artificial neural networks where every neuron can serve both as input and output. Each neuron represents one attribute. Neurons are mutually interconnected with two-way connections, so that the signal can travel in either direction. The signal value is multiplied by the weight of the connection. Each neuron calculates its own weighted sum of the incoming connections. This (normalized) sum represents the attribute value. In an associative neural network the initial state is determined by the given attribute values. After the initial set-up, all neurons start calculating their outputs asynchronously and in parallel. This process goes on until the outputs of all neurons cease changing. Thus the unknown values of the attributes are calculated.

The task of the learning algorithm is to set the weights on connections between each pair of neurons. According to the utilized learning algorithm we speak of Hopfield and Bayesian neural networks. Hopfield neural networks are based on correlation of the state activity of neurons. Bayesian neural networks use in every neuron the naive Bayesian learning rule for calculation of the weighted sum. Similarly, its learning rule calculates the conditional probability of activity of each neuron given the states of other neurons.

Associative neural networks are frequently used for pattern recognition when only partial data are known, and the missing part needs to be reconstructed. For example, in face recognition only a part of the face can be seen. Another use is in combinatorial optimization problems, where parameters (attributes) compete with each other in the sense that higher values of one may result in lower values of others. Associative neural networks tend to find a compromise – a locally optimal combination of parameter values.

• Association rules are similar to classification rules. They differ in the consequent, where instead of class any attribute may occur. The antecedent is made of conjunctively connected terms, built from other attributes. The aim of learning association rules is to find new interesting relations between attributes. They may contribute to (human) knowledge of a certain problem and give insight about regularities in data. The task of the learning algorithm is to find all association rules that are both accurate and interesting.

Association rules were originally used in market basket analysis. With their help, merchants can plan the arrangement of items on the shelves in their shops. For example, a rule may say that there is a high probability that a customer buying flour, sugar, and nuts will also buy eggs.
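The market basket example can be made concrete with the two measures most often used to judge association rules, support and confidence. Treating confidence as the accuracy of a rule is an assumption here (the text does not name specific measures), and the transactions below are invented for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain all the given items."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets
transactions = [
    {"flour", "sugar", "nuts", "eggs"},
    {"flour", "sugar", "nuts", "eggs"},
    {"flour", "sugar", "nuts"},
    {"milk", "eggs"},
    {"flour", "milk"},
]
rule_support = support(transactions, {"flour", "sugar", "nuts", "eggs"})
rule_confidence = confidence(transactions, {"flour", "sugar", "nuts"}, {"eggs"})
```

Here the rule "flour, sugar, nuts implies eggs" holds in two of the three baskets that contain the antecedent, so its confidence is two thirds.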

Inductive logic programming. In inductive logic programming (ILP) the first-order predicate calculus is used as a description language. It is usually limited to Horn clauses with negation, or equivalently, to Prolog expressions. A learning problem is described with the background knowledge (a set of relations, or Prolog programs) and with sets of positive (true) and negative (false) facts (examples) for a target relation. The task of the learning algorithm is to derive the target relation (a Prolog program) covering as many positive and as few negative examples as possible. ILP can be considered as a process for discovering systems of logical equations.

In practice, inductive logic programming is used for solving geometrical and spatial problems, such as structure prediction of chemical compounds, gene expression determination in functional genomics, spatial planning and subspace recognition, and static and modeling problems in machine engineering.

1.2.4 Equations

When modeling real-world problems, we often have to deal with measurements of different, related and mutually dependent processes. By learning systems of equations we aim to explicitly describe these dependencies. The same system of equations may as well be used for simulation and forecasting. We usually deal with real-valued (continuous) variables, numerical relations, and dependencies. In contrast to this, ILP is concerned mostly with logical variables, relations, and dependencies.

When learning systems of equations, their structure may either be given in advance in the form of background knowledge, or the learning algorithm needs to derive from data the structure of equations as well as their corresponding coefficients. For modeling dynamic systems that change continuously over time, systems of partial differential equations need to be derived. The task of the learning algorithm is to derive a system of equations that is both simple (in terms of equations) and accurate (with respect to input data).

Equation discovery is frequently used in meteorology for modeling weather systems, in ecology for modeling biological subsystems, and in machine engineering for modeling technical and dynamical processes.

1.2.5 Clustering

Unsupervised learning is different from classification and regression since only the description of examples with attributes is given, without the supervising (target) variable. Clustering is the most popular unsupervised learning method. The task of the learning algorithm is to determine coherent subsets (clusters) of learning examples. Clustering is frequently used in natural sciences, econometrics, process analysis, and psychological and sociological research.

The number of clusters may either be given in advance as a part of the background knowledge, or determined by the learning algorithm. Its task is therefore to determine a relatively small number of coherent clusters, that is, subsets of similar examples. The choice of the (dis)similarity measure is the most important part of the background knowledge, and of utmost importance for successful and meaningful clustering. Most popular approaches to clustering are:

• Hierarchical clustering. In agglomerative (bottom-up) hierarchical clustering, each example initially forms a separate cluster. At each step of the algorithm, the most similar clusters are amalgamated, thus forming a tree of clusterings (dendrogram). Frequently, pairwise amalgamation is used, leading to binary trees. Amalgamation continues until all examples belong to a single cluster. Finally, the learning algorithm or the end user selects the most suitable clustering level from the constructed tree. In divisive (top-down) hierarchical clustering, all examples initially belong to a single cluster. The number of clusters is incremented at each step of the algorithm by dividing an existing cluster into (usually two) sub-clusters. The process continues until a suitable number of clusters is obtained.

• Partitional clustering. Initially, the number (c) of required disjoint clusters needs to be known. Given a criterion for measuring adequacy of a partition of examples into c clusters, the algorithm searches for optimal partitions of examples. The process starts with an initial clustering (frequently obtained by randomly selecting c examples), and a set of transformations for changing a partition into another partition. The learning algorithm (e.g., iterative relocation algorithm) modifies a partition until no allowable transformation would improve the given criterion of partition adequacy.
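As one concrete instance of partitional clustering by iterative relocation, the sketch below follows the widely used k-means scheme: examples are assigned to the nearest cluster center, centers are recomputed as cluster means, and the process stops when no example changes cluster. Using k-means, Euclidean distance, and fixed initial centers (instead of a random selection) are assumptions made to keep the example reproducible.

```python
import math

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

def update(points, assignment, centers):
    """Move each center to the mean of its members; empty clusters keep their center."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, a in zip(points, assignment) if a == j]
        if members:
            new_centers.append(tuple(sum(m[d] for m in members) / len(members)
                                     for d in range(len(old))))
        else:
            new_centers.append(old)
    return new_centers

def k_means(points, initial_centers, max_iter=100):
    """Relocate examples between clusters until the partition stops changing."""
    centers = [tuple(c) for c in initial_centers]
    assignment = assign(points, centers)
    for _ in range(max_iter):
        centers = update(points, assignment, centers)
        new_assignment = assign(points, centers)
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment, centers

# Hypothetical toy data; the first c = 2 examples serve as initial centers
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
assignment, centers = k_means(points, points[:2])
```

Even with both initial centers placed in the same group, the relocation steps separate the two groups here; in general the result depends on the initial clustering and is only locally optimal.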

1.2.6 Reinforcement learning

Reinforcement learning deals with the problem of teaching an autonomous agent that acts and senses in its environment to choose optimal actions for achieving its goals. The agent receives information about the current state of the environment and performs actions to change it. Each time the agent performs an action, it may also receive a positive or a negative reward (punishment). A reward is not necessarily a result of the latest action, but may be an effect of a (sometimes very long) series of actions. Since rewards are often indirect and delayed, learning is difficult and slow. A learner may be conservative and use only the knowledge it already has, or it might experiment with new, rarely used actions to obtain knowledge of new situations.

The learner’s task is to learn from indirect, delayed rewards, how to choose sequences of actions that maximize the cumulative reward. Usually, the fastest solution (in terms of the number of actions) is preferable. Therefore, the weighting of accumulated rewards is often chosen so that immediate rewards are most important.
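A common way to weight accumulated rewards so that immediate ones count most is to discount each reward by a constant factor per step. The sketch below computes such a discounted cumulative reward; the name gamma and its value are assumptions following the usual convention, not something fixed by the text.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward received t steps ahead is weighted by gamma**t."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma to later rewards.
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Hypothetical reward sequences: the same final reward reached after 2 or 4 actions
short_path = discounted_return([0.0, 1.0], gamma=0.5)
long_path = discounted_return([0.0, 0.0, 0.0, 1.0], gamma=0.5)
```

With a discount factor below one, the shorter action sequence receives the higher cumulative reward, which matches the stated preference for the fastest solution.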

A commonly used approach to reinforcement learning is Q learning. This is a form of reinforcement learning in which the agent iteratively learns an evaluation function over states and actions. It can be employed even when the learner has no prior knowledge of how its
