
In the past, mining for gold consisted of choosing a site and then sifting through it with an endless amount of effort.

Sometimes the prospector found only a few valuable nuggets, and sometimes he hit upon an entire vein, but most of the time he found nothing at all and decided either to move on to another promising spot or to give up mining altogether and stop wasting his time. Today, with scientific methods and specialized tools, mineral mining is much more accurate and productive.

Mining for data has evolved in much the same way. Older methods executed by business mathematicians and statisticians took a long time to yield constructive information. Now, current software and techniques help make data mining a more lucrative, more accessible process for businesses.

How Data Mining Works

Data mining is simply filtering through large amounts of raw data for useful information that gives a business a competitive edge. This information is made of meaningful patterns and trends that are already there but were previously unseen. A company might be collecting enormous amounts of data from sources such as cash registers, bar code sweeps, surveys, inventory reports, Web hits, registration cards, and so on. Companies usually end up with so much data that it is difficult to go through it all, and by the time it's compiled and analyzed, it could be severely outdated. Instead of letting this data sit in database archives indefinitely, companies could mine it efficiently within a relatively short amount of time.

The end result of data mining should be the acquisition of new and useful information that can help a company make better decisions that improve business. For example, let's assume that your company sells pet supplies. Mining through customer and product databases reveals that cat food purchasers tend to rent apartments and dog food purchasers usually own homes. With this type of information, which may have been difficult to piece together before, you can now target your marketing efforts more effectively.

The overall mining process actually begins with a targeted problem.
To keep the project manageable, the business should narrow the scope of the mining process to a single issue, such as increasing repeat business. Data mining is more successful when the company first decides what it wants to get out of the mining or what business problem it wants to solve. Searching through vast amounts of data for anything and everything wastes resources and could generate unusable results such as irrelevant data (people with the last name of Young always buy dog food), erroneous data (all customers who buy fish food also buy gum, but only three customers fit this trend), or obvious data (people who buy cat food own cats). Once the company defines a problem or focus, it can also identify the mining targets, such as the databases or data stores that are relevant to the problem.

Next, the actual mining begins. This data-defining process looks for patterns, trends, relationships, frequencies, and influences in the data. Depending on how much data is involved, this process might take anywhere from a few hours to several days to sift through data, pick out patterns, and then verify and re-verify them.

Data Mining and Business Intelligence
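One way to picture how a mining run can screen out "erroneous" patterns like the fish food and gum example is to score each candidate pattern by how many customers it covers (often called support) and how reliably it holds (often called confidence). The sketch below is only an illustration under stated assumptions: the baskets, the thresholds, and the function name `candidate_rules` are all invented here and do not come from any particular mining product.

```python
from itertools import permutations

# Hypothetical purchase baskets, invented purely for illustration.
baskets = (
    [{"cat food", "litter"}] * 40
    + [{"dog food", "leash"}] * 35
    + [{"dog food", "litter"}] * 5
    + [{"fish food", "gum"}] * 3
    + [{"cat food"}] * 17
)


def candidate_rules(baskets, min_support=0.05, min_confidence=0.6):
    """Return (if_item, then_item, support, confidence) for rules passing both thresholds.

    support    = share of all baskets that contain both items
    confidence = share of baskets with if_item that also contain then_item
    """
    total = len(baskets)
    items = sorted(set().union(*baskets))
    rules = []
    for if_item, then_item in permutations(items, 2):
        with_if = sum(1 for b in baskets if if_item in b)
        with_both = sum(1 for b in baskets if if_item in b and then_item in b)
        if with_if == 0:
            continue
        support = with_both / total
        confidence = with_both / with_if
        if support >= min_support and confidence >= min_confidence:
            rules.append((if_item, then_item, support, confidence))
    return rules


# Assumed thresholds: a pattern must cover at least 5% of baskets.
strict_rules = candidate_rules(baskets)
strict_pairs = {(a, b) for a, b, _, _ in strict_rules}

# With no support threshold, the three-customer pattern slips through.
loose_rules = candidate_rules(baskets, min_support=0.0)
loose_pairs = {(a, b) for a, b, _, _ in loose_rules}

for if_item, then_item, support, confidence in strict_rules:
    print(f"{if_item} -> {then_item}: support={support:.2f}, confidence={confidence:.2f}")
```

In this made-up data, every fish food buyer also buys gum, yet only three baskets fit the trend, so the pattern fails the support threshold and is dropped. Screening out irrelevant or obvious patterns still takes human judgment; a threshold like this only handles the "too few customers" case.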

Figure 1: Microsoft's BI Architecture

Intelligence In Mining Methods

The most popular tool used when mining is artificial intelligence (AI). AI technologies try to work the way the human brain works, by making intelligent guesses, learning by example, and using deductive reasoning. When mining, you want the computer to think like you as much as possible. Say that you want to predict stock prices. What you really want is to predict what the right action to take is. You find that IBM is predicted to go up $1, and then it goes up $3. You are pleasantly surprised, but the computer thinks it made a mistake and tries very hard to correct it. Once you define the problem, the computer should be doing the guesswork.

Some of the more popular AI methods used in data mining include neural networks, clustering, and decision trees.

Neural networks look at the rules of using data, which are based on the connections found or on a sample set of data. For example, you can set up mining software so that it looks at an investment database for patterns of stock prices in relation to other factors you dictate. As a result, the software continually analyzes the stock value and compares it to the other factors, and it compares these factors repeatedly until it finds patterns emerging. These patterns are known as rules. The software then looks for other patterns based on these rules or sends out an alarm when a trigger value is hit. Neural networks are more accurate than many other methods; however, they're less interpretable and usually slower.

Clustering (also called family clusters and symbolic classifiers) divides data into groups based on similar features or limited data ranges. In a very simple example, mining might uncover a cluster of customers with low incomes. But when complex adjoining factors are filtered in and analyzed, mining might also reveal that young customers fall into this cluster as well. Hence, the business's younger customers tend to make less money. Clusters are used when data isn't labeled in a way that is favorable to mining. For instance, an insurance company that wants to find instances of fraud wouldn't have its records labeled as fraudulent or not fraudulent. But after analyzing patterns within clusters, the mining software can start to figure out the rules that point to which claims are likely to be false.
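To make the low-income cluster example a little more concrete, here is a minimal sketch of clustering on a single feature. It assumes a basic two-group k-means loop, which is just one of many clustering techniques and is not claimed to be what any given mining package uses; the customer records and the function name `two_means` are invented for illustration.

```python
# Hypothetical customer records: (age, annual income in dollars).
customers = [
    (22, 24_000), (25, 28_000), (23, 26_000), (27, 31_000),
    (45, 72_000), (52, 85_000), (38, 64_000), (49, 78_000),
]


def two_means(values, iterations=20):
    """Split numbers into two clusters with a basic one-dimensional k-means loop."""
    low, high = min(values), max(values)  # deterministic starting centers
    for _ in range(iterations):
        # Assign each value to whichever center is closer.
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each center to the mean of its group.
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group)
        if (new_low, new_high) == (low, high):
            break  # assignments have stopped changing
        low, high = new_low, new_high
    return low, high


# Cluster on income only; no record is labeled "low" or "high" in advance.
incomes = [income for _, income in customers]
low_center, high_center = two_means(incomes)

# Then look at a second factor (age) inside each cluster.
low_ages = [age for age, income in customers
            if abs(income - low_center) <= abs(income - high_center)]
high_ages = [age for age, income in customers
             if abs(income - low_center) > abs(income - high_center)]

mean_age_low = sum(low_ages) / len(low_ages)
mean_age_high = sum(high_ages) / len(high_ages)
print(f"low-income cluster: center={low_center:.0f}, mean age={mean_age_low:.1f}")
print(f"high-income cluster: center={high_center:.0f}, mean age={mean_age_high:.1f}")
```

The groups emerge from the income values alone, and only afterward does the age comparison show that the younger customers in this made-up data sit in the low-income cluster.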
Decision trees, like clusters, separate the data into subsets and then analyze the subsets to divide them into further subsets, and so on (for a few more levels). The final subsets are then small enough that the mining process can find interesting patterns and relationships within the data. To illustrate this, let's assume that customers are divided into one-time visitors and repeat customers. The decision tree method reviews the factors of the one-time visitor branch and then separates this data according to type of sale, such as an in-store sale or an online sale. Mining in this manner might eventually reveal that online customers do more repeat shopping. Commercial and academic entities tend to use decision trees more. They are easy to understand and map, but unfortunately, they also tend to produce the least accurate results.

Older methods used traditionally in statistical analysis can also contribute a lot to mining. Regression analysis, for instance, is a good method for finding linear patterns in the data. By looking at several explanatory factors, regression analysis draws some aggregate conclusion about a single factor. Older methods are not always as fancy or as easy to use, but they still provide some of the most accurate analyses.
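As a rough illustration of the regression idea, the sketch below fits a straight line by ordinary least squares. To stay short, it assumes a single explanatory factor rather than several, and the numbers (marketing spend versus repeat purchases) and the function name `fit_line` are invented for the example.

```python
# Hypothetical monthly figures, invented for illustration:
# marketing spend (in thousands of dollars) and repeat purchases (in hundreds).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
repeat_purchases = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_line(xs, ys):
    """Ordinary least squares for one explanatory factor: y is roughly slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(spend, repeat_purchases)
predicted_at_6 = slope * 6.0 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction at 6.0={predicted_at_6:.2f}")
```

The slope summarizes the linear pattern (how much the single factor of interest moves per unit of the explanatory factor), and the fitted line can then be used to estimate values that were not observed. A fit with several explanatory factors follows the same principle but needs matrix arithmetic or a statistics package.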


It is best to remember that no single method is a universal solution, and some are better with particular types of data. Often, using more than one method for data mining yields the best results. In doing so, you can use different data mining tools or choose a software package that incorporates more than one method.

Reporting The Results

When the process is complete, the mining software generates a report. An analyst goes over the report to see if further work needs to be done, such as refining mining parameters, using other data analysis tools to examine the data, or even scrapping the data if it's unusable. If no further work is required, the report proceeds to the decision makers for appropriate action.

Traditional query-and-reporting tools for analyzing data involve designing specific queries or search criteria to find usable information. This involves making an assumption, such as that cat food shoppers purchase more upholstery cleaner, and then searching through the database for data that confirms or rejects that assumption. This is fine if users know specifically what they are looking for. Mining, however, goes beyond this by finding useful information based on a general question. A way to think of it is this: with a query, you have to make a hypothesis about relationships yourself, whereas mining is an automatic hypothesis generator. By searching through the data from the bottom up, mining combs the records for specific details, and then you can assemble answers to your queries based on what the data is actually saying.

Data mining is also more than mere reporting because it tells you not just what has happened in the past but also what could happen in the future, based on the trends it finds. Query-and-reporting tools, on the other hand, can only tell you what has happened. In comparison, data mining has a lot more relevant and valuable information to offer. This can be the fun and interesting part of data mining.
It is like looking into a crystal ball [and] being a wizard looking into the future.

As technologies allow more and more data to be stored, query-and-reporting tools may not always be the most efficient method of finding out what you need to know. Using querying to investigate numerous scenarios and what-ifs might slow down large databases that are in use by other users. Mining does not take the place of querying, however. Querying to answer targeted or complex questions is still valuable, and using mining in conjunction with query-and-reporting tools might be the best way to refine the information that mining uncovers.

Decide What To Mine

Naturally, to mine data, there must be data available. Believe it or not, we've seen companies that start mining without any data, in hopes of getting it or collecting it. If the information is not there, no algorithm will find it.

Once the data to be mined is identified, it should be cleansed. Cleansing data frees it from duplicate information and erroneous data. Next, the data should be stored in a uniform format within relevant categories or fields. Because mining must use good data to ensure productive results, cleansing and formatting the data can take up a significant amount of time in the mining process.

Mining tools can work with all types of data storage, from large data warehouses to smaller desktop databases to flat files. Data warehouses and data marts are storage methods that involve archiving large amounts of data in a way that makes it easy to access when necessary. Data mining and massive data storage are highly complementary activities. In fact, some of the same vendors that offer data mining solutions also produce and market data storage tools.

Even so, you do not have to collect years of data to make use of the power of data mining. Mining is most useful with a large number of factors. You don't need lots of data. For example, if a company keeps track of 20 to 50 products or 50 to 100 customer traits, it will probably make better use of mining its data, and find several overlapping relationships, than it would by just maintaining years of data that are kept in only a few fields or in a memo format.

Is Mining For You?

Most of the data mining examples in this article involve marketing issues, and this is certainly a common use of mining, but data mining is also popular in areas such as investment analysis and detecting insurance or banking fraud. According to Dr. Elder, these industries adopted mining techniques early because they have had a great deal of data that is very hard to sift through. Mining turns up data that gives them just a little edge, but the payoff for this small edge is large.
In addition, the power of data mining is being used for many other purposes, such as analyzing Supreme Court decisions, discovering patterns in health care, pulling stories about competitors from newswires, resolving bottlenecks in production processes, and analyzing sequences in the human genetic makeup. There really is no limit to the type of business or area of study where data mining can be beneficial.

Data mining is simple in theory, but it can get quite involved in practice. That doesn't mean that mining is not accessible; it just means that you'll probably need some help getting set up. Many data mining software solutions are available for small and large businesses in most industries, and these tools are constantly improving. The selection ranges from desktop applications for a few hundred dollars to complete server hardware and software solutions costing tens of thousands of dollars. The mining software you select should be able to test various scenarios, provide analysis in a decipherable format, and, most importantly, give you information that you don't already have. Not all data mining tools are equal, nor are all tools appropriate for your type of business data, but with a little research, you'll find the one that best fits your situation.

Any company looking into data mining should first consider a few things. One, you should expect that the mining effort will make a difference in how you do business and affect the bottom line. Next, you must have sources of information that need to be mined. In addition, you must be ready to commit the company to the mining project. Mining data takes time: time to prepare the data, time for the mining software to discover related patterns, and time for the mining process to adjust when changes occur. When beginning a mining project, we tell a company to estimate about six months to make a meaningful advance.

Finally, the business should also have someone who has the time and is willing to become proficient at the mining process. It is vital to remember that when considering a process as intense as data mining, human intervention is still an extremely important part of the loop. You need to have someone who is willing to understand the tools, the methods, and the queries, as well as how to interpret the data. A user can easily misinterpret mining results just to fit some preconceived perception.

Hiring a consultant might help your company understand the process better and select the most appropriate software package. We don't mean to suggest, however, that experimentation is not helpful, especially if you are a smaller company or you already have some software in mind. Sometimes, there's a breakthrough when trying something new. Some mistakes will be made, mistakes that can be remedied with statistical expertise, but the technology is becoming more available for more users.

Data mining is not designed to fix all business problems or magically tell you what the real problem is.
If properly used, however, data mining is a useful tool that gives a company the information-sensitive microscope that it didn't already have, which, in turn, can be used to help make intelligent business decisions. You might not always get the answer you expected or needed, but you'll often find new information that is still constructive. Most of all, data mining is an ongoing process that involves a lot of analysis and refining along the way, so think of it as a worthy investment. And like any investment, even if your data-mining portfolio only uncovers small golden nuggets at first, those nuggets, if properly managed, can yield a lot of value.

Data Mining Glossary
For a quick review of data mining terms and definitions, see the entries below.


data mart: As a data storage term, data marts tend to store information for a single subject or department, and can be subsets of a larger warehouse.

data warehouse: A massive database for efficiently storing large amounts of data to accommodate the rapid processing of queries and summaries.

Online Analytical Processing (OLAP): A tool that helps organize databases more efficiently for quick access to internal data, especially when querying large amounts. This type of tool is used commonly with databases that organize data in multiple dimensions or aspects.

query: To request specific data from a database or ask a question that you want the mining process to answer.

rule: When referring to data mining processes, a rule signifies a pattern of factors that a mining method follows to analyze and compare data effectively.

structured query language (SQL): A type of programming language used to perform queries and maintenance in databases. SQL uses several words in plain English, and many databases are able to incorporate SQL commands, thereby making it a very popular language used with databases.
