
UNIT 3 Transaction Processing Systems

Transaction processing systems (TPS's) were among the earliest computerized systems. Their primary purpose is to record, process, validate, and store the transactions that take place in the various functional areas of a business for future retrieval and use. Transaction processing systems are cross-functional information systems that process data resulting from business transactions such as sales, purchases, deposits, withdrawals, refunds, and payments. A TPS also acts as the main link between the organization and external entities such as customers, suppliers, distributors, and regulatory agencies.

Transaction processing systems serve the operational level of the organization. A TPS is a computerized system that performs and records the daily routine transactions necessary to conduct business. The principal purpose of systems at this level is to answer routine questions and to track the flow of transactions through the organization. Examples are hotel reservation systems, payroll, employee record keeping, and shipping. At the operational level, tasks, resources, and goals are predefined and highly structured. The decision to grant credit to a customer, for instance, is made by a lower-level supervisor according to predefined criteria. All that must be determined is whether the customer meets the criteria.

Figure (14) depicts a payroll TPS, which is a typical accounting transaction processing system found in most firms. A payroll system keeps track of the money paid to employees. The master file is composed of discrete pieces of information (such as a name, address, or employee number) called data elements. Data are keyed into the system, updating the data elements. The elements on the master file are combined in different ways to make up reports of interest to management and government agencies and to send paychecks to employees. A TPS of this kind can also generate other reports from combinations of existing data elements.
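The payroll example can be illustrated with a minimal sketch. It assumes a master file held in memory as a dictionary; the field names, employee records, and pay rule (hourly rate times hours) are hypothetical and are not taken from Figure (14).

```python
from dataclasses import dataclass


@dataclass
class EmployeeRecord:
    """One master-file record; each field is a data element (names are hypothetical)."""
    employee_number: str
    name: str
    address: str
    hourly_rate: float
    gross_pay_to_date: float = 0.0


def post_timesheet(master, employee_number, hours):
    """Key one timesheet into the system, update the master file, return the gross pay."""
    record = master[employee_number]
    gross = round(record.hourly_rate * hours, 2)
    record.gross_pay_to_date = round(record.gross_pay_to_date + gross, 2)
    return gross


def management_report(master):
    """Combine data elements into a simple report of (name, gross pay to date)."""
    return sorted((r.name, r.gross_pay_to_date) for r in master.values())


# Illustrative master file with two made-up employees.
master_file = {
    "E001": EmployeeRecord("E001", "A. Rahman", "12 Hill Road", 20.0),
    "E002": EmployeeRecord("E002", "B. Silva", "7 Lake Street", 25.0),
}

paycheck_e001 = post_timesheet(master_file, "E001", 40)
paycheck_e002 = post_timesheet(master_file, "E002", 35)
report = management_report(master_file)
```

The same data elements feed both the paycheck amount and the management report, which is the point the payroll example makes about combining elements in different ways.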

Types of Transaction Processing System (TPS's)


1. On-line system: This type involves a direct connection between the operator and the TPS program. It provides an immediate result and is used to process a single transaction at a time. Example: an order arrives by telephone call; it is processed at that moment and the results are produced immediately.

2. Batch-processing system: This is a second type of TPS, where transactions are grouped together and processed as a unit. Example: cheque processing system in a bank.
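The difference between the two types can be sketched as follows. This is only an illustration: the account identifiers, amounts, and function names are hypothetical, and a real TPS would add validation, logging, and persistent storage.

```python
# Hypothetical account balances held in memory.
balances = {"ACC-1": 100.0, "ACC-2": 50.0}


def apply_transaction(accounts, txn):
    """Apply one (account, amount) transaction and return the new balance."""
    account, amount = txn
    accounts[account] = accounts[account] + amount
    return accounts[account]


# On-line processing: a single transaction is handled the moment it arrives,
# and the operator sees the result immediately.
online_result = apply_transaction(balances, ("ACC-1", -30.0))


def process_batch(accounts, batch):
    """Batch processing: transactions collected earlier are processed as one unit."""
    for txn in batch:
        apply_transaction(accounts, txn)
    return len(batch)


# For example, cheques collected during the day and processed together.
cheque_batch = [("ACC-1", 20.0), ("ACC-2", -10.0), ("ACC-2", 5.0)]
processed = process_batch(balances, cheque_batch)
```

In the on-line case the caller gets a result per transaction; in the batch case nothing is updated until the whole group is run.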

Types of Transactions:
1. Internal Transactions: Transactions that are internal to the company and relate to the internal working of the organization. For example: recruitment policy, promotion policy, production policy, etc.
2. External Transactions: Transactions that are external to the organization and relate to external sources. For example: sales, purchases, etc.

TPS Properties:
1. Consistency: The transaction is a correct transformation of the state. This means that the transaction is a correct program.
2. Isolation: Even though transactions execute concurrently, it appears to the outside observer as if they execute in some serial order. Isolation is required to guarantee consistent input, which is needed for a consistent program to provide consistent output.
3. Reliability: A TPS is designed to ensure that all transactions are entered in a sequential and systematic manner.
4. Standardization: Transactions must be processed in the same way each time to maximize efficiency and effectiveness.
5. Controlled Access: Because a TPS contains confidential data and is a powerful tool for the organization, access to it must be restricted.

Objectives (Goals) of TPS
1. Process data generated by and about transactions.
2. Maintain a high degree of accuracy.
3. Ensure data and information integrity and accuracy.
4. Produce timely documents and reports.
5. Increase labor efficiency.
6. Help provide increased and enhanced service.
7. Help build and maintain customer loyalty.
8. Achieve competitive advantage.

Major Characteristics of TPS
1. A TPS handles data that show the results of various activities on a historical basis, i.e., activities that have already happened.
2. It is relevant to all functional areas inside the organization (production, marketing, finance, and human resources) because each area has some kind of transaction.
3. A TPS helps to assess organizational performance.
4. The sources of data are mostly internal, and the output is intended mainly for an internal audience.
5. The TPS processes information on a regular basis: daily, weekly, monthly, annually, etc.
6. It provides high processing speed to handle the high volume of data.
7. Input and output data are structured (i.e., standardized).
8. A TPS provides the high level of accuracy, data integrity, and security that transaction data require.

Transaction Processing Activities
1. Data collection: Capture the data necessary for the transaction.
2. Data editing: Check the validity and completeness of the data.
3. Data correction: Correct the wrong data.
4. Data manipulation: Calculate, summarize, and process the data.
5. Data storage: Update the databases with the transaction data.
6. Document production and reports: Create end-result documents and reports.
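The six transaction processing activities can be sketched as a small pipeline. Everything here is an assumption made for illustration: the order fields, the validation rule (quantity must be a positive whole number), and the idea that corrections arrive as re-keyed values from an operator.

```python
def collect():
    """Data collection: capture raw transactions (hard-coded sample orders here)."""
    return [
        {"order_id": "1001", "item": "pen", "qty": "3", "unit_price": "1.50"},
        {"order_id": "1002", "item": "ink", "qty": "", "unit_price": "4.00"},
        {"order_id": "1003", "item": "pad", "qty": "-2", "unit_price": "2.25"},
    ]


def edit(txn):
    """Data editing: return a list of validity and completeness problems."""
    qty = txn["qty"].strip()
    if not qty:
        return ["missing qty"]
    if not qty.lstrip("-").isdigit():
        return ["qty is not a whole number"]
    if int(qty) <= 0:
        return ["qty must be positive"]
    return []


def correct(txn, corrections):
    """Data correction: apply re-keyed values supplied for this order, if any."""
    fixed = dict(txn)
    fixed.update(corrections.get(txn["order_id"], {}))
    return fixed


def manipulate(txn):
    """Data manipulation: calculate the order total."""
    return {**txn, "total": int(txn["qty"]) * float(txn["unit_price"])}


def run_pipeline(corrections):
    database = {}  # Data storage: a dictionary stands in for the database.
    for txn in collect():
        if edit(txn):
            txn = correct(txn, corrections)
        if edit(txn):
            continue  # Still invalid after correction, so it is not stored.
        record = manipulate(txn)
        database[record["order_id"]] = record
    # Document production and reports: summarize what was stored.
    report = {
        "orders": len(database),
        "revenue": sum(r["total"] for r in database.values()),
    }
    return database, report


database, report = run_pipeline({"1002": {"qty": "1"}, "1003": {"qty": "2"}})
```

Calling `run_pipeline({})` with no corrections stores only the one order that passes editing, which shows why the editing and correction steps sit before storage.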

Office Automation Systems


Office automation systems (OAS) are configurations of networked computer hardware and software. A variety of office automation systems are now applied to business and communication functions that used to be performed manually or in multiple locations of a company, such as preparing written communications and strategic planning. In addition, functions that once required coordinating the expertise of outside specialists in typesetting, printing, or electronic recording can now be integrated into the everyday work of an organization, saving both time and money.

Types of functions integrated by office automation systems include (1) electronic publishing; (2) electronic communication; (3) electronic collaboration; (4) image processing; and (5) office management. At the heart of these systems is often a local area network (LAN). The LAN allows users to transmit data, voice, mail, and images across the network to any destination, whether that destination is in the local office on the LAN, or in another country or continent, through a connecting network. An OAS makes office work more efficient and increases productivity.

Electronic Publishing

Electronic publishing systems include word processing and desktop publishing. Word processing software (e.g., Microsoft Word, Corel WordPerfect) allows users to create, edit, revise, store, and print documents such as letters, memos, reports, and manuscripts. Desktop publishing software (e.g., Adobe PageMaker, Corel VENTURA, Microsoft Publisher) enables users to integrate text, images, photographs, and graphics to produce high-quality printable output. Desktop publishing software is used on a microcomputer with a mouse, scanner, and printer to create professional-looking publications. These may be newsletters, brochures, magazines, or books.

Electronic Communication

Electronic communication systems include electronic mail (e-mail), voice mail, facsimile (fax), and desktop videoconferencing.

Electronic Mail. E-mail is software that allows users, via their computer keyboards, to create, send, and receive messages and files to or from anywhere in the world. Most e-mail systems let the user do other sophisticated tasks such as filter, prioritize, or file messages; forward copies of messages to other users; create and save drafts of messages; send "carbon copies"; and request automatic confirmation of the delivery of a message. E-mail is very popular because it is easy to use, offers fast delivery, and is inexpensive. Examples of e-mail software are Eudora, Lotus Notes, and Microsoft Outlook.

Voice Mail. Voice mail is a sophisticated telephone answering machine. It digitizes incoming voice messages and stores them on disk. When the recipient is ready to listen, the message is converted from its digitized version back to audio, or sound. Recipients may save messages for future use, delete them, or forward them to other people.

Facsimile. A facsimile or facsimile transmission machine (fax) scans a document containing both text and graphics and sends it as electronic signals over ordinary telephone lines to a receiving fax machine. This receiving fax recreates the image on paper. A fax can also scan and send a document to a fax modem (circuit board) inside a remote computer. The fax can then be displayed on the computer screen and stored or printed out by the computer's printer.

Desktop Videoconferencing. Desktop videoconferencing is one of the fastest growing forms of videoconferencing. Desktop videoconferencing requires a network and a desktop computer with special application software (e.g., CU-SeeMe) as well as a small camera installed on top of the monitor. Images of a computer user from the desktop computer are captured and sent across the network to the other computers and users that are participating in the conference. This type of videoconferencing simulates face-to-face meetings of individuals.

Electronic Collaboration

Electronic collaboration is made possible through electronic meeting and collaborative work systems and teleconferencing. Electronic meeting and collaborative work systems allow teams of coworkers to use networks of microcomputers to share information, update schedules and plans, and cooperate on projects regardless of geographic distance. Special software called groupware is needed to allow two or more people to edit or otherwise work on the same files simultaneously.

Teleconferencing is also known as videoconferencing. As was mentioned in the discussion of desktop videoconferencing earlier, this technology allows people in multiple locations to interact and work collaboratively using real-time sound and images. Full teleconferencing, as compared to the desktop version, requires special-purpose meeting rooms with cameras, video display monitors, and audio microphones and speakers.

Telecommuting and Collaborative Systems. Telecommuters perform some or all of their work at home instead of traveling to an office each day, usually with the aid of office automation systems, including those that allow collaborative work or meetings. A microcomputer, a modem, software that allows the sending and receiving of work, and an ordinary telephone line are the tools that make this possible.

Photo caption: High-tech meeting rooms help companies make more effective presentations. At some conference halls, like this one at the Chinzan-so Four Seasons Hotel in Tokyo, small video screens are built into the table tops.

Telecommuting is gaining in popularity in part due to the continuing increase in population, which creates traffic congestion, promotes high energy consumption, and causes more air pollution. Telecommuting can help reduce these problems. Telecommuting can also take advantage of the skills of homebound people with physical limitations.

Studies have found that telecommuting programs can boost employee morale and productivity among those who work from home. It is necessary to maintain a collaborative work environment, however, through the use of technology and general employee management practices, so that neither on-site employees nor telecommuters find their productivity is compromised by such arrangements. The technologies used in electronic communication and teleconferencing can be useful in maintaining a successful telecommuting program.

Image Processing

Image processing systems include electronic document management, presentation graphics, and multimedia systems. Imaging systems convert text, drawings, and photographs into digital form that can be stored in a computer system. This digital form can be manipulated, stored, printed, or sent via a modem to another computer. Imaging systems may use scanners, digital cameras, video capture cards, or advanced graphic computers. Companies use imaging systems for a variety of documents such as insurance forms, medical records, dental records, and mortgage applications.

Presentation graphics software uses graphics and data from other software tools to create and display presentations. The graphics include charts, bullet lists, text, sound, photos, animation, and video clips. Examples of such software are Microsoft PowerPoint, Lotus Freelance Graphics, and SPC Harvard Graphics.

Multimedia systems are technologies that integrate two or more types of media such as text, graphics, sound, voice, full-motion video, or animation into a computer-based application. Multimedia is used for electronic books and newspapers, video conferencing, imaging, presentations, and web sites.

Office Management

Office management systems include electronic office accessories, electronic scheduling, and task management. These systems provide an electronic means of organizing people, projects, and data. Business dates, appointments, notes, and client contact information can be created, edited, stored, and retrieved. Additionally, automatic reminders about crucial dates and appointments can be programmed. Projects and tasks can be allocated, subdivided, and planned. All of these actions can be done either individually or for an entire group. Computerized systems that automate these office functions can dramatically increase productivity and improve communication within an organization.

Office automation refers to the application of computer and communication technology to office functions. Office automation systems are meant to improve the productivity of managers at various levels of management by providing secretarial assistance and better communication facilities.

An office automation system is the combination of hardware, software, and people in an information system that processes office transactions and supports office activities at all levels of the organization. These systems include a wide range of support facilities, such as word processing, electronic filing, electronic mail, message switching, data storage, and data and voice communications.

Office activities may be grouped under two classes, namely:
1) Activities performed by clerical personnel (clerks, secretaries, typists, etc.) and
2) Activities performed by executives (managers, engineers, or other professionals such as economists, researchers, etc.)

In the first category, the following is a list of activities:
a) Typing
b) Mailing
c) Scheduling of meetings and conferences
d) Calendar keeping and
e) Retrieving documents

The following is a list of activities in the second category (managerial category):
a) Conferencing
b) Production of information (messages, memos, reports, etc.) and
c) Controlling performance

As already discussed, information technology facilitates both types of activities. A wide variety of office automation devices such as fax machines, copiers, and phones are used in offices. Some of the applications of office automation systems are discussed in brief below.

WORD PROCESSING
This refers to the computer-assisted preparation of documents (such as letters, reports, and memos) from textual data. Data once entered can be manipulated in various ways.

ELECTRONIC FILING
This facilitates the filing of incoming and outgoing mail and documents on magnetic media. Information is captured from the documents and is stored for future reference.

ELECTRONIC MAIL
It involves the transfer of letters and other documents through telecommunication lines, rather than through physical delivery. An electronic mail system requires a telecommunication network and software.

Decision Support System


Decision support system refers to a class of systems that support the process of decision making but do not always give a decision themselves. Decision support systems (DSS) are a specific class of computerized information system that supports business and organizational decision-making activities. A properly designed DSS is an interactive software-based system intended to help decision makers compile useful information from raw data, documents, personal knowledge, and/or business models to identify and solve problems and make decisions.

DSS is an application of the Herbert Simon model. As discussed, the model has three phases:
i) Intelligence
ii) Design
iii) Choice

The DSS basically helps the information system in the intelligence phase, where the objective is to identify the problem, and then go to the design phase for a solution. The choice of selection criteria varies from problem to problem. It is therefore necessary to go through these phases again and again until a satisfactory solution is found. In following this three-phase cycle, you may use inquiry, analysis, models, and accounting systems to come to a rational solution.

These systems are helpful where the decision maker calls for complex manipulation of data and the use of several methods to reach an acceptable solution using different analysis approaches. The decision support system helps in making a decision and also in performance analysis. A DSS can be built around rules in the case of a programmable decision situation. In a non-programmable decision situation, the rules are not fixed or predetermined, and the user must go through the decision-making cycle every time, as indicated in the Herbert Simon model.

Attributes:
i) DSS should be adaptable and flexible.
ii) DSS should be interactive and provide ease of use.
iii) Effectiveness balanced with efficiency (benefit must exceed cost).
iv) Complete control by decision-makers.

v) Ease of development by end users (modification to suit needs and a changing environment).
vi) Support modeling and analysis.
vii) Data access.
viii) Standalone, integration, and Web-based.

DSS Characteristics:
i) Support for decision makers in semi-structured and unstructured problems.
ii) Support managers at all levels.
iii) Support individuals and groups.
iv) Support for interdependent or sequential decisions.
v) Support intelligence, design, choice, and implementation.
vi) Support a variety of decision processes and styles.

DECISION SUPPORT SYSTEMS

Management Information Systems deal with the handling of information required by managers of an organization, and their techniques can be applied to Library and Information Systems also. Decision support systems are a type of MIS and play an important role in the decision-making process of libraries. They have many application areas in a Library and Information system, which are highlighted in this paper. This paper focuses on the importance of planning in library and information centres and on how decision support systems are applied in the planning process. It also discusses an automated decision support system for library planning.

The word information takes a variety of meanings depending on the context in which it is used. In our formulation we treat information to be data of value in decision making. Researchers have defined decision support systems in a number of ways. According to Arun Sen [7], there is a lot of disagreement and confusion in defining DSS. Some say that it is a tool for decision support at all planning levels, while others worry about its use in solving only semi-structured and unstructured problems. All, however, agree that a DSS is to be used as a support tool and is never going to replace the managers. Sprague et al. and Keen and Scott Morton suggest that a DSS should be used at the highest level of the management hierarchy: strategic planning.
According to Anthony, at this level, one decides upon the objectives of the organization, the resources to be used to attain these objectives, and the policies that govern the acquisition, use, and disposition of these resources. Thus, in an organizational structure, strategic planning is the process of formulating long-range plans and policies for the organization. Management control, the second level, deals with a process that assures the managers that resources are obtained and used efficiently in the accomplishment of the organizational objectives, and at the lowest level, operations control assures that tasks are carried out efficiently.

Decision support systems are a component of management information systems. They are computer-based information systems that provide interactive information support to managers during the decision-making process. A DSS has also been described as an interactive computer-based modeling process to support the making of semi-structured and unstructured decisions by individual managers, and as a program that accepts specific information within acceptable parameters and provides a decision based on this information. They are therefore designed to be ad hoc, quick-response systems that are initiated and controlled by managerial end users. Decision support systems are thus able to directly support the specific types of decisions and the personal decision-making styles and needs of individual managers.

3.1 Decision Support System in Libraries

Library and Information centres are service-providing organizations where the end product is a service-related product. Thus the concepts of information management and decision support systems are relevant and need to be applied to Library and Information Systems. A fundamental basis for any DSS is that managers need reliable, timely, and processed information to help them in decision making, and in this context library managers are no exception. Thus MIS techniques, when applied in LIS, provide the required information to the library managers for decision making. In the past few decades we have seen applications of computers in libraries multiply, and they are now essential in the day-to-day activities and all other activities of a library. Computers are used in libraries to perform functions like circulation, acquisition, online catalogs, serials control etc.
and services like reference services, current awareness services, and most other services are designed around computer applications. They are also used for management applications like accounting, budgeting, scheduling/planning, and statistical reporting. MIS provides information required by top managers, but the usefulness of such information depends on the decision situation. Decision support systems (DSS) support top management decisions by allowing modeling of the organization. That is, to develop a decision support system as a management tool for library and information managers, it is necessary to consider the context in which it will be employed. This involves understanding the decision situations common to library managers, the information that is used to make decisions, and the decision processes.

We know that information is required at each level of the management structure for effective management and decision making. Similarly, in the context of the organizational structure of a Library and Information Centre, decisions taken at three different organizational levels can be distinguished. Each of these levels of management activity is sufficiently different in kind to require distinctive planning and control systems.

Strategic planning decisions can be defined as those concerned with deciding the objectives of a library and information centre, the resources used to attain the objectives, and the policies governing the acquisition, use, and disposition of those resources. In a library-specific context, examples include [11]:
1. negotiating interlibrary agreements
2. adopting technology, like automation
3. expanding facilities

Tactical management control can be seen as the process that assures the managers that resources are obtained and used efficiently in the accomplishment of the Library and Information centre's objectives. For example, in libraries these may include:
1. allocating funds among subject areas
2. identifying staff development needs
3. determining hours of library service
4. developing weeding policy
5. purchasing equipment and services
6. setting standards for operations

Operational control decisions assure that specific tasks are performed in an effective and efficient manner. Such decisions are programmable, and systems can be designed to support them. Examples include:
1. monitoring daily operations and activities with respect to standards
2. corrective actions
3. scheduling
4. response to complaints
5. decisions made in performing cataloguing, shelving, acquisitions, weeding, circulation, reference, etc.

EXPERT SYSTEM

An Introduction to Expert Systems

An expert system is a knowledge-based information system; that is, it uses its knowledge about a specific area to act as an expert consultant to users. The components of an expert system are a knowledge base and software modules that perform inferences on the knowledge and offer answers to a user's questions. Expert systems provide answers to questions in a very specific problem area by making human-like inferences about knowledge contained in a specialized knowledge base. Expert systems can provide decision support to end users in the form of advice from an expert consultant in a specific problem area.
Expert systems are being used in many different fields, including medicine, engineering, the physical sciences, and business. For example, expert systems now help diagnose illnesses, search for minerals, analyze compounds, recommend repairs, and do financial planning. Expert systems can support either operations or management activities.

Expert Systems Structure

The components of an expert system include a knowledge base and software modules that perform inferences on the knowledge in the knowledge base and communicate answers to a user's questions.
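To make the two components concrete, the following is a minimal sketch of a knowledge base of facts and IF/THEN rules together with a very small inference module. It assumes a simple forward-chaining strategy, which is only one of several ways an inference engine can work, and the repair-advice rules and fact names are made up for illustration.

```python
def infer(facts, rules):
    """Forward chaining: keep firing rules until no new facts can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are already known.
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Knowledge base (hypothetical): each rule is (IF premises, THEN conclusion).
rules = [
    (frozenset({"engine_cranks", "no_spark"}), "ignition_fault"),
    (frozenset({"ignition_fault"}), "recommend_replace_spark_plugs"),
    (frozenset({"engine_silent", "lights_dim"}), "battery_flat"),
]

# Facts supplied by the user in answer to the system's questions.
conclusions = infer({"engine_cranks", "no_spark"}, rules)
```

The second rule fires only because the first rule added `ignition_fault`, which shows how an inference module chains rules to reach a recommendation. A fuller system would also record which rules fired so that it could explain its reasoning to the user.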

The knowledge base of an expert system contains facts about a specific area and heuristics (rules of thumb) that express the reasoning procedures of an expert on the subject. There are many ways that knowledge is represented in expert systems:

Case-based reasoning: Representing knowledge in an expert system's knowledge base in the form of cases.
Frame-based knowledge: Knowledge represented in the form of a hierarchy or network of frames. A frame is a collection of knowledge about an entity consisting of a complex package of data values describing its attributes.
Object-based knowledge: Knowledge represented as a network of objects. An object is a data element that includes both data and the methods or processes that act on those data.
Rule-based knowledge: Knowledge represented in the form of rules and statements of fact. Rules are statements that typically take the form of a premise and a conclusion, such as: IF (condition), THEN (conclusion).

Software resources: An expert system software package contains an inference engine and other programs for refining knowledge and communicating with users. The inference engine program processes the knowledge (such as rules and facts) related to a specific problem. It then makes associations and inferences resulting in recommended courses of action for a user. User interface programs for communicating with end users are also needed, including an explanation program to explain the reasoning process to a user if requested.

Differences between DSS and ES

It is possible to integrate ES with DSS. Some components may look similar in DSS and ES, but one should understand the differences between them. It then becomes clear how integration of ES with DSS can be realized. A DSS helps a manager to take a decision, whereas an ES acts as a decision maker or an advisor to the manager. A DSS is meant only for decision making, whereas an ES provides expertise to the manager.
The spectrum of complexity is high in DSS and low in ES, since an ES addresses issues related to specific areas only. A DSS does not have the capability to reason, whereas an ES has. A DSS cannot provide a detailed explanation of its results, whereas an ES can. Hence, by integrating the two, it is possible to blend their advantages and derive the best out of both.

Expert Systems Business Applications

Expert systems help diagnose illnesses, search for minerals, analyze compounds, recommend repairs, and do financial planning. So from a strategic business standpoint, expert systems can be and are being used to improve every step of the product cycle of a business, from finding customers to shipping products to providing customer service. An ES provides a reduced-cost solution, consistent advice with a low level of errors, and a way to operate equipment without human intervention. It provides a high degree of reliability and faster response time. It helps to solve complex problems within a small domain.

Executive Support Systems

Senior managers use executive support systems (ESS) to help them make decisions. ESS serve the strategic level of the organization. They address nonroutine decisions requiring judgment, evaluation, and insight because there is no agreed-on procedure for arriving at a solution. Executive support systems (ESS's) are designed to incorporate data about external events, but they also draw summarized information from internal MIS and DSS. They filter, compress, and track critical data, displaying the data of greatest importance to senior managers. ESS employ the most advanced graphics software and can present graphs and data from many sources. Often the information is delivered to senior executives through a portal, which uses a Web interface to present integrated personalized business content from a variety of sources.

Unlike the other types of information systems, ESS are not designed primarily to solve specific problems. Instead, ESS provide a generalized computing and communications capacity that can be applied to a changing array of problems. Although many DSS are designed to be highly analytical, ESS tend to make less use of analytical models. Questions that ESS assist in answering include the following: In what business should we be? What are the competitors doing? What new acquisitions would protect us from cyclical business swings? Which units should we sell to raise cash for acquisitions?

Figure (30) illustrates a model of an ESS. It consists of workstations with menus, interactive graphics, and communications capabilities that can be used to access historical and competitive data from internal corporate systems and external databases such as Dow Jones News/Retrieval or Standard & Poor's.
Because ESS are designed to be used by senior managers who often have little, if any, direct contact or experience with computer-based information systems, they incorporate easy-to-use graphic interfaces. This system pools data from diverse internal and external sources and makes them available to executives in an easy-to-use form.

Interrelationships among systems

DATA MINING CONCEPTS

Data mining is the process of discovering actionable information from large sets of data. Data mining uses mathematical analysis to derive patterns and trends that exist in data. Typically, these patterns cannot be discovered by traditional data exploration because the relationships are too complex or because there is too much data. These patterns and trends can be collected and defined as a data mining model. Mining models can be applied to specific scenarios, such as:

Forecasting: Estimating sales, predicting server loads or server downtime
Risk and probability: Choosing the best customers for targeted mailings, determining the probable break-even point for risk scenarios, assigning probabilities to diagnoses or other outcomes
Recommendations: Determining which products are likely to be sold together, generating recommendations
Finding sequences: Analyzing customer selections in a shopping cart, predicting next likely events

Grouping: Separating customers or events into clusters of related items, analyzing and predicting affinities

Building a mining model is part of a larger process that includes everything from asking questions about the data and creating a model to answer those questions, to deploying the model into a working environment. This process can be defined by using the following six basic steps:
1. Defining the Problem
2. Preparing Data
3. Exploring Data
4. Building Models
5. Exploring and Validating Models
6. Deploying and Updating Models

The following diagram describes the relationships between each step in the process, and the technologies in Microsoft SQL Server that you can use to complete each step.

Figure: Key steps in data mining process

The process illustrated in the diagram is cyclical, meaning that creating a data mining model is a dynamic and iterative process. After you explore the data, you may find that the data is insufficient to create the appropriate mining models, and that you therefore have to look for more data. Alternatively, you may build several models and then realize that the models do not adequately answer the problem you defined, and that you therefore must redefine the problem. You may have to update the models after they have been deployed because more data has become available. Each step in the process might need to be repeated many times in order to create a good model.

Microsoft SQL Server Data Mining provides an integrated environment for creating and working with data mining models. This environment includes SQL Server Development Studio, which contains data mining algorithms and query tools that make it easy to build a comprehensive solution for a variety of projects, and SQL Server Management Studio, which contains tools for browsing models and managing data mining objects. For more information, see Creating Multidimensional Models Using SQL Server Data Tools (SSDT).
For an example of how the SQL Server tools can be applied to a business scenario, see the Basic Data Mining Tutorial.

Defining the Problem

The first step in the data mining process, as highlighted in the following diagram, is to clearly define the problem, and consider ways that data can be utilized to provide an answer to the problem.

Figure: Data mining first step: defining the problem

This step includes analyzing business requirements, defining the scope of the problem, defining the metrics by which the model will be evaluated, and defining specific objectives for the data mining project. These tasks translate into questions such as the following:

- What are you looking for? What types of relationships are you trying to find?
- Does the problem you are trying to solve reflect the policies or processes of the business?
- Do you want to make predictions from the data mining model, or just look for interesting patterns and associations?
- Which outcome or attribute do you want to try to predict?
- What kind of data do you have and what kind of information is in each column? If there are multiple tables, how are the tables related? Do you need to perform any cleansing, aggregation, or processing to make the data usable?
- How is the data distributed? Is the data seasonal? Does the data accurately represent the processes of the business?

To answer these questions, you might have to conduct a data availability study to investigate the needs of the business users with regard to the available data. If the data does not support the needs of the users, you might have to redefine the project. You also need to consider the ways in which the results of the model can be incorporated in the key performance indicators (KPIs) that are used to measure business progress.

Preparing Data

The second step in the data mining process, as highlighted in the following diagram, is to consolidate and clean the data that was identified in the Defining the Problem step.

Figure: Data mining second step: preparing data

Data can be scattered across a company and stored in different formats, or may contain inconsistencies such as incorrect or missing entries. For example, the data might show that a customer bought a product before the product was offered on the market, or that the customer shops regularly at a store located 2,000 miles from her home.

Data cleaning is not just about removing bad data or interpolating missing values, but about finding hidden correlations in the data, identifying sources of data that are the most accurate, and determining which columns are the most appropriate for use in analysis. For example, should you use the shipping date or the order date?
Is the best sales influencer the quantity, total price, or a discounted price? Incomplete data, wrong data, and inputs that appear separate but in fact are strongly correlated can all influence the results of the model in ways you do not expect. Therefore, before you start to build mining models, you should identify these problems and determine how you will fix them.

In data mining you typically work with a very large dataset and cannot examine every transaction for data quality. You might therefore need to use some form of data profiling and automated data cleansing and filtering tools, such as those supplied in Integration Services, Microsoft SQL Server 2012 Master Data Services, or SQL Server Data Quality Services, to explore the data and find the inconsistencies. For more information, see these resources:

- Integration Services in SQL Server Data Tools
- Master Data Services Overview
- Data Quality Services
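
The kind of automated profiling described in the Preparing Data step can be illustrated with a small rule-based check. The following is only a minimal sketch in Python, not the behavior of Integration Services or Data Quality Services; the product codes, launch dates, order records, and the `profile_orders` function are all hypothetical and invented for illustration. It flags the two inconsistencies the text mentions: missing entries and purchases dated before the product was on the market.

```python
from datetime import date

# Hypothetical launch dates and order records, invented for illustration only.
LAUNCH_DATES = {"P100": date(2012, 3, 1), "P200": date(2012, 6, 15)}

ORDERS = [
    {"order_id": 1, "product": "P100", "order_date": date(2012, 4, 2)},
    {"order_id": 2, "product": "P200", "order_date": date(2012, 5, 1)},  # before launch
    {"order_id": 3, "product": "P100", "order_date": None},              # missing date
    {"order_id": 4, "product": "P999", "order_date": date(2012, 7, 9)},  # unknown product
]


def profile_orders(orders, launch_dates):
    """Return (order_id, problem) pairs for rows that look inconsistent."""
    problems = []
    for row in orders:
        launch = launch_dates.get(row["product"])
        if row["order_date"] is None:
            problems.append((row["order_id"], "missing order date"))
        elif launch is None:
            problems.append((row["order_id"], "unknown product"))
        elif row["order_date"] < launch:
            problems.append((row["order_id"], "ordered before product launch"))
    return problems


issues = profile_orders(ORDERS, LAUNCH_DATES)
for order_id, problem in issues:
    print(order_id, problem)
```

A check like this only reports suspect rows. Whether to correct, exclude, or keep them remains a decision for the analyst, as the text notes.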

It is important to note that the data you use for data mining does not need to be stored in an Online Analytical Processing (OLAP) cube, or even in a relational database, although you can use both of these as data sources. You can conduct data mining using any source of data that has been defined as an Analysis Services data source. These can include text files, Excel workbooks, or data from other external providers. For more information, see Supported Data Source Types (SSAS Multidimensional).

Exploring Data

The third step in the data mining process, as highlighted in the following diagram, is to explore the prepared data.

Figure: Data mining third step: exploring data

You must understand the data in order to make appropriate decisions when you create the mining models. Exploration techniques include calculating the minimum and maximum values, calculating mean and standard deviations, and looking at the distribution of the data. For example, you might determine by reviewing the maximum, minimum, and mean values that the data is not representative of your customers or business processes, and that you therefore must obtain more balanced data or review the assumptions that are the basis for your expectations. Standard deviations and other distribution values can provide useful information about the stability and accuracy of the results. A large standard deviation can indicate that adding more data might help you improve the model. Data that strongly deviates from a standard distribution might be skewed, or might represent an accurate picture of a real-life problem, but make it difficult to fit a model to the data.

By exploring the data in light of your own understanding of the business problem, you can decide if the dataset contains flawed data, and then you can devise a strategy for fixing the problems or gain a deeper understanding of the behaviors that are typical of your business.
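
The exploration techniques named above (minimum, maximum, mean, and standard deviation) can be computed with Python's standard `statistics` module. This is a minimal sketch under stated assumptions: the `order_totals` values and the `summarize` helper are hypothetical, and comparing the mean with the median is a common rule of thumb for spotting skew rather than something the source prescribes.

```python
import statistics


def summarize(values):
    """Compute basic exploration statistics for a list of numbers."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# Hypothetical order totals; one very large order dominates the data.
order_totals = [20, 22, 19, 21, 23, 20, 250]

summary = summarize(order_totals)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")

# A mean far above the median hints that the distribution is skewed.
if summary["mean"] > 2 * summary["median"]:
    print("Mean is much larger than median: check for outliers or skew.")
```

Here the single large value pulls the mean well above the median and inflates the standard deviation, which is the kind of signal that should prompt a closer look at whether the value is an error or a genuine feature of the business.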
Note that when you create a model, Analysis Services automatically creates statistical summaries of the data contained in the model, which you can query to use in reports or further analysis. For more information, see Data Mining Queries.

Building Models

The fourth step in the data mining process, as highlighted in the following diagram, is to build the mining model or models. You will use the knowledge that you gained in the Exploring Data step to help define and create the models.

Figure: Data mining fourth step: building mining models

You define the columns of data that you want to use by creating a mining structure. The mining structure is linked to the source of data, but does not actually contain any data until you process it. When you process the mining structure, Analysis Services generates aggregates and other statistical information that can be used for analysis. This information can be used by any mining model that is based on the structure. For more information about how mining structures are related to mining models, see Logical Architecture (Analysis Services - Data Mining).

You can also use parameters to adjust each algorithm, and you can apply filters to the training data to use just a subset of the data, creating different results. After you pass data through the model, the mining model object contains summaries and patterns that can be queried or used for prediction.

You can define a new model by using the Data Mining Wizard in SQL Server Data Tools, or by using the Data Mining Extensions (DMX) language. For more information about how to use the Data Mining Wizard, see Data Mining Wizard (Analysis Services - Data Mining). For more information about how to use DMX, see Data Mining Extensions (DMX) Reference.

It is important to remember that whenever the data changes, you must update both the mining structure and the mining model. When you update a mining structure by reprocessing it, Analysis Services retrieves data from the source, including any new data if the source is dynamically updated, and repopulates the mining structure. If you have models that are based on the structure, you can choose to update the models that are based on the structure, which means they are retrained on the new data, or you can leave the models as is. For more information, see Processing Requirements and Considerations (Data Mining).

Exploring and Validating Models

The fifth step in the data mining process, as highlighted in the following diagram, is to explore the mining models that you have built and test their effectiveness.

Figure: Data mining fifth step: validating mining models

Before you deploy a model into a production environment, you will want to test how well the model performs. Also, when you build a model, you typically create multiple models with different configurations and test all models to see which yields the best results for your problem and your data.

Analysis Services provides tools that help you separate your data into training and testing datasets so that you can accurately assess the performance of all models on the same data. You use the training dataset to build the model, and the testing dataset to test the accuracy of the model by creating prediction queries.
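
The idea of holding out a testing dataset and comparing several candidate models on the same data can be sketched in a few lines of Python. This is an illustration only and does not reproduce how Analysis Services partitions data; the customer records, the 25 percent holdout fraction, the seed, and both candidate "models" (a majority-label baseline and a fixed age threshold) are hypothetical assumptions.

```python
import random

# Hypothetical (age, bought_product) records, invented for illustration.
ROWS = [
    (23, "no"), (31, "no"), (45, "yes"), (52, "yes"),
    (37, "no"), (61, "yes"), (29, "no"), (48, "yes"),
    (55, "yes"), (34, "no"), (42, "yes"), (26, "no"),
]


def split_train_test(rows, test_fraction=0.25, seed=7):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, rows):
    """Fraction of rows whose label the predictor gets right."""
    return sum(1 for x, label in rows if predict(x) == label) / len(rows)


train, test = split_train_test(ROWS)

# Candidate 1: always predict the most common label seen in training.
train_labels = [label for _, label in train]
majority = max(sorted(set(train_labels)), key=train_labels.count)


def baseline(age):
    return majority


# Candidate 2: a fixed threshold rule (the cutoff of 40 is an assumption).
def threshold(age):
    return "yes" if age >= 40 else "no"


# Both candidates are scored on the same held-out rows.
results = {
    "baseline": accuracy(baseline, test),
    "threshold": accuracy(threshold, test),
}
for name, score in results.items():
    print(name, round(score, 2))
```

Scoring every candidate on the same held-out rows is what makes the comparison fair: no model is evaluated on data it was built from.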
In SQL Server 2012 Analysis Services (SSAS), this partitioning can be done automatically while building the mining model. For more information, see Testing and Validation (Data Mining).

If none of the models that you created in the Building Models step perform well, you might have to return to a previous step in the process and redefine the problem or reinvestigate the data in the original dataset.

Deploying and Updating Models

The last step in the data mining process, as highlighted in the following diagram, is to deploy the models that performed the best to a production environment.

Figure: Data mining sixth step: deploying mining models

After the mining models exist in a production environment, you can perform many tasks, depending on your needs. The following are some of the tasks you can perform:

- Use the models to create predictions, which you can then use to make business decisions. SQL Server provides the DMX language that you can use to create prediction queries, and Prediction Query Builder to help you build the queries.
