
A PROJECT REPORT ON

TRUTH DISCOVERY WITH MULTIPLE CONFLICTING INFORMATION PROVIDERS ON THE WEB

SUBMITTED TO JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY, ANANTAPUR.


in partial fulfilment of the requirements for the award of the degree of

BACHELOR OF TECHNOLOGY IN COMPUTER SCIENCE AND TECHNOLOGY

SUBMITTED BY

M.SIVA SWETHA (07BG1A0531)
Y.VAMSI KRISHNA (07BG1A0565)
M.ANUSHA (07BG1A0535)
A.KRISHNA CHAITANYA (07BG1A0501)
L.RAGHU KISHORE REDDY (07BG1A0527)

UNDER THE VALUABLE GUIDANCE OF

Sri N.Govardan Reddy M.Tech., Asst.Professor

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

SRI VENKATESWARA INSTITUTE OF SCIENCE AND TECHNOLOGY


(AFFILIATED TO JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY, ANANTAPUR)

KADAPA-516004 2007-2011

ACKNOWLEDGEMENT

We express our special thanks to Sri N.Govardan Reddy M.Tech., Assistant Professor, for his valuable guidance, supervision, and constructive suggestions in completing this project.

We express our sincere thanks to Sri Er.Rajoli Veera Reddy, Chairman, Sri Venkateswara Institute of Science and Technology, Kadapa, for inspiring us all the way and for arranging all the facilities and resources needed to complete the course.

We thankfully acknowledge Smt K. Rama Devi, Director, Sri Venkateswara Institute of Science and Technology, Kadapa, for her valuable suggestions and advice throughout the course.

We thankfully acknowledge Dr.R.Veera Sudarsana Reddy, Principal, Sri Venkateswara Institute of Science and Technology, Kadapa, for his valuable suggestions and advice throughout the course.

We wish to express our special thanks to Sri V. Sridhar Reddy, CEO, Sri Venkateswara Institute of Science and Technology, Kadapa, for providing the necessary facilities throughout the course.

We thankfully acknowledge Sri K.V.Prasada Reddy, Head of the Department of CSE and IT, Sri Venkateswara Institute of Science and Technology, Kadapa, for his valuable suggestions and advice throughout the course.

We thankfully acknowledge Sri M.Purushotham Reddy, Sri Venkateswara Institute of Science and Technology, Kadapa, for his valuable suggestions and advice throughout the project.

We express our heartfelt thanks to our parents and family members, who gave us moral support in completing the course.

We express our heartfelt thanks and gratitude to all the professors, lab coordinators, and non-teaching staff whose help, understanding, encouragement, and support made this effort worthwhile and possible.

Project Associates
M.SIVA SWETHA
Y.VAMSI KRISHNA
M.ANUSHA
A.KRISHNA CHAITANYA
L.RAGHU KISHORE REDDY

DECLARATION
We hereby declare that this project report entitled Truth Discovery with Multiple Conflicting Information Providers on the Web is the work done by M.Siva Swetha (07BG1A0531), Y.Vamsi Krishna (07BG1A0565), M.Anusha (07BG1A0535), A.Krishna Chaitanya (07BG1A0501), and L.Raghu Kishore Reddy (07BG1A0527) towards the partial fulfilment of the requirement for the award of the degree of B.Tech in the Department of Computer Science and Engineering, submitted to JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY, ANANTAPUR, and is the result of the work carried out under the guidance of Sri N.Govardan Reddy M.Tech., Sri Venkateswara Institute of Science and Technology, Kadapa.

We further declare that this project report has not previously been submitted, either in part or in full, for the award of any degree or diploma by any organization or university.

N.Govardan Reddy M.Tech.,

SRI VENKATESWARA INSTITUTE OF SCIENCE & TECHNOLOGY


(AFFILIATED TO JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY, ANANTAPUR)

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

CERTIFICATE
This is to certify that the project entitled Truth Discovery with Multiple Conflicting Information Providers on the Web is submitted by M.Siva Swetha (07BG1A0531), Y.Vamsi Krishna (07BG1A0565), M.Anusha (07BG1A0535), A.Krishna Chaitanya (07BG1A0501), and L.Raghu Kishore Reddy (07BG1A0527) in partial fulfilment of the requirements for the award of Bachelor of Technology in COMPUTER SCIENCE AND ENGINEERING, Sri Venkateswara Institute of Science and Technology, Kadapa.

Internal Project Guide

Head of the department

External examiner

CONTENTS
SNO  TITLE

     ABSTRACT
     NOTATIONS
     ABBREVIATIONS
     LIST OF DIAGRAMS/LIST OF FIGURES
     LIST OF TABLES
1.   COMPANY PROFILE
2.   INTRODUCTION
     2.1 Problem Definition
     2.2 Objective of Project
     2.3 Existing System
     2.4 Disadvantages of Existing System
     2.5 Proposed System
     2.6 Advantages of Proposed System
3.   LITERATURE SURVEY
4.   SYSTEM REQUIREMENTS SPECIFICATION
     4.1 Hardware Requirements
     4.2 Software Requirements
5.   SYSTEM ANALYSIS
     5.1 Introduction
     5.2 Feasibility Study
         5.2.1 Economical Feasibility Study
         5.2.2 Technical Feasibility Study
         5.2.3 Operational Feasibility Study
6.   SYSTEM DESIGN
     6.1 Introduction
     6.2 CLD Diagram
     6.3 DFD/UML/ER Diagram
     6.4 Tables/DD
7.   LANGUAGE SPECIFICATION
8.   IMPLEMENTATION
     8.1 Screens Design/Forms Design
     8.2 Source Code
     8.3 Output Screens/Reports
9.   TESTING AND VALIDATION
     9.1 Introduction
     9.2 Test Cases
10.  CONCLUSION
     10.1 Scope for Future Enhancement
11.  REFERENCES

ABSTRACT

The World Wide Web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the web. Moreover, different web sites often provide conflicting information on a subject, such as different specifications for the same product. In this paper we propose a new problem called Veracity (that is, conformity to truth), which studies how to find true facts from a large amount of conflicting information on many subjects that is provided by various web sites. We design a general framework for the Veracity problem and invent an algorithm called Truth Finder, which utilizes the relationships between web sites and their information: a web site is trustworthy if it provides many pieces of true information, and a piece of information is likely to be true if it is provided by many trustworthy web sites. Our experiments show that Truth Finder successfully finds true facts among conflicting information and identifies trustworthy web sites better than the popular search engines.

NOTATIONS


ABBREVIATIONS

API   Application Programming Interface
UML   Unified Modeling Language
GUI   Graphical User Interface
SDLC  Software Development Life Cycle
J2EE  Java 2 Platform, Enterprise Edition
VGA   Video Graphics Array


LIST OF FIGURES

S.No  Fig.No  Figure Name
1     1.1     Academic project training plan
2     6.1a    Module diagram
3     6.1b    System Architecture
4     6.2     Context level diagram
5     6.3a    DFD
6     6.3b    First level DFD
7     6.3c    First level DFD
8     6.3d    First level DFD
9     6.3e    Use case diagram
10    6.3f    Class diagram
11    6.3g    Sequence diagram
12    6.3h    Collaboration diagram
13    6.3i    State chart diagram
14    6.3j    Activity diagram
15    7.1     Description of compiler and interpreter
16    8.1     Forms Design


LIST OF TABLES

S.No  Table.No  Table Name
1     6.4.1     Table dbo.factdb
2     6.4.2     Table dbo.Maindb
3     6.4.3     Table dbo.PRA
4     6.4.4     Table dbo.support
5     6.4.5     Table dbo.user table

CHAPTER 1 COMPANY PROFILE

Truth Discovery with Multiple conflicting information providers on the Web

Company Profile

1. COMPANY PROFILE

OVERVIEW


The premier and prestigious Training Division of THE DAWN TECHNO SOLUTIONS has been involved in providing quality training to students, fresh graduates, employees, and professionals sponsored by various corporate houses. The focus is on providing quality IT education comparable to standards set by educational institutions anywhere in the world.

The Dawn team consists of energetic, dynamic, and hardworking professionals, each one trained in their own specific field. They make upcoming professionals aware of high-end technologies and involve them in exclusive project training. The Dawn also provides behavioral, Spoken English, and soft skills training to help in all-round, high-paced career growth.

We try to build a long-term relationship with all the students who train with us by ensuring timely delivery of quality software training and services using continuously improving processes.

MISSION

'Our mission is to be a global institute aimed at imparting industry-specific, academically structured, rigorous education, focusing on building competency for enabling, measuring, and sustaining excellence in the software profession, for corporates and individuals in the IT sector'.

CSE

Sri Venkateswara Institute of Science and Technology


VISION

'To become the primary resource centre for providing IT training for students and corporates'.

ACADEMIC PROJECT TRAINING PLAN

[Figure: training plan diagram. Box labels: 3rd Year Academic Project Training; 3rd Year Project Implementation & Practice seminars; 4th Year Project hands on, Real Time Project Training certification; 4th Year Project Implementation & Practice seminars; VIVA conducted by 2 HODs & 1 real timer]

Fig 1.1 Academic project training plan


CORPORATE TRAINING

The Dawn offers you corporate training solutions and services that not only help you prepare your workforce for unparalleled growth, but also accelerate your organization's competency. The Dawn Corporate Training programs help organizations identify, invent, customize, and implement technology training solutions for the modern corporate environment, thus allowing organizations to work with a single point of contact for all their staff training needs.

Our technical strengths, through an array of training programs, enable us to provide our corporate clients with the most comprehensive and cost-effective training solutions. Well equipped to perform a thorough and accurate Training Needs Analysis, our Corporate Training Solutions division suggests and conducts training programs that are most appropriate and deliver optimum value to organizations. The Dawn has expanded the scope of its services to meet the growing demand for new skill sets in a rapidly changing business environment.

TECHNOLOGIES

The growing need for industry-ready personnel and for continuous learning in industry has led The Dawn Techno Solutions to establish a training academy in Hyderabad. The Dawn Techno Solutions is focusing on the following courses, which demand the highest levels of teaching and training excellence. The courses include:

C/C++
Microsoft Office tools
JAVA/J2EE
.Net Technologies
Oracle (SQL, PL/SQL)
Testing Tools
Data Warehousing
Oracle Applications
IBM Mainframes
Spoken English

The Dawn Techno Solutions has a clearly articulated training strategy that includes:

Limited persons per batch for individual attention
24 hrs lab with internet facility
Well-experienced corporate trainers as faculty
On-the-job training
Resume preparation & placement assistance

PLACEMENTS

The Dawn Techno Solutions also offers placement services and helps its students get the right jobs at the right time. Our institute actively liaises with industries of repute, keeps the students informed of various job opportunities, and provides guidance to help students prepare for interviews. Our efforts are to ensure that the brightest candidates are picked up by top-notch companies. Placement is arranged on successful course completion, both at entry level and at senior positions, depending on the prior background of the individuals.


CHAPTER 2 INTRODUCTION


2. INTRODUCTION

The World Wide Web has become a necessary part of our lives and may have become the most important information source for most people. Every day, people retrieve all kinds of information from the Web. For example, when shopping online, people find product specifications on websites like Amazon.com or ShopZilla.com. When looking for interesting DVDs, they get information and read movie reviews on websites such as NetFlix.com or IMDB.com. When they want to know the answer to a certain question, they go to Ask.com or Google.com.

Is the World Wide Web always trustworthy? Unfortunately, the answer is no. There is no guarantee for the correctness of information on the Web. Even worse, different websites often provide conflicting information, as shown in the following examples.

Example (Height of Mount Everest). Suppose a user is interested in how high Mount Everest is and queries Ask.com with "What is the height of Mount Everest?" Among the top 20 results, he or she will find the following facts: four websites (including Ask.com itself) say 29,035 feet, five websites say 29,028 feet, one says 29,002 feet, and another one says 29,017 feet. Which answer should the user trust?

Example (Authors of books). We tried to find out who wrote the book Rapid Contextual Design (ISBN: 0123540518). We found many different sets of authors from different online bookstores, and we show several of them in Table 1. From the image of the book cover, we found that A1 Books provides the most accurate information. In comparison, the information from Powell's Books is incomplete, and that from Lakeside Books is incorrect.

The trustworthiness problem of the Web has been recognized by today's Internet users. According to a survey on the credibility of websites conducted by Princeton Survey Research in 2005, 54 percent of Internet users trust news websites at least most of the time, while this ratio is only 26 percent for websites that offer products for sale and merely 12 percent for blogs.
There have been many studies on ranking web pages according to authority (or popularity) based on hyperlinks. The most influential studies are Authority-Hub analysis and PageRank, which led to Google.com. However, does authority lead to accuracy of information? The answer is unfortunately no. Top-ranked websites are usually the most popular ones. However, popularity does not mean accuracy. For example, according to our experiments, the bookstores ranked on top by Google (Barnes & Noble and Powell's Books) contain many errors in book author information. In comparison, some small bookstores (e.g., A1 Books) provide more accurate information.

In this project, we propose a new problem called the Veracity problem, which is formulated as follows: Given a large amount of conflicting information about many objects, which is provided by multiple websites (or other types of information providers), how can we discover the true fact about each object? We use the word fact to represent something that is claimed as a fact by some website, and such a fact can be either true or false. In this paper, we only study facts that are either properties of objects (e.g., weights of laptop computers) or relationships between two objects (e.g., authors of books). We also require that the facts can be parsed from the web pages.

There are often conflicting facts on the Web, such as different sets of authors for a book. There are also many websites, some of which are more trustworthy than others. A fact is likely to be true if it is provided by trustworthy websites (especially if by many of them). A website is trustworthy if most facts it provides are true. Because of this interdependency between facts and websites, we choose an iterative computational method. At each iteration, the probabilities of facts being true and the trustworthiness of websites are inferred from each other.

This iterative procedure is rather different from Authority-Hub analysis in two ways. First, we cannot compute the trustworthiness of a website by adding up the weights of its facts as in Authority-Hub analysis, nor can we compute the probability of a fact being true by adding up the trustworthiness of the websites providing it. Instead, we have to resort to probabilistic computation. Second and more importantly, different facts influence each other.
For example, if a website says that a book is written by Jessamyn Wendell and another says Jessamyn Burns Wendell, then these two websites actually support each other although they provide slightly different facts. We incorporate such influences between facts into our computational model.
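
The idea that similar facts support each other can be illustrated with a small sketch. The report does not state how the influence between two facts is measured, so the token-overlap score below is only an assumption chosen for illustration; the function name `implication` and the unrelated author name are hypothetical.

```python
def implication(fact_a: str, fact_b: str) -> float:
    """Hypothetical similarity in [0, 1] between two author-name facts.

    Assumption for illustration: similarity is the Jaccard overlap of the
    lower-cased name tokens, not a measure taken from the report.
    """
    tokens_a = set(fact_a.lower().split())
    tokens_b = set(fact_b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    # Shared tokens divided by all distinct tokens in either fact.
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# The two spellings from the example share two of three distinct tokens.
support = implication("Jessamyn Wendell", "Jessamyn Burns Wendell")
# A made-up, unrelated name shares no tokens at all.
unrelated = implication("Jessamyn Wendell", "Some Other Author")
```

A score close to 1 would mean that a website stating one fact also lends support to the other, while a score of 0 means the two facts do not reinforce each other.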

In summary, we make three major contributions in this paper. First, we formulate the Veracity problem about how to discover true facts from conflicting information. Second, we propose a framework to solve this problem, by defining the trustworthiness of websites, confidence of facts, and influences between facts. Finally, we propose an algorithm called TRUTHFINDER for identifying true facts using iterative methods. Our experiments show that TRUTHFINDER achieves very high accuracy in discovering true facts, and that it selects trustworthy websites better than authority-based search engines such as Google.

2.1 PROBLEM DEFINITION:


The World Wide Web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the web. Moreover, different web sites often provide conflicting information on a subject, such as different specifications for the same product. In this paper we propose a new problem called Veracity (that is, conformity to truth), which studies how to find true facts from a large amount of conflicting information on many subjects that is provided by various web sites. We design a general framework for the Veracity problem and invent an algorithm called Truth Finder, which utilizes the relationships between web sites and their information: a web site is trustworthy if it provides many pieces of true information, and a piece of information is likely to be true if it is provided by many trustworthy web sites. Our experiments show that Truth Finder successfully finds true facts among conflicting information and identifies trustworthy web sites better than the popular search engines.

2.2 OBJECTIVE OF PROJECT

The World Wide Web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the web. Moreover, different web sites often provide conflicting information on a subject, such as different specifications for the same product. In this project we propose a new problem called Veracity. We design a general framework for the Veracity problem and invent an algorithm called Truth Finder.

2.3 EXISTING SYSTEM:


PageRank and Authority-Hub analysis use hyperlinks to find pages with high authority. These two approaches identify important web pages that users are interested in. Unfortunately, the popularity of web pages does not necessarily lead to accuracy of information.
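
To make the link-based approach concrete, the following is a simplified PageRank-style power iteration on a made-up three-page link graph. The page names, the damping factor of 0.85, and the iteration count are assumptions for this sketch and are not taken from the project. Note that the resulting score depends only on who links to whom, never on whether a page's content is correct.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Simplified PageRank; every link target must also be a key of `links`."""
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        # Every page receives a small base share regardless of its links.
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its rank on equally to the pages it links to.
                for target in outgoing:
                    new_rank[target] += damping * rank[page] / len(outgoing)
            else:
                # A page without outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / len(pages)
        rank = new_rank
    return rank


# Hypothetical link graph: two pages link to "popular", nobody links to "blog".
links = {"popular": ["small"], "small": ["popular"], "blog": ["popular"]}
ranks = pagerank(links)
```

In this toy graph the page with the most incoming links ends up with the highest score, which is exactly the popularity signal that the existing system relies on.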


2.4 DISADVANTAGES OF EXISTING SYSTEM:


The popularity of web pages does not necessarily lead to accuracy of information. Even the most popular website may contain many errors, whereas some comparatively less popular websites may provide more accurate information.

2.5 PROPOSED SYSTEM:


First, we formulate the Veracity problem about how to discover true facts from conflicting information. Second, we propose a framework to solve this problem, by defining the trustworthiness of websites, confidence of facts, and influences between facts. Finally, we propose an algorithm called TRUTHFINDER for identifying true facts using iterative methods.
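
The iterative idea can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the project's implementation: a fact's confidence is taken as the probability that at least one of its providers is right (treating providers as independent), and a website's trustworthiness as the average confidence of the facts it provides. Influences between facts are left out. The function name `truth_finder`, the initial trust of 0.9, the site names, and the claims are all made up for the example.

```python
from collections import defaultdict


def truth_finder(claims, initial_trust=0.9, iterations=10):
    """claims maps website -> {object: claimed fact}. Returns (best facts, trust)."""
    trust = {site: initial_trust for site in claims}
    confidence = {}
    for _ in range(iterations):
        # Fact confidence: 1 minus the product of (1 - trust) over its providers.
        doubt = defaultdict(lambda: 1.0)
        for site, facts in claims.items():
            for obj, value in facts.items():
                doubt[(obj, value)] *= 1.0 - trust[site]
        confidence = {fact: 1.0 - d for fact, d in doubt.items()}
        # Website trustworthiness: average confidence of the facts it provides.
        trust = {
            site: sum(confidence[(obj, value)] for obj, value in facts.items()) / len(facts)
            for site, facts in claims.items()
        }
    # For each object, keep the claimed fact with the highest confidence.
    best = {}
    for (obj, value), conf in confidence.items():
        if obj not in best or conf > confidence[(obj, best[obj])]:
            best[obj] = value
    return best, trust


# Hypothetical claims: site_c disagrees with the other two sites about book1.
claims = {
    "site_a": {"book1": "author X", "book2": "author Y"},
    "site_b": {"book1": "author X", "book2": "author Y"},
    "site_c": {"book1": "author Z", "book2": "author Y"},
}
best, trust = truth_finder(claims)
```

In this simplified form the scores drift toward 1 as the iterations proceed, so only their relative order is meaningful here; a practical version would need some form of damping and would also account for the influences between facts.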

2.6 ADVANTAGES OF PROPOSED SYSTEM:


Our experiments show that TRUTHFINDER achieves very high accuracy in discovering true facts. It also selects trustworthy websites better than authority-based search engines such as Google.


CHAPTER 3 LITERATURE SURVEY


3. LITERATURE SURVEY

DATA QUALITY

Data quality refers to how well data serve their purpose. Data are of high quality "if they are fit for their intended uses in operations, decision making and planning" (J. M. Juran). Alternatively, the data are deemed of high quality if they correctly represent the real-world construct to which they refer. These two views can often be in disagreement, even about the same set of data used for the same purpose.

Before the rise of the inexpensive server, massive mainframe computers were used to maintain name and address data so that the mail could be properly routed to its destination. The mainframes used business rules to correct common misspellings and typographical errors in name and address data, as well as to track customers who had moved, died, gone to prison, married, divorced, or experienced other life-changing events. Government agencies began to make postal data available to a few service companies to cross-reference customer data with the National Change of Address registry (NCOA). This technology saved large companies millions of dollars compared to manually correcting customer data. Large companies saved on postage, as bills and direct marketing materials made their way to the intended customer more accurately. Initially sold as a service, data quality moved inside the walls of corporations as low-cost and powerful server technology became available.

Companies with an emphasis on marketing often focus their quality efforts on name and address information, but data quality is recognized as an important property of all types of data. Principles of data quality can be applied to supply chain data, transactional data, and nearly every other category of data found in the enterprise. For example, making supply chain data conform to a certain standard has value to an organization by: 1) avoiding overstocking of similar but slightly different stock; 2) improving the understanding of vendor purchases to negotiate volume discounts; and 3) avoiding logistics costs in stocking and shipping parts across a large organization.

While name and address data has a clear standard as defined by local postal authorities, other types of data have few recognized standards. There is a movement in the industry today to standardize certain non-address data. The non-profit group GS1 is among the groups spearheading this movement. For companies with significant research efforts, data quality can include developing protocols for research methods, reducing measurement error, bounds checking of the data, cross tabulation, modeling and outlier detection, verifying data integrity, etc.


CHAPTER 4 SYSTEM REQUIREMENTS SPECIFICATION


4. SYSTEM REQUIREMENTS SPECIFICATION

4.1 HARDWARE REQUIREMENTS


PROCESSOR  : PENTIUM IV 2.6 GHz
RAM        : 512 MB DDR RAM
HARD DISK  : 20 GB

4.2 SOFTWARE REQUIREMENTS


FRONT END         : Java
OPERATING SYSTEM  : Windows XP
WEB SERVER        : Apache Tomcat
BACK END          : SQL Server 2005


CHAPTER 5 SYSTEM ANALYSIS


5. SYSTEM ANALYSIS

5.1 INTRODUCTION
REQUIREMENTS ANALYSIS

The requirement phase basically consists of three activities:

Requirement Analysis
Requirement Specification
Requirement Validation

REQUIREMENT ANALYSIS:

Requirement analysis is a software engineering task that bridges the gap between system-level software allocation and software design. It enables the system engineer to specify software function and performance, indicate the software's interface with other system elements, and establish constraints that the software must meet.

The basic aim of this stage is to obtain a clear picture of the needs and requirements of the end user and also of the organization. Analysis involves interaction between the clients and the analysts. Usually analysts research a problem by asking questions and reading existing documents. The analysts have to uncover the real needs of the users even if the users do not know them clearly. During analysis it is essential that a complete and consistent set of specifications emerges for the system. Here it is essential to resolve the contradictions that could emerge from information obtained from various parties, to ensure that the final specifications are consistent.

Requirement analysis may be divided into five areas of effort:

Problem recognition
Evaluation and synthesis
Modeling
Specification
Review


Each requirement analysis method has a unique point of view. However, all analysis methods are related by a set of operational principles:

The information domain of the problem must be represented and understood.
The functions that the software is to perform must be defined.
The behavior of the software as a consequence of external events must be defined.
The models that depict information, function, and behavior must be partitioned in a hierarchical or layered fashion.
The analysis process must move from essential information to implementation detail.

REQUIREMENT ANALYSIS IN THIS PROJECT

The main aim in this stage is to assess what kind of system would be suitable for the problem and how to build it. The requirements of this system can be defined by going through the existing system and its problems, and by discussing the new system to be built and the expectations from it. The steps involved are as follows.

PROBLEM RECOGNITION:

The main problem here arises while taking appointments for the doctors. Verifying old or historical data is very difficult, and maintaining the data related to all departments is also very difficult.


EVALUATION AND SYNTHESIS:

In the proposed system, this application saves a lot of time. Using this application, it is easy to manage daily treatments and to maintain the historical data. No specific training is required for the employees to use this application. They can easily use the tool, which reduces the manual hours spent on routine tasks and hence increases performance.

5.2 FEASIBILITY STUDY:


All projects are feasible given unlimited resources and infinite time. But the development of software is plagued by the scarcity of resources and difficult delivery dates. It is both necessary and prudent to evaluate the feasibility of a project at the earliest possible time. Three key considerations are involved in the feasibility analysis: economic, technical, and operational feasibility.

5.2.1 ECONOMIC FEASIBILITY:

This procedure determines the benefits and savings that are expected from a candidate system and compares them with the costs. If the benefits outweigh the costs, the decision is made to design and implement the system. Otherwise, further justification or alterations in the proposed system will have to be made if it is to have a chance of being approved. This is an ongoing effort that improves in accuracy at each phase of the system life cycle.


FINANCIAL FEASIBILITY

I) TIME BASED: In contrast to the manual system, management can generate any report with a single click. In the manual system it is very difficult to maintain historical data, which becomes easier in this system. The time taken to add new records or to view reports is much less than in the manual system. So the project is feasible from this point of view.

II) COST BASED: No special investment is needed to manage the tool. No specific training is required for employees to use the tool. Investment is required only once, at the time of installation. The software used in this project is freeware, so the cost of developing the tool is minimal, and hence so is the overall cost.

5.2.2 TECHNICAL FEASIBILITY:


Technical feasibility centers on the existing computer system (hardware, software, etc.) and to what extent it can support the proposed addition. If the budget is a serious constraint, then the project is judged not feasible. In this project the system is self-explanatory and does not need any extra sophisticated training. As the system has been built around Graphical User Interface concepts, the application can be handled very easily by a novice user. The overall time required to train users on the system is less than half an hour. The system provides menu-driven and button-based interaction, which puts the user in control as soon as he or she starts working in the environment. The only significant time the customer needs to invest is the installation time.


5.2.3 OPERATIONAL FEASIBILITY:

People are inherently resistant to change, and computers have been known to facilitate change. It is understandable that the introduction of a candidate system requires special effort to educate, sell, and train the staff on new ways of conducting business.


CHAPTER 6 SYSTEM DESIGN


6. SYSTEM DESIGN

6.1 INTRODUCTION:

The most creative and challenging phase of the life cycle is system design. The term design describes a final system and the process by which it is developed. It refers to the technical specifications that will be applied in implementing the candidate system. Design may be defined as the process of applying various techniques and principles for the purpose of defining a device, a process, or a system in sufficient detail to permit its physical realization.

First, the designer's goal is to determine how the output is to be produced and in what format; samples of the output and input are also presented. Second, input data and database files have to be designed to meet the requirements of the proposed output. The processing phases are handled through program construction and testing. Finally, details related to justification of the system and an estimate of the impact of the candidate system on the user and the organization are documented and evaluated by management as a step toward implementation.

The importance of software design can be stated in a single word: quality. Design provides us with representations of software that can be assessed for quality. Design is the only way that we can accurately translate a customer's requirements into a finished software product or system. Without design we risk building an unstable system, one that might fail if small changes are made, that may be difficult to test, or whose quality cannot be tested. So it is an essential phase in the development of a software product.


MODULES

Collection of unrelated data
Data search
Truth Finder algorithm
Result calculation

MODULE DESCRIPTION

COLLECTION OF DATA
First, the specific data about an object is collected and stored in the related database. A table is created for the object, and the facts about that object are stored in it.

DATA SEARCH
The system searches for the data links related to the user's input. In this module the user retrieves the specific data about an object.

TRUTH FINDER ALGORITHM
We design a general framework for the veracity problem and invent an algorithm called Truth Finder, which utilizes the relationships between web sites and their information: a web site is trustworthy if it provides many pieces of true information, and a piece of information is likely to be true if it is provided by many trustworthy web sites.

RESULT CALCULATION
For each response to the query, the performance is calculated. Using the calculated count, the best link is found and shown as the output.

GIVEN INPUT AND EXPECTED OUTPUT OF THE MODULE

COLLECTION OF DATASET
Given input: result set (collection of data)
Expected output: relevant data about a particular object, separated and grouped

Module diagram:

[Figure: Result set, Specific data]

Fig 6.1a Module diagram

TRUTH FINDER ALGORITHM

Each object has a set of conflicting facts, e.g., different author names for a book. Each web site provides some of these facts. The problem is how to find the true fact for each object.
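The mutual dependence between web site trustworthiness and fact confidence can be illustrated with a short Java sketch. It is a simplified illustration, not the project's implementation: the class name TruthFinderSketch, the method names, the initial trust value, and the fixed number of rounds are all assumptions made for the example. A fact is treated as wrong only if every site providing it is wrong, and a site's trust is taken as the average confidence of the facts it provides.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified sketch of the iterative trust/confidence computation.
public class TruthFinderSketch {

    // claims maps each web site to the set of facts it provides.
    // Returns the confidence of every fact after the given number of rounds.
    public static Map<String, Double> run(Map<String, Set<String>> claims,
                                          double initialTrust, int rounds) {
        Map<String, Double> trust = new HashMap<>();
        for (String site : claims.keySet()) {
            trust.put(site, initialTrust);
        }
        Map<String, Double> confidence = new HashMap<>();
        for (int r = 0; r < rounds; r++) {
            // confidence(f) = 1 - product over providing sites of (1 - trust(w))
            Map<String, Double> allWrong = new HashMap<>();
            for (Map.Entry<String, Set<String>> e : claims.entrySet()) {
                double t = trust.get(e.getKey());
                for (String fact : e.getValue()) {
                    allWrong.merge(fact, 1.0 - t, (a, b) -> a * b);
                }
            }
            confidence.clear();
            for (Map.Entry<String, Double> e : allWrong.entrySet()) {
                confidence.put(e.getKey(), 1.0 - e.getValue());
            }
            // trust(w) = average confidence of the facts that w provides
            for (Map.Entry<String, Set<String>> e : claims.entrySet()) {
                if (e.getValue().isEmpty()) {
                    continue;
                }
                double sum = 0.0;
                for (String fact : e.getValue()) {
                    sum += confidence.get(fact);
                }
                trust.put(e.getKey(), sum / e.getValue().size());
            }
        }
        return confidence;
    }

    // Picks the candidate fact with the highest confidence (null if none is known).
    public static String bestFact(Collection<String> candidates,
                                  Map<String, Double> confidence) {
        String best = null;
        double bestScore = -1.0;
        for (String candidate : candidates) {
            Double score = confidence.get(candidate);
            if (score != null && score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return best;
    }
}
```

For example, if two sites give one author name for a book and a third site gives a different name, the name backed by two sites ends with the higher confidence and is returned by bestFact.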

SYSTEM ARCHITECTURE:

[Figure: components Home, Login, Login Validation, Query Process, Search Engine, Conflicting Web Pages, Truth Finder, Truth Finder Webpages, Output]

Fig 6.1b System Architecture


6.2 CONTEXT LEVEL DIAGRAM:

[Figure: process 0.0 TruthFinder System with the external entity User on either side]

fig 6.2 Context Diagram

6.3 DATA FLOW DIAGRAMS AND UML DIAGRAMS


Data flow diagrams (DFDs) are used to model system components. A DFD shows how information moves through the system and how it is modified by a series of transformations. It is a graphical technique that represents data flow and those transformations.


[Figure: web sites W1, W2, W3, W4; facts f1, f2, f3, f4; objects o1, o2]

Fig 6.3a DFD

FIRST LEVEL DFD

[Figure: elements User, Login, Validate User, Store Info on DB; flow labels Enter uname password, Valid user, Fact info, correct site]

Fig 6.3b First level DFD


[Figure: process 1.0 Truth Finder System; flow labels Search For information, Query For a site, Gives Many Sites]

Fig 6.3c First level DFD

[Figure: process 1.1 Truth Finder System; data store DB Info; flow labels Many sites, Displays, Selects Best Site, Best Site]

Fig 6.3d First level DFD

UML DIAGRAMS: INTRODUCTION:

UML is a notation that resulted from the unification of the Object Modeling Technique (OMT), the Booch method, and Object-Oriented Software Engineering (OOSE). UML has been designed for a broad range of applications. Hence, it provides constructs for a broad range of systems and activities.


AN OVERVIEW OF UML WITH SIX DIAGRAMS

1. USE CASE DIAGRAMS

Use cases are used during requirements elicitation and analysis to represent the functionality of the system. Use cases focus on the behavior of the system from an external point of view. The actors are outside the boundary of the system, whereas the use cases are inside the boundary of the system.

[Figure: package truth discovery (from Logical View); actor user; use cases home, search, truthfinder, display the details]

Fig 6.3e Use case diagram


2. CLASS DIAGRAMS

Class diagrams are used to describe the structure of the system. Classes are abstractions that specify the common structure and behavior of a set of objects. Class diagrams describe the system in terms of objects, classes, attributes, operations, and their associations.

Fig 6.3f Class Diagram


3. SEQUENCE DIAGRAMS
Sequence diagrams are used to formalize the behavior of the system and to visualize the communication among objects. They are useful for identifying additional objects that participate in the use cases. A sequence diagram represents the interactions that take place among these objects.

[Figure: lifelines user, home page, Search, Truth Finder, Output; messages enters login details, enter uid, invalid, valid, Query, conflicting information, true facts]

Fig 6.3g Sequence diagram


4. COLLABORATION DIAGRAMS

A collaboration diagram emphasizes the organization of the objects that participate in an interaction.

Fig 6.3h Collaboration diagram


5. STATECHART DIAGRAMS

State chart diagrams describe the behavior of an individual object as a number of states and transitions between these states. A state represents a particular set of values for an object. Whereas the sequence diagram focuses on the messages exchanged between objects, the state chart diagram focuses on the transitions between states.

[Figure: states Home, Login, Query Process, Database, Conflicting Information, Truthfinder, Result]

Fig 6.3i State chart diagram


6. ACTIVITY DIAGRAMS
An activity diagram describes a system in terms of activities. Activities are states that represent the execution of a set of operations. Activity diagrams are similar to flowcharts and data flow diagrams.

Fig 6.3j Activity diagram


6.4 TABLES IN MS SQL SERVER:


Table 6.4.1 dbo.factdb

Table 6.4.2 dbo.maindb

Table 6.4.3 dbo.PRA


Table 6.4.4 dbo.support

Table 6.4.5 dbo.usertable


CHAPTER 7 LANGUAGE SPECIFICATION


LANG SPEC.

7.LANGUAGE SPECIFICATION TECHNOLOGIES FEATURES:

JAVA

Java is two things: a programming language and a platform. Java is a high-level programming language that is all of the following:

Simple
Architecture-neutral
Object-oriented
Portable
Secure
Distributed
Interpreted
Robust

Java is also unusual in that each Java program is both compiled and interpreted. With a compiler, you translate a Java program into an intermediate language called Java byte codes, the platform-independent codes interpreted by the Java interpreter. With an interpreter, each Java byte code instruction is parsed and run on the computer. Compilation happens just once; interpretation occurs each time the program is executed. The figure illustrates how this works.

[Figure: Java Program, Compiler, Interpreter, My Program]

Fig 7.1 Description of compiler and interpreter

You can think of Java byte codes as the machine code instructions for the Java Virtual Machine (Java VM). Every Java interpreter, whether it's a Java development tool or a Web browser that can run Java applets, is an implementation of the Java VM. The Java VM can also be implemented in hardware. Java byte codes help make "write once, run anywhere" possible. You can compile your Java program into byte codes on any platform that has a Java compiler. The byte codes can then be run on any implementation of the Java VM. For example, the same Java program can run on Windows NT, Solaris, and Macintosh.
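As a small illustration of the compile-once, run-anywhere idea, the following sketch can be compiled once with javac and the resulting class file run unchanged on any Java VM. The class name WhereAmI is only an assumption for the example; the system properties it reads are standard.

```java
// Hypothetical example class: reports which Java VM and operating system
// the same compiled byte codes are currently running on.
public class WhereAmI {

    // Builds a short description from standard system properties.
    public static String describe() {
        return System.getProperty("java.vm.name") + " on " + System.getProperty("os.name");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The printed text differs from platform to platform, while the byte codes stay the same.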

JAVA PLATFORM

The Java platform has two components:

The Java Virtual Machine (Java VM)
The Java Application Programming Interface (Java API)

You've already been introduced to the Java VM. It's the base for the Java platform and is ported onto various hardware-based platforms. The Java API is a large collection of ready-made software components that provide many useful capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into libraries (packages) of related components. The features listed below highlight the areas of functionality provided by the packages in the Java API.

How does the Java API support all of these kinds of programs? With packages of software components that provide a wide range of functionality. The core API is the API included in every full implementation of the platform. The core API gives you the following features:

The Essentials: Objects, strings, threads, numbers, input and output, data structures, system properties, date and time, and so on.
Applets: The set of conventions used by Java applets.
Networking: URLs, TCP and UDP sockets, and IP addresses.
Internationalization: Help for writing programs that can be localized for users worldwide. Programs can automatically adapt to specific locales and be displayed in the appropriate language.

[Figure: a Java program runs on top of the Java API and the Java Virtual Machine, which run on the hardware]

The Java API and the Virtual Machine insulate the Java program from hardware dependencies. As a platform-independent environment, Java can be a bit slower than native code. However, smart compilers, well-tuned interpreters, and just-in-time byte code compilers can bring Java's performance close to that of native code without threatening portability.

Java is not just for writing cute, entertaining applets for the World Wide Web (WWW). Java is a general-purpose, high-level programming language and a powerful software platform. Using the extensive Java API, you can write many types of programs.

Internet addresses

In order to use a service, you must be able to find it. The Internet uses an address scheme for machines so that they can be located. The address is a 32-bit integer, which gives the IP address. This encodes a network ID and further addressing. The network ID falls into various classes according to the size of the network address.
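The 32-bit address and its class can be illustrated with a short Java sketch. The class and method names are assumptions made for the example; the class boundaries follow the conventional leading-bit rule of classful addressing (0 for class A, 10 for class B, 110 for class C, 1110 for class D).

```java
// Hypothetical helper: packs a dotted IPv4 address into 32 bits and
// reports its address class from the leading bits of the first octet.
public class Ipv4Classes {

    // Converts text such as "10.0.0.1" into a 32-bit value held in a long.
    public static long toInt(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected four octets: " + dotted);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // Returns 'A' to 'E' according to the first octet of the address.
    public static char addressClass(long address) {
        int first = (int) ((address >>> 24) & 0xFF);
        if (first < 128) return 'A'; // leading bit 0
        if (first < 192) return 'B'; // leading bits 10
        if (first < 224) return 'C'; // leading bits 110
        if (first < 240) return 'D'; // leading bits 1110
        return 'E';                  // remaining range
    }

    public static void main(String[] args) {
        long address = toInt("192.168.1.1");
        System.out.println(address + " is class " + addressClass(address));
    }
}
```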


Network address
Class A uses 8 bits for the network address, with 24 bits left over for other addressing. Class B uses 16-bit network addressing. Class C uses 24-bit network addressing, and class D uses all 32 bits (class D addresses are reserved for multicast).

Subnet address
Internally, the UNIX network is divided into sub networks. Building 11 is currently on one sub network and uses 10-bit addressing, allowing 1024 different hosts.

Host address
8 bits are finally used for host addresses within our subnet. This places a limit of 256 machines that can be on the subnet.

Port addresses
A service exists on a host and is identified by its port, which is a 16-bit number. To send a message to a server, you send it to the port for that service on the host that it is running on. This is not location transparency! Certain of these ports are "well known".

Sockets
A socket is a data structure maintained by the system to handle network connections. A socket is created using the call socket, which returns an integer that is like a file descriptor.

Server Socket
A server socket listens for socket requests and performs message handling, file sharing, database sharing, and similar functions.

JDBC
In an effort to set an independent database standard API for Java, Sun Microsystems developed Java Database Connectivity, or JDBC. JDBC offers a generic SQL database access mechanism that provides a consistent interface to a variety of RDBMSs. This consistent interface is achieved through the use of plug-in database connectivity modules, or drivers. If a database vendor wishes to have JDBC support, the vendor must provide the driver for each platform that the database and Java run on.

To gain wider acceptance of JDBC, Sun based JDBC's framework on ODBC. ODBC has widespread support on a variety of platforms. Basing JDBC on ODBC allows vendors to bring JDBC drivers to market much faster than developing a completely new connectivity solution.
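The consistent interface can be illustrated with a minimal Java sketch that looks up one row through JDBC. It is an illustration under stated assumptions rather than the project's code: the class name LocationLookup and the command-line handling are hypothetical, the table and column names are borrowed from the page ranking source in Chapter 8, and a suitable JDBC driver and connection URL must be supplied by the reader.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical example: the same code works with any database for which
// a JDBC driver is available; only the connection URL changes.
public class LocationLookup {

    // Parameterized query; the value is bound separately from the SQL text.
    static final String QUERY = "SELECT location FROM MainDb WHERE filename = ?";

    // Returns the stored location for the given file name, or null if none exists.
    public static String findLocation(Connection con, String filename) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(QUERY)) {
            ps.setString(1, filename);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 2) {
            System.out.println("usage: LocationLookup <jdbc-url> <filename>");
            return;
        }
        // The JDBC URL is an assumption supplied by the caller.
        try (Connection con = DriverManager.getConnection(args[0])) {
            System.out.println(findLocation(con, args[1]));
        }
    }
}
```

Binding the search word with setString, instead of concatenating it into the SQL text, keeps user input from being interpreted as SQL.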

JDBC Goals

Few software packages are designed without goals in mind. JDBC is one that, because of its many goals, drove the development of the API. These goals, in conjunction with early reviewer feedback, have finalized the JDBC class library into a solid framework for building database applications in Java. The goals that were set for JDBC are important. They will give you some insight as to why certain classes and functionalities behave the way they do. The design goals for JDBC include the following:

1. SQL Level API

The designers felt that their main goal was to define a SQL interface for Java. Although not the lowest database interface level possible, it is at a low enough level for higher-level tools and APIs to be created. Conversely, it is at a high enough level for application programmers to use it confidently. Attaining this goal allows future tool vendors to generate JDBC code and to hide many of JDBC's complexities from the end user.

2. SQL Conformance

SQL syntax varies as you move from database vendor to database vendor. In an effort to support a wide variety of vendors, JDBC allows any query statement to be passed through it to the underlying database driver. This allows the connectivity module to handle non-standard functionality in a manner that is suitable for its users.

3. JDBC must be implementable on top of common database interfaces

The JDBC SQL API must sit on top of other common SQL level APIs. This goal allows JDBC to use existing ODBC level drivers by the use of a software interface. This interface would translate JDBC calls to ODBC and vice versa.

4. Provide a Java interface that is consistent with the rest of the Java system

Because of Java's acceptance in the user community thus far, the designers feel that they should not stray from the current design of the core Java system.

5. Keep it simple

This goal probably appears in all software design goal listings. JDBC is no exception. Sun felt that the design of JDBC should be very simple, allowing for only one method of completing a task per mechanism. Allowing duplicate functionality only serves to confuse the users of the API.

About JSP:

JavaServer Pages (JSP) technology enables you to mix regular, static HTML with dynamically generated content from servlets. You simply write the regular HTML in the normal manner, using familiar Web-page-building tools. You then enclose the code for the dynamic parts in special tags, most of which start with <% and end with %>. For example, here is a section of a JSP page that results in "Thanks for ordering Core Web Programming" for a URL of http://host/OrderConfirmation.jsp?title=Core+Web+Programming:

Thanks for ordering <I><%= request.getParameter("title") %></I>

Separating the static HTML from the dynamic content provides a number of benefits over servlets alone, and the approach used in JavaServer Pages offers several advantages over competing technologies such as ASP, PHP, or ColdFusion. These advantages basically boil down to two facts: JSP is widely supported and thus doesn't lock you into a particular operating system or Web server, and JSP gives you full access to servlet and Java technology for the dynamic part, rather than requiring you to use an unfamiliar and weaker special-purpose language.

The process of making JavaServer Pages accessible on the Web is much simpler than that for servlets. Assuming you have a Web server that supports JSP, you give your file a .jsp extension and simply install it in any place you could put a normal Web page: no compiling, no packages, and no user CLASSPATH settings. However, although your personal environment doesn't need any special settings, the server still has to be set up with access to the servlet and JSP class files and the Java compiler. For details, see your server's documentation.

Although what you write often looks more like a regular HTML file than a servlet, behind the scenes the JSP page is automatically converted to a normal servlet, with the static HTML simply being printed to the output stream associated with the servlet's service method. This translation is normally done the first time the page is requested. To ensure that the first real user doesn't get a momentary delay when the JSP page is translated into a servlet and compiled, developers can simply request the page themselves after first installing it. Many Web servers also let you define aliases so that a URL that appears to reference an HTML file really points to a servlet or JSP page.

Depending on how your server is set up, you can even look at the source code for servlets generated from your JSP pages. With Tomcat 3.0, you need to change the isWorkDirPersistent attribute from false to true in install_dir/server.xml. After that, the code can be found in install_dir/work/port-number. With the JSWDK 1.0.1, you need to change the workDirIsPersistent attribute from false to true in install_dir/webserver.xml. After that, the code can be found in install_dir/work/%3Aport-number%2F. With the Java Web Server 2.0, the default setting is to save source code for automatically generated servlets. They can be found in install_dir/tmpdir/default/pagecompile/jsp/_JSP.

One warning about the automatic translation process is in order. If you make an error in the dynamic portion of your JSP page, the system may not be able to properly translate it into a servlet. If your page has such a fatal translation-time error, the server will present an HTML error page describing the problem to the client.
Internet Explorer 5, however, typically replaces server-generated error messages with a canned page that it considers friendlier. You will need to turn off this feature when debugging JSP pages. To do so with Internet Explorer 5, go to the Tools menu, select Internet Options, choose the Advanced tab, and make sure the "Show friendly HTTP error messages" box is not checked.
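The conversion of a JSP page into a servlet can be pictured with a plain Java sketch of what the generated code conceptually does for the OrderConfirmation fragment: static HTML is printed as-is, and the expression is evaluated and printed in place. This is only an approximation under stated assumptions. The class name is hypothetical, a Map stands in for the request parameters, and a PrintWriter stands in for the servlet's output stream, so that the example runs without a servlet container.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-in for the servlet a JSP engine would generate.
public class OrderConfirmationSketch {

    // Mirrors the service method: static text is printed, expressions are evaluated.
    public static void service(Map<String, String> requestParams, PrintWriter out) {
        out.print("Thanks for ordering <I>");          // static HTML
        out.print(requestParams.get("title"));         // <%= request.getParameter("title") %>
        out.print("</I>");                             // static HTML
    }

    // Renders the page into a string for the given request parameters.
    public static String render(Map<String, String> requestParams) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        service(requestParams, out);
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(Collections.singletonMap("title", "Core Web Programming")));
    }
}
```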

Aside from the regular HTML, there are three main types of JSP constructs that you embed in a page: scripting elements, directives, and actions. Scripting elements let you specify Java code that will become part of the resultant servlet, directives let you control the overall structure of the servlet, and actions let you specify existing components that should be used and otherwise control the behavior of the JSP engine. To simplify the scripting elements, you have access to a number of predefined variables, such as request in the code snippet just shown.

JSP changed dramatically from version 0.92 to version 1.0, and although these changes are very much for the better, you should note that newer JSP pages are almost totally incompatible with the early 0.92 JSP engines, and older JSP pages are equally incompatible with 1.0 JSP engines. The changes from version 1.0 to 1.1 are much less dramatic: the main additions in version 1.1 are the ability to portably define new tags and the use of the servlet 2.2 specification for the underlying servlets. JSP 1.1 pages that do not use custom tags or explicitly call 2.2-specific statements are compatible with JSP 1.0 engines, and JSP 1.0 pages are totally upward compatible with JSP 1.1 engines.

SQL SERVER 2005:

SQL Server 2005 is Microsoft's next-generation data management and analysis solution that will deliver increased security, scalability, and availability to enterprise data and analytical applications while making them easier to create, deploy, and manage. Building on the strengths of SQL Server 2000, SQL Server 2005 will provide an integrated data management and analysis solution that will help organizations of any size to:

Build and deploy enterprise applications that are more secure, scalable, and reliable.

Maximize the productivity of IT by reducing the complexity of creating, deploying, and managing database applications.


Empower developers through a rich, flexible, modern development environment for creating more secure database applications.

Share data across multiple platforms, applications, and devices to make it easier to connect internal and external systems.

Deliver robust, integrated business intelligence solutions that help drive informed business decisions and increase productivity across your entire organization.

Control costs without sacrificing performance, availability, or scalability.

Read on to learn more about the advancements SQL Server 2005 will deliver in three key areas: enterprise data management, developer productivity, and business intelligence.

Enterprise Data Management

In today's connected world, data and the systems that manage that data must always be available to your users. With SQL Server 2005, users and IT professionals across your organization will benefit from reduced application downtime, increased scalability and performance, and tight security controls. SQL Server 2005 will include enhancements to enterprise data management in the following areas:

Availability. Investments in high availability technologies, additional backup and restore capabilities, and replication enhancements will enable enterprises to build and deploy highly reliable applications.

Scalability. Scalability advancements such as partitioning, snapshot isolation, and 64-bit support will enable you to build and deploy your most demanding applications using SQL Server 2005.

Security. Enhancements such as secure by default settings and an enhanced security model will help provide a high level of security for your enterprise data.

Manageability. A new management tool suite, expanded self-tuning capabilities, and a powerful new programming model will increase the productivity of database administrators.

Interoperability. Through deep support for industry standards, Web services, and the Microsoft .NET Framework, SQL Server 2005 will support interoperability with multiple platforms, applications, and devices.


Developer Productivity

One of the key barriers to developer productivity has been the lack of integrated tools for database development and debugging. SQL Server 2005 will provide advancements that fundamentally change the way that database applications are developed and deployed. Enhancements for developer productivity will include:

Improved tools. Developers will be able to utilize one development tool for Transact-SQL, XML, Multidimensional Expression (MDX), and XML for Analysis (XML/A).

Expanded language support. With the common language runtime (CLR) hosted in the database engine, developers will be able to choose from a variety of familiar languages to develop database applications, including Transact-SQL, Microsoft Visual Basic .NET, and Microsoft Visual C#.NET.

XML and Web services. SQL Server 2005 will support both relational and XML data natively, so enterprises can store, manage, and analyze data in the format that best suits their needs.

Support for existing and emerging open standards such as Hypertext Transfer Protocol (HTTP), XML, Simple Object Access Protocol (SOAP), XQuery, and XML Schema Definition (XSD) will also facilitate communication across extended enterprise systems.

Business Intelligence

The challenge and promise of business intelligence revolves around providing employees with the right information at the right time. Accomplishing this vision demands a business intelligence solution that is comprehensive, secure, integrated with operational systems, and available all day, every day. SQL Server 2005 will help companies to achieve this goal. Business intelligence advancements will include:

Integrated platform. SQL Server 2005 will deliver an end-to-end business intelligence platform with integrated analytics including online analytical processing (OLAP); data mining; extract, transformation, and load (ETL) tools; data warehousing; and reporting functionality.


Improved decision making. Advancements to existing business intelligence features, such as OLAP and data mining, and the introduction of a new reporting server will provide enterprises with the ability to transform information into better business decisions at all organizational levels.

Security and availability. Scalability, availability, and security enhancements will help to provide users with uninterrupted access to business intelligence applications and reports.

Enterprise-wide analytical capabilities. An improved ETL tool will enable organizations to more easily integrate and analyze data from multiple heterogeneous information sources. By analyzing data across a wide array of operational systems, organizations may gain a competitive edge through a holistic understanding of their business.

Additional Information

SQL Server 2005 is part of the Windows Server System, a comprehensive and integrated server infrastructure that simplifies the development, deployment, and operation of a flexible business solution.


CHAPTER 8 IMPLEMENTATION


IMPLEMENTATION.

8. IMPLEMENTATION

8.1 FORMS DESIGN

The development stage takes as its primary input the design elements described in the approved design document. For each design element, a set of one or more software artifacts will be produced. Appropriate test cases will be developed for each set of functionally related software artifacts, and an online help system will be developed to guide users in their interactions with the software.

Fig 8.1 Forms design

At this point, the RTM is in its final configuration. The outputs of the development stage include a fully functional set of software that satisfies the requirements and design elements previously documented, an online help system, a test plan that describes the test cases to be used to validate the correctness and completeness of the software, an updated RTM, and an updated project plan.


8.2 SOURCE CODE

Source code for new user


<script language="javascript">
<!--
// Checks that the given form field contains only the digits 0-9.
function number(field)
{
    var input = field.value;
    var status = true;
    for (var i = 0; i < input.length; i++)
    {
        var ch = input.charAt(i);
        if (ch < "0" || ch > "9")
        {
            status = false;
            break;
        }
    }
    if (!status)
    {
        alert("Enter the Numeric Input...");
        field.focus();
    }
    return status;
}

// Checks that the user name, password and user type have been supplied.
function validate()
{
    var user = document.form1.T1.value;
    var pass = document.form1.T2.value;
    var cate = document.form1.T3.value;
    if (user.length < 1)
    {
        alert(" Enter The Username....");
        document.form1.T1.focus();
        return false;
    }
    else if (pass.length < 1)
    {
        alert(" Enter The Password....");
        document.form1.T2.focus();
        return false;
    }
    else if (cate == "Select Type")
    {
        alert(" Select The User Type....");
        document.form1.T3.focus();
        return false;
    }
    return true;
}
// -->
</script>


Source code for index


</style> <![endif]-->
<script type="text/javascript">
<!--
// Creates an Image object for the given URL so that the browser preloads it.
function newImage(arg)
{
    if (document.images)
    {
        var rslt = new Image();
        rslt.src = arg;
        return rslt;
    }
}

// Swaps image sources; the arguments are (image name, new source) pairs.
function changeImages()
{
    if (document.images && (preloadFlag == true))
    {
        for (var i = 0; i < changeImages.arguments.length; i += 2)
        {
            document[changeImages.arguments[i]].src = changeImages.arguments[i+1];
        }
    }
}

// Preloads the rollover images for the navigation buttons.
var preloadFlag = false;
function preloadImages()
{
    if (document.images)
    {
        btn_home_over = newImage("images/btn_home-over.gif");
        btn_aboutus_over = newImage("images/btn_aboutusover.gif");
        btn_contactus_over = newImage("images/btn_contactusover.gif");
        btn_products_over = newImage("images/btn_productsover.gif");
        btn_services_over = newImage("images/btn_servicesover.gif");
        preloadFlag = true;
    }
}
// -->
</script>

Source code for search


<script type="text/javascript">
<!--
// Creates an Image object for the given URL so that the browser preloads it.
function newImage(arg)
{
    if (document.images)
    {
        var rslt = new Image();
        rslt.src = arg;
        return rslt;
    }
}

// Redirects the browser to the response page.
function forward()
{
    <%
        // This scriptlet runs on the server while the page is generated,
        // not in the browser when forward() is called.
        String text = request.getParameter("textfield");
        String search1 = request.getParameter("search");
        request.setAttribute("text", text);
        request.setAttribute("search", search1);
    %>
    document.location = "response.jsp";
}

// Redirects the browser to the page that shows the best site.
function forward1()
{
    document.location = "Bestone.jsp";
}

// Swaps image sources; the arguments are (image name, new source) pairs.
function changeImages()
{
    if (document.images && (preloadFlag == true))
    {
        for (var i = 0; i < changeImages.arguments.length; i += 2)
        {
            document[changeImages.arguments[i]].src = changeImages.arguments[i+1];
        }
    }
}

// Preloads the rollover images for the navigation buttons.
var preloadFlag = false;
function preloadImages()
{
    if (document.images)
    {
        btn_home_over = newImage("images/btn_home-over.gif");
        btn_aboutus_over = newImage("images/btn_aboutusover.gif");
        btn_contactus_over = newImage("images/btn_contactusover.gif");
        btn_products_over = newImage("images/btn_productsover.gif");
        btn_services_over = newImage("images/btn_servicesover.gif");
        preloadFlag = true;
    }
}
// -->
</script>

Source code for page Ranking


<%@ page import="java.io.*"%>
<%@ page import="java.sql.*"%>
<%@ page import="java.lang.*"%>
<%
    Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
    Connection con = DriverManager.getConnection("jdbc:odbc:truth", "sa", "");
    String searchword = (String) application.getAttribute("text");
    String method = (String) application.getAttribute("method");
    String user = (String) session.getAttribute("uname");
    //out.println(searchword+"<br>"+method);

    // Look up the directory registered for the search word. A parameterized
    // query keeps the search word out of the SQL text.
    PreparedStatement ps = con.prepareStatement("select location from MainDb where filename=?");
    ps.setString(1, searchword);
    ResultSet rs = ps.executeQuery();
    String location = "";
    while (rs.next()) {
        location = rs.getString(1);
        out.println(location);
    }

    String filename = "";
    String filenames[] = null;
    File path = new File(location);
    File files[] = path.listFiles();
    if (files != null) {
        for (int i = 0; i < files.length; i++) {
            // System.out.println("compiling");
            try {
                filename = files[i].toString();
                // Split the Windows path on '\' so that the last element is the bare file name.
                String names = filename.replace('\\', '&');
                filenames = names.split("&");
                //out.println(filenames.length+"<br>");
                //Thread.sleep(1000);
                out.println(filenames[filenames.length - 1] + "<br>");
                out.println(filename + "<br>");
            } catch (Exception ee) {
                ee.printStackTrace();
            }
        }
    } else {
        //proxy1.jLabel3.setText("completed.....");
    }
%>

Source code for Truth Finder

<%@ page import="java.io.*"%>
<%@ page import="java.sql.*"%>
<%@ page import="java.lang.*"%>
<%!
    String resultfiles[];
    int retrive;
    String factlength;
%>
<%
    String result = "";
    String valu = request.getParameter("value");
    String searchword = "";
    factlength = (String) application.getAttribute("fact");
    int fac = Integer.parseInt(factlength);
    int retrive = fac - 2;
    // Both branches read the same attribute. The null-safe comparison avoids a
    // NullPointerException when the "value" parameter is missing.
    if ("redirect".equalsIgnoreCase(valu)) {
        searchword = (String) application.getAttribute("text");
        //out.println(searchword);
    } else {
        searchword = (String) application.getAttribute("text");
    }
    //String resultfile=request.getParameter("result");
    //resultfiles=resultfile.split("$");
    Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
    Connection con = DriverManager.getConnection("jdbc:odbc:truth", "sa", "");

    String method = (String) request.getAttribute("method");
    String user = (String) session.getAttribute("uname");
    // Locations recorded for the search word, highest support first.
    PreparedStatement ps = con.prepareStatement("select location from support where about=? order by support desc");
    ps.setString(1, searchword);
    ResultSet rs = ps.executeQuery();
    String location = "";
    while (rs.next()) {
        result += rs.getString(1) + "#";
    }
    //out.println(result);
    /* Disabled directory scan retained from the original listing:
    String filename = "";
    String filenames[] = null;
    File path = new File(location);
    File files[] = path.listFiles();
    if (files != null) {
        for (int i = 0; i < files.length; i++) {
            // System.out.println("compiling");
            try {
                filename = files[i].toString();
                //String names=filename.replace('\\','&');
                //filenames=names.split("&");
                //out.println(filenames.length+"<br>");
                //Thread.sleep(1000);
                //out.println(filenames[filenames.length-1]+"<br>");
                result += filename + "#";
            } catch (Exception ee) {
                ee.printStackTrace();
            }
        }
    } else {
        //proxy1.jLabel3.setText("completed.....");
    }
    */
%>
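Both listings above handle file locations as raw strings: the page-ranking page takes the last path segment to display a file name, and the Truth Finder page joins the locations returned by the query with '#'. A minimal sketch of those two string operations in plain Java follows. The class and method names (LocationStrings, fileNameOf, splitLocations) and the sample paths are illustrative assumptions and do not appear in the project source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the project source.
final class LocationStrings {

    // Returns the last path segment, accepting both '\' and '/' separators.
    static String fileNameOf(String location) {
        int cut = Math.max(location.lastIndexOf('\\'), location.lastIndexOf('/'));
        return location.substring(cut + 1);
    }

    // Splits a '#'-terminated list such as "a#b#" into its non-empty entries.
    static List<String> splitLocations(String joined) {
        List<String> out = new ArrayList<>();
        for (String part : joined.split("#")) {
            if (!part.isEmpty()) {
                out.add(part);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample value in the same shape as the 'result' string.
        String joined = "C:\\pages\\one.html#C:\\pages\\two.html#";
        for (String location : splitLocations(joined)) {
            System.out.println(fileNameOf(location));
        }
    }
}
```

Searching for the separator directly, rather than replacing it with '&' and splitting, avoids surprises when a file name itself contains '&'.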


8.3 OUTPUT SCREENS

[Output screen images are not reproduced in this text.]

CHAPTER 9 TESTING & VALIDATION

9. TESTING AND VALIDATION


9.1 INTRODUCTION
Software testing is a critical element of software quality assurance and represents the ultimate review of specification, design and coding. Testing presents an interesting anomaly for the software engineer, whose work up to this point has been constructive, whereas testing sets out to find faults in what was built. Testing is the next step after coding. Test case design focuses on a set of techniques for creating test cases that meet the overall testing objectives. The main objective of testing is to execute a program with the intent of finding an error. Testing can be performed in two ways:
* White box testing
* Black box testing

WHITE BOX TESTING: Knowing the internal workings of the product, tests can be conducted to ensure that "all gears mesh", that is, that all internal components have been adequately exercised. Using white box testing methods, the software engineer can derive test cases that:
* Guarantee that all independent paths in a module have been exercised at least once (basis path testing).
* Exercise all logical decisions on their true and false sides (condition testing).
* Execute all loops at their boundaries and within their operational bounds (loop testing).
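The three white box techniques can be illustrated on a very small unit. The sketch below is only an example: the method countMatches and its inputs are hypothetical and are not taken from the project source. The checks exercise the loop with zero, one and several iterations, and the decision on both its true and false sides.

```java
// Hypothetical unit under test; not taken from the project source.
final class WhiteBoxSketch {

    // Counts how many page texts contain the search word.
    static int countMatches(String[] pages, String word) {
        int count = 0;
        for (String page : pages) {          // loop: 0, 1 and many iterations
            if (page.contains(word)) {       // decision: true and false sides
                count++;
            }
        }
        return count;
    }

    static void check(boolean condition, String label) {
        if (!condition) {
            throw new AssertionError(label);
        }
    }

    public static void main(String[] args) {
        // Loop testing: zero iterations, one iteration, several iterations.
        check(countMatches(new String[] {}, "java") == 0, "empty input");
        check(countMatches(new String[] {"java tips"}, "java") == 1, "one page");
        check(countMatches(new String[] {"java", "sql", "java ee"}, "java") == 2, "many pages");
        // Condition testing: the false side of the decision on its own.
        check(countMatches(new String[] {"sql"}, "java") == 0, "no match");
        System.out.println("all white-box checks passed");
    }
}
```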

BLACK BOX TESTING: Knowing the specified function that a product has been designed to perform, tests can be conducted that demonstrate that each function is fully operational while at the same time searching for errors in each function.

Black box testing focuses on the functional requirements of the software. It attempts to find errors in the following categories:
* Incorrect or missing functions
* Interface errors
* Errors in data structures or external database access
* Performance errors
* Initialization and termination errors

UNIT TESTING: Unit testing focuses verification effort on the smallest unit of software design, the module. It is white box oriented. In this project every module is tested in the following ways:
* The module interfaces are tested to ensure that information flows properly into and out of the program unit.
* The local data structures are examined to ensure that data stored temporarily maintains its integrity during all steps of an algorithm's execution.
* Boundary conditions are tested to ensure that the module operates properly at the boundaries established to limit or restrict processing.
* All independent paths (basis paths) through the control structure are exercised to ensure that all statements in a module have been executed at least once.
* Error handling paths are tested.

INTEGRATION TESTING: Integration testing is a systematic technique for constructing the program structure while conducting tests to uncover errors associated with interfacing. In this project top-down integration was followed, in which the modules are integrated by moving downward through the control hierarchy, beginning from the main module. Within this approach, depth-first integration was used: all modules on a major control path of the structure are integrated first, moving vertically down the structure.
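Top-down integration is commonly carried out with stubs that stand in for lower-level modules until those modules are integrated. The following is a minimal sketch of that idea in Java. All names (PageStore, StubPageStore, SearchController) are hypothetical and are not the project's classes; only the page name ErrorPage.jsp is borrowed from the test cases in this report.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical types for illustration; none of these names come from the project.
final class TopDownSketch {

    // Lower-level module as seen by the control module.
    interface PageStore {
        List<String> pagesAbout(String word);
    }

    // Stub standing in for a lower-level module that is not integrated yet.
    static final class StubPageStore implements PageStore {
        public List<String> pagesAbout(String word) {
            return Arrays.asList("stub-page-for-" + word);
        }
    }

    // Top-level control module, tested first against the stub.
    static final class SearchController {
        private final PageStore store;

        SearchController(PageStore store) {
            this.store = store;
        }

        String render(String word) {
            List<String> pages = store.pagesAbout(word);
            return pages.isEmpty() ? "ErrorPage.jsp" : String.join(",", pages);
        }
    }

    public static void main(String[] args) {
        // Step 1: exercise the control module with the stub in place.
        SearchController withStub = new SearchController(new StubPageStore());
        System.out.println(withStub.render("java"));

        // Step 2: replace the stub with another module and rerun the same call.
        PageStore emptyStore = word -> Collections.<String>emptyList();
        System.out.println(new SearchController(emptyStore).render("java"));
    }
}
```

Because the controller depends only on the interface, each stub can be swapped for the real module one at a time while the earlier tests are repeated.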

TESTING OBJECTIVES INCLUDE:
1. Testing is a process of executing a program with the intent of finding an error.
2. A successful test is one that uncovers an as-yet-undiscovered error.
3. A good test case is one that has a high probability of finding an as-yet-undiscovered error.

TESTING PRINCIPLES:
* All tests should be traceable to end-user requirements.
* Tests should be planned long before testing begins.
* Testing should begin on a small scale and progress towards testing in the large.
* Exhaustive testing is not possible.
* To be most effective, testing should be conducted by an independent third party.

TESTING STRATEGIES

A strategy for software testing integrates software test cases into a series of well-planned steps that result in the successful construction of software. Software testing is one part of a broader topic that is referred to as verification and validation. Verification refers to the set of activities that ensure that the software correctly implements a specific function. Validation refers to the set of activities that ensure that the software that has been built is traceable to customer requirements.

9.2 TEST CASES:


S.No. | Test Type | Test Case | Expected Output | Actual Output
1 | Unit/Operational/Functional | Login as a user | Success | Search.jsp page is displayed
2 | Unit/Operational/Functional | Login as a user with wrong login details | Failure | Error message is displayed
3 | Unit/Operational/Functional | Add a new entry for a user | Success | A new record with the new user details is added to the database
4 | Unit/Operational/Functional | Search for web pages with normal search | Success | Displays all the web pages that contain the given text
5 | Unit/Operational/Functional | Search for web pages with Page Ranking search | Success | Displays the web pages ordered by how often they have been visited
6 | Unit/Operational/Functional | Search for web pages with TruthFinder | Success | Displays the web pages that are specifically related to the given text
7 | Unit/Operational/Functional | Search for web pages when no page contains the given text | Failure | Displays ErrorPage.jsp

Table 9.2.1 Test cases
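The first three test cases can also be written as automated checks. The sketch below assumes a hypothetical in-memory login module (LoginCasesSketch) and a made-up user name. The project's actual login code is not listed in this chapter and is assumed to check a database table instead of a map, so this shows only the shape of the checks.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the login module; the real project code is assumed
// to use a database table rather than an in-memory map.
final class LoginCasesSketch {
    private final Map<String, String> users = new HashMap<>();

    // Test case 3: adding a new entry stores one new record.
    boolean register(String name, String password) {
        return users.putIfAbsent(name, password) == null;
    }

    // Test cases 1 and 2: returns the page the user should see next.
    String login(String name, String password) {
        boolean ok = password != null && password.equals(users.get(name));
        return ok ? "Search.jsp" : "error";
    }

    int userCount() {
        return users.size();
    }

    static void check(boolean ok, String label) {
        if (!ok) {
            throw new AssertionError(label);
        }
    }

    public static void main(String[] args) {
        LoginCasesSketch app = new LoginCasesSketch();
        check(app.register("user1", "secret"), "case 3: new entry accepted");
        check(app.userCount() == 1, "case 3: one record stored");
        check(app.login("user1", "secret").equals("Search.jsp"), "case 1: valid login");
        check(app.login("user1", "wrong").equals("error"), "case 2: wrong password");
        check(app.login("nobody", "secret").equals("error"), "case 2: unknown user");
        System.out.println("login test cases passed");
    }
}
```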


CHAPTER 10 CONCLUSION

10. CONCLUSION
In this project, we introduce and formulate the Veracity problem, which aims at resolving conflicting facts from multiple websites and finding the true facts among them. We propose TRUTHFINDER, an approach that utilizes the interdependency between website trustworthiness and fact confidence to find trustworthy websites and true facts. Experiments show that TRUTHFINDER achieves high accuracy at finding true facts and at the same time identifies websites that provide more accurate information.
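The interdependency between website trustworthiness and fact confidence can be pictured as an iteration in which each quantity is recomputed from the other. The following is a minimal sketch in Java under stated assumptions: the class name, the constants and the exact scoring formulas (a log-based trust score, logistic squashing and a fixed penalty for conflicting values) are chosen here for illustration and are not taken from the project's source code. The claims in main are made-up data.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of a trust/confidence iteration; all formulas are assumptions.
final class TruthFinderSketch {

    // One claim: a website states a value (the "fact") for an object.
    static final class Claim {
        final String site, object, value;
        Claim(String site, String object, String value) {
            this.site = site; this.object = object; this.value = value;
        }
        // Map key for the fact; assumes '=' does not occur in object names.
        String fact() { return object + "=" + value; }
    }

    static final double INITIAL_TRUST = 0.9; // assumed starting trust for every site
    static final double DAMPING = 0.3;       // assumed logistic damping factor
    static final double CONFLICT = 0.5;      // assumed weight of conflicting facts

    // Returns fact -> confidence after a fixed number of rounds.
    static Map<String, Double> run(List<Claim> claims, int rounds) {
        Map<String, Double> trust = new LinkedHashMap<>();
        for (Claim c : claims) trust.put(c.site, INITIAL_TRUST);
        Map<String, Double> confidence = new LinkedHashMap<>();

        for (int r = 0; r < rounds; r++) {
            // Step 1: each fact collects the trust scores of the sites stating it.
            Map<String, Double> score = new LinkedHashMap<>();
            for (Claim c : claims) {
                double tau = -Math.log(1.0 - trust.get(c.site));
                score.merge(c.fact(), tau, Double::sum);
            }
            // Step 2: other values claimed for the same object count against a fact.
            confidence.clear();
            for (Claim c : claims) {
                double adjusted = score.get(c.fact());
                for (Map.Entry<String, Double> other : score.entrySet()) {
                    boolean sameObject = other.getKey().startsWith(c.object + "=");
                    if (sameObject && !other.getKey().equals(c.fact())) {
                        adjusted -= CONFLICT * other.getValue();
                    }
                }
                confidence.put(c.fact(), 1.0 / (1.0 + Math.exp(-DAMPING * adjusted)));
            }
            // Step 3: a site's trust is the average confidence of the facts it states.
            Map<String, Double> sum = new LinkedHashMap<>();
            Map<String, Integer> count = new LinkedHashMap<>();
            for (Claim c : claims) {
                sum.merge(c.site, confidence.get(c.fact()), Double::sum);
                count.merge(c.site, 1, Integer::sum);
            }
            for (String site : sum.keySet()) {
                trust.put(site, sum.get(site) / count.get(site));
            }
        }
        return confidence;
    }

    public static void main(String[] args) {
        // Made-up claims: two sites agree on one value, a third disagrees.
        List<Claim> claims = Arrays.asList(
            new Claim("siteA", "item1", "v1"),
            new Claim("siteB", "item1", "v1"),
            new Claim("siteC", "item1", "v2"));
        for (Map.Entry<String, Double> e : run(claims, 5).entrySet()) {
            System.out.printf("%s -> %.3f%n", e.getKey(), e.getValue());
        }
    }
}
```

In this sketch the value backed by two sites ends up with a higher confidence than the value backed by one, and the sites that state it end up with higher trust. A fuller treatment would also model partial agreement between similar facts and stop on convergence rather than after a fixed number of rounds.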

10.1 SCOPE FOR FUTURE ENHANCEMENT


In real-time use, the system could return the best results for every search query. An administrator could be notified of each user's requirements. After registration, users could upload and download the data related to those requirements. Larger data sets could also be loaded into the database.


