Copyright:

Attribution Non-Commercial (BY-NC)


Uploaded by

Balamurugan Velumani

EFFICIENT AND EFFECTIVE DUPLICATE DETECTION IN HIERARCHICAL DATA

Abstract

Although there is a long line of work on identifying duplicates in relational data, only a few solutions focus on duplicate detection in more complex hierarchical structures, like XML data. In this paper, we present a novel method for XML duplicate detection, called XMLDup. XMLDup uses a Bayesian network to determine the probability of two XML elements being duplicates, considering not only the information within the elements, but also the way that information is structured. In addition, to improve the efficiency of the network evaluation, we present a novel pruning strategy, capable of significant gains over the unoptimized version of the algorithm. Through experiments, we show that our algorithm is able to achieve high precision and recall scores on several datasets. XMLDup is also able to outperform another state-of-the-art duplicate detection solution, both in terms of efficiency and of effectiveness.
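As a rough illustration of the general idea (scoring two XML elements from both their values and their structure, and abandoning a comparison early once it cannot reach a threshold), the following Python sketch may help. It is not the XMLDup algorithm: there is no Bayesian network here, only a simple recursive average of string similarities, and the names `text_similarity`, `duplicate_score`, and `threshold` are hypothetical choices made for this example.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def text_similarity(a, b):
    """String similarity in [0, 1]; a stand-in for any value-level measure."""
    a = (a or "").strip().lower()
    b = (b or "").strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def duplicate_score(e1, e2, threshold=0.0):
    """Simplified, assumed scoring scheme (not the paper's Bayesian network).

    Leaves are compared by text. Inner elements are scored as the average,
    over the children of the element with more children, of the best score
    against a same-tag child of the other element. Returns 0.0 as soon as
    the score can no longer reach `threshold` (a basic form of pruning).
    """
    if e1.tag != e2.tag:
        return 0.0
    kids1, kids2 = list(e1), list(e2)
    if not kids1 and not kids2:
        return text_similarity(e1.text, e2.text)
    if len(kids1) < len(kids2):
        kids1, kids2 = kids2, kids1
    n = len(kids1)
    total = 0.0
    for i, c1 in enumerate(kids1):
        total += max(
            (duplicate_score(c1, c2) for c2 in kids2 if c2.tag == c1.tag),
            default=0.0,
        )
        # Upper bound: assume every child not yet compared scores 1.0.
        remaining = n - i - 1
        if (total + remaining) / n < threshold:
            return 0.0  # pruned: the threshold is out of reach
    return total / n


a = ET.fromstring("<movie><title>The Matrix</title><year>1999</year></movie>")
b = ET.fromstring("<movie><title>The Matrix</title><year>1999</year></movie>")
c = ET.fromstring("<movie><title>Casablanca</title><year>1942</year></movie>")

print(duplicate_score(a, b))                 # identical elements
print(duplicate_score(a, c))                 # different movies, lower score
print(duplicate_score(a, c, threshold=0.9))  # comparison is pruned
```

The pruning check relies on an optimistic upper bound, so it only skips comparisons that could not have passed the threshold anyway. The pruning strategy in the paper is described only at a high level in this abstract and may work quite differently.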
