
EVENT DETECTION USING A MULTIMEDIA DATA MINING FRAMEWORK

R.SHANKAR, MASTER OF COMPUTER APPLICATIONS, MUTHAYAMMAL ENGINEERING COLLEGE, EMAIL: shankarzoom@gmail.com
V.SABARI GANESH, MASTER OF COMPUTER APPLICATIONS, MUTHAYAMMAL ENGINEERING COLLEGE, EMAIL: v.sabariganesh@yahoo.com

ABSTRACT

The main purpose of the proposed framework is to achieve full automation in event detection and to extract the video event required by the user through the integration of distance-based and rule-based data mining techniques. The system performs comprehensive video analysis by extracting multimodal video features from audio/visual sources. We propose to use the integrated data mining techniques to remove the class imbalance issue along the process and to reconstruct and refine the feature dimension automatically. In this system, a common set of low-level and middle-level multimodal features is extracted from the video and fed to the data mining component without the need of domain knowledge. The distance-based data mining technique is used in the decision-making stage to prune the data and prevent the class imbalance issue. The rule-based data mining technique is used to make the final extraction of the video event. The experimental performance on goal/corner event detection from football videos and other sports videos will be used to demonstrate the effectiveness of the proposed framework. Furthermore, the proposed multimedia data mining framework can be extended to a wide range of different application domains, due to the domain-free characteristic of the data mining techniques involved.

INTRODUCTION

Our aim is to automate the video event detection procedure through the combination of distance-based and rule-based data mining techniques. With the proliferation of video data and growing requests for video applications, there is an increasing need for advanced technologies for indexing, filtering, searching, and mining the vast amount of videos, such as event detection and concept extraction. In this project, we target addressing the aforementioned challenges, to some extent, with the adoption of multimodal content analysis and the intelligent integration of distance-based and rule-based data mining techniques. Here, video events are normally defined as the interesting events which capture users' attention (e.g., goal events, traffic accidents, etc.), whereas the concepts refer to high-level semantic features. In the literature, most of the state-of-the-art event detection frameworks were conducted toward videos with loose structures or without story units, such as sports videos, surveillance videos, or medical videos. Most such studies are conducted in a two-stage procedure. We name the first stage video processing, where the video clip is segmented into certain analysis units called 'clips' and their representative features are extracted. The second stage is called the decision-making and video detection process, which extracts the semantic index from the feature descriptors to make the framework more robust.

EXISTING SYSTEM

The existing system uses a two-stage procedure. The first stage is named video content processing, where the video clip is segmented into certain analysis units and their representative features are extracted. Video content processing stores the video in the database after dividing the video into smaller units. The second stage is called the decision-making process, which extracts the semantic index from the feature descriptors. The decision-making process retrieves the video from the database based upon the user's requirements. The decision-making stage mostly depends on domain knowledge.

DISADVANTAGES OF EXISTING SYSTEM
• The process of video detection is time consuming in the decision-making process.
• Dependency on domain knowledge limits extensibility in handling other application domains.
• A large number of negative instances appear in the detection model process.
• The accuracy of the results is not always guaranteed.
• There is a very small percentage of positive instances.

PROPOSED SYSTEM

The proposed framework offers a unique solution for both event detection and concept extraction. It greatly eliminates the dependency on domain knowledge and automates the process of event/concept detection. The proposed system is divided into three parts: one for video segmentation and the other two for proper detection and retrieval of the correct video. Its uniqueness lies in the fact that a subspace-based (a special distance-based) data mining technique is used in the decision-making stage to prune the data and alleviate the class imbalance issue.

ADVANTAGES OF PROPOSED SYSTEM

• Greatly eliminates the dependency on domain knowledge
• Higher accuracy of results
• Reduces the number of negative samples detected
• Saves time

SYSTEM DESIGN

SYSTEM ARCHITECTURE

In the system architecture, the following process takes place:
• The first step is video parsing, also called syntactic segmentation. It involves temporal partitioning of the video sequence into meaningful units called 'clips'.
• The second step is Typical Instance Self-Learning, since it is well acknowledged that noisy or irrelevant data contained in the original data set would adversely impact the detection performance.
• The third step is Instance Filtering; the original data set is randomly split into two disjoint subsets.
• The fourth step is the Self-Refining Training Data Set; the training data set has a great influence on the later rule-based data mining process.
• The last step is final detection, which uses the Hunt decision tree structure to retrieve the most accurate video clip.
• The algorithm makes use of the decision tree method to recursively traverse through the branches and nodes, and removes the nodes that do not satisfy the user's query.

BLOCK DIAGRAM

MODULES

• Video Parsing and Feature Extraction.
• Distance-Based Data Mining.
• Rule-Based Data Mining.

MODULE EXPLANATION:
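Before the individual modules are described, the end-to-end flow of the three modules can be sketched in miniature. The sketch below is illustrative only: the function names, the fixed shot length, the toy (mean, max) feature vector, and the single-centroid distance rule are all assumptions made for demonstration, not part of the actual framework.

```python
def parse_and_extract(frames, shot_len=4):
    """Module 1 (illustrative): split a frame sequence into fixed-length
    'shots' and compute one feature vector (mean, max) per shot."""
    shots = [frames[i:i + shot_len] for i in range(0, len(frames), shot_len)]
    return [(sum(s) / len(s), max(s)) for s in shots]

def distance_based_pruning(features, centroid, radius):
    """Module 2 (illustrative): keep only shots whose feature vector lies
    within `radius` of a positive-class centroid, pruning likely negatives."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return [f for f in features if dist(f, centroid) <= radius]

def rule_based_detection(features, threshold):
    """Module 3 (illustrative): a one-rule stand-in for the decision tree
    that flags a shot as an event when its mean feature exceeds a threshold."""
    return [f for f in features if f[0] > threshold]

# Toy run: 16 fake frame intensities -> 4 shots -> pruned -> detected events.
feats = parse_and_extract([0, 1, 2, 3, 10, 11, 12, 13, 2, 2, 2, 2, 20, 21, 22, 23])
events = rule_based_detection(distance_based_pruning(feats, (12, 13), 5), 5)
```

The ordering mirrors the framework: pruning the candidate set before classification is what keeps the final rule-based stage from being swamped by negative instances.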
A textual query language specifies queries using an extended Structured Query Language (SQL) to access the database. The POS-based pattern-matching approach we use in identifying queries helps users specify queries without conforming to strict rules.

VIDEO PARSING AND FEATURE EXTRACTION

Video parsing, also called syntactic segmentation, involves temporal partitioning of the video sequence into meaningful units which then serve as the basis for descriptor extraction and semantic annotation. In this study, shots are adopted as the basic syntactic unit, as they are widely accepted as a self-contained and well-defined unit, and our shot-boundary detection algorithm, consisting of pixel-histogram comparison, segmentation map comparison, and object tracking, is employed. In essence, the differences between consecutive frames are compared in terms of their pixel/histogram values, segmented region characteristics, and foreground objects' size/location, and a shot boundary is detected when the difference reaches a certain threshold. Here, the segmentation map and object information are extracted using the simultaneous partition and class parameter estimation (SPCPE) unsupervised object segmentation method. In terms of feature extraction, multimodal features (visual and audio) are extracted for each shot based on the detected shot boundaries.

DISTANCE-BASED DATA MINING

It is frequently observed that the rare event/concept detection issue arises since the video data amount is typically huge and the ratio of the event/concept instances to the negative instances is typically very small (e.g., less than 1:100 in our goal event detection empirical studies). Accordingly, it would be difficult for a typical detection process to capture such a small portion of targeted instances from the huge amount of data, especially with the existence of the noisy and irrelevant information introduced during the video production and feature extraction processes. Therefore, before performing the actual detection process, a pre-filtering process is needed to trim as many negative instances as possible. This paper proposes to eliminate a great portion of negative instances and thus to overcome the rare event/concept detection issue. In brief, it contains three automatic schemes: typical instance self-learning with feedback; instance filtering and feature reconstruction and selection; and a self-refining training data set.

RULE-BASED DATA MINING

The Hunt decision tree is a classifier in the form of a tree structure, where each node is either a leaf node, indicating the value of the target class from observations, or a decision node, which specifies certain tests to be carried out on a single attribute value and which is followed by either a branch or a sub-tree for each of the possible outcomes of the test. Its main classification procedure is first to construct a model for each class in terms of the attribute-value pairs and then to use the induced model to categorize any incoming testing instance. The construction of a decision tree is performed through the so-called "variable-valued" approach, i.e., recursively partitioning the training set with respect to certain criteria until all the instances in a partition have the same class label, or no more attributes can be used for further partitioning. The derived model summarizes all given information from the training data set but expresses it in a more concise and perspicuous manner.
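The recursive partitioning just described can be sketched as follows. This is a minimal Hunt-style splitter on numeric attributes, written purely for illustration; the split-scoring rule (sum of per-side purities), the depth cap, and all the names are assumptions rather than the framework's exact algorithm.

```python
def purity(index_subset, labels):
    """Fraction of the majority class within a partition."""
    sub = [labels[i] for i in index_subset]
    return max(sub.count(c) for c in set(sub)) / len(sub)

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively partition (rows, labels) Hunt-style: stop when a
    partition is pure (one class) or the depth cap is reached."""
    if len(set(labels)) == 1:               # pure partition -> leaf
        return labels[0]
    if depth >= max_depth or not rows:      # fallback: majority vote
        return max(set(labels), key=labels.count)
    best = None
    for attr in range(len(rows[0])):
        for threshold in sorted({r[attr] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[attr] <= threshold]
            right = [i for i in range(len(rows)) if i not in left]
            if not left or not right:       # skip degenerate splits
                continue
            score = purity(left, labels) + purity(right, labels)
            if best is None or score > best[0]:
                best = (score, attr, threshold, left, right)
    if best is None:
        return max(set(labels), key=labels.count)
    _, attr, threshold, left, right = best
    return {
        "attr": attr, "<=": threshold,
        "left": build_tree([rows[i] for i in left],
                           [labels[i] for i in left], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right],
                            [labels[i] for i in right], depth + 1, max_depth),
    }

def classify(tree, row):
    """Traverse from the root to a leaf; the leaf's label is the prediction."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["attr"]] <= tree["<="] else tree["right"]
    return tree
```

A trained tree is just nested decision nodes ending in class labels, so classification is the root-to-leaf traversal described in the text.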
In the constructed tree, each data entry consists of audio and visual features as well as the class label. The multimodal features are extracted in the feature extraction stage as discussed earlier. A "yes" or "no" class label is assigned to each shot manually, indicating whether or not it contains an event/concept of interest. The testing process of the decision tree takes the form of traversing a path in the built tree from the root to a certain leaf node, and the corresponding class label is assigned to the instance when it reaches a leaf node. In our proposed framework, given the resulting training data set from the distance-based data mining process, the Hunt decision tree algorithm is adopted to learn a classifier, and the induced classification rules are represented in the form of a decision tree.

Fig: Rule-Based Data Mining.

Fig: Distance-Based Data Mining.

In summary, the performance of our framework remains reasonably good and consistent without the inference of any domain knowledge. This experiment demonstrates the effectiveness and potential of the proposed framework in the domain of video semantic analysis.

C. Comparative Experiment #2

In the second experiment, a group of well-recognized data classification methods, such as the one rule classifier (OR for short), random forest (RF), and logistic (LG), are adopted for event/concept detection with and without the data pruning process (i.e., the subspace data mining step). As can be seen from Table V, all the data classifiers perform poorly without the data pruning process. In many cases, none
of the event/concept units can be detected, and thus the recall values equal zero. In contrast, Table VI shows the results of conducting data classification after the data pruning process. Clearly, both recall and precision scores increase dramatically in all cases. This experiment thus demonstrates the capability of the subspace-based data mining approach as an independent component in tackling the rare event/concept detection issue. On the other hand, we also show that a better detection performance can be achieved by reducing class imbalance situations.

In terms of feature extraction, multimodal features (visual and audio) are extracted for each shot based on the detected shot boundaries. In total, five visual features are extracted for each shot, namely pixel_change, histo_change, background_mean, background_var, and dominant_color_ratio. Here, pixel_change denotes the average percentage of changed pixels between the consecutive frames within a shot, and histo_change represents the mean value of the frame-to-frame histogram differences in a shot. Another visual feature is the dominant_color_ratio [4], which represents the ratio of the dominant color in the frame based on histogram analysis and is widely used for shot classification. Then region-level analysis is conducted based on segmentation results (the background and foreground regions identified by SPCPE). The features background_mean and background_var are used to capture the shot-level mean and standard deviation of the color values for each segmented frame, respectively.

CONCLUSION AND FUTURE WORK

Video event/concept detection is of great importance in video indexing, retrieval, and summarization. However, the semantic gap and rare concept/event detection issues inhibit the viability of the existing approaches in diverse event/concept detection domains. To address these issues, in this paper, a novel subspace-based multimedia data mining framework is proposed that utilizes multimodal content analysis and distance-based and rule-based data mining techniques. One of the unique contributions of the proposed framework is that it is automatic without the need of domain knowledge and thus can be easily extended to various application domains. The relaxation of the domain-knowledge requirement is achieved by adopting several distance-based data mining schemes to alleviate the class imbalance issue and to reconstruct and reduce the feature dimension. Thereafter, the decision tree is employed to construct the training model for the final event/concept detection. The experimental results demonstrate the effectiveness and adaptability of the proposed framework for concept/event detection. In our future work, this framework will be tested and extended in more concept/event detection applications, such as detecting significant events from surveillance videos and other important concepts (indoors, outdoors, landscape, etc.) from the TRECVID videos.

REFERENCES:

[1] A. Amir et al., “IBM research TRECVID-2003 video retrieval system,” in NIST TRECVID, 2003.
[2] S.-C. Chen, M.-L. Shyu, C. Zhang, and R. L. Kashyap, “Identifying overlapped objects for video indexing and modeling in multimedia database systems,” Int. J. Artific. Intell. Tools, vol. 10, no. 4, pp. 715–734, Dec. 2001.
[3] S.-C. Chen, M.-L. Shyu, and C. Zhang, “Innovative shot boundary detection for video indexing,” in Video Data Management and Information Retrieval, S. Deb, Ed. Hershey, PA: Idea Group Publishing, 2005, pp. 217–236.
[4] M. Chen, S.-C. Chen, M.-L. Shyu, and K. Wickramaratna, “Semantic event detection via temporal analysis and multimodal data mining,” IEEE Signal Processing Mag. (Special Issue on Semantic Retrieval of Multimedia), vol. 23, no. 2, pp. 38–46, Mar. 2006.
[5] S.-C. Chen, M.-L. Shyu, C. Zhang, and M. Chen, “A multimodal data mining framework for soccer goal detection based on decision tree
logic,” Int. J. Comput. Applic. Technol., vol. 27, no. 4, pp. 312–323, 2006.
[6] S. Dagtas and M. Abdel-Mottaleb, “Extraction of TV highlights using multimedia features,” in Proc. IEEE Int. Workshop on Multimedia Signal Processing, Cannes, France, 2001, pp. 91–96.
[7] A. Ekin, A. M. Tekalp, and R. Mehrotra, “Automatic soccer video analysis and summarization,” IEEE Trans. Image Process., vol. 12, no. 7, pp. 796–807, Jul. 2003.
[8] S. Gao, X. Zhu, and Q. Sun, “Exploiting concept association to boost multimedia semantic concept detection,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Honolulu, HI, 2007, vol. 1, pp. 981–984.
[9] B. Han, Support Vector Machines, Center for Information Science and Technology, Temple University, Philadelphia, PA, 2003 [Online]. Available: http://www.ist.temple.edu/~vucetic/cis526fall2003/lecture8.
