
Multimodal Interfaces

[5] Multimedia content representation & Information Visualization
O. Abou Khaled, D. Lalanne, J. Bapst
Assistants : Tadeusz Senn, Sandro Gerardi, Florian Evquoz, Bruno Dumas

Multimodal
Interfaces

Overview
Multimedia Content Representation
Introduction, definitions, etc.
- Audio, Image, Video
Multimedia content protection (Watermarking)
Multimedia storage and transmission
Digital library, content-based multimedia systems.

Multimedia Information Retrieval
- Audio, image, video
Examples & research projects

Goal: gain a broad understanding of
Multimedia fundamentals and multimedia system construction/access (over networks)
Multimedia Information Retrieval
The MMI team

MMI_05


Overview
Information Visualization
Introduction, definitions
The Power of Information Visualization
Visualization for What?
Examples of Information Visualization

Goal: gain a broad understanding of visualization techniques

What's Multimedia?
Multi: Many
Media:
A means to distribute and represent information: text, graphics, pictures, voice, sound and music.
- Perception media (how do humans perceive information?)
  Audio/visual media
- Representation media (how is information encoded?)
  ASCII, JPG, MPEG, PAL.
- Presentation media (medium used for output/input)
  Input/output media (keyboards, paper)
- Storage media (where is information stored?)
  Magnetic disk, optical disk

Multimedia:
To distribute and present information coded as
- Text, graphics, animation, audio and video
by computer, TV, phone, etc.

Multimedia: a working definition
A combination of two or more categories of information having different transport signal characteristics. Typically, one medium is a continuous medium while another is discrete.
Image, audio, video and graphics are the usual examples of media.

Why do we need Media?
Words cannot exactly describe images.
Speaking is faster than writing.
Listening is easier than reading.
Showing is easier and clearer than describing.

The Types of Media
Perception Media
The nature of information perceived by humans (how do humans perceive information?)
  Auditory media and visual media

Representation Media
How information is represented internally in the computer (how is information encoded in the computer?)
  Character (ASCII), image (JPEG), audio (PCM), video (TV signal, MPEG)

Presentation Media
Physical means used by systems to reproduce information for humans (which medium is used to output information from, or input it into, the computer?)
  Monitors, keyboards, cameras (input/output devices)

The Types of Media (continued)
Storage Media
Physical means for storing computer data (where is information stored?)
  Magnetic tapes, magnetic disks, optical disks.

Transmission Media
Physical means that allow the transmission of signals (which medium is used to transmit data?)
  Cables, radio towers, satellites...

Information Exchange Media
All data media used to transport information (which data medium is used to exchange information between different locations?)

Context
Recent advances in communication technologies, computer science and electronics have facilitated the production and distribution of multimedia data.

A huge quantity of multimedia data is accessible in different domains (education, entertainment, communication, etc.)

Rich content of multimedia data
- Integration of different media (audio, video, still images, text)
- Complex relationships (spatio-temporal, composition, etc.)

Professional and non-professional users need to access multimedia data

=> Important need for elaborate multimedia indexing and retrieval techniques.

Multimedia IR vs. Text IR
People are used to expressing their needs in natural language.

Natural language queries are frequently used for text information retrieval.

Matching between text queries and text documents is more or less straightforward.

Multimedia documents contain non-textual data.

To provide text queries on top of multimedia documents, the document content should be described (annotated) textually.

Two approaches to retrieving multimedia documents

Text-based retrieval
Approach
- Annotating the multimedia content with text descriptions and allowing text queries on top of these descriptions
Pros
- Some multimedia contain text zones which can be used for description
Cons
- Automatic description is very hard to achieve
- Manual description is time consuming
- Some contents are difficult to describe.
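
The text-based approach above can be illustrated with a minimal sketch: an inverted index built over manual annotations, queried with plain keywords. The media identifiers, the annotation strings and the helper names `build_index` and `search` are hypothetical and only serve the illustration; the slides do not prescribe a particular index structure.

```python
from collections import defaultdict


def build_index(annotations):
    """Map each lowercase keyword to the set of media ids annotated with it."""
    index = defaultdict(set)
    for media_id, text in annotations.items():
        for word in text.lower().split():
            index[word].add(media_id)
    return index


def search(index, query):
    """Return the ids of media whose annotation contains every query word."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical manual annotations attached to three media files.
annotations = {
    "img_001.jpg": "poodle dog in a garden",
    "clip_017.mpg": "football supporter cheering",
    "song_042.wav": "dog barking outside",
}
index = build_index(annotations)
print(sorted(search(index, "dog")))
```

The sketch also shows the weakness listed under Cons: a media file is only found if somebody wrote the right words into its annotation.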


Two approaches to retrieving multimedia documents

Content-based retrieval
Approach
- Querying the content based on similarities with a given multimedia example, a sketch, etc.
Pros
- Can be useful for restricted/specialised content (such as logo databases)
Cons
- Examples are not always easy to find/create
- Performance is very limited
- Semantic retrieval is almost impossible.

What is a Multimedia System?
A system that involves:
- Generation: production/authoring tools
- Representation: compression and formats
- Storage: file system design
- Transmission: networking issues
- Search and retrieval: database management
- Delivery: server design, streaming
of multimedia information

How to Build Multimedia Database Systems?
How to build a text database? (Yahoo, Google)
[Diagram: Text document -> Natural language processing -> Tree-based indexing -> Text database; connected to Actions and Transmission]
[Diagram: Multimedia data -> Multimedia analysis -> Multimedia indexing -> Multimedia database; connected to Actions and Transmission]

Audio
Sound Fundamentals
Sound is a continuous wave that travels through the air.
The wave is made up of pressure differences.
Sound is detected by measuring the pressure level at a location.
Sound waves have normal wave properties (reflection, refraction, diffraction, etc.).
- Reflection: sound reflects off walls if the wavelength is small
- Diffraction: sound bends around walls if the wavelength is large
- Refraction: sound changes direction due to temperature shifts

Not covered subjects
Signal Fundamentals, Human Perception, Sound Quality Measures, Sound Codec Standards, etc.

Audio Information Retrieval
The Basics of Audio Search and Audio Information Retrieval
Audio Information Retrieval is the process of retrieving audio information by using various available resources:
- If the available resource is a series of manually annotated keywords -> Text-based retrieval
- Text-based searching for audio information is the most common
- If the available resource is a piece of audio information (e.g. the melody of a song) -> Content-based retrieval
- Content-based audio retrieval is promising and attractive, but there is a long way to go

Audio Information Retrieval
What are some Audio IR Mechanisms?
Annotation-based Audio Retrieval
Content-based Audio Retrieval

Annotation-based Audio Retrieval
Peer-to-peer file sharing software
FTP, streaming audio, websites
Online network drives, Clip Art

Problems
Spyware, viruses, unrelated results returned

Audio Information Retrieval
Content-based Audio Retrieval
Audio feature extraction
Audio classification and retrieval

What's content-based retrieval?
The retrieval is facilitated by the information content, in contrast to simple retrieval based on manual index terms or keywords.

What's the content?
The semantic concept or meaning of the information.

Why keyword annotation is not good:
Subjective and expensive
Insufficient

How to describe the content?
Features:
Audio: loudness, bandwidth, pitch...
Image: color, texture, objects...
Video: temporal information change, image + audio

General Audio Retrieval Framework
[Diagram: Audio Repository -> Features Extraction -> Classification (male, laughing, ...) and Indexing (using features to describe each audio unit) -> Audio Database. On the user side: Audio Example or Keywords -> Features Extraction -> Retrieval and Browsing through the User Interface.]

Content-based Audio Retrieval
General audio features for information retrieval
Time-domain features
Frequency-domain features

Audio classification
Goal
- Classify audio into speech, music, and other categories and subcategories
Motivation
- Different audio types require different processing, indexing and retrieval techniques
- Different audio types have different significance for different applications
- The search space after classification is reduced to a particular audio class
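
As an illustration of time-domain features, the sketch below computes two commonly used ones on a frame of samples: the root-mean-square (RMS) energy, a rough proxy for loudness, and the zero-crossing rate, a rough indicator of how much high-frequency content the frame carries. The choice of these two features and the helper `sine` that generates test tones are assumptions made for the example; the slides only name the feature families.

```python
import math


def rms_energy(frame):
    """Root-mean-square amplitude of a frame (a rough loudness measure)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose sign differs."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def sine(freq_hz, sample_rate=8000, n=800, amplitude=1.0):
    """Hypothetical test signal: n samples of a pure tone."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]


low_tone = sine(100)
high_tone = sine(1000)
quiet_tone = sine(100, amplitude=0.1)

print("RMS loud vs quiet:", rms_energy(low_tone), rms_energy(quiet_tone))
print("ZCR low vs high:", zero_crossing_rate(low_tone), zero_crossing_rate(high_tone))
```

A classifier for speech, music and other categories would typically compute such features per short frame and feed their statistics to a decision rule; that step is left out here.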


Content-based Audio Retrieval
A demo on Speech and Music Classification
http://www.musclefish.com
http://www.soundfisher.com/download/

Content-based audio database browsing and retrieval
http://www.soundfisher.com/index_flash.html
http://www.musclefish.com/frameset.html

Conclusions
Audio Information Retrieval
- Annotation-based:
  Peer-to-peer systems, FTP
- Content-based:
  Audio feature extraction
  Audio classification and retrieval
  Music retrieval.

Image
What's an Image?
An image is a 2D rectilinear array of pixels

Image Data Structure
Pixel
- A picture element of a digital image; it usually denotes a single point in the image.
Image Resolution
- The number of pixels in a digital image.
Depth
- The number of bits used to characterize each pixel.
  Bit map: 1 bit/pixel, gray scale: 2-8 bits/pixel, full color: 24 bits/pixel, color mapped: 8 bits/pixel

The Quality of the image
Resolution (the number of pixels)
Image depth
Adopted compression algorithm (if any)

Not covered subjects
Image Depth, Monochrome/Bit-Map, Dithering, Gray Scale Images, 8-bit/24-bit Color Images, Image Format, etc.
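
Resolution and depth together determine the size of an uncompressed image: width x height x bits per pixel, divided by 8 to obtain bytes. The short sketch below works this out for the depths listed above. The 640 x 480 resolution is an arbitrary example, gray scale is taken at its 8-bit upper bound, and the function name `raw_image_bytes` is hypothetical.

```python
def raw_image_bytes(width, height, bits_per_pixel):
    """Size in bytes of an uncompressed image, rounded up to whole bytes."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8


# Depths from the slide; gray scale uses the 8-bit upper bound.
DEPTHS = {
    "bit map": 1,
    "gray scale": 8,
    "color mapped": 8,
    "full color": 24,
}

for name, depth in DEPTHS.items():
    print(f"{name:>12}: {raw_image_bytes(640, 480, depth)} bytes")
```

For 640 x 480 pixels this gives 38400 bytes for a bit map, 307200 bytes at 8 bits per pixel and 921600 bytes in full color, which is why a compression algorithm is usually adopted.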


Image Retrieval
Text-based retrieval
Using the text surrounding the image
- Text close to the image in the containing document
  URL: http://www.host.com/animals/dogs/poodle.gif
  Alt text: <img src=URL alt="picture of poodle">
  Hyperlink text: <a href=URL>Sally the poodle</a>
Using the text inside the image
- Requires OCR techniques
Some image search engines use this technique
- Google, Altavista, www.ditto.com
Pros
- Easy to implement and use
- Useful for simple and non-professional image retrieval
Cons
- It is incomplete and subjective
- Some features, such as texture or object shape, are difficult to define in text

Image retrieval
Almost impossible to describe all the contents.
Some contents are difficult to describe.

Image retrieval
Content-based image retrieval (CBIR)

The commonly accepted way is
- To show the computer a sample image, or draw a sketch of the desired images, and ask the system to retrieve all the images similar to that sample image or sketch.
- It relies on features such as colour, shape and texture

Examples
IBM's Query By Image Content (QBIC)
- Retrieves based on visual content, including properties such as color percentage, color layout and texture.
- The Fine Arts Museum of San Francisco uses QBIC.
Virage's VIR Image Engine
- Can search based on color, composition, texture and structure.

Challenges
The term "similarity" has a different meaning for different people.
Even the same person uses different similarity measures in different situations.

[Figure: Images containing similar colors]
[Figure: Images containing similar shape]
[Figure: Images containing similar content]

Variety of Similarity
(ordered by degree of difficulty)
Similar color distribution: histogram matching
Similar texture pattern: texture analysis
Similar shape/pattern: image segmentation, pattern recognition
Similar real content: eternal goal :-)
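
Histogram matching can be sketched in a few lines: quantize each pixel's color into a coarse bin, count the bins, normalize, and compare two histograms by histogram intersection (the sum of the bin-wise minima, 1.0 for identical distributions). The 4 bins per channel, the intersection measure and the tiny synthetic "images" are assumptions chosen for the example, not the method of any particular system named in these slides.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized RGB histogram of an iterable of (r, g, b) values in 0..255."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        idx = (
            (r // step) * bins_per_channel ** 2
            + (g // step) * bins_per_channel
            + (b // step)
        )
        hist[idx] += 1
        count += 1
    return [h / count for h in hist] if count else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of bin-wise minima of two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_color(query_pixels, database):
    """Return (score, name) pairs sorted from most to least similar."""
    q = color_histogram(query_pixels)
    scored = [
        (histogram_intersection(q, color_histogram(pixels)), name)
        for name, pixels in database.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical images given as flat pixel lists.
database = {
    "sunset": [(250, 80, 10)] * 90 + [(20, 20, 60)] * 10,
    "forest": [(20, 120, 30)] * 100,
}
query = [(240, 90, 20)] * 100  # mostly orange, like the sunset
ranked = rank_by_color(query, database)
print(ranked[0][1])
```

The measure ignores where colors occur in the image, which is why it sits at the easy end of the difficulty scale: two images with the same color distribution but unrelated content score as similar.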


Indexing
Use any text available: Title, Subject, Caption
Use content information: Colour histogram, Shape, Texture


Image retrieval
Demonstrations
http://zomax.wins.uva.nl:5345/ret_user/
http://www.ifp.uiuc.edu/~nakazato/CBIR/

Other demos
http://eidetic.ai.ru.nl/egon/cogw/co440/CBIR_Demo-s.html
http://www.ee.surrey.ac.uk/Research/VSSP/imagedb/demo.html
http://www.fb9-ti.uni-duisburg.de/rotdemo.html
http://mmdb.ece.ucsb.edu/~demo/corelacm/

Conclusions
Image Retrieval
Content-based Image Retrieval (CBIR)
General measures:
- Gray intensity, color, texture, shape
Distance measures:
- Color similarity, texture similarity, shape similarity, object and relationship similarity.

Video
Video consists of images
What's the interval between images?
Sampling rates must be high enough to avoid motion "aliasing".
1. At least 15 frames/sec
2. 30 frames/sec appears smooth
3. At least 50 frames/sec needed in the ideal case

Not covered subjects
Video standards, Broadcast System Elements, Analog Video Representations, Human Perception, Compression Standards, Video Processing Techniques, etc.

Content-based Video Retrieval
Applications
Video Surveillance
- Find where else the person appears
Experience On-Demand
- Help to remember previous events
Provide useful information while traveling
- Equipment in cars to retrieve useful multimedia information according to your location/preferences

Typical Retrieval Framework
User: provides query information that represents his or her information needs
Database: stores a large collection of video data
Goal: find the most relevant shots in the database
- Shots: the "paragraphs" of a video, typically 20-40 seconds long, which are the basic unit of video retrieval

Sample Query
Text: Find pictures of George Washington
Image: [example query image]
Video: [example query video]

Bridging the Gap
[Diagram: a gap separates the User from the Video Database; bridging it should lead to the Result.]

Automatically Structure Video Data
The first step for video retrieval: video programmes are structured into logical scenes and physical shots.

If dealing with text, then the structure is obvious:
- paragraph, section, topic, page, etc.
All text-based indexing, retrieval, linking, etc. builds upon this structure.

Automatic shot boundary detection and selection of representative keyframes is usually the first step.
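
A minimal sketch of shot boundary detection and keyframe selection follows. It assumes one simple technique among many: compare the gray-level histograms of consecutive frames and declare a cut when their distance exceeds a threshold, then take the middle frame of each shot as its keyframe. The frame representation (flat lists of gray values), the threshold of 0.5 and the middle-frame rule are all assumptions made for the illustration.

```python
def gray_histogram(frame, bins=8):
    """Normalized histogram of a frame given as a flat list of values 0..255."""
    step = 256 // bins
    hist = [0] * bins
    for value in frame:
        hist[value // step] += 1
    return [h / len(frame) for h in hist]


def histogram_distance(h1, h2):
    """Half the L1 distance: 0.0 for identical, 1.0 for disjoint histograms."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shot_boundaries(frames, threshold=0.5):
    """Indices of frames that start a new shot (hard cuts only)."""
    if not frames:
        return []
    boundaries = []
    previous = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        current = gray_histogram(frames[i])
        if histogram_distance(previous, current) > threshold:
            boundaries.append(i)
        previous = current
    return boundaries


def pick_keyframes(frames, boundaries):
    """Middle frame index of every shot delimited by the boundaries."""
    starts = [0] + boundaries
    ends = boundaries + [len(frames)]
    return [(s + e) // 2 for s, e in zip(starts, ends)]


# Hypothetical 12-frame video: dark shot, bright shot, dark shot.
dark, bright = [20] * 16, [220] * 16
frames = [dark] * 5 + [bright] * 4 + [dark] * 3
boundaries = detect_shot_boundaries(frames)
print(boundaries, pick_keyframes(frames, boundaries))
```

Gradual transitions such as fades and dissolves change the histogram slowly and would slip under a single fixed threshold, so this sketch only covers hard cuts.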


Typical automatic structuring of video
[Diagram: a video document -> a set of shots -> keyframe browser combined with transcript or object-based search]

Bridging the Gap
[Diagram: the User's Information Need on one side, the Video Structure of the Video Database on the other; bridging them yields the Result.]

Ideal solution
[Diagram: the User's Information Need is matched against the Video Structure of the Video Database by understanding the semantic meaning and retrieving the Result.]

However,
1. It is hard to represent a query in natural language and for the computer to understand it
2. Computers have no experience
3. Other representation restrictions, such as position and time

Alternative Solution
[Diagram: both the Video Structure of the Video Database and the User's Information Need provide evidence of relevant information (text, image, audio); the evidence is matched and combined to produce the Result.]

Evidence-based Retrieval System
General framework for current video retrieval systems
Video retrieval based on the evidence from both users and database, including
- Text information
- Image information
- Motion information
- Audio information

Return a relevance score for each piece of evidence
Combination of the scores
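
One simple way to combine the per-evidence scores is a weighted linear sum, sketched below. The linear combination, the weights and the shot identifiers are assumptions for the illustration; the slides do not state which combination rule is used.

```python
def combine_scores(evidence_scores, weights):
    """Weighted sum of per-modality relevance scores, ranked best first.

    evidence_scores: {modality: {shot_id: score in [0, 1]}}
    weights:         {modality: non-negative weight}
    A shot missing from a modality contributes 0 for that modality.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    combined = {}
    for modality, scores in evidence_scores.items():
        w = weights.get(modality, 0.0) / total_weight
        for shot_id, score in scores.items():
            combined[shot_id] = combined.get(shot_id, 0.0) + w * score
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores returned by three evidence matchers.
evidence = {
    "text": {"shot_1": 0.9, "shot_2": 0.2},
    "image": {"shot_1": 0.4, "shot_2": 0.8, "shot_3": 0.5},
    "audio": {"shot_3": 0.6},
}
weights = {"text": 0.5, "image": 0.3, "audio": 0.2}
ranked = combine_scores(evidence, weights)
print([shot for shot, _ in ranked])
```

Choosing the weights is the hard part in practice, since the reliability of each kind of evidence depends on the type of query.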


More Evidence in Video Retrieval
[Diagram: the Video Structure of the Video Database provides Text, Image, Motion and Audio Information; the User's Information Need is expressed as Keyword, Query Images, Motion and Audio. Each kind of database evidence is matched with the corresponding query evidence.]

Combination of multi-modal results
Different characteristics of multi-modal information
Text-based information: better for middle- and high-level queries
- e.g. Find the video clip of dancing women wearing dresses
Image-based information: better for low- and middle-level queries
- e.g. Find the video clip of green trees

Combination of multi-modal information

Video retrieval
Primitives of Color Moments Method
http://debut.cis.nctu.edu.tw/Demo/ContentBasedVideoRetrieval/CBVR/PrimitivesE/index.html

Dominant Colors Method
http://debut.cis.nctu.edu.tw/Demo/ContentBasedVideoRetrieval/CBVR/DominantE/index.html

Combination Method
http://debut.cis.nctu.edu.tw/Demo/ContentBasedVideoRetrieval/CBVR/demoE.html

Ideal solution: TSR Case study
[Diagram: the User's Information Need is matched against the Video Structure of the Video Database by understanding the semantic meaning and retrieving the Result.]

TSR Study
[Diagram: TV news workflow linking Production, Indexing and Retrieval]

Problems (I): Indexing
Non-optimal reuse of information
[Diagram: Production -> Indexing -> Retrieval.
Production delivers: script, subtitle, teletext, edited video, described rushes, journalist commentaries.
Indexing delivers: video segments described following the TSR scheme (places, events, persons, dates, etc.).
Retrieval delivers: described video relevant to the query.
Information exchange between the stages is ineffective.]

MPEG-7 in Practice
Library of audiovisual descriptions
Coverage
- Providing a comprehensive set of descriptions needed in known audiovisual applications
Interoperability
- Data Definition Language based on the W3C XML Schema

Example of MPEG-7 Description
[Figure: TV news audiovisual data]

Title region:

<Mpeg7>
  <StillRegion id="news">
  </StillRegion>
</Mpeg7>

Example of MPEG-7 Description
Background features:

<Mpeg7>
  <StillRegion id="news">
    <SpatialDecomposition>
      <StillRegion id="background">
        <VisualDescriptor xsi:type="DominantColorType">
          110 108 140
        </VisualDescriptor>
      </StillRegion>
      <StillRegion id="speaker">
      </StillRegion>
    </SpatialDecomposition>
  </StillRegion>
</Mpeg7>

Example of MPEG-7 Description
More features:

<Mpeg7>
  <StillRegion id="speaker">
    <TextAnnotation>
      <FreeTextAnnotation>
        Journalist Anna Blanco
      </FreeTextAnnotation>
    </TextAnnotation>
    <Mask xsi:type="SpatialMaskType">
      <SubRegion>
        <Poly>
          <Coords> 80 288, 100 200, ..., 352 288 </Coords>
        </Poly>
      </SubRegion>
    </Mask>
  </StillRegion>
</Mpeg7>
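
To show how an application might exploit such a description, the sketch below builds a simplified version of it with Python's standard `xml.etree.ElementTree` module and searches the free-text annotations. It is a deliberately reduced illustration, not schema-valid MPEG-7: namespaces are omitted, a plain `type` attribute stands in for `xsi:type`, the mask is left out, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_description():
    """Build a simplified, non-schema-valid description of the news frame."""
    root = ET.Element("Mpeg7")
    news = ET.SubElement(root, "StillRegion", id="news")
    decomposition = ET.SubElement(news, "SpatialDecomposition")

    background = ET.SubElement(decomposition, "StillRegion", id="background")
    color = ET.SubElement(background, "VisualDescriptor", type="DominantColorType")
    color.text = "110 108 140"

    speaker = ET.SubElement(decomposition, "StillRegion", id="speaker")
    annotation = ET.SubElement(speaker, "TextAnnotation")
    free_text = ET.SubElement(annotation, "FreeTextAnnotation")
    free_text.text = "Journalist Anna Blanco"
    return root


def find_regions_mentioning(root, keyword):
    """Ids of regions whose free-text annotation contains the keyword."""
    hits = []
    for region in root.iter("StillRegion"):
        for note in region.findall("TextAnnotation/FreeTextAnnotation"):
            if note.text and keyword.lower() in note.text.lower():
                hits.append(region.get("id"))
    return hits


root = build_description()
print(find_regions_mentioning(root, "journalist"))
print(ET.tostring(root, encoding="unicode"))
```

Because the description is plain XML, any application with an XML parser can read it, which is the interoperability argument made for MPEG-7.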


Semantic Views Model (I)
Goal
Provide a common TV news retrieval platform for professional and non-professional users
Cover a rich combination of content descriptions and AV structure via a simple model

Design
Analyzing queries asked by different users at Télévision Suisse Romande (TSR) revealed a set of common description types

Example
Find a news item in the context of the Euro 2000 football games containing a shot of at least 5 seconds showing a French football supporter saying "Que le meilleur gagne" ("May the best one win")

Semantic Views Model (II)
=> Users describe AV information following a set of Views

A video segment
- Physical View: its duration is at least 5 seconds
- Thematic View: it is in the context of the EURO 2000 football games
- Audio View: I can hear "Que le meilleur gagne!"
- Production View: it is a news item
- Visual View: it contains a shot showing a French football supporter
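
The five views can be pictured as five groups of fields on a segment description, with a query constraining any subset of them. The sketch below is a hypothetical data model written for this illustration; the class `SegmentDescription`, its field names and the function `matches` are not taken from the platform described in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentDescription:
    segment_id: str
    duration_s: float = 0.0                           # Physical View
    themes: set = field(default_factory=set)          # Thematic View
    production_type: str = ""                         # Production View
    visual_labels: set = field(default_factory=set)   # Visual View
    transcript: str = ""                              # Audio View


def matches(segment, min_duration_s=0.0, theme=None,
            production_type=None, visual_label=None, heard=None):
    """True if the segment satisfies every view constraint that is given."""
    if segment.duration_s < min_duration_s:
        return False
    if theme is not None and theme not in segment.themes:
        return False
    if production_type is not None and segment.production_type != production_type:
        return False
    if visual_label is not None and visual_label not in segment.visual_labels:
        return False
    if heard is not None and heard.lower() not in segment.transcript.lower():
        return False
    return True


# Hypothetical corpus of three described segments.
corpus = [
    SegmentDescription("seg_a", 7.5, {"Euro 2000"}, "news item",
                       {"French football supporter"}, "Que le meilleur gagne!"),
    SegmentDescription("seg_b", 3.0, {"Euro 2000"}, "news item",
                       {"French football supporter"}, "Que le meilleur gagne!"),
    SegmentDescription("seg_c", 12.0, {"Elections"}, "news item",
                       {"journalist"}, "Good evening"),
]
hits = [s.segment_id for s in corpus
        if matches(s, min_duration_s=5.0, theme="Euro 2000",
                   production_type="news item",
                   visual_label="French football supporter",
                   heard="que le meilleur gagne")]
print(hits)
```

The query mirrors the Euro 2000 example: `seg_b` fails only on the Physical View (too short) and `seg_c` fails on the Thematic View.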

TV news indexing and retrieval platform of COALA
[Diagram: COALA components: Audiovisual Repository system, Description system, TV news MPEG-7 corpus, Retrieval system, Visualization system]

Indexing tool
[Demo and screenshots of the indexing tool]

Querying tool based on Semantic Views Model
[Screenshot labels: Five Views, ViewDescriptions, BasicViewEntities, InterViewRelations, IntraViewRelations]

Hierarchical browsing tool
[Demo]

Semantic views browsing tool
[Screenshot]

Conclusion and discussion
Recent approaches to the problem of multimedia IR are mostly based on the extraction of text/audiovisual features

Extraction/creation of descriptions is hard and expensive
- Manual approaches are time consuming
- Automatic approaches are not always possible, and some are not sufficiently accurate

Multimedia descriptions are very precious
- Applications need to exchange them
- Created descriptions should be preserved

Important need for a standard multimedia description language

Information visualization
What is Information Visualization?
Visualize: to form a mental image or vision of ...
Visualize: to imagine or remember as if actually seeing.
  (American Heritage dictionary, Concise Oxford dictionary)

"Transformation of the symbolic into the geometric" (McCormick et al., 1987)

"... finding the artificial memory that best supports our natural means of perception." (Bertin, 1983)

The depiction of information using spatial or graphical representations, to facilitate comparison, pattern recognition, change detection, and other cognitive skills by making use of the visual system (Hearst 03).

The Power of Visualization
Visualization for Problem Solving
Visualization for Eliciting Knowledge from Data
- statistics
Visualization for Clarification
- Mappy, etc.

Two Different Primary Goals: Two Different Types of Viz
Explore/Calculate
- Analyze
- Reason about information
Communicate
- Explain
- Make decisions
- Reason about information

In more detail, visualization should:
Make large datasets coherent
- (Present huge amounts of information compactly)
Present information from various viewpoints
Present information at several levels of detail
- (from overviews to fine structure)
Support visual comparisons
Tell stories about the data

Human Perceptual Facilities
Use the eye for pattern recognition; people are good at
- Scanning, recognizing, remembering images

Graphical elements facilitate comparisons via
- Length, shape, orientation, texture

Animation shows changes across time
Color helps make distinctions
Aesthetics make the process appealing

Information visualization: context
Large amount of information vs. relatively small computer screen.
How to locate a given item of information? Interpret an item of information? Relate an item to other items?

Two Categories of Approaches
Non-distortion-oriented approaches:
- Displaying a portion of the information at a time
- Scrolling or paging access
- Providing hierarchical access
- Structure-specific presentation

Distortion-oriented approaches:
- Distort an image of a large amount of information so that it can fit on the screen.
- Allow the user to examine a local area in detail;
- At the same time, present a global view of the information space;
- Provide a navigation mechanism.


Distortion-based Techniques


Idea of Distortion-based Techniques
- Co-existence of local details with global context at reduced magnification.
- A focus region displays detailed information.
- A demagnified view of the peripheral areas is presented around the focus area.
- A distorted view is created by applying a transformation function to an undistorted image.
- A magnification function provides a profile of the magnification factors for the entire area of the image.

Ex. Bifocal Display
- Distortion in one or two dimensions with a linear transformation function.
- Combination of a detailed view and two distorted side views.
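
For a one-dimensional bifocal display, the transformation function is piecewise linear and the magnification function is piecewise constant: one factor above 1 inside the focus region and one factor below 1 in the two side views. The sketch below assumes positions normalized to [0, 1] and a parameter `focus_display_fraction` (the share of the screen given to the focus region); both conventions and the function names are assumptions made for the illustration.

```python
def bifocal_scales(focus_start, focus_end, focus_display_fraction=0.6):
    """Return (side_scale, focus_scale), the two magnification factors."""
    if not 0.0 <= focus_start < focus_end <= 1.0:
        raise ValueError("focus region must lie inside [0, 1]")
    focus_width = focus_end - focus_start
    side_width = 1.0 - focus_width
    if side_width <= 0.0:
        raise ValueError("focus region must be smaller than the whole space")
    side_scale = (1.0 - focus_display_fraction) / side_width
    focus_scale = focus_display_fraction / focus_width
    return side_scale, focus_scale


def bifocal_transform(x, focus_start, focus_end, focus_display_fraction=0.6):
    """Map a position x in [0, 1] of the undistorted image to the display."""
    side_scale, focus_scale = bifocal_scales(
        focus_start, focus_end, focus_display_fraction
    )
    left_display = focus_start * side_scale
    if x < focus_start:                       # left side view, demagnified
        return x * side_scale
    if x <= focus_end:                        # focus region, magnified
        return left_display + (x - focus_start) * focus_scale
    return (left_display + focus_display_fraction
            + (x - focus_end) * side_scale)   # right side view, demagnified


# Focus on the middle 20% of the data, shown on 60% of the screen.
side_scale, focus_scale = bifocal_scales(0.4, 0.6)
for x in (0.0, 0.4, 0.5, 0.6, 1.0):
    print(x, round(bifocal_transform(x, 0.4, 0.6), 3))
```

With these numbers the focus region is magnified by a factor of 3 and the side views are compressed to half size, while the whole information space remains visible.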


Ex. Polyfocal Display

Perspective Wall
- A conceptual descendant of the Bifocal Display.
- Smoothly integrated detailed and contextual views.
- Side panels are demagnified in direct proportion to their distance from the viewer.

Fisheye View
- Basic idea: the more relevant information is presented in great detail; the less relevant information is presented as an abstraction.
- Relevance is computed on the basis of the importance of information elements and their distance to the focus.

Graphical Fisheye View
- An extension of the fisheye view concept.
- Could also be considered a special case of the polyfocal display.
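
The relevance computation can be made concrete with the degree-of-interest function commonly attributed to Furnas: DOI(x | focus) = API(x) - D(x, focus), where API is the a priori importance of an element and D its distance to the focus. The sketch below applies it to a small tree, taking API(x) as minus the depth of x and D as the path length in the tree; the example tree, these two choices and the threshold are assumptions for the illustration.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a, b, parent):
    """Number of edges on the path between two nodes of the tree."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for j, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:
            return steps_from_a[n] + j
    raise ValueError("nodes are not in the same tree")


def degree_of_interest(node, focus, parent):
    """DOI = a priori importance (minus depth) minus distance to the focus."""
    importance = -(len(path_to_root(node, parent)) - 1)
    return importance - tree_distance(node, focus, parent)


def fisheye_view(nodes, focus, parent, threshold):
    """Keep only the nodes whose degree of interest reaches the threshold."""
    return [n for n in nodes if degree_of_interest(n, focus, parent) >= threshold]


# Hypothetical document outline; `parent` maps each node to its parent.
parent = {"A": "root", "B": "root", "A1": "A", "A2": "A", "B1": "B"}
nodes = ["root", "A", "A1", "A2", "B", "B1"]
print(fisheye_view(nodes, "A1", parent, threshold=-2))
print(fisheye_view(nodes, "A1", parent, threshold=-4))
```

With the focus on A1, the strictest threshold keeps exactly the path from the focus to the root; lowering it progressively brings in siblings and more distant branches, which is the "detail near the focus, abstraction elsewhere" behaviour described above.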


Why Visualize Text?
To help with Information Retrieval
- give an overview of a collection
- show users which aspects of their interests are present in a collection
- help users understand why documents were retrieved as a result of a query

Exploiting Visual Properties
Analyzing retrieval results
KartOO
http://www.kartoo.com/

Grokker
http://www.groxis.com/service/grok