
SwissQual Diversity
VMon and VQuad Results Description
Manual

Test & Measurement

VMon and VQuad Results Description 01

The firmware of the instrument makes use of several valuable open source software packages. For information, see the "Open
Source Acknowledgement" on the user documentation CD-ROM (included in delivery).
Rohde & Schwarz would like to thank the open source community for their valuable contribution to embedded computing.

SwissQual AG
Allmendweg 8, 4528 Zuchwil, Switzerland
Phone: +41 32 686 65 65
Fax:+41 32 686 65 66
E-mail: info@swissqual.com
Internet: http://www.swissqual.com/
Printed in Germany. Subject to change. Data without tolerance limits is not binding.
R&S is a registered trademark of Rohde & Schwarz GmbH & Co. KG.
Trade names are trademarks of the owners.
SwissQual has made every effort to ensure that the instructions contained in this document are adequate and free of errors and
omissions. SwissQual will, if necessary, explain issues that are not covered by the documents. SwissQual's liability for any
errors in the documents is limited to the correction of errors and the aforementioned advisory services.
Copyright 2000 - 2013 SwissQual AG. All rights reserved.
No part of this publication may be copied, distributed, transmitted, transcribed, stored in a retrieval system, or translated into any
human or computer language without the prior written permission of SwissQual AG.
Confidential materials.
All information in this document is regarded as commercially valuable, protected, and privileged intellectual property, and is provided
under the terms of existing Non-Disclosure Agreements or as commercial-in-confidence material.
When you refer to a SwissQual technology or product, you must acknowledge the respective text or logo trademark somewhere in
your text.
SwissQual, Seven.Five, SQuad, QualiPoc, NetQual, VQuad, Diversity as well as the following logos are registered trademarks of SwissQual AG.
Diversity Explorer™, Diversity Ranger™, Diversity Unattended™, NiNA+™, NiNA™, NQAgent™, NQComm™, NQDI™, NQTM™,
NQView™, NQWeb™, QPControl™, QPView™, QualiPoc Freerider™, QualiPoc iQ™, QualiPoc Mobile™, QualiPoc Static™, QualiWatch-M™, QualiWatch-S™, SystemInspector™, TestManager™, VMon™, VQuad-HD™ are trademarks of SwissQual AG.
The following abbreviation is used throughout this manual: R&S® is abbreviated as R&S.

Contents
1 Introduction .......................................................................................... 5
2 Visual Quality Overview ....................................................................... 6
  2.1 Visual Quality ................................................................................... 6
  2.2 Mean Opinion Score ........................................................................ 7
  2.3 Subjective and Objective Quality Assessment ................................ 8
  2.4 Full-Reference and No-Reference Assessments ............................ 9
3 Technical Requirements and Performance ....................................... 12
  3.1 Technical Requirements ................................................................ 12
    3.1.1 Frame Rate .............................................................................. 13
    3.1.2 Video Sample Length .............................................................. 13
  3.2 VMon and VQuad Performance Accuracy ..................................... 13
4 VMon Video Quality Assessment ....................................................... 19
  4.1 Technical Background .................................................................... 19
  4.2 Perceptual Degradation and MOS Prediction ................................ 20
    4.2.1 Blockiness ................................................................................ 20
      4.2.1.1 Root Causes ....................................................................... 21
    4.2.2 Tiling ......................................................................................... 21
      4.2.2.1 Root Causes ....................................................................... 22
    4.2.3 Blurring ..................................................................................... 23
      4.2.3.1 Root Causes ....................................................................... 23
    4.2.4 Jerkiness .................................................................................. 23
      4.2.4.1 Root Causes ....................................................................... 24
    4.2.5 Additional Results .................................................................... 24
    4.2.6 MOS Prediction ........................................................................ 25
  4.3 VMon Results in NQDI ................................................................... 25
  4.4 Scene Analysis in VMon ................................................................ 27
  4.5 Content Dependency of Perceived Quality and Prediction Problems ... 27
  4.6 Application of No-Reference Models ............................................. 27
    4.6.1 Use Case 1: Optimization or Benchmarking - Averaging of Results ... 28
    4.6.2 Use Case 2: Network Monitoring ............................................. 28
5 VQuad Video Quality Assessment ..................................................... 30
  5.1 Perceptual Degradation and MOS Prediction ................................ 30
    5.1.1 Perceptual Difference .............................................................. 31
    5.1.2 Additional Results .................................................................... 31
    5.1.3 MOS Prediction ........................................................................ 31
  5.2 VQuad08 Results in NQDI ............................................................. 32
  5.3 VQuad08 Application ..................................................................... 33
    5.3.1 Lip-Sync in VQuad ................................................................... 33
A Acknowledgements ............................................................................ 34
Glossary: Abbreviations ........................................................................ 35


1 Introduction
The following sections describe the technical background, the application scenarios, and the parameters
that SwissQual video quality measurements record.
To predict visual quality objectively, SwissQual uses the VMon algorithm for the no-reference approach and the VQuad algorithm for the full-reference approach. SwissQual
has successfully used these algorithms in quality measurement systems for several
years. For the full-reference approach, SwissQual provides a set of video clips that
cover different types of videos. For higher confidence in the measurement results,
SwissQual has tuned the algorithms to harmonize with these video clips.
To keep pace with rapidly evolving video compression and transmission techniques,
SwissQual is constantly improving the VMon and VQuad algorithms.
The main indicators and the presentation scheme have not been changed.

The latest versions of these algorithms are VMon08 and VQuad08, which improve and
extend detectors as well as the perceptive weighting for individual degradations. The
new versions are more robust with respect to the latest coding technologies than the
previous versions and are less dependent on content. The redesign of the internal
structure also provides a framework for High Definition (HD) resolution.


2 Visual Quality Overview


Visual quality in video services is a major factor for customer satisfaction. As a result,
determining where and why visual quality degradation occurs is important to network
operators and equipment manufacturers. Encoding techniques must be designed for
optimal visual quality on specific transmission bandwidths as well as for error robustness.
In order to assess the quality of video encoding techniques, large-scale visual tests
with individual human testers are commonly employed. However, the reproducibility of
the measurement results in such tests depends entirely on the motivation of the individual test candidates. The advantage of an automated test is more consistent visual
quality results that closely correlate with the subjective test results.
Compared to frame rate, bit rate, or luminance, visual quality is a vague term. However, customer satisfaction can only be measured by considering perceived degradations, while encoding and transmission techniques must be selected and optimized
based on quality perception.

2.1 Visual Quality


Visual quality is the measure of a viewer's satisfaction, based on his or her experience and expectations with respect to the received and perceived video stream. Visual
quality is generally expressed as a Mean Opinion Score (MOS), which reflects the
mean of individual scores that have been ranked by human viewers during a subjective
test in a lab. Visual quality is often obtained by applying Absolute Category Rating
Tests (ACR), which display the MOS on a scale from 1 (bad) to 5 (excellent).
This measurement denotes the average of many individual opinions on perceived quality, which are obtained from a representative number of viewers of both genders and
different ages. Quality perception is a complex phenomenon within the process of
human perception and as such is a subjective measurement.
Quality, or more specifically visual quality, is the rating of the difference between what
a viewer sees and what he or she expects in this context. In a certain context, a viewer
expects 'naturalness' in a video and rates any deviations as degradations. The internal
reference for 'naturalness' is strongly dependent on the video content. For example, for
faces and other well-known natural patterns, the tolerance for deviations from 'naturalness' is low.
Conversely, degradations are more readily accepted in a video with 'unnatural', artificial content,
such as cartoons or graphics.
Subjective tests usually have different content categories. The spread of scores for
individual video contents with the same error conditions is much wider than the same
spread of scores in listening quality tests, which use samples from different talkers. To
obtain a complete picture of visual quality, video clips from different content categories
must be transmitted and scored.


2.2 Mean Opinion Score


Quality, as a perceived value, cannot be defined by objective technical means such as
the decibel (dB) level or the delay in seconds. Quality values, that is, Mean Opinion
Scores (MOS), are obtained by asking human subjects for their perception of the listening and visual quality of video clips in a controlled environment. The subjects are
asked to assign a score to a pre-recorded speech or video sample of a few seconds in
length. The MOS of these scores represents the perceptual quality of the sample. Usually, the score is assigned to a 1 to 5 scale that has verbal categories in the native language of the subjects.
Absolute Category Ratings are often used instead of MOS values.

Table 2-1: Explanation of MOS values

MOS | English   | German        | French      | Spanish
5   | excellent | ausgezeichnet | excellent   | excelente
4   | good      | gut           | bonne       | buena
3   | fair      | ordentlich    | assez bonne | regular
2   | poor      | dürftig       | médiocre    | mediocre
1   | bad       | schlecht      | mauvaise    | mala

Each individual score is influenced by the global experience of the user, expectation,
and individual preferences. That is, different people tend to assign different quality
scores to the same clip. Scores are also subject to short term focus and accidental
assignment. Consequently, a MOS value is the average of a wider or narrower distribution of individual scores.
The main disadvantage of this approach is that individuals assign different scores to a
clip of perfect quality due to a lack of confidence, accidental down-scoring, or from
being overly critical. The highest MOS value in subjective tests is usually around 4.5.
Conversely, people tend to assign a score of 'bad' to most of the lower quality clips.
The main reason for this score is that the lower end of the quality values is much wider
and one can also choose 'worse than bad' while at the upper end one cannot assign a
quality value that is better than undisturbed speech or video.
However, we also have to consider that the MOS is an average value for the scores
from a group of at least 24 people. In scientific papers, the standard deviation of the
MOS is also included to represent the distribution width of the individual scores. An
additional value that is often included with the MOS is the 95 % confidence interval,
which represents the range within which the MOS of the whole viewing population is
expected to lie with 95 % probability. This interval allows you to determine how close the MOS is to the
'true quality' of the clip. Logically, this confidence interval is smaller for larger test
groups. In a well designed traditional test, the interval is about 0.2.
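As a sketch of the statistics described above, the following computes the MOS, the standard deviation, and the 95 % confidence interval from a set of individual scores; the score data and the function name are invented for illustration.

```python
import math

def mos_statistics(scores):
    """Return (MOS, standard deviation, 95 % confidence interval half-width)
    for a list of individual opinion scores on the 1-to-5 ACR scale."""
    n = len(scores)
    mos = sum(scores) / n
    # Sample standard deviation of the individual scores.
    std = math.sqrt(sum((s - mos) ** 2 for s in scores) / (n - 1))
    # 95 % confidence interval half-width (normal approximation, z = 1.96);
    # it shrinks with the square root of the group size.
    ci95 = 1.96 * std / math.sqrt(n)
    return mos, std, ci95

# Invented scores from a group of 24 viewers, as in a typical subjective test.
scores = [4, 5, 4, 3, 4, 4, 5, 4, 3, 4, 4, 4,
          5, 4, 4, 3, 4, 4, 4, 5, 4, 4, 3, 4]
mos, std, ci95 = mos_statistics(scores)
```

With 24 viewers, the resulting confidence interval is around 0.2, which matches the value quoted above for a well-designed test.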


The term 'MOS' is only a generic approximation of a measurement unit and is meaningless if you do not specify the kind of quality perception that the MOS describes. A
MOS can be obtained for listening quality and visual quality.
Objective measurements do not evaluate quality in the traditional sense, but rather
estimate or predict quality as if the clip had been observed by a large group of people.
More than 5000 subjectively scored samples were used to train the VQuad, VMon,
and SQuad algorithms. These objective measures are based on sophisticated psycho-acoustic, psycho-visual, and perceptive models that process signals in a similar way
to the human auditory and visual systems. The signal analysis and the subsequent
comparison to the undistorted original signal lead to a quality value that is mapped to
the common 1 to 5 scale.
The performance of the objective measures is usually represented by correlation coefficient and residual prediction error data on a scatter plot where the subjective and
objective data are plotted on the X and Y axes, respectively. On such a diagram, a
good objective measure is narrowly distributed along the 45° line.
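To illustrate how such a performance figure is computed, the following sketch derives the Pearson correlation coefficient and the residual prediction error (r.m.s.e.) from paired subjective and objective scores; the sample values and the function name are invented.

```python
import math

def correlation_and_rmse(subjective, objective):
    """Pearson correlation and root-mean-square error between paired
    subjective MOS values and objective model predictions."""
    n = len(subjective)
    ms = sum(subjective) / n
    mo = sum(objective) / n
    cov = sum((s - ms) * (o - mo) for s, o in zip(subjective, objective))
    var_s = sum((s - ms) ** 2 for s in subjective)
    var_o = sum((o - mo) ** 2 for o in objective)
    r = cov / math.sqrt(var_s * var_o)
    # A perfect predictor lies on the 45° line (objective == subjective),
    # so the r.m.s.e. measures the scatter around that line.
    rmse = math.sqrt(sum((s - o) ** 2 for s, o in zip(subjective, objective)) / n)
    return r, rmse

# Invented example: five clips scored subjectively and predicted objectively.
subj = [1.8, 2.5, 3.1, 3.9, 4.4]
obj = [2.0, 2.4, 3.3, 3.7, 4.5]
r, rmse = correlation_and_rmse(subj, obj)
```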

2.3 Subjective and Objective Quality Assessment


Assessing the quality of a telecommunication network is important for achieving and
maintaining a high service quality.
One method to assess the service quality is to evaluate the quality of the signal that is
transmitted through the telecommunications network, which involves the following
groups of objective approaches:

- No-Reference: non-intrusive or single-ended approach, which only evaluates and rates the received signal; for example, a test call to an answering machine or live monitoring.

- Full-Reference: intrusive or double-ended approach, which compares a transmitted reference signal with the original reference signal and rates the result.

Both of these quality assessment methods predict the Mean Opinion Score (MOS) that
would be obtained from a subjective test. figure 2-1 provides an overview of the basic
relationship between subjective and objective assessments as well as the full-reference and no-reference approaches.


[Figure: a reference video signal is transmitted through the network under test; the transmitted video signal is assessed either by a human viewer (against an internal reference/expectation), by methods that require a reference signal (double-ended), or by methods that do not require a reference signal (single-ended). All paths end in a quality rating.]

Fig. 2-1: Subjective versus objective quality assessment

2.4 Full-Reference and No-Reference Assessments


You can use intrusive and non-intrusive methods in the following objective quality test
scenarios:

- No-Reference: establishes a test connection to an answering station, which plays a video signal that is unknown to the receiving side, for example, from a streaming server or a live TV application.

- Non-intrusive In-Service Monitoring: assesses the video signals in real applications, such as IPTV or video telephony, by parallel monitoring in the core network. This method includes no-reference approaches.


- Full-Reference: controls both ends of a connection and transmits a known video sequence. This scenario requires a streaming server that contains known video clips.

- Full-Reference Video Quality Assessment: controls both ends of the connection and transmits a known video sequence.
  The disadvantage of this approach is the necessity to intervene at the source of the signal and in the network that you want to test. At least one transmission channel must be occupied to transmit the reference signal in order to determine the signal quality.
  The advantage of the double-ended method is that the input or reference signal is known, which allows for a very accurate and detailed analysis of video quality impairments. Through the application of visual perception models, each change in the signal during transmission can be detected and checked for an impact on the perceived quality. Full-reference methods are well suited for optimization processes in laboratories as well as in real networks. These methods can even measure minimal degradations of the signal and can be applied to compare transmission scenarios.

- No-Reference Video Quality Assessment: assesses the visual quality of the transmitted signal without a pre-defined reference signal for comparison.
  This assessment is also referred to as a non-intrusive or single-ended model.

The single-ended models use signal analysis methods to look for known types of distortions. For example, the models search for typical coding artefacts such as visible
block structures or freezing events. More advanced methods apply perceptual models
to the detected distortions that consider the effects of the human visual system such as
local contrast adaptation or masking.
The accuracy of a no-reference approach is lower than the full-reference approach.
However, the accuracy is more than sufficient for a basic classification of the video
quality and the detection of consistently poor quality links.
Since the reference signal is not available, no-reference video quality models are subject to a content dependency. If the video contains natural objects and a small amount
of motion, the extraction of the individual features performs well. However, if the video
contains unnatural content such as cartoons, moving or fixed graphical objects or still
sequences, the feature extraction can lead to inaccurate results. Such results are
caused by the similarity of the content characteristics to typical compression and transmission distortions.
Cartoons, for example, contain a restricted number of colours as well as entire areas
that are filled with the same colour and have no natural texture, which is acceptable in a
cartoon. However, unlike in a cartoon, such effects in a video with natural content are
seen as a strong distortion. Since the measure has no a-priori knowledge of the content, such contents are predicted with low quality, even though this prediction is not true for the
cartoon.
A similar case is a graphically animated background, for example, during a TV newscast. This type of background can contain solid colour areas with sharp horizontal and vertical edges or even moving blocks. Such objects resemble
unnatural coding artefacts and are therefore easily misinterpreted.


The analysis results from one short clip might provide information about serious distortions. However, for a more accurate quality analysis, SwissQual strongly recommends
evaluating several video sequences with a no-reference model and using the average
of the results to completely characterize a transmission channel.


3 Technical Requirements and Performance


The following sections outline the technical requirements and performance of the
SwissQual VMon and VQuad solutions.

3.1 Technical Requirements


The SwissQual VMon and VQuad solutions run on the 32-bit Windows platform and
require an uncompressed 24-bit RGB video signal in AVI file format. The accepted
range of image resolutions is from QQVGA (160x120 pixels) up to VGA (640x480 pixels).
Based on the recommendations of VQEG, the accepted resolutions are subdivided
into three resolution groups:

- Smartphones: QQVGA, QCIF, QCIF+
- PDA, hand-held: QVGA, CIF
- PC applications: VGA, (SDTV)
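The grouping above can be expressed as a simple lookup; the function and dictionary names are illustrative and not part of the SwissQual software.

```python
# Resolution groups as listed above (names only; illustrative sketch).
RESOLUTION_GROUPS = {
    "smartphone": ["QQVGA", "QCIF", "QCIF+"],
    "pda_handheld": ["QVGA", "CIF"],
    "pc": ["VGA", "SDTV"],
}

def resolution_group(name):
    """Return the resolution group for a resolution name,
    or None if the resolution is not in the accepted range."""
    for group, members in RESOLUTION_GROUPS.items():
        if name.upper() in (m.upper() for m in members):
            return group
    return None
```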

VQuad uses a reference signal, which must be in uncompressed format, that is, of perfect
quality, with a frame rate of 25 or 30 fps.
The reference signal must have the same image resolution as the degraded video.
VQuad does not rescale the video.
The VMon and VQuad methods analyze a video clip in a raw non-encoded format such
as RGB24, where each frame is considered a bitmap and the RGB values for each
pixel are available. In addition to this spatial information, these methods also require
the display time of each individual frame to calculate temporal effects.
A Diversity system only uses RGB24 to store and analyze uncompressed video clips.
The VMon08 algorithm can also use YUV format.
VMon evaluation, as measured on an Intel Xeon processor at 2.33 GHz, is faster than
playback time thanks to rigorous run-time optimization, even for larger image sizes
such as a VGA signal that has been sampled at 25 fps. Due to a pre-evaluation of the
reference video, VQuad has a slightly longer evaluation time.
As the VMon solution can dynamically adjust the algorithm computations to the available processing resources, VMon can be run on the Symbian mobile OS platform. As a
result, VMon is an ideal component for lower performing platforms such as mobile
phone operating systems and digital signal processors.
On low performing platforms, the estimation of quality related values can be less accurate due to the dynamic adjustment of calculation depth.


3.1.1 Frame Rate


The accepted frame rates are between 3 fps and 30 fps.
Frame rates of 3 fps or slightly higher are interpreted as a strong jerkiness effect.

Still images or completely frozen video sequences are flagged, but MOS values are not
calculated for them.

3.1.2 Video Sample Length


A sample length of 5 to 15 seconds, which is automatically checked by the SwissQual
software, is required for the evaluation. Video samples that are shorter than 5 seconds
are not accepted. Samples that are longer than 15 seconds are truncated to
15 seconds and a warning message is displayed.
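A minimal sketch of the frame-rate and sample-length checks described in sections 3.1.1 and 3.1.2; the function name is illustrative, and the limits are taken from the text.

```python
# Accepted limits from sections 3.1.1 and 3.1.2.
MIN_FPS, MAX_FPS = 3.0, 30.0
MIN_LEN_S, MAX_LEN_S = 5.0, 15.0

def check_sample(frame_rate, length_s):
    """Validate a video sample against the accepted frame-rate and length
    limits; return the (possibly truncated) length in seconds."""
    if not MIN_FPS <= frame_rate <= MAX_FPS:
        raise ValueError(
            f"frame rate {frame_rate} fps outside {MIN_FPS}-{MAX_FPS} fps")
    if length_s < MIN_LEN_S:
        raise ValueError(f"samples shorter than {MIN_LEN_S} s are not accepted")
    if length_s > MAX_LEN_S:
        # Over-long samples are truncated with a warning, as described above.
        print(f"warning: sample truncated from {length_s} s to {MAX_LEN_S} s")
        return MAX_LEN_S
    return length_s
```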

3.2 VMon and VQuad Performance Accuracy


The high accuracy of the VMon08 and VQuad08 algorithms is based on a large
number of subjectively pre-scored databases that cover the complete scope of modern
video degradations, including the compression and erroneous-transmission degradations
that typically occur in telephone networks and broadcasting scenarios.
SwissQual has compiled these databases with a focus on modern cell and IPTV networks as well as data that has been obtained in collaborative efforts with international
standardization bodies.
At the end of 2006, the Video Quality Experts Group (VQEG), an international and
independent organization, started a worldwide evaluation of objective video quality
measures. SwissQual previously took part in this evaluation with the VMon06 objective
model, the precursor to VMon08. The constantly improving video compression techniques and the availability of the VQEG data sets during the evaluation phase encouraged SwissQual to improve and to update the VMon and VQuad models to version
VMon08 and VQuad08, respectively.
Although the main focus of VMon08 and VQuad08 was excellent performance on
SwissQual databases, a significant improvement was also achieved for VQEG data.
This publication is partly based on the subjective scores collected by the Video Quality
Experts Group (VQEG). The results presented in this manual are not to be compared
to the results presented in the VQEG Final Report of Multimedia Phase I because the
models in the report were validated using this data. Thus, the data was not available to
the models that were submitted to the VQEG evaluation. See further acknowledgment
at the end of this manual.


Usually, the prediction accuracy is provided by the correlation coefficient of the objective scores to the subjective MOS as a single number value for performance accuracy.
A score close to 1.0 indicates a high prediction accuracy while lower scores indicate a
lower prediction accuracy. In general, correlations of less than 0.7 describe a model as
weak, while correlations less than 0.5 describe a model as unusable. A more detailed
view is possible with scatter plots, which plot the subjective MOS versus the objective
scores. Figure 3-1 contains an example of a detailed analysis of VQEG QCIF data sets.
Each point in the diagram represents a video sample that has been scored subjectively
and objectively. For points above the 45° line, the objective measure indicates a higher
quality than the quality that was derived in the subjective test. Similarly, points below
the 45° line indicate a more pessimistic quality prediction.
In figure 3-1, the accuracy of the VQuad08 predictions is noticeably better than that of the
VMon08 predictions. The VQuad scores are closely grouped and are nearly symmetrically distributed along the 45° line. However, due to content dependencies, a few outliers are
incorrectly predicted, that is, VQuad rates individual files in one condition either too
high or too low.
To avoid under predictions, VMon08 searches for known distortions that are based on
a general expectation. If a distortion is found with confidence, the score is calculated
correctly. An over prediction can occur if VMon08 does not detect a visible distortion. In
essence, VMon08 tends to yield an over prediction for missed distortions and no under
predictions. For applications such as a trigger-based troubleshooting system, VMon08
tends toward 'false acceptance' but avoids 'false rejection', which is useful for systems
where false alarms require more operational effort.
[Figure: two scatter plots of objective score versus subjective visual MOS, per-file analysis, QCIF data VQEG q05; correlations of 0.84 (VMon08) and 0.93 (VQuad08).]

Fig. 3-1: Per-sample comparison between VMon08 and VQuad08 data on example data in QCIF resolution

The results that have been discussed up to now have been on a per sample basis. To
evaluate a channel or video system, a set of different samples with different contents
are typically used and transmitted through the system.
In voice quality tests, a so-called per-condition analysis is usually performed as well.
This analysis averages the scores of a condition, that is, a given codec setting, for
each talker and each sentence. This averaging minimizes dependencies on individual
characteristics and instead focuses more on the system being tested.


This approach can also be applied to video analysis by averaging across different contents. The deviation of the per-sample scores for the same condition is wider for video
than for speech, which is mainly caused by the wider variation of the video content that
was transmitted. However, content averaging provides a good real-life overview for a
channel or codec performance in which a wide range of contents must be processed.
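The per-condition averaging described above can be sketched as follows: the per-sample scores of each condition are averaged over its contents before subjective and objective values are compared. All names and values here are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

def per_condition_means(samples):
    """Average per-sample scores over all contents of each condition.
    `samples` is a list of (condition, content, score) tuples."""
    by_condition = defaultdict(list)
    for condition, _content, score in samples:
        by_condition[condition].append(score)
    return {cond: mean(scores) for cond, scores in by_condition.items()}

# Invented example: two conditions, each transmitted with four contents.
samples = [
    ("codec_A_128kbps", "news", 3.6),
    ("codec_A_128kbps", "sports", 2.8),
    ("codec_A_128kbps", "cartoon", 3.2),
    ("codec_A_128kbps", "drama", 3.4),
    ("codec_B_64kbps", "news", 2.4),
    ("codec_B_64kbps", "sports", 1.8),
    ("codec_B_64kbps", "cartoon", 2.2),
    ("codec_B_64kbps", "drama", 2.0),
]
cond_mos = per_condition_means(samples)
```

Averaging in this way suppresses the content-specific deviations of the individual samples, so the per-condition scores characterize the tested channel or codec rather than any single clip.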
In the example data set, eight different contents were always processed with the same
condition. A so-called per-condition evaluation is obtained when these eight individual
scores are averaged in the subjective and objective domains. Figure 3-2 displays these
results, which are based on the same example data that was taken from the VQEG
data set in QCIF.
[Figure: two scatter plots of objective score versus subjective MOS, per-condition analysis, QCIF data VQEG q05; correlations of 0.97 (VMon08) and 0.98 (VQuad08).]

Fig. 3-2: Per-condition comparison between VMon08 and VQuad08 data on example data in QCIF resolution

The charts show that the prediction accuracy of VMon08 and VQuad08 increases significantly while under- or over-predictions that are caused by individual contents (see
figure 3-1) are 'averaged out' completely.
For a complete overview of the algorithm accuracy, the correlation coefficients for the
14 QCIF data sets are shown in figure 3-3. Initially, a 'per-sample' evaluation is performed during which each score for each video sample file is considered individually.
The statistical evaluation procedure is equivalent to the VQEG primary analysis.


Fig. 3-3: Correlation coefficients between MOS values obtained in subjective tests and objective scores
based on the VQEG QCIF data set

For comparison, the VQEG performance of VMon06, the predecessor to VMon08, has
also been included in figure 3-3. As the chart clearly shows, the VMon08 performance
is significantly better than that of VMon06. The chart also shows the performance of VQuad08
on the same data set. Because VQuad is a full-reference model and can
perform a more detailed analysis, its prediction accuracy is higher than that of the
no-reference models.
The statistical evaluation of VQuad08 is equivalent to the method applied by VQEG to
the full-reference models within its evaluation. Note that there are small differences in
the evaluation method for full-reference and no-reference models.
figure 3-4 shows the accuracy in a 'per-condition' analysis for all 14 data sets.
Taking the discussion of figure 3-1 into consideration, the performance increases in
figure 3-4 are due to averaging across the individual video contents.
The applied method for content averaging differs from VQEG's so-called secondary
analysis and should not be compared directly to those results.
VMon08 and VQuad08 have been optimized for the QCIF-like resolution sizes that
mobile phone applications and devices use today. Along with the widening of the data
channels in the mobile networks and the progress of IPTV solutions, SwissQual is
continuing to improve VMon08 and VQuad08, especially for larger video resolutions.
For comparison, the evaluation of the VQEG data at higher resolutions, that is, CIF and
VGA, is shown in table 3-1. The data is obtained from 14 databases for CIF and 13 for
VGA. The evaluation follows the same rules as the results in figure 3-3 and figure 3-4.

Fig. 3-4: Correlation coefficients between MOS values obtained in subjective tests and objective
scores based on the 14 VQEG QCIF data sets on a per-condition evaluation

The main value is the correlation coefficient, which is averaged over the databases. The
value in parentheses, the average r.m.s.e., allows for a rough estimate of the size of the
prediction errors. For a good prediction accuracy, the correlation coefficients must be
close to 1.0 and the r.m.s.e. must be small.
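The two statistical figures can be sketched as follows. This is only an illustration of the evaluation method, not SwissQual code, and the MOS values below are invented example data.

```python
# Pearson correlation and r.m.s.e. between subjective MOS values and
# objective predictions, as used in the per-condition evaluation.
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rmse(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

subjective = [1.2, 2.0, 2.8, 3.5, 4.1]  # per-condition MOS from a viewing test (invented)
predicted  = [1.4, 1.9, 3.0, 3.4, 4.3]  # corresponding objective scores (invented)

print(round(pearson_r(subjective, predicted), 3))
print(round(rmse(subjective, predicted), 3))
```

A correlation close to 1.0 together with a small r.m.s.e. corresponds to the "good prediction accuracy" criterion above.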
Table 3-1: Results of VMon08 and VQuad08 for all three resolution sizes

Resolution   VMon08 (per sample)   VMon08 (per condition)   VQuad08 (per sample)   VQuad08 (per condition)
QCIF         0.73 (0.70)           0.91 (0.37)              0.88 (0.50)            0.95 (0.27)
CIF          0.63 (0.78)           0.81 (0.51)              0.86 (0.51)            0.97 (0.22)
VGA          0.52 (0.92)           0.75 (0.57)              0.86 (0.54)            0.93 (0.31)

As mentioned before, VMon08 has been optimized for the smaller image sizes that are
used in mobile services, and has the highest accuracy for QCIF and similar resolutions.
Although VMon08 is still acceptable for CIF resolutions, only a rough categorization of
the visual quality is possible for VGA video.
The full-reference VQuad08 method has far more information available for a quality
estimation and reaches a correlation of 0.86 or higher for almost all image sizes.


4 VMon Video Quality Assessment


SwissQual developed the VMon method in 2005 and has been developing and releasing
new versions as video transmission and compression technology constantly improves.
Although the basic and well-accepted structures in the latest version are identical to
those of the original VMon, many smaller improvements have been made in the detectors.
In addition to these improvements, VMon series 08 was also re-structured to support a
frame-wise analysis. This approach eliminates the need to pre-store the video
sequence before analysis and allows the use of VMon series 08 for the real-time evaluation of video in the QualiPoc product series.
The latest VMon can also efficiently analyze larger image resolutions such as VGA and
SDTV.
As previously discussed, the accuracy of a no-reference model such as VMon is lower
than that of the VQuad full-reference model, and the results show a content dependency.
However, most current video applications, for example all live TV applications, cannot
transmit a pre-stored video and therefore do not support a full-reference model.
SwissQual has invested significant effort to minimize these disadvantages of the
no-reference model.
In addition to the overall quality as a MOS value, VMon produces a set of results with
more details about the type of problems that are observed. These results, along with
the unique cause analysis of VMon, enable the easy interpretation and localization of
potential quality problems.

4.1 Technical Background


Unlike VQuad, which uses a perceptual model to compare a high quality reference signal to the degraded signal, VMon predicts the Visual Quality of a transmitted signal
without prior knowledge of the input reference signal.
The VMon approach is akin to a human expert who watches a video on a test device,
such as a commercial video player client on a PC, and rates what he sees; the scores
calculated by VMon emulate such ratings.
VMon analyzes the transmitted video for typical distortions that have been introduced
by compression techniques and transmission problems. VMon separates these distortions into spatial and temporal types and then weights the distortions with models of
the human perceptual system to form the basis to find the root causes for a potential
degradation.

Manual VMon and VQuad Results Description 01

19

SwissQual... Diversity

VMon Video Quality Assessment


Perceptual Degradation and MOS Prediction

4.2 Perceptual Degradation and MOS Prediction


VMon is a no-reference algorithm for objective prediction of visual quality. This algorithm only analyses the transmitted and potentially degraded video sequence without
comparison to a high quality reference sequence.
VMon analyses the video sequence and identifies the following perceptual degradations:

Blockiness: Visible block borders that are caused by compression during the
encoding process

Tiling: Visible macro-block and slice edges that are caused by encoding or transmission errors

Blurring: Loss of sharp edges, which is caused by strong compression or decoding filters

Jerkiness: Temporal artefacts such as low bit-rates or freezing

These perceptual degradation measures are the basis for MOS prediction.
Root causes use a technical scale that ranges from 0 % to 100 %, where 0 % represents no degradation and 100 % represents the maximum possible degradation.
The percentage values of one degradation measure do not relate directly to the perceived quality, which depends on a combination of all degradations. That is, you cannot interpret VMon results in the form of "30 % jerkiness is poor quality". However, the
individual values are of importance for relative measurements of the form "video A has
20 % blockiness and video B has 25 % blockiness, therefore video A has less blockiness than video B".
Due to the nature of the content, a small amount of degradation is often present. In
general, results below 10 % might be caused by the actual content of the video and will
have no considerable influence on the quality prediction.

4.2.1 Blockiness
Blockiness is an effect that is caused by the division of an image into smaller squares,
that is, blocks, by the encoding process. Almost all current video encoders use a
block-based transformation. Due to the lossy encoding of these blocks, a resulting block
structure can be seen in the decoded video sequence. Various block sizes are used,
with 8 x 8 and 16 x 16 pixels being the most frequent ones.


Traditionally, that is, in MPEG4 part 2 or H.263, the blocks in which the luminance
information is encoded are 8 x 8 pixels (so-called micro-blocks). The chrominance
information is encoded in so-called macro-blocks of 16 x 16 pixels. The entire information
related to one macro-block consists of the related chrominance information and the
corresponding micro-blocks with their luminance information. The macro-block is the
smallest entity of the encoded image; position and update information refer to
macro-blocks. Macro-blocks are displayed at a fixed position in a frame.
More recent video encoders, such as H.264, even allow a scalable micro-block size of
4 x 4, 8 x 8 or 16 x 16 pixels.
The image information in a block is normally transformed with a DCT-based transformation.
Usually, luminance and chrominance information are encoded separately, possibly even
with different block sizes, and only the most significant coefficients of the transformed
values are retained.
For strong compression, only a few coefficients are retained; in extreme cases, only
one coefficient is retained, which most of the time is the one that represents a
uniform colour or luminance of the whole block. As a result of strong compression, a
block contains little or no spatial detail, and has visible transitions along its borders.
Due to the lack of transition details, the border area with the neighboring blocks
becomes more visible.
The blockiness value is an estimate of the visibility of these block borders. This value is
based on a measure of the luminance differences at block borders and is related to the
amount of spatial detail as a block border has a stronger visibility in the absence of
spatial details.
Although the blockiness measure takes into account that blocks might have different
sizes, the block borders must always be oriented horizontally or vertically and form a
right angle. The blockiness value also takes into account the luminance of the neighboring area. In very bright or very dark areas, the degradation by block borders is less
visible even though the borders are clearly measurable.
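The idea of relating luminance steps at block borders to the spatial detail inside blocks can be illustrated with a deliberately simplified sketch. This is not the VMon algorithm; the function, the fixed 8-pixel grid, and the test frame are illustrative assumptions.

```python
# Toy block-border visibility measure: compare the average horizontal
# luminance step across 8-pixel column boundaries with the average step
# inside the blocks. A ratio well above 1 indicates visible block borders.
def blockiness_proxy(frame, block=8):
    border, inner = [], []
    for row in frame:
        for x in range(1, len(row)):
            step = abs(row[x] - row[x - 1])
            (border if x % block == 0 else inner).append(step)
    mean = lambda v: sum(v) / len(v)
    return mean(border) / max(mean(inner), 1e-9)

# A frame made of uniform 8x8 blocks of alternating grey levels:
# strong borders, no spatial detail inside the blocks.
blocky = [[100 if (x // 8) % 2 == 0 else 120 for x in range(32)]
          for _ in range(16)]
print(blockiness_proxy(blocky))
```

A uniform grey frame yields a ratio of 0 (no measurable borders), matching the 0 % anchor of the blockiness scale below.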
Table 4-1: Blockiness scale

Percent   Image
0 %       Uniform grey image
100 %     8 x 8 black and white checker board

4.2.1.1 Root Causes
The main root cause for blockiness is strong compression during encoding. In addition,
packet loss during transmission might increase blockiness.

4.2.2 Tiling
During the encoding process, a video frame is divided into blocks. An important loss of
information corresponding to one or multiple blocks, either during encoding or during
transmission, leads to tiling, that is, visible tile-like artefacts in the image or video
frame.
The tiling value focuses on distortions at block borders that are caused by transmission
errors. Transmission errors are handled differently by the receiving decoder. The
simplest way of handling this type of error is to freeze the last successfully updated
image until the next key-frame provides a complete image. Other strategies include
replacing the incorrectly transmitted parts of the image with the same area of the previous
frame. Advanced concealments predict missing data by using the neighboring
areas, that is, by using the same motion compensation or similar spatial textures. Of
course, simple implementations just display the erroneous data, which can lead to
strange effects. Since no concealment strategy is perfect, the residual error is
propagated by the differential frames up to the next key-frame.
Since the transmission is organized with macro-blocks as the smallest entity, transmission
errors or residual errors often show a visible macro-block structure. At least the
border lines of the erroneous areas are always oriented horizontally or vertically. The
VMon08 tiling detector is specifically designed to recognize such erroneous areas by
checking for incoherent vertical and horizontal edges.
A threshold is applied to avoid false detections of tiling in the actual content of a video
sequence, which would otherwise lower the scores.
Visible macro-block borders that are caused by spatial compression were counted as
tiling in previous versions of VMon. Due to the high correlation to blockiness, the
blockiness value in the 08 series now includes visible macro-block borders that are
caused by compression.
However, suddenly appearing macro-block structures due to a highly compressed
key-frame or temporarily increased spatial compression can also be considered as tiling.
In case of high motion, the affected macro-blocks in this area might be encoded as
so-called intra-blocks. Due to the limited amount of bits, the intra-blocks are highly
compressed and suddenly become visible.
Table 4-2: Tiling scale

Percent   Video
0 %       Low motion video with no sharp rectangular edges, or a uniform grey sequence
100 %     Several fast moving or jumping black squares on a white image

4.2.2.1 Root Causes
The main root cause for tiling is packet loss during transmission. Strong compression
of encoding might also increase tiling.


4.2.3 Blurring
In VMon, blurring is measured indirectly by measuring sharpness, that is, the sharpness of the luminance edges in the frames. More specifically, sharpness measures the
luminance offset at the edge borders and relates this offset to the local contrast at the
edge location. In addition, the sharpness measure tries to avoid block border edges,
which are the result of strong compression.
The blurring value is the decrease in sharpness of the video sequence under test with
respect to the sharpness of an average high quality video sequence.
Sharpness is a value that strongly depends on the content of a video signal. For example, a cloudy sky over a meadow does not contain sharp edges. In such an image, the
sharpness is measured at the position of the sharpest edges in the frame.
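A heavily simplified sketch of such a sharpness measure is shown below: the luminance offset across an edge is related to the local contrast, and the sharpest position found is reported. This is an illustration only, not the VMon implementation; the window size and the one-dimensional test signals are invented.

```python
# Toy sharpness measure on a single luminance row: the step at each edge
# is divided by the local contrast around it; the best (sharpest) ratio
# in the row is reported. 1.0 = ideally sharp edge; smaller = blurrier.
def sharpness_proxy(row, window=4):
    best = 0.0
    for x in range(1, len(row)):
        step = abs(row[x] - row[x - 1])            # luminance offset at the edge
        lo, hi = max(0, x - window), min(len(row), x + window)
        local = max(row[lo:hi]) - min(row[lo:hi])  # local contrast
        if local:
            best = max(best, step / local)
    return best

sharp = [0] * 8 + [255] * 8          # hard edge: full offset within one pixel
blurred = list(range(0, 256, 16))    # same contrast spread over many pixels
print(sharpness_proxy(sharp), sharpness_proxy(blurred))
```

The hard edge scores 1.0 while the ramp scores much lower, mirroring how blurring reduces the measured sharpness without changing the overall contrast.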
Table 4-3: Blurring scale

Percent   Image
0 %       Black and white diagonal lines, 2 pixels gap
100 %     Uniform grey image

4.2.3.1 Root Causes
The main root cause for blurring is the use of de-blocking filters in the video decoder.

4.2.4 Jerkiness
Jerkiness is a perceptual value that measures jerks from one frame to the next. High
jerkiness is the result of a bad representation of moving objects in the video sequence.
In other words, jerkiness measures the loss of information due to a freezing period or a
low frame rate.
In case of freezing, jerkiness considers the freezing period and the assumed loss of
information during this period. This loss of information is estimated by the inter-frame
difference at the end of the period. The measure of jerkiness comprises freezing, the
anticipated loss of information, and the Dominating Frame Rate, which is described in
the "Additional Results" section.
In the absence of explicitly frozen periods, jerkiness is mainly related to the technical
value of the Dominating Frame Rate. For moderate or high motion videos, jerkiness and
frame rate are highly negatively correlated. In low motion videos, a low frame rate
does not necessarily imply that jerkiness is high, because the jerkiness measure takes
into account the amount of motion in the video, whereas the frame rate only measures
the display time of frames.
A regular temporal degradation such as a lower frame rate is usually better accepted
than an irregular freezing. The jerkiness calculation takes this effect into account.


This might lead to the effect that in a moderately moving clip, a consistent frame rate of
5 fps causes a jerkiness of only 15 %, whereas two longer freezing events, for example,
2 x 500 ms, in a 15 fps clip result in a jerkiness of more than 50 %.
Table 4-4: Jerkiness scale

Percent   Video
0 %       30 fps video sequence of smooth motion
100 %     Video sequence consisting of only several seconds of freezing

4.2.4.1 Root Causes
Large jerkiness values are the result of the reduction to a low encoder frame rate or of
transmission delays and strong packet loss during transmission.

4.2.5 Additional Results


In addition to the root causes that are considered in the MOS prediction, VMon also
provides the technical figures for the following items:

Dominating Frame Rate in fps (frames per second): As for jerkiness, the basis
for this value is the display time of a frame, that is, the amount of time an image
remains visible until the image information changes with the next update.
In the case of a constant frame rate, the dominating frame rate is equal to that constant
frame rate. In the case of a variable frame rate, the dominating frame rate is
the median of the frame rates.

Black Frame Ratio: This value provides the ratio of detected black frames with
respect to all frames in the sequence.
More specifically, the black frame ratio in percentage is the total time black frames
are displayed divided by the video sequence length. In NQDI, intervals of black
frames have a grey background in the time analysis graph.
All of the mono colour frames, including black frames, are discarded before the
MOS estimation. In the time analysis graph of NQDI, all intervals of mono colour
frames have a grey background. In the previous version, that is, VMon06, only blue
frames were discarded for the MOS calculation and sequences of other mono colour frames were considered as highly blurred frames. In QualiPoc, all mono colour
frames are counted and reported as black-frames.

Freezing: If the display time of a frame exceeds 350 ms, the frame is considered
frozen.
The freezing value is displayed as a percentage: the total freezing time divided by
the video sequence length. On the time analysis graph in NQDI, freezing intervals
have a blue background.
Sequences of black or other mono colour frames are not considered freezing even
if their display time exceeds the given limit.
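The technical figures above can be sketched from per-frame display times. This is an illustrative reconstruction under the stated rules (median frame rate, 350 ms freezing limit, black frames excluded from freezing), not SwissQual code, and the example timings are invented.

```python
# Dominating frame rate, freezing ratio, and black frame ratio computed
# from per-frame display times (ms) and a per-frame "black" flag.
from statistics import median

FREEZE_MS = 350  # display times above this limit count as freezing

def technical_figures(display_ms, is_black):
    total = sum(display_ms)
    dominating_fps = 1000.0 / median(display_ms)   # median of display times
    freezing = sum(t for t, b in zip(display_ms, is_black)
                   if t > FREEZE_MS and not b) / total * 100   # black frames excluded
    black_ratio = sum(t for t, b in zip(display_ms, is_black) if b) / total * 100
    return dominating_fps, freezing, black_ratio

# 10 frames at 67 ms (about 15 fps), one 500 ms freeze, one 100 ms black frame
times = [67] * 10 + [500] + [100]
black = [False] * 11 + [True]
print(technical_figures(times, black))
```

Note how the single 500 ms frame barely moves the median-based frame rate but dominates the freezing percentage, which matches the distinction drawn between the two figures above.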


4.2.6 MOS Prediction


The MOS is an overall value that takes into account the individual dimensions of distortions
and combines these scores into a single value. The MOS prediction also considers
all of the root causes that are described in this document. In addition, VMon uses
internal detectors to obtain information about the naturalness of movements and
objects, to check for coherent motion, and to check for a possible loss of spatial details
in the colour planes.
Previous versions of VMon calculated the MOS from the individual inputs with a simple
linear formula. The current VMon08 version uses a more sophisticated approach,
which also considers the non-additivity and therefore non-linearity of the individual
dimensions. Basically, the most important degradation determines the quality, while
further distortions have a reduced influence. This approach is implemented by a multiplicative aggregation, which is dominated by the largest degradation measure. More
specifically, the basis for MOS prediction is a product of a temporal and a spatial degradation measure:
predicted MOS = f_temporal(video) × f_spatial(video)
where f_temporal is a function of temporal degradations and f_spatial is a function of spatial
degradations of the video sequence. However, the complete MOS prediction is slightly
more complicated, as additional spatio-temporal degradations are estimated for the prediction.
The maximum predictable MOS is 4.5 whereas the minimum predictable MOS is 1.0.
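The effect of the multiplicative aggregation can be made concrete with a toy model. The quality factors below are invented stand-ins for the real f_temporal and f_spatial functions; only the multiplicative structure and the 1.0 to 4.5 scale come from the text above.

```python
# Toy multiplicative MOS aggregation: each degradation (0 = none, 1 = maximal)
# yields a quality factor in [0, 1]; the product maps onto the 1.0..4.5 scale.
def predicted_mos(temporal_deg, spatial_deg, mos_max=4.5, mos_min=1.0):
    f_temporal = 1.0 - temporal_deg   # stand-in for the temporal quality function
    f_spatial = 1.0 - spatial_deg     # stand-in for the spatial quality function
    return mos_min + (mos_max - mos_min) * f_temporal * f_spatial

print(predicted_mos(0.0, 0.0))   # no degradation: 4.5
print(predicted_mos(0.6, 0.0))   # one large degradation
print(predicted_mos(0.6, 0.2))   # adding a smaller, second degradation
```

Because the factors multiply, the large temporal degradation dominates: adding the smaller spatial degradation changes the score far less than the first degradation did, which is the non-additive behavior described above.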

4.3 VMon Results in NQDI


In NQDI, the display of VMon results is focused on the relevant quality prediction information from VMon. The upper part of figure 4-1 shows the overall visual quality as estimated by VMon. NQDI also assigns a category to the VMon score, that is, excellent, good, fair, poor, or bad, which allows for an easier rating of the video quality.
Thresholds for these categories can be adjusted in NQDI for individual applications.

The left column displays technical values, such as freezing and frame rate. The next
column displays the outcomes of the different detectors as described in this document.
Furthermore, the window contains some basic information, such as the application scenario, protocol, and player information.


Fig. 4-1: VMon analysis overall results

The lower part of the NQDI window displays per-frame information. The upper chart
shows the inter-frame differences along with the blurring and blockiness values for each
frame. The bars that indicate the inter-frame difference are green for regular frames,
blue for repeated frames, and black for detected black frames. Freezing periods and
mono colour frames are marked with a shadowed background for easy visibility.

Fig. 4-2: Per-frame analysis in VMon

The second chart displays results from the content and scene analysis, which is
restricted to the audio activity (channel active / inactive) and detected scene changes
(vertical black lines). The scene analysis is subject to extensions for the next releases
of VMon.

Fig. 4-3: Scene analysis in VMon


4.4 Scene analysis in VMon


A no-reference model only analyzes the video sequence that is received during a test.
As a result, this model has lower prediction accuracy than a full-reference model,
which also analyzes the reference signal.
Since VMon does not require a reference signal, this method can be applied to services where the customer has no control over the content, for example, streaming from
a server and live TV services.
Although a no-reference model is less accurate than a full-reference model, the no-reference model can evaluate a wider range of services and can deliver valuable results
in well-designed measurement applications.
A similar analysis that is based on the previous VMon06 series was published in ITU-T
in May 2008.

4.5 Content Dependency of Perceived Quality and Prediction Problems


A no-reference model can detect typical compression and transmission distortions, but
cannot always separate these artefacts from properties of the content itself. For example,
naturally occurring content with soft edges, such as a cloudy sky or a meadow, is
scored as blurry, a graphical object is scored as a compression artefact, and a cartoon
that contains only a few different colours in wide areas is scored as unnatural. However,
if the content has a natural spatial complexity and a minimum of movement, a no-reference
model can deliver valuable results.

4.6 Application of No-Reference Models


Unlike a full-reference model, where a user has full control over the video sequences,
pure codec evaluation and tuning is not the focus of a no-reference model. Instead, a
no-reference model is typically applied in situations where a user does not have
access to the source video, for example, in-service monitoring of networks, streaming
applications from unknown sources, and live TV applications. In these cases, a user
aims to find the best compromise between codec settings and the current network behavior.
Although a no-reference model is optimized for this purpose, usage guidelines and the
interpretation of results must also be considered.
To demonstrate the performance of the SwissQual no-reference VMon MOS prediction, the following typical use cases are considered:

Optimization or Benchmarking - Averaging of Results: Quality evaluation of a
specific transmission chunk or a specific location while requesting video streams
from a live TV server. This type of evaluation is used for service optimization or
benchmarking.


Network Monitoring: Network monitoring by an in-service observation to find
severe quality problems.

4.6.1 Use Case 1: Optimization or Benchmarking - Averaging of Results


In use case 1), the aim is to analyze the general behavior of a transmission channel
from a user perspective by using the service over a period of time. For this type of
analysis, the user behavior is determined by analyzing a series of typical video examples and not by analyzing a short individual video sequence. This series can consist of
several samples that are taken from a longer video sequence or of several samples
that are taken from typical video content categories during a longer observation period.
For simplification, a transmission condition, which is referred to as an HRC in this document,
combines a specific codec type with compression ratios, frame rates, and specific
error patterns. By averaging across the different contents in a transmission condition,
the model can create a general view of a channel.
Furthermore, averaging across the individual contents for each condition dramatically
minimizes the content dependency of the perceived quality as well as the content
dependency of the model.
Table 4-5 shows the correlation coefficients for the different resolutions and analysis
methods on a per-sample and per-condition basis. The main value is the correlation
coefficient, which is averaged over the databases. The value in parentheses is the averaged
r.m.s.e., which provides a rough idea about prediction errors.
Table 4-5: Result comparison of VMon08 for a per-sample and per-condition evaluation, all three resolutions

Resolution   VMon08 (per file)   VMon08 (per condition)
QCIF         0.73 (0.70)         0.91 (0.37)
CIF          0.63 (0.78)         0.81 (0.51)
VGA          0.52 (0.92)         0.75 (0.57)

Since VMon is optimized for the smaller image sizes that are currently used in mobile
services, the accuracy is best for QCIF and similar resolutions and is still acceptable
for CIF resolutions; for VGA resolutions, VMon can only provide a rough categorization
of visual quality. The next use case, however, allows the use of VMon even for VGA.

4.6.2 Use Case 2: Network Monitoring


In use case 2), the behavior of a transmission channel in a live scenario is observed
and critical quality issues are signaled accordingly. This signaling is a threshold-based
trigger. For simplification, the threshold is only applied to the pure predicted MOS value
of each sample. In a real-world application, all of the partial results can be used to produce
more confident results.


The following rules are applied to the data:

Threshold signaling bad quality: MOS < 2.5
Uncertainty of subjective test results: 0.2 MOS
Criterion A, false rejection: MOS > 2.7 & VMon < 2.5
Criterion B, false acceptance: MOS < 2.3 & VMon > 2.5
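The rules above translate directly into a per-sample classification. The sketch below applies them to a handful of invented score pairs; it illustrates the counting method only and is not SwissQual code.

```python
# Per-sample false acceptance / false rejection counting with the
# threshold (2.5) and subjective uncertainty (0.2 MOS) rules above.
THRESHOLD = 2.5
UNCERTAINTY = 0.2

def false_rates(mos, vmon):
    rejection = sum(1 for m, v in zip(mos, vmon)
                    if m > THRESHOLD + UNCERTAINTY and v < THRESHOLD)
    acceptance = sum(1 for m, v in zip(mos, vmon)
                     if m < THRESHOLD - UNCERTAINTY and v > THRESHOLD)
    n = len(mos)
    return 100.0 * acceptance / n, 100.0 * rejection / n

mos  = [1.8, 2.2, 2.6, 3.1, 3.8, 4.2]   # subjective scores (invented)
vmon = [2.0, 2.7, 2.4, 3.0, 2.3, 4.0]   # VMon predictions (invented)
fa, fr = false_rates(mos, vmon)
print(fa, fr)
```

Samples whose subjective MOS falls inside the 2.3 to 2.7 uncertainty band are counted in neither category, so small prediction errors near the threshold do not trigger an alarm or a miss.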

Table 4-6: False acceptance and false rejection ratio of all experiments for each format

Format   Mean false acceptance (per file)   Mean false rejection (per file)
QCIF     7.6 %                              2.8 %
CIF      11.5 %                             3.0 %
VGA      15.6 %                             4.8 %

The results in table 4-6 show that an alarm is incorrectly raised in only approximately 3 to
4 % of the cases on a per-sample basis. However, quality problems that are not identified
remain within a range of 8 % to 15 %. This asymmetry is particularly useful to
avoid false alarms and to focus with confidence on cases where the quality drops.
In a real world application, such decisions are not exclusively based on a MOS.
Instead, these decisions also take partial results of the analysis into account, which
leads to even more confident results.
In summary, no-reference models can be used in certain applications which cannot be
addressed by full-reference approaches and can deliver worthwhile results.


5 VQuad Video Quality Assessment


The latest version of the VQuad algorithm is a response to the rapidly progressing
video transmission and compression technology.
Although the basic and well-accepted structures in the latest version are identical to
those of the original VQuad, many improvements have been made in the perceptual
degradation measures.
In addition to the overall quality as a MOS value, VQuad produces a set of results that
provide more details about the type of problems that are observed. These results,
along with the unique cause analysis of VQuad, allow for the easy interpretation and
localization of potential quality problems.

5.1 Perceptual Degradation and MOS Prediction


VQuad is a full-reference algorithm for the objective prediction of visual quality. This
algorithm analyses the transmitted and potentially degraded video sequence and compares
the sequence to a high quality reference file.
VQuad analyses the video sequence and identifies the following perceptual degradations:

Blockiness: Visible block borders that are caused by compression during encoding

Tiling: Visible macro-block and slice edges that are caused by encoding or transmission errors

Blurring: Loss of sharp edge details that are caused by strong compression or
decoding filters

Jerkiness: Temporal artefacts such as low bit-rates or freezing

Perceptual difference: Perceived difference between matched frames of the reference and degraded video sequence.

These perceptual degradation measures are the basis for MOS prediction. The degradation
measures for blockiness, tiling, blurring, and jerkiness are the same as for
VMon.
For more information, see chapter 4.2, "Perceptual Degradation and MOS Prediction",
on page 20.
Root causes use a technical scale that ranges from 0 % to 100 %, where 0 % is no
degradation and 100 % is the maximum possible degradation.
The degradation percentage values are reported with respect to the reference value,
that is, a 0 % value means that the transmitted sequence has not been degraded with
respect to the reference sequence.


5.1.1 Perceptual Difference


Unlike the no-reference approach, the full-reference method has access to the reference
video sequence. This access allows for a detailed comparison of the reference
video sequence to the encoded and transmitted video sequence. To calculate the perceptual
difference measure, a so-called time alignment is performed, which involves
assigning a matching frame in the reference video to each frame in the coded and
transmitted sequence. If VQuad cannot assign a frame from the reference sequence
due to strong distortions of the transmitted video sequence, the frame remains
unmatched. VQuad also returns the relative number of matched frames of the transmitted
video.
Once the frames of the transmitted video are aligned to the reference sequence, a perceptual difference is calculated between corresponding frames. The average value
serves as a key parameter for the prediction of the MOS value. The perceptual difference measure between matched frames calculates the inter-frame difference, emphasizes large edges, and takes into account the adaptation effects to luminance and local
contrast.
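The alignment idea can be sketched in a few lines. The real VQuad time alignment and perceptual difference are far more elaborate; here, frames are plain lists of luminance values, a simple mean absolute difference replaces the perceptual measure, and the match threshold is an invented parameter.

```python
# Toy time alignment: each degraded frame is matched to the reference
# frame with the smallest frame difference; if even the best difference
# exceeds a limit, the frame stays unmatched (None).
def frame_diff(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def align(degraded, reference, max_diff=30.0):
    matches = []
    for frame in degraded:
        diffs = [frame_diff(frame, ref) for ref in reference]
        best = min(range(len(diffs)), key=diffs.__getitem__)
        matches.append(best if diffs[best] <= max_diff else None)
    return matches

reference = [[0, 0, 0, 0], [50, 50, 50, 50], [100, 100, 100, 100]]
degraded  = [[2, 1, 0, 3], [48, 52, 50, 49], [200, 200, 200, 200]]
matches = align(degraded, reference)
matched_ratio = 100.0 * sum(m is not None for m in matches) / len(matches)
print(matches, matched_ratio)
```

The heavily distorted third frame finds no acceptable reference frame and stays unmatched, which is how the relative number of matched frames drops below 100 %.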

5.1.2 Additional Results


In addition to the root causes that are considered in the MOS prediction, VQuad also
provides the technical figures for the following items:

Dominating Frame Rate, Freezing, Black Frame Ratio: For more information,
see chapter 4.2, "Perceptual Degradation and MOS Prediction", on page 20.

PSNR: Peak-Signal-to-Noise-Ratio in dB; the average PSNR of the frames in the
encoded and transmitted video sequence with respect to the corresponding frames in
the reference sequence.

Matched Frames: Relative number of frames in the coded and transmitted video
sequence that match a frame in the reference video sequence.
This value is calculated with respect to the total number of frames in the transmitted video sequence. In other words, 100 % of matched frames means that all
frames of the coded and transmitted video sequence could be matched to a frame
of the reference sequence.

Frame Jitter: Standard deviation of the frame display time where a high value of
the frame jitter is the result of irregular video playback.
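The PSNR and frame jitter figures follow standard definitions and can be sketched as follows. This is an illustration, not VQuad code; frames are flat lists of 8-bit luminance values and the example data is invented.

```python
# PSNR between a reference and a degraded frame (8-bit video, peak 255),
# and frame jitter as the standard deviation of the frame display times.
import math
from statistics import pstdev

def psnr(frame_ref, frame_deg, peak=255.0):
    mse = sum((a - b) ** 2 for a, b in zip(frame_ref, frame_deg)) / len(frame_ref)
    return float('inf') if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

ref = [100, 120, 140, 160]
deg = [102, 118, 141, 159]
print(psnr(ref, deg))

display_ms = [40, 40, 40, 200, 40]   # one irregularly displayed frame
frame_jitter = pstdev(display_ms)    # high value = irregular playback
print(frame_jitter)
```

A perfectly regular 25 fps playback (all display times 40 ms) would give a frame jitter of 0, while the single 200 ms outlier above produces a large value.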

5.1.3 MOS Prediction


The MOS is an overall value that takes into account the individual dimensions of distortions and rates them into a single value.
The VQuad08 series takes into account the non-additivity, and therefore the non-linearity, of the individual dimensions. Basically, the most important degradation determines the quality, and further distortions have only a reduced influence. The prediction is implemented by a multiplicative aggregation of the form:
predicted MOS = ftemporal(video) × fspatial_fullRef(video)

where ftemporal is a function of temporal degradations and fspatial_fullRef is a function of spatial degradations, which is dominated by the perceptual difference measure.
The maximum predictable MOS is 4.5 and the minimum is 1.0.
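The multiplicative aggregation can be sketched as follows. The clamping bounds (1.0 and 4.5) come from the text above; the factor values themselves are assumptions, since the calibrated VQuad08 functions are not given here.

```python
def predict_mos(f_temporal, f_spatial_fullref, lo=1.0, hi=4.5):
    """Illustrative multiplicative MOS aggregation: the product lets
    the worst dimension dominate, and the result is clamped to the
    predictable MOS range [1.0, 4.5]. Factor values are hypothetical."""
    mos = f_temporal * f_spatial_fullref
    return max(lo, min(hi, mos))

# With no temporal degradation, the spatial factor alone decides:
mos_clean = predict_mos(4.5, 1.0)      # undistorted case
mos_degraded = predict_mos(4.5, 0.5)   # strong spatial degradation halves it
```

Note how a low factor in either dimension pulls the product down regardless of the other, which mirrors the "most important degradation determines the quality" behavior.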

5.2 VQuad08 Results in NQDI


Figure 5-1 shows how the VQuad08 results appear in NQDI.

Fig. 5-1: Presentation of VQuad overall analysis results

The individual results are explained in the earlier sections of this document. The main
value, visual quality (estimated MOS), is rated into one of five categories. You can configure these categories individually in NQDI.
If a video sequence has an audio track, VQuad also performs an audio-video synchronization evaluation. The result of this lip-sync evaluation is displayed on the lower right
of the window.
In the time domain charts, the results are sub-divided into two sections. The first diagram shows the inter-frame differences and the IP throughput during IP streaming services. The inter-frame differences provide information about the movement and the
temporal complexity. The IP throughput is drawn along the time axis as a red line.
The lower chart shows the per-frame results for blurring, blockiness, and PSNR.

Fig. 5-2: Presentation of VQuad frame-wise results


5.3 VQuad08 Application


VQuad can only be applied to services that can play back pre-recorded content from a
streaming server or a broadcasting service. To use VQuad, special SwissQual reference clips have to be installed on the server or streamed from a server that is hosted
by SwissQual.
VQuad supports the following groups of image sizes:

Smart Phone: QQVGA, QCIF, QCIF+

Hand Held: QVGA, CIF

PC: VGA, SD

For performance reasons, the clips that are used for VQuad have a small watermark in
the last lines of each clip. VQuad uses this watermark to assign the correct reference
clip. An individual marker is included in each frame so that the match between a
received frame and the corresponding reference frame can be found efficiently. The
marker lines are ignored in the quality analysis.
You cannot analyze VGA or SD contents on the Diversity platform. Instead, these
image sizes must be hosted by PC server applications.
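The per-frame watermark idea described above can be sketched with a toy scheme. This is not SwissQual's actual watermark format: the number of marker rows, the 16-bit index width, and the thresholding are all assumptions made for the illustration. The frame index is written as a binary pattern into the last lines, decoded on receipt to find the matching reference frame, and the marker lines are cropped before the quality analysis.

```python
import numpy as np

MARKER_ROWS = 2  # assumed number of watermark lines at the frame bottom

def encode_marker(frame, frame_index):
    # Write the frame index as a 16-bit binary pattern into the last rows
    out = frame.copy()
    cell = out.shape[1] // 16
    for i in range(16):
        bit = (frame_index >> i) & 1
        out[-MARKER_ROWS:, i * cell:(i + 1) * cell] = 255 if bit else 0
    return out

def decode_marker(frame):
    # Recover the frame index by thresholding each marker cell
    cell = frame.shape[1] // 16
    bits = [int(frame[-MARKER_ROWS:, i * cell:(i + 1) * cell].mean() > 127)
            for i in range(16)]
    return sum(bit << i for i, bit in enumerate(bits))

def strip_marker(frame):
    # Marker lines are excluded from the quality analysis
    return frame[:-MARKER_ROWS, :]
```

With such a marker, a received frame identifies its reference frame in constant time instead of requiring a search over the whole reference clip.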

5.3.1 Lip-Sync in VQuad


Lip-sync refers to audio-visual synchronization. To determine lip-sync, VQuad calculates the exact delay of the video signal relative to the corresponding reference input signal. A corresponding framework in the SwissQual analysis tool combines this information with the audio delay analysis performed by SQuad-LQ to calculate lip-sync in one-second increments for the duration of the clip. The average lip-sync, along with its standard deviation over the duration of the clip, demonstrates the constancy of the A/V synchronization.
The new lip-sync measure is only supported by 'Streaming PC Full Reference' tests
in Diversity. Lip-sync measurements cannot be provided for no-reference measures
such as VMon.
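The combination of the two delay measurements can be sketched as follows. This is a minimal illustration, not the SQuad-LQ/VQuad framework: the per-second delay values and the sign convention (positive offset meaning audio lags video) are assumptions.

```python
import numpy as np

def lip_sync_stats(video_delay_ms, audio_delay_ms):
    """Per-second lip-sync offsets (audio delay minus video delay)
    and their mean/standard deviation over the clip. A small standard
    deviation indicates constant A/V synchronization."""
    offsets = np.asarray(audio_delay_ms, float) - np.asarray(video_delay_ms, float)
    return offsets, float(offsets.mean()), float(offsets.std())

# One hypothetical measurement per second of a 5 s clip
video = [120, 118, 121, 119, 122]
audio = [160, 158, 160, 161, 159]
offsets, avg, std = lip_sync_stats(video, audio)
```

In this example the audio trails the video by about 40 ms on average, with very little variation, i.e. a stable lip-sync over the clip.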

A Acknowledgements
SwissQual would like to thank VQEG and the parties that were involved in the multimedia test phase I.
The video data and the subjective scores that were used for the development of VMon and VQuad were provided to VQEG by the following companies:

● Acreo AB (Sweden)
● CRC (Canada)
● FUB (Italy)
● Ghent University - IBBT (Belgium)
● KDDI (Japan)
● INTEL (USA)
● IRCCyN (France)
● NTIA/ITS (USA)
● NTT (Japan)
● OPTICOM (Germany)
● Psytechnics (UK)
● Symmetricom (USA)
● Toyama University (Japan)
● Yonsei University (Korea)
● University of Nantes (France)

Glossary: Abbreviations
M
MOS: Mean Opinion Score

V
VQEG: Video Quality Experts Group, an independent international forum for video quality evaluation metrics
