
WHEN QUALITY MATTERS

VMon and VQuad Results Description


Manual June 2012

SwissQual License AG Allmendweg 8 CH-4528 Zuchwil Switzerland t +41 32 686 65 65 f +41 32 686 65 66 e info@swissqual.com www.swissqual.com

Part Number: 16-070-200640 Rev 1

Copyright © 2000 - 2012 SwissQual AG. All rights reserved. No part of this publication may be copied, distributed, transmitted, transcribed, stored in a retrieval system, or translated into any human or computer language without the prior written permission of SwissQual AG. SwissQual has made every effort to ensure that the instructions contained in this document are adequate and free of errors and omissions. SwissQual will, if necessary, explain issues that may not be covered by the documents. SwissQual's liability for any errors in the documents is limited to the correction of errors and the aforementioned advisory services. When you refer to a SwissQual technology or product, you must acknowledge the respective text or logo trademark somewhere in your text. SwissQual, Seven.Five, SQuad, QualiPoc, NetQual, VQuad, Diversity as well as the following logos are registered trademarks of SwissQual AG.

Diversity Explorer, Diversity Ranger, Diversity Unattended, NiNA+, NiNA, NQAgent, NQComm, NQDI, NQTM, NQView, NQWeb, QPControl, QPView, QualiPoc Freerider, QualiPoc iQ, QualiPoc Mobile, QualiPoc Static, QualiWatch-M, QualiWatch-S, SystemInspector, TestManager, VMon, VQuad-HD are trademarks of SwissQual AG. SwissQual acknowledges the following trademarks for company names and products: Adobe, Adobe Acrobat, and Adobe Postscript are trademarks of Adobe Systems Incorporated. Apple is a trademark of Apple Computer, Inc. DIMENSION, LATITUDE, and OPTIPLEX are registered trademarks of Dell Inc. ELEKTROBIT is a registered trademark of Elektrobit Group Plc. Google is a registered trademark of Google Inc. Intel, Intel Itanium, Intel Pentium, and Intel Xeon are trademarks or registered trademarks of Intel Corporation. INTERNET EXPLORER, SMARTPHONE, TABLET are registered trademarks of Microsoft Corporation. Java is a U.S. trademark of Sun Microsystems, Inc. Linux is a registered trademark of Linus Torvalds. Microsoft, Microsoft Windows, Microsoft Windows NT, and Windows Vista are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. NOKIA is a registered trademark of Nokia Corporation. Oracle is a registered US trademark of Oracle Corporation, Redwood City, California. SAMSUNG is a registered trademark of Samsung Corporation. SIERRA WIRELESS is a registered trademark of Sierra Wireless, Inc. TRIMBLE is a registered trademark of Trimble Navigation Limited. U-BLOX is a registered trademark of u-blox Holding AG. UNIX is a registered trademark of The Open Group.

VMon and VQuad Results Description Manual


2000 - 2012 SwissQual AG

Contents
1 Introduction
2 Visual Quality
    Visual Quality
    Mean Opinion Score
    Subjective and Objective Quality Assessment
    Full-Reference and No-Reference Assessments
3 Technical Requirements and Performance
    Technical Requirements
        Frame Rate
        Video Sample Length
    VMon and VQuad Performance Accuracy
4 VMon Video Quality Assessment
    Technical Background
    Perceptual Degradation and MOS Prediction
        Blockiness
        Tiling
        Blurring
        Jerkiness
        Additional Results
        MOS Prediction
    VMon Results in NQDI
    VMon Application Scenarios
        Content Dependency of Perceived Quality and Prediction Problems
        Application of No-Reference Models
5 VQuad Video Quality Assessment
    Perceptual Degradation and MOS Prediction
        Perceptual Difference
        Additional Results
        MOS Prediction
    VQuad08 Results in NQDI
    VQuad08 Application
        Lip-Sync in VQuad
A Acknowledgements

Figures
Contents | CONFIDENTIAL MATERIALS ii


Figure 2-1 Subjective versus objective quality assessment
Figure 3-1 Per-sample comparison between VMon08 and VQuad08 data on example data in QCIF resolution
Figure 3-2 Per-condition comparison between VMon08 and VQuad08 data on example data in QCIF resolution
Figure 3-3 Correlation coefficients between MOS values obtained in subjective tests and objective scores based on the VQEG QCIF data set
Figure 3-4 Correlation coefficients between MOS values obtained in subjective tests and objective scores based on the 14 VQEG QCIF data sets on a per-condition evaluation
Figure 4-1 VMon analysis overall results
Figure 4-2 Per-frame analysis in VMon
Figure 4-3 Scene analysis in VMon
Figure 5-1 Presentation of VQuad overall analysis results
Figure 5-2 Presentation of VQuad frame-wise results

Tables
Table 3-1 Results of VMon08 and VQuad08 for all three resolution sizes
Table 4-1 Blockiness scale
Table 4-2 Tiling scale
Table 4-3 Blurring scale
Table 4-4 Jerkiness scale
Table 4-5 Result comparison of VMon08 for a per-sample and per-condition evaluation, all three resolutions
Table 4-6 False acceptance and false rejection ratio of all experiments for each format



Introduction

The VMon and VQuad Results Description Manual describes the technical background, the application scenarios, and the parameters that SwissQual video quality measurements record. VMon and VQuad have been part of the SwissQual Diversity measurement system since the 10.1 release.

To objectively predict visual quality, SwissQual uses the VMon algorithm for the no-reference approach and the VQuad algorithm for the full-reference approach. SwissQual has successfully used these algorithms in quality measurement systems for several years. For the full-reference approach, SwissQual provides a set of video clips that cover different types of videos. For higher confidence in the measurement results, SwissQual has tuned the algorithms to harmonize with these video clips.

To keep pace with rapidly evolving video compression and transmission techniques, SwissQual is constantly improving the VMon and VQuad algorithms. Note: The main indicators and the presentation scheme have not been changed. The latest versions of these algorithms are VMon08 and VQuad08, which improve and extend the detectors as well as the perceptual weighting of individual degradations. The new versions are more robust with respect to the latest coding technologies than their predecessors and are less dependent on content. The redesign of the internal structure also provides a framework for High Definition (HD) resolution.



Visual Quality

Visual quality in video services is a major factor for customer satisfaction. As a result, determining where and why visual quality degradation occurs is important to network operators and equipment manufacturers. Encoding techniques must be designed for optimal visual quality on specific transmission bandwidths as well as for error robustness.

To assess the quality of video encoding techniques, large-scale visual tests with individual human testers are commonly employed. However, the reproducibility of the measurement results in such tests depends entirely on the motivation of the individual test candidates. The advantage of an automated test is more consistent visual quality results that closely correlate with the subjective test results.

Compared to frame rate, bit rate, or luminance, visual quality is a vague term. However, customer satisfaction can only be measured by considering perceived degradations, while encoding and transmission techniques must be selected and optimized based on quality perception.

Visual Quality
Visual quality is the measure of a viewer's satisfaction, based on his or her experience and expectations with respect to a received and perceived video stream. Visual quality is generally expressed as a Mean Opinion Score (MOS), which reflects the mean of the individual scores assigned by human viewers during a subjective test in a lab. Visual quality is often obtained by applying Absolute Category Rating (ACR) tests, which express the MOS on a scale from 1 (bad) to 5 (excellent). This measurement denotes the average of many individual opinions on perceived quality, obtained from a representative number of viewers of both genders and different ages.

Quality perception is a complex phenomenon within the process of human perception and as such is a subjective measurement. Quality, or more specifically visual quality, is the rating of the difference between what a viewer sees and what he or she expects in this context. A viewer expects 'naturalness' in a video and rates any deviation as a degradation. The internal reference for 'naturalness' is strongly dependent on the video content. For example, for faces or other well-known natural patterns, the tolerance for deviations from 'naturalness' is restrictive. Conversely, degradations are more readily accepted in a video with 'unnatural' artificial content, such as cartoons or graphics.

Subjective tests usually include different content categories. The spread of scores for individual video contents with the same error conditions is much wider than the corresponding spread of scores in listening quality tests, which use samples from different talkers. To obtain a complete picture of visual quality, video clips from different content categories must be transmitted and scored.

Mean Opinion Score


Quality, as a perceived value, cannot be defined by objective technical means such as the decibel (dB) level or the delay in seconds. Quality values, that is, Mean Opinion Scores (MOS), are obtained by asking human subjects for their perception of the listening and visual quality of video clips in a controlled environment. The subjects are asked to assign a score to a pre-recorded speech or video sample of a few seconds in length. The mean of these scores represents the perceptual quality of the sample. Usually, the score is assigned on a 1 to 5 scale that has verbal categories in the native language of the subjects. Note: Absolute Category Ratings are often used instead of MOS values.

MOS | English   | German        | French      | Spanish
 5  | excellent | ausgezeichnet | excellent   | excelente
 4  | good      | gut           | bonne       | buena
 3  | fair      | ordentlich    | assez bonne | regular
 2  | poor      | dürftig       | médiocre    | mediocre
 1  | bad       | schlecht      | mauvaise    | mala

Each individual score is influenced by the global experience of the user, expectations, and individual preferences. That is, different people tend to assign different quality scores to the same clip. Scores are also subject to short-term focus and accidental assignment. Consequently, a MOS value is the average of a wider or narrower distribution of individual scores.

The main disadvantage of this approach is that individuals assign different scores to a clip of perfect quality due to a lack of confidence, accidental down-scoring, or from being overly critical. The highest MOS value in subjective tests is therefore usually around 4.5. Conversely, people tend to assign a score of 'bad' to most of the lower quality clips. The main reason is an asymmetry of the scale: the lower end of the quality range is much wider and one cannot choose 'worse than bad', while at the upper end one cannot assign a quality value that is better than undisturbed speech or video.

However, we also have to consider that the MOS is an average of the scores from a group of at least 24 people. In scientific papers, the standard deviation of the MOS is also included to represent the distribution width of the individual scores. An additional value that is often included with the MOS is the 95 % confidence interval, which represents the range within which the 'true' mean score of the whole population lies with 95 % probability. This interval allows you to determine how close the MOS is to the 'true quality' of the clip. Logically, this confidence interval is smaller for larger test groups. In a well-designed traditional test, the interval is about 0.2.

The term 'MOS' is only a generic approximation of a measurement unit and is meaningless if you do not specify the kind of quality perception that the MOS describes. A MOS can be obtained for listening quality and visual quality.
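The averaging and confidence-interval calculation described above can be sketched in a few lines of plain Python. This is an illustrative example, not SwissQual code: the viewer ratings are invented, and the factor 1.96 assumes a normal approximation for the 95 % confidence interval of the mean.

```python
import math
import statistics

def mos_with_ci(scores, z=1.96):
    """Return the MOS and its 95 % confidence interval for a list of ratings."""
    n = len(scores)
    mos = statistics.mean(scores)
    sd = statistics.stdev(scores)        # sample standard deviation
    half_width = z * sd / math.sqrt(n)   # CI half-width shrinks with larger groups
    return mos, (mos - half_width, mos + half_width)

# 24 hypothetical viewer ratings on the 1-to-5 ACR scale
ratings = [4, 5, 4, 3, 4, 5, 4, 4, 3, 4, 5, 4, 4, 3, 4, 4, 5, 4, 3, 4, 4, 5, 4, 4]
mos, ci = mos_with_ci(ratings)
```

As the text notes, the interval narrows as the test group grows, because the half-width scales with 1/√n.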
Objective measurements do not evaluate quality in the traditional sense, but rather estimate or predict quality as if the clip had been observed by a large group of people. More than 5000 subjectively scored samples were used to train the VQuad, VMon, and SQuad algorithms. These objective measures are based on sophisticated psycho-acoustic, psycho-visual, and perceptive models that process signals in a similar way to the human auditory and visual systems. The signal analysis and the subsequent comparison to the undistorted original signal lead to a quality value that is mapped to the common 1 to 5 scale. The performance of an objective measure is usually represented by the correlation coefficient and the residual prediction error on a scatter plot where the subjective and objective data are plotted on the X and Y axes, respectively. On such a diagram, a good objective measure is narrowly distributed along the 45° line.
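The correlation coefficient used above as the performance figure can be computed directly from paired subjective and objective scores. A minimal sketch; the two score lists are invented for illustration and do not come from any SwissQual data set.

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

subjective = [1.8, 2.5, 3.1, 3.6, 4.2]   # hypothetical subjective MOS values
objective = [2.0, 2.4, 3.0, 3.9, 4.1]    # hypothetical model predictions
r = pearson_r(subjective, objective)     # close to 1.0 for a good model
```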

Subjective and Objective Quality Assessment


Assessing the quality of a telecommunication network is important for achieving and maintaining a high service quality. One method to assess the service quality is to evaluate the quality of the signal that is transmitted through the telecommunications network, which involves the following groups of objective approaches:

- No-Reference: Non-intrusive or single-ended approach, which only evaluates and rates the received signal. For example, a test call to an answering machine or live monitoring.
- Full-Reference: Intrusive or double-ended approach, which evaluates and rates a transmitted reference signal against the original reference signal.

Both of these quality assessment methods predict the Mean Opinion Score (MOS) that would be obtained from a subjective test. Figure 2-1 provides an overview of the basic relationship between subjective and objective assessments as well as the full and no reference approaches.



[Diagram: a reference video signal passes through the network under test; a human viewer rates the transmitted video signal against an internal reference ('expectation'). Methods that use the reference signal are double-ended; methods that do not require a reference signal are single-ended.]

Figure 2-1 Subjective versus objective quality assessment

Full-Reference and No-Reference Assessments


You can use intrusive and non-intrusive methods in the following objective quality test scenarios:

- No-Reference: Establishes a test connection to an answering station, which plays a video signal that is unknown to the receiving side, for example, from a streaming server or a live TV application.
- Non-intrusive In-Service Monitoring: Assesses the video signals in real applications, such as IPTV or video telephony, by parallel monitoring in the core network. Note: This method includes no-reference approaches.
- Full-Reference: Controls both ends of a connection and transmits a known video sequence. This scenario requires a streaming server that contains known video clips.

Full-Reference Video Quality Assessment controls both ends of the connection and transmits a known video sequence. The disadvantage of this approach is the necessity to intervene at the source of the signal and in the network that you want to test. At least one transmission channel must be occupied to transmit the reference signal in order to determine the signal quality. The advantage of the double-ended method is that the input or reference signal is known, which allows for a very accurate and detailed analysis of video quality impairments. Through the application of visual perception models, each change in the signal during transmission can be detected and checked for an impact on the perceived quality. Full-reference methods are well suited for optimization processes in laboratories as well as in real networks. These methods can even measure minimal degradations of the signal and can be applied to compare transmission scenarios.

No-Reference Video Quality Assessment assesses the visual quality of the transmitted signal without a pre-defined reference signal for comparison. This assessment is also referred to as a non-intrusive or single-ended model.


The single-ended models use signal analysis methods to look for known types of distortions. For example, the models search for typical coding artifacts such as visible block structures or freezing events. More advanced methods apply perceptual models to the detected distortions that consider the effects of the human visual system, such as local contrast adaptation or masking. The accuracy of a no-reference approach is lower than that of a full-reference approach. However, the accuracy is more than sufficient for a basic classification of the video quality and the detection of consistently poor quality links.

Since the reference signal is not available, no-reference video quality models are subject to a content dependency. If the video contains natural objects and a small amount of motion, the extraction of the individual features performs well. However, if the video contains unnatural content such as cartoons, moving or fixed graphical objects, or still sequences, the feature extraction can lead to inaccurate results. Such results are caused by the similarity of the content characteristics to typical compression and transmission distortions. Cartoons, for example, contain a restricted number of colours as well as entire areas that are filled with the same colour and without natural texture, which is acceptable in a cartoon. However, unlike in a cartoon, such effects in a video with natural content are seen as a strong distortion. Since the measure has no a-priori knowledge of the content, such clips are predicted to have low quality, even though a viewer would not perceive the cartoon as degraded. A similar case is a graphically animated background, for example, during a TV newscast. This type of background can contain solid colour areas with horizontal and vertical sharp edges or even moving blocks. These objects are easily interpreted as unnatural coding artifacts and can become subject to misinterpretation.
The analysis results from one short clip might provide information about serious distortions. However, for a more accurate quality analysis, SwissQual strongly recommends evaluating several video sequences with a no-reference model and using the average of the results to completely characterize a transmission channel.



Technical Requirements and Performance

This chapter outlines the technical requirements and performance of the SwissQual VMon and VQuad solutions.

Technical Requirements
The SwissQual VMon and VQuad solutions run on the 32-bit Windows platform and require an uncompressed 24-bit RGB video signal in AVI file format. The accepted range of image resolutions is from QQVGA (160x120 pixels) up to VGA (640x480 pixels). Based on the recommendations of VQEG [1], the accepted resolutions are subdivided into three resolution groups:

- Smartphones: QQVGA, QCIF, QCIF+
- PDA, hand-held: QVGA, CIF
- PC applications: VGA, (SDTV)

VQuad uses a reference signal, which must be in uncompressed format, that is, perfect quality, with a frame rate of 25 or 30 fps. Note: The reference signal must have the same image resolution as the degraded video. VQuad does not rescale the video.

The VMon and VQuad methods analyse a video clip in a raw non-encoded format such as RGB24, where each frame is considered a bitmap and the RGB values for each pixel are available [2]. In addition to this spatial information, these methods also require the display time of each individual frame to calculate temporal effects. VMon evaluation, as measured on an Intel Xeon processor at 2.33 GHz, is faster than playback time due to consistent run-time optimization, even for larger image sizes such as a VGA signal that has been sampled at 25 fps. Due to a pre-evaluation of the reference video, VQuad has a slightly longer evaluation time.

As the VMon solution can dynamically adjust the algorithm computations to the available processing resources, VMon can be run on the Symbian mobile OS platform. As a result, VMon is an ideal component for lower-performing platforms such as mobile phone operating systems and digital signal processors [3].
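The subdivision into three resolution groups can be expressed as a simple lookup. This is an illustrative sketch, not part of any SwissQual API: the function name and group labels are invented, and only the resolution names listed in the text are covered.

```python
# Resolution groups as described in the text (VQEG-based subdivision).
RESOLUTION_GROUPS = {
    "QQVGA": "smartphone", "QCIF": "smartphone", "QCIF+": "smartphone",
    "QVGA": "PDA/hand-held", "CIF": "PDA/hand-held",
    "VGA": "PC application", "SDTV": "PC application",
}

def resolution_group(name: str) -> str:
    """Map a resolution name to its group; reject unsupported resolutions."""
    try:
        return RESOLUTION_GROUPS[name.upper()]
    except KeyError:
        raise ValueError(f"unsupported resolution: {name}")
```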

Frame Rate
The accepted frame rates are between 3 fps and 30 fps. Note: Frame rates of 3 fps or slightly higher are interpreted as a strong jerkiness effect. Still images or completely frozen video sequences are signalled, but MOS values are not calculated for them.

[1] VQEG: Video Quality Experts Group, an independent international forum for video quality evaluation metrics.

[2] A Diversity system only uses RGB24 to store and analyze uncompressed video clips. The VMon08 algorithm can also use the YUV format.

[3] On low-performing platforms, the estimation of quality-related values can be less accurate due to the dynamic adjustment of the calculation depth.


Video Sample Length


A sample length of 5 to 15 seconds, which is automatically checked by the SwissQual software, is required for the evaluation. Video samples that are less than 5 seconds long are not accepted. Samples that are longer than 15 seconds are truncated to 15 seconds and a warning message is displayed.
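Together with the frame-rate limits from the previous section (3 fps to 30 fps), these input rules can be sketched as a small validation function. This is an illustrative example of the stated rules, not SwissQual's actual validation code; the function name, return shape, and messages are invented.

```python
def check_sample(duration_s: float, fps: float):
    """Apply the stated input rules: 3-30 fps, 5-15 s (longer clips truncated).

    Returns (accepted, effective_duration_s, warning_or_None).
    """
    if not 3.0 <= fps <= 30.0:
        return False, None, "frame rate outside the accepted 3-30 fps range"
    if duration_s < 5.0:
        return False, None, "sample shorter than 5 seconds is not accepted"
    if duration_s > 15.0:
        # Over-long samples are kept but cut back, with a warning.
        return True, 15.0, "sample truncated to 15 seconds"
    return True, duration_s, None

ok, effective, warning = check_sample(20.0, 25.0)  # accepted, truncated to 15 s
```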

VMon and VQuad Performance Accuracy


The high accuracy of the VMon08 and VQuad08 algorithms is based on a large number of subjectively pre-scored databases that cover the complete scope of modern video degradations, including compression and erroneous transmission degradations that typically occur in telephone networks and broadcasting scenarios. SwissQual has compiled these databases with a focus on modern cell and IPTV networks as well as data that has been obtained in collaborative efforts with international standardization bodies.

At the end of 2006, the Video Quality Experts Group (VQEG), an international and independent organization, started a worldwide evaluation of objective video quality measures. SwissQual took part in this evaluation with the VMon06 objective model, the precursor to VMon08. The constantly improving video compression techniques and the availability of the VQEG data sets during the evaluation phase encouraged SwissQual to improve and update the VMon and VQuad models to versions VMon08 and VQuad08, respectively. Although the main focus of VMon08 and VQuad08 was excellent performance on SwissQual databases, a significant improvement was also achieved for VQEG data [4].

Usually, the prediction accuracy is given as the correlation coefficient between the objective scores and the subjective MOS, a single-number value for performance accuracy. A value close to 1.0 indicates a high prediction accuracy, while lower values indicate a lower prediction accuracy. In general, correlations of less than 0.7 describe a model as weak, while correlations of less than 0.5 describe a model as unusable. A more detailed view is possible with scatter plots, which plot the subjective MOS versus the objective scores. Figure 3-1 contains an example of a detailed analysis of VQEG QCIF data sets. Each point in the diagram represents a video sample that has been scored subjectively and objectively.
For points above the 45° line, the objective measure indicates a higher quality than the quality that was derived in the subjective test. Similarly, points below the 45° line indicate a more pessimistic quality prediction. In Figure 3-1, the accuracy of the VQuad08 predictions is noticeably better than that of the VMon08 predictions. The VQuad scores are closely grouped and are nearly symmetrically distributed along the 45° line. However, due to content dependencies, a few outliers are incorrectly predicted, that is, VQuad rates individual files in one condition either too high or too low.

To avoid under-predictions, VMon08 searches for known distortions based on a general expectation. If a distortion is found with confidence, the score is calculated correctly. An over-prediction can occur if VMon08 does not detect a visible distortion. In essence, VMon08 tends to yield an over-prediction for missed distortions and no under-predictions. For applications such as a trigger-based troubleshooting system, VMon08 thus tends toward 'false acceptance' but avoids 'false rejection', which is useful for systems where false alarms require more operational effort.

[4] This publication is partly based on the subjective scores collected by the Video Quality Experts Group (VQEG). The results presented in this manual are not to be compared to the results presented in the VQEG Final Report of Multimedia Phase I: the models in that report were validated using this data, which was not available to the models at the time they were submitted to the VQEG evaluation. See the further acknowledgement at the end of this manual.

[Scatter plots: VMon08 vs. visual MOS and VQuad08 vs. visual MOS, per-file analysis, QCIF data, VQEG q05; correlation 0.84 for VMon08 and 0.93 for VQuad08.]

Figure 3-1 Per-sample comparison between VMon08 and VQuad08 data on example data in QCIF resolution

The results that have been discussed up to now have been on a per sample basis. To evaluate a channel or video system, a set of different samples with different contents are typically used and transmitted through the system. In voice quality tests, a so-called per-condition analysis is usually performed as well. This analysis averages the scores of a condition, that is, a given codec setting, for each talker and each sentence. This averaging minimizes dependencies on individual characteristics and instead focuses more on the system being tested. This approach can also be applied to video analysis by averaging across different contents. The deviation of the per-sample scores for the same condition is wider for video than for speech, which is mainly caused by the wider variation of the video content that was transmitted. However, content averaging provides a good real-life overview for a channel or codec performance in which a wide range of contents must be processed. In the example data set, eight different contents were always processed with the same condition. A so-called per-condition evaluation is obtained when these eight individual scores are averaged in the subjective and objective domain. Figure 3-2 displays these results, which are based on the same example data that was taken from the VQEG data set in QCIF.
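The per-condition averaging described above can be sketched in a few lines. A minimal illustration, not SwissQual code: the condition names, content names, and per-sample scores are invented, and the eight-contents-per-condition design of the example data set is reduced to four contents for brevity.

```python
from collections import defaultdict
from statistics import mean

def per_condition_scores(samples):
    """Average per-sample scores across contents for each condition.

    `samples` is a list of (condition_id, content_id, score) tuples;
    returns {condition_id: mean score across its contents}.
    """
    buckets = defaultdict(list)
    for condition, _content, score in samples:
        buckets[condition].append(score)
    return {condition: mean(scores) for condition, scores in buckets.items()}

# Hypothetical per-sample MOS predictions: two conditions, four contents each
samples = [
    ("codec_A_128k", "news", 3.2), ("codec_A_128k", "sports", 2.6),
    ("codec_A_128k", "cartoon", 3.8), ("codec_A_128k", "drama", 3.0),
    ("codec_B_256k", "news", 4.1), ("codec_B_256k", "sports", 3.7),
    ("codec_B_256k", "cartoon", 4.3), ("codec_B_256k", "drama", 3.9),
]
averaged = per_condition_scores(samples)
```

Applying the same averaging to the subjective scores and then correlating the two averaged sets yields the per-condition accuracy figures discussed in the text.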
[Figure 3-2: the same two scatter plots on a per-condition basis (QCIF data, VQEG experiment q05). Left panel: VMon08 vs. MOS(subj), correlation 0.97. Right panel: VQuad08 vs. MOS(subj), correlation 0.98.]

Figure 3-2 Per-condition comparison between VMon08 and VQuad08 data on example data in QCIF resolution

The charts show that the prediction accuracy of VMon08 and VQuad08 increases significantly when under- or over-predictions that are caused by individual contents (see Figure 3-1) are 'averaged out' completely. For a complete overview of the algorithm accuracy, the correlation coefficients for the 14 QCIF data sets are shown in Figure 3-3. Initially, a 'per-sample' evaluation is performed during which each score for each video sample file is considered individually. The statistical evaluation procedure is equivalent to the VQEG primary analysis.

[Figure 3-3: bar chart "Performance of VMon and VQuad on VQEG data sets (per video sample)" showing the correlation coefficient for each of the 14 QCIF databases for VMon06, VMon08, and VQuad08. Averages: VMon06 0.64, VMon08 0.73, VQuad08 0.88.]

Figure 3-3 Correlation coefficients between MOS values obtained in subjective tests and objective scores based on the VQEG QCIF data sets

For comparison, the VQEG performance of VMon06, the predecessor to VMon08, has also been included in Figure 3-3. As the chart clearly shows, the VMon08 performance is significantly better than that of VMon06. The chart also shows the performance of VQuad08 on the same data set. Because VQuad is a full-reference model and can perform a more detailed analysis, its prediction accuracy is higher than that of the no-reference models. (The statistical evaluation of VQuad08 is equivalent to the method applied by VQEG to the full-reference models within its evaluation; note that there are small differences in the evaluation method for full-reference and no-reference models.) Figure 3-4 shows the accuracy in a 'per-condition' analysis for all 14 data sets. Taking the discussion of Figure 3-1 into consideration, the performance increases in Figure 3-4 are due to averaging across the individual video contents. (Note that the applied method for content averaging is different from VQEG's so-called secondary analysis and should not be directly compared to those results.) VMon08 and VQuad08 have been optimized for the QCIF-like resolutions that mobile phone applications and devices use today. Along with the widening of the data channels in mobile networks and the progress of IPTV solutions, SwissQual is continuing to improve VMon08 and VQuad08, especially for larger video resolutions. For comparison, the evaluation of the VQEG data at higher resolutions, that is, CIF and VGA, is shown in Table 3-1. The data is obtained from 14 databases for CIF and 13 for VGA. The evaluation follows the same rules as the results in Figure 3-3 and Figure 3-4.


[Figure 3-4: bar chart "Performance of VMon and VQuad on VQEG data sets (average over contents)" showing the per-condition correlation coefficient for each of the 14 QCIF databases for VMon06, VMon08, and VQuad08. Averages: VMon06 0.90, VMon08 0.91, VQuad08 0.95.]

Figure 3-4 Correlation coefficients between MOS values obtained in subjective tests and objective scores based on the 14 VQEG QCIF data sets on a per-condition evaluation

The main value is the correlation coefficient, which is averaged over the databases. The value in parentheses, the average r.m.s.e., allows for a rough estimate of the size of the prediction errors. For good prediction accuracy, the correlation coefficient must be close to 1.0 and the r.m.s.e. must be small.
Table 3-1 Results of VMon08 and VQuad08 for all three resolution sizes

Resolution   VMon08 (per sample)   VMon08 (per condition)   VQuad08 (per sample)   VQuad08 (per condition)
QCIF         0.73 (0.70)           0.91 (0.37)              0.88 (0.50)            0.95 (0.27)
CIF          0.63 (0.78)           0.81 (0.51)              0.86 (0.51)            0.97 (0.22)
VGA          0.52 (0.92)           0.75 (0.57)              0.86 (0.54)            0.93 (0.31)

As mentioned before, VMon08 has been optimized for the smaller image sizes that are used in mobile services, and has the highest accuracy for QCIF and similar resolutions. Although VMon08 is still acceptable for CIF resolutions, only a rough categorization of the visual quality is possible for VGA video. The full-reference VQuad08 method has far more information available for a quality estimation; its per-sample correlation is at least 0.86 for all image sizes and is higher than that of VMon08 for almost all of them.


VMon Video Quality Assessment

SwissQual developed the VMon method in 2005 and has been developing and releasing new versions as video transmission and compression technology constantly improve. Although the basic and well-accepted structures in the latest version are identical with the original VMon, many smaller improvements in the detectors have been made. In addition to these improvements, the VMon 08 series was also re-structured to support a frame-wise analysis. This approach eliminates the need to pre-store the video sequence before analysis and allows the use of the VMon 08 series for the real-time evaluation of video in the QualiPoc product series. The latest VMon can also efficiently analyze larger image resolutions such as VGA and SDTV. As previously discussed, the accuracy of a no-reference model such as VMon is lower than that of the VQuad full-reference model, and the results show a content dependency. However, most current video applications, for example all live TV applications, cannot transmit a pre-stored video and therefore do not support a full-reference model. SwissQual has invested a significant amount of effort to minimize the disadvantages of the no-reference model. In addition to the overall quality as a MOS value, VMon produces a set of results with more details about the type of problems that are observed. These results, along with the unique cause analysis of VMon, enable the easy interpretation and localization of potential quality problems.

Technical Background
Unlike VQuad, which uses a perceptual model to compare a high-quality reference signal to the degraded signal, VMon predicts the visual quality of a transmitted signal without prior knowledge of the input reference signal. The VMon approach is akin to a human expert who watches a video on a test device, such as a commercial video player client on a PC, and rates what he or she sees; the scores calculated by VMon correspond to such ratings. VMon analyzes the transmitted video for typical distortions that have been introduced by compression techniques and transmission problems. VMon separates these distortions into spatial and temporal types and then weights them with models of the human perceptual system; the weighted distortions form the basis for finding the root causes of a potential degradation.

Perceptual Degradation and MOS Prediction


VMon is a no-reference algorithm for the objective prediction of visual quality. The algorithm only analyses the transmitted and potentially degraded video sequence, without comparison to a high-quality reference sequence. VMon analyses the video sequence and identifies the following perceptual degradations:

Blockiness: Visible block borders that are caused by compression during the encoding process
Tiling: Visible macro-block and slice edges that are caused by encoding or transmission errors
Blurring: Loss of sharp edges, which is caused by strong compression or decoding filters
Jerkiness: Temporal artifacts such as low bit-rates or freezing

These perceptual degradation measures are the basis for the MOS prediction. Root causes use a technical scale that ranges from 0 % to 100 %, where 0 % represents no degradation and 100 % represents the maximum possible degradation. Note: The percentage value of one degradation measure does not relate directly to the perceived quality, which depends on a combination of all degradations. That is, you cannot interpret VMon results in the form of "30 % jerkiness is poor quality". However, the individual values are of importance for relative measurements of the form "video A has 20 % blockiness and video B has 25 % blockiness, therefore video A has less blockiness than video B". Due to the nature of the content, a small amount of degradation is often present. In general, results below 10 % might be caused by the actual content of the video and will have no considerable influence on the quality prediction.

Blockiness
Blockiness is an effect that is caused by the division of an image into smaller squares, that is, blocks, by the encoding process. Almost all current video encoders use a block-based transformation. Due to the lossy encoding of these blocks, a resulting block structure can be seen in the decoded video sequence. Various block sizes are used, with 8 x 8 and 16 x 16 pixels being the most frequent ones. The image information in a block is normally transformed with a DCT-based transformation. Usually, luminance and chrominance information are encoded separately, even with different block sizes, and only the most significant coefficients of the transformed values are retained. For strong compression, only a few coefficients are retained; in extreme cases only one coefficient is retained, which most of the time is the one that represents a uniform colour or luminance of the whole block. As a result of strong compression, a block contains little or no spatial detail and has visible transitions along its borders. Due to the lack of transition details, the border area with the neighbouring blocks becomes more visible. The blockiness value is an estimate of the visibility of these block borders. This value is based on a measure of the luminance differences at block borders and is related to the amount of spatial detail, as a block border is more visible in the absence of spatial details. Although the blockiness measure takes into account that blocks might have different sizes, the block borders must always be oriented horizontally or vertically and form a right angle. The blockiness value also takes into account the luminance of the neighbouring area. In very bright or very dark areas, the degradation by block borders is less visible even though the borders are clearly measurable.
Table 4-1 Blockiness scale

Percent   Image
0 %       Uniform grey image
100 %     8 x 8 black and white checker board

Root Causes
The main root cause for blockiness is strong compression during encoding. In addition, packet loss during transmission might increase blockiness.
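The border-visibility principle described in this section can be sketched as follows. This is an illustrative simplification, not SwissQual's detector: it only compares the average luminance step across an assumed 8-pixel grid with the average step inside blocks, on a hypothetical grayscale frame.

```python
# Illustrative blockiness estimate: the ratio of grid-aligned luminance
# steps to all luminance steps, scaled to a 0..100 % range.
def blockiness_percent(frame, block=8):
    """frame: 2-D list of luminance values; block: assumed block size."""
    h, w = len(frame), len(frame[0])
    on_grid, off_grid = [], []
    # horizontal neighbour steps (column transitions)
    for y in range(h):
        for x in range(w - 1):
            step = abs(frame[y][x] - frame[y][x + 1])
            (on_grid if (x + 1) % block == 0 else off_grid).append(step)
    # vertical neighbour steps (row transitions)
    for x in range(w):
        for y in range(h - 1):
            step = abs(frame[y][x] - frame[y + 1][x])
            (on_grid if (y + 1) % block == 0 else off_grid).append(step)
    grid = sum(on_grid) / len(on_grid)
    inner = sum(off_grid) / len(off_grid)
    return 100.0 * grid / (grid + inner) if grid + inner else 0.0

# 16 x 16 frame made of four uniform 8 x 8 blocks: the Table 4-1 extreme case
frame = [[0 if ((x // 8) + (y // 8)) % 2 == 0 else 255 for x in range(16)]
         for y in range(16)]
print(blockiness_percent(frame))
```

For the checker board all luminance steps sit exactly on the block grid, so the sketch reports the 100 % end of the scale; a uniform grey image reports 0 %, matching the extremes given in Table 4-1.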

Tiling
During the encoding process, a video frame is divided into blocks. An important loss of information corresponding to one or more blocks, either during encoding or during transmission, leads to tiling, that is, visible tile-like artifacts in the image or video frame.

Traditionally, that is, in MPEG-4 Part 2 or H.263, those blocks are 8 x 8 pixels in which the luminance information is encoded (so-called micro-blocks). The chrominance information is encoded in so-called macro-blocks of 16 x 16 pixels. The entire information related to one macro-block consists of the related chrominance information and the corresponding micro-blocks with their luminance information. The macro-block is the smallest entity of the encoded image; position and update information refer to macro-blocks. Macro-blocks are displayed at a fixed position in a frame. More recent video encoders, such as H.264, even allow a scalable micro-block size of 4 x 4, 8 x 8, or 16 x 16 pixels.


The tiling value focuses on distortions at block borders that are caused by transmission errors. Transmission errors are handled differently by the receiving decoder. The simplest way of handling this type of error is to freeze the last successfully updated image until the next key-frame provides a complete image. Other strategies include replacing the incorrectly transmitted parts of the image with the same area of the previous frame. Advanced concealments predict missing data by using the neighbouring areas, that is, using the same motion compensation or similar spatial textures. Of course, simple implementations just display the erroneous data, which can lead to some strange effects. Since no concealment strategy is perfect, the residual error is propagated by the differential frames up to the next key-frame. Since the transmission is organized with macro-blocks as the smallest entity, transmission errors or residual errors often have a visible macro-block structure. At the least, the border lines of the erroneous areas are always oriented horizontally or vertically. The VMon08 tiling detector is especially designed to recognize such erroneous areas by checking for incoherent vertical and horizontal edges. A threshold is applied to avoid false detections of tiling in the content of a video sequence, and hence unduly lowered scores. Visible macro-block borders that are caused by spatial compression were counted as tiling in previous versions of VMon. Due to the high correlation with blockiness, the blockiness value in the 08 series now includes visible macro-block borders that are caused by compression. However, suddenly appearing macro-block structures due to a highly compressed key-frame or temporarily increased spatial compression can also be considered as tiling.
Table 4-2 Tiling scale

Percent   Video
0 %       Low motion video with no sharp rectangular edges, or a uniform grey sequence
100 %     Several fast moving or jumping black squares on a white image

Root Causes
The main root cause for tiling is packet loss during transmission. Strong compression during encoding might also increase tiling.

Blurring
In VMon, blurring is measured indirectly by measuring sharpness, that is, the sharpness of the luminance edges in the frames. More specifically, sharpness measures the luminance offset at the edge borders and relates this offset to the local contrast at the edge location. In addition, the sharpness measure tries to avoid block border edges, which are the result of strong compression. The blurring value is the decrease of the sharpness in an average high quality video sequence with respect to the sharpness of the video sequence that is being tested. Note: Sharpness is a value that strongly depends on the content of a video signal. For example, a cloudy sky over a meadow does not contain sharp edges. In such an image, the sharpness is measured at the position of the sharpest edges in the frame.
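The indirect measurement described above can be illustrated with a minimal sketch: sharpness is taken here as the strongest neighbouring-pixel luminance step in the frame, and blurring as its decrease relative to a nominal reference sharpness. Both the edge measure and the reference value of 255 are simplifying assumptions for illustration only, not the VMon definition.

```python
# Illustrative sharpness/blurring sketch on a hypothetical luminance frame.
def sharpness(frame):
    """Largest horizontal neighbouring-pixel luminance step in the frame."""
    return max(abs(row[x] - row[x + 1])
               for row in frame for x in range(len(row) - 1))

def blurring_percent(frame, reference_sharpness=255.0):
    """Decrease of sharpness w.r.t. an assumed high-quality reference, 0..100 %."""
    return 100.0 * max(0.0, 1.0 - sharpness(frame) / reference_sharpness)

sharp = [[0, 0, 255, 255]] * 4   # a hard black/white edge
soft = [[0, 85, 170, 255]] * 4   # the same edge, smeared across pixels
print(blurring_percent(sharp))
print(blurring_percent(soft))
```

The hard edge yields 0 % blurring, while the smeared edge, whose largest step has dropped from 255 to 85, yields about 67 %, mirroring how a loss of edge steepness is read as blur.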
Table 4-3 Blurring scale

Percent   Image
0 %       Black and white diagonal lines, 2 pixels gap
100 %     Uniform grey image

Note (on tiling): In case of high motion, the affected macro-blocks in such an area might be encoded as so-called intra-blocks. Due to the limited number of bits, the intra-blocks are highly compressed and suddenly become visible.

Root Causes
The main root cause for blurring is the use of de-blocking filters of the video decoder.

Jerkiness
Jerkiness is a perceptual value that measures jerks from one frame to the next. High jerkiness is the result of a bad representation of moving objects in the video sequence. In other words, jerkiness measures the loss of information due to a freezing period or a low frame rate. In the case of freezing, jerkiness considers the freezing period and the assumed loss of information during this period. This loss of information is estimated by the inter-frame difference at the end of the period. The measure of jerkiness comprises freezing, the anticipated loss of information, and the dominating frame rate, which is described in the "Additional Results" section. In the absence of explicitly frozen periods, jerkiness is mainly related to the technical value of the dominating frame rate. For moderate or high motion videos, jerkiness and frame rate are highly negatively correlated. In low motion videos, a low frame rate does not necessarily imply that jerkiness is high: the jerkiness measure takes into account the amount of motion in the video, whereas the frame rate only measures the display time of frames. Note: In most cases, a regular temporal degradation such as a lower frame rate is better accepted than an irregular freezing. The jerkiness calculation takes this effect into account. This might lead to the effect that in a moderately moving clip, a consistent frame rate of 5 fps causes a jerkiness of only 15 %, whereas two longer freezing events (e.g. 2 times 500 ms) in a 15 fps clip result in a jerkiness of more than 50 %.
Table 4-4 Jerkiness scale

Percent   Video
0 %       30 fps video sequence of smooth motion
100 %     Video sequence consisting of only several seconds of freezing

Root Causes
Large jerkiness values are the result of the reduction to a low encoder frame rate or of transmission delays and strong packet loss during transmission.

Additional Results
In addition to the root causes that are considered in the MOS prediction, VMon also provides technical figures for the following items:

Dominating Frame Rate in fps (frames per second): As for jerkiness, the basis for this value is the display time of a frame, that is, the amount of time an image remains visible until the image information changes in the next update. In the case of a constant frame rate, the dominating frame rate is equal to the constant frame rate. In the case of a variable frame rate, the dominating frame rate is the median of the frame rates.

Black Frame Ratio: This value provides the ratio of detected black frames with respect to all frames in the sequence. More specifically, the black frame ratio in percent is the total time black frames are displayed divided by the video sequence length. In NQDI, intervals of black frames have a grey background in the time analysis graph. Note: All mono colour frames, including black frames, are discarded before the MOS estimation. In the time analysis graph of NQDI, all intervals of mono colour frames have a grey background. In the previous version, that is, VMon06, only blue frames were discarded for the MOS calculation, and sequences of other mono colour frames were considered as highly blurred frames. In QualiPoc, all mono colour frames are counted and reported as black frames.

Freezing: If the display time of a frame exceeds 350 ms, the frame is considered frozen. The freezing value is displayed in percent as the total freezing time divided by the video sequence length. In the time analysis graph in NQDI, freezing intervals have a blue background. Note: Sequences of black or other mono colour frames are not considered freezing even if the display time exceeds the given limit.
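The three technical figures above can be computed directly from per-frame display times, as the following sketch shows. The 350 ms freezing threshold and the median rule come from the text; the input timing values are invented, and the simple boolean black-frame flags stand in for an actual black-frame detector.

```python
from statistics import median

FREEZE_MS = 350.0  # display time above which a frame counts as frozen

def technical_figures(display_times_ms, black_flags):
    """display_times_ms: how long each frame stayed on screen, in ms;
    black_flags: True for frames detected as black (mono colour)."""
    length = sum(display_times_ms)
    rates = [1000.0 / t for t in display_times_ms]      # per-frame rate in fps
    # freezing excludes mono colour frames, as stated in the note above
    freezing_ms = sum(t for t, black in zip(display_times_ms, black_flags)
                      if t > FREEZE_MS and not black)
    black_ms = sum(t for t, black in zip(display_times_ms, black_flags) if black)
    return {
        "dominating_frame_rate_fps": median(rates),
        "freezing_percent": 100.0 * freezing_ms / length,
        "black_frame_ratio_percent": 100.0 * black_ms / length,
    }

# ten frames at roughly 15 fps with one 500 ms stall and one black frame
times = [66.7] * 8 + [500.0, 66.7]
black = [False] * 9 + [True]
figures = technical_figures(times, black)
print(figures)
```

With these invented timings the dominating frame rate stays near 15 fps because the median is robust against the single stall, while the 500 ms stall alone accounts for the reported freezing percentage.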

MOS Prediction
The MOS is an overall value that takes the individual dimensions of distortion into account and combines these scores into a single value. The MOS prediction considers all of the root causes that are described in this document. In addition, VMon uses internal detectors to obtain information about the naturalness of movements and objects, as well as to check for coherent motion and for a possible loss of spatial detail in the colour planes. Previous versions of VMon calculated the MOS from the individual inputs with a simple linear formula. The current VMon08 version uses a more sophisticated approach, which also considers the non-additivity, and therefore non-linearity, of the individual dimensions. Basically, the most important degradation determines the quality, while further distortions have a reduced influence. This approach is implemented by a multiplicative aggregation, which is dominated by the largest degradation measure. More specifically, the basis for the MOS prediction is the product of a temporal and a spatial degradation measure:

predicted MOS = f_temporal(video) × f_spatial(video)

where f_temporal is a function of the temporal degradations and f_spatial is a function of the spatial degradations of the video sequence. However, the complete MOS prediction is slightly more complicated, as additional spatio-temporal degradations are estimated for the prediction. Note: The maximum predictable MOS is 4.5, whereas the minimum predictable MOS is 1.0.
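The dominance property of the multiplicative aggregation can be seen in a toy version. Only the structure follows the text (a product of a temporal and a spatial term, bounded by 1.0 and 4.5); the two input factors and the scaling onto the MOS range are hypothetical placeholders, not the actual VMon functions.

```python
def predict_mos(f_temporal, f_spatial):
    """Toy multiplicative aggregation: f_* are quality factors in 0..1
    (1 = no degradation).  The smallest factor dominates the product;
    the 1.0 + 3.5 * ... scaling is a hypothetical normalization onto
    the 1.0..4.5 predictable MOS range."""
    return 1.0 + 3.5 * f_temporal * f_spatial

print(predict_mos(1.0, 1.0))   # no degradation: top of the scale
print(predict_mos(0.2, 1.0))   # strong temporal degradation dominates
print(predict_mos(0.2, 0.9))   # an added spatial degradation changes little
```

Note how the third call barely differs from the second: once one dimension is strongly degraded, further distortions have only a reduced influence, which is the non-additive behaviour described above.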

VMon Results in NQDI


In NQDI, the display of VMon results is focused on the relevant quality prediction information from VMon. The upper part of Figure 4-1 shows the overall visual quality as estimated by VMon; NQDI also assigns a category to the VMon score (excellent, good, fair, poor, or bad), which allows for an easier rating of the video quality. The left column displays technical values, such as freezing and frame rate. The next column displays the outcomes of the different detectors as described in this document. Furthermore, the window contains some basic information, such as the application scenario, protocol, and player information.

Figure 4-1 VMon analysis overall results

Note: Thresholds for these categories can be adjusted in NQDI for individual applications.

The lower part of the NQDI window displays per-frame information. The upper chart shows the inter-frame differences along with the blurring and blockiness values for each frame. The bars that indicate the inter-frame difference are green for regular frames, blue for repeated frames, and black for detected black frames. Freezing intervals and mono colour frames are marked with a shadowed background for easy visibility.

Figure 4-2 Per-frame analysis in VMon

The second chart displays results from the content and scene analysis, which is restricted to the audio activity (channel active / inactive) and detected scene changes (vertical black lines). The scene analysis is subject to extensions for the next releases of VMon.

Figure 4-3 Scene analysis in VMon

VMon Application Scenarios


A no-reference model only analyzes the video sequence that is received during a test. As a result, this model has a lower prediction accuracy than a full-reference model, which also analyzes the reference signal. Since VMon does not require a reference signal, the method can be applied to services where the customer has no control over the content, for example, streaming from a server and live TV services. Although a no-reference model is less accurate than a full-reference model, it can evaluate a wider range of services and can deliver valuable results in well-designed measurement applications. A similar analysis based on the previous VMon06 series was published in ITU-T in May 2008.

Content Dependency of Perceived Quality and Prediction Problems


A no-reference model can detect typical compression and transmission distortions, but cannot separate or distinguish between these artifacts and content areas. For example, naturally occurring content with soft edges such as a cloudy sky or a meadow is scored as blurry, a graphical object is scored as a compression artefact, and a cartoon that contains only a few different colours in wide areas is scored as unnatural. However, if the content has a natural spatial complexity and a minimum of movement, a no-reference model can deliver valuable results.

Application of No-Reference Models


Unlike a full-reference model, where a user has full control over the video sequences, pure codec evaluation and tuning is not the focus of a no-reference model. Instead, a no-reference model is typically applied in situations where a user does not have access to the source video, for example, in-service monitoring of networks, streaming applications from unknown sources, and live TV applications. In these cases, a user aims to find the best compromise between codec settings and the current network behaviour.

Although a no-reference model is optimized for this purpose, usage guidelines and the interpretation of results must also be considered. To demonstrate the performance of the SwissQual no-reference VMon MOS prediction, the following typical use cases are considered:

Optimization or Benchmarking - Averaging of Results: Quality evaluation of a specific transmission chunk or a specific location while requesting video streams from a live TV server. This type of evaluation is used for service optimization or benchmarking.

Network Monitoring: Network monitoring by in-service observation to find severe quality problems.

Use Case 1: Optimization or Benchmarking - Averaging of Results


In use case 1, the aim is to analyze the general behaviour of a transmission channel from a user perspective by using the service over a period of time. For this type of analysis, the user behaviour is represented by analyzing a series of typical video examples rather than a single short video sequence. This series can consist of several samples that are taken from a longer video sequence, or of several samples that are taken from typical video content categories during a longer observation period. For simplification, a condition here combines a specific codec type with compression ratios, frame rates, and specific error patterns. By averaging across the different contents in a transmission condition, which is referred to as HRC in this document, the model can create a general view of a channel. Furthermore, averaging across the individual contents for each condition dramatically minimizes the content dependency of the perceived quality as well as the content dependency of the model. Table 4-5 shows the correlation coefficients for the different resolutions and analysis methods on a per-sample and per-condition basis. The main value is the correlation coefficient, which is averaged over the databases. The value in parentheses is the averaged r.m.s.e., which provides a rough idea of the prediction errors.
Table 4-5 Result comparison of VMon08 for a per-sample and per-condition evaluation, all three resolutions

Resolution   VMon08 (per file)   VMon08 (per condition)
QCIF         0.73 (0.70)         0.91 (0.37)
CIF          0.63 (0.78)         0.81 (0.51)
VGA          0.52 (0.92)         0.75 (0.57)

Since VMon is optimized for the smaller image sizes that are currently used in mobile services, the accuracy is best for QCIF and similar resolutions, and is still acceptable for CIF resolutions; for VGA, however, VMon can only provide a rough categorization of visual quality. The next use case, described below, allows the use of VMon even for VGA.

Use Case 2: Network Monitoring


In use case 2, the behaviour of a transmission channel in a live scenario is observed and critical quality issues are signalled accordingly. This signalling is a threshold-based trigger. For simplification, the threshold is only applied to the pure predicted MOS value of each sample. In a real-world application, all of the partial results can be used to produce more confident results. The following rules are applied to the data:

Threshold signalling bad quality: < 2.5
Uncertainty of subjective test results: 0.2 MOS
Criterion A - False Rejection: MOS > 2.7 and VMon < 2.5
Criterion B - False Acceptance: MOS < 2.3 and VMon > 2.5
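The rules above translate directly into a small classifier. The thresholds (2.5 alarm level, 0.2 MOS uncertainty band) are taken from the text; the MOS pairs in the usage lines are invented for illustration.

```python
ALARM = 2.5   # predicted MOS below this level raises an alarm
BAND = 0.2    # uncertainty of the subjective test results

def classify(subjective_mos, vmon_mos):
    """Apply Criteria A/B from the monitoring rules above."""
    if subjective_mos > ALARM + BAND and vmon_mos < ALARM:
        return "false rejection"    # alarm raised although quality is fine
    if subjective_mos < ALARM - BAND and vmon_mos > ALARM:
        return "false acceptance"   # real quality problem not flagged
    return "ok"

print(classify(3.0, 2.2))   # subjectively good, flagged bad
print(classify(2.0, 2.9))   # subjectively bad, not flagged
print(classify(2.4, 2.4))   # inside the uncertainty band
```

The 0.2 MOS band around the alarm threshold prevents borderline samples, where subjective scores themselves are uncertain, from being counted as model errors.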


Table 4-6 False acceptance and false rejection ratio of all experiments for each format

Format   Mean: False Acceptance (per file)   Mean: False Rejection (per file)
QCIF     7.6 %                               2.8 %
CIF      11.5 %                              3.0 %
VGA      15.6 %                              4.8 %

The results in Table 4-6 show that an alarm is incorrectly raised in only approximately 3 % to 5 % of the cases on a per-sample basis. However, quality problems that are not identified remain within a range of 8 % to 16 %. This asymmetry is particularly useful to avoid false alarms and to focus with confidence on cases where the quality drops. In a real-world application, such decisions are not based exclusively on a MOS. Instead, these decisions also take partial results of the analysis into account, which leads to even more confident results. In summary, no-reference models can be used in certain applications that cannot be addressed by full-reference approaches, and can deliver worthwhile results.


VQuad Video Quality Assessment

The latest version of the VQuad algorithm is a response to the rapidly progressing video transmission and compression technology. Although the basic and well-accepted structures in the latest version are identical with the original VQuad, many improvements have been made in the perceptual degradation measures. In addition to the overall quality as a MOS value, VQuad produces a set of results that provide more details about the type of problems that are observed. These results, along with the unique cause analysis of VQuad, allow for the easy interpretation and localization of potential quality problems.

Perceptual Degradation and MOS Prediction


VQuad is the full-reference algorithm for the objective prediction of visual quality. The algorithm analyses the transmitted and potentially degraded video sequence and compares the sequence to a high-quality reference file. VQuad analyses the video sequence and identifies the following perceptual degradations:

Blockiness: Visible block borders that are caused by compression during encoding
Tiling: Visible macro-block and slice edges that are caused by encoding or transmission errors
Blurring: Loss of sharp edge details that is caused by strong compression or decoding filters
Jerkiness: Temporal artifacts such as low bit-rates or freezing
Perceptual difference: Perceived difference between matched frames of the reference and degraded video sequences

These perceptual degradation measures are the basis for the MOS prediction. The degradation measures for blockiness, tiling, blurring, and jerkiness are the same as for VMon. Important: For more information, see the "Perceptual Degradation and MOS Prediction" section on page 11. Root causes use a technical scale that ranges from 0 % to 100 %, where 0 % is no degradation and 100 % is the maximum possible degradation. Note: The degradation percentage values are reported with respect to the reference value; that is, a value of 0 % means that the transmitted sequence has not been degraded with respect to the reference sequence.

Perceptual Difference
Unlike the no-reference approach, the full-reference method has access to the reference video sequence. This access allows for a detailed comparison of the reference video sequence to the encoded and transmitted video sequence. To calculate the perceptual difference measure, a so-called time alignment is performed, which involves assigning a matching frame in the reference video to each frame in the coded and transmitted sequence. If VQuad cannot assign a frame from the reference sequence due to strong distortions of the transmitted video sequence, the frame remains unmatched. VQuad also returns the relative number of matched frames of the transmitted video. Once the frames of the transmitted video are aligned to the reference sequence, a perceptual difference is calculated between corresponding frames. The average value serves as a key parameter for the prediction of the MOS value. The perceptual difference measure between matched frames calculates the inter-frame difference, emphasizes large edges, and takes into account the adaptation effects of luminance and local contrast.
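The matching step can be sketched as follows. This is a strong simplification of the VQuad time alignment: each degraded frame is matched to the most similar reference frame by plain mean absolute luminance difference, and a frame whose best difference stays above a hypothetical threshold remains unmatched. The threshold value and the toy frames are invented.

```python
def frame_diff(a, b):
    """Mean absolute luminance difference between two equally sized frames."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b)
               for pa, pb in zip(ra, rb)) / (len(a) * len(a[0]))

def align(reference, degraded, max_diff=64.0):
    """Match every degraded frame to its closest reference frame;
    None marks a frame that could not be matched."""
    matches = []
    for frame in degraded:
        diffs = [frame_diff(frame, ref) for ref in reference]
        best = min(range(len(diffs)), key=diffs.__getitem__)
        matches.append(best if diffs[best] <= max_diff else None)
    matched = [m for m in matches if m is not None]
    return matches, 100.0 * len(matched) / len(degraded)

ref = [[[i * 10] * 4] * 4 for i in range(5)]   # five flat reference frames
deg = [ref[0], ref[2], [[200] * 4] * 4]        # third frame heavily distorted
matches, matched_percent = align(ref, deg)
print(matches, matched_percent)
```

Here the first two degraded frames find their reference frames, while the distorted one remains unmatched, so only two thirds of the frames count as matched, mirroring the "Matched Frames" figure described below.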

Chapter 5 | VQuad Video Quality Assessment CONFIDENTIAL MATERIALS

19

VMon and VQuad Results Description Manual


2000 - 2012 SwissQual AG

Additional Results
In addition to the root causes that are considered in the MOS prediction, VQuad also provides technical figures for the following items:

Dominating Frame Rate, Freezing, Black Frame Ratio: For more information, see the "Perceptual Degradation and MOS Prediction" section on page 11.
PSNR: The Peak-Signal-to-Noise-Ratio in dB is the average PSNR of the frames in the encoded and transmitted video sequence with respect to the corresponding frames in the reference sequence.
Matched Frames: The relative number of frames in the coded and transmitted video sequence that match a frame in the reference video sequence. This value is calculated with respect to the total number of frames in the transmitted video sequence; 100 % matched frames means that all frames of the coded and transmitted video sequence could be matched to a frame of the reference sequence.
Frame Jitter: The standard deviation of the frame display time; a high frame jitter value is the result of irregular video playback.
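PSNR and frame jitter follow standard definitions and can be sketched as below; the helper names and toy values are illustrative, not VQuad's implementation.

```python
import math
import statistics

def psnr(reference, degraded, peak=255.0):
    """Standard Peak-Signal-to-Noise-Ratio in dB between two equal-sized frames."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, degraded)) / len(reference)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def frame_jitter(display_times_ms):
    """Standard deviation of the frame display intervals (milliseconds)."""
    intervals = [t1 - t0 for t0, t1 in zip(display_times_ms, display_times_ms[1:])]
    return statistics.pstdev(intervals)

# Perfectly regular 25 fps playback (one frame every 40 ms) has zero jitter ...
print(frame_jitter([0, 40, 80, 120, 160, 200]))   # 0.0
# ... while a freeze followed by a catch-up raises it considerably.
print(frame_jitter([0, 40, 80, 400, 440, 480]))   # 112.0
# Per-frame PSNR of a mildly distorted toy 2x2 frame, around 31 dB here.
print(round(psnr([100, 100, 100, 100], [90, 110, 100, 100]), 1))
```

As described above, the reported PSNR value would be the average of such per-frame values over all matched frames.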

MOS Prediction
The MOS is an overall value that takes the individual dimensions of distortion into account and combines them into a single value. The VQuad08 series takes the non-additivity, and therefore the non-linearity, of the individual dimensions into account. Basically, the most severe degradation determines the quality, and further distortions have only a reduced influence. The prediction is implemented by a multiplicative aggregation of the form:

predicted MOS = f_temporal(video) * f_spatial_fullRef(video)

where f_temporal is a function of the temporal degradations and f_spatial_fullRef is a function of the spatial degradations, which is dominated by the perceptual difference measure. Note: The maximum predictable MOS is 4.5 and the minimum is 1.0.
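A minimal sketch of such a multiplicative aggregation, assuming hypothetical factor functions: only the multiplicative form and the 1.0 to 4.5 output range come from the description above; the scaling of the two factors is an assumption made for this sketch.

```python
def predict_mos(f_temporal, f_spatial_full_ref):
    """Multiplicative aggregation: predicted MOS = f_temporal * f_spatial_fullRef.

    Here f_temporal is modelled as a factor in [0, 1] (1.0 = no temporal
    degradation) and f_spatial_full_ref in MOS units -- an assumption made
    for this sketch. Because the aggregation is multiplicative, the worst
    dimension dominates and a further degradation adds comparatively little.
    """
    mos = f_temporal * f_spatial_full_ref
    return max(1.0, min(4.5, mos))  # clip to the predictable range [1.0, 4.5]

print(predict_mos(1.0, 4.5))  # no degradation at all -> 4.5
print(predict_mos(0.5, 4.5))  # heavy temporal degradation alone -> 2.25
print(predict_mos(0.5, 2.0))  # spatial distortion on top -> 1.0 (range floor)
```

The second and third calls illustrate the non-additivity: once the temporal factor has halved the score, additional spatial distortion reduces the product by less than it would reduce an undistorted sequence.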

VQuad08 Results in NQDI


Figure 5-1 shows how the VQuad08 results appear in NQDI.

Figure 5-1 Presentation of VQuad overall analysis results

The individual results are explained in the earlier sections of this document. The main value, visual quality (estimated MOS), is rated into one of five categories, which you can configure individually in NQDI. If a video sequence has an audio track, VQuad also performs an audio-video synchronization evaluation. The result of this lip-sync evaluation is displayed on the lower right of the window. In the time domain charts, the results are sub-divided into two sections. The first diagram shows the inter-frame differences and, for IP streaming services, the IP throughput. The inter-frame differences provide information about the movement and the temporal complexity of the content. The IP throughput is drawn along the time axis as a red line. The lower chart in Figure 5-2 shows the per-frame results for blurring, blockiness, and PSNR.
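The inter-frame difference series plotted in the upper chart can be illustrated with a generic mean-absolute-difference measure; the exact metric NQDI plots is not specified here, so this is only a plausible sketch.

```python
# Sketch of an inter-frame difference series: the mean absolute pixel
# change between consecutive frames, a common generic indicator of
# movement and temporal complexity. Frames are flattened pixel lists.

def inter_frame_differences(frames):
    """One value per consecutive frame pair; higher = more movement."""
    return [
        sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        for prev, cur in zip(frames, frames[1:])
    ]

frames = [
    [10, 10, 10, 10],   # static scene ...
    [10, 10, 10, 10],
    [60, 60, 60, 60],   # ... then a sudden change (high temporal complexity)
]
print(inter_frame_differences(frames))  # [0.0, 50.0]
```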


Figure 5-2 Presentation of VQuad frame-wise results

VQuad08 Application
VQuad can only be applied to services that play back pre-recorded content from a streaming server or a broadcasting service. To use VQuad, special SwissQual reference clips have to be installed on the server or streamed from a server that is hosted by SwissQual. VQuad supports the following groups of image sizes:

Smart Phone: QQVGA, QCIF, QCIF+
Hand Held: QVGA, CIF
PC: VGA, SD

For performance reasons, the clips that are used for VQuad have a small watermark in the last lines of each clip. VQuad uses this watermark to assign the correct reference clip. An individual marker is included in each frame so that the match between a received frame and the corresponding reference frame can be found efficiently. The marker lines are ignored in the quality analysis. Note: You cannot analyze VGA or SD contents on the Diversity platform. Instead, these image sizes must be hosted by PC server applications.
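As an illustration of why a per-frame marker makes reference matching efficient, the sketch below invents a toy marker encoding in the last pixels of a frame; the real SwissQual watermark format is not public, and every detail of the encoding here is hypothetical.

```python
# Hypothetical illustration only: if the last lines of each frame encode
# the reference frame index, matching a received frame to its reference
# frame is a direct decode instead of a search over the whole clip.

def read_marker(frame, marker_pixels=4):
    """Decode a frame index from the (invented) binary marker pixels."""
    bits = [1 if p > 127 else 0 for p in frame[-marker_pixels:]]
    return int("".join(map(str, bits)), 2)

def strip_marker(frame, marker_pixels=4):
    """Drop the marker pixels so they are ignored in the quality analysis."""
    return frame[:-marker_pixels]

frame = [10, 20, 30, 40, 0, 255, 255, 0]  # payload pixels + marker "0110"
print(read_marker(frame))   # 6 -> pairs directly with reference frame 6
print(strip_marker(frame))  # analysis only sees the payload pixels
```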

Lip-Sync in VQuad
Lip-sync refers to audio-visual synchronization. To determine lip-sync, VQuad provides an exact delay calculation of the video signal relative to the corresponding reference input signal. A corresponding framework in the SwissQual analysis tool combines this information with the audio delay analysis performed by SQuad-LQ to calculate lip-sync in one-second increments for the duration of the clip. The average lip-sync, along with its standard deviation over the duration of the clip, indicates the constancy of the A/V synchronization. Note: The new lip-sync measure is only supported by 'Streaming PC Full Reference' tests in Diversity. Lip-sync measurements cannot be provided for no-reference measures such as VMon.
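Combining per-second audio and video delays into a lip-sync trace can be sketched as follows. The sign convention (positive offset meaning the audio lags the video) and the millisecond scale are assumptions for this illustration.

```python
import statistics

def lip_sync_trace(video_delays_ms, audio_delays_ms):
    """Per-second A/V offset (audio delay minus video delay), plus its
    mean and standard deviation over the duration of the clip."""
    offsets = [a - v for v, a in zip(video_delays_ms, audio_delays_ms)]
    return offsets, statistics.mean(offsets), statistics.pstdev(offsets)

video = [500, 500, 520, 510]   # video delay per one-second increment (ms)
audio = [550, 555, 565, 560]   # audio delay per one-second increment (ms)
offsets, avg, spread = lip_sync_trace(video, audio)
print(offsets)        # audio lags video by roughly 50 ms throughout
print(avg, spread)    # low standard deviation -> constant A/V sync
```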


Acknowledgements

SwissQual would like to thank VQEG and the parties that were involved in the multimedia test phase I. The video data and the subjective scores that were used for the development of VMon and VQuad were provided to VQEG by the companies listed below:

Acreo AB (Sweden)
CRC (Canada)
FUB (Italy)
Ghent University - IBBT (Belgium)
KDDI (Japan)
INTEL (USA)
IRCCyN (France)
NTIA/ITS (USA)
NTT (Japan)
OPTICOM (Germany)
Psytechnics (UK)
Symmetricom (USA)
Toyama University (Japan)
Yonsei University (Korea)
University of Nantes (France)

Appendix A | Acknowledgements CONFIDENTIAL MATERIALS

