
Theory of Quantum Information

Lecture notes from Fall 2008

John Watrous
Institute for Quantum Computing
University of Waterloo
Lecture contents
1 Mathematical Preliminaries – Part I 1
1.1 Complex Euclidean spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Definition of complex Euclidean spaces . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 Inner product and norms of vectors . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.3 Orthogonal and orthonormal sets . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Linear operators and super-operators . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 Matrices and their association with operators . . . . . . . . . . . . . . . . . . . 4
1.2.2 The entry-wise conjugate, transpose, and adjoint . . . . . . . . . . . . . . . . . 5
1.2.3 Algebras of operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.4 Trace and determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.5 Eigenvectors and eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Important classes of operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4 The spectral and singular value theorems . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4.1 The spectral theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4.2 Functions of normal operators . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4.3 The singular-value theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4.4 The Moore-Penrose pseudo-inverse . . . . . . . . . . . . . . . . . . . . . . . . 13

2 Mathematical Preliminaries – Part II 14


2.1 Linear super-operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Tensor products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1 Concrete representation by Kronecker products . . . . . . . . . . . . . . . . . 15
2.2.2 Abstract notion of tensor products . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.3 Tensor products of operators and super-operators . . . . . . . . . . . . . . . . 16
2.2.4 Further properties of the tensor product . . . . . . . . . . . . . . . . . . . . . . 18
2.3 The operator-vector correspondence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4 Norms of operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4.1 The trace norm, Frobenius norm, and spectral norm . . . . . . . . . . . . . . . 22

3 Registers, reduced states, and measurements 24


3.1 Definitions of registers, reduced states, and measurements . . . . . . . . . . . . . . . 24
3.1.1 Registers and register arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.1.2 Reduced states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.1.3 Measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 A few observations about registers, reduced states, and measurements . . . . . . . . 26
3.2.1 Information complete measurements . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.2 Nothing is hidden from product measurements . . . . . . . . . . . . . . . . . 27
3.2.3 Reduced states and the partial trace . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3 Observable differences between reduced states . . . . . . . . . . . . . . . . . . . . . . 28
3.3.1 The 1-norm distance between probability vectors . . . . . . . . . . . . . . . . 28
3.3.2 Relationship to the trace norm distance . . . . . . . . . . . . . . . . . . . . . . 29
3.4 Distinguishing convex sets of density operators . . . . . . . . . . . . . . . . . . . . . 30


4 Pure states, purifications, and fidelity 32


4.1 Pure states and purifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.1.1 Definition of pure states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.1.2 Definition of purifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.1.3 Properties of purifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2 Fidelity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.1 Definition of the fidelity function . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.2 Basic properties of the fidelity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.3 Uhlmann’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2.4 Alberti’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.2.5 Fuchs–van de Graaf Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . 39

5 Naimark’s Theorem; Characterization of quantum operations 41


5.1 Naimark’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.2 Representations of super-operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.2.1 The natural representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.2.2 The Choi-Jamiołkowski representation . . . . . . . . . . . . . . . . . . . . . . 43
5.2.3 Kraus representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.2.4 Stinespring representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.2.5 Relationships among the representations . . . . . . . . . . . . . . . . . . . . . 44
5.3 Characterization of quantum operations . . . . . . . . . . . . . . . . . . . . . . . . . . 45

6 Examples of states, measurements, and operations 49


6.1 Convex combinations of states, measurements, and operations . . . . . . . . . . . . . 49
6.2 Mixed-unitary operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6.3 Non-destructive measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.4 Generalized Pauli operators and Bell states . . . . . . . . . . . . . . . . . . . . . . . . 52
6.5 The completely depolarizing operation . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.6 Teleportation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

7 Entropy and compression 56


7.1 Shannon entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
7.2 Classical compression and Shannon’s Source Coding Theorem . . . . . . . . . . . . . 57
7.2.1 Compression schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
7.2.2 Statement of Shannon’s Source Coding Theorem . . . . . . . . . . . . . . . . . 57
7.2.3 Typical strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
7.2.4 Proof of Shannon’s Source Coding Theorem . . . . . . . . . . . . . . . . . . . 59
7.3 Von Neumann entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
7.4 Quantum compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
7.4.1 Informal discussion of quantum compression . . . . . . . . . . . . . . . . . . 61
7.4.2 Quantum channel fidelity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.4.3 Schumacher’s Quantum Source Coding Theorem . . . . . . . . . . . . . . . . 62

8 Properties of the von Neumann entropy 66


8.1 Continuity of von Neumann entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
8.1.1 Continuity of the Shannon entropy . . . . . . . . . . . . . . . . . . . . . . . . . 66
8.1.2 Continuity of von Neumann entropy . . . . . . . . . . . . . . . . . . . . . . . 67

8.1.3 Fannes’ Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68


8.2 Quantum relative entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
8.3 Subadditivity and Concavity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.4 Conditional entropy and mutual information . . . . . . . . . . . . . . . . . . . . . . . 72

9 Strong subadditivity of the von Neumann entropy 74


9.1 Joint convexity of the quantum relative entropy . . . . . . . . . . . . . . . . . . . . . 74
9.2 Monotonicity and strong subadditivity . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

10 Holevo’s Theorem and Nayak’s Bound 80


10.1 Holevo’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
10.1.1 Mutual Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
10.1.2 Accessible information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
10.1.3 The Holevo quantity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
10.1.4 Statement and proof of Holevo’s Theorem . . . . . . . . . . . . . . . . . . . . 82
10.2 Nayak’s Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
10.2.1 Definition of quantum random access encodings . . . . . . . . . . . . . . . . . 84
10.2.2 Fano’s Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
10.2.3 Statement and proof of Nayak’s bound . . . . . . . . . . . . . . . . . . . . . . 86

11 Majorization for real vectors and Hermitian operators 88


11.1 Doubly stochastic operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
11.2 Majorization for real vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
11.3 Majorization for Hermitian operators . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
11.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

12 Separable operators 97
12.1 Definition and basic properties of separable operators . . . . . . . . . . . . . . . . . . 97
12.2 The Woronowicz–Horodecki Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
12.3 Separable ball around the identity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

13 Separable super-operators and the LOCC paradigm 103


13.1 Min-rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
13.2 Separable super-operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
13.3 LOCC super-operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
13.3.1 Definition of LOCC super-operators . . . . . . . . . . . . . . . . . . . . . . . . 105
13.3.2 LOCC super-operators are separable . . . . . . . . . . . . . . . . . . . . . . . . 107

14 Pure state entanglement transformation 108


14.1 A restricted definition of LOCC operations . . . . . . . . . . . . . . . . . . . . . . . . 108
14.2 Proof of Nielsen’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
14.2.1 Mixed unitary to LOCC direction . . . . . . . . . . . . . . . . . . . . . . . . . . 112
14.2.2 LOCC to mixed-unitary direction . . . . . . . . . . . . . . . . . . . . . . . . . 113

15 Measures of entanglement 116


15.1 Maximum inner product with a maximally entangled state . . . . . . . . . . . . . . . 116
15.2 Entanglement cost and distillable entanglement . . . . . . . . . . . . . . . . . . . . . 117
15.2.1 Definition of entanglement cost . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

15.2.2 Definition of distillable entanglement . . . . . . . . . . . . . . . . . . . . . . . 118


15.2.3 The distillable entanglement is at most the entanglement cost . . . . . . . . . 118
15.3 Pure state entanglement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

16 The partial transpose and bound-entangled states 122


16.1 The partial transpose and separability . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
16.2 Examples of entangled states that are PPT . . . . . . . . . . . . . . . . . . . . . . . . . 124
16.2.1 First example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
16.2.2 Unextendible product bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
16.3 PPT states and distillation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

17 LOCC and separable measurements 129


17.1 Definitions and simple observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
17.2 Impossibility of LOCC distinguishing some sets of states . . . . . . . . . . . . . . . . 132
17.2.1 Sets of maximally entangled states . . . . . . . . . . . . . . . . . . . . . . . . . 132
17.2.2 Undistinguishable sets of product states . . . . . . . . . . . . . . . . . . . . . . 133
17.3 Any two orthogonal pure states can be distinguished . . . . . . . . . . . . . . . . . . 134

18 Super-operator norms and distinguishability 136


18.1 Distinguishing between quantum operations . . . . . . . . . . . . . . . . . . . . . . . 136
18.2 Definition and properties of the diamond norm . . . . . . . . . . . . . . . . . . . . . . 138
18.2.1 The super-operator trace norm . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
18.2.2 The diamond norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
18.3 Distinguishing unitary super-operators . . . . . . . . . . . . . . . . . . . . . . . . . . 142

19 Alternate characterizations of the diamond norm 144


19.1 Maximum Output Fidelity Characterization . . . . . . . . . . . . . . . . . . . . . . . . 144
19.2 Spectral norm characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
19.2.1 A theorem about fidelity and convex sets . . . . . . . . . . . . . . . . . . . . . 146
19.2.2 Proof of the characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

20 Semidefinite programming 151


20.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
20.2 Semidefinite programming duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
20.3 Optimal measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
20.4 One more look at the diamond norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
20.4.1 Description of a semidefinite program for the diamond norm . . . . . . . . . 157
20.4.2 Analysis of the semidefinite program . . . . . . . . . . . . . . . . . . . . . . . 158
20.4.3 Connection to the spectral norm characterization . . . . . . . . . . . . . . . . 160

21 The quantum de Finetti theorem 161


21.1 Symmetric subspaces and exchangeable operators . . . . . . . . . . . . . . . . . . . . 161
21.1.1 Permutation operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
21.1.2 The symmetric subspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
21.1.3 Exchangeable operators and their relation to the symmetric subspace . . . . . 163
21.2 Integrals and unitarily invariant measure . . . . . . . . . . . . . . . . . . . . . . . . . 164
21.3 The quantum de Finetti theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

Lecture 1
Mathematical Preliminaries – Part I

The goal of the first two lectures of the course is to present an overview of basic mathematical
concepts and tools that are required throughout the course. In particular, we will discuss various
facts about linear algebra in finite-dimensional vector spaces.

1.1 Complex Euclidean spaces

This section introduces the simple notion of a complex Euclidean space, which is used throughout
this course. In particular, with every discrete and finite physical system we will associate some
complex Euclidean space; and fundamental notions such as states and measurements of systems
are represented in linear-algebraic terms that refer to these spaces.

1.1.1 Definition of complex Euclidean spaces


For any finite, nonempty set Σ, we denote by C Σ the set of all functions from Σ to the complex
numbers C. The collection C Σ forms a vector space of dimension | Σ | over the complex numbers
when addition and scalar multiplication are defined in the following standard way:

1. Addition: given u, v ∈ C Σ , the vector u + v ∈ C Σ is defined by the equation (u + v)( a) =


u( a) + v( a) for all a ∈ Σ.
2. Scalar multiplication: given u ∈ C Σ and α ∈ C, the vector αu ∈ C Σ is defined by the equation
(αu)( a) = αu( a) for all a ∈ Σ.
Any vector space defined in this way for some choice of a finite, nonempty set Σ is called a complex
Euclidean space.
Complex Euclidean spaces will generally be denoted by scripted capital letters near the end
of the alphabet, such as W , X , Y , and Z , when it is necessary or helpful to assign specific names
to them. Subsets of these spaces will also be denoted by scripted letters, and when possible our
convention will be to use letters such as A, B , and C near the beginning of the alphabet when
these subsets are not themselves necessarily vector spaces. Vectors will be denoted by lowercase
Roman letters, again near the end of the alphabet, such as u, v, x, y, and z.
In the case where Σ = {1, . . . , n} for some positive integer n, one typically writes C n rather
than C {1,...,n} . For a given positive integer n, it is typical to view a vector u ∈ C n as an n-tuple
u = (u1 , . . . , un ), or as a column vector of the form
 
u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}.


The convention to write ui rather than u(i) in such expressions is simply a matter of typographic
appeal, and is avoided when it is not helpful or would lead to confusion (such as when vectors
are subscripted for another purpose). Similar conventions are used as convenient for elements of
spaces of the more general form C Σ , assuming that some natural ordering is placed on the index
set Σ.

1.1.2 Inner product and norms of vectors


The inner product hu, vi of vectors u, v ∈ C Σ is defined as

⟨u, v⟩ = ∑_{a∈Σ} \overline{u(a)} v(a).

It may be verified that the inner product satisfies the following properties:

1. Linearity in the second argument: hu, αv + βwi = α hu, vi + β h u, wi for all u, v, w ∈ C Σ and
α, β ∈ C.
2. Conjugate symmetry: hu, vi = hv, ui for all u, v ∈ C Σ .
3. Positive definiteness: hu, ui ≥ 0 for all u ∈ C Σ , with hu, ui = 0 if and only if u = 0.

One typically refers to any function satisfying these three properties as an inner product, but this is
the only inner product for vectors in complex Euclidean spaces that is considered in this course.
The Euclidean norm of a vector u ∈ C Σ is defined as
‖u‖ = \sqrt{⟨u, u⟩} = \sqrt{ ∑_{a∈Σ} |u(a)|^2 }.

The Euclidean norm satisfies the following properties, which are the defining properties of any
function that is called a norm:

1. Positive definiteness: k u k ≥ 0 for all u ∈ C Σ , with k u k = 0 if and only if u = 0.


2. Positive scalability: k αu k = | α | k u k for all u ∈ C Σ and α ∈ C.
3. The triangle inequality: k u + v k ≤ k u k + k v k for all u, v ∈ C Σ .

The Cauchy-Schwarz Inequality states that

|⟨u, v⟩| ≤ ‖u‖ ‖v‖

for all u, v ∈ C Σ , with equality if and only if u and v are linearly dependent.
The Euclidean norm corresponds to the special case p = 2 of the class of p-norms, defined for
each u ∈ C Σ as
‖u‖_p = ( ∑_{a∈Σ} |u(a)|^p )^{1/p}

for p < ∞, and


‖u‖_∞ = max{ |u(a)| : a ∈ Σ }.
The above three norm properties (positive definiteness, positive scalability, and the triangle in-
equality) hold for k·k replaced by k·k p for any choice of p ∈ [1, ∞].
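
For concreteness, these definitions are easy to check numerically. The following short numpy sketch, with arbitrarily chosen vectors u and v, computes the inner product, the Euclidean and p-norms, and verifies the Cauchy-Schwarz inequality; note that numpy's vdot conjugates its first argument, matching the convention above.

    import numpy as np

    # Two arbitrary vectors in C^Sigma with Sigma = {0, 1, 2}.
    u = np.array([1 + 2j, -1j, 0.5])
    v = np.array([2 - 1j, 3 + 0j, -1 + 1j])

    # Inner product <u, v> = sum_a conj(u(a)) v(a); conjugate-linear in the first argument.
    inner = np.vdot(u, v)

    # Euclidean norm ||u|| = sqrt(<u, u>), which equals the p-norm with p = 2.
    assert np.isclose(np.sqrt(np.vdot(u, u).real), np.linalg.norm(u))

    # p-norms for a few values of p, including p = infinity.
    for p in (1, 2, 3, np.inf):
        print(p, np.linalg.norm(u, ord=p))

    # Cauchy-Schwarz: |<u, v>| <= ||u|| ||v||.
    assert abs(inner) <= np.linalg.norm(u) * np.linalg.norm(v) + 1e-12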

1.1.3 Orthogonal and orthonormal sets


A collection of vectors
{ xa : a ∈ Γ} ⊂ C Σ ,
indexed by a given finite, non-empty set Γ, is said to be an orthogonal set if it holds that h xa , xb i = 0
for all choices of a, b ∈ Γ with a ≠ b. Such a set is necessarily linearly independent, provided it
does not include the zero vector.
An orthogonal set of unit vectors is called an orthonormal set, and when such a set forms a
basis it is called an orthonormal basis. It holds that an orthonormal set { xa : a ∈ Γ} ⊆ C Σ is an
orthonormal basis of C Σ if and only if | Γ | = | Σ |. The standard basis of C Σ is the orthonormal basis
given by {ea : a ∈ Σ}, where

e_a(b) = \begin{cases} 1 & \text{if } a = b \\ 0 & \text{if } a ≠ b \end{cases}
for all a, b ∈ Σ.
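
A finite set of vectors is orthonormal precisely when its Gram matrix of inner products is the identity, which gives a simple numerical test. A minimal numpy sketch, using an arbitrarily chosen rotation as the example set:

    import numpy as np

    # Columns of V are candidate vectors in C^3; here the standard basis rotated in a plane.
    theta = 0.3
    V = np.array([[np.cos(theta), -np.sin(theta), 0],
                  [np.sin(theta),  np.cos(theta), 0],
                  [0,              0,             1]], dtype=complex)

    # Gram matrix G(a, b) = <x_a, x_b>; the set is orthonormal iff G is the identity.
    G = V.conj().T @ V
    print(np.allclose(G, np.eye(3)))   # True: the columns form an orthonormal basis of C^3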

1.2 Linear operators and super-operators

Given complex Euclidean spaces X and Y , one writes L (X , Y ) to refer to the collection of all linear
mappings of the form
A : X → Y. (1.1)
Such mappings will be referred to as linear operators, or simply operators, from X to Y in this course.
Parentheses are typically omitted when expressing the action of linear operators on vectors when
there is little chance of confusion in doing so. For instance, one typically writes Au rather than
A(u) to denote the vector resulting from the application of an operator A ∈ L (X , Y ) to a vector
u ∈ X.
The set L (X , Y ) forms a vector space, where addition and scalar multiplication are defined as
follows:

1. Addition: given A, B ∈ L (X , Y ), the operator A + B ∈ L (X , Y ) is defined by the equation

( A + B)u = Au + Bu

for all u ∈ X .
2. Scalar multiplication: given A ∈ L (X , Y ) and α ∈ C, the operator αA ∈ L (X , Y ) is defined
by the equation
(αA)u = α Au
for all u ∈ X .

The dimension of this vector space is given by

dim(L (X , Y )) = dim(X ) dim(Y ).

Let us assume that X = C Σ and Y = C Γ . For each choice of a ∈ Σ and b ∈ Γ, the operator
Eb,a ∈ L (X , Y ) is defined by the condition that

E_{b,a} e_c = \begin{cases} e_b & \text{if } c = a \\ 0 & \text{if } c ≠ a \end{cases}

for all c ∈ Σ. Then the set


{Eb,a : a ∈ Σ, b ∈ Γ}
is a basis of L (X , Y ), and is known as the standard basis of this space.
The kernel of an operator A ∈ L (X , Y ) is the subspace of X defined as

ker( A) = {u ∈ X : Au = 0},

while its image is the subspace of Y defined as

im( A) = { Au : u ∈ X }.

The rank of A, denoted rank( A), is the dimension of the subspace im( A). For every operator
A ∈ L (X , Y ) it holds that
dim(ker( A)) + rank( A) = dim(X ).

1.2.1 Matrices and their association with operators


A matrix is a mapping of the form
M : Γ×Σ →C
for finite, non-empty sets Σ and Γ. The collection of all matrices of this form is denoted MΓ,Σ (C ).
For a ∈ Γ and b ∈ Σ the value M ( a, b) is called the ( a, b) entry of M, and the elements a and b are
referred to as indices in this context.
The set MΓ,Σ (C ) is a vector space with respect to vector addition and scalar multiplication
defined in the following way.

1. Addition: given M, N ∈ MΓ,Σ (C ), the matrix M + N ∈ MΓ,Σ (C ) is defined by the equation

( M + N )( a, b) = M ( a, b) + N ( a, b)

for all a ∈ Γ and b ∈ Σ.


2. Scalar multiplication: given M ∈ MΓ,Σ (C ) and α ∈ C, the matrix αM ∈ MΓ,Σ (C ) is defined
by the equation
(αM )( a, b) = αM ( a, b)
for all a ∈ Γ and b ∈ Σ.

As a vector space, MΓ,Σ (C ) is therefore essentially the same as the complex Euclidean space C Γ×Σ .
Multiplication of matrices is defined in the following standard way. Given matrices M ∈
MΓ,∆ (C ) and N ∈ M∆,Σ (C ), for finite non-empty sets Γ, ∆, and Σ, the matrix MN ∈ MΓ,Σ (C ) is
defined as
(MN)(a, b) = ∑_{c∈∆} M(a, c) N(c, b)

for all a ∈ Γ and b ∈ Σ.


Linear operators from one complex Euclidean space to another are naturally represented by
matrices. For X = C Σ and Y = C Γ , one associates with each operator A ∈ L (X , Y ) a matrix
M A ∈ MΓ,Σ (C ) defined as
M_A(a, b) = ⟨e_a , A e_b⟩

for each a ∈ Γ and b ∈ Σ. Conversely, to each matrix M ∈ MΓ,Σ (C ) one associates a linear
operator A M ∈ L (X , Y ) defined by

(A_M u)(a) = ∑_{b∈Σ} M(a, b) u(b)    (1.2)

for each a ∈ Γ. It holds that the mappings A 7→ M A and M 7→ A M are linear and inverses of one
another, and that compositions of linear operators are represented by matrix multiplications:

M AB = M A MB

whenever A ∈ L (Y , Z), B ∈ L (X , Y ) and X , Y , and Z are complex Euclidean spaces. Equivalently,
A MN = A M A N
for any choice of matrices M ∈ MΓ,∆ (C ) and N ∈ M∆,Σ (C ) for finite nonempty sets Σ, ∆ and Γ.
The above correspondence between linear operators and matrices will hereafter not be men-
tioned explicitly in this course. Our preference will generally be to speak of operators, and to
directly associate a given operator with its representation as a matrix when this is necessary. More
specifically, for a given choice of complex Euclidean spaces X = C Σ and Y = C Γ , and for a given
operator A ∈ L (X , Y ), the matrix M A ∈ MΓ,Σ (C ) will simply be denoted A and its ( a, b)-entry as
A( a, b). No ambiguity will result from this identification, which is certainly preferred to the con-
stant distraction that would be caused by explicitly referring to the above correspondence every
time it is used.
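
In numerical work this identification is implicit: in numpy an operator is specified directly by its matrix, composition is matrix multiplication, and the (a, b) entry is ⟨e_a, A e_b⟩. A small sketch with arbitrary example operators:

    import numpy as np

    # A : C^3 -> C^2 and B : C^4 -> C^3 are arbitrary complex matrices (illustrative values only).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
    B = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
    u = rng.standard_normal(4) + 1j * rng.standard_normal(4)

    # The action (1.2) is matrix-vector multiplication, and composition of operators
    # corresponds to matrix multiplication: M_{AB} = M_A M_B.
    assert np.allclose((A @ B) @ u, A @ (B @ u))

    # The (a, b) entry of the matrix of A is <e_a, A e_b>.
    e = np.eye(3)   # standard basis of the domain
    f = np.eye(2)   # standard basis of the codomain
    assert np.isclose(np.vdot(f[:, 0], A @ e[:, 1]), A[0, 1])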

1.2.2 The entry-wise conjugate, transpose, and adjoint


Given an operator A ∈ L (X , Y ), for complex Euclidean spaces X = C Σ and Y = C Γ , we define
three operators:
\overline{A} ∈ L (X , Y ) and AT , A∗ ∈ L (Y , X )
as follows:
1. The operator \overline{A} ∈ L (X , Y ) is the operator whose matrix representation has entries that are
the complex conjugates of the entries of the matrix representation of A:

\overline{A}( a, b) = \overline{A( a, b)}

for all a ∈ Γ and b ∈ Σ.


2. The operator AT ∈ L (Y , X ) is the operator whose matrix representation is obtained by trans-
posing the matrix representation of A:

AT (b, a) = A( a, b)

for all a ∈ Γ and b ∈ Σ.


3. The operator A∗ ∈ L (Y , X ) is the unique operator that satisfies the equation

hv, Aui = h A∗ v, ui

for all u ∈ X and v ∈ Y . It may be obtained by performing both of the operations described in
items 1 and 2:
A∗ = \overline{A}^T .

The operators \overline{A}, AT , and A∗ will be called the entry-wise conjugate, transpose, and adjoint operators
to A, respectively.
The mappings A 7→ \overline{A} and A 7→ A∗ are conjugate linear and the mapping A 7→ AT is linear:

\overline{αA + βB} = \overline{α} \overline{A} + \overline{β} \overline{B},
(αA + βB)∗ = \overline{α} A∗ + \overline{β} B∗ ,
(αA + βB)T = α AT + β BT ,

for all A, B ∈ L (X , Y ) and α, β ∈ C. These mappings are bijections, each being its own inverse.
Every vector u ∈ X in a complex Euclidean space X may be identified with the linear operator
in L (C, X ) that maps α 7→ αu. Through this identification the linear mappings \overline{u} ∈ L (C, X ) and
uT , u∗ ∈ L (X , C ) are defined as above. As an element of X , the vector \overline{u} is of course simply the
entry-wise complex conjugate of u, i.e., if X = C Σ then

\overline{u}( a) = \overline{u( a)}

for every a ∈ Σ. For each vector u ∈ X the mapping u∗ ∈ L (X , C ) satisfies u∗ v = hu, vi for all
v ∈ X . The space of linear operators L (X , C ) is called the dual space of X , and is typically denoted
by X ∗ rather than L (X , C ).
For any choice of vectors u ∈ X and v ∈ Y , the operator vu∗ ∈ L (X , Y ) satisfies

(vu∗ )w = v(u∗ w) = hu, wi v

for all w ∈ X . For nonzero vectors u and v the operator vu∗ has rank equal to one, and every rank
one operator in L (X , Y ) can be expressed in this form for vectors u and v that are unique up to
scalar multiples.
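
The following numpy sketch, with arbitrary example data, checks the defining property of the adjoint and the behaviour of a rank-one operator vu∗:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))   # A in L(X, Y), X = C^2, Y = C^3
    u = rng.standard_normal(2) + 1j * rng.standard_normal(2)             # u in X
    v = rng.standard_normal(3) + 1j * rng.standard_normal(3)             # v in Y

    # Entry-wise conjugate, transpose, and adjoint; the adjoint is the conjugate transpose.
    A_conj, A_T, A_star = A.conj(), A.T, A.conj().T

    # Defining property of the adjoint: <v, A u> = <A* v, u>.
    assert np.isclose(np.vdot(v, A @ u), np.vdot(A_star @ v, u))

    # The rank-one operator v u* sends w to <u, w> v.
    vu_star = np.outer(v, u.conj())
    w = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    assert np.allclose(vu_star @ w, np.vdot(u, w) * v)
    assert np.linalg.matrix_rank(vu_star) == 1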

1.2.3 Algebras of operators


For a given complex Euclidean space X , one writes L (X ) as a shorthand for L (X , X ). The space
L (X ) has special algebraic properties that are worthy of note. In particular, L (X ) is an associative
algebra; it is a vector space, and the composition of operators is associative and bilinear:

( AB)C = A( BC ),
C (αA + βB) = αCA + βCB,
(αA + βB)C = αAC + βBC,

for every choice of A, B, C ∈ L (X ) and α, β ∈ C.


The identity operator 1 X ∈ L (X ) is the operator defined as 1 X u = u for all u ∈ X , and is
denoted 1 when it is understood to act on X . An operator A ∈ L (X ) is invertible if there exists an
operator B ∈ L (X ) such that BA = 1 X . When such an operator B exists it is necessarily unique,
also satisfies AB = 1 X , and is denoted A−1 . The collection of all invertible operators in L (X ) is
denoted GL(X ), and is called the general linear group of X .
For every pair of operators A, B ∈ L (X ), we define the commutator or Lie bracket [ A, B] ∈ L (X )
as [ A, B] = AB − BA.

1.2.4 Trace and determinant


Operators in the algebra L (X ) are represented by square matrices, which means that their rows
and columns are indexed by the same set. We define two important functions from L (X ) to C, the
trace and the determinant, based on matrix representations of operators as follows:

1. The trace of an operator A ∈ L (X ), for X = C Σ , is defined as

Tr(A) = ∑_{a∈Σ} A(a, a).

2. The determinant of an operator A ∈ L (X ), for X = C Σ , is defined by the equation

det(A) = ∑_{π∈Sym(Σ)} sign(π) ∏_{a∈Σ} A(a, π(a)),

where Sym(Σ) is the group of permutations on the set Σ and sign(π ) is the sign of the permu-
tation π (which is +1 if π is expressible as a product of an even number of transpositions of
elements of the set Σ, and −1 if π is expressible as a product of an odd number of transposi-
tions).

The trace is a linear function, and satisfies the property that

Tr( AB) = Tr( BA)

for any choice of operators A ∈ L (X , Y ) and B ∈ L (Y , X ), for arbitrary complex Euclidean spaces
X and Y .
By means of the trace, we define an inner product on the space L (X , Y ), for any choice of
complex Euclidean spaces X and Y , as

h A, Bi = Tr ( A∗ B)

for all A, B ∈ L (X , Y ). True to its name, this inner product satisfies the requisite properties of
being an inner product:

1. Linearity in the second argument: h A, αB + βC i = α h A, Bi + β h A, C i for all A, B, C ∈ L (X , Y )


and α, β ∈ C.
2. Conjugate symmetry: h A, Bi = h B, Ai for all A, B ∈ L (X , Y ).
3. Positive definiteness: h A, Ai ≥ 0 for all A ∈ L (X , Y ), with h A, Ai = 0 if and only if A = 0.

This inner product is sometimes called the Hilbert–Schmidt inner product.


The determinant is multiplicative,

det( AB) = det( A) det( B)

for all A, B ∈ L (X ), and its value is nonzero if and only if its argument is invertible.
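
A quick numerical check of the trace identity and the Hilbert–Schmidt inner product, using arbitrary example operators:

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))   # A in L(X, Y)
    B = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))   # B in L(Y, X)
    C = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))   # C in L(X, Y)

    # Tr(AB) = Tr(BA), even for non-square A and B, provided both products are defined.
    assert np.isclose(np.trace(A @ B), np.trace(B @ A))

    # Hilbert-Schmidt inner product <A, C> = Tr(A* C): conjugate symmetry and positive definiteness.
    assert np.isclose(np.trace(A.conj().T @ C), np.conj(np.trace(C.conj().T @ A)))
    assert np.isclose(np.trace(A.conj().T @ A).real, np.linalg.norm(A, 'fro') ** 2)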

1.2.5 Eigenvectors and eigenvalues


If A ∈ L (X ) and u ∈ X is a nonzero vector such that Au = λu for some choice of λ ∈ C, then u is
said to be an eigenvector of A and λ is its corresponding eigenvalue.
By the definition of the determinant, it is evident that for any operator A ∈ L (X ) we have that

p A (z) = det(z1 X − A)

is a polynomial in z having degree dim(X ). This polynomial is the characteristic polynomial of A.


The spectrum of A, denoted spec( A), is the multiset containing the roots of the polynomial p A (z),
with each root appearing a number of times equal to its multiplicity. Each element λ ∈ spec( A) is
an eigenvalue of A.
The trace and determinant may be expressed in terms of the spectrum as follows:

Tr(A) = ∑_{λ∈spec(A)} λ

and

det(A) = ∏_{λ∈spec(A)} λ

for every A ∈ L (X ).
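
These two identities are easy to verify numerically; a short sketch with an arbitrary operator:

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

    # The spectrum of A: the roots of the characteristic polynomial, with multiplicity.
    spec = np.linalg.eigvals(A)

    # Trace and determinant as the sum and product of the eigenvalues.
    assert np.isclose(np.trace(A), spec.sum())
    assert np.isclose(np.linalg.det(A), spec.prod())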

1.3 Important classes of operators

Several classes of operators that have importance in quantum information are discussed in this
section.
An operator A ∈ L (X ) is normal if and only if it commutes with its adjoint: [ A, A∗ ] = 0, or
equivalently AA∗ = A∗ A. The importance of this collection of operators, for the purposes of this
course, is mainly derived from two facts: (1) the normal operators are those for which the spectral
theorem (discussed later in Section 1.4.1) holds, and (2) most of the special classes of operators
that are discussed below are subsets of the normal operators.
An operator A ∈ L (X ) is Hermitian if A = A∗ . The set of Hermitian operators acting on a
given complex Euclidean space X will hereafter be denoted Herm (X ) in this course:

Herm (X ) = { A ∈ L (X ) : A = A∗ }.

Every Hermitian operator is obviously a normal operator.


The eigenvalues of every Hermitian operator are necessarily real numbers, and can there-
fore be ordered from largest to smallest. Under the assumption that A ∈ Herm (X ) for X an
n-dimensional complex Euclidean space, one denotes the k-th largest eigenvalue of A by λk ( A).
Equivalently, the vector

λ( A) = (λ1 ( A), λ2 ( A), . . . , λn ( A)) ∈ R n

is defined so that
spec( A) = {λ1 ( A), λ2 ( A), . . . , λn ( A)}
and
λ1 ( A ) ≥ λ2 ( A ) ≥ · · · ≥ λ n ( A ).

The sum of two Hermitian operators is obviously Hermitian, as is any real scalar multiple of a
Hermitian operator. The inner product of two Hermitian operators is real as well: h A, Bi ∈ R for
all A, B ∈ Herm (X ). The set Herm (X ) therefore forms a real inner product space, the dimension
of which is dim(X )2 . Assuming that X = C Σ , and that the elements of Σ are ordered in some
fixed way, the following collection of operators is one example of a basis of Herm (X ):

Ea,a ( a ∈ Σ)
Ea,b + Eb,a ( a, b ∈ Σ, a < b)
iEa,b − iEb,a ( a, b ∈ Σ, a < b).

An operator A ∈ L (X ) is positive semidefinite if and only if it holds that A = B∗ B for some


operator B ∈ L (X ). Hereafter, when it is possible, we will follow a convention in this course to
use the symbols P, Q, and R to denote general positive semidefinite operators. The collection of
positive semidefinite operators is denoted

Pos (X ) = { B∗ B : B ∈ L (X )}.

The notation P ≥ 0 is also used to mean that P is positive semidefinite, while A ≥ B means
that A − B is positive semidefinite. (Usually this notation is only used when A and B are both
Hermitian.) The inner product of two positive semidefinite operators is always a nonnegative real
number: h P, Qi ≥ 0 for P, Q ∈ Pos (X ).
There are alternate ways to describe positive semidefinite operators that are useful in different
situations. In particular, the following items are equivalent for a given operator P ∈ L (X ):

1. P is positive semidefinite.
2. hu, Pui is a nonnegative real number for every choice of u ∈ X .
3. P is Hermitian and every eigenvalue of P is nonnegative.
4. There exists a collection of vectors {u a : a ∈ Σ} such that P( a, b) = hu a , ub i for all a, b ∈ Σ.

A positive semidefinite operator P ∈ Pos (X ) is said to be positive definite if, in addition to


being positive semidefinite, it is non-singular. We write

Pd (X ) = { A ∈ Pos (X ) : det( A) ≠ 0}

to denote the set of such operators for a given complex Euclidean space X . The property of being
positive definite is equivalent to hu, Aui being a positive (and therefore nonzero) real number for
every choice of u ∈ X .
Positive semidefinite operators having trace equal to 1 are called density operators, and it is
conventional to use lowercase Greek letters such as ρ, ξ, and σ to denote such operators. The
notation
D (X ) = {ρ ∈ Pos (X ) : Tr(ρ) = 1}
is used to denote the collection of density operators acting on a given complex Euclidean space.
A positive semidefinite operator P ∈ Pos (X ) is an orthogonal projection if, in addition to being
positive semidefinite, it satisfies P2 = P. Equivalently, an orthogonal projection is any Hermitian
operator whose only eigenvalues are 0 and 1. For each subspace V ⊆ X , we write ΠV to denote
the unique orthogonal projection whose image is equal to the subspace V . It is typical that the
term projection refers to an operator A ∈ L (X ) that satisfies A2 = A, but which might not be

Hermitian. Given that there is no discussion of such operators in this course, we will use the term
projection to mean orthogonal projection.
An operator A ∈ L (X , Y ) is a linear isometry if it preserves the Euclidean norm—meaning
that k Au k = k u k for all u ∈ X . The condition that k Au k = k u k for all u ∈ X is equivalent to
A∗ A = 1 X . The notation

U (X , Y ) = { A ∈ L (X , Y ) : A∗ A = 1 X }

is used throughout this course. Every linear isometry preserves not only the Euclidean norm, but
inner products as well: h Au, Avi = hu, vi for all u, v ∈ X .
The set of linear isometries mapping X to itself is denoted U (X ), and operators in this set
are called unitary operators. The letters U, V, and W are conventionally used to refer to unitary
operators. Every unitary operator U ∈ U (X ) is invertible and satisfies UU ∗ = U ∗ U = 1 X , which
implies that every unitary operator is normal.
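
The following numpy sketch illustrates several of these classes, starting from an arbitrary operator B: the operator B∗B is positive semidefinite, normalizing it by its trace gives a density operator, and the QR decomposition yields a unitary operator.

    import numpy as np

    rng = np.random.default_rng(4)
    B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

    # P = B* B is positive semidefinite: Hermitian with nonnegative eigenvalues.
    P = B.conj().T @ B
    assert np.allclose(P, P.conj().T)
    assert np.all(np.linalg.eigvalsh(P) >= -1e-12)

    # A density operator is obtained by normalizing any nonzero positive semidefinite operator.
    rho = P / np.trace(P).real
    assert np.isclose(np.trace(rho).real, 1.0)

    # A unitary operator (here from the QR decomposition of B) satisfies U* U = U U* = 1.
    U, _ = np.linalg.qr(B)
    assert np.allclose(U.conj().T @ U, np.eye(3))
    assert np.allclose(U @ U.conj().T, np.eye(3))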

1.4 The spectral and singular value theorems

We now discuss the spectral and singular value theorems. These theorems establish the existence of
the spectral and singular value decompositions of operators, which are indispensable tools that are
used repeatedly throughout this course. The spectral theorem also provides us with the important
notion that functions defined on scalars can be extended to normal operators in a robust way.

1.4.1 The spectral theorem


We will begin with the spectral theorem, which establishes that every normal operator can be
expressed as a linear combination of projections onto pairwise orthogonal subspaces. The spectral
theorem is so-named, and the resulting expressions are called spectral decompositions, because
the coefficients of the projections are determined by the spectrum of the operator being considered.
A formal statement of the spectral theorem follows.

Theorem 1.1 (Spectral Theorem). Let X be a complex Euclidean space, let A ∈ L (X ) be a normal
operator, and assume that the distinct eigenvalues of A are λ1 , . . . , λk . Then there exists a unique choice of
orthogonal projection operators P1 , . . . , Pk ∈ L (X ), with P1 + · · · + Pk = 1 X and Pi Pj = 0 for i ≠ j, such
that
A = ∑_{i=1}^{k} λ_i P_i .    (1.3)

For each i ∈ {1, . . . , k}, it holds that the rank of Pi is equal to the multiplicity of λi as an eigenvalue of A.

As suggested above, the expression of a normal operator A in the form of the above equation (1.3)
is called a spectral decomposition of A.
A simple corollary of the spectral theorem follows. It expresses essentially the same fact as the
spectral theorem, but in a slightly different form that will be useful to refer to later in the course.

Corollary 1.2. Let X be a complex Euclidean space, let A ∈ L (X ) be a normal operator, and assume that
spec( A) = {λ1 , . . . , λn }. Then there exists an orthonormal basis { x1 , . . . , xn } of X such that
A = ∑_{i=1}^{n} λ_i x_i x_i^* .    (1.4)

It is clear from the expression (1.4), along with the requirement that the set { x1 , . . . , xn } is an
orthonormal basis, that each xi is an eigenvector of A whose corresponding eigenvalue is λi . It is
also clear that any operator A that is expressible in such a form as (1.4) is normal—implying that
the condition of normality is equivalent to the existence of an orthonormal basis of eigenvectors.
We will often refer to expressions of operators in the form (1.4) as spectral decompositions, de-
spite the fact that it differs slightly from the form (1.3). It must be noted that unlike the form (1.3),
the form (1.4) is generally not unique (unless each eigenvalue of A has multiplicity one, in which
case the expression is unique up to scalar multiples of the vectors { x1 , . . . , xn } and the ordering of
the eigenvalues).
Finally, let us mention one more important theorem regarding spectral decompositions of nor-
mal operators, which states that the same orthonormal basis of eigenvectors { x1 , . . . , xn } may be
chosen for any two normal operators, provided that they commute.
Theorem 1.3. Let X be a complex Euclidean space and let A, B ∈ L (X ) be normal operators for which
[ A, B] = 0. Then there exists an orthonormal basis { x1 , . . . , xn } of X such that
A = ∑_{i=1}^{n} λ_i x_i x_i^*    and    B = ∑_{i=1}^{n} µ_i x_i x_i^*

are spectral decompositions of A and B, respectively.
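
The decomposition in the form (1.4) can be computed with numpy's eigh in the Hermitian case (a convenient special case of normal operators); a minimal sketch with an arbitrary example:

    import numpy as np

    rng = np.random.default_rng(5)
    B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = B + B.conj().T          # a Hermitian (hence normal) operator

    # eigh returns real eigenvalues and an orthonormal basis of eigenvectors x_1, ..., x_n.
    eigvals, X = np.linalg.eigh(A)

    # Reconstruct A = sum_i lambda_i x_i x_i^*, the form (1.4) of the spectral decomposition.
    A_rebuilt = sum(eigvals[i] * np.outer(X[:, i], X[:, i].conj()) for i in range(4))
    assert np.allclose(A, A_rebuilt)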

1.4.2 Functions of normal operators


Every function of the form f : C → C may be extended to the set of normal operators in L (X ), for
a given complex Euclidean space X , by means of the spectral theorem. In particular, if A ∈ L (X )
is normal and has the spectral decomposition (1.3), then one defines
f(A) = ∑_{i=1}^{k} f(λ_i) P_i .

Naturally, functions defined only on subsets of scalars may be extended to normal operators
whose eigenvalues are restricted accordingly.
The following examples of scalar functions extended to operators will be important later in
this course.
1. The exponential function α 7→ exp(α) is defined for all α ∈ C, and may therefore be extended
to a function A 7→ exp( A) for any normal operator A ∈ L (X ) by defining
exp(A) = ∑_{j=1}^{k} exp(λ_j) P_j ,

assuming that the spectral decomposition of A is given by (1.3).


The exponential function may, in fact, be defined for all operators A ∈ L (X ) by considering
its usual Taylor series. In particular, the series
exp(A) = 1 + A + \frac{1}{2} A^2 + \frac{1}{6} A^3 + · · ·
can be shown to converge for all operators A ∈ L (X ), and agrees with the above notion based
on the spectral decomposition in the case that A is normal.

2. For r > 0 the function λ 7→ λr is defined for nonnegative real values λ ∈ [0, ∞). For a given
positive semidefinite operator Q ∈ Pos (X ), having spectral decomposition

Q = ∑_{j=1}^{k} λ_j P_j ,    (1.5)

for which we necessarily have that λ j ≥ 0 for 1 ≤ j ≤ k, we may therefore define

Q^r = ∑_{j=1}^{k} λ_j^r P_j .

For integer values of r, it is clear that Qr coincides with the usual meaning of this expression
given by the multiplication of operators.

The case that r = 1/2 is particularly common, and in this case we also write \sqrt{Q} to denote
Q^{1/2}. The operator \sqrt{Q} is the unique positive semidefinite operator that satisfies

\sqrt{Q} \sqrt{Q} = Q.

3. Along similar lines to the previous example, for any real number r ∈ R the function λ 7→ λr is
defined for positive real values λ ∈ (0, ∞). For a given positive definite operator Q ∈ Pd (X ),
one defines Qr in a similar way to above.
4. The function λ 7→ log(λ) is defined for every positive real number λ ∈ (0, ∞). For a given
positive definite operator Q ∈ Pd (X ), having a spectral decomposition (1.5) as above, one
defines
log(Q) = ∑_{j=1}^{k} log(λ_j) P_j .
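
Each of these operator functions can be evaluated by applying the scalar function to the eigenvalues in a spectral decomposition. A numpy sketch, in which op_func is an ad hoc helper and the particular operator Q is an arbitrary positive definite example:

    import numpy as np

    rng = np.random.default_rng(6)
    B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    Q = B.conj().T @ B + 0.1 * np.eye(3)     # positive definite, so log(Q) is also defined

    def op_func(Q, f):
        # Apply a scalar function f to a Hermitian operator via its spectral decomposition.
        eigvals, X = np.linalg.eigh(Q)
        return (X * f(eigvals)) @ X.conj().T

    sqrt_Q = op_func(Q, np.sqrt)
    log_Q = op_func(Q, np.log)

    # sqrt(Q) squares to Q, and exp(log(Q)) recovers Q.
    assert np.allclose(sqrt_Q @ sqrt_Q, Q)
    assert np.allclose(op_func(log_Q, np.exp), Q)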

1.4.3 The singular-value theorem


The second decomposition we will consider is the singular value decomposition. Unlike the spectral
decomposition, it holds for arbitrary operators, even those of the form A ∈ L (X , Y ) for different
spaces X and Y . We will find that the singular value decomposition is an indispensable tool in
quantum information.

Theorem 1.4 (Singular Value Theorem). Let X and Y be complex Euclidean spaces, let A ∈ L (X , Y ) be
a nonzero operator, and let r = rank( A). Then there exist positive real numbers s1 , . . . , sr and orthonormal
sets { x1 , . . . , xr } ⊂ X and {y1 , . . . , yr } ⊂ Y such that
A = ∑_{i=1}^{r} s_i y_i x_i^* .    (1.6)

An expression of a given matrix A in the form of Equation (1.6) is said to be a singular value
decomposition of A. The numbers s1 , . . . , sr are called singular values and the vectors x1 , . . . , xr and
y1 , . . . , yr are called right and left singular vectors, respectively.
The singular values s1 , . . . , sr of an operator A are uniquely determined, up to their ordering.
Hereafter we will assume, without loss of generality, that the singular values are ordered from
largest to smallest: s1 ≥ · · · ≥ sr . When it is necessary to indicate the dependence of these

singular values on A, we denote them s1 ( A), . . . , sr ( A). Although technically speaking 0 is never
a singular value of any operator, it will be convenient to also define sk ( A) = 0 for k > rank( A).
The notation s( A) is used to refer to the vector of singular values s( A) = (s1 ( A), . . . , sr ( A)), or to
an extension of this vector s( A) = (s1 ( A), . . . , sk ( A)) for k > r when it is convenient to view it as
an element of R k for k > rank( A).
There is a close relationship between singular value decompositions of an operator A and
spectral decompositions of the operators A∗ A and AA∗ . In particular, it will necessarily hold that
s_k(A) = \sqrt{λ_k(AA^*)} = \sqrt{λ_k(A^* A)}    (1.7)
for 1 ≤ k ≤ rank( A), and moreover the right singular vectors of A will be eigenvectors of A∗ A
and the left singular vectors of A will be eigenvectors of AA∗ . One is free, in fact, to choose
the left singular vectors of A to be any orthonormal collection of eigenvectors of AA∗ for which
the corresponding eigenvalues are nonzero—and once this is done the right singular vectors will
be uniquely determined. Alternately, the right singular vectors of A may be chosen to be any or-
thonormal collection of eigenvectors of A∗ A for which the corresponding eigenvalues are nonzero,
which uniquely determines the left singular vectors.
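
numpy's svd returns exactly the data in the form (1.6); the following sketch, with an arbitrary example operator, also checks the relation (1.7):

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))   # generically of rank 3

    # Compact SVD: columns of Y are the left singular vectors y_i,
    # and the rows of Xh are the adjoints x_i^* of the right singular vectors.
    Y, s, Xh = np.linalg.svd(A, full_matrices=False)

    # Reconstruct A = sum_i s_i y_i x_i^*, the form (1.6).
    A_rebuilt = sum(s[i] * np.outer(Y[:, i], Xh[i, :]) for i in range(len(s)))
    assert np.allclose(A, A_rebuilt)

    # Relation (1.7): the singular values are the square roots of the eigenvalues of A A*.
    eig = np.sort(np.linalg.eigvalsh(A @ A.conj().T))[::-1]
    assert np.allclose(s, np.sqrt(np.clip(eig, 0, None)))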

1.4.4 The Moore-Penrose pseudo-inverse


For any given operator A ∈ L (X , Y ), we define an operator A+ ∈ L (Y , X ), known as the Moore-
Penrose pseudo-inverse of A, to be the unique operator that satisfies the following properties:
1. AA+ A = A,
2. A+ AA+ = A+ , and
3. AA+ and A+ A are both Hermitian.
It is clear that there is at least one such choice of A+ , for if
A = ∑_{i=1}^{r} s_i y_i x_i^*

is a singular value decomposition of A, then


A^+ = ∑_{i=1}^{r} \frac{1}{s_i} x_i y_i^*

satisfies the three properties above.


The fact that A+ is uniquely determined by the above equations is easily verified, for suppose
that X, Y ∈ L (Y , X ) both satisfy the above properties:
1. AX A = A = AYA,
2. X AX = X and YAY = Y, and
3. AX, X A, AY, and YA are all Hermitian.
Using these properties, we observe that
X = X AX = ( X A)∗ X = A∗ X ∗ X = ( AYA)∗ X ∗ X = A∗Y ∗ A∗ X ∗ X
= (YA)∗ ( X A)∗ X = YAX AX = YAX = YAYAX = Y ( AY )∗ ( AX )∗
= YY ∗ A∗ X ∗ A∗ = YY ∗ ( AX A)∗ = YY ∗ A∗ = Y ( AY )∗ = YAY = Y,
which shows that X = Y.
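
A numpy sketch constructing A+ from a singular value decomposition of an arbitrary example operator and checking the three defining properties; numpy's built-in pinv computes the same operator:

    import numpy as np

    rng = np.random.default_rng(8)
    A = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))

    # Build A+ from a singular value decomposition: A+ = sum_i (1/s_i) x_i y_i^*.
    Y, s, Xh = np.linalg.svd(A, full_matrices=False)
    A_plus = Xh.conj().T @ np.diag(1 / s) @ Y.conj().T

    # It agrees with numpy's pseudo-inverse and satisfies the three defining properties.
    assert np.allclose(A_plus, np.linalg.pinv(A))
    assert np.allclose(A @ A_plus @ A, A)
    assert np.allclose(A_plus @ A @ A_plus, A_plus)
    assert np.allclose(A @ A_plus, (A @ A_plus).conj().T)
    assert np.allclose(A_plus @ A, (A_plus @ A).conj().T)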

Lecture 2
Mathematical Preliminaries – Part II

This lecture continues the overview of basic mathematical concepts and tools from linear algebra
that are required throughout the course.

2.1 Linear super-operators

A linear super-operator (or simply a super-operator) is a linear mapping of the form


Φ : L (X ) → L (Y ) ,
where X and Y are complex Euclidean spaces. The set of all such mappings is denoted T (X , Y ),
and is itself a linear space when addition of super-operators and scalar multiplication are defined
in the straightforward way:
1. Addition: given Φ, Ψ ∈ T (X , Y ), the super-operator Φ + Ψ ∈ T (X , Y ) is defined by
(Φ + Ψ)( A) = Φ( A) + Ψ( A)
for all A ∈ L (X ).
2. Scalar multiplication: given Φ ∈ T (X , Y ) and α ∈ C, the super-operator αΦ ∈ T (X , Y ) is
defined by
(αΦ)( A) = α(Φ( A))
for all A ∈ L (X ).
The notation T (X ) is shorthand for T (X , X ).
For a given super-operator Φ ∈ T (X , Y ), the adjoint of Φ is defined to be the unique super-
operator Φ∗ ∈ T (Y , X ) that satisfies
hΦ∗ ( B), Ai = h B, Φ( A)i
for all A ∈ L (X ) and B ∈ L (Y ).
We will study super-operators in much greater detail in a couple of weeks. For the time being,
it is enough that we simply note their existence.

2.2 Tensor products

The notion of a tensor product is fundamental in quantum information; for it is the mathematical
construct that is used when two or more quantum states, operations, or measurements are to
be considered as one. Tensor products are discussed in this section, beginning with a concrete
formulation based on Kronecker products of vectors, followed by a more abstract formulation that
carries valuable intuition about the meaning and uses of tensor products. Tensor products of
operators and super-operators may then be defined directly in terms of their actions on vectors
and operators, respectively.


2.2.1 Concrete representation by Kronecker products


Tensor products of vectors in complex Euclidean spaces may be represented in concrete terms by
Kronecker products of vectors.
For given complex Euclidean spaces X = C Σ and Y = C Γ , we define the space X ⊗ Y as

X ⊗ Y = C Σ×Γ ,

and for vectors u ∈ X and v ∈ Y , we define the Kronecker product u ⊗ v ∈ X ⊗ Y as

(u ⊗ v)( a, b) = u( a)v(b)

for all a ∈ Σ and b ∈ Γ. It holds that the space X ⊗ Y contains vectors that cannot be written in
the form u ⊗ v for u ∈ X and v ∈ Y , as linear combinations of such vectors will typically not take
this form.
The following rules for addition and scalar multiplication of Kronecker products hold:

u1 ⊗ v + u2 ⊗ v = (u1 + u2 ) ⊗ v,
u ⊗ v1 + u ⊗ v2 = u ⊗ ( v1 + v2 ) ,

and
α u ⊗ v = (αu) ⊗ v = u ⊗ (αv)
for all u, u1 , u2 ∈ X , v, v1 , v2 ∈ Y , and α ∈ C.
Kronecker products of three or more vectors, or of three or more complex Euclidean spaces, are
formed by using the ordinary (binary) Kronecker product operation multiple times. In particular,
we may define u1 ⊗ u2 ⊗ · · · ⊗ uk inductively as

u1 ⊗ u2 ⊗ · · · ⊗ u k = u1 ⊗ ( u2 ⊗ · · · ⊗ u k ) (2.1)

for any choice of vectors u1 ∈ C Σ1 , . . . , uk ∈ C Σk and k ≥ 3. It is straightforward to prove that

( u ⊗ v) ⊗ w = u ⊗ ( v ⊗ w )

for any choice of vectors u, v and w, or in other words that the Kronecker product is associative.
For this reason, the ordering in which the Kronecker products are taken in (2.1) is irrelevant.
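
A short numerical illustration of the Kronecker product of vectors and of its bilinearity and associativity, with arbitrarily chosen vectors:

    import numpy as np

    # Vectors in X = C^2 and Y = C^3 (arbitrary illustrative values).
    u = np.array([1 + 1j, 2])
    v = np.array([0.5, -1j, 3])
    w = np.array([1, 1j])

    # u (x) v lives in C^(2*3), with entries u(a) v(b) in lexicographic order of (a, b).
    assert np.allclose(np.kron(u, v),
                       np.array([u[a] * v[b] for a in range(2) for b in range(3)]))

    # Bilinearity and associativity of the Kronecker product.
    assert np.allclose(np.kron(2j * u, v), np.kron(u, 2j * v))
    assert np.allclose(np.kron(np.kron(u, v), w), np.kron(u, np.kron(v, w)))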

2.2.2 Abstract notion of tensor products


When one speaks of a tensor product, a more abstract and more generally applicable construction
than the Kronecker product typically comes to mind. For the purposes of this course, there will be
little generality lost in simply identifying the notions of tensor and Kronecker products, but there
is nevertheless valuable intuition to be drawn from the general construction of tensor products.
Tensor products are closely connected to bilinear functions. For arbitrary vector spaces X , Y
and V , a function of the form
ψ : X ×Y → V (2.2)
is said to be bilinear if it is linear in each coordinate when the other is fixed. In other words, for
every choice of u ∈ X the mapping v 7→ ψ(u, v) from Y to V is linear, and similarly for every
choice of v ∈ Y the mapping u 7→ ψ(u, v) from X to V is linear.
Formally speaking, a tensor product of two vector spaces X and Y consists of two objects:

1. a vector space denoted X ⊗ Y , and


2. a universal bilinear function φ : X × Y → X ⊗ Y .

To say that a bilinear function φ : X × Y → X ⊗ Y is universal means two things. First, for every
choice of a vector space V and a bilinear function ψ of the above form (2.2), there must exist a
linear mapping A : X ⊗ Y → V such that ψ = A ◦ φ. Second, the space X ⊗ Y must be minimal
in the sense that
X ⊗ Y = span{φ(u, v) : u ∈ X , v ∈ Y }.
This second property implies that for a given choice of ψ, the mapping A for which ψ = A ◦ φ is
necessarily unique. Once the universal bilinear function φ has been fixed, one simply writes u ⊗ v
to denote φ(u, v).
The defining properties of tensor products can be understood as meaning that they give a
perfect translation from bilinear to linear mappings. Everything there is to know about a given
bilinear mapping ψ : X × Y → V is captured by a uniquely determined linear map A : X ⊗ Y →
V . In the other direction, every linear map A : X ⊗ Y → V uniquely determines a bilinear function
ψ : X × Y → V defined by ψ(u, v) = A(u ⊗ v).
Now, the definition of tensor products suggests that tensor products are not really uniquely
defined, as there could be any number of specific choices for the space X ⊗ Y and universal bilin-
ear mapping φ. Although this is true, one typically speaks of the tensor product of X and Y with
the understanding that all tensor products are essentially equivalent. Specifically, going from one
to another amounts to composing one universal bilinear mapping with a linear bijection to the im-
age of the other; which is a change that never has any effect in arguments or calculations involving
tensor products.
The existence of a tensor product can be proved for any choice of vector spaces X and Y ,
by means of an extremely general construction. We do not need to concern ourselves with that
construction, however, as our interest is primarily with complex Euclidean spaces—and here, as
our notation would suggest, the Kronecker product operates as a universal bilinear mapping.
One may define tensor products of three or more spaces inductively, as is done for Kronecker
products, or the entire definition can be generalized by speaking of multilinear rather than bilinear
functions.

2.2.3 Tensor products of operators and super-operators


As for vectors, tensor products of operators can be defined in two fundamentally different ways:
either in concrete terms using the Kronecker product of matrices, or in abstract terms relating to
bilinear (or multilinear) functions.
Kronecker products of matrices are defined in a manner that is similar to the Kronecker prod-
uct of vectors: for finite, nonempty sets Σ1 , Σ2 , Γ1 , and Γ2 , and matrices M ∈ MΣ1 ,Σ2 (C ) and
N ∈ MΓ1 ,Γ2 (C ), the Kronecker product

M ⊗ N ∈ MΣ1 ×Γ1 ,Σ2 ×Γ2 (C )

is defined as
( M ⊗ N )(( a1 , b1 ), ( a2 , b2 )) = M ( a1 , a2 ) N (b1 , b2 )
for all a1 ∈ Σ1 , a2 ∈ Σ2 , b1 ∈ Γ1 , and b2 ∈ Γ2 .
Naturally, for complex Euclidean spaces X1 = C Σ1 , X2 = C Σ2 , Y1 = C Γ1 , and Y2 = C Γ2 , the
Kronecker product of operators A ∈ L (X1 , X2 ) and B ∈ L (Y1 , Y2 ) may be defined in the way that

is consistent with their representations as matrices. One may check that the following identities
hold for Kronecker products of operators:

1. For all choices of complex Euclidean spaces X1 , X2 , Y1 , and Y2 , it holds that

A ⊗ B + C ⊗ B = ( A + C ) ⊗ B,
A ⊗ B + A ⊗ D = A ⊗ ( B + D ),

and
α A ⊗ B = (αA) ⊗ B = A ⊗ (αB)
for any choice of α ∈ C and operators A, C ∈ L (X1 , X2 ) and B, D ∈ L (Y1 , Y2 ).
2. For all choices of complex Euclidean spaces X1 , X2 , X3 , Y1 , Y2 , and Y3 , it holds that

( A ⊗ B)(C ⊗ D ) = ( AC ) ⊗ ( BD ) (2.3)

for any choice of operators A ∈ L (X2 , X3 ), B ∈ L (Y2 , Y3 ), C ∈ L (X1 , X2 ), and D ∈ L (Y1 , Y2 ).


In other words, the identity (2.3) holds for any choice of operators A, B, C, and D for which
the products AC and BD are defined.

To define tensor products of operators in abstract terms, we first note that the general notion
of tensor products applies equally well to vector spaces of operators as for any other vector space;
so tensor products of operators may be defined by using the same definition as above, except that
the spaces X and Y are replaced, for instance, with L (X1 , X2 ) and L (Y1 , Y2 ) for some choice of
complex Euclidean spaces X1 , X2 , Y1 , and Y2 .
Without further qualifications, this leaves one with the impression that a tensor product space
of the form
L (X1 , X2 ) ⊗ L (Y1 , Y2 ) (2.4)
is merely a vector space and not a space of linear operators. To recover the operator structures of
these spaces, it suffices to make a special choice for the universal bilinear mapping defining the
tensor product. In particular, supposing that A ∈ L (X1 , X2 ) and B ∈ L (Y1 , Y2 ) are operators, we
define the linear operator
A ⊗ B ∈ L (X1 ⊗ Y1 , X2 ⊗ Y2 )
to be the unique linear mapping that satisfies

( A ⊗ B)(u ⊗ v) = ( Au) ⊗ ( Bv)

for all vectors u ∈ X1 and v ∈ Y1 . Such an operator must exist and be unique, given that the
mapping (u, v) 7→ ( Au) ⊗ ( Bv) is bilinear. As is to be expected, the mapping

( A, B) 7→ A ⊗ B

is a universal bilinear mapping, which allows for the identification of the tensor product space
(2.4) with the space L (X1 ⊗ Y1 , X2 ⊗ Y2 ). This, naturally, is consistent with the concrete definition
based on Kronecker products.
Finally, let us note that tensor products of super-operators may be defined in a similar way as
tensor products of operators. We have, at this point, not discussed the representation of super-
operators by matrices, although this will be done in a later lecture in a way that allows for a
concrete definition based on Kronecker products. In the abstract formulation of tensor products,

however, the situation is completely analogous to the tensor product of operators. That is, the
tensor product space T (X1 , X2 ) ⊗ T (Y1 , Y2 ) is identified with the space T (X1 ⊗ Y1 , X2 ⊗ Y2 ) by
means of the universal bilinear mapping (Φ, Ψ) 7→ Φ ⊗ Ψ, where Φ ⊗ Ψ is the unique mapping
that satisfies
(Φ ⊗ Ψ)( A ⊗ B) = Φ( A) ⊗ Ψ( B)
for all A ∈ L (X1 ) and B ∈ L (Y1 ), assuming that Φ ∈ T (X1 , X2 ) and Ψ ∈ T (Y1 , Y2 ) are given
super-operators. Analogous properties to those listed above for tensor products of operators
emerge.
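
The two identities used most often, (A ⊗ B)(u ⊗ v) = (Au) ⊗ (Bv) and the identity (2.3), can be checked directly with numpy's kron; the operators below are arbitrary examples whose dimensions are chosen so that the products are defined.

    import numpy as np

    rng = np.random.default_rng(9)
    A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))   # A in L(X1, X2)
    B = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))   # B in L(Y1, Y2)
    C = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))   # C with AC defined
    D = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))   # D with BD defined

    # Defining property on product vectors: (A (x) B)(x (x) y) = (A x) (x) (B y).
    x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    y = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    assert np.allclose(np.kron(A, B) @ np.kron(x, y), np.kron(A @ x, B @ y))

    # The identity (2.3): (A (x) B)(C (x) D) = (AC) (x) (BD).
    assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))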

2.2.4 Further properties of the tensor product


For the sake of convenience later on in the course, a few properties of tensor products are listed
here. The list is not necessarily complete, meaning that we might later need basic facts that are not
listed here, but the list should nevertheless give you some sense for the range of facts that hold.
1. Suppose X and Y are complex Euclidean spaces, u ∈ X ⊗ Y , and { xa : a ∈ Σ} is any linearly
independent set of vectors in X (indexed by some arbitrary finite, nonempty set Σ). Then there
is at most one choice of vectors {ya : a ∈ Σ} ⊂ Y such that
u= ∑ xa ⊗ ya . (2.5)
a∈Σ

If it is the case that { xa : a ∈ Σ} is a basis of X , then there is exactly one choice of vectors
{ya : a ∈ Σ} ⊂ Y such that (2.5) holds. The same fact holds when vectors are replaced by
operators or super-operators.
2. Let X and Y be complex Euclidean spaces, and let { xa : a ∈ Σ} ⊂ X and {yb : b ∈ Γ} ⊂ Y
be linearly independent sets. Then
{ xa ⊗ yb : ( a, b) ∈ Σ × Γ} (2.6)
is a linearly independent set of vectors in X ⊗ Y . If it is the case that { xa : a ∈ Σ} ⊂ X and
{yb : b ∈ Γ} ⊂ Y are bases of X and Y , then (2.6) is a basis of X ⊗ Y . Again, similar facts hold
when vectors are replaced by operators or super-operators.
3. The inner product on X ⊗ Y satisfies
h x1 ⊗ y1 , x2 ⊗ y2 i = h x1 , x2 i h y1 , y2 i
for all x1 , x2 ∈ X and y1 , y2 ∈ Y . Similarly, the inner product on L (X1 ⊗ X2 , Y1 ⊗ Y2 ) satisfies
h A1 ⊗ A2 , B1 ⊗ B2 i = h A1, B1 i h A2 , B2 i
for all A1 , B1 ∈ L (X1 , Y1 ) and A2 , B2 ∈ L (X2 , Y2 ).
4. Let A ∈ L (X ) and B ∈ L (Y ) for complex Euclidean spaces X and Y , and let n = dim(X ) and
m = dim(Y ). Assume that spec( A) = {λ1 , . . . , λn } and spec( B) = {µ1 , . . . , µm }. Then

spec( A ⊗ B) = { λi µj : 1 ≤ i ≤ n, 1 ≤ j ≤ m }.
This implies, for instance, that
Tr( A ⊗ B) = Tr( A) Tr( B)
and
det( A ⊗ B) = det( A)m det( B)n .

5. For any choice of A ∈ L (X1 , Y1 ) and B ∈ L (X2 , Y2 ), for complex Euclidean spaces X1 , X2 , Y1 ,
and Y2 , we have

( A ⊗ B)∗ = A∗ ⊗ B∗ ,   ( A ⊗ B )T = AT ⊗ BT ,   and   \overline{A ⊗ B} = \overline{A} ⊗ \overline{B}.

The first identity implies, for instance, that if A and B are both Hermitian, then so too is A ⊗ B.
(In fact, all of the special classes of operators discussed in Lecture 1 are preserved by taking
tensor products.)

Example 2.1 (The partial trace). Let X be a complex Euclidean space. We may of course view the
trace
Tr : L (X ) → C
as being a super-operator Tr ∈ T (X , C ) by making the identification C = L (C ). For a second
complex Euclidean space Y , we may therefore consider the super-operator

Tr ⊗1L(Y ) ∈ T (X ⊗ Y , Y ) .

This is the unique super-operator that satisfies

(Tr ⊗1L(Y ) )( A ⊗ B) = Tr( A) B

for all A ∈ L (X ) and B ∈ L (Y ). This super-operator is called the partial trace, and is more
commonly denoted TrX . In general, the subscript refers to the space to which the trace is
applied, while the space or spaces that remain (Y in the case above) are implicit from the context
in which the super-operator is used.
We may alternately express the partial trace on X as follows, assuming that { xa : a ∈ Σ} is
any orthonormal basis of X :

TrX ( A) = ∑ ( x∗a ⊗ 1Y ) A( xa ⊗ 1Y )
a∈Σ

for all A ∈ L (X ⊗ Y ). An analogous expression holds for TrY .
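The expression above translates directly into a computation. The following NumPy sketch (added here purely as an illustration, and not part of the development that follows) realizes TrX on L (X ⊗ Y ) using the standard basis of X ; the function name partial_trace_X is a label chosen only for this example.

    import numpy as np

    def partial_trace_X(A, dim_X, dim_Y):
        # Tr_X(A) = sum_a (x_a^* (x) 1_Y) A (x_a (x) 1_Y), with {x_a} the standard basis of X.
        B = np.zeros((dim_Y, dim_Y), dtype=complex)
        I_Y = np.eye(dim_Y)
        for a in range(dim_X):
            x_a = np.zeros((dim_X, 1))
            x_a[a, 0] = 1.0
            L = np.kron(x_a.conj().T, I_Y)          # the operator x_a^* (x) 1_Y
            B += L @ A @ L.conj().T
        return B

    # Sanity check on a product operator: Tr_X(A (x) B) = Tr(A) B.
    A0 = np.random.randn(3, 3) + 1j * np.random.randn(3, 3)
    B0 = np.random.randn(2, 2) + 1j * np.random.randn(2, 2)
    assert np.allclose(partial_trace_X(np.kron(A0, B0), 3, 2), np.trace(A0) * B0)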

2.3 The operator-vector correspondence

It will be enormously useful throughout this course to make use of a simple correspondence be-
tween the spaces L (X , Y ) and Y ⊗ X , for given complex Euclidean spaces X and Y .
We define the mapping
vec : L (X , Y ) → Y ⊗ X
to be the linear mapping that represents a change of bases from the standard basis of L (X , Y ) to
the standard basis of Y ⊗ X . Specifically, we define

vec( Eb,a ) = eb ⊗ ea

for all a ∈ Σ and b ∈ Γ, at which point the mapping is determined for every operator A ∈ L (X , Y )
by linearity. In the Dirac notation, this mapping amounts to flipping a bra to a ket:

vec(|bi h a|) = |bi |ai .



The vec mapping is a linear bijection, which implies that every vector u ∈ Y ⊗ X uniquely
determines an operator A ∈ L (X , Y ) that satisfies vec( A) = u. It is also an isometry, in the sense
that
h A, Bi = hvec( A), vec( B)i
for all A, B ∈ L (X , Y ).
The following properties of the vec mapping are easily verified:

1. For every choice of complex Euclidean spaces X1 , X2 , Y1 , and Y2 , and every choice of operators
A ∈ L (X1 , Y1 ), B ∈ L (X2 , Y2 ), and X ∈ L (X2 , X1 ), it holds that

( A ⊗ B) vec( X ) = vec( AXBT ). (2.7)

2. For every choice of complex Euclidean spaces X and Y , and every choice of operators A, B ∈
L (X , Y ), the following equations hold:

TrX (vec( A) vec( B)∗ ) = AB∗ , (2.8)
TrY (vec( A) vec( B)∗ ) = ( B∗ A)T . (2.9)

3. For u ∈ X and v ∈ Y we have


vec(uv∗ ) = u ⊗ \overline{v}. (2.10)
This includes the special cases vec(u) = u and vec(v∗ ) = \overline{v}, which we obtain by setting v = 1
and u = 1, respectively.
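The identities above are easy to test numerically. In the sketch below (an illustration only; the row-major reshaping convention is an assumption that matches vec( Eb,a ) = eb ⊗ ea for the standard bases), properties (2.7) and (2.8) are checked on randomly chosen operators.

    import numpy as np

    rng = np.random.default_rng(0)
    dX1, dX2, dY1, dY2 = 2, 3, 4, 2
    A = rng.standard_normal((dY1, dX1)) + 1j * rng.standard_normal((dY1, dX1))
    B = rng.standard_normal((dY2, dX2)) + 1j * rng.standard_normal((dY2, dX2))
    X = rng.standard_normal((dX1, dX2)) + 1j * rng.standard_normal((dX1, dX2))
    vec = lambda M: M.reshape(-1)               # row-major flattening realizes vec

    # Property (2.7): (A (x) B) vec(X) = vec(A X B^T).
    assert np.allclose(np.kron(A, B) @ vec(X), vec(A @ X @ B.T))

    # Property (2.8): Tr_X(vec(A) vec(C)^*) = A C^*  for A, C in L(X, Y).
    C = rng.standard_normal((dY1, dX1)) + 1j * rng.standard_normal((dY1, dX1))
    M = np.outer(vec(A), vec(C).conj()).reshape(dY1, dX1, dY1, dX1)
    assert np.allclose(np.einsum('iaja->ij', M), A @ C.conj().T)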

Example 2.2 (The Schmidt decomposition). Suppose u ∈ Y ⊗ X for given complex Euclidean
spaces X and Y . Let A ∈ L (X , Y ) be the unique operator for which u = vec( A). Then there
exists a singular value decomposition
A = ∑_{i=1}^{r} si yi xi∗

of A. Consequently

u = vec( A) = vec( ∑_{i=1}^{r} si yi xi∗ ) = ∑_{i=1}^{r} si vec( yi xi∗ ) = ∑_{i=1}^{r} si yi ⊗ \overline{xi} .

The fact that { x1 , . . . , xr } is orthonormal implies that { \overline{x1} , . . . , \overline{xr} } is orthonormal as well.


We have therefore established the validity of the Schmidt decomposition, which states that every
vector u ∈ Y ⊗ X can be expressed in the form
r
u= ∑ si yi ⊗ zi
i =1

for positive real numbers s1 , . . . , sr and orthonormal sets {y1 , . . . , yr } ⊂ Y and {z1 , . . . , zr } ⊂ X .
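Numerically, the argument above says that a Schmidt decomposition is nothing other than a singular value decomposition of a reshaped vector. The following sketch (an illustration under the same row-major vec convention as before; the function name schmidt is ours) computes one with NumPy.

    import numpy as np

    def schmidt(u, dY, dX):
        # Returns s_i > 0 and matrices whose columns are the orthonormal sets
        # {y_i} and {z_i} with u = sum_i s_i (y_i (x) z_i).
        A = u.reshape(dY, dX)                       # the operator with vec(A) = u
        Y, s, Xh = np.linalg.svd(A, full_matrices=False)
        keep = s > 1e-12
        # A = sum_i s_i y_i x_i^*, so u = sum_i s_i y_i (x) conj(x_i); the rows of
        # Xh are the x_i^*, hence conj(x_i) is simply the i-th row of Xh.
        return s[keep], Y[:, keep], Xh[keep, :].T

    rng = np.random.default_rng(1)
    dY, dX = 3, 4
    u = rng.standard_normal(dY * dX) + 1j * rng.standard_normal(dY * dX)
    s, Ys, Zs = schmidt(u, dY, dX)
    u_rec = sum(s[i] * np.kron(Ys[:, i], Zs[:, i]) for i in range(len(s)))
    assert np.allclose(u_rec, u)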

2.4 Norms of operators

The last topic for this lecture concerns norms of operators. As for all vector spaces, a norm on the
space of operators L (X , Y ), for any choice of complex Euclidean spaces X and Y , is a function
k·k satisfying the following properties:
1. Positive definiteness: k A k ≥ 0 for all A ∈ L (X , Y ), with k A k = 0 if and only if A = 0.
2. Positive scalability: k αA k = | α | k A k for all A ∈ L (X , Y ) and α ∈ C.
3. The triangle inequality: k A + B k ≤ k A k + k B k for all A, B ∈ L (X , Y ).
Many interesting and useful norms can be defined on spaces of operators, but in this course
we will mostly be concerned with a single family of norms called Schatten p-norms. This family
includes the three most commonly used norms in quantum information theory: the spectral norm,
the Frobenius norm, and the trace norm.
For any operator A ∈ L (X , Y ) and any real number p ≥ 1, one defines the Schatten p-norm of A as

‖ A ‖_p = [ Tr( ( A∗ A )^{p/2} ) ]^{1/p} .
We also define
k A k∞ = max {k Au k : u ∈ X , k u k = 1} , (2.11)
which happens to coincide with lim p→∞ k A k p , and therefore explains why the subscript ∞ is used.
For a given non-zero operator A ∈ L (X , Y ) with rank r ≥ 1, the vector s( A) of singular values
of A was defined in the previous lecture. For each real number p ∈ [1, ∞], it holds that the Schatten
p-norm of A coincides with the ordinary (vector) p-norm of s( A):
k A k p = k s( A)k p .
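This characterization gives a simple recipe for computing Schatten norms: compute the singular values and take an ordinary vector norm. The sketch below (an illustration only) does exactly that, and cross-checks the p = 2 and p = ∞ cases against standard matrix norms.

    import numpy as np

    def schatten_norm(A, p):
        # ||A||_p = ||s(A)||_p, the vector p-norm of the singular values of A.
        s = np.linalg.svd(A, compute_uv=False)
        return s.max() if np.isinf(p) else (s ** p).sum() ** (1.0 / p)

    rng = np.random.default_rng(2)
    A = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))

    assert np.isclose(schatten_norm(A, 2), np.linalg.norm(A))          # Frobenius norm
    assert np.isclose(schatten_norm(A, np.inf), np.linalg.norm(A, 2))  # spectral norm
    # The norms are non-increasing in p (property 1 below).
    vals = [schatten_norm(A, p) for p in [1, 1.5, 2, 4, np.inf]]
    assert all(vals[i] >= vals[i + 1] - 1e-12 for i in range(len(vals) - 1))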
The Schatten p-norms satisfy many nice properties, some of which are summarized in the
following list:
1. The Schatten p-norms are non-increasing in p. In other words, for any operator A ∈ L (X , Y )
and for 1 ≤ p ≤ q ≤ ∞ we have
k A k p ≥ k A kq .

2. For every p ∈ [1, ∞], the Schatten p-norm is isometrically invariant (and therefore unitarily
invariant). This means that
k A k p = k U AV ∗ k p
for any choice of linear isometries U and V (which include unitary operators U and V) for
which the product U AV ∗ makes sense.
3. For each p ∈ [1, ∞], one defines p∗ ∈ [1, ∞] by the equation
1/p + 1/p∗ = 1.
For every operator A ∈ L (X , Y ), it holds that
‖ A ‖_p = max { |⟨ B, A ⟩| : B ∈ L (X , Y ) , ‖ B ‖_{p∗} ≤ 1 } .
This implies that
|h B, Ai| ≤ k A k p k B k p∗ ,
which is known as the Hölder Inequality for Schatten norms.

4. For any choice of linear operators A ∈ L (X3 , X4 ), B ∈ L (X2 , X3 ), and C ∈ L (X1 , X2 ), and any
choice of p ∈ [1, ∞], we have

k ABC k p ≤ k A k∞ k B k p k C k∞ .

It follows that
k AB k p ≤ k A k p k B k p (2.12)
for all choices of p ∈ [1, ∞] and operators A and B for which the product AB exists. The
property (2.12) is known as submultiplicativity.
5. It holds that
‖ A ‖_p = ‖ A∗ ‖_p = ‖ AT ‖_p = ‖ \overline{A} ‖_p
for every A ∈ L (X , Y ).

2.4.1 The trace norm, Frobenius norm, and spectral norm


The Schatten 1-norm is more commonly called the trace norm, the Schatten 2-norm is also known
as the Frobenius norm, and the Schatten ∞-norm is called the spectral norm or operator norm. A
common notation for these norms is:

k·ktr = k·k1 , k·kF = k·k2 , and k·k = k·k∞ .

In this course we will generally write k·k rather than k·k∞ , but will not use the notation k·ktr
and k·kF .
To conclude this lecture, we will note just a few special properties of these three particular
norms:

1. The spectral norm. The spectral norm k·k = k·k∞ , also called the operator norm, is special in
several respects. It is the norm induced by the Euclidean norm, which is its defining property
(2.11). It satisfies the property
‖ A∗ A ‖ = ‖ A ‖²
for every A ∈ L (X , Y ).
2. The Frobenius norm. Substituting p = 2 into the definition of ‖·‖_p we see that the Frobenius
norm ‖·‖_2 is given by
‖ A ‖_2 = [ Tr( A∗ A ) ]^{1/2} = √⟨ A, A ⟩ .
It is therefore the norm defined by the Hilbert-Schmidt inner product on L (X , Y ). In essence,
it is the norm that one obtains by thinking of elements of L (X , Y ) as ordinary vectors and
forgetting that they are operators:
‖ A ‖_2 = ‖ vec( A) ‖ = √( ∑_{a,b} | A( a, b)|² ) ,
where a and b range over the indices of the matrix representation of A.



3. The trace norm. Substituting p = 1 into the definition of ‖·‖_p we see that the trace norm ‖·‖_1 is
given by
‖ A ‖_1 = Tr √( A∗ A ) .

A convenient expression of k A k1 , for any operator of the form A ∈ L (X ), is

k A k1 = max{|h A, U i| : U ∈ U (X )}.

Another convenient fact about the trace norm is that it is monotonic:

k TrY ( A)k1 ≤ k A k1

for all A ∈ L (X ⊗ Y ). This is because

k TrY ( A)k1 = max {|h A, U ⊗ 1 Y i| : U ∈ U (X )}

while
k A k1 = max {|h A, U i| : U ∈ U (X ⊗ Y )} ;
and the inequality follows from the fact that the first maximum is taken over a subset of the
unitary operators for the second.
Finally, let us note that for any complex Euclidean space X and any choice of unit vectors
u, v ∈ X , we have

‖ uu∗ − vv∗ ‖_1 = 2 √( 1 − |⟨u, v⟩|² ) . (2.13)

To see this, we note that the operator A = uu∗ − vv∗ is Hermitian and therefore normal, so
its singular values are the absolute values of its nonzero eigenvalues. It will therefore suffice
to prove that the eigenvalues of A are ± √( 1 − |⟨u, v⟩|² ), along with the eigenvalue 0 occurring
with multiplicity n − 2, where n = dim(X ). Given that Tr( A) = 0 and rank( A) ≤ 2, it is
evident that the eigenvalues of A are of the form ±λ for some λ ≥ 0, along with eigenvalue 0
with multiplicity n − 2. As
2λ² = Tr( A² ) = 2 − 2 |⟨u, v⟩|²
we conclude λ = √( 1 − |⟨u, v⟩|² ), from which Equation (2.13) follows.
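Equation (2.13) is also easy to confirm numerically, as in the following sketch (an illustration added here; the trace norm of the Hermitian operator uu∗ − vv∗ is computed as the sum of the absolute values of its eigenvalues).

    import numpy as np

    rng = np.random.default_rng(3)
    n = 5
    u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    u /= np.linalg.norm(u)
    v /= np.linalg.norm(v)

    A = np.outer(u, u.conj()) - np.outer(v, v.conj())
    trace_norm = np.abs(np.linalg.eigvalsh(A)).sum()      # A is Hermitian
    assert np.isclose(trace_norm, 2 * np.sqrt(1 - abs(np.vdot(u, v)) ** 2))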

Lecture 3
Registers, reduced states, and measurements

In this lecture we will begin to discuss the basic objects of quantum information, starting with
registers, states, and measurements.

3.1 Definitions of registers, reduced states, and measurements

3.1.1 Registers and register arrays


A quantum register, or just a register for short, is an abstract device in which quantum information
may be stored. Other common terms used in place of registers include qudits, particles, quantum
systems, and variations on these terms. Capital letters in sans serif font, such as X, Y, and Z,
sometimes with subscripts, will be used to refer to registers.
Speaking in formal terms, a register X is a pair (k, Σ), where k is a positive integer and Σ is a
finite, non-empty set. The number k is a label that identifies the register, while Σ is the classical state
set of the register. To a register having classical state set Σ, we associate the complex Euclidean
space C Σ . By default, the same letter will be used to refer to a given register and its associated
complex Euclidean space, using a scripted font for the space—for example, the complex Euclidean space
associated with a register X will typically be denoted X .
Usually we will just refer to registers by their names, such as X, Y, and Z, and not as pairs of
the form (k, Σ). We just need some definition that makes clear that multiple registers may share
the same classical state set, and yet be distinct registers.
An ordered collection of distinct registers is a register array. To be more precise, a register
array is an n-tuple (X1 , . . . , Xn ) of registers, each of which has a distinct label. With a register
array (X1 , . . . , Xn ) we associate the tensor product space X1 ⊗ · · · ⊗ Xn , which is equated with the
complex Euclidean space C Σ1 ×···×Σn under the assumption that Xi = (ki , Σi ) for i = 1, . . . , n.

3.1.2 Reduced states


Recall, from Lecture 1, that a density operator is any positive semidefinite operator having trace
equal to one, and that for a complex Euclidean space X we write

D (X ) = {ρ ∈ Pos (X ) : Tr(ρ) = 1} .

A reduced state of either a register or a register array is a density operator acting on the complex
Euclidean space associated with the register or register array. When we refer to a state of a given
register, it should be interpreted that we are referring to the register’s reduced state.
A register array (X1 , . . . , Xn ) is said to be in a product reduced state, or just a product state, if its
reduced state has the form ρ1 ⊗ · · · ⊗ ρn for ρ1 ∈ D (X1 ) , . . . , ρn ∈ D (Xn ).


3.1.3 Measurements
A measurement on a complex Euclidean space X is a function of the form

µ : Γ → Pos (X ) : a 7→ Pa , (3.1)

where Γ is a finite, nonempty set of measurement outcomes. This function must satisfy the constraint

∑ Pa = 1X .
a∈Γ

Often we express such a measurement in the form { Pa : a ∈ Γ}, with the understanding that it is a
collection indexed by Γ and not an unordered set of operators. Each operator Pa is the measurement
operator associated with the outcome a ∈ Γ.
When a measurement of the form (3.1) is applied to a register, or a register array, whose associ-
ated complex Euclidean space is X and whose reduced state is ρ ∈ D (X ), two things happen:
1. An element of Γ is selected at random, according to a distribution described by the probability
vector p ∈ R Γ defined by
p( a) = h Pa , ρi
for each a ∈ Γ.
2. The register, or register array, that was measured ceases to exist.
The assumption that the register, or register array, that is measured ceases to exist after the mea-
surement takes place creates no loss of generality. The notion of a nondestructive measurement,
which supposes that registers are not destroyed by being measured, and specifies their reduced
state after the measurements on them take place, can be formulated by combining measurements
as just defined with quantum operations (as we will discuss later).
A measurement of the form

µ : Γ1 × · · · × Γn → Pos (X1 ⊗ · · · ⊗ Xn ) ,

defined on a tensor product space X1 ⊗ · · · ⊗ Xn , is called a product measurement if there exist


measurements

µ1 : Γ1 → Pos (X1 ) ,
..
.
µn : Γn → Pos (Xn )

such that µ( a1 , . . . , an ) = µ1 ( a1 ) ⊗ · · · ⊗ µn ( an ) for all ( a1 , . . . , an ) ∈ Γ1 × · · · × Γn . We then write


µ = (µ1 , . . . , µn ) to denote that µ is a product measurement formed in this way.
Sometimes we will consider the situation in which a measurement is performed on some sub-
set of registers in a register array. For instance, we may have a pair of registers (X, Y ), and then
consider a measurement { Pa : a ∈ Γ} of just the register X. Supposing that the pair has a reduced
state ρ ∈ D (X ⊗ Y ), and a particular measurement outcome a ∈ Γ results from this measurement
on X, the question arises of what the reduced state of Y then becomes. The answer, which will be
justified shortly, is that the reduced state becomes
TrX [( Pa ⊗ 1 Y ) ρ]
.
h Pa ⊗ 1 Y , ρi

3.2 A few observations about registers, reduced states, and measurements

3.2.1 Information complete measurements


Let us observe the simple fact that reduced states of registers are uniquely determined by the
measurement statistics they generate. More precisely, if we know the probability associated with
every outcome of every measurement that could be performed on a given register, then we know
that register’s reduced state.
In fact, something stronger may be said—which is that for any choice of a complex Euclidean
space X , there are choices of measurements that uniquely determine every reduced state by
the measurement statistics that they alone generate. Such measurements are called information-
complete measurements, and are characterized by the property that their measurement operators
span the space L (X ).
Proposition 3.1. Let X be a complex Euclidean space, and let
µ : Γ → Pos (X ) : a 7→ Pa
be a measurement on X with the property that the collection { Pa : a ∈ Γ} spans all of L (X ). Then the
mapping ρ 7→ pρ , defined by pρ ( a) = h Pa , ρi for all a ∈ Γ, is one-to-one on D (X ).
Proof. Let ρ, ξ ∈ D (X ) and assume pρ = pξ . By this assumption, we have h Pa , ρ − ξ i = 0 for all
a ∈ Γ. Now, as { Pa : a ∈ Γ} spans L (X ), we may choose complex numbers {α a : a ∈ Γ} so that
ρ−ξ = ∑ αa Pa.
a∈Γ
The proposition now follows from
k ρ − ξ k22 = hρ − ξ, ρ − ξ i = ∑ αa h Pa , ρ − ξ i = 0,
a∈Γ
along with the positive definite property of the Frobenius norm.
(The function ρ 7→ pρ in fact extends to a one-to-one function from L (X ) to C Γ in the most straight-
forward way.)
The following example gives one method to construct an information-complete measurement
on any complex Euclidean space X .
Example 3.2. Let Σ be any finite and nonempty set, and assume that the elements of Σ are ordered
in some fixed way. For each pair ( a, b) ∈ Σ × Σ, define an operator Q a,b ∈ L (C Σ ) as follows:

Q a,b = Ea,a                          if a = b,
Q a,b = (ea + eb )(ea + eb )∗         if a < b,
Q a,b = (ea + i eb )(ea + i eb )∗     if a > b.

Each operator Q a,b is positive semidefinite, and the set {Q a,b : ( a, b) ∈ Σ × Σ} spans the space
L (X ). With the exception of the trivial case | Σ | = 1, the operator
Q= ∑ Q a,b
( a,b )∈Σ× Σ

differs from the identity operator, which means that {Q a,b : ( a, b) ∈ Σ × Σ} is not generally a
measurement. The operator Q is, however, positive definite, and by defining
Pa,b = Q−1/2 Q a,b Q−1/2
we have that { Pa,b : ( a, b) ∈ Σ × Σ} is an information-complete measurement.
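The construction of Example 3.2 can be carried out explicitly, as in the following sketch (an illustration only; the function name info_complete_measurement is ours, and Q^{−1/2} is formed from a spectral decomposition of Q).

    import numpy as np

    def info_complete_measurement(d):
        # Builds {P_(a,b)} on C^d following Example 3.2, with Σ = {0, 1, ..., d-1}.
        e = np.eye(d)
        Qs = {}
        for a in range(d):
            for b in range(d):
                if a == b:
                    w = e[a]
                elif a < b:
                    w = e[a] + e[b]
                else:
                    w = e[a] + 1j * e[b]
                Qs[(a, b)] = np.outer(w, w.conj())
        Q = sum(Qs.values())
        vals, vecs = np.linalg.eigh(Q)              # Q is positive definite
        Q_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.conj().T
        return {k: Q_inv_sqrt @ Qk @ Q_inv_sqrt for k, Qk in Qs.items()}

    P = info_complete_measurement(3)
    assert np.allclose(sum(P.values()), np.eye(3))     # the operators sum to the identity
    M = np.array([Pk.reshape(-1) for Pk in P.values()])
    assert np.linalg.matrix_rank(M) == 9               # they span L(C^3)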

3.2.2 Nothing is hidden from product measurements


Consider a register array (X1 , . . . , Xn ) for some arbitrary positive integer n. The reduced states of
this register array correspond to D (X1 ⊗ · · · ⊗ Xn ), and include reduced states with complicated
correlations among the n registers.
Among the measurements on X1 ⊗ · · · ⊗ Xn , the product measurements represent a restricted
special case. Nevertheless, we have that every density operator ρ ∈ D (X1 ⊗ · · · ⊗ Xn ) is uniquely
determined by the probability distributions it generates over all product measurements. This
follows from the existence of an information-complete product measurement
µ : Γ1 × · · · × Γn → Pos (X1 ⊗ · · · ⊗ Xn )
on X1 ⊗ · · · ⊗ Xn .  To obtain such a measurement, it is sufficient (as well as necessary) to take
µ j : Γ j → Pos X j to be an information-complete measurement for each choice of j = 1, . . . , n,
and to let µ = (µ1 , . . . , µn ).
We have just made a very simple observation, but it nevertheless says something interesting
about quantum information—which is that nothing about a register array’s reduced state is hid-
den from independent measurements on the registers that comprise the register array. Let us
take care, however, to note that we are speaking in statistical terms, for one can generally not
determine the output statistics of a measurement without repeating it many times and settling
for approximations. In a situation in which just one instance of a measurement of the register ar-
ray (X1 , . . . , Xn ) is made available, a product measurement might provide very little information
about the reduced state of the register array.

3.2.3 Reduced states and the partial trace


Let us again consider reduced states of register arrays, focusing for a moment on a pair of reg-
isters (X1 , X2 ) for simplicity. At a given instant, the pair has a reduced state represented by
ρ ∈ D (X1 ⊗ X2 ), and every such ρ could potentially be the reduced state of the pair.
Now, assuming the reduced state of the pair (X1 , X2 ) is ρ ∈ D (X1 ⊗ X2 ), there is only one
possible reduced state of the register X1 alone that is consistent with ρ, and likewise for the reduced
state of X2 . These reduced states are denoted ρX1 and ρX2 , and are given by
ρX1 = TrX2 (ρ) and ρX2 = TrX1 (ρ) ,
respectively.
One justification for the claim that ρX1 is unique, and is given by the partial trace as above,
follows from the assumption that the mapping ρ 7→ ρX1 must be linear. Now, for a product state
ρ1 ⊗ ρ2 ∈ D (X1 ⊗ X2 ), the reduced state of X1 is necessarily ρ1 , and the only linear mapping that
satisfies ρ1 ⊗ ρ2 7→ ρ1 for all ρ1 ∈ D (X1 ) and ρ2 ∈ D (X2 ) is the partial trace. Similar reasoning
holds for X2 , and for register arrays with more than two registers.
Another way to justify the claim is as follows. Let µ1 : Γ1 → Pos (X1 ) and µ2 : Γ2 → Pos (X2 )
be measurements on X1 and X2 , and consider the product measurement µ = (µ1 , µ2 ) on X1 ⊗ X2 .
The distribution of the outcomes of this measurement, when performed on ρ ∈ D (X1 ⊗ X2 ), are
given by the probability vector
p ( a1 , a2 ) = h µ 1 ( a1 ) ⊗ µ 2 ( a2 ) , ρ i ,
and so the marginal distribution for the outcomes of the measurement on X1 is given by
p1 ( a1 ) = ∑ p( a1 , a2 ) = hµ1 ( a1 ) ⊗ 1 X2 , ρi = hµ1 ( a1 ), TrX2 (ρ)i .
a2 ∈ Γ 2

As this holds for all measurements on X1 , we have that the reduced state of X1 must be given
by TrX2 (ρ).
One may extend this second line of reasoning in the following way. Suppose a1 ∈ Γ1 satisfies
p1 ( a1 ) ≠ 0, and consider the distribution of outcomes on Γ2 that is obtained by conditioning on
the outcome of the measurement µ1 being a1 . This distribution is given by

p2^{(a1)}( a2 ) = p( a1 , a2 ) / p1 ( a1 ) = ⟨ Pa1 ⊗ Pa2 , ρ ⟩ / ⟨ Pa1 ⊗ 1 X2 , ρ ⟩ = ⟨ Pa2 , ρ2^{(a1)} ⟩

for

ρ2^{(a1)} = TrX1 [ ( Pa1 ⊗ 1 X2 ) ρ ] / ⟨ Pa1 ⊗ 1 X2 , ρ ⟩ ,    (3.2)
and where we have denoted Pa1 = µ1 ( a1 ) and Pa2 = µ2 ( a2 ) for every a1 ∈ Γ1 and a2 ∈ Γ2 . If the
measurement µ1 is performed on X1 , and the outcome a1 ∈ Γ1 is obtained, we therefore have that
the reduced state of X2 conditioned on this outcome must be given by (3.2).

3.3 Observable differences between reduced states

3.3.1 The 1-norm distance between probability vectors


A natural way to measure the distance between probability vectors p, q ∈ R Γ is by the 1-norm:

k p − q k1 = ∑ | p(a) − q(a)| .
a∈Γ

It is easily verified that

‖ p − q ‖_1 = 2 max_{∆⊆Γ} ( ∑_{a∈∆} p( a) − ∑_{a∈∆} q( a) ) .

This is a natural measure of distance because it quantifies the optimal probability that two known
probability vectors can be distinguished, given a single sample from the distributions they specify.
As an example, let us consider a thought experiment involving two hypothetical people: Alice
and Bob. Two probability vectors p0 , p1 ∈ R Γ are fixed, and are considered to be known by both
Alice and Bob. Alice privately chooses a random bit b ∈ {0, 1}, uniformly at random, and uses
the value b to randomly choose an element a ∈ Γ: if b = 0, she samples a according to p0 , and if
b = 1, she samples a according to p1 . The sampled element a ∈ Γ is given to Bob, whose goal is to
identify the value of Alice’s random bit b.
It is clear from Bayes’ Theorem what Bob should do to maximize his probability to correctly
guess the value of b: if p0 ( a) > p1 ( a), he should guess that b = 0, while if p0 ( a) < p1 ( a) he should
guess that b = 1. In case p0 ( a) = p1 ( a), Bob may as well guess that b = 0 or b = 1 arbitrarily, for
he has learned nothing at all about the value of b from such an element a ∈ Γ. Bob’s probability to
correctly identify the value of b using this strategy is

1/2 + (1/4) ‖ p0 − p1 ‖_1 ,

which can be verified by first calculating the probability Bob is correct minus the probability he is
incorrect, which is (1/2) ‖ p0 − p1 ‖_1 . Bob can do no better than this.

3.3.2 Relationship to the trace norm distance


We will now establish a simple connection between the 1-norm distance between probability vec-
tors and the trace-norm distance between density operators. Let us begin by proving the following
simple proposition.

Proposition 3.3. Let X be a complex Euclidean space, let A ∈ Herm (X ) be a Hermitian operator, and
let { Pa : a ∈ Γ} be a measurement on X . Define v ∈ R Γ as v( a) = h Pa , Ai for each a ∈ Γ. Then
k v k 1 ≤ k A k1 .
Proof. We have

‖ v ‖_1 = ∑_{a∈Γ} |⟨ Pa , A ⟩| = ⟨ ∑_{a∈Γ} αa Pa , A ⟩    (3.3)

for some choice of scalars αa ∈ {−1, +1}. For every unit vector u ∈ X it holds that

u∗ ( ∑_{a∈Γ} αa Pa ) u = ∑_{a∈Γ} αa ⟨ Pa , uu∗ ⟩

is a convex combination of the values {−1, +1}, and therefore lies in the interval [−1, 1]. As
∑_{a∈Γ} αa Pa is Hermitian, it follows that ‖ ∑_{a∈Γ} αa Pa ‖ ≤ 1, and therefore

⟨ ∑_{a∈Γ} αa Pa , A ⟩ ≤ ‖ ∑_{a∈Γ} αa Pa ‖ ‖ A ‖_1 ≤ ‖ A ‖_1 .

Combined with (3.3), this establishes the claimed fact.

This proposition is easily applied to a situation in which two density operators ρ0 , ρ1 ∈ D (X ),


and a measurement { Pa : a ∈ Γ} on X are given, and p0 , p1 ∈ R Γ are defined to be the probability
vectors that describe the outcome distributions of this measurement when it is performed on ρ0
and ρ1 , respectively: p0 ( a) = h Pa , ρ0 i and p1 ( a) = h Pa , ρ1 i for every a ∈ Γ. By substituting
v = p0 − p1 and A = ρ0 − ρ1 into the proposition, we obtain that k p0 − p1 k1 is at most k ρ0 − ρ1 k1 .
The question remains whether equality k p0 − p1 k1 = k ρ0 − ρ1 k1 can be achieved for some
choice of a measurement { Pa : a ∈ Γ}. Indeed it can, by a two-outcome measurement, defined as
follows:

1. Consider a spectral decomposition


n
ρ0 − ρ1 = ∑ λ j u j u∗j .
j =1

2. Define
P0 = ∑ u j u∗j and P1 = ∑ u j u∗j .
j:λ j ≥0 j:λ j <0

(The choice to include the terms u j u∗j for which λ j = 0 in P0 is arbitrary—they could be in-
cluded in P1 or split between P0 and P1 arbitrarily without changing the optimality of the
resulting measurement.)

It holds that { P0 , P1 } is a measurement on X . Defining p0 , p1 ∈ R {0,1} as p0 ( a) = ⟨ Pa , ρ0 ⟩ and
p1 ( a) = ⟨ Pa , ρ1 ⟩ for a ∈ {0, 1}, we have that

‖ p0 − p1 ‖_1 = |⟨ P0 , ρ0 − ρ1 ⟩| + |⟨ P1 , ρ0 − ρ1 ⟩| = ∑_{j=1}^{n} |λj| = ‖ ρ0 − ρ1 ‖_1

as required.
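The construction just described is easily implemented. The following sketch (an illustration only; random density operators are generated just for the test) builds the projective measurement from a spectral decomposition of ρ0 − ρ1 and confirms numerically that ‖ p0 − p1 ‖_1 = ‖ ρ0 − ρ1 ‖_1 .

    import numpy as np

    def optimal_pair_measurement(rho0, rho1):
        # Projective measurement {P0, P1} from a spectral decomposition of rho0 - rho1.
        vals, vecs = np.linalg.eigh(rho0 - rho1)
        P0 = sum(np.outer(vecs[:, j], vecs[:, j].conj())
                 for j in range(len(vals)) if vals[j] >= 0)
        return P0, np.eye(rho0.shape[0]) - P0

    def random_density(d, rng):
        A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
        rho = A @ A.conj().T
        return rho / np.trace(rho).real

    rng = np.random.default_rng(4)
    rho0, rho1 = random_density(4, rng), random_density(4, rng)
    P0, P1 = optimal_pair_measurement(rho0, rho1)
    p_diff = abs(np.trace(P0 @ (rho0 - rho1))) + abs(np.trace(P1 @ (rho0 - rho1)))
    assert np.isclose(p_diff, np.abs(np.linalg.eigvalsh(rho0 - rho1)).sum())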
To illustrate the connection we have just established, let us consider a thought experiment
similar to the one above, except with density operators ρ0 , ρ1 ∈ D (X ) in place of the probability
vectors p0 and p1 . Alice chooses a random bit b uniformly; and if b = 0, she prepares a register X in
the reduced state ρ0 ∈ D (X ) and hands it to Bob, and if b = 1, she does likewise with ρ1 ∈ D (X ).
To make a guess for Alice’s random bit b, Bob performs a measurement of X, to extract classical
information from it, to help him make a decision.
Regardless of what measurement Bob performs, he will obtain measurement outcomes, dis-
tributed according to p0 and p1 for the two cases, for which k p0 − p1 k1 ≤ k ρ0 − ρ1 k1 . His proba-
bility to correctly identify Alice’s random bit will therefore be at most
1/2 + (1/4) ‖ ρ0 − ρ1 ‖_1 .    (3.4)
By choosing the particular measurement { P0 , P1 } defined above, and letting the outcome of the
measurement be his guess for b, we find that Bob’s probability of correctness achieves the value
(3.4), and is therefore optimal.

3.4 Distinguishing convex sets of density operators

The observations made above concerning the optimal probability with which two density opera-
tors can be distinguished can be generalized to arbitrary convex sets of density operators, as the
following theorem establishes.
Theorem 3.4. Let A0 , A1 ⊆ D (X ) be nonempty convex sets of density operators. Then there exists a
measurement { P0 , P1 } ⊂ Pos (X ) with the property that
⟨ P0 − P1 , ρ0 − ρ1 ⟩ ≥ dist(A0 , A1 )
for all choices of ρ0 ∈ A0 and ρ1 ∈ A1 .
In the statement of the theorem, we have used the notation

dist(A0 , A1 ) = inf {k ρ0 − ρ1 k1 : ρ0 ∈ A0 , ρ1 ∈ A1 } .

What is important about the theorem is that the choice of a measurement { P0 , P1 } comes first, after
which the density operators ρ0 ∈ A0 and ρ1 ∈ A1 may be chosen arbitrarily.
To prove the theorem, we require a fundamental fact from convex analysis. It is one form of a
general type of fact that states that disjoint convex sets can be separated by a hyperplane.
Fact. Let A, B ⊆ Herm (X ) be nonempty, disjoint, convex sets such that B is open. Then there
exists a Hermitian operator H ∈ Herm (X ) and a real number α ∈ R such that

h H, Ai ≥ α > h H, Bi
for all A ∈ A and B ∈ B .

Proof of Theorem 3.4. Let d = dist(A0 , A1 ). If d = 0, the theorem is trivially satisfied by the mea-
surement defined by P0 = P1 = (1/2)·1. We will therefore consider just the case d > 0 for the remainder
of the proof.
Define
A = A 0 − A 1 = { ρ0 − ρ1 : ρ0 ∈ A 0 , ρ1 ∈ A 1 },
and let
B = { B ∈ Herm (X ) : ‖ B ‖_1 < d }.
The sets A and B are nonempty, disjoint sets of Hermitian operators, both are convex, and B is
open. Using the fact from above, we have that there exists an operator H ∈ Herm (X ) and a scalar
α ∈ R such that
h H, Ai ≥ α > h H, Bi
for all A ∈ A and B ∈ B .
For every choice of B ∈ B we have that − B ∈ B as well, and therefore |h H, Bi| < α for all
B ∈ B . This implies first that α > 0, and second (using the fact that the trace norm and operator
norm are dual to one another) that k H k ≤ α/d.
Now let K = (d/α) H, which is a Hermitian operator with ‖ K ‖ ≤ 1. Using a spectral decomposition
of K, we may write
K = K0 − K1
for K0 , K1 ∈ Pos (X ) satisfying K0 K1 = 0. (Sometimes this is called a Jordan decomposition of K. It
is different from and should not be confused with the Jordan Normal Form.) Given that k K k ≤ 1,
we have that K0 + K1 ≤ 1.
Finally, define

P0 = K0 + (1/2)(1 − K0 − K1 )   and   P1 = K1 + (1/2)(1 − K0 − K1 ).
We have that { P0 , P1 } is a measurement on X , and it remains to verify that it has the properties
required to satisfy the theorem: for every choice of ρ0 ∈ A0 and ρ1 ∈ A1 we have

⟨ P0 − P1 , ρ0 − ρ1 ⟩ = ⟨ K, ρ0 − ρ1 ⟩ = (d/α) ⟨ H, ρ0 − ρ1 ⟩ ≥ d
as required.

We may apply this theorem to another variant of the thought experiment from above. As
before, Alice chooses b ∈ {0, 1} uniformly, but now is permitted to prepare a register X in any
reduced state ρ0 ∈ A0 in case b = 0 and any reduced state ρ1 ∈ A1 in case b = 1. Bob knows
A0 and A1 , but does not know which state ρ0 or ρ1 was selected by Alice. Nevertheless, he may
perform a measurement { P0 , P1 } which correctly determines the value of b with probability at
least
1/2 + (1/4) dist(A0 , A1 ),
even if Alice selected ρ0 or ρ1 with knowledge of { P0 , P1 } and in an attempt to minimize Bob’s
probability of correctness.

Lecture 4
Pure states, purifications, and fidelity

4.1 Pure states and purifications

4.1.1 Definition of pure states


For a given complex Euclidean space X , a density operator ρ ∈ D (X ) is said to be pure if it has
rank equal to one, which is equivalent to it taking the form ρ = uu∗ for a unit vector u ∈ X . These
are precisely the extreme points of the set D (X ). As expected, a register X is said to be in a pure
state if its reduced state is represented by a pure density operator.
As is customary, when the reduced state of a given register is pure, we often represent that
state by a vector u rather than as a rank one positive semidefinite operator uu∗ . There is of course
a slight ambiguity in doing this, which is that any two vectors u and eiθ u that differ by a global
phase factor give rise to the same operator uu∗ .
There are many situations of interest where the properties of pure density operators are much
more easily understood than general density operators. We will see later, for example, that bipar-
tite entanglement is much simpler for pure states than it is for general reduced states.

4.1.2 Definition of purifications


Now suppose that X and Y are complex Euclidean spaces, and ρ ∈ D (X ) is a density operator.
A purification of ρ in X ⊗ Y is any pure density operator uu∗ ∈ Pos (X ⊗ Y ) for which
TrY (uu∗ ) = ρ.
As for pure states in general, we often refer to a given vector u, rather than the operator uu∗ , as
being a purification of a given density operator ρ.
More generally, we may view that a purification of ρ ∈ D (X ) is a pure density operator on any
tensor product space that includes X , and that leaves ρ when every space aside from X is traced
out. For instance, uu∗ ∈ Pos (Y1 ⊗ X ⊗ Y2 ) is a purification of ρ if it holds that TrY1 ⊗Y2 (uu∗ ) = ρ.
If the space X itself is a tensor product of other spaces, then a purification could also be a pure
density operator on a larger space with tensor factors interleaved among the factors of X . Up
to a trivial reordering of the factors in a tensor product space, however, there is nothing different
about such cases versus the simple one discussed above, where a purification takes the form uu∗ ∈
Pos (X ⊗ Y ). No generality will be lost in restricting our attention to this case.
It is convenient to extend the notion of a purification to positive semidefinite operators whose
trace may not be one. For an arbitrary positive semidefinite operator P ∈ Pos (X ), we say that a
rank one positive semidefinite operator uu∗ ∈ Pos (X ⊗ Y ), or a vector u ∈ X ⊗ Y , is a purifica-
tion of P if the equation
TrY (uu∗ ) = P
holds. Any such purification must satisfy the condition ‖ u ‖² = Tr( P), which generalizes the fact
that u must be a unit vector in case P is a density operator.


4.1.3 Properties of purifications


The following lemma will be helpful for proving various facts about purifications.

Lemma 4.1. Let X and Y be complex Euclidean spaces and let P ∈ Pos (X ) be a non-zero positive
semidefinite operator. Then the following facts hold.

1. There exists an operator A ∈ L (Y , X ) such that AA∗ = P if and only if dim(Y ) ≥ rank( P).
2. For every operator A ∈ L (Y , X ) that satisfies AA∗ = P, there exists an operator B ∈ L (Y , X ) such
that BB∗ = Π_{im( P)} and A = √P B.
3. For any choice of operators A1 , A2 ∈ L (Y , X ) satisfying A1 A1∗ = A2 A2∗ = P, there exists a unitary
operator U ∈ L (Y ) such that A1 U = A2 .

Proof. Using the Spectral Theorem, we may write


r
P= ∑ λi xi xi∗ , (4.1)
i =1

where r = rank( P), λ1 , . . . , λr > 0, and { x1 , . . . , xr } ⊂ X is an orthonormal set. Given this


expression of P, it follows that
r
Πim( P) = ∑ xi xi∗ .
i =1

Now let us prove the first item. If it holds that dim(Y ) ≥ rank( P) = r, then there exists an
orthonormal set {y1 , . . . , yr } in Y . Defining
A = ∑_{i=1}^{r} √λi xi yi∗ ∈ L (Y , X ) ,

we have that AA∗ = P. Conversely, if A ∈ L (Y , X ) satisfies AA∗ = P, then

dim(Y ) ≥ rank( A) = rank( AA∗ ) = rank( P).

We have therefore proved that the first item holds.


Next, let us prove the second item. Given an operator A ∈ L (Y , X ) satisfying AA∗ = P, and
given the above form (4.1) of P, we have that the singular values of A are √λ1 , . . . , √λr ; and
moreover we may take { x1 , . . . , xr } to be an orthonormal set of left singular vectors of A. That is,
there must exist an orthonormal set {y1 , . . . , yr } ⊂ Y for which

A = ∑_{i=1}^{r} √λi xi yi∗

is a singular value decomposition of A. Defining B ∈ L (Y , X ) as

B = ∑_{i=1}^{r} xi yi∗ ,

we have BB∗ = Π_{im( P)} and A = √P B as required.

Finally, let us prove the last item. As before, we have that there must exist orthonormal sets
{y1 , . . . , yr } and {z1 , . . . , zr } in Y for which

A1 = ∑_{i=1}^{r} √λi xi yi∗    and    A2 = ∑_{i=1}^{r} √λi xi zi∗

are singular value decompositions of A1 and A2 . These orthonormal sets may be extended to
orthonormal bases {y1 , . . . , yn } and {z1 , . . . , zn } of Y , where n = dim(Y ). It follows that

U = ∑_{i=1}^{n} yi zi∗ ∈ U (Y )

is a unitary operator that satisfies A1 U = A2 .

Now, with the above lemma in hand, it will be straightforward to prove the following theorems
about purifications.
Theorem 4.2. Let X and Y be complex Euclidean spaces and let P ∈ Pos (X ) be a positive semidefinite
operator. Then a purification of P exists in X ⊗ Y if and only if dim(Y ) ≥ rank( P).
Proof. The assumption dim(Y ) ≥ rank( P) implies, by item 1 of Lemma 4.1, that there exists an
operator A ∈ L (Y , X ) such that AA∗ = P. The vector vec( A) ∈ X ⊗ Y satisfies

TrY (vec( A) vec( A)∗ ) = AA∗ = P,

and is therefore a purification of P.


Conversely, assume that u ∈ X ⊗ Y is a purification of P, and let A ∈ L (Y , X ) be the unique
operator that satisfies vec( A) = u. Then

P = TrY (vec( A) vec( A)∗ ) = AA∗

and therefore dim(Y ) ≥ rank( P) follows from item 1 of Lemma 4.1.

Corollary 4.3. A purification of every P ∈ Pos (X ) exists in X ⊗ Y provided dim(Y ) ≥ dim(X ).
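Concretely, taking Y to be a copy of X and A = √P in the argument above yields the purification u = vec( √P ). The sketch below (an illustration only, using the row-major vec convention from Lecture 2) builds this purification and checks that tracing out Y recovers P.

    import numpy as np

    def purify(P):
        # u = vec(sqrt(P)) in X (x) Y with Y a copy of X; Tr_Y(u u^*) = sqrt(P) sqrt(P)^* = P.
        vals, vecs = np.linalg.eigh(P)
        sqrtP = vecs @ np.diag(np.clip(vals, 0, None) ** 0.5) @ vecs.conj().T
        return sqrtP.reshape(-1)

    rng = np.random.default_rng(5)
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    P = A @ A.conj().T
    u = purify(P)
    M = np.outer(u, u.conj()).reshape(3, 3, 3, 3)
    assert np.allclose(np.einsum('iaja->ij', M), P)    # tracing out the second factor gives P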


A purification of a given operator P ∈ Pos (X ) in X ⊗ Y is never unique unless Y = C. For
instance, if u ∈ X ⊗ Y purifies P then so does any vector given by (1 X ⊗ U )u for U ∈ U (Y ), as

TrY [(1 ⊗ U )uu∗ (1 ⊗ U ∗ )] = TrY (uu∗ ) .

The following theorem demonstrates that for any choice of u, these are the only possible purifica-
tions of P in X ⊗ Y .
Theorem 4.4 (Unitary equivalence of purifications). Suppose X and Y are complex Euclidean spaces
and P ∈ Pos (X ) is a positive semidefinite operator for which rank( P) ≤ dim(Y ). Then for any choice of
purifications u1 , u2 ∈ X ⊗ Y of P there exists a unitary operator V ∈ U (Y ) such that (1 ⊗ V )u1 = u2 .
Proof. Let A1 , A2 ∈ L (Y , X ) be the unique operators satisfying u1 = vec( A1 ) and u2 = vec( A2 ).
The assumptions of the theorem imply that A1 A1∗ = A2 A2∗ . It follows from item 3 of Lemma 4.1
that there exists a unitary operator U ∈ U (Y ) such that A1 U ∗ = A2 . Thus,

u2 = vec( A2 ) = vec( A1 U ∗ ) = (1 X ⊗ U ) vec( A1 ) = (1 X ⊗ U )u1 .

Letting V = U completes the proof.



4.2 Fidelity

There are several ways that one may quantify the similarity or difference between density oper-
ators. One way that relates closely to the notion of purifications is known as the fidelity. This
will turn out to be an extremely useful notion throughout the course, and you will see it used
extensively in the theory of quantum information.

4.2.1 Definition of the fidelity function


Given positive semidefinite operators P, Q ∈ Pos (X ), we define the fidelity between P and Q as

F( P, Q) = ‖ √P √Q ‖_1 .

Equivalently,

F( P, Q) = Tr √( √P Q √P ) .
Similar to purifications, it is common to see the fidelity defined only for density operators as
opposed to arbitrary positive semidefinite operators. It is, however, useful to extend the definition
to all positive semidefinite operators as we have done.
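For concreteness, the fidelity can be computed numerically from either formula, as in the sketch below (an illustration only; the matrix square root is formed from a spectral decomposition, and the helper names are ours). The final check anticipates the fact, noted in the next subsection, that F (uu∗ , vv∗ ) = |⟨u, v⟩| for unit vectors u and v.

    import numpy as np

    def sqrtm_psd(P):
        # Square root of a positive semidefinite operator via its spectral decomposition.
        vals, vecs = np.linalg.eigh(P)
        return vecs @ np.diag(np.clip(vals, 0, None) ** 0.5) @ vecs.conj().T

    def fidelity(P, Q):
        # F(P, Q) = || sqrt(P) sqrt(Q) ||_1, the sum of singular values of sqrt(P) sqrt(Q).
        return np.linalg.svd(sqrtm_psd(P) @ sqrtm_psd(Q), compute_uv=False).sum()

    rng = np.random.default_rng(6)
    u = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    u /= np.linalg.norm(u)
    v /= np.linalg.norm(v)
    assert np.isclose(fidelity(np.outer(u, u.conj()), np.outer(v, v.conj())),
                      abs(np.vdot(u, v)))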

4.2.2 Basic properties of the fidelity


There are many interesting properties of the fidelity function. Let us begin with a few simple ones.
First, the fidelity is symmetric: F( P, Q) = F( Q, P) for all P, Q ∈ Pos (X ). This is clear from the
definition, given that k A k1 = k A∗ k1 for all operators A.
Next let us note the special case represented by the following example.
Example 4.5. Let u ∈ X be a unit vector and let Q ∈ Pos (X ) be any positive semidefinite operator.
It follows from the observation that √( λ uu∗ ) = √λ · uu∗ for all λ ≥ 0 that

F (uu∗ , Q) = √( u∗ Q u ) = ‖ √Q u ‖ .

In particular, F (uu∗ , vv∗ ) = |hu, vi| for any choice of unit vectors u, v ∈ X .
One nice property of the fidelity that we will utilize several times is that it is multiplicative
with respect to tensor products. This fact is stated in the following proposition.
Proposition 4.6. Let P1 , Q1 ∈ Pos (X1 ) and P2 , Q2 ∈ Pos (X2 ) be positive semidefinite operators. Then
F( P1 ⊗ P2 , Q1 ⊗ Q2 ) = F( P1 , Q1 ) F( P2 , Q2 ).
Proof. We have

F( P1 ⊗ P2 , Q1 ⊗ Q2 ) = ‖ √(P1 ⊗ P2) √(Q1 ⊗ Q2) ‖_1
= ‖ ( √P1 ⊗ √P2 )( √Q1 ⊗ √Q2 ) ‖_1
= ‖ ( √P1 √Q1 ) ⊗ ( √P2 √Q2 ) ‖_1
= ‖ √P1 √Q1 ‖_1 ‖ √P2 √Q2 ‖_1
= F( P1 , Q1 ) F( P2 , Q2 )
as claimed.

4.2.3 Uhlmann’s Theorem


Next we will prove a fundamentally important theorem about the fidelity, known as Uhlmann’s
Theorem, which relates the fidelity to the notion of purifications.

Theorem 4.7 (Uhlmann’s Theorem). Let X and Y be complex Euclidean spaces, let P, Q ∈ Pos (X ) be
positive semidefinite operators, both having rank at most dim(Y ), and let u ∈ X ⊗ Y be any purification
of P. Then
F( P, Q) = max {|hu, vi| : v ∈ X ⊗ Y is a purification of Q} .

Proof. It follows from Lemma 4.1 that there exists an operator A ∈ L (X , Y ) such that A∗ A =
Π_{im( P)} and such that the purification u ∈ X ⊗ Y of P takes the form u = vec( √P A∗ ). Along sim-
ilar lines, there exists an operator B ∈ L (X , Y ) satisfying B∗ B = Π_{im( Q)} such that vec( √Q B∗ ) ∈
X ⊗ Y purifies Q; and moreover every purification of Q in X ⊗ Y takes the form

v = vec( √Q B∗ U∗ )

for some unitary operator U ∈ U (Y ). We therefore have that

max {|⟨u, v⟩| : v ∈ X ⊗ Y , TrY (vv∗ ) = Q}
= max { |⟨ √P A∗ , √Q B∗ U∗ ⟩| : U ∈ U (Y ) }
= max { |Tr( A √P √Q B∗ U∗ )| : U ∈ U (Y ) }
= ‖ A √P √Q B∗ ‖_1 .

Now, given that A∗ A and B∗ B are orthogonal projections, we have ‖ A ‖ ≤ 1 and ‖ B ‖ ≤ 1. Thus

‖ A √P √Q B∗ ‖_1 ≤ ‖ A ‖ ‖ √P √Q ‖_1 ‖ B∗ ‖ ≤ ‖ √P √Q ‖_1 = F( P, Q).

On the other hand, because A∗ A = Π_{im( P)} and B∗ B = Π_{im( Q)} , we have

F( P, Q) = ‖ √P √Q ‖_1 = ‖ A∗ A √P √Q B∗ B ‖_1 ≤ ‖ A∗ ‖ ‖ A √P √Q B∗ ‖_1 ‖ B ‖ ≤ ‖ A √P √Q B∗ ‖_1 .

Thus,

F( P, Q) = ‖ A √P √Q B∗ ‖_1 = max {|⟨u, v⟩| : v ∈ X ⊗ Y , TrY (vv∗ ) = Q}
as required.

Various properties of the fidelity follow from Uhlmann's Theorem. For example, it is clear
from the theorem that 0 ≤ F(ρ, ξ ) ≤ 1 for density operators ρ and ξ. Moreover F(ρ, ξ ) = 1 if and
only if ρ = ξ. It is also evident (from the definition) that F(ρ, ξ ) = 0 if and only if √ρ √ξ = 0,
which is equivalent to ρξ = 0 (i.e., to ρ and ξ having orthogonal images).
Another property of the fidelity that follows from Uhlmann’s Theorem is as follows.

Proposition 4.8. Let P1 , . . . , Pk , Q1 , . . . , Qk ∈ Pos (X ) be positive semidefinite operators. Then


F( ∑_{i=1}^{k} Pi , ∑_{i=1}^{k} Qi ) ≥ ∑_{i=1}^{k} F( Pi , Qi ) .

Proof. Let Y be a complex Euclidean space having dimension at least that of X , and choose vectors
u1 , . . . , uk , v1 , . . . , vk ∈ X ⊗ Y satisfying TrY (ui u∗i ) = Pi , TrY (vi v∗i ) = Qi , and hui , vi i = F( Pi , Qi )
for each i = 1, . . . , k. Such vectors exist by Uhlmann’s Theorem. Let Z be a complex Euclidean
space with dimension k and define u, v ∈ X ⊗ Y ⊗ Z as

k k
u= ∑ u i ⊗ ei and v= ∑ vi ⊗ ei .
i =1 i =1

Then
k k
TrY ⊗Z (uu∗ ) = ∑ Pi and TrY ⊗Z (vv∗ ) = ∑ Qi .
i =1 i =1

Thus, again using Uhlmann’s Theorem, we have


F( ∑_{i=1}^{k} Pi , ∑_{i=1}^{k} Qi ) ≥ |⟨u, v⟩| = ∑_{i=1}^{k} F( Pi , Qi )

as required.

It follows from this proposition that the fidelity function is concave in the first argument:

F(λρ1 + (1 − λ)ρ2 , ξ ) ≥ λ F(ρ1 , ξ ) + (1 − λ) F(ρ2 , ξ )

for all ρ1 , ρ2 , ξ ∈ D (X ) and λ ∈ [0, 1], and by symmetry it is concave in the second argument as
well. In fact, the fidelity is jointly concave:

F(λρ1 + (1 − λ)ρ2 , λξ 1 + (1 − λ)ξ 2 ) ≥ λ F(ρ1 , ξ 1 ) + (1 − λ) F(ρ2 , ξ 2 ).

for all ρ1 , ρ2 , ξ 1 , ξ 2 ∈ D (X ) and λ ∈ [0, 1].

4.2.4 Alberti’s Theorem


A different characterization of the fidelity function is given by Alberti’s Theorem, which is as
follows.

Theorem 4.9 (Alberti). Let X be a complex Euclidean space and let P, Q ∈ Pos (X ) be positive semidefi-
nite operators. Then

(F( P, Q))² = inf_{R∈Pd(X)} ⟨ R, P ⟩ ⟨ R^{−1} , Q ⟩ .

To prove this theorem, it is helpful to start first with the special case that P = Q, which is
represented by the following lemma.

Lemma 4.10. Let P ∈ Pos (X ). Then


inf_{R∈Pd(X)} ⟨ R, P ⟩ ⟨ R^{−1} , P ⟩ = (Tr( P))² .

Proof. It is clear that

inf_{R∈Pd(X)} ⟨ R, P ⟩ ⟨ R^{−1} , P ⟩ ≤ (Tr( P))² ,

given that R = 1 is positive definite. To establish the reverse inequality, it suffices to prove that

⟨ R, P ⟩ ⟨ R^{−1} , P ⟩ ≥ (Tr( P))²

for any choice of R ∈ Pd (X ). This will follow from the simple observation that, for any choice of
positive real numbers α and β, we have α² + β² ≥ 2αβ and therefore αβ^{−1} + βα^{−1} ≥ 2. With this
fact in mind, consider a spectral decomposition

R = ∑_{i=1}^{n} λi ui ui∗ .

We have

⟨ R, P ⟩ ⟨ R^{−1} , P ⟩ = ∑_{1≤i,j≤n} λi λj^{−1} ( ui∗ P ui )( uj∗ P uj )
= ∑_{1≤i≤n} ( ui∗ P ui )² + ∑_{1≤i<j≤n} ( λi λj^{−1} + λj λi^{−1} )( ui∗ P ui )( uj∗ P uj )
≥ ∑_{1≤i≤n} ( ui∗ P ui )² + 2 ∑_{1≤i<j≤n} ( ui∗ P ui )( uj∗ P uj )
= (Tr( P))²

as required.

Proof of Theorem 4.9. We will first prove the theorem for P and Q positive definite. Let us define
S ∈ Pd (X ) to be

S = ( √P Q √P )^{−1/4} ( √P R √P ) ( √P Q √P )^{−1/4} .

Notice that as R ranges over all positive definite operators, so too does S. We have

⟨ S, ( √P Q √P )^{1/2} ⟩ = ⟨ R, P ⟩ ,
⟨ S^{−1} , ( √P Q √P )^{1/2} ⟩ = ⟨ R^{−1} , Q ⟩ .

Therefore, by Lemma 4.10, we have

inf_{R∈Pd(X)} ⟨ R, P ⟩ ⟨ R^{−1} , Q ⟩ = inf_{S∈Pd(X)} ⟨ S, ( √P Q √P )^{1/2} ⟩ ⟨ S^{−1} , ( √P Q √P )^{1/2} ⟩
= ( Tr √( √P Q √P ) )²
= (F( P, Q))² .

To prove the general case, let us first note that, for any choice of R ∈ Pd (X ) and ε > 0, we have

⟨ R, P ⟩ ⟨ R^{−1} , Q ⟩ ≤ ⟨ R, P + ε1 ⟩ ⟨ R^{−1} , Q + ε1 ⟩ .

Thus,

inf_{R∈Pd(X)} ⟨ R, P ⟩ ⟨ R^{−1} , Q ⟩ ≤ (F( P + ε1, Q + ε1 ))²

for all ε > 0. As

lim_{ε→0+} F( P + ε1, Q + ε1 ) = F( P, Q)

we have

inf_{R∈Pd(X)} ⟨ R, P ⟩ ⟨ R^{−1} , Q ⟩ ≤ (F( P, Q))² .

On the other hand, for any choice of R ∈ Pd (X ) we have

⟨ R, P + ε1 ⟩ ⟨ R^{−1} , Q + ε1 ⟩ ≥ (F( P + ε1, Q + ε1 ))² ≥ (F( P, Q))²

for all ε > 0, and therefore

⟨ R, P ⟩ ⟨ R^{−1} , Q ⟩ ≥ (F( P, Q))² .

As this holds for all R ∈ Pd (X ) we have

inf_{R∈Pd(X)} ⟨ R, P ⟩ ⟨ R^{−1} , Q ⟩ ≥ (F( P, Q))² ,

which completes the proof.

4.2.5 Fuchs–van de Graaf Inequalities


We will now prove the Fuchs–van de Graaf Inequalities, which establish a close relationship be-
tween the trace norm of the difference between two density operators and their fidelity. We first
need the following lemma relating the trace norm and Frobenius norm.

Lemma 4.11. Let X be a complex Euclidean space and let P, Q ∈ Pos (X ) be positive semidefinite opera-
tors on X . Then

‖ P − Q ‖_1 ≥ ‖ √P − √Q ‖_2² .

Proof. Let

√P − √Q = ∑_{i=1}^{n} λi ui ui∗

be a spectral decomposition of √P − √Q. Given that √P − √Q is Hermitian, it follows that

∑_{i=1}^{n} |λi|² = ‖ √P − √Q ‖_2² .

Now, define

U = ∑_{i=1}^{n} sign(λi ) ui ui∗

where sign(λ) = 1 if λ ≥ 0 and sign(λ) = −1 if λ < 0, for every real number λ. It follows that

U ( √P − √Q ) = ( √P − √Q ) U = ∑_{i=1}^{n} |λi| ui ui∗ = | √P − √Q | .

Using the operator identity

A² − B² = (1/2) ( ( A − B)( A + B) + ( A + B)( A − B) ),

along with the fact that U is unitary, we have

‖ P − Q ‖_1 ≥ | Tr( ( P − Q) U ) |
= | (1/2) Tr( ( √P − √Q )( √P + √Q ) U ) + (1/2) Tr( ( √P + √Q )( √P − √Q ) U ) |
= Tr( | √P − √Q | ( √P + √Q ) ) .

Now, by the triangle inequality (for real numbers), we have that

ui∗ ( √P + √Q ) ui ≥ | ui∗ √P ui − ui∗ √Q ui | = |λi|

for every i = 1, . . . , n. Thus

Tr( | √P − √Q | ( √P + √Q ) ) = ∑_{i=1}^{n} |λi| ui∗ ( √P + √Q ) ui ≥ ∑_{i=1}^{n} |λi|² = ‖ √P − √Q ‖_2²

as required.

Theorem 4.12 (Fuchs–van de Graaf Inequalities). Let X be a complex Euclidean space and assume that
ρ, ξ ∈ D (X ) are density operators on X . Then
1 − (1/2) ‖ ρ − ξ ‖_1 ≤ F(ρ, ξ ) ≤ √( 1 − (1/4) ‖ ρ − ξ ‖_1² ) .
Proof. The operators ρ and ξ have unit trace, and therefore
‖ √ρ − √ξ ‖_2² = Tr( ( √ρ − √ξ )² ) = 2 − 2 Tr( √ρ √ξ ) ≥ 2 − 2 F(ρ, ξ ).

The first inequality therefore follows from Lemma 4.11.


To prove the second inequality, let Y be a complex Euclidean space with dim(Y ) = dim(X ),
and let u, v ∈ X ⊗ Y satisfy TrY (uu∗ ) = ρ, TrY (vv∗ ) = ξ, and F(ρ, ξ ) = |hu, vi|. Such vectors exist
as a consequence of Uhlmann’s Theorem. By the monotonicity of the trace norm we have
‖ ρ − ξ ‖_1 ≤ ‖ uu∗ − vv∗ ‖_1 = 2 √( 1 − |⟨u, v⟩|² ) = 2 √( 1 − F(ρ, ξ )² ) ,

and therefore

F(ρ, ξ ) ≤ √( 1 − (1/4) ‖ ρ − ξ ‖_1² )
as required.
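As a numerical sanity check (an illustration added here, computing the fidelity and trace norm in the same way as the earlier sketch in Section 4.2.1), the following code samples random density operators and verifies that both inequalities of Theorem 4.12 hold.

    import numpy as np

    def sqrtm_psd(P):
        vals, vecs = np.linalg.eigh(P)
        return vecs @ np.diag(np.clip(vals, 0, None) ** 0.5) @ vecs.conj().T

    def fidelity(rho, xi):
        return np.linalg.svd(sqrtm_psd(rho) @ sqrtm_psd(xi), compute_uv=False).sum()

    def random_density(d, rng):
        A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
        rho = A @ A.conj().T
        return rho / np.trace(rho).real

    rng = np.random.default_rng(7)
    for _ in range(100):
        rho, xi = random_density(3, rng), random_density(3, rng)
        tn = np.abs(np.linalg.eigvalsh(rho - xi)).sum()     # || rho - xi ||_1
        F = fidelity(rho, xi)
        assert 1 - tn / 2 <= F + 1e-9
        assert F <= np.sqrt(1 - tn ** 2 / 4) + 1e-9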

Lecture 5
Naimark’s Theorem; Characterization of quantum operations

Most of this lecture will be devoted to establishing characterizations of quantum operations, based
on different ways of representing super-operators. First, though, we will discuss one more impor-
tant fact about measurements, known as Naimark’s Theorem, that relates general measurements to
projective measurements.

5.1 Naimark’s Theorem

A measurement { Pa : a ∈ Γ} on a complex Euclidean space X is said to be a projective (or von Neu-


mann) measurement if it is the case that each measurement operator Pa is an orthogonal projection
operator on X . For such a measurement, it necessarily holds that Pa Pb = 0 for a ≠ b. When we
refer to a measurement with respect to some orthonormal basis { xa : a ∈ Γ} of X , it is meant that
the measurement is given by { Pa : a ∈ Γ}, where Pa = xa x∗a for each a ∈ Γ.
There is a sense in which no generality is lost in considering only projective measurements:
every (general) measurement on a given register X can be realized as a projective measurement on
a pair of registers (X, Y ), provided that Y is large enough and initialized to a known pure state.
This fact will be easily established once we have proved Naimark’s Theorem, which is as follows.

Theorem 5.1 (Naimark’s Theorem). Let X be a complex Euclidean space, let { Pa : a ∈ Γ} ⊂ Pos (X )
be a measurement, and let Y = C Γ . Then there exists a linear isometry A ∈ U (X , X ⊗ Y ) such that

Pa = A∗ (1 X ⊗ Ea,a ) A

for every a ∈ Γ.

Proof. Define A ∈ L (X , X ⊗ Y ) as

A = ∑_{a∈Γ} √Pa ⊗ ea .

We have that
A∗ A = ∑ Pa = 1X ,
a∈Γ

so A is a linear isometry, and A∗ (1 X ⊗ Ea,a ) A = Pa for each a ∈ Γ as required.
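The isometry in this proof is straightforward to write down explicitly. The sketch below (an illustration only; the three-outcome "trine" measurement on C², scaled to sum to the identity, is used just as a test case) builds A = ∑_{a} √Pa ⊗ ea and verifies both A∗A = 1 and Pa = A∗(1 ⊗ Ea,a )A.

    import numpy as np

    def naimark_isometry(P_list):
        # A = sum_a sqrt(P_a) (x) e_a, mapping X into X (x) Y with Y = C^Gamma.
        d, k = P_list[0].shape[0], len(P_list)
        A = np.zeros((d * k, d), dtype=complex)
        for a, P in enumerate(P_list):
            vals, vecs = np.linalg.eigh(P)
            sqrtP = vecs @ np.diag(np.clip(vals, 0, None) ** 0.5) @ vecs.conj().T
            e_a = np.zeros((k, 1))
            e_a[a, 0] = 1.0
            A += np.kron(sqrtP, e_a)
        return A

    vs = [np.array([1.0, 0.0]),
          np.array([-0.5, np.sqrt(3) / 2]),
          np.array([-0.5, -np.sqrt(3) / 2])]
    P_list = [(2.0 / 3.0) * np.outer(v, v) for v in vs]     # these sum to the identity on C^2
    A = naimark_isometry(P_list)
    assert np.allclose(A.conj().T @ A, np.eye(2))           # A is an isometry
    for a in range(3):
        E_aa = np.zeros((3, 3))
        E_aa[a, a] = 1.0
        assert np.allclose(A.conj().T @ np.kron(np.eye(2), E_aa) @ A, P_list[a])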

The theorem is also sometimes called Neumark’s Theorem. The two names refer to the same
individual, Mark Naimark, whose name has been transliterated in these two different ways.
It should be noted that the term Naimark’s Theorem typically refers to a more general theorem
than the one above, which only has this extremely simple proof for finite dimensional vector
spaces and measurements with a finite number of outcomes. Despite the theorem’s simplicity, as
stated above, it is important for understanding the relationship between general measurements
and projective measurements.


Now let us return to the discussion from above, where an arbitrary measurement

{ Pa : a ∈ Γ} ⊂ Pos (X )

on a register X is to be simulated by a projective measurement on a pair of registers (X, Y ). To


apply Naimark’s Theorem we take Y = C Γ , and by the theorem we conclude that there exists a
linear isometry A ∈ U (X , X ⊗ Y ) such that

Pa = A∗ (1 X ⊗ Ea,a ) A

for every a ∈ Γ. For an arbitrary, but fixed, choice of a unit vector u ∈ Y , we may choose a unitary
operator U ∈ U (X ⊗ Y ) for which
A = U (1 X ⊗ u ).
Consider the projective measurement {Q a : a ∈ Γ} ⊂ Pos (X ⊗ Y ) defined as

Q a = U ∗ (1 X ⊗ Ea,a )U

for each a ∈ Γ. If a density operator ρ ∈ D (X ) is given, and the registers (X, Y ) are prepared in
the reduced state ρ ⊗ uu∗ , then this projective measurement results in each outcome a ∈ Γ with
probability
h Q a , ρ ⊗ uu∗ i = h A∗ (1 X ⊗ Ea,a ) A, ρi = h Pa , ρi .
The probability for each measurement outcome is therefore in agreement with the original mea-
surement { Pa : a ∈ Γ}.

5.2 Representations of super-operators

Next, we will move on to a discussion of four different super-operator representations. These


representations each have uses, and conversions among them can give insight into the basic prop-
erties of super-operators. Throughout this section we assume that X = C Σ and Y = C Γ are given
complex Euclidean spaces.

5.2.1 The natural representation


For every super-operator Φ ∈ T (X , Y ), it is clear that the mapping vec( X ) 7→ vec(Φ( X )) is
linear, as it can be represented as a composition of linear mappings. To be precise, there must exist
a linear operator K (Φ) ∈ L (X ⊗ X , Y ⊗ Y ) that satisfies

K (Φ) vec( X ) = vec(Φ( X ))

for all X ∈ L (X ). We will call the operator K (Φ) the natural representation of Φ. (It is also some-
times called the linear representation of Φ.) A concrete expression for K (Φ) is as follows:

K( Φ) = ∑_{a,b∈Σ} ∑_{c,d∈Γ} ⟨ Ec,d , Φ( Ea,b ) ⟩ Ec,a ⊗ Ed,b . (5.1)

To see that the above equation (5.1) holds, we simply compute:

K(Φ) vec( Ea,b ) = vec( ∑_{c,d∈Γ} ⟨ Ec,d , Φ( Ea,b ) ⟩ Ec,d ) = vec(Φ( Ea,b )),

from which it follows that K (Φ) vec( X ) = vec(Φ( X )) for all X by linearity.
Notice that the mapping K : T (X , Y ) → L (X ⊗ X , Y ⊗ Y ) is itself linear:
K (αΦ + βΨ) = αK (Φ) + βK (Ψ)
for all choices of α, β ∈ C and Φ, Ψ ∈ T (X , Y ). In fact, K is a linear bijection, as the above
form (5.1) makes clear that for every operator A ∈ L (X ⊗ X , Y ⊗ Y ) there exists a unique choice
of Φ ∈ T (X , Y ) for which A = K (Φ).
Recall that the adjoint of a super-operator Φ ∈ T (X , Y ) is the unique super-operator Φ∗ ∈
T (Y , X ) for which
hY, Φ( X )i = hΦ∗ (Y ), X i
for all X ∈ L (X ) and Y ∈ L (Y ). The natural representation perfectly reflects the notion of
adjoints: we have K (Φ∗ ) = (K (Φ))∗ for all super-operators Φ ∈ T (X , Y ).
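The defining property K(Φ) vec( X ) = vec(Φ( X )) also gives a direct way to build K(Φ) column by column, as the following sketch illustrates (an illustration only; Φ here is an arbitrary map of the form X ↦ GXH∗ chosen just for the test, and vec is realized as row-major reshaping).

    import numpy as np

    def natural_rep(Phi, d_in, d_out):
        # Column (a, b) of K(Phi) is vec(Phi(E_{a,b})), since K(Phi) vec(E_{a,b}) = vec(Phi(E_{a,b})).
        K = np.zeros((d_out * d_out, d_in * d_in), dtype=complex)
        for a in range(d_in):
            for b in range(d_in):
                E = np.zeros((d_in, d_in))
                E[a, b] = 1.0
                K[:, a * d_in + b] = Phi(E).reshape(-1)
        return K

    rng = np.random.default_rng(8)
    d_in, d_out = 2, 3
    G = rng.standard_normal((d_out, d_in)) + 1j * rng.standard_normal((d_out, d_in))
    H = rng.standard_normal((d_out, d_in)) + 1j * rng.standard_normal((d_out, d_in))
    Phi = lambda X: G @ X @ H.conj().T

    K = natural_rep(Phi, d_in, d_out)
    X = rng.standard_normal((d_in, d_in)) + 1j * rng.standard_normal((d_in, d_in))
    assert np.allclose(K @ X.reshape(-1), Phi(X).reshape(-1))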

5.2.2 The Choi-Jamiołkowski representation


The natural representation of super-operators specifies a straightforward way that an operator can
be associated with a super-operator. There is a different and less straightforward way that such an
association can be made that turns out to be very important in understanding completely positive
super-operators in particular. Specifically, let us define a mapping J : T (X , Y ) → L (Y ⊗ X ) as
J ( Φ) = ∑ Φ( Ea,b ) ⊗ Ea,b
a,b ∈Σ

for each Φ ∈ T (X , Y ). Alternately we may write


J (Φ) = (Φ ⊗ 1L(X ) ) (vec(1 X ) vec(1 X )∗ ) .
The operator J (Φ) is called the Choi-Jamiołkowski representation of Φ.
As for the natural representation, it is evident from the definition that J is a linear bijection.
Another way to see this is to note that the action of the super-operator Φ can be recovered from
the operator J (Φ) by means of the equation
Φ( X ) = TrX [ J (Φ) ( 1 Y ⊗ X T )] .
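The definition of J(Φ) and the recovery formula above can likewise be checked directly, as in this sketch (an illustration only; Φ is again an arbitrary test map, and the partial trace over X is computed by reshaping).

    import numpy as np

    def choi(Phi, d_in, d_out):
        # J(Phi) = sum_{a,b} Phi(E_{a,b}) (x) E_{a,b}.
        J = np.zeros((d_out * d_in, d_out * d_in), dtype=complex)
        for a in range(d_in):
            for b in range(d_in):
                E = np.zeros((d_in, d_in))
                E[a, b] = 1.0
                J += np.kron(Phi(E), E)
        return J

    rng = np.random.default_rng(9)
    d_in, d_out = 2, 3
    G = rng.standard_normal((d_out, d_in)) + 1j * rng.standard_normal((d_out, d_in))
    Phi = lambda X: G @ X @ G.conj().T
    J = choi(Phi, d_in, d_out)

    # Recovery of the action of Phi:  Phi(X) = Tr_X[ J(Phi) (1_Y (x) X^T) ].
    X = rng.standard_normal((d_in, d_in)) + 1j * rng.standard_normal((d_in, d_in))
    M = (J @ np.kron(np.eye(d_out), X.T)).reshape(d_out, d_in, d_out, d_in)
    assert np.allclose(np.einsum('iaja->ij', M), Phi(X))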

5.2.3 Kraus representations


Suppose that Φ ∈ T (X , Y ) is a given super-operator, and A1 , . . . , Ak , B1 , . . . , Bk ∈ L (X , Y ) are
operators for which
k
Φ( X ) = ∑ Ai XBi∗ (5.2)
i =1
for all X ∈ L (X ). Then the expression (5.2) is said to be a Kraus representation of the super-
operator Φ. It will be established shortly that Kraus representations exist for all super-operators.
Unlike the natural representation and Choi-Jamiołkowski representation, Kraus representations
are never unique.
Under the assumption that Φ is given by the above equation (5.2), it holds that
k
Φ ∗ (Y ) = ∑ A∗i YBi .
i =1

The term Kraus representation is sometimes reserved for the case that A j = Bj for each j =
1, . . . , k. Such a representation exists, as we will see shortly, if and only if Φ is completely positive.

5.2.4 Stinespring representations


Finally, suppose that Φ ∈ T (X , Y ) is a given super-operator, Z is a complex Euclidean space, and
A, B ∈ L (X , Y ⊗ Z) are operators such that

Φ( X ) = TrZ ( AXB∗ ) (5.3)

for all X ∈ L (X ). Then the expression (5.3) is said to be a Stinespring representation of Φ. Similar
to the operator-sum representation, a Stinespring representation always exists for a given Φ and
is never unique.
If Φ ∈ T (X , Y ) is given by the above equation (5.3), then

Φ∗ (Y ) = A∗ (Y ⊗ 1 Z ) B.

(Expressions of this form are also sometimes referred to as Stinespring representations.)


Similar to Kraus representations, the term Stinespring representation is often reserved for the
case A = B. Once again, as we will see, such a representation exists if and only if Φ is completely
positive.

5.2.5 Relationships among the representations


The following proposition explains a simple way in which the four representations discussed
above relate to one another.

Proposition 5.2. Let k be a positive integer, let A1 , . . . , Ak , B1 , . . . , Bk ∈ L (X , Y ) be operators, and let


Φ ∈ T (X , Y ). Then the following are equivalent.

1. (Kraus representations.) It holds that


Φ(X) = ∑_{j=1}^{k} A_j X B_j^*

for all X ∈ L (X ).
2. (Stinespring representations.) For Z = C^k and A, B ∈ L (X , Y ⊗ Z) defined as

A = ∑_{j=1}^{k} A_j ⊗ e_j   and   B = ∑_{j=1}^{k} B_j ⊗ e_j,

it holds that Φ( X ) = TrZ ( AXB∗ ) for all X ∈ L (X ).


3. (The natural representation.) It holds that
K(Φ) = ∑_{j=1}^{k} A_j ⊗ B̄_j,

where B̄_j denotes the entry-wise complex conjugate of B_j.

4. (The Choi-Jamiołkowski representation.) It holds that


J(Φ) = ∑_{j=1}^{k} vec(A_j) vec(B_j)^*.

Proof. The equivalence between items 1 and 2 is a straightforward calculation. The equivalence
between items 1 and 3 follows from the identity

vec(A_j X B_j^*) = (A_j ⊗ B̄_j) vec(X)




for each j = 1, . . . , k and every X ∈ L (X ). Finally, the equivalence between items 1 and 4 follows
from the expression
J (Φ) = (Φ ⊗ 1L(X ) ) (vec(1 X ) vec(1 X )∗ )
along with

( A j ⊗ 1 X ) vec(1 X ) = vec( A j ) and vec(1 X )∗ ( B∗j ⊗ 1 X ) = vec( Bj )∗

for each j = 1, . . . , k.

Various facts may be derived from the above proposition. For instance, it follows that every
super-operator Φ ∈ T (X , Y ) has a Kraus representation in which k = rank( J (Φ)) ≤ dim(X ⊗ Y ),
and similarly that every such Φ has a Stinespring representation in which dim(Z) = rank( J (Φ)).
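The equivalences in Proposition 5.2 translate directly into small computations. The sketch below (again my own, with assumed helper names) assembles the natural and Choi-Jamiołkowski representations from a list of Kraus pairs (A_j, B_j), using row-major flattening so that vec(E_{a,b}) = e_a ⊗ e_b, and checks that K(Φ) vec(X) = vec(Φ(X)).

```python
import numpy as np

def vec(A):
    # Row-major flattening realizes vec(E_{a,b}) = e_a (x) e_b.
    return A.reshape(-1)

def natural_rep(kraus_pairs):
    # K(Phi) = sum_j A_j (x) conj(B_j), so that K(Phi) vec(X) = vec(Phi(X)).
    return sum(np.kron(A, B.conj()) for A, B in kraus_pairs)

def choi_rep(kraus_pairs):
    # J(Phi) = sum_j vec(A_j) vec(B_j)^*
    return sum(np.outer(vec(A), vec(B).conj()) for A, B in kraus_pairs)

# Sanity check on random Kraus pairs mapping C^2 -> C^3.
rng = np.random.default_rng(0)
pairs = [(rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2)),
          rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2)))
         for _ in range(4)]
X = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
phi_X = sum(A @ X @ B.conj().T for A, B in pairs)
assert np.allclose(natural_rep(pairs) @ vec(X), vec(phi_X))
```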

5.3 Characterization of quantum operations

Now we are ready to characterize quantum operations in terms of their Choi-Jamiołkowski, Kraus,
and Stinespring representations. (The natural representation does not happen to help us with
respect to these particular characterizations. This is not surprising, because it essentially throws
away the operator structure of the input and output of super-operators.) We will begin with a
characterization of completely positive super-operators in terms of these representations.

Theorem 5.3. For every super-operator Φ ∈ T (X , Y ), the following are equivalent:

1. Φ is completely positive.
2. Φ ⊗ 1L(X ) is positive.
3. J (Φ) ∈ Pos (Y ⊗ X ).
4. There exists a positive integer k and operators A1 , . . . , Ak ∈ L (X , Y ) such that

Φ(X) = ∑_{i=1}^{k} A_i X A_i^*    (5.4)

for all X ∈ L (X ).
5. Item 4 holds for k = rank( J (Φ)).
6. There exists a complex Euclidean space Z and an operator A ∈ L (X , Y ⊗ Z) such that

Φ( X ) = TrZ ( AX A∗ )

for all X ∈ L (X ).
7. Item 6 holds for Z having dimension equal to the rank of J (Φ).

Proof. Let us first note the following simple implications: item 1 implies item 2 by the definition
of complete positivity, item 5 implies item 7 by Proposition 5.2 (and likewise item 4 implies item
6), item 5 trivially implies item 4, and item 7 trivially implies item 6.
Super-operators of the form X 7→ AX A∗ are easily seen to be completely positive, and non-
negative linear combinations of completely positive super-operators are completely positive as
well. Item 4 therefore implies item 1. Along similar lines, the partial trace is completely positive
and completely positive super-operators are closed under composition. Item 6 therefore implies
item 1.
Now assume that Φ ⊗ 1L(X ) is positive. As

vec(1 X ) vec(1 X )∗ ∈ Pos (X ⊗ X ) ,

we have  
J (Φ) = Φ ⊗ 1L(X ) (vec(1 X ) vec(1 X )∗ ) ∈ Pos (Y ⊗ X ) .

Item 2 therefore implies item 3.


Finally, assume that J (Φ) ∈ Pos (Y ⊗ X ). By the Spectral Theorem, along with the fact that
every eigenvalue of a positive semidefinite operator is non-negative, we have that it is possible to
write
J(Φ) = ∑_{i=1}^{k} u_i u_i^*,

for k = rank( J (Φ)) and for some choice of vectors u1 , . . . , uk ∈ Y ⊗ X . Define A1 , . . . , Ak ∈


L (X , Y ) so that vec( Ai ) = ui for i = 1, . . . , k. We therefore have
J(Φ) = ∑_{i=1}^{k} vec(A_i) vec(A_i)^*,

and so by Proposition 5.2 we have that equation (5.4) holds for every X ∈ L (X ). Item 3 therefore
implies item 5.
The above implications prove the equivalence of items 1 through 7, and so the theorem is
proved.

Next is a theorem that characterizes the collection of trace-preserving super-operators. These


characterizations are straightforward, but it is nevertheless convenient to state them in a similar
style to those of Theorem 5.3.

Theorem 5.4. For every super-operator Φ ∈ T (X , Y ), the following are equivalent:

1. Φ is trace-preserving.
2. TrY ( J (Φ)) = 1 X .
3. There exists a Kraus representation
Φ(X) = ∑_{i=1}^{k} A_i X B_i^*

of Φ for which the operators A1 , . . . , Ak , B1 , . . . , Bk ∈ L (X , Y ) satisfy


∑_{i=1}^{k} A_i^* B_i = 1_X.

4. For all Kraus representations


Φ(X) = ∑_{i=1}^{k} A_i X B_i^*

of Φ, the operators A1 , . . . , Ak , B1 , . . . , Bk ∈ L (X , Y ) satisfy


∑_{i=1}^{k} A_i^* B_i = 1_X.

5. There exists a Stinespring representation

Φ( X ) = TrZ ( AXB∗ )

of Φ for which the operators A, B ∈ L (X , Y ⊗ Z) satisfy A∗ B = 1 X .


6. For all Stinespring representations
Φ( X ) = TrZ ( AXB∗ )
of Φ, the operators A, B ∈ L (X , Y ⊗ Z) satisfy A∗ B = 1 X .
Proof. Let us first prove the equivalence of items 1, 5, and 6. First, suppose that that Φ is trace-
preserving, and suppose that
Φ( X ) = TrZ ( AXB∗ )
is a Stinespring representation of Φ. Then for every X ∈ L (X ) we have

h1 X , X i = Tr( X ) = Tr(Φ( X )) = Tr( AXB∗ ) = Tr( B∗ AX ) = h A∗ B, X i .

The equality A∗ B = 1 X follows, and so we have proved that item 1 implies item 6. Of course, item
6 trivially implies item 5. Finally, assume that

Φ( X ) = TrZ ( AXB∗ )

is a Stinespring representation of Φ for which A∗ B = 1 X . It follows that B∗ A = 1 X , and so for


every X ∈ L (X ) we have

Tr(Φ( X )) = Tr( AXB∗ ) = Tr( B∗ AX ) = Tr( X ).

Therefore item 5 implies item 1, and we have proved the equivalence of items 1, 5, and 6.
Along similar lines we may prove the equivalence of items 1, 2, 4, and 5. Specifically, if Φ is
trace-preserving, and if
Φ(X) = ∑_{j=1}^{k} A_j X B_j^*

is a Kraus representation of Φ, then for every X ∈ L (X ) we have


⟨1_X, X⟩ = Tr(X) = Tr(Φ(X)) = Tr( ∑_{j=1}^{k} A_j X B_j^* ) = Tr( ( ∑_{j=1}^{k} B_j^* A_j ) X ) = ⟨ ∑_{j=1}^{k} A_j^* B_j, X ⟩.

The equality
∑_{j=1}^{k} A_j^* B_j = 1_X

follows, and so we have proved that item 1 implies item 4. Like before, we have that item 4
trivially implies item 3. Now suppose
Φ(X) = ∑_{j=1}^{k} A_j X B_j^*

is a Kraus representation of Φ for which ∑_{j=1}^{k} A_j^* B_j = 1_X. Then ∑_{j=1}^{k} B_j^* A_j = 1_X, and by Proposi-
tion 5.2 we have
J(Φ) = ∑_{j=1}^{k} vec(A_j) vec(B_j)^*.

Thus
Tr_Y(J(Φ)) = ∑_{j=1}^{k} (B_j^* A_j)^T = 1_X,

and so item 3 implies item 2. Finally, assume TrY ( J (Φ)) = 1 X . Then

∑_{a,b∈Σ} Tr(Φ(E_{a,b})) E_{a,b} = 1_X.

We conclude that Tr(Φ(E_{a,b})) = 1 when a = b and Tr(Φ(E_{a,b})) = 0 when a ≠ b,
so by linearity we have that Φ preserves trace. Thus item 2 implies item 1, and the theorem is
proved.

The two theorems are now easily combined to give the following characterization of admissible
super-operators.
Corollary 5.5. Let Φ ∈ T (X , Y ). Then the following are equivalent:
1. Φ is admissible.
2. J (Φ) ∈ Pos (Y ⊗ X ) and TrY ( J (Φ)) = 1 X .
3. There exists a positive integer k and operators A1 , . . . , Ak ∈ L (X , Y ) such that
Φ(X) = ∑_{i=1}^{k} A_i X A_i^*

for all X ∈ L (X ), and such that


∑_{i=1}^{k} A_i^* A_i = 1_X.

4. Item 3 holds for k = rank( J (Φ)).


5. There exists a complex Euclidean space Z and a linear isometry A ∈ U (X , Y ⊗ Z), such that

Φ( X ) = TrZ ( AX A∗ )

for all X ∈ L (X ).
6. Item 5 holds for a complex Euclidean space Z with dim(Z) = rank( J (Φ)).

Lecture 6
Examples of states, measurements, and operations

The purpose of this lecture is to discuss a selection of examples of states, measurements, and oper-
ations. We will see some of these examples a bit later in the course, and the last one (teleportation)
is very important in quantum information theory. You should, however, not read too much into
the particular selection of examples in this lecture—it is convenient to discuss these examples now,
for the sake of lectures coming up, but there are of course many important examples that we will
not discuss at the present time.

6.1 Convex combinations of states, measurements, and operations

I assume that you are familiar with convexity and the basic notions that concern it. Recall that
a subset C ⊆ V of some vector space V is convex if, for all choices of x, y ∈ C and λ ∈ [0, 1], it
holds that λx + (1 − λ)y ∈ C . When we refer to a convex combination of elements from some subset
A ⊆ V of a vector space V , we are referring to a (finite) linear combination of the form
N
∑ pj xj ,
j =1

where x1 , . . . , x N ∈ A and ( p1 , . . . , p N ) is a probability vector. The set of all such convex combina-
tions is called the convex hull of A, and is denoted conv(A). The set conv(A) is obviously convex,
and is the smallest convex set that contains A (meaning that any other convex set that contains A
must also contain conv(A)).
We have, of course, seen several examples of convex sets thus far in the course. For example,
for any choice of complex Euclidean space X , the set X itself, as well as L (X ), Herm (X ), Pos (X ),
and D (X ), are all obviously convex.
Convex combinations of admissible super-operators are also admissible: for any choice of ad-
missible super-operators Φ, Ψ ∈ T (X , Y ), and any real number λ ∈ [0, 1], it holds that the super-
operator
λΦ + (1 − λ)Ψ ∈ T (X , Y )
is also admissible. One can verify this in many ways, one of which is to consider the Choi-
Jamiołkowski representation:

J (λΦ + (1 − λ)Ψ) = λJ (Φ) + (1 − λ) J (Ψ).

Given that Φ and Ψ are completely positive, we have J (Φ), J (Ψ) ∈ Pos (Y ⊗ X ), and because
Pos (Y ⊗ X ) is convex it holds that J (λΦ + (1 − λ)Ψ) ∈ Pos (Y ⊗ X ). Thus, λΦ + (1 − λ)Ψ is
completely positive. The fact that λΦ + (1 − λ)Ψ preserves trace is immediate by linearity.
For any choice of a complex Euclidean space X and a finite, non-empty set Γ, the collection of
all measurements of the form
µ : Γ → Pos (X )


is convex, assuming that addition and scalar multiplication are defined point-wise. That is, if
µ1 , µ2 : Γ → Pos (X ) are measurements and λ ∈ [0, 1], then the function

λµ1 + (1 − λ)µ2 : Γ → Pos (X ) : a 7→ λµ1 ( a) + (1 − λ)µ2 ( a)

also defines a measurement, as is easily verified.

6.2 Mixed-unitary operations

Suppose X and Y are arbitrary complex Euclidean spaces. It is clear that every linear isometry
A ∈ U (X , Y ) gives rise to an admissible super-operator Φ ∈ T (X , Y ) defined as

Φ( X ) = AX A∗

for all X ∈ L (X ). Such a super-operator represents an isometric quantum operation. Along similar
lines, an admissible super-operator Φ ∈ T (X ) defined as

Φ( X ) = UXU ∗

for a unitary operator U ∈ U (X ) represents a unitary quantum operation.


It follows from the discussion above that the convex hull of any collection of admissible super-
operators is admissible as well. The convex hull of the set of super-operators representing unitary
quantum operations is the set of mixed unitary super-operators. It is also called the set of random
unitary super-operators, although this term seems to cause confusion with a different concept of a
“random unitary operator”.
In other terms, a super-operator Φ ∈ T (X ) is a mixed unitary super-operator if there exists a
positive integer N, a collection U1 , . . . , UN ∈ U (X ) of unitary operators on X , and a probability
vector ( p1 , . . . , p N ) such that
Φ(X) = ∑_{j=1}^{N} p_j U_j X U_j^*    (6.1)

for all X ∈ L (X ).
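As a quick illustration (mine, not from the notes), a mixed-unitary super-operator of the form (6.1) can be applied numerically as follows; the qubit dephasing example at the end is an arbitrary choice.

```python
import numpy as np

def mixed_unitary_apply(unitaries, probs, X):
    # Phi(X) = sum_j p_j U_j X U_j^*
    return sum(p * U @ X @ U.conj().T for p, U in zip(probs, unitaries))

# Dephasing a qubit: an equal mixture of the identity and Z.
I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
rho = np.full((2, 2), 0.5)                       # the pure state |+><+|
out = mixed_unitary_apply([I2, Z], [0.5, 0.5], rho)
print(out)                                       # diag(0.5, 0.5): off-diagonal terms vanish
```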
A natural question to ask about mixed unitary super-operators is: for a given mixed-unitary
super-operator Φ, how large does N need to be in an expression of Φ in the form (6.1)? Initially it
is tempting to answer that N = dim(X )2 suffices, given what we learned in the previous lecture.
This is not necessarily so, however—for although any super-operator acting on L (X ) has a Kraus
representation with at most dim(X )2 terms, we cannot guarantee that the corresponding Kraus
operators are unitary.
Instead, we will need the following fundamental theorem about convex hulls, which places
a bound on the number of elements over which convex combinations must be taken to express
elements in convex hulls.

Theorem 6.1 (Carathéodory’s Theorem). Let V be an n-dimensional real vector space and let A ⊆ V be
any non-empty subset of V . Then every vector u ∈ conv(A) can be written as
u = ∑_{j=1}^{n+1} p_j x_j

for x1 , . . . , xn+1 ∈ A and ( p1 , . . . , pn+1 ) a probability vector.



To apply this theorem to the case at hand, we must identify a real vector space in which we
are working. For each unitary operator U ∈ U (X ), let us write ΦU ∈ T (X ) to denote the super-
operator defined by
ΦU ( X ) = UXU ∗
for all X ∈ L (X ). Then we have

J(Φ_U) = vec(U) vec(U)^* ∈ Herm (X ⊗ X ) .

The set Herm (X ⊗ X ) is a real vector space with dimension n4 , for n = dim(X ), and is (essen-
tially) the real vector space we are looking for.
Now, for any mixed-unitary super-operator Φ, we have

J(Φ) ∈ conv {vec(U) vec(U)^* : U ∈ U (X )} ⊆ Herm (X ⊗ X ) .

By Carathéodory’s Theorem we may therefore write


J(Φ) = ∑_{j=1}^{N} p_j vec(U_j) vec(U_j)^*,

for some choice of unitary operators U1 , . . . , UN ∈ U (X ) and a probability vector ( p1 , . . . , p N ), for


N = n4 + 1. Thus, we have
Φ(X) = ∑_{j=1}^{N} p_j U_j X U_j^*    (6.2)

for the same value of N.


It is possible to improve this bound slightly by observing that the subspace

span {vec(U ) vec(U )∗ : U ∈ U (X )}

in fact has dimension n4 − 2n2 + 1, owing to the fact that there are linear constraints that each
operator vec(U ) vec(U )∗ must satisfy. An expression of the form (6.2) is therefore possible for
every mixed-unitary super-operator Φ ∈ T (X ) for N = n4 − 2n2 + 2.

6.3 Non-destructive measurements

Sometimes it is convenient to consider measurements that do not destroy registers, but rather
leave them in some reduced state that may depend on the measurement outcome that is obtained
from the measurement. We will refer to such processes as non-destructive measurements.
Formally, a non-destructive measurement on a space X is a function

ν : Γ → L (X ) : a 7→ Ma ,

for some finite, non-empty set of measurement outcomes Γ, that satisfies the constraint

∑ Ma∗ Ma = 1X .
a∈Γ

When a non-destructive measurement of this form is applied to a register X that has reduced state
ρ ∈ D (X ), two things happen:

1. Each measurement outcome a ∈ Γ occurs with probability h Ma∗ Ma , ρi.


2. Conditioned on the measurement outcome a ∈ Γ occurring, the reduced state of the register X
becomes
M_a ρ M_a^* / ⟨M_a^* M_a, ρ⟩.
Naturally, the effect of ν on a larger collection of registers that includes X is determined by the
above definition along with the usual way of using tensor products.
Let us now observe that non-destructive measurements really do not require new definitions,
but can be formed from a composition of an operation and a measurement as we defined them
initially. To do this, we essentially just follow the proof of Naimark’s Theorem. Specifically, let
Y = C Γ and define an operator A ∈ L (X , X ⊗ Y ) as

A= ∑ Ma ⊗ ea .
a∈Γ

We have that
A∗ A = ∑ Ma∗ Ma = 1X ,
a∈Γ

which shows that A is a linear isometry. Therefore, the super-operator X 7→ AX A∗ from L (X )


to L (X ⊗ Y ) is admissible. Now consider that this operation is followed by a measurement of Y
with respect to its standard basis. Each outcome a ∈ Γ appears with probability

h1 X ⊗ Ea,a , AρA∗ i = h Ma∗ Ma , ρi ,

and conditioned on the outcome a ∈ Γ appearing, the reduced state of X becomes

Tr_Y[(1_X ⊗ E_{a,a}) AρA^*] / ⟨1_X ⊗ E_{a,a}, AρA^*⟩ = M_a ρ M_a^* / ⟨M_a^* M_a, ρ⟩
as required.
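The construction just given is easy to simulate. The following sketch (my own; the function name is hypothetical) computes the outcome probabilities and post-measurement states of a non-destructive measurement directly from the operators M_a.

```python
import numpy as np

def nondestructive_measure(kraus_ops, rho):
    # kraus_ops: operators M_a with sum_a M_a^* M_a = 1; rho: density matrix on X.
    results = []
    for M in kraus_ops:
        p = np.trace(M.conj().T @ M @ rho).real          # <M_a^* M_a, rho>
        post = (M @ rho @ M.conj().T) / p if p > 1e-12 else None
        results.append((p, post))
    return results

# Example: a "gentle" two-outcome measurement on a qubit.
M0 = np.diag([1.0, np.sqrt(0.5)])
M1 = np.diag([0.0, np.sqrt(0.5)])
assert np.allclose(M0.conj().T @ M0 + M1.conj().T @ M1, np.eye(2))
rho = np.full((2, 2), 0.5)                               # |+><+|
for p, post in nondestructive_measure([M0, M1], rho):
    print(p)
    print(post)
```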

6.4 Generalized Pauli operators and Bell states

For any positive integer n, we define

Z n = {0, 1, . . . , n − 1},

and view this set as a ring with respect to addition and multiplication defined modulo n. Let us
also define
ωn = exp(2πi/n)
to be a principal n-th root of unity, which will typically be denoted ω rather than ωn when n has
been fixed or is clear from the context.
Now, for a fixed choice of n, let X = CZn , and define two unitary operators X, Z ∈ U (X ) as
follows:
X = ∑_{a∈Z_n} E_{a+1,a}   and   Z = ∑_{a∈Z_n} ω^a E_{a,a}.

Here, and throughout this section, the expression a + 1 refers to addition in Z n , and similar for
other arithmetic expressions involving elements of Z n . For any choice of ( a, b) ∈ Z2n we may

consider the operator X a Z b . Such operators are known as generalized Pauli operators (or discrete
Weyl operators).
Let us note a few basic facts about the collection

{X a Z b : ( a, b) ∈ Z2n }. (6.3)

First, it is straightforward to show that



Tr(X^a Z^b) = n if a = b = 0, and Tr(X^a Z^b) = 0 otherwise.

This implies that the collection (6.3) forms an orthogonal basis for L (X ), because
⟨X^a Z^b, X^c Z^d⟩ = Tr(Z^{−b} X^{−a} X^c Z^d) = Tr(X^{c−a} Z^{d−b}) = n if (a, b) = (c, d), and 0 otherwise.

Finally, we note the commutation relation

ZX = ωXZ,

which (for instance) implies

( X a Z b )( X c Z d ) = ω bc X a+c Z b+d = ω bc− ad ( X c Z d )( X a Z b ).

The generalized Bell basis is an orthonormal basis of X ⊗ X that is derived from the generalized
Pauli operators by means of the vec mapping:
 
{ (1/√n) vec(X^a Z^b) : (a, b) ∈ Z_n^2 }.    (6.4)
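A short numerical sketch (my own addition) constructing X and Z for a chosen n and verifying the commutation relation and the orthogonality of the collection (6.3) might look as follows.

```python
import numpy as np

def gen_paulis(n):
    # X = sum_a E_{a+1,a} (cyclic shift), Z = sum_a omega^a E_{a,a}.
    omega = np.exp(2j * np.pi / n)
    X = np.roll(np.eye(n), 1, axis=0)
    Z = np.diag(omega ** np.arange(n))
    return X, Z

n = 5
X, Z = gen_paulis(n)
omega = np.exp(2j * np.pi / n)
assert np.allclose(Z @ X, omega * X @ Z)                 # ZX = omega XZ

# Orthogonality: <X^a Z^b, X^c Z^d> = n when (a,b) = (c,d) and 0 otherwise.
W = [np.linalg.matrix_power(X, a) @ np.linalg.matrix_power(Z, b)
     for a in range(n) for b in range(n)]
G = np.array([[np.trace(P.conj().T @ Q) for Q in W] for P in W])
assert np.allclose(G, n * np.eye(n * n))
```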

6.5 The completely depolarizing operation

Let X = C Σ be a complex Euclidean space, let n = | Σ |, and consider the super-operator Ω ∈ T (X )


defined as follows:
Ω(X) = (1/n) Tr(X) 1_X.
This is the completely depolarizing operation, which sends every density operator to the maximally
mixed state 1 X /n. The Choi-Jamiołkowski representation of Ω is
J(Ω) = ∑_{a,b∈Σ} Ω(E_{a,b}) ⊗ E_{a,b} = (1/n) ∑_{a∈Σ} 1_X ⊗ E_{a,a} = (1/n) 1_{X⊗X}.    (6.5)

It is clear from this expression that Ω is admissible.


Now let us verify that Ω is a mixed-unitary operation. We assume that a correspondence
between the sets Σ and Z n has been fixed, and we identify the space X with CZn by means of this
correspondence. Having done this, we will verify that
Ω(A) = (1/n²) ∑_{(a,b)∈Z_n^2} X^a Z^b A (X^a Z^b)^*

for all A ∈ L (X ).

One way to do this is to consider the Choi-Jamiołkowski representation of the super-operator


Φ defined by
Φ(A) = (1/n²) ∑_{(a,b)∈Z_n^2} X^a Z^b A (X^a Z^b)^*.

We have

J(Φ) = (1/n²) ∑_{(a,b)∈Z_n^2} vec(X^a Z^b) vec(X^a Z^b)^* = (1/n) 1_{X⊗X},

where the second equality follows from the fact that (6.4) is an orthonormal basis of X ⊗ X . By
(6.5) it follows that Φ = Ω as claimed.
Another way to verify this, which was suggested in the lecture, is to verify that

Φ(X^c Z^d) = (1/n²) ∑_{(a,b)∈Z_n^2} X^a Z^b X^c Z^d Z^{−b} X^{−a} = (1/n²) ∑_{(a,b)∈Z_n^2} ω^{bc−ad} X^c Z^d = 1_X if (c, d) = (0, 0), and 0 otherwise.

As {X^c Z^d : (c, d) ∈ Z_n^2} forms a basis of L (X ), we have Φ = Ω by linearity.
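The same identity is easy to confirm numerically. The following sketch (mine) twirls a random operator A by the generalized Pauli operators and compares the result with Tr(A) 1_X / n.

```python
import numpy as np

n = 4
omega = np.exp(2j * np.pi / n)
Xop = np.roll(np.eye(n), 1, axis=0)          # generalized X
Zop = np.diag(omega ** np.arange(n))         # generalized Z

rng = np.random.default_rng(1)
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# (1/n^2) sum_{a,b} X^a Z^b A (X^a Z^b)^*  versus  Tr(A) 1/n
twirl = sum(
    np.linalg.matrix_power(Xop, a) @ np.linalg.matrix_power(Zop, b) @ A
    @ (np.linalg.matrix_power(Xop, a) @ np.linalg.matrix_power(Zop, b)).conj().T
    for a in range(n) for b in range(n)
) / n**2
assert np.allclose(twirl, np.trace(A) * np.eye(n) / n)
```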

6.6 Teleportation

The last example for this lecture is quantum teleportation, with which I assume you are familiar.
Here let us note that teleportation works exactly as you would expect for registers of arbitrary
dimension n, by using the generalized Pauli operators and generalized Bell basis. So, just in case
you were wondering if teleportation works for 8,675,309-dimensional registers, the answer is yes.
Suppose that X1 , X2 , and X3 are registers whose associated complex Euclidean spaces X1 , X2 ,
and X3 are isomorphic copies of X = CZn . Assume that the pair of registers (X2 , X3 ) is initialized
to a maximally entangled pure state

(1/√n) vec(1_X) = (1/√n) ∑_{a∈Z_n} e_a ⊗ e_a,

while X1 has some reduced state ρ. It is our assumption that one party, Alice, possesses X1 and
X2 , while a second party, Bob, possesses X3 . The goal is for Alice to teleport the state of X1 to X3 by
making use of the above shared state of (X2 , X3 ) along with classical communication. The registers
X1 and X2 are destroyed in the process.
Suppose that Alice measures the pair (X1 , X2 ) with respect to the generalized Bell basis
 
{ (1/√n) vec(X^a Z^b) : (a, b) ∈ Z_n^2 }.

We wish to describe the reduced state of X3 conditioned on the outcome ( a, b). Let us first note
that
(vec(1)^* ⊗ 1)(u ⊗ vec(1)) = ∑_{a,b} (e_a^* ⊗ e_a^* ⊗ 1)(u ⊗ e_b ⊗ e_b) = ∑_a ⟨e_a, u⟩ e_a = u,

for every vector u ∈ X , and consequently

(vec(1 )∗ ⊗ 1 ) [ξ ⊗ vec(1 ) vec(1 )∗ ] (vec(1 ) ⊗ 1 ) = ξ



for every density operator ξ ∈ D (X ). Making use of the identity


 
vec( X a Z b ) = X a Z b ⊗ 1 vec(1 ),

we therefore have that


   
(vec(X^a Z^b)^* ⊗ 1) [ρ ⊗ vec(1) vec(1)^*] (vec(X^a Z^b) ⊗ 1)
= (vec(1)^* ⊗ 1) [((X^a Z^b)^* ρ (X^a Z^b)) ⊗ vec(1) vec(1)^*] (vec(1) ⊗ 1)
= (X^a Z^b)^* ρ (X^a Z^b).

Now, by including the normalization factors, we see that Alice’s measurement results in each
outcome ( a, b) ∈ Z2n with probability

(1/n²) Tr((X^a Z^b)^* ρ (X^a Z^b)) = 1/n²,
and conditioned on the outcome (a, b), the reduced state of X3 becomes (X^a Z^b)^* ρ (X^a Z^b).

Alice may therefore transmit the measurement outcome ( a, b) to Bob, who then applies the unitary
operation X a Z b to X3 to recover the original reduced state ρ of X1 .
Notice that the above process does more than set the reduced state of X3 to ρ. In fact, it
functions as a quantum channel that effectively acts as the identity super-operator mapping L (X1 )
to L (X3 ). In other words, if Alice’s register X1 was initially entangled or correlated with some
other register Y, the register X3 will have exactly the same relationship to Y after teleportation.
This must be so because (i) the above process can be described as an admissible super-operator of
the form
Φ ∈ T (X1 , X3 ) ,
and (ii) this super-operator Φ is in perfect agreement with the identity super-operator for all
choices of ρ ∈ D (X1 ). (Super-operators that agree precisely on all density operator inputs must
of course be equal as super-operators.)
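For readers who want to see the above calculation carried out explicitly, here is a small simulation (my own sketch, using the row-major vec convention assumed throughout these illustrations) of teleportation in dimension n: it checks that every measurement outcome occurs with probability 1/n² and that Bob's correction restores ρ.

```python
import numpy as np

n = 3
omega = np.exp(2j * np.pi / n)
Xop = np.roll(np.eye(n), 1, axis=0)
Zop = np.diag(omega ** np.arange(n))

def W(a, b):
    return np.linalg.matrix_power(Xop, a) @ np.linalg.matrix_power(Zop, b)

# Random input state rho on X1, maximally entangled state on (X2, X3).
rng = np.random.default_rng(2)
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
rho = G @ G.conj().T
rho /= np.trace(rho)
bell = np.eye(n).reshape(-1) / np.sqrt(n)            # vec(1)/sqrt(n) on X2 (x) X3
state = np.kron(rho, np.outer(bell, bell.conj()))    # state of (X1, X2, X3)

for a in range(n):
    for b in range(n):
        u = W(a, b).reshape(-1) / np.sqrt(n)         # Bell-basis vector on (X1, X2)
        P = np.kron(np.outer(u, u.conj()), np.eye(n))
        sub = P @ state @ P                           # unnormalized post-measurement state
        prob = np.trace(sub).real
        assert np.isclose(prob, 1 / n**2)
        red = sub.reshape(n * n, n, n * n, n)
        x3 = np.trace(red, axis1=0, axis2=2) / prob   # reduced state of X3
        assert np.allclose(W(a, b) @ x3 @ W(a, b).conj().T, rho)
```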

Lecture 7
Entropy and compression

For the next several lectures we will be discussing the von Neumann entropy and various concepts
relating to it. This lecture is intended to introduce the notion of entropy and its connection to
compression.

7.1 Shannon entropy

Before we discuss the von Neumann entropy, we will take a few moments to discuss the Shannon
entropy. This is a purely classical notion, but it is appropriate to start here. The Shannon entropy of
a probability vector p ∈ R Σ is defined as follows:
H(p) = − ∑_{a∈Σ : p(a)>0} p(a) log(p(a)).

Here, and always in this course, the base of the logarithm is 2. (We will write ln(α) if we wish
to refer to the natural logarithm of a real number α.) It is typical to express the Shannon entropy
slightly more concisely as
H ( p) = − ∑ p( a) log ( p( a)),
a∈Σ
which is meaningful if we make the interpretation 0 log(0) = 0. This is sensible given that
lim α log(α) = 0.
α ↓0

There is no reason why we cannot extend the definition of the Shannon entropy to arbitrary vectors
with nonnegative entries if it is useful to do this—but mostly we will focus on probability vectors.
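For reference, a direct implementation of this definition (my own sketch) with the 0 log(0) = 0 convention:

```python
import numpy as np

def shannon_entropy(p):
    # H(p) = -sum_a p(a) log2 p(a), with the convention 0 log(0) = 0.
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

print(shannon_entropy([0.5, 0.5]))   # 1.0
print(shannon_entropy([1.0, 0.0]))   # 0.0
```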
There are standard ways to interpret the Shannon entropy. For instance, the quantity H ( p) can
be viewed as a measure of the amount of uncertainty in a random experiment described by the
probability vector p, or as a measure of the amount of information one gains by learning the value
of such an experiment. Indeed, it is possible to start with simple axioms for what a measure of
uncertainty or information should satisfy, and to derive from these axioms that such a measure
must be equivalent to the Shannon entropy.
Something to keep in mind, however, when using these interpretations as a guide, is that the
Shannon entropy is usually only a meaningful measure of uncertainty in an asymptotic sense—as
the number of experiments becomes large. When a small number of samples from some experi-
ment is considered, the Shannon entropy may not conform to your intuition about uncertainty, as
the following example is meant to demonstrate.
Example 7.1. Let m be a positive integer, let Σ = {0, 1, . . . , 2^{m²}}, and define a probability vector p ∈ R^Σ as follows: p(0) = 1 − 1/m, and p(a) = 1/(m 2^{m²}) for 1 ≤ a ≤ 2^{m²}.


Then H ( p) > m, and yet the outcome 0 appears with probability 1 − 1/m. So, as m grows, we be-
come more and more “certain” that the outcome will be 0, and yet the “uncertainty” (as measured
by the entropy) goes to infinity.

The above example does not, of course, represent a paradox. The issue is simply that the
Shannon entropy can only be interpreted as measuring uncertainty if the number of random ex-
periments grows and the probability vector remains fixed, which is opposite to the example.

7.2 Classical compression and Shannon’s Source Coding Theorem

Let us now focus on an important use of the Shannon entropy, which involves the notion of a
compression scheme. This will allow us to attach a concrete meaning to the Shannon entropy.

7.2.1 Compression schemes


Let p ∈ R Σ be a probability vector as above. For a positive integer n and real numbers α > 0 and
δ ∈ (0, 1), let us say that a pair of mappings

f : Σn → {0, 1}⌊ αn⌋


g : {0, 1}⌊ αn⌋ → Σn

forms an (n, α, δ)-compression scheme for p if it holds that

Pr [ g( f ( a1 · · · an )) = a1 · · · an ] > 1 − δ, (7.1)

where the probability is over random choices of a1 , . . . , an ∈ Σ, each chosen independently accord-
ing to the probability vector p.
To understand what a compression scheme means at an intuitive level, let us imagine the fol-
lowing situation between two people: Alice and Bob. Alice has a device of some sort with a button
on it, and when she presses the button she gets an element of Σ, distributed according to p, inde-
pendent of any prior outputs of the device. She presses the button n times, obtaining outcomes
a1 · · · an , and she wants to communicate these outcomes to Bob using as few bits of communica-
tion as possible. So, what Alice does is to compress a1 · · · an into a string of ⌊αn⌋ bits by computing
f ( a1 · · · an ). She sends the resulting bitstring f ( a1 · · · an ) to Bob, who then decompresses by ap-
plying g, therefore obtaining g( f ( a1 · · · an )). Naturally they hope that g( f ( a1 · · · an )) = a1 · · · an ,
which means that Bob will have obtained the correct sequence a1 · · · an .
The quantity δ is a bound on the probability the compression scheme makes an error. We may
view that the pair ( f , g) works correctly for a string a1 · · · an ∈ Σn if g( f ( a1 · · · an )) = a1 · · · an , so
the above equation (7.1) is equivalent to the condition that the pair ( f , g) works correctly with
high probability (assuming δ is small).

7.2.2 Statement of Shannon’s Source Coding Theorem


In the discussion above, the number α represents the average number of bits the compression
scheme needs in order to represent each sample from the distribution described by p. It is obvious
that compression schemes will exist for some numbers α and not others. The particular values of
α for which it is possible to come up with a compression scheme are closely related to the Shannon
entropy H ( p), as the following theorem establishes.

Theorem 7.2 (Shannon’s Source Coding Theorem). Let Σ be a finite, non-empty set, let p ∈ R Σ be a
probability vector, let α > 0, and let δ ∈ (0, 1). Then the following statements hold.
1. If α > H ( p), then there exists an (n, α, δ)-compression scheme for p for all but finitely many choices of
n ≥ 1.
2. If α < H ( p), then there exists an (n, α, δ)-compression scheme for p for at most finitely many choices
of n ≥ 1.
It is not a mistake, by the way, that both statements hold for any fixed choice of δ ∈ (0, 1), re-
gardless of whether it is close to 0 or 1 for instance. This will make perfect sense when we see the
proof.
It should be mentioned that the above statement of Shannon’s Source Coding Theorem is spe-
cific to the somewhat simplified (fixed-length) notion of compression that we have defined. It is
more common, in fact, to consider variable-length compressions and to state Shannon’s Source
Coding Theorem in terms of the average length of compressed strings. The reason why we restrict
our attention to fixed-length compression schemes is that this sort of scheme will be more natural
when we turn to the quantum setting.

7.2.3 Typical strings


Before we can prove the above theorem, we will need to develop the notion of a typical string. For
a given probability vector p ∈ R Σ , positive integer n, and positive real number ε, we say that a
string a_1 · · · a_n ∈ Σ^n is ε-typical (with respect to p) if
2^{−n(H(p)+ε)} < p(a_1) · · · p(a_n) < 2^{−n(H(p)−ε)}.
We will need to refer to the set of all ε-typical strings of a given length repeatedly, so let us give
this set a name:
T_{n,ε}(p) = { a_1 · · · a_n ∈ Σ^n : 2^{−n(H(p)+ε)} < p(a_1) · · · p(a_n) < 2^{−n(H(p)−ε)} }.

When the probability vector p is understood from context we write Tn,ε rather than Tn,ε ( p).
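As an aside (my own illustration, not part of the notes), the probability that an i.i.d. sample lands in T_{n,ε}(p) can be estimated empirically; Lemma 7.3 below asserts that this probability tends to 1 as n grows.

```python
import numpy as np

def typical_probability(p, n, eps, trials=20000, seed=0):
    # Estimate Pr[a_1...a_n is eps-typical] for i.i.d. samples drawn from p.
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float)
    H = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    samples = rng.choice(len(p), size=(trials, n), p=p)
    logprob = np.sum(np.log2(p[samples]), axis=1)        # log2 of p(a_1)...p(a_n)
    return np.mean((-n * (H + eps) < logprob) & (logprob < -n * (H - eps)))

p = [0.6, 0.3, 0.1]
for n in (10, 100, 1000):
    print(n, typical_probability(p, n, eps=0.1))         # tends to 1 as n grows
```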
The following lemma establishes that a random selection of a string a1 · · · an is very likely to
be ε-typical as n gets large.
Lemma 7.3. Let p ∈ R Σ be a probability vector and let ε > 0. Then
lim_{n→∞} ∑_{a_1···a_n ∈ T_{n,ε}(p)} p(a_1) · · · p(a_n) = 1.

Proof. Let Y1 , . . . , Yn be independent and identically distributed random variables defined as fol-
lows: we choose a ∈ Σ randomly according to the probability vector p, and then let the output
value be the real number − log( p( a)) for whichever value of a was selected. It holds that the
expected value of each Yj is
E[Yj ] = − ∑ p(a) log( p(a)) = H ( p).
a∈Σ

The conclusion of the lemma may now be written


" #
1 n
lim Pr ∑ Yj − E[Yj ] ≥ ε = 0,

n→∞ n j =1

which is true by the (weak) law of large numbers.



Based on the previous lemma, it is straightforward to place upper and lower bounds on the
number of ε-typical strings, as shown in the following lemma.

Lemma 7.4. Let p ∈ R Σ be a probability vector and let ε be a positive real number. Then for all but finitely
many positive integers n it holds that

(1 − ε)2n( H ( p)−ε) < | Tn,ε | < 2n( H ( p)+ε).

Proof. The upper bound holds for all n. Specifically, by the definition of ε-typical, we have

1≥ ∑ p( a1 ) · · · p( an ) > 2−n( H ( p)+ε) | Tn,ε | ,


a1 ···an ∈ Tn,ε

and therefore | Tn,ε | < 2n( H ( p)+ε).


For the lower bound, let us choose n0 so that

∑ p ( a1 ) · · · p ( a n ) > 1 − ε
a1 ···an ∈ Tn,ε

for all n ≥ n0 , which is possible by Lemma 7.3. Then for all n ≥ n0 we have

1−ε < ∑ p( a1 ) · · · p( an ) < | Tn,ε | 2−n( H ( p)−ε),


a1 ···an ∈ Tn,ε

and therefore | Tn,ε | > (1 − ε)2n( H ( p)−ε), which completes the proof.

7.2.4 Proof of Shannon’s Source Coding Theorem


We now have the necessary tools to prove Shannon’s Source Coding theorem. Having developed
some basic properties of typical strings, the proof is very simple: a good compression function is
obtained by simply assigning a unique binary string to each typical string, with every other string
mapped arbitrarily. On the other hand, any compression scheme that fails to account for a large
fraction of the typical strings will be shown to fail with very high probability.

Proof of Theorem 7.2. First assume that α > H ( p), and choose ε > 0 so that α > H ( p) + 2ε. For
every choice of n > 1/ε we therefore have that

⌊ αn⌋ > n( H ( p) + ε).

Now, because
| Tn,ε | < 2n( H ( p)+ε) < 2⌊αn⌋ ,
we may define a function f : Σn → {0, 1}⌊αn⌋ that is 1-to-1 when restricted to Tn,ε , and we may
define g : {0, 1}⌊αn⌋ → Σn appropriately so that g( f ( a1 · · · an )) = a1 · · · an for every a1 · · · an ∈ Tn,ε .
As
Pr [ g( f ( a1 · · · an )) = a1 · · · an ] ≥ Pr[a1 · · · an ∈ Tn,ε ] = ∑ p ( a1 ) · · · p ( a n ) ,
a1 ··· an ∈ Tn,ε

we have that this quantity is greater than 1 − δ for sufficiently large n.


Now let us prove the second item, where we assume α < H ( p). It is clear from the definition
of an (n, α, δ)-compression scheme that such a scheme can only work correctly for at most 2⌊αn⌋

strings a1 · · · an . Let us suppose such a scheme is given for each n, and let Gn ⊆ Σn be the collection
of strings on which the appropriate scheme works correctly. If we can show that

lim Pr[a1 · · · an ∈ Gn ] = 0
n→∞

then we will be finished.


Toward this goal, let us note that for every n and ε, we have

Pr[a1 · · · an ∈ Gn ] ≤ Pr[a1 · · · an ∈ Gn ∩ Tn,ε ] + Pr[a1 · · · an 6∈ Tn,ε ]


≤ | Gn | 2−n( H ( p)−ε) + Pr[a1 · · · an 6∈ Tn,ε ].

Choose ε > 0 so that α < H ( p) − ε. Then

lim | Gn | 2−n( H ( p)−ε) = 0.


n→∞

As
lim Pr[a1 · · · an 6∈ Tn,ε ] = 0
n→∞

by Lemma 7.3, we have


lim Pr[a1 · · · an ∈ Gn ] = 0
n→∞

as required.

7.3 Von Neumann entropy

Next we will discuss the von Neumann entropy, which may be viewed as a quantum information-
theoretic analogue of the Shannon entropy. We will spend the next few lectures after this one
discussing the properties of the von Neumann entropy as well as some of its uses—but for now
let us just focus on the definition.
Let X be a complex Euclidean space, let n = dim(X ), and let ρ ∈ D (X ) be a density operator.
The von Neumann entropy of ρ is defined as

S(ρ) = H (λ(ρ)),

where λ(ρ) = (λ1 (ρ), . . . , λn (ρ)) is the vector of eigenvalues of ρ. An equivalent expression is

S(ρ) = − Tr(ρ log(ρ)),

where log(ρ) is the Hermitian operator that has exactly the same eigenvectors as ρ, and we take the
base 2 logarithm of the corresponding eigenvalues. Technically speaking, log(ρ) is only defined
for ρ positive definite, but ρ log(ρ) may be defined for all positive semidefinite ρ by interpreting
0 log(0) as 0, just like in the definition of the Shannon entropy.
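In computational terms (a sketch of my own), S(ρ) is simply the Shannon entropy of the eigenvalue vector of ρ:

```python
import numpy as np

def von_neumann_entropy(rho):
    # S(rho) = H(lambda(rho)), computed from the eigenvalues of rho.
    eig = np.linalg.eigvalsh(rho)
    eig = eig[eig > 1e-12]                  # 0 log(0) = 0 convention
    return float(-np.sum(eig * np.log2(eig)))

print(von_neumann_entropy(np.eye(4) / 4))   # maximally mixed: log2(4) = 2
plus = np.full((2, 2), 0.5)                 # pure state |+><+|
print(von_neumann_entropy(plus))            # 0
```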

7.4 Quantum compression

There are some ways in which the von Neumann entropy is similar to the Shannon entropy and
some ways in which it is very different. One way in which they are quite similar is in their rela-
tionships to notions of compression.

7.4.1 Informal discussion of quantum compression


To explain quantum compression, let us imagine a scenario between Alice and Bob that is similar
to the classical scenario we discussed in relation to classical compression. We imagine that Alice
has a collection of identical registers X1 , X2 , . . . , Xn , whose associated complex Euclidean spaces
X1 , . . . , Xn are equivalent to some space X = C Σ . She wants to compress the contents of these
registers into ⌊αn⌋ qubits, for some choice of α > 0, and to send those qubits to Bob. Bob will then
decompress the qubits to (hopefully) obtain registers X1 , X2 , . . . , Xn with little disturbance to their
initial state.
It will not generally be possible for Alice to do this without some assumption on the reduced
state of X1 , X2 , . . . , Xn . Our assumption will be analogous to the classical case: we assume that
the reduced states of these registers are independent and described by some density operator
ρ ∈ D (X ) (as opposed to a probability vector p ∈ R Σ ). That is, the reduced state of the sequence
will be assumed to be
ρ⊗n ∈ D (X1 ⊗ · · · ⊗ Xn ) .
What we will show is that for large n, compression will be possible for α > S(ρ) and impossible
for α < S(ρ).
To speak more precisely about what is meant by quantum compression and decompression,
let us consider that α > 0 has been fixed, let m = ⌊αn⌋, and let Y1 , . . . , Ym be qubit registers whose
associated spaces Y1 , . . . , Ym are isomorphic to Y = C {0,1} . Alice’s compression mapping will be
an admissible super-operator

Φ ∈ T (X1 ⊗ · · · ⊗ Xn , Y1 ⊗ · · · ⊗ Ym )

and Bob’s decompression will be an admissible super-operator

Ψ ∈ T (Y1 ⊗ · · · ⊗ Ym , X1 ⊗ · · · ⊗ Xn ) .

Now, we need to be careful about how we measure the accuracy of quantum compression
schemes. Our assumption on the reduced state of X1 , X2 , . . . , Xn does not rule out the existence of
other registers that they may be entangled or otherwise correlated with—so let us imagine that
there exists another register Z, and that the initial state of (X1 , X2 , . . . , Xn , Z) is

ξ ∈ D (X1 ⊗ · · · ⊗ Xn ⊗ Z) .

When Alice compresses and Bob decompresses X1 , . . . , Xn , the resulting state of (X1 , X2 , . . . , Xn , Z)
is given by  
ΨΦ ⊗ 1L(Z) (ξ ).

For the compression to be successful, we require that this density operator is close to ξ. This must
in fact hold for all choices of Z and ξ, provided of course that the assumption TrZ (ξ ) = ρ⊗n is
met. There is nothing unreasonable about this assumption—it is the natural quantum analogue to
requiring that g( f ( a1 · · · an )) = a1 · · · an for classical compression.
It might seem complicated that we have to worry about all possible spaces Z and all ξ ∈
D (X1 ⊗ · · · ⊗ Xn ⊗ Z) that satisfy TrZ (ξ ) = ρ⊗n , but in fact it will be simple if we make use of the
notion of channel fidelity.

7.4.2 Quantum channel fidelity


Consider an admissible super-operator Ξ ∈ T (W ) for some complex Euclidean space W , and let
σ ∈ D (W ) be a density operator on this space. Let Z be any complex Euclidean space, and let
u ∈ W ⊗ Z be any purification of σ. Then the fidelity
F(uu^*, (Ξ ⊗ 1_{L(Z)})(uu^*)) = √⟨uu^*, (Ξ ⊗ 1_{L(Z)})(uu^*)⟩

represents one way of measuring the quality with which Ξ acts trivially on the state uu^*.
Now, any purification u ∈ W ⊗ Z of σ must take the form
u = vec(√σ B)

for some operator B ∈ L (Z , W ) satisfying BB∗ = Πim(σ) . Assuming that

Ξ(X) = ∑_{j=1}^{k} A_j X A_j^*

is a Kraus representation of Ξ, it therefore holds that


F(uu^*, (Ξ ⊗ 1_{L(Z)})(uu^*)) = √( ∑_{j=1}^{k} |⟨√σ B, A_j √σ B⟩|² ) = √( ∑_{j=1}^{k} |⟨σ, A_j⟩|² ).

It is not surprising that this quantity is independent of the particular purification of σ that was
chosen. We define the quantity to be the channel fidelity of Ξ with σ:
F_channel(Ξ, σ) = √( ∑_{j=1}^{k} |⟨σ, A_j⟩|² ).

Another expression for the channel fidelity is


F_channel(Ξ, σ) = √⟨vec(σ) vec(σ)^*, J(Ξ)⟩.

The channel fidelity Fchannel (Ξ, σ) places a lower bound on the fidelity of the input and output
of a given channel Ξ provided that it acts on a part of a larger system whose reduced state is σ.
Although the above discussion only considered pure states of a larger system, it is clear from
Uhlmann’s Theorem that the minimal input/output fidelity is achieved on a pure state.
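Concretely, the channel fidelity is easy to evaluate from a Kraus representation. The sketch below (mine; the dephasing channel is an arbitrary example) implements F_channel(Ξ, σ) = √(∑_j |⟨σ, A_j⟩|²).

```python
import numpy as np

def channel_fidelity(kraus_ops, sigma):
    # F_channel(Xi, sigma) = sqrt( sum_j |<sigma, A_j>|^2 ), with <sigma, A_j> = Tr(sigma^* A_j).
    return float(np.sqrt(sum(abs(np.trace(sigma.conj().T @ A)) ** 2 for A in kraus_ops)))

# Example: a qubit dephasing channel applied to the maximally mixed state.
lam = 0.9
kraus = [np.sqrt(lam) * np.eye(2), np.sqrt(1 - lam) * np.diag([1.0, -1.0])]
sigma = np.eye(2) / 2
print(channel_fidelity(kraus, sigma))    # sqrt(0.9), roughly 0.949
```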

7.4.3 Schumacher’s Quantum Source Coding Theorem


We now have the required tools to establish the relationship between the von Neumann entropy
and quantum compression that was discussed earlier in the lecture. Using the same notation that
was introduced above, let us say that a pair of admissible super-operators

Φ ∈ T (X1 ⊗ · · · ⊗ Xn , Y1 ⊗ · · · ⊗ Ym ) ,
Ψ ∈ T (Y1 ⊗ · · · ⊗ Ym , X1 ⊗ · · · ⊗ Xn )

is an (n, α, δ)-quantum compression scheme for ρ ∈ D (X ) if

F_channel(ΨΦ, ρ^{⊗n}) > 1 − δ.




The following theorem, which is the quantum analogue to Shannon’s Source Coding Theorem,
establishes conditions on α for which quantum compression is possible and impossible.

Theorem 7.5 (Schumacher). Let ρ ∈ D (X ) be a density operator, let α > 0 and let δ ∈ (0, 1). Then the
following statements hold.

1. If α > S(ρ), then there exists an (n, α, δ)-quantum compression scheme for ρ for all but finitely many
choices of n ≥ 1.
2. If α < S(ρ), then there exists an (n, α, δ)-quantum compression scheme for ρ for at most finitely many
choices of n ≥ 1.

Proof. Assume first that α > S(ρ). We begin by defining a quantum analogue of the set of typical
strings, which is the typical subspace. This notion is based on a spectral decomposition

ρ= ∑ p(a)ua u∗a .
a∈Σ

As p is a probability vector, we may consider for each n ≥ 1 the set of ε-typical strings Tn,ε ⊆ Σn
for this distribution. In particular, we form the projection onto the typical subspace:

Πn,ε = ∑ u a1 u∗a1 ⊗ · · · ⊗ u an u∗an .


a1 ···an ∈ Tn,ε

Notice that
⟨Π_{n,ε}, ρ^{⊗n}⟩ = ∑_{a_1···a_n ∈ T_{n,ε}} p(a_1) · · · p(a_n),

and therefore
lim_{n→∞} ⟨Π_{n,ε}, ρ^{⊗n}⟩ = 1,

for every choice of ε > 0.


We can now move on to describing a sequence of compression schemes that will suffice to
prove the theorem, provided that α > S(ρ) = H ( p). By Shannon’s Source Coding Theorem (or, to
be more precise, our proof of that theorem) we may assume, for sufficiently large n, that we have
a classical (n, α, ε)-compression scheme ( f , g) for p that satisfies

g( f ( a1 · · · an )) = a1 · · · an

for all a1 · · · an ∈ Tn,ε . Define a linear operator

A ∈ L (X1 ⊗ · · · ⊗ Xn , Y1 ⊗ · · · ⊗ Ym )

as
A= ∑ e f ( a1 ··· an ) u∗a1 ··· an ,
a1 ··· an ∈ Tn,ε

where we have used the shorthand u a1 ··· an = u a1 ⊗ · · · ⊗ u an for each a1 · · · an ∈ Σn . Notice that

A∗ A = Πn,ε .

Now, the super-operator defined by X 7→ AX A∗ is completely positive but generally not ad-
missible. However, it is sub-admissible, by which it is meant that there must exist a completely
positive super-operator Ξ for which

Φ( X ) = AX A∗ + Ξ( X ) (7.2)

is admissible. For instance, we may take

Ξ( X ) = h1 − Πn,ε , X i σ

for some arbitrary choice of σ ∈ D (Y1 ⊗ · · · ⊗ Ym ). Likewise, the super-operator Y 7→ A∗ YA


is also sub-admissible, meaning that there must exist a completely positive super-operator ∆ for
which
Ψ(Y ) = A∗ YA + ∆(Y ) (7.3)
is admissible.
It remains to argue that, for sufficiently large n, that the pair (Φ, Ψ) is an (n, α, δ)-quantum
compression scheme for any constant δ > 0. From the above expressions (7.2) and (7.3) it is clear
that there exists a Kraus representation of ΨΦ having the form
k
(ΨΦ)( X ) = ( A∗ A) X ( A∗ A)∗ + ∑ Bj XB∗j
j =1

for some collection of operators B1 , . . . , Bk that we do not really care about. It follows that

F_channel(ΨΦ, ρ^{⊗n}) ≥ ⟨ρ^{⊗n}, A^*A⟩ = ⟨ρ^{⊗n}, Π_{n,ε}⟩.




This quantity approaches 1 in the limit, as we have observed, and therefore for sufficiently large n
it must hold that (Φ, Ψ) is an (n, α, δ) quantum compression scheme.
Now consider the case where α < S(ρ). Note that if Πn ∈ Pos (X1 ⊗ · · · ⊗ Xn ) is a projection
with rank at most 2n(S(ρ)−ε) for each n ≥ 1, then

lim_{n→∞} ⟨Π_n, ρ^{⊗n}⟩ = 0.    (7.4)

This is because, for any positive semidefinite operator P, the maximum value of hΠ, Pi over all
choices of orthogonal projections Π with rank(Π) ≤ r is precisely the sum of the r largest eigen-
values of P. The eigenvalues of ρ⊗n are the values p( a1 ) · · · p( an ) over all choices of a1 · · · an ∈ Σn ,
so for each n we have
⟨Π_n, ρ^{⊗n}⟩ ≤ ∑_{a_1···a_n ∈ G_n} p(a_1) · · · p(a_n)

for some set Gn of size at most 2n(S(ρ)−ε). At this point the equation (7.4) follows by similar rea-
soning to the proof of Theorem 7.2.
Now let us suppose, for each n ≥ 1 and for m = ⌊αn⌋, that

Φn ∈ T (X1 ⊗ · · · ⊗ Xn , Y1 ⊗ · · · ⊗ Ym ) ,
Ψn ∈ T (Y1 ⊗ · · · ⊗ Ym , X1 ⊗ · · · ⊗ Xn )

are admissible super-operators. Our goal is to prove that (Φn , Ψn ) fails as a quantum compression
scheme for all sufficiently large values of n.

Fix n ≥ 1, and consider Kraus representations

Φ_n(X) = ∑_{j=1}^{k} A_j X A_j^*   and   Ψ_n(X) = ∑_{j=1}^{k} B_j X B_j^*,

where

A1 , . . . , Ak ∈ L (X1 ⊗ · · · ⊗ Xn , Y1 ⊗ · · · ⊗ Ym ) ,
B1 , . . . , Bk ∈ L (Y1 ⊗ · · · ⊗ Ym , X1 ⊗ · · · ⊗ Xn ) ,

and where the assumption that they have the same number of terms is easily made without loss
of generality. Let Π j be the projection onto the range of Bj for each j = 1, . . . k, and note that it
obviously holds that
rank(Π j ) ≤ dim(Y1 ⊗ · · · ⊗ Ym ) = 2⌊αn⌋ .
By the Cauchy-Schwarz inequality, we have
F_channel(Ψ_n Φ_n, ρ^{⊗n})² = ∑_{i,j} |⟨ρ^{⊗n}, B_j A_i⟩|²
= ∑_{i,j} |⟨Π_j √(ρ^{⊗n}), B_j A_i √(ρ^{⊗n})⟩|²
≤ ∑_{i,j} ⟨Π_j, ρ^{⊗n}⟩ Tr(B_j A_i ρ^{⊗n} A_i^* B_j^*).

As Tr(B_j A_i ρ^{⊗n} A_i^* B_j^*) ≥ 0 for each i, j, and

∑_{i,j} Tr(B_j A_i ρ^{⊗n} A_i^* B_j^*) = Tr((Ψ_n Φ_n)(ρ^{⊗n})) = 1,

it follows that
Fchannel (Ψn Φn , ρ⊗n )2 ∈ conv Π j , ρ⊗ n


: j = 1, . . . , k .
As each Π j has rank at most 2⌊αn⌋ , it follows that

lim_{n→∞} F_channel(Ψ_n Φ_n, ρ^{⊗n}) = 0.

So, for all but finitely many choices of n, the pair (Φn , Ψn ) fails to be an (n, α, δ) quantum com-
pression scheme.

Lecture 8
Properties of the von Neumann entropy

In the previous lecture we defined the Shannon and von Neumann entropy functions, and estab-
lished the fundamental connection between these functions and the notion of compression. In this
lecture and the next we will look more closely at the von Neumann entropy in order to establish
some basic properties of this function, as well as an important related function called the quantum
relative entropy.

8.1 Continuity of von Neumann entropy

The first property we will establish about the von Neumann entropy is that it is continuous every-
where on its domain.

8.1.1 Continuity of the Shannon entropy


First, let us define a real valued function η : [0, ∞) → R as follows:

η(λ) = −λ ln(λ) for λ > 0, and η(0) = 0.
This function is continuous everywhere on its domain, and derivatives of all orders exist for all
positive real numbers. In particular we have η ′ (λ) = −(1 + ln(λ)) and η ′′ (λ) = −1/λ. A plot of
the function η is shown in Figure 8.1, and its first derivative η ′ is plotted in Figure 8.2.

Figure 8.1: A plot of the function η(λ) = −λ ln(λ).

The fact that η is continuous on [0, ∞) implies that for every finite, nonempty set Σ the Shannon
entropy is continuous at every point on [0, ∞)Σ , as
H(p) = (1/ln(2)) ∑_{a∈Σ} η(p(a)).


Figure 8.2: A plot of the function η′(λ) = −(1 + ln(λ)).

We are usually only interested in H ( p) for probability vectors p, but of course the function is
defined on vectors having nonnegative real entries.

8.1.2 Continuity of von Neumann entropy


Now, to prove that the von Neumann entropy is continuous we will first prove the following fun-
damental theorem, which establishes one specific sense in which the eigenvalues of a Hermitian
operator vary continuously as a function of the operator.
Theorem 8.1. Let X be a complex Euclidean space and let A, B ∈ Herm (X ) be Hermitian operators.
Then
k λ( A) − λ( B)k1 ≤ k A − B k1 .
To prove this theorem, we need another fact about eigenvalues of operators, but this one we
will take as given. (You can find proofs in several books on matrix analysis.)
Theorem 8.2 (Weyl’s Monotonicity Theorem). Let X be a complex Euclidean space and let A, B ∈
Herm (X ) satisfy A ≤ B. Then λ j ( A) ≤ λ j ( B) for 1 ≤ j ≤ dim(X ).
Proof of Theorem 8.1. Let n = dim(X ). Using the spectral decomposition of A − B, it is possible to
define two positive semidefinite operators P, Q ∈ Pos (X ) such that:
1. PQ = 0, and

2. P − Q = A − B.
(An expression of a given Hermitian operator as P − Q for such a choice of P and Q is called a
Jordan decomposition of that operator.) Notice that k A − B k1 = Tr( P) + Tr( Q).
Now, define one more Hermitian operator

X = P + B = Q + A.

We have X ≥ A, and therefore λ j ( X ) ≥ λ j ( A) for 1 ≤ j ≤ n by Weyl’s Monotonicity Theorem.


Similarly, it holds that λ j ( X ) ≥ λ j ( B) for 1 ≤ j ≤ n = dim(X ). By considering the two possible
cases λ j ( A) ≥ λ j ( B) and λ j ( A) ≤ λ j ( B), we therefore find that

|λ_j(A) − λ_j(B)| ≤ 2λ_j(X) − (λ_j(A) + λ_j(B))

for 1 ≤ j ≤ n. Thus,
‖λ(A) − λ(B)‖_1 = ∑_{j=1}^{n} |λ_j(A) − λ_j(B)| ≤ Tr(2X − A − B) = Tr(P + Q) = ‖A − B‖_1

as required.

With the above fact in hand, it is immediate from the expression S( P) = H (λ( P)) that the
von Neumann entropy is continuous (as it is a composition of two continuous functions).

Theorem 8.3. For every complex Euclidean space X , the von Neumann entropy S( P) is continuous at
every point P ∈ Pos (X ).

8.1.3 Fannes’ Inequality


Let us next prove Fannes Inequality, which may be viewed as a quantitative statement concerning
the continuity of the von Neumann entropy. To begin we need to use calculus to prove a fact about
the function η.

Lemma 8.4. Suppose α and β are real numbers satisfying 0 ≤ α ≤ β ≤ 1 and β − α ≤ 1/2. Then

| η ( β) − η (α)| ≤ η ( β − α).

Proof. Consider the function η ′ (λ) = −(1 + ln(λ)), which is plotted in Figure 8.2. Given that η ′ is
monotonically decreasing on its domain (0, ∞), it holds that the function
f(λ) = ∫_λ^{λ+γ} η′(t) dt = η(λ + γ) − η(λ)

is monotonically non-increasing for any choice of γ ≥ 0. This means that the maximum value of
| f(λ)| over the range λ ∈ [0, 1 − γ] must occur at either λ = 0 or λ = 1 − γ, and so for λ in this
range we have
| η (λ + γ) − η (λ)| ≤ max{η (γ), η (1 − γ)}.
Here we have used the fact that η (1) = 0 and η (λ) ≥ 0 for λ ∈ [0, 1].
To complete the proof it suffices to prove that η (γ) ≥ η (1 − γ) for γ ∈ [0, 1/2]. This claim
is certainly supported by the plot in Figure 8.1, but we can easily prove it analytically. Define a
function g(λ) = η (λ) − η (1 − λ). We see that g happens to have zeroes at λ = 0 and λ = 1/2, and
were there an additional zero λ of g in the range (0, 1/2), then we would have two distinct values
δ1 , δ2 ∈ (0, 1/2) for which g′ (δ1 ) = g′ (δ2 ) = 0 by the Mean Value Theorem. This, however, is
in contradiction with the fact that the second derivative g′′(λ) = 1/(1 − λ) − 1/λ of g is strictly negative
in the range (0, 1/2). As g(1/4) > 0, for instance, we have that g(λ) ≥ 0 for λ ∈ [0, 1/2] as
required.

Theorem 8.5 (Fannes Inequality). Let X be a complex Euclidean space and let n = dim(X ). Then for
all density operators ρ, ξ ∈ D (X ) such that k ρ − ξ k1 ≤ 1/e it holds that

|S(ρ) − S(ξ)| ≤ log(n) ‖ρ − ξ‖_1 + (1/ln(2)) η(‖ρ − ξ‖_1).

Proof. Define
ε i = | λi (ρ) − λi (ξ )|
and let ε = ε 1 + · · · + ε n . Note that ε i ≤ k ρ − ξ k1 ≤ 1/e < 1/2 for each i, and therefore

|S(ρ) − S(ξ)| = (1/ln(2)) |∑_{i=1}^{n} (η(λ_i(ρ)) − η(λ_i(ξ)))| ≤ (1/ln(2)) ∑_{i=1}^{n} η(ε_i)

by Lemma 8.4.
For any positive α and β we have βη (α/β) = η (α) + α ln( β), so
(1/ln(2)) ∑_{i=1}^{n} η(ε_i) = (1/ln(2)) ∑_{i=1}^{n} (ε η(ε_i/ε) − ε_i ln(ε)) = (ε/ln(2)) ∑_{i=1}^{n} η(ε_i/ε) + (1/ln(2)) η(ε).

Because (ε 1 /ε, . . . , ε n /ε) is a probability vector this gives


|S(ρ) − S(ξ)| ≤ ε log(n) + (1/ln(2)) η(ε).

We have that ε ≤ k ρ − ξ k1 , and that η is monotone increasing on the interval [0, 1/e], so
|S(ρ) − S(ξ)| ≤ log(n) ‖ρ − ξ‖_1 + (1/ln(2)) η(‖ρ − ξ‖_1),
which completes the proof.

8.2 Quantum relative entropy

Next we will introduce a new function, which is indispensable as a tool for studying the von Neu-
mann entropy: the quantum relative entropy (or relative von Neumann entropy).
For two positive definite operators P, Q ∈ Pd (X ) we define the quantum relative entropy of
P with Q as follows:
S( PkQ) = Tr( P log( P)) − Tr( P log( Q)). (8.1)
We usually only care about the quantum relative entropy for density operators, but there is noth-
ing that prevents us from allowing the definition to hold for all positive definite operators.
We may also define the quantum relative entropy for positive semidefinite operators that are
not positive definite, provided we are willing to have an extended real-valued function. Specifi-
cally, when
ker(Q) is not orthogonal to im(P)
(or equivalently when im(P) ⊄ im(Q)), we define S(P‖Q) = ∞. Otherwise, there is no difficulty
in evaluating the above expression (8.1) by following the usual convention of setting 0 log(0) = 0.
Nevertheless, it will typically not be necessary for us to give up the convenience of restricting our
attention to positive definite operators. This is because we already know that the von Neumann
entropy function is continuous, and we will mostly use the relative von Neumann entropy in this
course to establish facts about the von Neumann entropy.
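For positive definite arguments, the definition (8.1) can be evaluated directly from spectral decompositions, as in the following sketch (my own; it assumes base-2 logarithms, as everywhere in these notes).

```python
import numpy as np

def _log2_herm(A):
    # Base-2 logarithm of a positive definite Hermitian operator via its spectral decomposition.
    w, V = np.linalg.eigh(A)
    return (V * np.log2(w)) @ V.conj().T

def quantum_relative_entropy(P, Q):
    # S(P||Q) = Tr(P log P) - Tr(P log Q), for positive definite P and Q.
    return float(np.trace(P @ _log2_herm(P) - P @ _log2_herm(Q)).real)

rho = np.diag([0.75, 0.25])
xi = np.eye(2) / 2
print(quantum_relative_entropy(rho, xi))    # = 1 - H(3/4, 1/4), roughly 0.189
print(quantum_relative_entropy(rho, rho))   # 0
```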
The quantum relative entropy S( PkQ) can be negative for some choices of P and Q, but not
when they are density operators (or more generally when Tr( P) = Tr( Q)). The following theorem
establishes that this is so, and in fact that the value of the quantum relative entropy of two density
operators is zero if and only if they are equal.

Theorem 8.6. Let ρ, ξ ∈ D (X ) be positive definite density operators. Then


S(ρ‖ξ) ≥ (1/(2 ln(2))) ‖ρ − ξ‖_2².

Proof. Let us first note that for every choice of α, β ∈ (0, 1) we have

α ln(α) − α ln(β) = (α − β)η′(β) + η(β) − η(α) + α − β.

Moreover, by Taylor’s Theorem, we have that


(α − β)η′(β) + η(β) − η(α) = −(1/2) η′′(γ)(α − β)²
for some choice of γ lying between α and β.
Now, let n = dim(X ) and let
n n
ρ= ∑ pi xi xi∗ and ξ= ∑ qi yi y∗i
i =1 i =1

be spectral decompositions of ρ and ξ. The assumption that ρ and ξ are positive definite density
operators implies that pi and qi are positive for 1 ≤ i ≤ n. Applying the facts observed above, we
have that
S(ρ‖ξ) = (1/ln(2)) ∑_{1≤i,j≤n} |⟨x_i, y_j⟩|² (p_i ln(p_i) − p_i ln(q_j))
= (1/ln(2)) ∑_{1≤i,j≤n} |⟨x_i, y_j⟩|² ( p_i − q_j − (1/2) η′′(γ_{ij}) (p_i − q_j)² )

for some choice of real numbers {γij }, where each γij lies between pi and q j . In particular, this
means that 0 < γij ≤ 1, implying that −η ′′ (γij ) ≥ 1, for each choice of i and j. Consequently we
have
1 x i , y j 2 ( p i − q j )2 = 1 k ρ − ξ k 2



S ( ρkξ ) ≥ 2
2 ln 2 1≤i,j≤n 2 ln 2

as required.

The following corollary represents a simple application of this fact. (We could just as easily
prove it using analogous facts about the Shannon entropy, but the proof is essentially the same.)
Corollary 8.7. Let X be a complex Euclidean space and let n = dim(X ). Then 0 ≤ S(ρ) ≤ log(n) for all
ρ ∈ D (X ). Furthermore, ρ = 1/n is the unique density operator in D (X ) having von Neumann entropy
equal to log(n).
Proof. It is obvious that 0 ≤ S(ρ) for every density operator ρ. To prove the upper bound, let us
assume ρ is positive definite, and consider the relative entropy S(ρk1/n). We have

0 ≤ S(ρk1/n) = −S(ρ) − log(1/n) Tr(ρ) = −S(ρ) + log(n).

Therefore S(ρ) ≤ log(n), and when ρ is not equal to 1/n the inequality becomes strict. For positive
semidefinite ρ, the result follows from continuity.

8.3 Subadditivity and Concavity

Now let us prove two simple properties of the von Neumann entropy: subadditivity and concavity.
These properties also hold for the Shannon entropy—and while it is not difficult to prove them
directly for the Shannon entropy, we get the properties for free once they are established for the
von Neumann entropy.
When we refer to the von Neumann entropy of some collection of registers, we mean the
von Neumann entropy of the reduced state of those registers at some instant. For example, if X
and Y are registers and ρ ∈ D (X ⊗ Y ) is the reduced state of the pair (X, Y ) at some instant, then

S(X, Y ) = S(ρ), S ( X ) = S ( ρX ) , and S ( Y ) = S ( ρY ) ,

where, in accordance with conventions we have already discussed, we have written ρX = TrY (ρ)
and ρY = TrX (ρ). We often state properties of the von Neumann entropy in terms of registers,
with the understanding that whatever statement is being discussed holds for all or some speci-
fied subset of the possible states of these registers. A similar convention is used for the Shannon
entropy (for classical registers).

Theorem 8.8 (Subadditivity of von Neumann entropy). Let X and Y be quantum registers. Then for
every state of the pair (X, Y ) we have

S(X, Y ) ≤ S(X) + S(Y ).

Proof. Assume that the reduced state of the pair (X, Y ) is ρ ∈ D (X ⊗ Y ). We will prove the
theorem for ρ positive definite, from which the general case follows by continuity.
Consider the relative von Neumann entropy S(ρXY kρX ⊗ ρY ). Using the formula

log( P ⊗ Q) = log( P) ⊗ 1 + 1 ⊗ log( Q)

we find that
S(ρXY kρX ⊗ ρY ) = −S(ρXY ) + S(ρX ) + S(ρY ).
By Theorem 8.6 we have S(ρXY kρX ⊗ ρY ) ≥ 0, which completes the proof.

In the next lecture we will prove a much stronger version of subadditivity, which is aptly
named: strong subadditivity. It will imply the truth of the previous theorem, but it is instructive to
compare the very easy proof above with the much more difficult proof of strong subadditivity.
Subadditivity also holds for the Shannon entropy:

H (X, Y ) ≤ H (X) + H (Y )

for any choice of classical registers X and Y. This is simply a special case of the above theorem,
where the density operator ρ is diagonal with respect to the standard basis of X ⊗ Y .
Subadditivity implies that the von Neumann entropy is concave, as is established by the proof
of the following theorem.

Theorem 8.9 (Concavity of von Neumann entropy). Let ρ, ξ ∈ D (X ) and λ ∈ [0, 1]. Then

S(λρ + (1 − λ)ξ ) ≥ λS(ρ) + (1 − λ)S(ξ ).



Proof. Let Y be a register corresponding to a single qubit, so that its associated space is Y = C {0,1} .
Consider the density operator

σ = λρ ⊗ E0,0 + (1 − λ)ξ ⊗ E1,1 ,

and suppose that the reduced state of the registers (X, Y ) is described by σ. We have

S(X, Y ) = λS(ρ) + (1 − λ)S(ξ ) + H (λ),

which is easily established by considering spectral decompositions of ρ and ξ. (Here we have


referred to the binary entropy function H (λ) = −λ log(λ) − (1 − λ) log(1 − λ).) Furthermore, we
have
S(X) = S(λρ + (1 − λ)ξ )
and
S (Y ) = H ( λ ).
It follows by subadditivity that

λS(ρ) + (1 − λ)S(ξ ) + H (λ) ≤ S(λρ + (1 − λ)ξ ) + H (λ)

which proves the theorem.

Concavity also holds for the Shannon entropy as a simple consequence of this theorem, as we
may take ρ and ξ to be diagonal with respect to the standard basis.

8.4 Conditional entropy and mutual information

Let us finish off the lecture by defining a few more quantities associated with the von Neumann
entropy. We will not be able to say very much about these quantities until after we prove strong
subadditivity in the next lecture.
Classically we define the conditional Shannon entropy as follows for two classical registers X
and Y:
H(X|Y) = ∑_a Pr[Y = a] H(X|Y = a).

This quantity represents the expected value of the entropy of X given that you know the value
of Y. It is not hard to prove that

H (X|Y ) = H (X, Y ) − H (Y ).

It follows from subadditivity that


H ( X |Y ) ≤ H ( X ) .
The intuition is that learning the value of Y can never increase your uncertainty about X.
In the quantum setting the first definition does not really make sense, so we use the second
fact as our definition—the conditional von Neumann entropy of X given Y is

S(X|Y ) = S(X, Y ) − S(Y ).

Now we start to see some strangeness: we can have S(Y ) > S(X, Y ), as we will if (X, Y ) is in a
pure, non-product state. This means that S(X|Y ) can be negative, but such is life.
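As an illustration of this strangeness (a sketch assuming NumPy; not part of the notes), the following computes S(X|Y) = S(X, Y) − S(Y) for the pure state (|00⟩ + |11⟩)/√2 and obtains the value −1:

    import numpy as np

    def entropy(rho):
        w = np.linalg.eigvalsh(rho)
        w = w[w > 1e-12]
        return float(-np.sum(w * np.log2(w)))

    # Maximally entangled two-qubit state (|00> + |11>)/sqrt(2).
    psi = np.zeros(4, dtype=complex)
    psi[0] = psi[3] = 1 / np.sqrt(2)
    rho_xy = np.outer(psi, psi.conj())

    # Reduced state of Y: trace out the first qubit.
    rho_y = np.trace(rho_xy.reshape(2, 2, 2, 2), axis1=0, axis2=2)

    print(entropy(rho_xy) - entropy(rho_y))   # approximately -1.0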

Next, the (classical) mutual information between two classical registers X and Y is defined as

I (X : Y ) = H (X) + H (Y ) − H (X, Y ).

This can alternately be expressed as

I ( X : Y ) = H ( Y ) − H ( Y |X ) = H ( X ) − H ( X |Y ) .

We view this quantity as representing the amount of information in X about Y and vice versa. The
quantum mutual information is defined similarly:

S(X : Y ) = S(X) + S(Y ) − S(X, Y ).

At least we know from subadditivity that this quantity is always nonnegative. We will, however,
need to further develop our understanding before we can safely associate any intuition with this
quantity.

Lecture 9
Strong subadditivity of the von Neumann entropy

In this lecture we will prove a fundamental fact about the von Neumann entropy, known as strong
subadditivity. Let us begin with a precise statement of this fact.
Theorem 9.1 (Strong subadditivity of von Neumann entropy). Let X, Y, and Z be quantum registers.
Then for every reduced state ρ ∈ D (X ⊗ Y ⊗ Z) of these registers we have

S(X, Y, Z) + S(Z) ≤ S(X, Z) + S(Y, Z).

There are multiple known ways to prove this theorem. The approach we will take is to first estab-
lish a property of the quantum relative entropy, known as joint convexity. Once we establish this
property, it will be straightforward to prove strong subadditivity.

9.1 Joint convexity of the quantum relative entropy

We will now prove that the quantum relative entropy is jointly convex, as is stated by the following
theorem.
Theorem 9.2 (Joint convexity of the quantum relative entropy). Let X be a complex Euclidean space,
let ρ1 , ρ2 , ξ 1 , ξ 2 ∈ D (X ) be positive definite density operators, and let λ ∈ [0, 1]. Then

S(λρ1 + (1 − λ)ρ2 kλξ 1 + (1 − λ)ξ 2 ) ≤ λ S(ρ1 kξ 1 ) + (1 − λ) S(ρ2 kξ 2 ).

The proof of Theorem 9.2 that we will study is fairly standard and has the nice property of being
elementary. It is, however, relatively complicated, so we will need to break it up into a few pieces.
The first step is to consider, for any choice of positive definite density operators ρ, ξ ∈ D (X ),
a real-valued function f_{ρ,ξ} : R → R defined as

f_{ρ,ξ}(α) = Tr(ξ^α ρ^{1−α})

for all α ∈ R. As ρ and ξ are both positive definite, we have that the function f ρ,ξ is well defined,
and is in fact differentiable (and therefore continuous) everywhere on its domain. In particular,
we have

f′_{ρ,ξ}(α) = Tr[ξ^α ρ^{1−α} (ln(ξ) − ln(ρ))].    (9.1)
To verify that this expression is correct, we may consider spectral decompositions
ρ = ∑_{i=1}^n p_i x_i x_i^∗   and   ξ = ∑_{i=1}^n q_i y_i y_i^∗.

We have

Tr(ξ^α ρ^{1−α}) = ∑_{1≤i,j≤n} q_j^α p_i^{1−α} |⟨x_i, y_j⟩|²


and so

f′_{ρ,ξ}(α) = ∑_{1≤i,j≤n} (ln(q_j) − ln(p_i)) q_j^α p_i^{1−α} |⟨x_i, y_j⟩|² = Tr[ξ^α ρ^{1−α} (ln(ξ) − ln(ρ))]

as claimed.
The main reason we are interested in the function f ρ,ξ is that its derivative has an interesting
value at 0:
f′_{ρ,ξ}(0) = −S(ρ‖ξ)/log(e).
We may therefore write

S(ρ‖ξ) = −log(e) f′_{ρ,ξ}(0) = −log(e) lim_{α↓0} [Tr(ξ^α ρ^{1−α}) − 1]/α,

where the second equality follows by substituting f_{ρ,ξ}(0) = 1 into the definition of the derivative.
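As an informal numerical illustration (a sketch assuming NumPy; not part of the notes), the limit expression above can be evaluated at a small value of α and compared against a direct computation of S(ρ‖ξ):

    import numpy as np

    def random_density(d):
        g = np.random.randn(d, d) + 1j * np.random.randn(d, d)
        rho = g @ g.conj().T
        return rho / np.trace(rho).real

    def herm_power(a, p):
        # a^p for a positive definite Hermitian matrix a.
        w, v = np.linalg.eigh(a)
        return (v * w**p) @ v.conj().T

    def rel_entropy(rho, xi):
        # S(rho||xi) = Tr(rho log2 rho) - Tr(rho log2 xi), for positive definite inputs.
        wr, vr = np.linalg.eigh(rho)
        wx, vx = np.linalg.eigh(xi)
        log_rho = (vr * np.log2(wr)) @ vr.conj().T
        log_xi = (vx * np.log2(wx)) @ vx.conj().T
        return np.trace(rho @ (log_rho - log_xi)).real

    rho, xi = random_density(3), random_density(3)
    alpha = 1e-6
    approx = -np.log2(np.e) * (np.trace(herm_power(xi, alpha) @ herm_power(rho, 1 - alpha)).real - 1) / alpha
    print(approx, "~", rel_entropy(rho, xi))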
Now consider the following lemma that concerns the relationship among the functions f ρ,ξ for
various choices of ρ and ξ.
Lemma 9.3. Let ξ 1 , ξ 2 , ρ1 , ρ2 ∈ Pd (X ) be positive definite operators. Then for every choice of α, λ ∈ [0, 1]
we have

Tr[(λξ_1 + (1−λ)ξ_2)^α (λρ_1 + (1−λ)ρ_2)^{1−α}] ≥ λ Tr(ξ_1^α ρ_1^{1−α}) + (1−λ) Tr(ξ_2^α ρ_2^{1−α}).

(The lemma happens to be true for all positive definite operators ρ1 , ρ2 , ξ 1 , and ξ 2 , but we will
really only need it for density operators.)
Before proving this lemma, let us note that it implies Theorem 9.2.

Proof of Theorem 9.2 (assuming Lemma 9.3). We have

S(λρ_1 + (1−λ)ρ_2 ‖ λξ_1 + (1−λ)ξ_2)
  = −log(e) lim_{α↓0} [Tr((λξ_1 + (1−λ)ξ_2)^α (λρ_1 + (1−λ)ρ_2)^{1−α}) − 1]/α
  ≤ −log(e) lim_{α↓0} [λ Tr(ξ_1^α ρ_1^{1−α}) + (1−λ) Tr(ξ_2^α ρ_2^{1−α}) − 1]/α
  = −log(e) lim_{α↓0} [λ (Tr(ξ_1^α ρ_1^{1−α}) − 1)/α + (1−λ)(Tr(ξ_2^α ρ_2^{1−α}) − 1)/α]
  = λ S(ρ_1‖ξ_1) + (1−λ) S(ρ_2‖ξ_2)

as required.

Our goal has therefore shifted to proving Lemma 9.3. To prove Lemma 9.3 we require an-
other fact that is stated in the theorem that follows. It is equivalent to a theorem known as Lieb’s
Concavity Theorem, although it is stated in a somewhat different form than is most typical.
Theorem 9.4. Let A1 , A2 ∈ Pd (X ) and B1 , B2 ∈ Pd (Y ) be positive definite operators. Then for every
choice of α ∈ [0, 1] we have

(A_1 + A_2)^α ⊗ (B_1 + B_2)^{1−α} ≥ A_1^α ⊗ B_1^{1−α} + A_2^α ⊗ B_2^{1−α}.



Once again, before proving this theorem, let us note that it implies the main result we are
working toward.
Proof of Lemma 9.3 (assuming Theorem 9.4). The substitutions
A_1 = λξ_1,   B_1 = λρ_1^T,   A_2 = (1−λ)ξ_2,   B_2 = (1−λ)ρ_2^T,

taken in Theorem 9.4 imply the operator inequality

(λξ_1 + (1−λ)ξ_2)^α ⊗ (λρ_1^T + (1−λ)ρ_2^T)^{1−α}
  ≥ λ ξ_1^α ⊗ (ρ_1^T)^{1−α} + (1−λ) ξ_2^α ⊗ (ρ_2^T)^{1−α}
  = λ ξ_1^α ⊗ (ρ_1^{1−α})^T + (1−λ) ξ_2^α ⊗ (ρ_2^{1−α})^T.

The identity vec(1)^∗ (X ⊗ Y^T) vec(1) = Tr(XY) then gives the desired result.
Now, toward the proof of Theorem 9.4, we require the following lemma.
Lemma 9.5. Let P1 , P2 , Q1 , Q2 , R1 , R2 ∈ Pd (X ) be positive definite operators that satisfy these conditions:
1. [ P1 , P2 ] = [Q1 , Q2 ] = [ R1 , R2 ] = 0,
2. P_1^2 ≥ Q_1^2 + R_1^2, and
3. P_2^2 ≥ Q_2^2 + R_2^2.
Then it holds that P1 P2 ≥ Q1 Q2 + R1 R2 .
Remark. Notice that in the conclusion of the lemma, P1 P2 is positive definite given the assumption
that [ P1 , P2 ] = 0, and likewise for Q1 Q2 and R1 R2 .
Proof. The conclusion of the lemma is equivalent to X ≤ 1 for

X = P_1^{−1/2} P_2^{−1/2} (Q_1 Q_2 + R_1 R_2) P_2^{−1/2} P_1^{−1/2}.
This in turn is equivalent to the condition that every eigenvalue of X is at most 1. To this end, let
us suppose that u is any eigenvector of X whose corresponding eigenvalue is λ.
As P_1 and P_2 are invertible and u is nonzero, we have that P_1^{−1/2} P_2^{1/2} u is nonzero as well, and therefore we may define a unit vector v as follows:

v = P_1^{−1/2} P_2^{1/2} u / ‖P_1^{−1/2} P_2^{1/2} u‖.
We now have

v^∗ P_1^{−1} (Q_1 Q_2 + R_1 R_2) P_2^{−1} v = λ.
Finally, using the fact that v is a unit vector, we can establish the required bound on λ as
follows:
λ = v^∗ P_1^{−1} (Q_1 Q_2 + R_1 R_2) P_2^{−1} v
  ≤ |v^∗ P_1^{−1} Q_1 Q_2 P_2^{−1} v| + |v^∗ P_1^{−1} R_1 R_2 P_2^{−1} v|
  ≤ √(v^∗ P_1^{−1} Q_1^2 P_1^{−1} v) √(v^∗ P_2^{−1} Q_2^2 P_2^{−1} v) + √(v^∗ P_1^{−1} R_1^2 P_1^{−1} v) √(v^∗ P_2^{−1} R_2^2 P_2^{−1} v)
  ≤ √(v^∗ P_1^{−1} (Q_1^2 + R_1^2) P_1^{−1} v) √(v^∗ P_2^{−1} (Q_2^2 + R_2^2) P_2^{−1} v)
  ≤ 1.

Here we have used the triangle inequality once and the Cauchy-Schwarz inequality twice, along
with the given assumptions on the operators.

Finally, we can finish off the proof of Theorem 9.2 by proving Theorem 9.4.

Proof of Theorem 9.4. Let us define a function f : [0, 1] → Herm (X ⊗ Y ) as


 
f(α) = (A_1 + A_2)^α ⊗ (B_1 + B_2)^{1−α} − (A_1^α ⊗ B_1^{1−α} + A_2^α ⊗ B_2^{1−α}),

and let K = {α ∈ [0, 1] : f (α) ∈ Pos (X ⊗ Y )} be the pre-image under f of the set Pos (X ⊗ Y ).
Notice that K is a closed set, given that f is continuous and Pos (X ⊗ Y ) is closed. Our goal is to
prove that K = [0, 1].
It is obvious that 0 and 1 are elements of K. For an arbitrary choice of α, β ∈ K, consider the
following operators:

P_1 = (A_1 + A_2)^{α/2} ⊗ (B_1 + B_2)^{(1−α)/2},
P_2 = (A_1 + A_2)^{β/2} ⊗ (B_1 + B_2)^{(1−β)/2},
Q_1 = A_1^{α/2} ⊗ B_1^{(1−α)/2},
Q_2 = A_1^{β/2} ⊗ B_1^{(1−β)/2},
R_1 = A_2^{α/2} ⊗ B_2^{(1−α)/2},
R_2 = A_2^{β/2} ⊗ B_2^{(1−β)/2}.

The conditions [ P1 , P2 ] = [Q1 , Q2 ] = [ R1 , R2 ] = 0 are immediate, while the assumptions that α ∈ K


and β ∈ K correspond to the conditions P12 ≥ Q21 + R21 and P22 ≥ Q22 + R22 , respectively. We may
therefore apply Lemma 9.5 to obtain

(A_1 + A_2)^γ ⊗ (B_1 + B_2)^{1−γ} ≥ A_1^γ ⊗ B_1^{1−γ} + A_2^γ ⊗ B_2^{1−γ}

for γ = (α + β)/2, which implies that (α + β)/2 ∈ K.


Now, given that 0 ∈ K, 1 ∈ K, and (α + β)/2 ∈ K for any choice of α, β ∈ K, we have that K is dense in [0, 1]. In particular, K contains every number of the form m/2^n for n and m nonnegative integers with m ≤ 2^n. As K is closed, this implies that K = [0, 1] as required.

9.2 Monotonicity and strong subadditivity

We have worked hard to prove that the quantum relative entropy is jointly convex, and now it is
time to reap the rewards. Let us begin by proving the following simple theorem, which states that
mixed unitary super-operators cannot increase the relative entropy of two density operators.
Theorem 9.6. Let X be a complex Euclidean space and let Φ ∈ T (X ) be a mixed unitary super-operator.
Then for any choice of positive definite density operators ρ, ξ ∈ D (X ) we have

S(Φ(ρ)kΦ(ξ )) ≤ S(ρkξ ).

Proof. As Φ is mixed unitary, we may write

Φ(X) = ∑_{j=1}^m p_j U_j X U_j^∗

for a probability vector (p_1, . . . , p_m) and unitary operators U_1, . . . , U_m ∈ U(X). By Theorem 9.2 we have

S(Φ(ρ) ‖ Φ(ξ)) = S(∑_{j=1}^m p_j U_j ρ U_j^∗ ‖ ∑_{j=1}^m p_j U_j ξ U_j^∗) ≤ ∑_{j=1}^m p_j S(U_j ρ U_j^∗ ‖ U_j ξ U_j^∗).

The quantum relative entropy is clearly unitarily invariant, meaning

S(ρ‖ξ) = S(UρU^∗ ‖ UξU^∗)

for all U ∈ U(X). This implies that

∑_{j=1}^m p_j S(U_j ρ U_j^∗ ‖ U_j ξ U_j^∗) = S(ρ‖ξ),

and therefore completes the proof.

Notice that for any choice of positive definite density operators ρ1 , ρ2 , ξ 1 , ξ 2 ∈ D (X ) we have

S ( ρ1 ⊗ ρ2 k ξ 1 ⊗ ξ 2 ) = S ( ρ1 k ξ 1 ) + S ( ρ2 k ξ 2 ) .

This fact follows easily from the identity log( P ⊗ Q) = log( P) ⊗ 1 + 1 ⊗ log( Q), which is valid
for all P, Q ∈ Pd (X ). Combining this observation with the previous theorem yields the following
corollary.

Corollary 9.7 (Monotonicity of the quantum relative entropy). Let X and Y be complex Euclidean
spaces. Then for any choice of positive definite density operators ρ, ξ ∈ D (X ⊗ Y ) we have

S(TrY (ρ)k TrY (ξ )) ≤ S(ρkξ ).

Proof. The completely depolarizing operation Ω ∈ T(Y) is mixed unitary, as we proved in Lecture 6, which implies that 1_{L(X)} ⊗ Ω is mixed unitary as well. For every σ ∈ D(X ⊗ Y) we have

(1_{L(X)} ⊗ Ω)(σ) = Tr_Y(σ) ⊗ 1_Y/m
where m = dim(Y). Therefore we have

S(Tr_Y(ρ) ‖ Tr_Y(ξ)) = S(Tr_Y(ρ) ⊗ 1_Y/m ‖ Tr_Y(ξ) ⊗ 1_Y/m)
  = S((1_{L(X)} ⊗ Ω)(ρ) ‖ (1_{L(X)} ⊗ Ω)(ξ))
  ≤ S(ρ‖ξ)

as required.
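The following sketch (assuming NumPy; an illustration only, not part of the notes) compares S(Tr_Y(ρ) ‖ Tr_Y(ξ)) with S(ρ‖ξ) for randomly chosen two-qubit density operators, as Corollary 9.7 predicts:

    import numpy as np

    def random_density(d):
        g = np.random.randn(d, d) + 1j * np.random.randn(d, d)
        rho = g @ g.conj().T
        return rho / np.trace(rho).real

    def rel_entropy(rho, xi):
        # Quantum relative entropy in bits, for positive definite inputs.
        wr, vr = np.linalg.eigh(rho)
        wx, vx = np.linalg.eigh(xi)
        log_rho = (vr * np.log2(wr)) @ vr.conj().T
        log_xi = (vx * np.log2(wx)) @ vx.conj().T
        return np.trace(rho @ (log_rho - log_xi)).real

    def trace_out_Y(op):
        # Partial trace over the second qubit of a two-qubit operator.
        return np.trace(op.reshape(2, 2, 2, 2), axis1=1, axis2=3)

    rho, xi = random_density(4), random_density(4)
    print(rel_entropy(trace_out_Y(rho), trace_out_Y(xi)), "<=", rel_entropy(rho, xi))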

Finally we are prepared to prove strong subadditivity, which turns out to be very easy now
that we have established that the quantum relative entropy is monotonic.

Proof of Theorem 9.1. We need to prove that the inequality

S(ρ^{XYZ}) + S(ρ^{Z}) ≤ S(ρ^{XZ}) + S(ρ^{YZ})

holds for all choices of ρ ∈ D (X ⊗ Y ⊗ Z). It suffices to prove this inequality for all positive
definite ρ, as it then follows for arbitrary density operators ρ by the continuity of the von Neumann
entropy.
Let n = dim(X), and observe that the following two identities hold: the first is

S(ρ^{XYZ} ‖ (1_X/n) ⊗ ρ^{YZ}) = −S(ρ^{XYZ}) + S(ρ^{YZ}) + log(n),

and the second is

S(ρ^{XZ} ‖ (1_X/n) ⊗ ρ^{Z}) = −S(ρ^{XZ}) + S(ρ^{Z}) + log(n).

By Corollary 9.7 we have

S(ρ^{XZ} ‖ (1_X/n) ⊗ ρ^{Z}) ≤ S(ρ^{XYZ} ‖ (1_X/n) ⊗ ρ^{YZ}),

and therefore

S(ρ^{XYZ}) + S(ρ^{Z}) ≤ S(ρ^{XZ}) + S(ρ^{YZ})
as required.
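Strong subadditivity is likewise easy to test numerically; the following sketch (assuming NumPy; not part of the notes) draws a random three-qubit density operator and checks that S(X, Y, Z) + S(Z) ≤ S(X, Z) + S(Y, Z):

    import numpy as np

    def random_density(d):
        g = np.random.randn(d, d) + 1j * np.random.randn(d, d)
        rho = g @ g.conj().T
        return rho / np.trace(rho).real

    def entropy(rho):
        w = np.linalg.eigvalsh(rho)
        w = w[w > 1e-12]
        return float(-np.sum(w * np.log2(w)))

    def reduce_to(rho, keep):
        # Reduced state of the qubits listed in `keep` (a subset of {0, 1, 2}).
        r = rho.reshape([2] * 6)
        for q in (2, 1, 0):
            if q not in keep:
                r = np.trace(r, axis1=q, axis2=q + r.ndim // 2)
        d = 2 ** len(keep)
        return r.reshape(d, d)

    rho = random_density(8)
    lhs = entropy(rho) + entropy(reduce_to(rho, [2]))
    rhs = entropy(reduce_to(rho, [0, 2])) + entropy(reduce_to(rho, [1, 2]))
    print(lhs, "<=", rhs)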

To conclude the lecture, let us prove a statement about quantum mutual information that is
equivalent to strong subadditivity.

Corollary 9.8. Let X, Y, and Z be registers. Then for every reduced state ρ ∈ D (X ⊗ Y ⊗ Z) of these
registers we have
S(X : Y ) ≤ S(X : Y, Z).

Proof. By strong subadditivity we have

S(X, Y, Z) + S(Y ) ≤ S(X, Y ) + S(Y, Z),

which is equivalent to
S(Y ) − S(X, Y ) ≤ S(Y, Z) − S(X, Y, Z).
Adding S(X) to both sides gives

S(X) + S(Y ) − S(X, Y ) ≤ S(X) + S(Y, Z) − S(X, Y, Z).

This inequality is equivalent to


S(X : Y ) ≤ S(X : Y, Z),
which establishes the claim.

Lecture 10
Holevo’s Theorem and Nayak’s Bound

In this lecture we will prove Holevo’s Theorem. This is a famous theorem in quantum information
theory, which is often informally summarized by a statement along the lines of this:

It is not possible to communicate more than n classical bits of information by the transmission
of n qubits alone.

Although this is an implication of Holevo’s Theorem, the theorem itself says more, and is stated
in more precise terms that bear no resemblance to the above statement. After stating and proving
Holevo’s Theorem, we will discuss an interesting application of this theorem to the notion of a
quantum random access code. In particular, we will prove Nayak’s Bound, which demonstrates that
quantum random access codes are surprisingly limited in power.

10.1 Holevo’s Theorem

We will first discuss some of the concepts that Holevo’s Theorem concerns, and then state and
prove the theorem itself. Although the theorem is difficult to prove from first principles, it turns
out that there is a very simple proof that makes use of the strong subadditivity of the von Neu-
mann entropy. Having proved strong subadditivity in the previous lecture, we will naturally opt
for this simple proof.

10.1.1 Mutual Information


Recall that if A and B are classical registers, whose values are distributed in some particular way,
then the mutual information between A and B for this distribution is defined as

I(A : B) = H(A) + H(B) − H(A, B)
         = H(A) − H(A|B)
         = H(B) − H(B|A).

The usual interpretation of this quantity is that it describes how many bits of information about B
are, on average, revealed by the value of A; or equivalently, given that the quantity is symmetric in
A and B, how many bits of information about A are revealed by the value of B. Like all quantities
involving the Shannon entropy, this interpretation should be understood in an asymptotic sense.
To illustrate the intuition behind this interpretation, let us suppose that A and B are distributed
in some particular way, and Alice looks at the value of A. As Bob does not know what value Alice
sees, he has H (A) bits of uncertainty about her value. After sampling B, Bob’s average uncertainty
about Alice’s value becomes H (A|B), which is always at most H (A) and is less assuming that A
and B are correlated. Therefore, by sampling B, Bob has decreased his uncertainty of Alice’s value
by I (A : B) bits.


In analogy to the above formula we have defined the quantum mutual information between two
registers X and Y as
S(X : Y) = S(X) + S(Y) − S(X, Y).
Although Holevo’s Theorem does not directly concern the quantum mutual information, it is nev-
ertheless related and indirectly appears in the proof.

10.1.2 Accessible information


Imagine that Alice wants to communicate classical information to Bob. In particular, suppose Alice
wishes to communicate to Bob information about the value of a classical register A, whose possible
values are drawn from some set Σ and where p ∈ R Σ is the probability vector that describes the
distribution of these values:
p( a) = Pr[A = a]
for each a ∈ Σ.
The way that Alice chooses to do this is by preparing a quantum register X in some way,
depending on A, after which X is sent to Bob. Specifically, let us suppose that {ρa : a ∈ Σ} is
a collection of density operators in D (X ), and that Alice prepares X in the reduced state ρa for
whichever a ∈ Σ is the value of A. The register X is sent to Bob, and Bob measures it to gain
information about the value of Alice’s register A.
One possible approach that Bob could take would be to measure X with respect to some mea-
surement { Pb : b ∈ Σ}, and to choose this measurement so as to maximize the probability that his
measurement result b agrees with Alice’s sample a. We will not, however, make the assumption
that this is Bob’s approach, and in fact we will not even assume that Bob chooses a measurement
whose outcomes agree with Σ. Instead, we will consider a completely general situation in which
Bob chooses a measurement
{ Pb : b ∈ Γ} ⊆ Pos (X )
to measure the register X sent by Alice. Let us denote by B a classical register that stores the result
of this measurement, so that the pair of registers (A, B) is then distributed as follows:

Pr[(A, B) = ( a, b)] = p( a) h Pb , ρa i

for each ( a, b) ∈ Σ × Γ. The amount of information that Bob learns about Alice’s information by
means of this process is given by the mutual information I (A : B).
The accessible information is the maximum value of I (A : B) that can be achieved by Bob, over
all possible measurements. More precisely, the accessible information of the ensemble

E = {( p( a), ρa ) : a ∈ Σ}

is defined as the maximum of I (A : B) over all possible choices of the set Γ and the measurement
{ Pb : b ∈ Γ}, assuming that the pair (A, B) is distributed as described above for this choice of
a measurement. (Given that there is no upper bound on the size of Bob’s outcome set Γ, it is
not obvious that the accessible information of a given ensemble E is always achievable for some
fixed choice of a measurement { Pb : b ∈ Γ}. It turns out, however, that there is always an
achievable maximum value for I (A : B) that Bob reaches when his set of outcomes Γ has size at
most dim(X )2 .) We will write Iacc (E ) to denote the accessible information of the ensemble E .

10.1.3 The Holevo quantity


The last quantity that we need to discuss before stating and proving Holevo’s Theorem is the
Holevo χ-quantity. Let us consider again an ensemble E = {( p( a), ρa ) : a ∈ Σ}, where each ρa is
a density operator on X and p ∈ R Σ is a probability vector. For such an ensemble we define the
Holevo χ-quantity of E as

χ(E) = S(∑_{a∈Σ} p(a) ρ_a) − ∑_{a∈Σ} p(a) S(ρ_a).

Notice that the quantity χ(E ) is always nonnegative, which follows from the concavity of the von
Neumann entropy.
One way to think about this quantity is as follows. Consider the situation above where Al-
ice has prepared the register X depending on the value of A, and Bob has received (but not yet
measured) the register X. From Bob’s point of view, the reduced state of X is therefore

ρ = ∑_{a∈Σ} p(a) ρ_a.

If, however, Bob were to learn that the value of A is a ∈ Σ, he would then describe the reduced
state of X as ρa . The quantity χ(E ) therefore represents the average decrease in the von Neumann
entropy of X that Bob would expect from learning the value of A.
Another way to view the quantity χ(E ) is to consider the reduced state of the pair (A, X) in the
situation just considered, which is

ξ = ∑_{a∈Σ} p(a) E_{a,a} ⊗ ρ_a.

We have

S(A, X) = H(p) + ∑_{a∈Σ} p(a) S(ρ_a),
S(A) = H(p),
S(X) = S(∑_{a∈Σ} p(a) ρ_a),

and therefore χ(E) = S(A) + S(X) − S(A, X) = S(A : X).
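As a concrete illustration (a sketch assuming NumPy; the particular ensemble is chosen only for this example), the following computes χ(E) for the ensemble {(1/2, |0⟩⟨0|), (1/2, |+⟩⟨+|)} and verifies that it agrees with S(A) + S(X) − S(A, X):

    import numpy as np

    def entropy(rho):
        w = np.linalg.eigvalsh(rho)
        w = w[w > 1e-12]
        return float(-np.sum(w * np.log2(w)))

    # Ensemble: |0><0| and |+><+|, each with probability 1/2.
    p = [0.5, 0.5]
    states = [np.array([[1, 0], [0, 0]], dtype=complex),
              np.full((2, 2), 0.5, dtype=complex)]

    avg = sum(pa * rho for pa, rho in zip(p, states))
    chi = entropy(avg) - sum(pa * entropy(rho) for pa, rho in zip(p, states))

    # The classical-quantum state xi on (A, X): sum over a of p(a) E_{a,a} tensor rho_a.
    xi = np.zeros((4, 4), dtype=complex)
    for a, (pa, rho) in enumerate(zip(p, states)):
        xi[2*a:2*a+2, 2*a:2*a+2] = pa * rho

    S_AX = entropy(xi)
    S_A = entropy(np.diag([0.5, 0.5]))
    S_X = entropy(avg)
    print(chi, "=", S_A + S_X - S_AX)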

10.1.4 Statement and proof of Holevo’s Theorem


Now we are prepared to state and prove Holevo’s Theorem. The formal statement of the theorem
follows.
Theorem 10.1 (Holevo’s Theorem). Let E = {( p( a), ρa ) : a ∈ Σ} be an ensemble of density operators
over some complex Euclidean space X . Then Iacc (E ) ≤ χ(E ).
Proof. Suppose Γ is a finite, nonempty set and { Pb : b ∈ Γ} is a measurement on X . Let A = C Σ ,
B = C Γ , and Y = C Γ , and let A, X, Y, and B be registers corresponding to the spaces A, X , Y , and
B . Define a density operator ξ ∈ D (A ⊗ X ) as
ξ = ∑_{a∈Σ} p(a) E_{a,a} ⊗ ρ_a,

define a linear isometry V ∈ U(X, X ⊗ Y ⊗ B) as

V = ∑_{b∈Γ} √P_b ⊗ e_b ⊗ e_b,

and let σ ∈ D (A ⊗ X ⊗ Y ⊗ B) be defined as

σ = (1 A ⊗ V ) ξ (1 A ⊗ V ) ∗ .

It is clear that

S(σ^A) = S(ξ^A) = H(p).

Moreover, as V is an isometry, we have

S(σ^{AXYB}) = S(ξ^{AX}) = H(p) + ∑_{a∈Σ} p(a) S(ρ_a),

and

S(σ^{XYB}) = S(ξ^{X}) = S(∑_{a∈Σ} p(a) ρ_a).

Therefore, for the density operator σ ∈ D(A ⊗ X ⊗ Y ⊗ B), we have S(A : X, Y, B) = χ(E).


On the other hand, we have that

σ^{AB} = ∑_{a∈Σ} ∑_{b∈Γ} p(a) ⟨P_b, ρ_a⟩ E_{a,a} ⊗ E_{b,b},

and from this expression we see that the quantity S(A : B) equals the classical mutual information I(A : B) determined by the measurement {P_b : b ∈ Γ}, and therefore equals the accessible information Iacc(E) for an optimal choice of this measurement.
The theorem therefore follows from the inequality

S(A : X, Y, B) ≥ S(A : B),

which in turn follows from strong subadditivity (as shown at the end of the previous lecture).

As discussed at the beginning of the lecture, this theorem implies that Alice can communicate
no more than n classical bits of information to Bob by sending n qubits alone. If the register X
comprises n qubits, and therefore X has dimension 2^n, then for any ensemble

E = {( p( a), ρa ) : a ∈ Σ}

of density operators on X we have


χ(E) ≤ S(∑_{a∈Σ} p(a) ρ_a) ≤ n.

This means that for any choice of the register A, the ensemble E , and the measurement that deter-
mines the value of a classical register B, we must have I (A : B) ≤ n. In other words, Bob can learn
no more than n bits of information by means of the process he and Alice have performed.

10.2 Nayak’s Bound

We will now consider a related, but nevertheless different setting from the one that Holevo’s The-
orem concerns. Suppose now that Alice has m bits, and she wants to encode them into a smaller number n of qubits in such a way that Bob can recover not the entire string of bits, but rather any single bit (or
small number of bits) of his choice. Given that Bob will only learn a very small amount of infor-
mation by means of this process, the possibility that Alice could do this does not violate Holevo’s
Theorem in any obvious way.
This idea has been described as a “quantum phone book”. Imagine a very compact phone book
implemented using quantum information. The user measures the qubits forming the phone book
using a measurement that is unique to the individual whose number is being sought. The user’s
measurement destroys the phone book, so no more than 7 digits of information are effectively
transmitted by sending the phone book. Perhaps it is not unreasonable to hope that such a phone
book containing 100,000 numbers could be constructed using, say, 1,000 qubits?
Here is an example showing that something nontrivial along these lines can be realized. Sup-
pose Alice wants to encode 2 bits a, b ∈ {0, 1} into one qubit so that when she sends this qubit to
Bob, he can pick a two-outcome measurement giving him either a or b with reasonable probability.
Define
|ψ(θ)⟩ = cos(θ)|0⟩ + sin(θ)|1⟩
for θ ∈ [0, 2π ]. Alice encodes ab as follows:
00 → |ψ(π/8)⟩
10 → |ψ(3π/8)⟩
11 → |ψ(5π/8)⟩
01 → |ψ(7π/8)⟩
Alice sends the qubit to Bob. If Bob wants to decode a, he measures in the {|0⟩, |1⟩} basis, and if he wants to decode b he measures in the {|+⟩, |−⟩} basis. A simple calculation shows that Bob will correctly decode the bit he has chosen with probability cos²(π/8) ≈ 0.85. There does not exist an analogous classical scheme that allows Bob to do better than randomly guessing for at least one of his choices.
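The success probabilities of this encoding are easy to verify directly; the following sketch (assuming NumPy; an illustration rather than part of the notes) evaluates them for every input pair and both measurement choices:

    import numpy as np

    def ket(theta):
        return np.array([np.cos(theta), np.sin(theta)])

    # Encodings of the four two-bit strings, as in the text.
    encode = {(0, 0): ket(np.pi/8), (1, 0): ket(3*np.pi/8),
              (1, 1): ket(5*np.pi/8), (0, 1): ket(7*np.pi/8)}

    # Measurement for bit a: computational basis; for bit b: the {|+>, |->} basis.
    basis_a = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    basis_b = [np.array([1.0, 1.0]) / np.sqrt(2), np.array([1.0, -1.0]) / np.sqrt(2)]

    for (a, b), psi in encode.items():
        prob_a = abs(basis_a[a] @ psi) ** 2     # probability of correctly decoding a
        prob_b = abs(basis_b[b] @ psi) ** 2     # probability of correctly decoding b
        print(a, b, round(prob_a, 4), round(prob_b, 4))   # each approx 0.8536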

10.2.1 Definition of quantum random access encodings


In more generality, we define a quantum random access encoding to be as follows.
Definition 10.2. Let m and n be positive integers, and let p ∈ [0, 1]. Then an m ↦^p n quantum random access encoding is a function

R : {0, 1}^m → D(C^{{0,1}^n}) : a_1⋯a_m ↦ ρ_{a_1⋯a_m}

such that the following holds. For each j ∈ {1, . . . , m} there exists a measurement

{P_0^j, P_1^j} ⊂ Pos(C^{{0,1}^n})

such that

⟨P_{a_j}^j, ρ_{a_1⋯a_m}⟩ ≥ p

for every j ∈ {1, . . . , m} and every choice of a_1⋯a_m ∈ {0, 1}^m.
For example, the encoding described above is a 2 ↦^{0.85} 1 quantum random access encoding.

10.2.2 Fano’s Inequality


In order to determine whether m ↦^p n quantum random access codes exist for various choices
of the parameters n, m, and p, we will need a result from classical information theory known as
Fano’s Inequality.
Theorem 10.3 (Fano's Inequality). Let A and B be classical registers taking values in some finite set Σ and let q = Pr[A ≠ B]. Then

H(A|B) ≤ H(q) + q log(|Σ| − 1).

Proof. Define a new register C whose value is determined by A and B as follows: C = 1 if A ≠ B and C = 0 if A = B.
Let us first note that
H (A|B) = H (C|B) + H (A|B, C) − H (C|A, B).
This holds for any choice of registers as a result of the following equations:
H (A|B) = H (A, B) − H (B),
H (C|B) = H (B, C) − H (B),
H (A|B, C) = H (A, B, C) − H (B, C),
H (C|A, B) = H (A, B, C) − H (A, B).
Next, note that
H ( C |B ) ≤ H ( C ) = H ( q ) .
Finally, we have H (C|A, B) = 0 because C is determined by A and B. So, at this point we conclude
H (A|B) ≤ H (q) + H (A|B, C).
It remains to put an upper bound on H (A|B, C). We have
H (A|B, C) = Pr[C = 0] H (A|B, C = 0) + Pr[C = 1] H (A|B, C = 1).
We also have
H (A|B, C = 0) = 0,
because C = 0 implies A = B, and
H (A|B, C = 1) ≤ log(|Σ| − 1)
because C = 1 implies that A ≠ B, so the largest the uncertainty of A given B = b can be is
log(|Σ| − 1), which is the case when A is uniform over all elements of Σ besides b. Thus
H (A|B) ≤ H (q) + q log(|Σ| − 1)
as required.

The following special case of Fano’s Inequality, where Σ = {0, 1} and A is uniform will be
useful.
Corollary 10.4. Let A be a uniformly distributed Boolean register and let B be any Boolean register. Then
for q = Pr(A = B) we have I (A : B) ≥ 1 − H (q).

10.2.3 Statement and proof of Nayak’s bound


We are now ready to state Nayak's Bound, which implies that the quest for a compact quantum phone book was doomed to fail: any m ↦^p n quantum random access code requires that n and m are linearly related, with the constant of proportionality tending to 1 as p approaches 1.

Theorem 10.5. Let n and m be positive integers and let p ∈ [1/2, 1]. If there exists an m ↦^p n quantum random access encoding, then

n ≥ (1 − H(p)) m.

To prove this theorem, we will first require the following lemma, which is a consequence of
Holevo’s Theorem and Fano’s Inequality.

Lemma 10.6. Suppose ρ0 , ρ1 ∈ D (X ) are density operators, {Q0 , Q1 } ⊆ Pos (X ) is a measurement, and
q ∈ [1/2, 1]. If it is the case that
⟨Q_b, ρ_b⟩ ≥ q

for b ∈ {0, 1}, then

S((ρ_0 + ρ_1)/2) − (S(ρ_0) + S(ρ_1))/2 ≥ 1 − H(q).

Proof. Let A and B be classical Boolean registers, let p ∈ R {0,1} be the uniform probability vector
(meaning p(0) = p(1) = 1/2), and assume that

Pr[(A, B) = ( a, b)] = p( a) h Qb , ρa i

for a, b ∈ {0, 1}. By Holevo’s Theorem we have

I(A : B) ≤ S((ρ_0 + ρ_1)/2) − (S(ρ_0) + S(ρ_1))/2,

and by Fano’s Inequality we have

I(A : B) ≥ 1 − H(Pr[A = B]) ≥ 1 − H(q),

from which the lemma follows.


Proof of Theorem 10.5. Suppose we have some m ↦^p n quantum random access encoding

E : a_1⋯a_m ↦ ρ_{a_1⋯a_m}.

For 0 ≤ k ≤ m − 1 let

ρ_{a_1⋯a_k} = (1/2^{m−k}) ∑_{a_{k+1}⋯a_m ∈ {0,1}^{m−k}} ρ_{a_1⋯a_m},

and note that

ρ_{a_1⋯a_k} = (1/2)(ρ_{a_1⋯a_k 0} + ρ_{a_1⋯a_k 1}).
By the assumption that E is a random access encoding, we have that there exists a measurement {P_0^k, P_1^k}, for 1 ≤ k ≤ m, that satisfies

⟨P_b^k, ρ_{a_1⋯a_{k−1} b}⟩ ≥ p

for each b ∈ {0, 1}. Thus, by Lemma 10.6,

1
S(ρa1 ···ak −1 ) ≥ (S(ρa1 ···ak −1 0 ) + S(ρa1 ···ak −1 1 )) + (1 − H ( p))
2
for 1 ≤ k ≤ m and all choices of a1 · · · ak−1 . By applying this inequality repeatedly, we conclude
that
m(1 − H ( p)) ≤ S(ρ) ≤ n,
which completes the proof.
It can be shown that there exists a classical random access encoding m ↦^p n for any p > 1/2
provided that n ∈ (1 − H ( p))m + O(log m). Thus, asymptotically speaking, there is no significant
advantage to be gained from quantum random access codes over classical random access codes.

Lecture 11
Majorization for real vectors and Hermitian operators

This lecture discusses the notion of majorization and some of its connections to quantum informa-
tion. We will see later (in Lecture 14) that majorization plays a critical role in Nielsen’s Theorem,
which precisely characterizes when it is possible for two parties to transform one pure state to
another by means of local operations and classical communication.

11.1 Doubly stochastic operators

Let Σ be a finite, nonempty set, and for the sake of this discussion let us focus on the real vector
space R^Σ.
An operator A ∈ L(R^Σ) acting on this vector space is said to be stochastic if


1. A( a, b) ≥ 0 for each ( a, b) ∈ Σ × Σ, and


2. ∑ a∈Σ A( a, b) = 1 for each b ∈ Σ.
This condition is equivalent to requiring that Aeb is a probability vector for each b ∈ Σ, or equiva-
lently that A maps probability
 vectors to probability vectors.
An operator A ∈ L(R^Σ) is doubly stochastic if it is the case that both A and A^T (or, equivalently,
A and A∗ ) are stochastic. In other words, when viewed as a matrix, every row and every column
of A forms a probability vector:
1. A( a, b) ≥ 0 for each ( a, b) ∈ Σ × Σ,
2. ∑ a∈Σ A( a, b) = 1 for each b ∈ Σ, and
3. ∑b∈Σ A( a, b) = 1 for each a ∈ Σ.
Next, recall that we write Sym(Σ) to denote the set of one-to-one and onto functions of the form π : Σ → Σ (or, in other words, the permutations of Σ). For each π ∈ Sym(Σ) we define an operator P_π ∈ L(R^Σ) as

P_π(a, b) = 1 if a = π(b), and P_π(a, b) = 0 otherwise,
for every ( a, b) ∈ Σ × Σ. Equivalently, Pπ is the operator defined by Pπ eb = eπ (b) for each b ∈ Σ.
Such an operator is called a permutation operator.
It is clear that every permutation operator is doubly stochastic, and that the set of doubly
stochastic operators is a convex set. The following famous theorem establishes that the doubly
stochastic operators are, in fact, given by the convex hull of the permutation operators.

Theorem 11.1 (The Birkhoff–von Neumann Theorem). Let Σ be a finite, nonempty set and let A ∈ L(R^Σ) be a linear operator on R^Σ. Then A is a doubly stochastic operator if and only if there exists a probability vector p ∈ R^{Sym(Σ)} such that

A = ∑_{π∈Sym(Σ)} p(π) P_π.


Proof. The Krein-Milman Theorem states that every compact, convex set is equal to the convex
hull of its extreme points. As the set of doubly stochastic operators is compact and convex, the
theorem will therefore follow if we prove that every extreme point in this set is a permutation
operator.
To this end, let us consider any doubly stochastic operator A that is not a permutation operator.
Our goal is to prove that A is not an extreme point in the set of doubly stochastic operators.
Given that A is doubly stochastic but not a permutation operator, there must exist at least one pair
( a1 , b1 ) ∈ Σ × Σ such that
A( a1 , b1 ) ∈ (0, 1).
As ∑b A( a1 , b) = 1 and A( a1 , b1 ) ∈ (0, 1), we conclude that there must exist b2 6= b1 such that

A( a1 , b2 ) ∈ (0, 1).

Applying similar reasoning, but to the first index rather than the second, there must exist a2 6= a1
such that
A( a2 , b2 ) ∈ (0, 1).
This argument may be repeated, alternating between the first and second indices (i.e., between rows
and columns if we view A as a matrix), until eventually a closed loop of even length is formed
that alternates between horizontal and vertical moves among the entries of A. This process is
illustrated in Figure 11.1, where the loop is indicated by the dotted lines.

Figure 11.1: An example of a closed loop consisting of entries of A that are contained in the interval (0, 1).

Now, let ε ∈ (0, 1) be equal to the minimum value over the entries in the closed loop, and
define B to be the operator obtained by setting each entry in the closed loop to be ±ε, alternating
sign along the entries as suggested in Figure 11.2. All of the other entries in B are set to 0.
Finally, consider the operators A + B and A − B. As A is doubly stochastic and the row and
column sums of B are all 0, we have that both A + B and A − B also have row and column sums
equal to 1. As ε was chosen to be no larger than the smallest entry within the chosen closed loop,

Figure 11.2: The operator B. All entries besides those indicated are 0.

none of the entries of A + B or A − B are negative, and therefore A − B and A + B are doubly
stochastic. As B is non-zero, we have that A + B and A − B are distinct. Thus, we have that
1 1
A= ( A + B) + ( A − B)
2 2
is a proper convex combination of doubly stochastic operators, and is therefore not an extreme
point in the set of doubly stochastic operators. This is what we needed to prove, and so we are
done.

11.2 Majorization for real vectors

We will now define what it means for one real vector to majorize another, and we will discuss one
alternate characterization of this notion. As usual we take Σ to be a finite, nonempty set, and as
in the previous section we will focus on the real vector space R Σ . The definition is as follows: for
u, v ∈ R Σ , we say that u majorizes v if there exists a doubly stochastic operator A such that v = Au.
We denote this relation as v ≺ u (or u ≻ v if it is convenient to switch the ordering).
By the Birkhoff–von Neumann Theorem, this definition can intuitively be interpreted as saying
that v ≺ u if and only if there is a way to “randomly shuffle” the entries of u to obtain v, where
by “randomly shuffle” it is meant that one averages in some way a collection of vectors that is
obtained by permuting the entries of u to obtain v. Informally speaking, the relation v ≺ u means
that u is “more ordered” than v, because we can get from u to v by randomizing the order of the
vector indices.
An alternate characterization of majorization (which is in fact more frequently taken as the
definition) is based on a condition on various sums of the entries of the vectors involved. To state
the condition more precisely, let us introduce the following notation. For every vector u ∈ R Σ and
for n = | Σ |, we write
s(u) = (s1 (u), . . . , sn (u))

to denote the vector obtained by sorting the entries of u from largest to smallest. In other words,
we have
{u( a) : a ∈ Σ} = {s1 (u), . . . , sn (u)},
where the equality considers the two sides of the equation to be multisets, and moreover
s1 ( u ) ≥ · · · ≥ s n ( u ) .
The characterization is given by the equivalence of the first and second items in the following
theorem. (The equivalence of the third item in the theorem is an easy addition to the equivalence,
and will turn out to be useful for one application later in the lecture.)
Theorem 11.2. Let Σ be a finite, non-empty set, and let u, v ∈ R Σ . Then the following are equivalent.
1. v ≺ u.
2. For n = |Σ| and for 1 ≤ k < n we have

   ∑_{j=1}^k s_j(v) ≤ ∑_{j=1}^k s_j(u),    (11.1)

   and moreover

   ∑_{j=1}^n s_j(v) = ∑_{j=1}^n s_j(u).    (11.2)

3. There exists a unitary operator U ∈ L(C^Σ), for which U(a, b) is a real number for each (a, b) ∈ Σ × Σ, such that v = Au, where A ∈ L(R^Σ) is the operator defined by A(a, b) = |U(a, b)|² for all (a, b) ∈ Σ × Σ.


Proof. First let us prove that item 1 implies item 2. Assume that A is a doubly stochastic opera-
tor satisfying v = Au. It is clear that the condition (11.2) holds, as doubly stochastic operators
preserve the sum of the entries in any vector, and so it remains to prove the condition (11.1) for
1 ≤ k < n.
To do this, let us first consider the effect of an arbitrary doubly stochastic operator B ∈ L(R^Σ) on a vector of the form

e_Γ = ∑_{a∈Γ} e_a
where Γ ⊆ Σ. The vector eΓ is the characteristic vector of the subset Γ ⊆ Σ. The resulting vector
BeΓ is a convex combination of permutations of eΓ , or in other words is a convex combination of
characteristic vectors of sets having size | Γ |. The sum of the entries of BeΓ is therefore | Γ |, and each
entry must lie in the interval [0, 1]. For any set ∆ ⊆ Σ with | ∆ | = | Γ |, the vector e∆ − BeΓ therefore
has entries summing to 0, and satisfies (e∆ − BeΓ ) ( a) ≥ 0 for every a ∈ ∆ and (e∆ − BeΓ ) ( a) ≤ 0
for every a 6∈ ∆.
Now, for each value 1 ≤ k < n, define ∆k , Γk ⊆ Σ to be the subsets indexing the k largest
entries of u and v, respectively. In other words,
∑_{j=1}^k s_j(u) = ∑_{a∈∆_k} u(a) = ⟨e_{∆_k}, u⟩   and   ∑_{j=1}^k s_j(v) = ∑_{a∈Γ_k} v(a) = ⟨e_{Γ_k}, v⟩.

We now see that

∑_{i=1}^k s_i(u) − ∑_{i=1}^k s_i(v) = ⟨e_{∆_k}, u⟩ − ⟨e_{Γ_k}, v⟩ = ⟨e_{∆_k}, u⟩ − ⟨e_{Γ_k}, Au⟩ = ⟨e_{∆_k} − A^∗ e_{Γ_k}, u⟩.

This quantity in turn may be expressed as

⟨e_{∆_k} − A^∗ e_{Γ_k}, u⟩ = ∑_{a∈∆_k} α_a u(a) − ∑_{a∉∆_k} α_a u(a),

where α_a = (e_{∆_k} − A^∗ e_{Γ_k})(a) if a ∈ ∆_k, and α_a = −(e_{∆_k} − A^∗ e_{Γ_k})(a) if a ∉ ∆_k.
As argued above, we have α_a ≥ 0 for each a ∈ Σ and ∑_{a∈∆_k} α_a = ∑_{a∉∆_k} α_a. By the choice of ∆_k we have u(a) ≥ u(b) for all choices of a ∈ ∆_k and b ∉ ∆_k, and therefore

∑_{a∈∆_k} α_a u(a) ≥ ∑_{a∉∆_k} α_a u(a).

This is equivalent to (11.1) as required.


Next we will prove item 2 implies item 3, by induction on | Σ |. The case | Σ | = 1 is trivial, so let
us consider the case that | Σ | ≥ 2. Assume for simplicity that Σ = {1, . . . , n}, that u = (u1 , . . . , un )
for u1 ≥ · · · ≥ un , and that v = (v1 , . . . , vn ) for v1 ≥ · · · ≥ vn . This causes no loss of generality
because the majorization relationship is invariant under renaming and independently reordering
the indices of the vectors under consideration. Let us also identify the operators U and A we wish
to construct with n × n matrices having entries denoted Ui,j and Ai,j .
Now, assuming item 2 holds, we must have that u_1 ≥ v_1 ≥ u_k for some choice of k ∈ {1, . . . , n}. Take k to be minimal among all such indices. If it is the case that k = 1 then u_1 = v_1; and by setting x = (u_2, . . . , u_n) and y = (v_2, . . . , v_n) we conclude from the induction hypothesis that there exists an (n − 1) × (n − 1) unitary matrix V so that the doubly stochastic matrix B defined by B_{i,j} = |V_{i,j}|² satisfies y = Bx. By taking U to be the n × n unitary matrix

U = ( 1  0 )
    ( 0  V )

and letting A be defined by A_{i,j} = |U_{i,j}|², we have that v = Au as is required.
The more difficult case is where k ≥ 2. Let λ ∈ [0, 1] satisfy v1 = λu1 + (1 − λ)uk , and define
V to be the n × n unitary matrix determined by the following equations:
V e_1 = √λ e_1 − √(1−λ) e_k,
V e_k = √(1−λ) e_1 + √λ e_k,
V e_j = e_j   (for j ∉ {1, k}).

The action of V on the span of {e_1, e_k} is described by this matrix:

(  √λ        √(1−λ) )
( −√(1−λ)    √λ     ).
Notice that the n × n doubly stochastic matrix D given by D_{i,j} = |V_{i,j}|² may be written

D = λ1 + (1 − λ) P_{(1 k)},

where (1 k) ∈ Sn denotes the permutation that swaps 1 and k, leaving every other element of
{1, . . . , n} fixed.

Next, define (n − 1)-dimensional vectors

x = (u2 , . . . , uk−1 , λuk + (1 − λ)u1 , uk+1 , . . . , un )


y = ( v2 , . . . , v n ) .

We will index these vectors as x = (x_2, . . . , x_n) and y = (y_2, . . . , y_n) for clarity. For 1 ≤ l ≤ k − 1 we clearly have

∑_{j=2}^l y_j = ∑_{j=2}^l v_j ≤ ∑_{j=2}^l u_j = ∑_{j=2}^l x_j

given that v_n ≤ ⋯ ≤ v_1 ≤ u_{k−1} ≤ ⋯ ≤ u_1. For k ≤ l ≤ n, we have

∑_{j=2}^l x_j = ∑_{j=1}^l u_j − (λu_1 + (1−λ)u_k) ≥ ∑_{j=2}^l v_j.

Thus, we may again apply the induction hypothesis to obtain an (n − 1) × (n − 1) unitary matrix W such that, for B the doubly stochastic matrix defined by B_{i,j} = |W_{i,j}|², we have y = Bx. Now define

U = ( 1  0 ) V.
    ( 0  W )

This is clearly a unitary matrix, and to complete the proof it suffices to prove that the doubly stochastic matrix A defined by A_{i,j} = |U_{i,j}|² satisfies v = Au. We have the following entries of A:

A_{1,1} = |U_{1,1}|² = λ,        A_{i,1} = (1−λ) |W_{i−1,k−1}|² = (1−λ) B_{i−1,k−1},
A_{1,k} = |U_{1,k}|² = 1 − λ,    A_{i,k} = λ |W_{i−1,k−1}|² = λ B_{i−1,k−1},
A_{1,j} = 0,                     A_{i,j} = |W_{i−1,j−1}|² = B_{i−1,j−1},

where i and j range over all choices of indices with i, j ∉ {1, k}. From these equations we see that

A = ( 1  0 ) D,
    ( 0  B )

which satisfies v = Au as required.
The final step in the proof is to observe that item 3 implies item 1, which follows from the fact that the operator A determined by item 3 must be doubly stochastic.
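The equivalence of items 1 and 2 can also be illustrated numerically; the following sketch (assuming NumPy; not part of the notes) forms a doubly stochastic operator A as a convex combination of permutation matrices, sets v = Au, and checks the conditions (11.1) and (11.2):

    import numpy as np
    from itertools import permutations

    n = 4
    u = np.random.rand(n)

    # A random doubly stochastic operator: a convex combination of permutation matrices.
    perms = list(permutations(range(n)))
    weights = np.random.rand(len(perms))
    weights /= weights.sum()
    A = sum(w * np.eye(n)[list(pi)] for w, pi in zip(weights, perms))

    v = A @ u
    su, sv = np.sort(u)[::-1], np.sort(v)[::-1]
    for k in range(1, n):
        print(sv[:k].sum() <= su[:k].sum() + 1e-12)   # condition (11.1)
    print(abs(sv.sum() - su.sum()) < 1e-12)           # condition (11.2)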

11.3 Majorization for Hermitian operators

We will now define an analogous notion of majorization for Hermitian operators. For Hermitian
operators A, B ∈ Herm (X ) we say that A majorizes B, which we express as B ≺ A or A ≻ B, if
there exists a mixed unitary super-operator Φ ∈ T (X ) such that

B = Φ ( A ).

Inspiration for this definition partly comes from the Birkhoff–von Neumann Theorem, along with
the intuitive idea that randomizing the entries of a real vector is analogous to randomizing the
choice of an orthonormal basis for a Hermitian operator.
The following theorem gives an alternate characterization of this relationship that also connects it with majorization for real vectors.

Theorem 11.3. Let X be a complex Euclidean space and let A, B ∈ Herm (X ). Then B ≺ A if and only
if λ( B) ≺ λ( A).

Proof. Let n = dim(X). By the Spectral Theorem, we may write

B = ∑_{j=1}^n λ_j(B) u_j u_j^∗   and   A = ∑_{j=1}^n λ_j(A) v_j v_j^∗

for orthonormal bases {u1 , . . . , un } and {v1 , . . . , vn } of X .


Let us first assume that λ( B) ≺ λ( A). This implies there exist a probability vector p ∈ R Sn
such that
λ j ( B ) = ∑ p( π ) λ π ( j) ( A )
π ∈ Sn

for 1 ≤ j ≤ n. For each permutation π ∈ Sn , define a unitary operator


n
Uπ = ∑ u j v∗π ( j) .
j =1

Then

∑_{π∈S_n} p(π) U_π A U_π^∗ = ∑_{j=1}^n ∑_{π∈S_n} p(π) λ_{π(j)}(A) u_j u_j^∗ = B.

Suppose on the other hand that there exists a probability vector ( p1 , . . . , pm ) and unitary oper-
ators U1 , . . . , Um so that
B = ∑_{i=1}^m p_i U_i A U_i^∗.

Then by considering the spectral decompositions above we have

λ_j(B) = ∑_{i=1}^m p_i u_j^∗ U_i A U_i^∗ u_j = ∑_{i=1}^m ∑_{k=1}^n p_i |u_j^∗ U_i v_k|² λ_k(A).

Define an n × n matrix D as

D_{j,k} = ∑_{i=1}^m p_i |u_j^∗ U_i v_k|².

It is clear that D is doubly stochastic and satisfies Dλ( A) = λ( B). Therefore λ( B) ≺ λ( A) as


required.
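The following sketch (assuming NumPy; an illustration only) applies a randomly generated mixed unitary super-operator to a Hermitian operator A and confirms, via the partial-sum test of Theorem 11.2, that λ(Φ(A)) ≺ λ(A):

    import numpy as np

    def random_unitary(d):
        # Orthonormalize a random Gaussian matrix; the Q factor is unitary.
        g = np.random.randn(d, d) + 1j * np.random.randn(d, d)
        q, _ = np.linalg.qr(g)
        return q

    d, m = 4, 3
    A = np.random.randn(d, d) + 1j * np.random.randn(d, d)
    A = (A + A.conj().T) / 2                      # a Hermitian operator

    p = np.random.rand(m)
    p /= p.sum()                                  # mixing probabilities
    us = [random_unitary(d) for _ in range(m)]
    B = sum(pi * U @ A @ U.conj().T for pi, U in zip(p, us))

    lam_A = np.sort(np.linalg.eigvalsh(A))[::-1]
    lam_B = np.sort(np.linalg.eigvalsh(B))[::-1]
    print(all(lam_B[:k].sum() <= lam_A[:k].sum() + 1e-10 for k in range(1, d)))
    print(abs(lam_B.sum() - lam_A.sum()) < 1e-10)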

11.4 Applications

Finally, let us note a few applications of the facts we have proved about majorization. Let us begin
with a simple one.

Proposition 11.4. Suppose that ρ, ξ ∈ D (X ) satisfy λ(ρ) ≺ λ(ξ ). Then S(ρ) ≥ S(ξ ).

Proof. The proposition follows from the fact that for every density operator ξ and every mixed unitary operation Φ, we have S(ξ) ≤ S(Φ(ξ)) by the concavity of the von Neumann entropy.

Note that we could equally well have first observed that H(p) ≥ H(q) for probability vectors p and q for which p ≺ q, and then applied Theorem 11.3.
The second application concerns a relationship between the diagonal entries of an operator
and its eigenvalues, which was proved by Schur. First, a lemma is required.

Lemma 11.5. Let X be a complex Euclidean space and let { xa : a ∈ Σ} be any orthonormal basis of X .
Then the super-operator
Φ(A) = ∑_{a∈Σ} (x_a^∗ A x_a) x_a x_a^∗

is mixed unitary.

Proof. We may assume without loss of generality that Σ = Z n for some n ≥ 1. First consider the
standard basis {ea : a ∈ Z n }, and define a mixed unitary super-operator

Ψ(A) = (1/n) ∑_{a∈Z_n} Z^a A (Z^a)^∗,

where Z is as defined in Lecture 6. We have

Ψ(E_{b,c}) = (1/n) ∑_{a∈Z_n} ω^{a(b−c)} E_{b,c} = E_{b,b} if b = c, and 0 if b ≠ c,

and therefore

Ψ(A) = ∑_{a∈Z_n} ⟨E_{a,a}, A⟩ E_{a,a}.

For U ∈ U(X) defined as

U = ∑_{a∈Z_n} x_a e_a^∗

it follows that

Φ(A) = (1/n) ∑_{a∈Z_n} (U Z^a U^∗) A (U Z^a U^∗)^∗.

The mapping Φ is therefore mixed unitary as required.

Theorem 11.6. Let X be a complex Euclidean space, let A ∈ Herm (X ) be a Hermitian operator, and let
{ xa : a ∈ Σ} be an orthonormal basis of X . Define u ∈ R Σ as u( a) = x∗a Axa . Then u ≺ λ( A).
Proof. Immediate from Lemma 11.5 and Theorem 11.3.

Notice that this theorem implies that the probability distribution arising from any complete projec-
tive measurement of a density operator ρ must have Shannon entropy at least S(ρ).
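Theorem 11.6 is also easy to observe numerically; the following sketch (assuming NumPy; not part of the notes) checks that the vector of diagonal entries of a random Hermitian operator is majorized by its vector of eigenvalues:

    import numpy as np

    d = 5
    A = np.random.randn(d, d) + 1j * np.random.randn(d, d)
    A = (A + A.conj().T) / 2                      # a Hermitian operator

    diag = np.sort(np.real(np.diag(A)))[::-1]
    eigs = np.sort(np.linalg.eigvalsh(A))[::-1]

    for k in range(1, d):
        print(diag[:k].sum() <= eigs[:k].sum() + 1e-10)   # partial-sum conditions
    print(abs(diag.sum() - eigs.sum()) < 1e-10)           # totals agree (both equal Tr(A))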
Finally, we will prove a characterization of precisely which probability vectors are consistent
with a given density operator, meaning that the density operator could have arisen from a random
choice of pure states according to the distribution described by the probability vector. Again, we
will first need a lemma.

Lemma 11.7. Suppose Σ is a finite, nonempty set, let X = C Σ , and suppose that A ∈ Herm (X ) and
v ∈ R Σ satisfy v ≺ λ( A). Then there exists an orthonormal basis { xa : a ∈ Σ} of X such that
v( a) = h xa x∗a , Ai for each a ∈ Σ.

Proof. Let

A = ∑_{a∈Σ} w(a) u_a u_a^∗
be a spectral decomposition of A. It follows that there exists a unitary operator U with the follow-
ing property. For D defined by

D(a, b) = |U(a, b)|²

for (a, b) ∈ Σ × Σ, we have v = Dw.
Next, define

V = ∑_{a∈Σ} e_a u_a^∗

and let x_a = V^∗ U^∗ V u_a for each a ∈ Σ. Then

⟨x_a x_a^∗, A⟩ = ∑_b |U(a, b)|² w(b) = (Dw)(a) = v(a),

which proves the lemma.


Theorem 11.8. Let X = C Σ for Σ a finite, nonempty set, and suppose that a density operator ρ ∈ D (X )
and a probability vector p ∈ R Σ are given. Then there exist (not necessarily orthogonal) unit vectors
{u a : a ∈ Σ} in X such that
ρ = ∑_{a∈Σ} p(a) u_a u_a^∗
if and only if p ≺ λ(ρ).
Proof. Assume first that p ≺ λ(ρ). By Lemma 11.7 we have that there exists an orthonormal basis
{ xa : a ∈ Σ} of X with the property that h xa x∗a , ρi = p( a) for each a ∈ Σ. Given that { xa : a ∈ Σ}
is a basis of X , we may therefore write

vec(√ρ) = ∑_{a∈Σ} y_a ⊗ x_a

for some choice of {ya : a ∈ Σ}.



Taking the two partial traces of vec(√ρ) vec(√ρ)^∗ gives

ρ^T = (Tr ⊗ 1)(vec(√ρ) vec(√ρ)^∗) = ∑_{a,b∈Σ} ⟨y_b, y_a⟩ x_a x_b^T,
ρ = (1 ⊗ Tr)(vec(√ρ) vec(√ρ)^∗) = ∑_{a∈Σ} y_a y_a^∗.

It follows from the first equation that

ρ = ∑_{a,b∈Σ} ⟨y_b, y_a⟩ x_a x_b^∗,

and given that x_a^∗ ρ x_a = p(a) we have that ‖y_a‖² = p(a) for each a ∈ Σ. Thus, for each a ∈ Σ for
which p(a) > 0, we may take

u_a = y_a / √(p(a))
(and let u_a be an arbitrary unit vector if p(a) = 0), and we have that each u_a is a unit vector and

∑_{a∈Σ} p(a) u_a u_a^∗ = ∑_{a∈Σ} y_a y_a^∗ = ρ

as required.

Lecture 12
Separable operators

For the next several lectures we will be discussing various aspects and properties of entanglement.
Mathematically speaking, we define entanglement in terms of what it is not, rather than what it
is—we define the notion of a separable operator, and define that any density operator that is not
separable represents an entangled state.

12.1 Definition and basic properties of separable operators

Let X and Y be complex Euclidean spaces. Then a positive semidefinite operator P ∈ Pos (X ⊗ Y )
is separable if and only if there exist a positive integer m and positive semidefinite operators

Q1 , . . . , Qm ∈ Pos (X ) and R1 , . . . , Rm ∈ Pos (Y )

such that
P = ∑_{j=1}^m Q_j ⊗ R_j.    (12.1)

We will write Sep (X : Y ) to denote the collection of all such operators.1 It is clear that Sep (X : Y )
is a convex cone, and it will not be difficult to prove that Sep (X : Y ) is properly contained in
Pos (X ⊗ Y ). Operators P ∈ Pos (X ⊗ Y ) that are not contained in Sep (X : Y ) are said to be
entangled.
Let us also define
SepD (X : Y ) = Sep (X : Y ) ∩ D (X ⊗ Y )
to be the set of separable density operators acting on X ⊗ Y . By thinking about spectral decom-
positions, one sees immediately that the set of separable density operators is equal to the convex
hull of the pure product density operators:

SepD (X : Y ) = conv { xx∗ ⊗ yy∗ : x ∈ S(X ) , y ∈ S(Y )} .

Thus, every element ρ ∈ SepD (X : Y ) may be expressed as


ρ = ∑_{j=1}^m p_j x_j x_j^∗ ⊗ y_j y_j^∗

for some choice of m ≥ 1, a probability vector p = ( p1 , . . . , pm ), and unit vectors x1 , . . . , xm ∈ X


and y1 , . . . , ym ∈ Y . A simple application of Carathéodory’s Theorem establishes that such an
expression exists for some choice of m ≤ dim(X ⊗ Y )2 .
1 One may extend this definition to any number of spaces, defining (for instance) Sep (X1 : X2 : · · · : X n ) in the
natural way. Our focus, however, will be on bipartite entanglement rather than multipartite entanglement, and so we
will not consider this extension further.


Notice that Sep (X : Y ) is the (pointed) cone generated by SepD (X : Y ):

Sep (X : Y ) = {λρ : λ ≥ 0, ρ ∈ SepD (X : Y )} .

The same bound m ≤ dim(X ⊗ Y )2 may therefore be taken for some expression (12.1) of any
P ∈ Sep (X : Y ).
Next, let us note that SepD (X : Y ) is a compact set. To see this, we first observe that the unit
spheres S(X ) and S(Y ) are compact, and therefore so too is the Cartesian product S(X ) × S(Y ).
The function
f : X × Y → L (X ⊗ Y ) : ( x, y) 7→ xx∗ ⊗ yy∗
is obviously continuous, and continuous functions map compact sets to compact sets, so the set

{ xx∗ ⊗ yy∗ : x ∈ S(X ) , y ∈ S(Y )}

is compact as well. Finally, it is a basic fact from convex analysis (that we will take as given) that
the convex hull of any compact set is compact.
The set Sep(X : Y) is of course not compact, given that it is not bounded. It is a closed,
convex cone, however, because it is the cone generated by a compact, convex set that does not
contain the origin.

12.2 The Woronowicz–Horodecki Criterion

Next we will discuss a necessary and sufficient condition for a given positive semidefinite operator
to be separable. Although this condition, sometimes known as the Woronowicz–Horodecki Criterion,
does not give us an efficiently computable method to determine whether or not an operator is
separable, it is useful nevertheless in an analytic sense.
The Woronowicz–Horodecki Criterion is based on the fundamental fact from convex analysis
that says that closed convex sets are completely determined by the closed half-spaces that contain
them. Here is one version of this fact that is well-suited to our needs.

Fact. Let X be a complex Euclidean space and let A ⊂ Herm (X ) be a closed, convex cone. Then
for any choice of an operator B ∈ Herm(X) with B ∉ A, there exists an operator H ∈ Herm(X)
such that

1. h H, Ai ≥ 0 for all A ∈ A, and


2. h H, Bi < 0.

It should be noted that the particular statement above is only valid for closed convex cones, not
general closed convex sets. For a general closed, convex set, it may be necessary to replace 0 with
some other real scalar for each choice of B.

Theorem 12.1 (Woronowicz–Horodecki Criterion). Let X and Y be complex Euclidean spaces and let
P ∈ Pos (X ⊗ Y ). Then P ∈ Sep (X : Y ) if and only if

(Φ ⊗ 1L(Y ) )( P) ∈ Pos (Y ⊗ Y )

for every positive unital super-operator Φ ∈ T (X , Y ).



Proof. One direction of the proof is simple. If P ∈ Sep (X : Y ), then we have


P = ∑_{j=1}^m Q_j ⊗ R_j

for some choice of Q1 , . . . , Qm ∈ Pos (X ) and R1 , . . . , Rm ∈ Pos (Y ). Thus, for every positive
super-operator Φ ∈ T (X , Y ) we have
(Φ ⊗ 1_{L(Y)})(P) = ∑_{j=1}^m Φ(Q_j) ⊗ R_j ∈ Sep(Y : Y) ⊂ Pos(Y ⊗ Y).

Let us now assume that P ∈ Pos(X ⊗ Y) is not separable. Then by the Fact stated above there must exist a Hermitian operator H ∈ Herm(X ⊗ Y) such that:

1. h H, Q ⊗ Ri ≥ 0 for all Q ∈ Pos (X ) and R ∈ Pos (Y ), and

2. h H, Pi < 0.

Let Ψ ∈ T (Y , X ) be the unique super-operator for which J (Ψ) = H. For any Q ∈ Pos (X ) and
R ∈ Pos (Y ) we therefore have



0 ≤ ⟨H, Q ⊗ R⟩ = ⟨J(Ψ), Q ⊗ R⟩ = ⟨Ψ(R), Q⟩.

From this we conclude that Ψ (and therefore Ψ∗ ) is a positive super-operator. At this point we
could easily conclude that (Ψ∗ ⊗ 1L(Y ) )( P) is not positive semidefinite, but we need to work a
little bit more to obtain the unital condition in the statement of the theorem.
We have h H, Pi < 0, so we may choose ε > 0 sufficiently small so that

h H, Pi + ε Tr( P) < 0.

Define Ξ ∈ T (X , Y ) as Ξ = Ψ∗ + εΛ, where Λ ∈ T (X , Y ) is defined as Λ( X ) = Tr( X )1 Y , and let


A = Ξ(1 X ). It holds that A ∈ Pd (Y ), and so we may define Φ ∈ T (X , Y ) as

Φ( X ) = A−1/2 Ξ( X ) A−1/2

for every X ∈ L (X ).
It is clear that Φ is both positive and unital, and it remains to prove that (Φ ⊗ 1L(Y ) )( P) is not
positive semidefinite. This may be verified as follows:
⟨vec(√A) vec(√A)^∗, (Φ ⊗ 1_{L(Y)})(P)⟩
  = ⟨vec(1_Y) vec(1_Y)^∗, (Ξ ⊗ 1_{L(Y)})(P)⟩
  = ⟨(Ξ^∗ ⊗ 1_{L(Y)})(vec(1_Y) vec(1_Y)^∗), P⟩
  = ⟨J(Ξ^∗), P⟩
  = ⟨H, P⟩ + ε Tr(P)
  < 0.

Here we have used the observation that J (Λ∗ ) = 1 X ⊗ 1 Y . It follows that (Φ ⊗ 1L(Y ) )( P) is not
positive semidefinite, and so the proof is complete.

This theorem allows us to easily prove that certain positive semidefinite operators are not
separable. For example, consider the operator
P = (1/n) vec(1_X) vec(1_X)^∗
for X a complex Euclidean space of dimension n. As in Lecture 6, we consider the transposition
super-operator T ∈ T (X ), which is positive and unital. As
(T ⊗ 1_{L(X)})(P) = −(1/n) R + (1/n) S,
where R and S are the projections onto the antisymmetric and symmetric subspaces of X ⊗ X , we
have that ( T ⊗ 1L(X ) )( P) is not positive semidefinite, and therefore P is not separable.
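This calculation is simple to reproduce numerically; the following sketch (assuming NumPy; an illustration only) forms P for n = 2, applies the transpose to the first tensor factor, and exhibits a negative eigenvalue:

    import numpy as np

    n = 2
    # P = (1/n) vec(1) vec(1)^*, the maximally entangled state on C^n tensor C^n.
    v = np.eye(n).reshape(n * n)          # vec of the identity operator
    P = np.outer(v, v) / n

    # Apply the transpose map to the first tensor factor.
    PT = P.reshape(n, n, n, n).transpose(2, 1, 0, 3).reshape(n * n, n * n)

    print(np.linalg.eigvalsh(PT))   # one eigenvalue is -1/2, so (T tensor 1)(P) is not PSD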

12.3 Separable ball around the identity

Finally, we will prove that there exists a small region around the identity operator 1 X ⊗ 1 Y where
every Hermitian operator is separable. This fact gives us an intuitive connection between noise
and entanglement, which is that entanglement cannot exist in the presence of too much noise.
We will need two facts, beyond those we have already proved, to establish this result. The first
is straightforward, and is as follows.
Lemma 12.2. Let X and Y = C Σ be complex Euclidean spaces, and consider an operator A ∈ L (X ⊗ Y )
given by
A = ∑_{a,b∈Σ} A_{a,b} ⊗ E_{a,b}

for {A_{a,b} : a, b ∈ Σ} ⊂ L(X). Then

‖A‖² ≤ ∑_{a,b∈Σ} ‖A_{a,b}‖².

Proof. For each a ∈ Σ define

B_a = ∑_{b∈Σ} A_{a,b} ⊗ E_{a,b}.
We have that

‖B_a B_a^∗‖ = ‖∑_{b∈Σ} A_{a,b} A_{a,b}^∗ ⊗ E_{a,a}‖ ≤ ∑_{b∈Σ} ‖A_{a,b} A_{a,b}^∗‖ = ∑_{b∈Σ} ‖A_{a,b}‖².

Now,

A = ∑_{a∈Σ} B_a,

and given that B_a^∗ B_b = 0 for a ≠ b, we have that

A^∗A = ∑_{a∈Σ} B_a^∗ B_a.

Therefore

‖A‖² = ‖A^∗A‖ ≤ ∑_{a∈Σ} ‖B_a^∗ B_a‖ ≤ ∑_{a,b∈Σ} ‖A_{a,b}‖²

as claimed.

The second fact that we need is a theorem about positive unital super-operators, which is
sometimes known as the Russo–Dye Theorem.
Theorem 12.3 (Russo–Dye). Let X and Y be complex Euclidean spaces and let Φ ∈ T (X , Y ) be positive
and unital. Then
k Φ( X )k ≤ k X k
for every X ∈ L (X ).
Proof. Let us first prove that ‖Φ(U)‖ ≤ 1 for every unitary operator U ∈ U(X). Assume X = C^Σ, and let

U = ∑_{a∈Σ} λ_a u_a u_a^∗

be a spectral decomposition of U. Then

Φ (U ) = ∑ λa Φ (ua u∗a ) = ∑ λa Pa ,
a∈Σ a∈Σ

where Pa = Φ ( u a u∗a ) for each a ∈ Σ. As Φ is positive, we have that Pa ∈ Pos (Y ) for each a ∈ Σ,
and given that Φ is unital we have
∑ Pa = 1Y .
a∈Σ

By Naimark’s Theorem there exists a linear isometry A ∈ U (Y , Y ⊗ X ) such that

Pa = A∗ (1 Y ⊗ Ea,a ) A

for each a ∈ Σ, and therefore


Φ(U) = A^* (∑_{a∈Σ} λ_a 1_Y ⊗ E_{a,a}) A = A^* (1_Y ⊗ VUV^*) A

for
V= ∑ ea u∗a .
a∈Σ

As U and V are unitary and A is an isometry, the bound k Φ(U )k ≤ 1 follows from the submulti-
plicativity of the spectral norm.
For general X, it suffices to prove k Φ( X )k ≤ 1 whenever k X k ≤ 1. Because every operator
X ∈ L (X ) with k X k ≤ 1 can be expressed as a convex combination of unitary operators, the
required bound follows from the convexity of the spectral norm.

Theorem 12.4. Let X and Y be complex Euclidean spaces and suppose that A ∈ Herm (X ⊗ Y ) satisfies
‖A‖₂ ≤ 1. Then
1 X ⊗Y − A ∈ Sep (X : Y ) .
Proof. Let Φ ∈ T (X , Y ) be positive and unital. Assume Y = C Σ and write

A= ∑ A a,b ⊗ Ea,b .
a,b ∈Σ

Then
(Φ ⊗ 1L(Y ) )( A) = ∑ Φ( A a,b ) ⊗ Ea,b ,
a,b ∈Σ

and therefore
‖(Φ ⊗ 1_{L(Y)})(A)‖² ≤ ∑_{a,b∈Σ} ‖Φ(A_{a,b})‖² ≤ ∑_{a,b∈Σ} ‖A_{a,b}‖² ≤ ∑_{a,b∈Σ} ‖A_{a,b}‖₂² = ‖A‖₂² ≤ 1.

The first inequality is by Lemma 12.2 and the second inequality is by the Russo-Dye Theorem.
The positivity of Φ implies that (Φ ⊗ 1L(Y ) )( A) is Hermitian, and thus (Φ ⊗ 1L(Y ) )( A) ≤ 1 Y ⊗Y .
Therefore we have

(Φ ⊗ 1L(Y ) )(1 X ⊗Y − A) = 1 Y ⊗Y − (Φ ⊗ 1L(Y ) )( A) ≥ 0.

As this holds for all positive unital Φ, it follows from Theorem 12.1 that 1 X ⊗Y − A is separable.

Lecture 13
Separable super-operators and the LOCC paradigm

In the previous lecture we discussed separable operators. The focus of this lecture will be on
analogous concepts for super-operators. In particular, we will discuss separable super-operators,
as well as the important subclass of LOCC super-operators. The acronym LOCC is short for local
operations and classical communication, and plays a central role in the study of entanglement.

13.1 Min-rank

Before discussing separable and LOCC super-operators, it will be helpful to briefly discuss a gen-
eralization of the concept of separability for operators.
Suppose two complex Euclidean spaces X and Y are fixed, and for a given choice of a nonneg-
ative integer k let us consider the collection of operators
Rk (X : Y ) = conv {vec( A) vec( A)∗ : A ∈ L (Y , X ) , rank( A) ≤ k} .
In other words, a given positive semidefinite operator P ∈ Pos (X ⊗ Y ) is contained in Rk (X : Y )
if and only if it is possible to write
m
P= ∑ vec( A j ) vec( A j )∗
j =1

for some choice of an integer m and operators A1 , . . . , Am ∈ L (Y , X ), each having rank at most k.
(It is important to note that this sort of expression does not require orthogonality of the operators
A1 , . . . , Am .)
Each of the sets Rk (X : Y ) is a closed convex cone. It is easy to see that
R0 (X : Y ) = {0},
R1 (X : Y ) = Sep (X : Y ) ,

and

Rn (X : Y ) = Pos (X ⊗ Y )
for n ≥ min{dim(X ), dim(Y )}. Moreover,
Rk (X : Y ) ⊊ Rk+1 (X : Y )
for 0 ≤ k < min{dim(X ), dim(Y )}, as any rank one operator vec( A) vec( A)∗ is contained in
Rr (X : Y ) but not Rr−1 (X : Y ) for r = rank( A) (assuming A ≠ 0).
Finally, for each positive semidefinite operator P ∈ Pos (X ⊗ Y ), we define the min-rank of P
as
min-rank( P) = min {k ≥ 0 : P ∈ Rk (X : Y )} .
(This quantity is more commonly known as the Schmidt number.)
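For a pure state vec(A) vec(A)^*, the min-rank is simply rank(A), and is therefore directly computable from the singular values of A. The following is a minimal numerical sketch (added here for illustration only; Python/numpy, ad hoc names, and it assumes the reshape of a vector recovers the operator A of the operator–vector correspondence).

```python
import numpy as np

def min_rank_pure(u, dA, dB, tol=1e-10):
    # min-rank (Schmidt number) of the pure state u u^*: the rank of A, where vec(A) = u
    s = np.linalg.svd(u.reshape(dA, dB), compute_uv=False)
    return int(np.sum(s > tol))

product = np.kron([1.0, 0.0], [0.0, 1.0])          # a product state: min-rank 1
entangled = np.array([1, 0, 0, 1]) / np.sqrt(2)    # maximally entangled state: min-rank 2
print(min_rank_pure(product, 2, 2), min_rank_pure(entangled, 2, 2))   # 1 2
```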


13.2 Separable super-operators

Now let us move on to super-operators. A super-operator Φ ∈ T (X A ⊗ X B , Y A ⊗ YB ) is said
to be separable if it can be expressed as a convex combination of completely positive product
super-operators. Equivalently, Φ is separable if and only if there exist operators A1 , . . . , Am ∈
L (X A , Y A ) and B1 , . . . , Bm ∈ L (X B , YB ) such that
m
Φ( X ) = ∑ ( A j ⊗ Bj ) X ( A j ⊗ Bj )∗ (13.1)
j =1

for all X ∈ L (X A ⊗ X B ). We will denote by

SepT (X A , Y A : X B , YB )

the set of all such super-operators.


The use of the term separable to describe super-operators of the above form is consistent with
the following observation.

Proposition 13.1. Let Φ ∈ T (X A ⊗ X B , Y A ⊗ YB ) be a super-operator. Then

Φ ∈ SepT (X A , Y A : X B , YB )

if and only if
J (Φ) ∈ Sep (Y A ⊗ X A : YB ⊗ X B ) .

Proof. Given an expression (13.1) for Φ, we have


m
J ( Φ) = ∑ vec( A j ) vec( A j )∗ ⊗ vec( Bj ) vec( Bj )∗ ∈ Sep (Y A ⊗ X A : YB ⊗ XB ) .
j =1

On the other hand, if J (Φ) ∈ Sep (Y A ⊗ X A : YB ⊗ X B ) we may write


m
J ( Φ) = ∑ vec( A j ) vec( A j )∗ ⊗ vec( Bj ) vec( Bj )∗
j =1

for some choice of operators A1 , . . . , Am ∈ L (X A , Y A ) and B1 , . . . , Bm ∈ L (X B , YB ). This implies


Φ may be expressed in the form (13.1).

Let us now observe the simple and yet useful fact that separable super-operators cannot in-
crease min-rank. This implies, in particular, that separable super-operators cannot create entan-
glement out of thin air—if a separable operator is input to a separable super-operator, the output
will also be separable.

Theorem 13.2. Let Φ ∈ SepT (X A , Y A : X B , YB ) be a separable super-operator and let P ∈ Rk (X A : X B ).


Then
Φ( P) ∈ Rk (Y A : YB ) .
In other words, min-rank(Φ( P)) ≤ min-rank( P).

Proof. Assume A1 , . . . , Am ∈ L (X A , Y A ) and B1 , . . . , Bm ∈ L (X B , YB ) satisfy


m
Φ( X ) = ∑ ( A j ⊗ Bj ) X ( A j ⊗ Bj )∗
j =1

for all X ∈ L (X A ⊗ X B ). Then for any choice of Y ∈ L (X B , X A ) we have


Φ(vec(Y) vec(Y)^*) = ∑_{j=1}^{m} vec(A_j Y B_j^T) vec(A_j Y B_j^T)^*.

As  
rank(A_j Y B_j^T) ≤ rank(Y)

for each j = 1, . . . , m, it holds that

Φ(vec(Y ) vec(Y )∗ ) ∈ Rr (Y A : YB )

for r = rank(Y ). The theorem now follows by convexity.
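The identity underlying this computation, (A ⊗ B) vec(Y) = vec(A Y B^T), together with the rank bound, is easy to confirm numerically. Here is a small sketch (added for illustration, not part of the notes; numpy, with the row-major reshape playing the role of the vec map).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))   # A ∈ L(X_A, Y_A)
B = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))   # B ∈ L(X_B, Y_B)
Y = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))   # Y ∈ L(X_B, X_A)

# the identity (A ⊗ B) vec(Y) = vec(A Y B^T)
lhs = np.kron(A, B) @ Y.reshape(-1)
rhs = (A @ Y @ B.T).reshape(-1)
print(np.allclose(lhs, rhs))                                            # True

# and rank(A Y B^T) ≤ rank(Y), so min-rank cannot increase
print(np.linalg.matrix_rank(A @ Y @ B.T) <= np.linalg.matrix_rank(Y))   # True
```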

13.3 LOCC super-operators

Consider the situation in which two parties, Alice and Bob, collectively perform some sequence
of operations and/or measurements on a shared quantum system. One may of course consider
various restrictions on the sorts of actions Alice and Bob can perform—for instance they may be
forbidden to communicate, permitted to exchange classical information, or permitted to exchange
quantum information.
The paradigm of LOCC, which is an acronym for local operations and classical communication,
concerns the situation in which Alice and Bob are permitted to communicate, but only using
classical communication. More specifically, LOCC operations will be defined as those that can
obtained as follows.

1. Alice and Bob can each perform admissible operations on their own registers, independently of
one another.
2. Alice can perform a measurement on one or more of her registers, and then transmit the clas-
sical information she obtains to Bob.
3. The roles of Alice and Bob can be reversed in item 2.
4. Alice and Bob can compose any finite number of operations that correspond to items 1, 2,
and 3.

Many problems and results in quantum information theory concern LOCC super-operators in
one form or another, often involving Alice and Bob’s ability to manipulate entangled states by
means of such operations.

13.3.1 Definition of LOCC super-operators


Let us begin with a straightforward formal definition of LOCC super-operators. There are many
other equivalent ways that one could define this class—and we are simply picking one way.

Admissible product super-operators


Let X A , X B , Y A , YB be complex Euclidean spaces and suppose that Φ A ∈ T (X A , Y A ) and ΦB ∈
T (X B , YB ) are admissible super-operators. Then the super-operator

Φ A ⊗ ΦB ∈ T (X A ⊗ X B , Y A ⊗ YB )

is said to be an admissible product super-operator. Such a super-operator represents the situation in


which Alice and Bob perform independent operations on their own quantum systems.

Measurement transmission super-operators


Let X A , Z A , and X B be complex Euclidean spaces, let Σ be a finite, nonempty set, let

{ Pa : a ∈ Σ} ⊂ Pos (Z A )

be a measurement on Z A , and let Z B = C Σ . Then the super-operator

Φ ∈ T (X A ⊗ Z A ⊗ X B , X A ⊗ X B ⊗ Z B )

defined by
Φ(X) = ∑_{a∈Σ} Tr_{Z_A}[(1_{X_A} ⊗ P_a ⊗ 1_{X_B}) X] ⊗ E_{a,a}

is an Alice-to-Bob measurement transmission super-operator.


The interpretation of such a super-operator is that Alice performs the measurement described
by { Pa : a ∈ Σ} on the part of her quantum system that corresponds to Z A and transmits the result
to Bob. We then imagine that Bob initializes a quantum register to the state Ea,a for whichever
outcome a ∈ Σ Alice obtained, so that we may incorporate this measurement outcome into the
description of Bob’s quantum information.
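For concreteness, here is a small numerical sketch of an Alice-to-Bob measurement transmission super-operator (an illustration added here, not part of the notes; Python/numpy, with ad hoc helper names), together with a check that it sends density operators to density operators.

```python
import numpy as np

def E(n, a, b):
    M = np.zeros((n, n), dtype=complex); M[a, b] = 1.0; return M

def ptrace_middle(X, dA, dZ, dB):
    # partial trace over the middle factor Z_A of an operator on X_A ⊗ Z_A ⊗ X_B
    T = X.reshape(dA, dZ, dB, dA, dZ, dB)
    return np.einsum('ambcmd->abcd', T).reshape(dA * dB, dA * dB)

def alice_to_bob(X, dA, dZ, dB, povm):
    # Phi(X) = sum_a Tr_{Z_A}[(1_{X_A} ⊗ P_a ⊗ 1_{X_B}) X] ⊗ E_{a,a}
    m = len(povm)
    out = np.zeros((dA * dB * m, dA * dB * m), dtype=complex)
    for a, P in enumerate(povm):
        K = np.kron(np.kron(np.eye(dA), P), np.eye(dB))
        out += np.kron(ptrace_middle(K @ X, dA, dZ, dB), E(m, a, a))
    return out

# sanity check: the map preserves trace and positivity on a random state (dims 2, 2, 2)
rng = np.random.default_rng(0)
G = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
rho = G @ G.conj().T; rho /= np.trace(rho)
sigma = alice_to_bob(rho, 2, 2, 2, [E(2, 0, 0), E(2, 1, 1)])
print(np.isclose(np.trace(sigma), 1), np.linalg.eigvalsh(sigma).min() > -1e-10)
```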
A Bob-to-Alice measurement transmission super-operator is a super-operator of the form

Φ ∈ T (X A ⊗ X B ⊗ Z B , X A ⊗ Z A ⊗ X B )

that is defined in the same way as an Alice-to-Bob measurement transmission super-operator,


except that of course Bob performs the measurement rather than Alice.
We may speak of a measurement transmission super-operator to mean either an Alice-to-Bob or
Bob-to-Alice measurement transmission super-operator.

Finite compositions
Finally, for complex Euclidean spaces X A , X B , Y A and YB , an LOCC super-operator is any super-
operator of the form
Φ ∈ T (X A ⊗ X B , Y A ⊗ YB )
that can be obtained from the composition of any finite number of admissible product super-
operators and measurement transmission super-operators. Note that by defining LOCC super-
operators in terms of finite compositions, we implicitly fix the number of messages exchanged by
Alice and Bob in the realization of any LOCC super-operator.
We will write
LOCC (X A , Y A : X B , YB )
to denote the collection of all LOCC super-operators as just defined.

13.3.2 LOCC super-operators are separable


It is important to note that there are simple questions concerning LOCC super-operators that are
not yet answered. For example, it is not known whether the collection of LOCC super-operators
LOCC (X A , Y A : X B , YB ) is closed for fixed spaces X A , X B , Y A and YB . Also, little is known about
the number of messages that may be required to implement LOCC super-operators.
In some situations, we may conclude interesting facts about LOCC super-operators by reason-
ing about separable super-operators. To this end, let us state a simple but very useful proposition.

Proposition 13.3. Let Φ ∈ T (X A ⊗ X B , Y A ⊗ YB ) be an LOCC super-operator. Then

Φ ∈ SepT (X A , Y A : X B , YB ) .

Proof. The set of separable super-operators is closed under composition, so it suffices to note that
every admissible product super-operator and every measurement transmission super-operator is
separable. In both cases, it is straightforward to verify separability.

Note that the separable super-operators do not offer a perfect characterization of LOCC super-
operators: there exist super-operators that are both admissible and separable, but not LOCC.
Nevertheless, we will still be able to use this proposition to prove various things about LOCC
super-operators. One simple example follows.

Corollary 13.4. Suppose ρ ∈ D (X A ⊗ X B ) and Φ ∈ LOCC (X A , Y A : X B , YB ). Then

min-rank(Φ(ρ)) ≤ min-rank(ρ).

In particular, if ρ ∈ SepD (X A : X B ) then Φ(ρ) ∈ SepD (Y A : YB ).



Lecture 14
Pure state entanglement transformation

In this lecture we will consider pure-state entanglement transformation. The setting is as follows:
Alice and Bob share a pure state x ∈ X A ⊗ X B , but they would like to transform this state to
another state y ∈ Y A ⊗ YB by means of local operations and classical communication. This is
obviously possible in some situations and impossible in others—and what we would like is to
have a condition on x and y that tells us precisely when it is possible. The following theorem
provides such a condition.

Theorem 14.1 (Nielsen’s Theorem). Assume that X A , X B , Y A , and YB are complex Euclidean spaces,
and let x ∈ X A ⊗ X B and y ∈ Y A ⊗ YB be unit vectors. Then there exists an LOCC super-operator
Φ ∈ LOCC (X A , Y A : X B , YB ) such that Φ( xx∗ ) = yy∗ if and only if

TrXB ( xx∗ ) ≺ TrY B (yy∗ ) .

Remark 14.2. It may be that X A and Y A do not have the same dimension, and in this case the
condition TrXB ( xx∗ ) ≺ TrY B (yy∗ ) requires further explanation. To be more precise, the condition
should be interpreted as
V (TrXB ( xx∗ )) V ∗ ≺ W (TrY B (yy∗ )) W ∗
for some choice of a complex Euclidean space Z A and linear isometries V ∈ U (X A , Z A ) and
W ∈ U (Y A , Z A ). (If the condition holds for one such choice of isometries V and W, it holds for
all choices.) In essence, this interpretation is analogous to padding vectors with zeroes as we did
when we discussed the majorization relation between real vectors of different dimensions. Here,
the isometries V and W embed the operators TrXB ( xx∗ ) and TrY B (yy∗ ) into a single space so that
they may be related by our definition of majorization.

The remainder of this lecture will be devoted to proving this theorem. The most difficult aspect
of the proof is that one must reason about general LOCC super-operators, which are sometimes
cumbersome. For this reason we will begin with a restricted definition of LOCC super-operators
that will be easier to reason about in this context. As we will see, it turns out that there is no loss of
generality in working with this restricted notion. Once this is done, we will prove the implications
that are necessary to establish the theorem.
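Nielsen's criterion is straightforward to test for explicit vectors: the reduced states' spectra are the squared Schmidt coefficients, and the majorization can be checked by comparing partial sums (padding with zeros for unequal dimensions, as in the remark above). The following is an illustrative sketch (not part of the original notes; Python/numpy, ad hoc names).

```python
import numpy as np

def schmidt_probs(x, dA, dB):
    # squared singular values of the dA x dB operator X with vec(X) = x
    s = np.linalg.svd(x.reshape(dA, dB), compute_uv=False)
    return np.sort(s ** 2)[::-1]

def majorizes(q, p, tol=1e-12):
    # True iff p ≺ q, after padding with zeros to a common length
    n = max(len(p), len(q))
    p = np.sort(np.pad(p, (0, n - len(p))))[::-1]
    q = np.sort(np.pad(q, (0, n - len(q))))[::-1]
    return bool(np.all(np.cumsum(q) >= np.cumsum(p) - tol))

# x -> y is possible by LOCC iff the Schmidt probabilities of x are majorized by those of y
x = np.array([1, 0, 0, 1]) / np.sqrt(2)           # maximally entangled two-qubit state
y = np.array([np.sqrt(0.9), 0, 0, np.sqrt(0.1)])  # less entangled state
print(majorizes(schmidt_probs(y, 2, 2), schmidt_probs(x, 2, 2)))   # True: x -> y possible
print(majorizes(schmidt_probs(x, 2, 2), schmidt_probs(y, 2, 2)))   # False: y -> x impossible
```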

14.1 A restricted definition of LOCC operations

The restricted type of LOCC super-operators we will work with are defined as follows for given
complex Euclidean spaces Z A and Z B .

1. A super-operator Φ ∈ T (Z A ⊗ Z B ) will be said to be an A→B super-operator if there exists a


non-destructive measurement
{ Ma : a ∈ Σ} ⊂ L (Z A )


and a collection of unitary operators

{Ua : a ∈ Σ} ⊂ U (Z B )

such that
Φ( X ) = ∑ ( M a ⊗ Ua ) X ( M a ⊗ Ua ) ∗ .
a∈Σ

One imagines that such an operation represents the situation where Alice performs a non-
destructive measurement, transmits the result to Bob, and Bob applies a unitary operation to
his system that depends on Alice’s measurement result.
2. A super-operator Φ ∈ T (Z A ⊗ Z B ) will be said to be a B→A super-operator if there exists a
non-destructive measurement { Ma : a ∈ Σ} ⊂ L (Z B ) and a collection of unitary operators
{Ua : a ∈ Σ} ⊂ U (Z A ) such that

Φ( X ) = ∑ (Ua ⊗ M a ) X (Ua ⊗ M a ) ∗ .
a∈Σ

Such a super-operator is analogous to an A→B super-operator, but where the roles of Alice
and Bob are reversed.
3. Finally, a super-operator Φ ∈ T (Z A ⊗ Z B ) is a restricted LOCC super-operator if it is a composi-
tion of A→B and B →A super-operators.

It is not difficult to see that every restricted LOCC super-operator is LOCC. As the following
theorem shows, restricted LOCC super-operators turn out to be as powerful as general LOCC
super-operators, provided they are free to act on sufficiently large spaces.

Theorem 14.3. Suppose Φ ∈ LOCC (X A , Y A : X B , YB ) is an LOCC super-operator. Then for any choice
of complex Euclidean spaces Z A and Z B having sufficiently large dimension, and for any choice of linear
isometries

VA ∈ U (X A , Z A ) , WA ∈ U (Y A , Z A ) , VB ∈ U (X B , Z B ) , WB ∈ U (YB , Z B ) ,

there exists a restricted LOCC super-operator Ψ ∈ T (Z A ⊗ Z B ) such that

(WA ⊗ WB )Φ( X )(WA ⊗ WB )∗ = Ψ ((VA ⊗ VB ) X (VA ⊗ VB )∗ ) (14.1)

for all X ∈ L (X A ⊗ X B ).

Remark 14.4. Before we prove this theorem, let us consider what it is saying. Alice has the in-
put space X A and the output space is Y A , which may have different dimensions—but we want
to view these two spaces as being embedded in a single space Z A . The isometries VA and WA
describe these embeddings. Likewise, VB and WB describe the embeddings of Bob’s input and
output spaces X B and YB in a single space Z B . The above equation (14.1) simply means that Ψ
correctly represents Φ in terms of these embeddings: Alice and Bob could either embed the input
X ∈ L (X A ⊗ X B ) in L (Z A ⊗ Z B ) as (VA ⊗ VB ) X (VA ⊗ VB )∗ , and then apply Ψ; or they could first
perform Φ and then embed the output Φ( X ) into L (Z A ⊗ Z B ) as (WA ⊗ WB )Φ( X )(WA ⊗ WB )∗ .
The equation (14.1) means that they obtain the same thing either way.

Proof. We will first consider just a limited class of super-operators Φ whose closure under compo-
sition gives the class of LOCC super-operators. Once this has been done we will observe that the
theorem holds for compositions of such super-operators, which will suffice to prove the theorem.
Let us first consider the case that Φ takes the form Φ = Φ A ⊗ 1L(XB ) for Φ A ∈ T (X A , Y A ) an
admissible super-operator, and let

Φ A (X ) = ∑ Aa X A∗a
a∈Σ

be a Kraus representation of Φ A . We will establish that the claim in the statement of the the-
orem holds for any choice of Z A and Z B for which dim(Z A ) ≥ max{dim(X A ), dim(Y A )} and
dim(Z B ) ≥ dim(X B ). Linear isometries VA , VB , WA , and WB are given, and in the case at hand the
form of Φ dictates that VA ∈ U (X A , Z A ), WA ∈ U (Y A , Z A ), and VB , WB ∈ U (X B , Z B ).
Now, for each a ∈ Σ define
Ma = WA A a VA∗ ∈ L (Z A ) .
The collection { Ma : a ∈ Σ} may not quite give us a non-destructive measurement because

∑ Ma∗ Ma = VA VA∗
a∈Σ

which is an orthogonal projection operator that may not have full rank. So, let us assume without
loss of generality that 0 ∉ Σ, define Σ′ = Σ ∪ {0}, and define

M0 = 1 Z A − VA VA∗ .

Then
∑ ′ Ma∗ Ma = 1Z A
a∈Σ

so that the collection { Ma : a ∈ Σ′ } defines a non-destructive measurement on Z A . Let us also


define U ∈ U (Z B ) to be any unitary operator that satisfies UVB = WB , and let Ua = U for every
a ∈ Σ′ . Taking Ψ ∈ T (Z A ⊗ Z B ) to be the A→B super-operator determined by { Ma : a ∈ Σ′ }
and {Ua : a ∈ Σ′ }, we then obtain that

(WA ⊗ WB )Φ( X )(WA ⊗ WB )∗ = Ψ ((VA ⊗ VB ) X (VA ⊗ VB )∗ )

as required. A symmetric argument also establishes the theorem for the case Φ = 1 X A ⊗ ΦB for
ΦB admissible.
Now suppose { Ma : a ∈ Σ} ⊂ L (X A ) is a non-destructive measurement and YB = X B ⊗ C Σ ,
and consider the super-operator Φ ∈ T (X A ⊗ X B , X A ⊗ YB ) defined by

Φ(X) = ∑_{a∈Σ} (M_a ⊗ 1_{X_B}) X (M_a ⊗ 1_{X_B})^* ⊗ E_{a,a}.

We will establish that the claim in the statement of the theorem holds for any choice of Z A and Z B
for which dim(Z A ) ≥ dim(X A ) and dim(Z B ) ≥ dim(YB ). Linear isometries VA , VB , WA , and WB
are given, and in this case the form of Φ dictates that VA , WA ∈ U (X A , Z A ), VB ∈ U (X B , Z B ), and
WB ∈ U (YB , Z B ).
Along similar lines to the first type of super-operator considered above we define

Na = WA Ma VA∗ ∈ L (Z A )

for each a ∈ Σ; and again we assume without loss of generality that 0 ∉ Σ, define Σ′ = Σ ∪ {0},
and define
N0 = 1 Z A − VA VA∗ .
Then { Na : a ∈ Σ′ } is a non-destructive measurement on Z A . Also define Ua ∈ U (Z B ) so that
Ua VB = WB (1 XB ⊗ ea )
for each a ∈ Σ, and define U0 = 1 Z B (which happens to be a completely arbitrary choice of a
unitary on Z B that does not affect the proof). We then have
(WA ⊗ WB )Φ( X )(WA ⊗ WB )∗ = Ψ ((VA ⊗ VB ) X (VA ⊗ VB )∗ )
as required. Once again, a similar argument holds for Φ taking the symmetric form
Φ(X) = ∑_{a∈Σ} E_{a,a} ⊗ (1_{X_A} ⊗ M_a) X (1_{X_A} ⊗ M_a)^*.
Now suppose that the theorem has been established for super-operators
Φ1 ∈ LOCC (X A , W A : X B , W B ) ,
Φ2 ∈ LOCC (W A , Y A : W B , YB ) ,
and consider any specific choice of complex Euclidean spaces Z A and Z B for which the claim in
the statement of the theorem has been shown to hold for these super-operators. Assume linear
isometries
VA ∈ U (X A , Z A ) , WA ∈ U (Y A , Z A ) , VB ∈ U (X B , Z B ) , WB ∈ U (YB , Z B )
are given, and let isometries U A ∈ U (W A , Z A ) and UB ∈ U (W B , Z B ) be chosen arbitrarily. We
have that there exist restricted LOCC super-operators Ψ1 , Ψ2 ∈ T (Z A ⊗ Z B ) such that
(U A ⊗ UB )Φ1 ( X )(U A ⊗ UB )∗ = Ψ1 ((VA ⊗ VB ) X (VA ⊗ VB )∗ )
(WA ⊗ WB )Φ2 (W )(WA ⊗ WB )∗ = Ψ2 ((U A ⊗ UB )W (U A ⊗ UB )∗ ) .
It follows immediately that
(WA ⊗ WB )Φ2 (Φ1 ( X ))(WA ⊗ WB )∗
= Ψ2 ((U A ⊗ UB )Φ1 ( X )(U A ⊗ UB )∗ )
= Ψ2 (Ψ1 ((VA ⊗ VB ) X (VA ⊗ VB )∗ )) ,
and so the theorem is established for the composition Φ2 Φ1 . Given that compositions of the types
of super-operators above include admissible product super-operators and measurement transmis-
sion super-operators, and therefore all LOCC super-operators, the proof is complete.

14.2 Proof of Nielsen’s Theorem

We will now prove Nielsen’s Theorem. Based on the remark following the statement of this the-
orem together with Theorem 14.3, we see that we may assume without loss of generality that we
have x, y ∈ Z A ⊗ Z B for complex Euclidean spaces Z A and Z B , and that we may restrict our
attention to restricted LOCC super-operators rather than general LOCC super-operators. More
specifically, our goal is to prove that the following two conditions are equivalent:
1. There exists a restricted LOCC super-operator Φ ∈ T (Z A ⊗ Z B ) such that Φ( xx∗ ) = yy∗ .
2. There exists a mixed unitary super-operator Ξ ∈ T (Z A ) such that TrZ B ( xx∗ ) = Ξ(TrZ B (yy∗ )).
Also note that there is no loss of generality in assuming that Z A and Z B have equal dimension.

14.2.1 Mixed unitary to LOCC direction


We will first prove that the existence of a mixed unitary super-operator Ξ ∈ T (Z A ) such that

TrZ B ( xx∗ ) = Ξ(TrZ B (yy∗ ))

implies the existence of a restricted LOCC super-operator Φ ∈ T (Z A ⊗ Z B ) that satisfies the con-
dition Φ( xx∗ ) = yy∗ .
Let X, Y ∈ L (Z B , Z A ) be the operators satisfying vec( X ) = x and vec(Y ) = y. The assumption
that there exists a mixed unitary super-operator Ξ ∈ T (Z A ) as above implies that

XX ∗ = ∑ p(a) Ua YY ∗ Ua∗
a∈Σ

for some choice of a finite, nonempty set Σ, a probability vector p ∈ R Σ , and a collection of
unitary operators {Ua : a ∈ Σ} ⊂ U (Z A ). We may of course assume without loss of generality
that p( a) > 0 for every a ∈ Σ.
Define
M_a = √(p(a)) (X^+ U_a Y)^*
for each a ∈ Σ, where X + denotes the Moore–Penrose pseudo-inverse of X (which is discussed in the
Lecture 1 notes). For each a ∈ Σ we have

Ma∗ Ma = p( a) X + Ua (YY ∗ ) Ua∗ ( X + )∗ ,

and therefore

∑_{a∈Σ} M_a^* M_a = X^+ XX^* (X^+)^* = (X^+ X)(X^+ X)^* = 1 − Π_{ker(X)}.

So, { Ma : a ∈ Σ} is not quite a non-destructive measurement, but we can turn it into one by
adding an additional element in a similar way that we did in the proof of Theorem 14.3—so
let us assume 0 ∉ Σ, define Σ′ = Σ ∪ {0}, and define M0 = Π_{ker(X)} . Then { Ma : a ∈ Σ′ }
is a non-destructive measurement, and therefore so too is its element-wise complex conjugate
{ M̄a : a ∈ Σ′ }.

Define U0 = 1 Z A and consider the B→A super-operator Φ ∈ T (Z A ⊗ Z B ) defined by


Φ(Z) = ∑_{a∈Σ′} (U_a^* ⊗ M̄_a) Z (U_a^* ⊗ M̄_a)^*.

We have
Φ(vec(X) vec(X)^*) = ∑_{a∈Σ′} vec(U_a^* X M_a^*) vec(U_a^* X M_a^*)^*,
and as XΠker( X ) = 0 we see that the term corresponding to a = 0 takes the value 0. As

XX ∗ = ∑ p(a)Ua YY ∗ Ua∗
a∈Σ

and each operator UaYY ∗ Ua∗ is positive semidefinite, we must have

im (UaYY ∗ Ua∗ ) ⊆ im( XX ∗ )

for each a ∈ Σ. This implies that im(Y ) ⊆ im(Ua∗ X ) for each a ∈ Σ, and therefore
U_a^* X M_a^* = √(p(a)) U_a^* X X^+ U_a Y = √(p(a)) U_a^* Π_{im(X)} U_a Y = √(p(a)) Π_{im(U_a^* X)} Y = √(p(a)) Y.


Consequently
Φ(vec( X ) vec( X )∗ ) = vec(Y ) vec(Y )∗ .
We have therefore proved that the assumption that there exists a mixed unitary super-operator
Ξ ∈ T (Z A ) such that TrZ B ( xx∗ ) = Ξ(TrZ B (yy∗ )) implies that there exists a restricted LOCC super-
operator Φ ∈ T (Z A ⊗ Z B ) such that Φ( xx∗ ) = yy∗ .
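The computation above rests on the standard pseudo-inverse identities X^+ X = 1 − Π_{ker(X)} and X X^+ = Π_{im(X)}. A quick numerical check of these identities (illustrative only, not from the notes; numpy):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 2)) @ rng.normal(size=(2, 4))   # a rank-deficient operator X
Xp = np.linalg.pinv(X)                                  # Moore–Penrose pseudo-inverse X^+

# X^+ X and X X^+ are orthogonal projections (onto (ker X)^⊥ and im(X), respectively)
for Q in (Xp @ X, X @ Xp):
    print(np.allclose(Q, Q @ Q), np.allclose(Q, Q.conj().T))   # idempotent and Hermitian
print(np.allclose(X @ Xp @ X, X))                              # X X^+ X = X
```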

14.2.2 LOCC to mixed-unitary direction


Now let us consider the other implication, which states that the existence of a restricted LOCC
super-operator Φ ∈ T (Z A ⊗ Z B ) satisfying Φ( xx∗ ) = yy∗ implies that there must exist a mixed
unitary super-operator Ξ ∈ T (Z A ) such that TrZ B ( xx∗ ) = Ξ(TrZ B (yy∗ )). This direction is more
difficult, because the LOCC super-operator Φ could potentially be very complicated. To address
this issue, we begin with the following lemma.

Lemma 14.5. For any choice of complex Euclidean spaces Z A and Z B having equal dimension, the follow-
ing two statements hold:

1. For every restricted LOCC super-operator Φ ∈ T (Z A ⊗ Z B ) and every vector u ∈ Z A ⊗ Z B , there


exists an A→B super-operator Ψ ∈ T (Z A ⊗ Z B ) such that Ψ(uu∗ ) = Φ(uu∗ ).
2. For every restricted LOCC super-operator Φ ∈ T (Z A ⊗ Z B ) and every vector u ∈ Z A ⊗ Z B , there
exists a B→A super-operator Ψ ∈ T (Z A ⊗ Z B ) such that Ψ(uu∗ ) = Φ(uu∗ ).

Proof. The idea of the proof is to show that A→B and B→A super-operators can be interchanged
for fixed pure-state inputs, which allows any restricted LOCC super-operator to be “collapsed” to
a single A→B or B→A super-operator.
Suppose first that { Ma : a ∈ Σ} ⊂ L (Z B ) is a non-destructive measurement, {Ua : a ∈ Σ} ⊂
U (Z A ) is a collection of unitary operators, and

Φ( Z ) = ∑ (Ua ⊗ M a ) Z (Ua ⊗ M a ) ∗
a∈Σ

is the B→A super-operator that is described by these operators. Also let u ∈ Z A ⊗ Z B be a vector
and let X ∈ L (Z B , Z A ) satisfy vec( X ) = u. Thus

Φ(vec(X) vec(X)^*) = ∑_{a∈Σ} vec(U_a X M_a^T) vec(U_a X M_a^T)^*.

Our goal is find a non-destructive measurement { Na : a ∈ Σ} ⊂ L (Z A ) and a collection of


unitary operators {Va : a ∈ Σ} ⊂ U (Z B ) such that

N_a X V_a^T = U_a X M_a^T

for every a ∈ Σ. With this condition satisfied we will have that the A→B super-operator Ψ defined
by
Ψ( Z ) = ∑ ( Na ⊗ Va ) Z ( Na ⊗ Va )∗
a∈Σ

satisfies

Ψ(uu^*) = Ψ(vec(X) vec(X)^*)
        = ∑_{a∈Σ} vec(N_a X V_a^T) vec(N_a X V_a^T)^*
        = ∑_{a∈Σ} vec(U_a X M_a^T) vec(U_a X M_a^T)^*
        = Φ(vec(X) vec(X)^*)
        = Φ(uu^*).

Choose a unitary operator A ∈ U (Z A , Z B ) so that AX ∈ Pos (Z B ) and choose unitary opera-


tors Wa ∈ U (Z B ) so that W_a A X M_a^T ∈ Pos (Z B ) for each a ∈ Σ. This is easily done by considering
singular value decompositions of the operators X and A X M_a^T . Notice that we therefore have

W_a A X M_a^T = (W_a A X M_a^T)^* = M̄_a (AX)^* W_a^* = M̄_a A X W_a^*

for each a. Multiplying both sides of this equation on the left by Ua A∗ Wa∗ gives

U_a X M_a^T = (U_a A^* W_a^* M̄_a A) X W_a^*.




Defining

V_a = W̄_a
N_a = U_a A^* W_a^* M̄_a A

for each a ∈ Σ therefore gives Na XVaT = Ua XMTa . It is clear that each Va is unitary and it is easily
checked that
∑ Na∗ Na = 1Z A ,
a∈Σ

implying that { Na : a ∈ Σ} is a valid non-destructive measurement.


We have therefore proved that for every B→A super-operator Φ ∈ T (Z A ⊗ Z B ) and every
vector u ∈ Z A ⊗ Z B , there exists an A→B super-operator Ψ ∈ T (Z A ⊗ Z B ) such that Ψ(uu∗ ) =
Φ(uu∗ ). A symmetric argument shows that for every A→B super-operator Φ and every vector
u ∈ Z A ⊗ Z B , there exists a B→A super-operator Ψ such that Φ(uu∗ ) = Ψ(uu∗ ).
Finally, notice that the composition of any two A→B super-operators is also an A→B super-
operator, and likewise for B→A super-operators. Therefore, by applying the above arguments
repeatedly for any given restricted LOCC super-operator Φ and vector u ∈ Z A ⊗ Z B , we find that
there exists an A→B super-operator Ψ such that Ψ(uu∗ ) = Φ(uu∗ ), and likewise for Ψ being a
B→A super-operator.
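The interchange construction in this proof can be verified numerically. The sketch below (added for illustration and not part of the notes; Python/numpy, with the polar decompositions obtained from the SVD and all names ad hoc) builds A, W_a, N_a, V_a for a random non-destructive measurement and checks both the relation N_a X V_a^T = U_a X M_a^T and the condition ∑_a N_a^* N_a = 1.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 3, 4                                    # dim(Z_A) = dim(Z_B) = d, k measurement outcomes

def rand_unitary(n):
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return Q

def polar_unitary(B):
    # unitary part Q of the polar decomposition B = Q P, with P positive semidefinite
    U, _, Vh = np.linalg.svd(B)
    return U @ Vh

# a random non-destructive measurement {M_a} and unitaries {U_a}
G = [rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)) for _ in range(k)]
S = sum(g.conj().T @ g for g in G)
w, V = np.linalg.eigh(S)
S_inv_half = V @ np.diag(w ** -0.5) @ V.conj().T
M = [g @ S_inv_half for g in G]                # sum_a M_a^* M_a = 1
U = [rand_unitary(d) for _ in range(k)]
X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))   # vec(X) = u

A = polar_unitary(X).conj().T                  # A X is positive semidefinite
N, Vb = [], []
for Ua, Ma in zip(U, M):
    Wa = polar_unitary(A @ X @ Ma.T).conj().T  # W_a A X M_a^T is positive semidefinite
    N.append(Ua @ A.conj().T @ Wa.conj().T @ Ma.conj() @ A)
    Vb.append(Wa.conj())

print(all(np.allclose(Na @ X @ Va.T, Ua @ X @ Ma.T)
          for Na, Va, Ua, Ma in zip(N, Vb, U, M)))                  # True
print(np.allclose(sum(Na.conj().T @ Na for Na in N), np.eye(d)))    # True
```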

We are now prepared to finish the proof. We assume that there exists a restricted LOCC super-
operator Φ ∈ T (Z A ⊗ Z B ) such that Φ( xx∗ ) = yy∗ , from which we conclude that there exists a
B→A super-operator Ψ ∈ T (Z A ⊗ Z B ) such that Ψ( xx∗ ) = yy∗ . Write

Ψ( Z ) = ∑ (Ua ⊗ M a ) Z (Ua ⊗ M a ) ∗ ,
a∈Σ

for { Ma : a ∈ Σ} a non-destructive measurement on Z B and {Ua : a ∈ Σ} a collection of unitary


operators on Z A .

Let X, Y ∈ L (Z B , Z A ) satisfy x = vec( X ) and y = vec(Y ), so that

Ψ(vec(X) vec(X)^*) = ∑_{a∈Σ} vec(U_a X M_a^T) vec(U_a X M_a^T)^* = vec(Y) vec(Y)^*.

This implies that


U_a X M_a^T = α_a Y
and therefore
X M_a^T = α_a U_a^* Y
for each a ∈ Σ, where {α_a : a ∈ Σ} is a collection of complex numbers satisfying ∑_{a∈Σ} |α_a|² = 1.
We now have
∑_{a∈Σ} |α_a|² U_a^* YY^* U_a = ∑_{a∈Σ} X M_a^T M̄_a X^* = XX^*.

This shows that there exists a mixed unitary super-operator Ξ such that

XX^* = Ξ(YY^*),

which is equivalent to
TrZ B ( xx∗ ) = Ξ(TrZ B (yy∗ ))
as required.

Lecture 15
Measures of entanglement

The topic of this lecture is measures of entanglement. The underlying idea throughout this discussion
is that entanglement may be viewed as a resource that is useful for various communication-related
tasks, such as teleportation, and we would like to quantify the amount of entanglement that is
contained in different states.

15.1 Maximum inner product with a maximally entangled state

We will begin with a very simple quantity that relates to the amount of entanglement in a given
state: the maximum inner product with a maximally entangled state. It is not necessarily an
interesting concept in its own right, but it will be useful as a mathematical tool in the sections of
this lecture that follow.
Suppose X and Y are complex Euclidean spaces, and assume for the moment that dim(X ) ≥
dim(Y ) = n. A maximally entangled state of a pair of registers (X, Y ) to which these spaces are
associated is any pure state uu∗ for which
Tr_X(uu^*) = (1/n) 1_Y ;
or, in other words, tracing out the larger space leaves the completely mixed state on the other
register. Equivalently, the maximally entangled states are those that may be expressed as
(1/n) vec(U) vec(U)^*
for some choice of a linear isometry U ∈ U (Y , X ). If it is the case that n = dim(X ) ≤ dim(Y )
then the maximally entangled states are those states uu∗ such that
Tr_Y(uu^*) = (1/n) 1_X ,
or equivalently those that can be written
(1/n) vec(U^*) vec(U^*)^*
where U ∈ L (X , Y ) is a linear isometry. Quite frequently, the term maximally entangled refers to
the situation in which dim(X ) = dim(Y ), where the two notions obviously coincide. As is to be
expected when discussing pure states, we will often refer to a unit vector u as being a maximally
entangled state, which simply means that uu∗ is maximally entangled.
Now, for an arbitrary density operator ρ ∈ D (X ⊗ Y ), let us define
M (ρ) = max{h uu∗ , ρi : u ∈ X ⊗ Y is maximally entangled}.
Clearly it holds that 0 < M (ρ) ≤ 1 for every density operator ρ ∈ D (X ⊗ Y ). The following
lemma establishes an upper bound on M (ρ) based on the min-rank of ρ.


Lemma 15.1. Let X and Y be complex Euclidean spaces, and let n = min{dim(X ), dim(Y )}. Then

M(ρ) ≤ min-rank(ρ)/n
for all ρ ∈ D (X ⊗ Y ).

Proof. Let us assume dim(X ) ≥ dim(Y ) = n, and note that the argument is equivalent in case the
inequality is reversed.
First let us note that M is a convex function on D (X ⊗ Y ). To see this, consider any choice of
σ, ξ ∈ D (X ⊗ Y ) and p ∈ [0, 1]. Then for U ∈ U (Y , X ) we have

(1/n) vec(U)^* (pσ + (1 − p)ξ) vec(U)
    = p (1/n) vec(U)^* σ vec(U) + (1 − p) (1/n) vec(U)^* ξ vec(U)
    ≤ p M(σ) + (1 − p) M(ξ).

Maximizing over all U ∈ U (Y , X ) establishes that M is convex as claimed.


Now, given that M is convex, we see that it suffices to prove the lemma by considering only
pure states. Every pure density operator on X ⊗ Y may be written as vec( A) vec( A)∗ for A ∈
L (Y , X ) satisfying ‖A‖₂ = 1. We have

M(vec(A) vec(A)^*) = (1/n) max_{U∈U(Y,X)} |⟨U, A⟩|² = (1/n) ‖A‖₁².

Given that
‖A‖₁ ≤ √(rank(A)) ‖A‖₂
for every operator A, the lemma follows.
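For pure states the quantity in the proof is completely explicit: M(vec(A) vec(A)^*) = ‖A‖₁²/n. A short numerical check of this value against the bound of the lemma (an illustrative sketch only; numpy):

```python
import numpy as np

def M_pure(A):
    # M(vec(A) vec(A)^*) = ||A||_1^2 / n for a pure state, with n the smaller dimension
    n = min(A.shape)
    return np.sum(np.linalg.svd(A, compute_uv=False)) ** 2 / n

# a rank-2 pure state on C^3 ⊗ C^3; Lemma 15.1 gives the bound 2/3, which is tight here
A = np.zeros((3, 3)); A[0, 0] = A[1, 1] = 1 / np.sqrt(2)
print(M_pure(A), 2 / 3)     # 0.666...  vs  the bound 0.666...
```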

15.2 Entanglement cost and distillable entanglement

We will now discuss two fundamental measures of entanglement: the entanglement cost and the
distillable entanglement. For the remainder of the lecture, let us take

Y A = C {0,1} and YB = C {0,1}

to be complex Euclidean spaces corresponding to single qubits, and let τ ∈ D (Y A ⊗ YB ) denote


the density operator
τ = (1/2) (e_0 ⊗ e_0 + e_1 ⊗ e_1)(e_0 ⊗ e_0 + e_1 ⊗ e_1)^*,
which may be recognizable when expressed in the Dirac notation as

τ = |φ⁺⟩⟨φ⁺|   for   |φ⁺⟩ = (1/√2)|00⟩ + (1/√2)|11⟩.

We view the state τ as representing one unit of entanglement, typically called an e-bit of entangle-
ment.

15.2.1 Definition of entanglement cost


The first measure of entanglement we will consider is called the entanglement cost. Informally
speaking, the entanglement cost of a density operator ρ ∈ D (X A ⊗ X B ) represents the number of
e-bits Alice and Bob need to share in order to create a copy of ρ by means of an LOCC operation
with high fidelity. It is an information-theoretic quantity, so it must be understood to be asymp-
totic in nature—where one amortizes over many parallel repetitions of such a conversion. The
following definition states this more precisely.

Definition 15.2. The entanglement cost of a density operator ρ ∈ D (X A ⊗ X B ), denoted Ec (ρ), is the
infimum over all real numbers α ≥ 0 for which there exists a sequence of LOCC super-operators
{Φn : n ∈ N }, where  
Φ_n ∈ LOCC(Y_A^{⊗⌊αn⌋}, X_A^{⊗n} : Y_B^{⊗⌊αn⌋}, X_B^{⊗n}),

such that
lim_{n→∞} F(Φ_n(τ^{⊗⌊αn⌋}), ρ^{⊗n}) = 1.

The interpretation of the definition is that, in the limit of large n, Alice and Bob are able to
convert ⌊αn⌋ e-bits into n copies of ρ with high fidelity for any α > Ec (ρ). It is not entirely obvious
that there should exist any value of α for which there exists a sequence of LOCC super-operators
{Φn : n ∈ N } as in the statement of the definition, but indeed there always does exist a suitable
choice of α.

15.2.2 Definition of distillable entanglement


The second measure of entanglement we will consider is the distillable entanglement, which is es-
sentially the reverse of the entanglement cost. It quantifies the number of e-bits that Alice and Bob
can extract from the state in question, again amortized over many copies.

Definition 15.3. The distillable entanglement of a density operator ρ ∈ D (X A ⊗ X B ), denoted Ed (ρ),


is the supremum over all real numbers α ≥ 0 for which there exists a sequence of LOCC super-
operators {Φn : n ∈ N }, where
 
Φ_n ∈ LOCC(X_A^{⊗n}, Y_A^{⊗⌊αn⌋} : X_B^{⊗n}, Y_B^{⊗⌊αn⌋}),

such that
lim_{n→∞} F(Φ_n(ρ^{⊗n}), τ^{⊗⌊αn⌋}) = 1.

The interpretation of the definition is that, in the limit of large n, Alice and Bob are able to
convert n copies of ρ into ⌊αn⌋ e-bits with high fidelity for any α < Ed (ρ). In order to clarify the
definition, let us state explicitly that the operator τ^{⊗0} is interpreted to be the scalar 1, implying
that the condition in the definition is trivially satisfied for α = 0.

15.2.3 The distillable entanglement is at most the entanglement cost


At an intuitive level it is clear that the entanglement cost must be at least as large as the distillable
entanglement, for otherwise Alice and Bob would be able to open an “entanglement factory” that
would violate the principle that LOCC super-operators cannot create entanglement out of thin air.
Let us now prove this formally.

Theorem 15.4. For every state ρ ∈ D (X A ⊗ X B ) we have Ed (ρ) ≤ Ec (ρ).

Proof. Let us assume that α and β are nonnegative real numbers such that the following two prop-
erties are satisfied:

1. There exists a sequence of LOCC super-operators {Φn : n ∈ N }, where


 
Φ_n ∈ LOCC(Y_A^{⊗⌊αn⌋}, X_A^{⊗n} : Y_B^{⊗⌊αn⌋}, X_B^{⊗n}),

such that
lim_{n→∞} F(Φ_n(τ^{⊗⌊αn⌋}), ρ^{⊗n}) = 1.

2. There exists a sequence of LOCC super-operators {Ψn : n ∈ N }, where

Ψ_n ∈ LOCC(X_A^{⊗n}, Y_A^{⊗⌊βn⌋} : X_B^{⊗n}, Y_B^{⊗⌊βn⌋}),

such that
lim_{n→∞} F(Ψ_n(ρ^{⊗n}), τ^{⊗⌊βn⌋}) = 1.

Using the Fuchs–van de Graaf inequalities, along with the triangle inequality for the trace norm,
we conclude that
lim_{n→∞} F((Ψ_n Φ_n)(τ^{⊗⌊αn⌋}), τ^{⊗⌊βn⌋}) = 1. (15.1)

Because min-rank(τ^{⊗k}) = 2^k for every choice of k ≥ 1, and LOCC super-operators cannot increase
min-rank, we have that
F((Ψ_n Φ_n)(τ^{⊗⌊αn⌋}), τ^{⊗⌊βn⌋})² ≤ 2^{⌊αn⌋−⌊βn⌋} (15.2)
by Lemma 15.1. By equations (15.1) and (15.2), we therefore have that α ≥ β.


Given that Ec (ρ) is the infimum over all α, and Ed (ρ) is the supremum over all β, with the
above properties, we have that Ec (ρ) ≥ Ed (ρ) as required.

15.3 Pure state entanglement

The remainder of the lecture will focus on the entanglement cost and distillable entanglement for
bipartite pure states. In this case, these measures turn out to be identical, and coincide precisely
with the von Neumann entropy of the reduced state of either subsystem.

Theorem 15.5. Let X A and X B be complex Euclidean spaces and let u ∈ X A ⊗ X B be a unit vector. Then

Ec (uu∗ ) = Ed (uu∗ ) = S(TrX A (uu∗ )) = S(TrXB (uu∗ )).
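Before turning to the proof, note that this common value is simply the Shannon entropy of the squared Schmidt coefficients of u, so it is easy to compute for explicit vectors. The following is an illustrative sketch (not part of the original notes; numpy, ad hoc names).

```python
import numpy as np

def entanglement_entropy(u, dA, dB):
    # S(Tr_{X_B}(uu^*)), computed from the squared Schmidt coefficients of u
    p = np.linalg.svd(u.reshape(dA, dB), compute_uv=False) ** 2
    p = p[p > 1e-15]
    return -np.sum(p * np.log2(p))

phi = np.array([1, 0, 0, 1]) / np.sqrt(2)    # one e-bit
print(entanglement_entropy(phi, 2, 2))       # 1.0, so Ec = Ed = 1 for this state
```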

Proof. The proof will start with some basic observations about the vector u that will be used to
calculate both the entanglement cost and the distillable entanglement. Let ρ = TrXB (uu∗ ), and let

ρ= ∑ p(a) va v∗a
a∈Σ

be a spectral decomposition of ρ, for some arbitrary set Σ with | Σ | = dim(X A ). We may then
choose orthonormal vectors {w a : a ∈ Σ} ⊂ X B so that
u = ∑_{a∈Σ} √(p(a)) v_a ⊗ w_a

represents a singular value (or Schmidt) decomposition of u. As ρ is a density operator we have


that p ∈ R Σ is a probability vector, and we have S(ρ) = H ( p). Let us assume hereafter that
H ( p) > 0, for the case H ( p) = 0 corresponds to the situation where u is separable (in which case
the entanglement cost and distillable entanglement are both easily seen to be 0).
We will make use of concepts regarding compression that were discussed in Lecture 7. Recall
that for each choice of n and ε > 0, we denote by Tn,ε ⊆ Σn the set of ε-typical sequences of length
n with respect to the probability vector p:
T_{n,ε} = { a_1 · · · a_n ∈ Σ^n : 2^{−n(H(p)+ε)} < p(a_1) · · · p(a_n) < 2^{−n(H(p)−ε)} }.

For each n ∈ N and ε > 0, let us define a vector


x_{n,ε} = ∑_{a_1···a_n ∈ T_{n,ε}} √(p(a_1) · · · p(a_n)) v_{a_1···a_n} ⊗ w_{a_1···a_n} ∈ (X_A ⊗ X_B)^{⊗n}.

Here we are using the shorthand va1 ···an = va1 ⊗ · · · ⊗ van (and similar for w a1 ···an ). We have

‖x_{n,ε}‖² = ∑_{a_1···a_n ∈ T_{n,ε}} p(a_1) · · · p(a_n),

which is the probability that a random choice of a1 · · · an is ε-typical with respect to the probability
vector p. It follows that lim_{n→∞} ‖x_{n,ε}‖ = 1 for any choice of ε > 0. Let
y_{n,ε} = x_{n,ε} / ‖x_{n,ε}‖
denote the normalized versions of these vectors.
Next, consider the vector of eigenvalues
 

λ(Tr_{X_B^{⊗n}}(x_{n,ε} x_{n,ε}^*)).

The nonzero eigenvalues are given by the probabilities for the various ε-typical sequences, and so
 
2^{−n(H(p)+ε)} < λ_j(Tr_{X_B^{⊗n}}(x_{n,ε} x_{n,ε}^*)) < 2^{−n(H(p)−ε)}

for j = 1, . . . , | Tn,ε |. (The remaining eigenvalues are 0.) It follows that

2^{−n(H(p)+ε)} / ‖x_{n,ε}‖² < λ_j(Tr_{X_B^{⊗n}}(y_{n,ε} y_{n,ε}^*)) < 2^{−n(H(p)−ε)} / ‖x_{n,ε}‖²
for j = 1, . . . , | Tn,ε |, and again the remaining eigenvalues are 0.
Let us now consider the entanglement cost of uu∗ . We wish to show that for every real number
α > H ( p) there exists a sequence {Φn : n ∈ N } of LOCC super-operators such that
   
lim_{n→∞} F(Φ_n(τ^{⊗⌊αn⌋}), (uu^*)^{⊗n}) = 1. (15.3)

We will do this by means of Nielsen’s Theorem. Specifically, let us choose ε > 0 so that α >
H ( p) + 2ε, from which it follows that ⌊ αn⌋ ≥ n( H ( p) + ε) for sufficiently large n. We have
  
λ_j(Tr_{Y_B^{⊗⌊αn⌋}}(τ^{⊗⌊αn⌋})) = 2^{−⌊αn⌋}

for j = 1, . . . , 2^{⌊αn⌋}. Given that

2^{−n(H(p)+ε)} / ‖x_{n,ε}‖² ≥ 2^{−n(H(p)+ε)} ≥ 2^{−⌊αn⌋},
for sufficiently large n, it follows that
 
Tr_{Y_B^{⊗⌊αn⌋}}(τ^{⊗⌊αn⌋}) ≺ Tr_{X_B^{⊗n}}(y_{n,ε} y_{n,ε}^*). (15.4)

This means that τ ⊗⌊αn⌋ can be converted to yn,ε y∗n,ε by means of an LOCC super-operator Φn by
Nielsen’s Theorem. Given that
 
lim_{n→∞} F(y_{n,ε} y_{n,ε}^*, (uu^*)^{⊗n}) = 1

this implies that the required equation (15.3) holds. Consequently Ec (uu∗ ) ≤ S (TrXB (uu∗ )).
Next let us consider the distillable entanglement, for which a similar argument is used. Our
goal is to prove that for every α < H ( p), there exists a sequence {Φn : n ∈ N } of LOCC super-
operators such that
lim_{n→∞} F(Φ_n((uu^*)^{⊗n}), τ^{⊗⌊αn⌋}) = 1. (15.5)

In this case, let us choose ε > 0 small enough so that α < H ( p) − 2ε. Then for sufficiently large n
we have
2^{⌊αn⌋} ≤ ‖x_{n,ε}‖² 2^{n(H(p)−ε)}.
Similar to above, we therefore have
 
Tr_{X_B^{⊗n}}(y_{n,ε} y_{n,ε}^*) ≺ Tr_{Y_B^{⊗⌊αn⌋}}(τ^{⊗⌊αn⌋}),

which implies that the state yn,ε y∗n,ε can be converted to the state τ ⊗⌊αn⌋ by means of an LOCC
super-operator Φn for sufficiently large n. Given that

lim_{n→∞} ‖(uu^*)^{⊗n} − y_{n,ε} y_{n,ε}^*‖₁ = 0

it follows that

lim_{n→∞} ‖Φ_n((uu^*)^{⊗n}) − τ^{⊗⌊αn⌋}‖₁
    ≤ lim_{n→∞} ( ‖Φ_n((uu^*)^{⊗n}) − Φ_n(y_{n,ε} y_{n,ε}^*)‖₁ + ‖Φ_n(y_{n,ε} y_{n,ε}^*) − τ^{⊗⌊αn⌋}‖₁ ) = 0,

which establishes the above equation (15.5). Consequently S (TrXB (uu∗ )) ≤ Ed (uu∗ ).
We have shown that
Ec (uu∗ ) ≤ S(TrXB (uu∗ )) ≤ Ed (uu∗ ).
As Ed (uu∗ ) ≤ Ec (uu∗ ), the equality in the statement of the theorem follows.

Lecture 16
The partial transpose and bound-entangled states

In this lecture we will discuss the partial transpose and its connection to entanglement. The pri-
mary focus of the lecture will be to construct examples of bound-entangled states, which are mixed
states that are entangled and yet have zero distillable entanglement. The known constructions of
such states are closely connected to the partial transpose.

16.1 The partial transpose and separability

Recall the Woronowicz–Horodecki criterion for separability: for complex Euclidean spaces X
and Y , we have that a given operator P ∈ Pos (X ⊗ Y ) is separable if and only if

(Φ ⊗ 1L(Y ) )( P) ∈ Pos (Y ⊗ Y )

for every choice of a positive unital super-operator Φ ∈ T (X , Y ). We note, however, that the
restriction of the super-operator Φ to be both unital and to take the form Φ ∈ T (X , Y ) can be
relaxed. Specifically, the Woronowicz–Horodecki criterion implies the truth of the following two
facts:

1. If P ∈ Pos (X ⊗ Y ) is separable, then for every choice of a complex Euclidean space Z and a
positive super-operator Φ ∈ T (X , Z), we have

(Φ ⊗ 1L(Y ) )( P) ∈ Pos (Z ⊗ Y ) .

2. If P ∈ Pos (X ⊗ Y ) is not separable, there exists a positive super-operator Φ ∈ T (X , Z) that


reveals this fact, in the sense that

(Φ ⊗ 1L(Y ) )( P) ∉ Pos (Z ⊗ Y ) .

Moreover, there exists such a super-operator Φ that is unital and for which Z = Y .

It is clear that the criterion illustrates a connection between separability and positive super-
operators that are not completely positive, for if Φ ∈ T (X , Z) is completely positive, then

(Φ ⊗ 1L(Y ) )( P) ∈ Pos (Z ⊗ Y )

for every completely positive super-operator Φ ∈ T (X , Z), regardless of whether P is separable


or not.
Thus far, we have only seen one example of a super-operator that is positive but not completely
positive: the transpose. Let us recall that the transpose super-operator T ∈ T (X ) on a complex
Euclidean space X is defined as
T (X ) = XT


for all X ∈ L (X ). The positivity of T is clear: X ∈ Pos (X ) if and only if X T ∈ Pos (X ) for every
X ∈ L (X ). Assuming that X = C Σ , we have

T(X) = ∑_{a,b∈Σ} E_{a,b} X E_{a,b}

for all X ∈ L (X ). The Choi–Jamiołkowski representation of T is

J (T ) = ∑ Ea,b ⊗ Eb,a = W
a,b ∈Σ

where W ∈ U (X ⊗ X ) denotes the swap operator. The fact that W is not positive semidefinite
shows that T is not completely positive.
When we refer to the partial transpose, we mean that the transpose super-operator is tensored
with the identity super-operator on some other space. We will use a similar notation to the partial
trace: for given complex Euclidean spaces X and Y , we define

TX = T ⊗ 1L(Y ) ∈ T (X ⊗ Y ) .

More generally, the subscript refers to the space on which the transpose is performed.
Given that the transpose is positive, we may conclude the following from the Woronowicz–
Horodecki Criterion for any choice of P ∈ Pos (X ⊗ Y ):
1. If P is separable, then TX ( P) is necessarily positive semidefinite.
2. If P is not separable, then TX ( P) might not be positive semidefinite, although nothing defini-
tive can be concluded from the criterion.
We have seen a specific example where the transpose indeed does identify entanglement: if Σ is a
finite, nonempty set of size n, and we take X A = C Σ and X B = C Σ , then
P = (1/n) ∑_{a,b∈Σ} E_{a,b} ⊗ E_{a,b} ∈ Pos (X A ⊗ X B )

is certainly entangled, because


T_{X_A}(P) = (1/n) W ∉ Pos (X A ⊗ X B ).
We will soon prove that indeed there do exist entangled operators P ∈ Pos (X A ⊗ X B ) for which
TX A ( P) ∈ Pos (X A ⊗ X B ), which means that the partial transpose does not give a simple test for
separability. It turns out, however, that the partial transpose does have an interesting connection
to entanglement distillation, as we will see later in the lecture.
For the sake of discussing this issue in greater detail, let us consider the following definition.
For any choice of complex Euclidean spaces X A and X B , we define

PPT (X A : X B ) = { P ∈ Pos (X A ⊗ X B ) : TX A ( P) ∈ Pos (X A ⊗ X B )} .

The acronym PPT stands for positive (semidefinite) partial transpose.


It is clear that the set PPT (X A : X B ) is a closed convex cone. Let us also note that this notion
respects tensor products (up to a re-ordering of tensor factors), meaning that if P ∈ PPT (X A : X B )
and Q ∈ PPT (Y A : YB ), then

P ⊗ Q ∈ PPT (X A ⊗ Y A : X B ⊗ YB ) .

Finally, notice that the definition of PPT (X A : X B ) does not really depend on the fact that the
partial transpose is performed on X A as opposed to X B . This follows from the observation that

T ( TX A ( X )) = TXB ( X )

for every X ∈ L (X A ⊗ X B ), and therefore

TX A ( X ) ∈ Pos (X A ⊗ X B ) ⇔ TXB ( X ) ∈ Pos (X A ⊗ X B ) .

16.2 Examples of entangled states that are PPT

In this section we will discuss two examples of operators that are both entangled and PPT. This
shows that the partial transpose test does not give an efficient test for separability, and also implies
something interesting about entanglement distillation to be discussed in the next section.

16.2.1 First example


Let us begin by considering the following collection of operators, all of which act on the complex
Euclidean space CZn ⊗ CZn for an integer n ≥ 2:

P_n = (1/n) ∑_{a,b∈Z_n} E_{a,b} ⊗ E_{a,b},    W_n = ∑_{a,b∈Z_n} E_{b,a} ⊗ E_{a,b},
R_n = (1/2)(1 − W_n),    S_n = (1/2)(1 + W_n),    Q_n = 1 − P_n.
It holds that Pn , Qn , Rn , and Sn are projection operators with Pn + Qn = Rn + Sn = 1. The operator
Wn is the swap operator, Rn is the projection onto the anti-symmetric subspace of CZn ⊗ CZn and Sn
is the projection onto the symmetric subspace of CZn ⊗ CZn .
It is clear that
(T ⊗ 1)(P_n) = (1/n) W_n    and    (T ⊗ 1)(1) = 1,
from which the following equations follow:

(T ⊗ 1)(P_n) = −(1/n) R_n + (1/n) S_n,          (T ⊗ 1)(R_n) = −((n−1)/2) P_n + (1/2) Q_n,
(T ⊗ 1)(Q_n) = ((n+1)/n) R_n + ((n−1)/n) S_n,   (T ⊗ 1)(S_n) = ((n+1)/2) P_n + (1/2) Q_n.
Now let us suppose we have registers X2 , Y2 , X3 , and Y3 , where

X2 = C Z2 , Y2 = C Z 2 , X3 = C Z3 , Y3 = C Z 3 .

In other words, X2 and Y2 are qubit registers, while X3 and Y3 are qutrit registers. We will imagine
the situation in which Alice holds registers X2 and X3 , while Bob holds Y2 and Y3 .
For every choice of α > 0, define

Xα = Q3 ⊗ Q2 + αP3 ⊗ P2 ∈ Pos (X3 ⊗ Y3 ⊗ X2 ⊗ Y2 ) .



Based on the above equations we compute:
T_{X_3⊗X_2}(X_α) = ((4/3) R_3 + (2/3) S_3) ⊗ ((3/2) R_2 + (1/2) S_2)
                     + α (−(1/3) R_3 + (1/3) S_3) ⊗ (−(1/2) R_2 + (1/2) S_2)
                 = ((12+α)/6) R_3 ⊗ R_2 + ((4−α)/6) R_3 ⊗ S_2 + ((6−α)/6) S_3 ⊗ R_2 + ((2+α)/6) S_3 ⊗ S_2.
Provided that α ≤ 4, we therefore have that Xα ∈ PPT (X3 ⊗ X2 : Y3 ⊗ Y2 ).
On the other hand, we have that

Xα ∉ Sep (X3 ⊗ X2 : Y3 ⊗ Y2 )

for every choice of α > 0, as we will now show. Define Ψ ∈ T (X2 ⊗ Y2 , X3 ⊗ Y3 ) to be the unique
super-operator for which J (Ψ) = Xα . Using the identity

Ψ(Y ) = TrX2 ⊗Y2 [ J (Ψ) ( 1 ⊗ Y T )]

we see that Ψ( P2 ) = αP3 . So, for α > 0 we have that Ψ increases min-rank and is therefore not a
separable super-operator. Thus, it is not the case that Xα is separable.
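These claims can be confirmed numerically. The sketch below (added for illustration only, not part of the notes; numpy, with the partial transpose implemented by permuting tensor indices and all names ad hoc) builds X_α on X_3 ⊗ Y_3 ⊗ X_2 ⊗ Y_2 and checks that the partial transpose over Alice's systems is positive semidefinite for α ≤ 4 but not for larger α, in agreement with the coefficients computed above.

```python
import numpy as np

def E(n, a, b):
    M = np.zeros((n, n)); M[a, b] = 1.0; return M

def proj_ops(n):
    # P_n, Q_n, R_n, S_n on C^n ⊗ C^n as defined in the text
    P = sum(np.kron(E(n, a, b), E(n, a, b)) for a in range(n) for b in range(n)) / n
    W = sum(np.kron(E(n, b, a), E(n, a, b)) for a in range(n) for b in range(n))
    I = np.eye(n * n)
    return P, I - P, (I - W) / 2, (I + W) / 2

P3, Q3, R3, S3 = proj_ops(3)
P2, Q2, R2, S2 = proj_ops(2)

def X_alpha(alpha):
    return np.kron(Q3, Q2) + alpha * np.kron(P3, P2)

def partial_transpose(X, dims, systems):
    # transpose the tensor factors listed in `systems`
    k = len(dims)
    T = X.reshape(dims + dims)
    for s in systems:
        T = np.swapaxes(T, s, s + k)
    d = int(np.prod(dims))
    return T.reshape(d, d)

dims = [3, 3, 2, 2]                 # X3, Y3, X2, Y2; Alice holds factors 0 and 2
for alpha in [1.0, 4.0, 4.5]:
    pt = partial_transpose(X_alpha(alpha), dims, systems=[0, 2])
    print(alpha, np.linalg.eigvalsh(pt).min() >= -1e-10)   # True, True, False
```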

16.2.2 Unextendible product bases


The second example is based on the notion of an unextendible product basis. Although the construc-
tion works for any choice of an unextendible product basis, we will just consider one example. Let
X = CZ3 and Y = CZ3 , and consider the following 5 unit vectors in X ⊗ Y :

u_1 = |0⟩ ⊗ (|0⟩ − |1⟩)/√2
u_2 = |2⟩ ⊗ (|1⟩ − |2⟩)/√2
u_3 = (|0⟩ − |1⟩)/√2 ⊗ |2⟩
u_4 = (|1⟩ − |2⟩)/√2 ⊗ |0⟩
u_5 = (|0⟩ + |1⟩ + |2⟩)/√3 ⊗ (|0⟩ + |1⟩ + |2⟩)/√3

There are three relevant facts about this set for the purpose of our discussion:

1. The set {u1 , . . . , u5 } is an orthonormal set.


2. Each ui is a product vector, meaning ui = xi ⊗ yi for some choice of x1 , . . . , x5 ∈ X and
y1 , . . . , y5 ∈ Y .
3. It is impossible to find a sixth non-zero product vector v ⊗ w ∈ X ⊗ Y that is orthogonal to
u1 , . . . , u5 .

To verify the third property, note that in order for a product vector v ⊗ w to be orthogonal to any
ui , it must be that ⟨v, xi⟩ = 0 or ⟨w, yi⟩ = 0. In order to have ⟨v ⊗ w, ui⟩ = 0 for i = 1, . . . , 5 we must
therefore have hv, xi i = 0 for at least three distinct choices of i or hw, yi i = 0 for at least three
distinct choices of i. However, for any three distinct choices of indices i, j, k ∈ {1, . . . , 5} we have
span{ xi , x j , xk } = X and span{yi , y j , yk } = Y , which implies that either v = 0 or w = 0, and
therefore v ⊗ w = 0.
Now, define a projection P ∈ Pos (X ⊗ Y ) as
5
P = 1 X ⊗Y − ∑ ui u∗i .
i =1

Let us first note that P ∈ PPT (X : Y ). For each i = 1, . . . , 5 we have

TX (ui u∗i ) = ( xi xi∗ )T ⊗ yi y∗i = xi xi∗ ⊗ yi y∗i = ui u∗i .

The second equality follows from the fact that each xi has only real coefficients, so x̄i = xi . Thus,
T_X(P) = T_X(1_{X⊗Y}) − ∑_{i=1}^{5} T_X(u_i u_i^*) = 1_{X⊗Y} − ∑_{i=1}^{5} u_i u_i^* = P ∈ Pos (X ⊗ Y ),

as claimed.
Now let us assume toward contradiction that P is separable. This implies that it is possible to
write
m
P= ∑ vj v∗j ⊗ w j w∗j
j =1

for some choice of v1 , . . . , vm ∈ X and w1 , . . . , wm ∈ Y . For each i = 1, . . . , 5 we have


m
0 = u∗i Pui = ∑ u∗i (vj v∗j ⊗ w j w∗j )ui .
j =1


Therefore, for each j = 1, . . . , m we have ⟨v_j ⊗ w_j , u_i⟩ = 0 for i = 1, . . . , 5. This implies that
v1 ⊗ w1 = · · · = vm ⊗ wm = 0, and thus P = 0, establishing a contradiction. Consequently P is not
separable.
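The two properties verified above (orthonormality of the product vectors, and T_X(P) = P, hence P ∈ PPT(X : Y)) are also easy to confirm numerically; the following is an illustrative sketch (not part of the notes; numpy).

```python
import numpy as np

s2, s3 = np.sqrt(2), np.sqrt(3)
e = np.eye(3)
tiles = [(e[0], (e[0] - e[1]) / s2), (e[2], (e[1] - e[2]) / s2),
         ((e[0] - e[1]) / s2, e[2]), ((e[1] - e[2]) / s2, e[0]),
         ((e[0] + e[1] + e[2]) / s3, (e[0] + e[1] + e[2]) / s3)]
U = np.array([np.kron(x, y) for x, y in tiles])     # rows are u_1, ..., u_5

print(np.allclose(U @ U.T, np.eye(5)))              # orthonormal product vectors

P = np.eye(9) - U.T @ U                             # P = 1 − sum_i u_i u_i^*

def pt_first(X, dA, dB):                            # partial transpose on the first factor
    return np.transpose(X.reshape(dA, dB, dA, dB), (2, 1, 0, 3)).reshape(dA * dB, dA * dB)

print(np.allclose(pt_first(P, 3, 3), P))            # T_X(P) = P, so P is PPT
```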

16.3 PPT states and distillation

The last part of this lecture concerns the relationship between the partial transpose and entangle-
ment distillation. Our goal will be to prove that PPT states cannot be distilled, meaning that the
distillable entanglement is zero.
Let us begin the discussion with some further properties of PPT states that will be needed.
First we will observe that separable super-operators respect the positivity of the partial transpose.

Theorem 16.1. Suppose P ∈ PPT (X A : X B ) and Φ ∈ SepT (X A , Y A : X B , YB ) is a separable super-


operator. Then Φ( P) ∈ PPT (Y A : YB ).

Proof. Suppose that A ∈ L (X A , Y A ) and B ∈ L (X B , YB ). As P ∈ PPT (X A : X B ) we have

TX A ( P) ∈ Pos (X A ⊗ X B )

and therefore
(1 X A ⊗ B) TX A ( P)(1 X A ⊗ B∗ ) ∈ Pos (X A ⊗ YB ) .
The partial transpose on X A commutes with the conjugation by B, and therefore
TX A ((1 X A ⊗ B) P(1 X A ⊗ B∗ )) ∈ Pos (X A ⊗ YB ) .
This implies that
T ( TX A (( I ⊗ B) P( I ⊗ B∗ ))) = TY B (( I ⊗ B) P( I ⊗ B∗ )) ∈ Pos (X A ⊗ YB )
as remarked in the first section of the lecture. Using the fact that conjugation by A commutes with
the partial transpose on YB , we have that
( A ⊗ 1 YB ) TYB (( I ⊗ B) P( I ⊗ B∗ )) ( A∗ ⊗ 1 YB ) = TYB (( A ⊗ B) P( A∗ ⊗ B∗ )) ∈ Pos (Y A ⊗ YB ) .
We have therefore proved that ( A ⊗ B) P( A∗ ⊗ B∗ ) ∈ PPT (Y A : YB ).
Now, for Φ ∈ SepT (X A , Y A : X B , YB ), we have that Φ( P) ∈ PPT (Y A : YB ) by the above obser-
vation together with the fact that PPT (Y A : YB ) is a convex cone.

Next, let us note that PPT states cannot have a large inner product with a maximally entangled
states.
Lemma 16.2. Let X and Y be complex Euclidean spaces and let n = min{dim(X ), dim(Y )}. For any
PPT density operator
ρ ∈ D (X ⊗ Y ) ∩ PPT (X : Y )
we have M (ρ) ≤ 1/n.
Proof. Let us assume, without loss of generality, that Y = CZn and dim(X ) ≥ n. Every maximally
entangled state on X ⊗ Y may therefore be written
(U ⊗ 1 Y ) Pn (U ⊗ 1 Y )∗
for U ∈ U (Y , X ) a linear isometry. We have that
h(U ⊗ 1 Y ) Pn (U ⊗ 1 Y )∗ , ρi = h Pn , (U ⊗ 1 Y )∗ ρ(U ⊗ 1 Y )i ,
and that
(U ⊗ 1 Y )∗ ρ(U ⊗ 1 Y ) ∈ PPT (Y : Y )
is a PPT operator with trace at most 1. To prove the lemma it therefore suffices to prove that
⟨P_n, ξ⟩ ≤ 1/n
for every ξ ∈ D (Y ⊗ Y ) ∩ PPT (Y : Y ).
The partial transpose is its own adjoint and inverse, which implies that
h( T ⊗ 1 )( A), ( T ⊗ 1 )( B)i = h A, Bi
for any choice of operators A, B ∈ L (Y ⊗ Y ). It is also clear that the partial transpose preserves
trace, which implies that ( T ⊗ 1 )(ξ ) ∈ D (Y ⊗ Y ) for every ξ ∈ D (Y ⊗ Y ) ∩ PPT (Y : Y ). Given
that Rn and Sn are both projections, and therefore h Rn , ( T ⊗ 1 )(ξ )i , hSn , ( T ⊗ 1 )(ξ )i ∈ [0, 1] it
follows that
⟨P_n, ξ⟩ = ⟨(T ⊗ 1)(P_n), (T ⊗ 1)(ξ)⟩ = (1/n)⟨S_n, (T ⊗ 1)(ξ)⟩ − (1/n)⟨R_n, (T ⊗ 1)(ξ)⟩ ≤ 1/n
as claimed.

Finally we are ready for the main result of the section, which states that PPT density operators
have no distillable entanglement.

Theorem 16.3. Let X A and X B be complex Euclidean spaces and let ρ ∈ D (X A ⊗ X B ) ∩ PPT (X A : X B ).
Then Ed (ρ) = 0.

Proof. Let Y A = C {0,1} and YB = C {0,1} be complex Euclidean spaces each corresponding to a
single qubit, as in the definition of distillable entanglement, and let τ ∈ D (Y A ⊗ YB ) be the density
operator corresponding to a perfect e-bit.
Let α > 0 and let  
Φ_n ∈ LOCC(X_A^{⊗n}, Y_A^{⊗⌊αn⌋} : X_B^{⊗n}, Y_B^{⊗⌊αn⌋})

be an LOCC super-operator for each n ≥ 1. This implies that Φn is separable and admissible.
Now, if ρ ∈ PPT (X A : X B ) then ρ^{⊗n} ∈ PPT(X_A^{⊗n} : X_B^{⊗n}), and therefore

Φ_n(ρ^{⊗n}) ∈ D(Y_A^{⊗⌊αn⌋} ⊗ Y_B^{⊗⌊αn⌋}) ∩ PPT(Y_A^{⊗⌊αn⌋} : Y_B^{⊗⌊αn⌋}).

By Lemma 16.2 we therefore have that


⟨τ^{⊗⌊αn⌋}, Φ_n(ρ^{⊗n})⟩ ≤ 2^{−⌊αn⌋}.

As we have assumed α > 0, this implies that


 
lim_{n→∞} F(Φ_n(ρ^{⊗n}), τ^{⊗⌊αn⌋}) = 0,

and consequently Ed (ρ) = 0.



Lecture 17
LOCC and separable measurements

In this lecture we will discuss measurements that can be collectively performed by two parties by
means of local quantum operations and classical communication. Much of this discussion could
be generalized to measurements implemented by more than two parties, but as we did when
discussing separable states and operations, we will restrict our attention to the two-party case.

17.1 Definitions and simple observations

Informally speaking, an LOCC measurement is one that can be implemented by two (or more)
parties using only local quantum operations and classical communication. We must, however,
choose a more precise mathematical definition if we are to reason effectively about these objects.
There are many ways one could formally define LOCC measurements, and we will choose just
one way that makes the task simpler by making use of the definition we have already considered
for LOCC super-operators. Specifically, we will say that a measurement
{ Pa : a ∈ Σ} ⊂ Pos (X A ⊗ X B )
on a bipartite system having associated complex Euclidean spaces X A and X B is an LOCC measure-
ment if there exist complex Euclidean spaces Y A and YB , an LOCC super-operator
Φ ∈ LOCC (X A , Y A : X B , YB ) ,
and a measurement {Q a : a ∈ Σ} ⊆ Pos (Y A ) such that
h Pa , ρi = h Q a ⊗ 1 YB , Φ(ρ)i (17.1)
for every a ∈ Σ and every ρ ∈ D (X A ⊗ X B ). Let us note for future reference that the equation
(17.1) holds for all ρ ∈ D (X A ⊗ X B ) if and only if
Pa = Φ∗ ( Q a ⊗ 1 Y B ).
The interpretation of this definition is simple: Alice and Bob implement the measurement
{ Pa : a ∈ Σ} by first performing the LOCC super-operator Φ, after which Alice measures the
resulting system according to {Q a : a ∈ Σ}. Of course there is nothing special about letting
Alice perform the measurement rather than Bob; we are just making an arbitrary choice for the
sake of arriving at a definition, and it is easy to see that the definition would be equivalent if
Bob performed the measurement instead. The space YB has no significance in this definition—so
although the definition does not impose any particular choice of YB , there is no loss of generality
in choosing YB = C.
As we did for super-operators, we also consider a relaxation of LOCC measurements that is
often much easier to work with. A measurement
{ Pa : a ∈ Σ} ⊂ Pos (X A ⊗ X B )
is said to be separable if Pa ∈ Sep (X A : X B ) for every a ∈ Σ.


Proposition 17.1. Let X A and X B be complex Euclidean spaces and let

µ : Σ → Pos (X A ⊗ X B )

be an LOCC measurement. Then µ is separable.

Proof. Let Φ ∈ LOCC (X A , Y A : X B , YB ) be an LOCC super-operator and let {Q a : a ∈ Σ} ⊂


Pos (Y A ) be a measurement such that

µ( a) = Pa = Φ∗ ( Q a ⊗ 1 Y B )

for each a ∈ Σ. As Φ is LOCC, it is necessarily separable, and therefore so too is Φ∗ . As Q a ⊗ 1 Y B


is separable for every a ∈ Σ, we have that Pa = Φ∗ ( Q a ⊗ 1 Y B ) is separable for each a ∈ Σ as
required.

It is the case that there are separable measurements that are not LOCC measurements—we will
see such an example (albeit without a proof) later in the lecture. However, separable measure-
ments can be simulated by an LOCC measurement in a probabilistic sense: the LOCC measurement
that simulates the separable measurement might fail, but it succeeds with nonzero probability,
and if it succeeds it generates the same output statistics as the original separable measurement.
This is the content of the following theorem.

Theorem 17.2. Suppose { Pa : a ∈ Σ} ⊂ Pos (X A ⊗ X B ) is a separable measurement. Then there exists


an LOCC measurement
{Q a : a ∈ Σ ∪ {fail}} ⊂ Pos (X A ⊗ X B )
with the following properties:

1. For every ρ ∈ D (X A ⊗ X B ) we have h Qfail , ρi < 1. (Equivalently: Qfail < 1.)


2. For every ρ ∈ D (X A ⊗ X B ) and a ∈ Σ, we have

⟨Qa, ρ⟩ / (1 − ⟨Qfail, ρ⟩) = ⟨Pa, ρ⟩.

Proof. A general separable measurement { Pa : a ∈ Σ} must have the form

Pa = ∑ Aa,b ⊗ Ba,b
b∈Γ

for some finite set Γ and two collections

{ A a,b : a ∈ Σ, b ∈ Γ} ⊂ Pos (X A ) ,
{ Ba,b : a ∈ Σ, b ∈ Γ} ⊂ Pos (X B ) .

An LOCC measurement that simulates the separable measurement

{ A a,b ⊗ Ba,b : ( a, b) ∈ Σ × Γ}

can easily be transformed into one that simulates { Pa : a ∈ Σ} by adjusting Alice’s final measure-
ment outcome so that she effectively discards b and outputs a.

We may therefore assume without loss of generality that Pa = A a ⊗ Ba for A a ∈ Pos (X A ) and
Ba ∈ Pos (X B ) for each a ∈ Σ. Choose scalars α, β > 0 such that

∑_a (αAa) ≤ 1XA   and   ∑_a (βBa) ≤ 1XB,

and consider two measurements:

{ R a : a ∈ Σ ∪ {fail }} ⊂ Pos (X A ) ,
{Sa : a ∈ Σ ∪ {fail }} ⊂ Pos (X B ) ,

where R a = αA a and Sa = βBa for a ∈ Σ and

Rfail = 1XA − ∑_a Ra   and   Sfail = 1XB − ∑_a Sa.

Now, consider the following LOCC protocol: Alice and Bob independently measure according
to { R a : a ∈ Σ ∪ {fail}} and {Sa : a ∈ Σ ∪ {fail}}, and then compare their measurement outcomes
by using classical communication. If either of their outcomes are fail, or if their outcomes do not
agree, then Alice and Bob decide on the measurement outcome fail. Otherwise, if they agree on
the outcome a ∈ Σ, then Alice and Bob decide on outcome a. We see that the measurement they
implement is described by
Q a = αβ A a ⊗ Ba
for a ∈ Σ and
Qfail = 1XA⊗XB − ∑_{a∈Σ} Qa.

The properties in the statement of the theorem follow.
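The construction used in this proof is easy to check numerically. The following sketch (not part of the original notes; it assumes the numpy package and all names are illustrative) carries out the protocol for a small separable measurement on two qubits whose operators are products of local operators, and verifies the two properties listed in the theorem for a randomly chosen state.

import numpy as np

rng = np.random.default_rng(1)

def random_density(n):
    # Normalized G G* for a complex Gaussian matrix G.
    G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

ket = [np.eye(2)[:, [i]] for i in range(2)]

# A separable four-outcome measurement on C^2 (x) C^2 whose operators are
# products A_a (x) B_a; here, simply the computational product basis.
outcomes = [(i, j) for i in range(2) for j in range(2)]
A = {s: ket[s[0]] @ ket[s[0]].conj().T for s in outcomes}
B = {s: ket[s[1]] @ ket[s[1]].conj().T for s in outcomes}
P = {s: np.kron(A[s], B[s]) for s in outcomes}

# Scale so that sum_a alpha*A_a <= 1 and sum_a beta*B_a <= 1.
alpha = 1.0 / np.linalg.eigvalsh(sum(A.values()))[-1]
beta = 1.0 / np.linalg.eigvalsh(sum(B.values()))[-1]

# Alice and Bob measure {alpha*A_a} and {beta*B_a} locally (plus a "fail"
# outcome absorbing the remainder) and keep outcome a only when they agree.
Q = {s: alpha * beta * P[s] for s in outcomes}
Q_fail = np.eye(4) - sum(Q.values())

rho = random_density(4)
p_fail = np.trace(Q_fail @ rho).real
assert p_fail < 1                                        # property 1
for s in outcomes:                                       # property 2
    assert np.isclose(np.trace(Q[s] @ rho).real / (1 - p_fail),
                      np.trace(P[s] @ rho).real)
print("LOCC simulation of the separable measurement checks out")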

It is clear that certain measurements are not separable, and therefore cannot be performed by
means of local operations and classical communication. For instance, no LOCC measurement can
perfectly distinguish any fixed entangled pure state from all orthogonal states, given that one of
the required measurement operators would then necessarily be non-separable. This fact trivially
implies that Alice and Bob cannot perform a measurement with respect to any orthonormal basis

{u a : a ∈ Σ } ⊂ X A ⊗ X B

of X A ⊗ X B unless that basis consists entirely of product vectors.


Another example along these lines is that Alice and Bob cannot perfectly distinguish symmet-
ric and antisymmetric states of CZn ⊗ CZn by means of an LOCC measurement. Such a measure-
ment is described by the two-outcome projective measurement { Rn , Sn }, where

Rn = (1/2)(1 − Wn)   and   Sn = (1/2)(1 + Wn),
for Wn denoting the swap operator (as was discussed in the previous lecture). The fact that Rn is
not separable follows from the fact that it is not PPT:

(T ⊗ 1)(Rn) = −((n − 1)/2) Pn + (1/2) Qn,
where Pn and Qn are as defined in the previous lecture.

17.2 Impossibility of LOCC distinguishing some sets of states

When we force measurements to have non-separable measurement operators, it is clear that the
measurements cannot be performed using local operations and classical communication. Some-
times, however, we may be interested in a task that potentially allows for many different imple-
mentations as a measurement.
One interesting scenario along these lines that has been studied quite a bit is the task of dis-
tinguishing sets of pure states. Specifically, we suppose that X A and X B are complex Euclidean
spaces and {u1 , . . . , uk } ⊂ X A ⊗ X B is a set of orthogonal unit vectors. Alice and Bob are given a
pure state ui for i ∈ {1, . . . , k} and their goal is to determine the value of i. (It is assumed of course
that they have complete knowledge of the set {u1 , . . . , uk }.) Under the assumption that k is smaller
than the total dimension of the space X A ⊗ X B , there will be many measurements { P1 , . . . , Pk } that
correctly distinguish the elements of the set {u1 , . . . , uk }, and the general question we consider is
whether there is at least one such measurement that is LOCC.

17.2.1 Sets of maximally entangled states


Let us start with a simple general result that proves that maximally entangled pure states are hard
to distinguish in the sense described. Specifically, assume that X A and X B both have dimension
n, and consider a collection U1, . . . , Uk ∈ U(XB, XA) of pairwise orthogonal unitary operators,
meaning that ⟨Ui, Uj⟩ = 0 for i ≠ j. The set
{ (1/√n) vec(U1), . . . , (1/√n) vec(Uk) }

therefore represents a set of maximally entangled pure states. We will show that such a set cannot
be perfectly distinguished under the assumption that k ≥ n + 1.
Consider any separable measurement { P1 , . . . , Pk } ⊂ Pos (X A ⊗ X B ). Given that this is a sepa-
rable measurement, we may write
Pi = ∑_{a∈Σi} Qa ⊗ Ra

for some set Σi and operators {Q a : a ∈ Σi } ⊂ Pos (X A ) and { R a : a ∈ Σi } ⊂ Pos (X B ). Let us


take Σ = Σ1 ∪ · · · ∪ Σk to be the disjoint union of the sets Σ1, . . . , Σk, meaning that we regard the
sets as being distinct: Σi ∩ Σj = ∅ for i ≠ j. We therefore have that

∑_{a∈Σ} Qa ⊗ Ra = 1XA⊗XB.
a∈Σ

Now, given one of the above maximally entangled states chosen uniformly at random, the
probability that the measurement correctly determines the chosen index i is
p = (1/(nk)) ∑_{i=1}^{k} vec(Ui)* Pi vec(Ui)
  = (1/(nk)) ∑_{i=1}^{k} ∑_{a∈Σi} vec(Ui)* (Qa ⊗ Ra) vec(Ui)
  = (1/(nk)) ∑_{i=1}^{k} ∑_{a∈Σi} Tr(Ui* Qa Ui Ra^T).

For any two positive semidefinite operators X and Y we have Tr(XY) ≤ Tr(X) Tr(Y), and thus
p ≤ (1/(nk)) ∑_{i=1}^{k} ∑_{a∈Σi} Tr(Ui* Qa Ui) Tr(Ra^T) = (1/(nk)) ∑_{i=1}^{k} ∑_{a∈Σi} Tr(Qa) Tr(Ra) = (1/(nk)) ∑_{a∈Σ} Tr(Qa) Tr(Ra).
Now,
n² = Tr(1XA⊗XB) = ∑_{a∈Σ} Tr(Qa ⊗ Ra) = ∑_{a∈Σ} Tr(Qa) Tr(Ra),
and thus
p ≤ (1/(nk)) n² = n/k.
Consequently, if k > n then the measurement errs with some nonzero probability.
As any measurement implementable by an LOCC protocol is separable, it follows that no
LOCC protocol can distinguish more than n maximally entangled states in X A ⊗ X B in the case
that dim(X A ) = dim(X B ) = n.
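As a concrete illustration (a sketch under stated assumptions, not part of the notes), one may take n = 2 and k = 4: the four Bell states arise as vec(Ui)/√2 for the pairwise orthogonal unitaries 1, X, Y, Z. The following numpy snippet checks that a particular product (hence separable) measurement identifies a uniformly random Bell state with probability exactly n/k = 1/2, consistent with the bound just derived.

import numpy as np

# The four Bell states on C^2 (x) C^2 as vec(U_i)/sqrt(2) for the pairwise
# orthogonal unitaries 1, X, Y, Z (so n = 2 and k = 4).
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
paulis = [I2, X, Y, Z]

vec = lambda M: M.reshape(-1, 1)          # row-major vec
bell = [vec(U) / np.sqrt(2) for U in paulis]

# A product measurement (computational basis on each side), which is in
# particular separable; outcome (i, j) is used as a guess for state 2*i + j.
ket = [np.eye(2)[:, [i]] for i in range(2)]
product_meas = [np.kron(ket[i] @ ket[i].conj().T, ket[j] @ ket[j].conj().T)
                for i in range(2) for j in range(2)]

# Probability of a correct guess for a uniformly random Bell state.
p = sum((bell[a].conj().T @ product_meas[a] @ bell[a]).real.item()
        for a in range(4)) / 4
print(p, "<=", 2 / 4)     # here the bound n/k is met with equality
assert p <= 0.5 + 1e-12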

17.2.2 Indistinguishable sets of product states


It is reasonable to hypothesize that large sets of maximally entangled states are not LOCC distin-
guishable because they are highly entangled. However, it turns out that entanglement is not an
essential feature for this phenomenon. In fact, there exist orthogonal collections of product states
that cannot be perfectly distinguished by LOCC measurements.
One example is the following orthonormal basis of CZ3 ⊗ CZ3:
|1⟩ ⊗ |1⟩,
|0⟩ ⊗ (|0⟩ + |1⟩)/√2,    |0⟩ ⊗ (|0⟩ − |1⟩)/√2,
|2⟩ ⊗ (|1⟩ + |2⟩)/√2,    |2⟩ ⊗ (|1⟩ − |2⟩)/√2,
(|1⟩ + |2⟩)/√2 ⊗ |0⟩,    (|1⟩ − |2⟩)/√2 ⊗ |0⟩,
(|0⟩ + |1⟩)/√2 ⊗ |2⟩,    (|0⟩ − |1⟩)/√2 ⊗ |2⟩.

A measurement with respect to this basis is an example of a measurement that is separable but
not LOCC, and this can also be translated into an example of an admissible and separable super-
operator that is not LOCC.
The proof that the above set is not perfectly LOCC distinguishable is very technical, and so a
reference will have to suffice in place of a proof:
C. H. Bennett, D. DiVincenzo, C. Fuchs, T. Mor, E. Rains, P. Shor, J. Smolin, and W. Wootters.
Quantum nonlocality without entanglement. Physical Review A, 59(2):1070–1091, 1999.
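It is straightforward to confirm numerically that the nine vectors listed above are orthonormal product states, so that they do form a basis of CZ3 ⊗ CZ3. A minimal check using numpy (not part of the original notes) follows.

import numpy as np

e = [np.eye(3)[:, i] for i in range(3)]
plus = lambda i, j: (e[i] + e[j]) / np.sqrt(2)
minus = lambda i, j: (e[i] - e[j]) / np.sqrt(2)

# The nine product states of the basis above.
domino = [
    np.kron(e[1], e[1]),
    np.kron(e[0], plus(0, 1)),  np.kron(e[0], minus(0, 1)),
    np.kron(e[2], plus(1, 2)),  np.kron(e[2], minus(1, 2)),
    np.kron(plus(1, 2), e[0]),  np.kron(minus(1, 2), e[0]),
    np.kron(plus(0, 1), e[2]),  np.kron(minus(0, 1), e[2]),
]

# The Gram matrix is the identity, so the vectors are orthonormal.
G = np.array([[np.vdot(u, v) for v in domino] for u in domino])
assert np.allclose(G, np.eye(9))
print("the nine product states form an orthonormal basis")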
Another family of examples of product states (but not product bases) that cannot be distin-
guished by LOCC measurements comes from an unextendible product set. For instance, the set
discussed in the previous lecture cannot be distinguished by an LOCC measurement:
|0⟩ ⊗ (|0⟩ − |1⟩)/√2,    (|0⟩ − |1⟩)/√2 ⊗ |2⟩,    ((|0⟩ + |1⟩ + |2⟩)/√3) ⊗ ((|0⟩ + |1⟩ + |2⟩)/√3),
|2⟩ ⊗ (|1⟩ − |2⟩)/√2,    (|1⟩ − |2⟩)/√2 ⊗ |0⟩.

This fact is proved in the following paper:


D. DiVincenzo, T. Mor, P. Shor, J. Smolin and B. Terhal. Unextendible Product Bases, Uncom-
pletable Product Bases and Bound Entanglement. Communications in Mathematical Physics,
238(3): 379–410, 2003.

17.3 Any two orthogonal pure states can be distinguished

Finally, we will prove an interesting and fundamental result in this area, which is that any two
orthogonal pure states can always be distinguished by an LOCC measurement.
In order to prove this fact, we need a theorem known as the Toeplitz–Hausdorff Theorem,
which concerns the numerical range of an operator. The numerical range of an operator A ∈ L (X )
is the set N ( A) ⊆ C defined as follows:

N ( A) = {u∗ Au : u ∈ X , k u k = 1} .

This set is also sometimes called the field of values of A.


Theorem 17.3 (Toeplitz-Hausdorff). For any complex Euclidean space X and any operator A ∈ L (X ),
the set N ( A) is compact and convex.

Proof. The proof of compactness is straightforward. Specifically, the function f : X → C defined


by f (u) = u∗ Au is continuous, and the unit sphere S(X ) is compact. Continuous functions map
compact sets to compact sets, implying that N ( A) = f (S(X )) is compact.
The proof of convexity is the more difficult part of the proof. Let us fix some arbitrary choice
of α, β ∈ N ( A) and p ∈ [0, 1]. It is our goal to prove that pα + (1 − p) β ∈ N ( A). We will assume
that α 6= β, as the assertion is trivial in case α = β.
By the definition of the numerical range, we may choose unit vectors u, v ∈ X such that
u∗ Au = α and v∗ Av = β. It follows from the fact that α 6= β that the vectors u and v are lin-
early independent.
Next, define
B = (−β/(α − β)) 1X + (1/(α − β)) A
so that u∗ Bu = 1 and v∗ Bv = 0. Let
X = (1/2)(B + B*)   and   Y = (1/(2i))(B − B*).
Then B = X + iY, and both X and Y are Hermitian. It therefore follows that

u∗ Xu = 1, v∗ Xv = 0,
u∗ Yu = 0, v∗ Yv = 0.

Without loss of generality we may also assume u∗ Yv is purely imaginary (i.e., has real part equal
to 0), for otherwise v may be replaced by eiθ v for an appropriate choice of θ without changing any
of the previously observed properties.
As u and v are linearly independent, we have that tu + (1 − t)v is a nonzero vector for every
choice of t. Thus, for each t ∈ [0, 1] we may define

z(t) = (tu + (1 − t)v) / ‖tu + (1 − t)v‖,
which is of course a unit vector. Because u∗ Yu = v∗ Yv = 0 and u∗ Yv is purely imaginary, we have
z(t)∗ Yz(t) = 0 for every t. Thus

z(t)* B z(t) = z(t)* X z(t) = (t² + 2t(1 − t)ℜ(v* X u)) / ‖tu + (1 − t)v‖².

This is a continuous real-valued function mapping 0 to 0 and 1 to 1. Consequently there must


exist some choice of t ∈ [0, 1] such that z(t)∗ Bz(t) = p. Let w = z(t) for such a value of t, so that
w*Bw = p. We have that w is a unit vector, and
w*Aw = (α − β)(β/(α − β) + w*Bw) = β + p(α − β) = pα + (1 − p)β.
Thus we have shown that pα + (1 − p) β ∈ N ( A) as required.
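The proof above is constructive, and it translates directly into a short numerical routine. The following sketch (assuming numpy; the function and variable names are illustrative and not from the notes) rephases v, forms the vectors z(t), and locates the required value of t by bisection, producing a unit vector w with w*Aw = pα + (1 − p)β.

import numpy as np

def convex_point_in_range(A, u, v, p, tol=1e-12):
    # Follow the proof of Theorem 17.3: given unit vectors u, v with
    # u*Au = alpha, v*Av = beta (alpha != beta), return a unit vector w
    # with w*Aw approximately equal to p*alpha + (1-p)*beta.
    alpha = np.vdot(u, A @ u)
    beta = np.vdot(v, A @ v)
    B = (A - beta * np.eye(A.shape[0])) / (alpha - beta)
    Y = (B - B.conj().T) / 2j
    c = np.vdot(u, Y @ v)
    if abs(c) > tol:
        # Rephase v so that u*Yv becomes purely imaginary.
        v = np.exp(1j * (np.pi / 2 - np.angle(c))) * v
    z = lambda t: (t * u + (1 - t) * v) / np.linalg.norm(t * u + (1 - t) * v)
    f = lambda t: np.vdot(z(t), B @ z(t)).real   # continuous, runs from 0 to 1
    lo, hi = 0.0, 1.0                            # bisection on f(t) = p
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < p else (lo, mid)
    return z((lo + hi) / 2)

rng = np.random.default_rng(2)
n = 4
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
Q = np.linalg.qr(rng.normal(size=(n, 2)) + 1j * rng.normal(size=(n, 2)))[0]
u, v = Q[:, 0], Q[:, 1]
p = 0.3
w = convex_point_in_range(A, u, v, p)
target = p * np.vdot(u, A @ u) + (1 - p) * np.vdot(v, A @ v)
assert abs(np.vdot(w, A @ w) - target) < 1e-8
print("p*alpha + (1-p)*beta lies in N(A), as Theorem 17.3 asserts")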

Corollary 17.4. For any complex Euclidean space X and any operator A ∈ L (X ) satisfying Tr( A) = 0,
there exists an orthonormal basis { x1 , . . . , xn } of X for which xi∗ Axi = 0 for i = 1, . . . , n.
Proof. The proof is by induction on n = dim(X ), and the base case n = 1 is trivial.
Suppose that n ≥ 2. It is clear that λ1 ( A), . . . , λn ( A) ∈ N ( A), and thus 0 ∈ N ( A) because
0 = (1/n) Tr(A) = (1/n) ∑_{i=1}^{n} λi(A),

which is a convex combination of elements of N ( A). Therefore there exists a unit vector u ∈ X
such that u∗ Au = 0.
Now, let Y ⊆ X be the orthogonal complement of u in X , and let ΠY = 1 X − uu∗ be the
orthogonal projection onto Y . It holds that
Tr(ΠY AΠY ) = Tr( A) − u∗ Au = 0.
Moreover, because Im(ΠY AΠY ) ⊆ Y , we may regard ΠY AΠY as an element of L (Y ). By the
induction hypothesis, we therefore have that there exists an orthonormal basis {v1 , . . . , vn−1 } of Y
such that v∗i ΠY AΠY vi = 0 for i = 1, . . . , n − 1. It follows that {u, v1 , . . . , vn−1 } is an orthonormal
basis of X with the properties required by the statement of the corollary.

Now we are ready to return to the problem of distinguishing orthogonal states. Suppose that
x, y ∈ X A ⊗ X B
are orthogonal unit vectors. We wish to show that there exists an LOCC measurement that cor-
rectly distinguishes between x and y. Let X, Y ∈ L (X B , X A ) be operators satisfying x = vec( X )
and y = vec(Y ), so that the orthogonality of x and y is equivalent to Tr( X ∗ Y ) = 0. By Corol-
lary 17.4, we have that there exists an orthonormal basis {u1 , . . . , un } of X B with the property that
u∗i X ∗ Yui = 0 for i = 1, . . . , n.
Now, suppose that Bob measures his part of either xx∗ or yy∗ with respect to the orthonormal
basis {u1 , . . . , un } of X B and transmits the result of the measurement to Alice. Conditioned on Bob
obtaining the outcome i, the (unnormalized) reduced state of Alice’s system becomes
(1 X A ⊗ uTi ) vec( X ) vec( X )∗ (1 X A ⊗ ui ) = Xui u∗i X ∗
in case the original state was xx∗ , and Yui u∗i Y ∗ in case the original state was yy∗ .
A necessary and sufficient condition for Alice to be able to correctly distinguish these two
states, given knowledge of i, is that
h Xui u∗i X ∗ , Yui u∗i Y ∗ i = 0.
This condition is equivalent to u∗i X ∗ Yui = 0 for each i = 1, . . . , n. The basis {u1 , . . . , un } was cho-
sen to satisfy this condition, which implies that Alice can correctly distinguish the two possibilities
without error.
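A minimal numerical instance of this protocol (assuming numpy; the choice of states is illustrative, the basis vectors are real so that conjugation conventions play no role, and vec below denotes the row-major reshape) takes x and y to be the two Bell states vec(X) and vec(Y) for X and Y proportional to 1 and σz. Here X*Y is proportional to σz, which has zero diagonal in the basis {|+⟩, |−⟩}, and Bob's measurement in that basis leaves Alice with perfectly distinguishable conditional states.

import numpy as np

vec = lambda M: M.reshape(-1)        # row-major vec: vec(E_ab) = e_a (x) e_b

# Two orthogonal maximally entangled states x = vec(X), y = vec(Y).
X = np.eye(2, dtype=complex) / np.sqrt(2)
Y = np.diag([1.0, -1.0]).astype(complex) / np.sqrt(2)
assert np.isclose(np.trace(X.conj().T @ Y), 0)   # orthogonality of x and y

# X*Y is traceless, and the basis {|+>, |->} has zero diagonal for it,
# as guaranteed by Corollary 17.4.
u = [np.array([1, 1]) / np.sqrt(2), np.array([1, -1]) / np.sqrt(2)]
for ui in u:
    assert np.isclose(np.vdot(ui, (X.conj().T @ Y) @ ui), 0)

# Bob measures in {u_0, u_1}; conditioned on outcome i, Alice holds an
# (unnormalized) state proportional to X u_i or Y u_i, and these are
# orthogonal, so she can tell them apart without error.
for ui in u:
    a0, a1 = X @ ui, Y @ ui
    assert np.isclose(np.vdot(a0, a1), 0)
print("Alice's conditional states are orthogonal for every outcome of Bob")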

Lecture 18
Super-operator norms and distinguishability

This lecture is primarily concerned with the distinguishability of quantum operations. We will first
discuss some basic aspects of this notion, and then discuss norms on super-operators that are
closely related to it. In particular, we will define a super-operator norm called the diamond norm
(or completely bounded trace norm) that plays an analogous role for super-operator distinguishability
that the ordinary trace norm plays for density operator distinguishability.

18.1 Distinguishing between quantum operations

Recall from Lecture 3 that the trace norm has a close relationship to the optimal probability of
distinguishing between two quantum states. For instance, if a bit a ∈ {0, 1} is chosen uniformly at
random, and a register X is prepared in the reduced state ρa for a fixed choice of density operators
ρ0 , ρ1 ∈ D (X ), then the optimal probability to correctly guess a by means of a measurement on X
alone is
(1/2) + (1/4)‖ρ0 − ρ1‖1.
One may consider a similar situation involving admissible super-operators rather than density
operators. Specifically, let us suppose that Φ0 , Φ1 ∈ T (X , Y ) are admissible super-operators, and
that a bit a ∈ {0, 1} is chosen uniformly at random. A single evaluation of the operation Φa is
made available, and the goal is to determine a with maximal probability.
One approach to this problem is to choose ξ ∈ D (X ) in order to maximize the quantity
k ρ0 − ρ1 k1 for ρ0 = Φ0 (ξ ) and ρ1 = Φ1 (ξ ). If a register X is prepared so that its reduced state
is ξ, and X is input to the given operation Φa , then the output is a register Y that can be measured
using an optimal measurement to distinguish the two possible outputs ρ0 and ρ1 .
This, however, is not the most general approach. More generally, one may include an auxiliary
register Z in the process—meaning that a pair of registers (X, Z) is prepared in some reduced state
ξ ∈ D (X ⊗ Z), and the given super-operator Φa is applied to X. This results in a pair of registers
(Y, Z) that will be in either of the states ρ0 = (Φ0 ⊗ 1L(Z) )(ξ ) or ρ1 = (Φ1 ⊗ 1L(Z) )(ξ ), which may
then be distinguished by a measurement on Y ⊗ Z . Indeed, this more general approach can give
a sometimes striking improvement in the probability to distinguish Φ0 and Φ1 , as the following
example illustrates.

Example 18.1. Let X be a complex Euclidean space and let n = dim(X ). Define admissible super-
operators Φ0 , Φ1 ∈ T (X ) as follows:

Φ0(X) = (1/(n + 1))((Tr X)1X + X^T),
Φ1(X) = (1/(n − 1))((Tr X)1X − X^T).


It is clear from the definitions that both Φ0 and Φ1 are trace-preserving, while complete positivity
follows from a calculation of the Choi-Jamiołkowski representations of these super-operators:
J(Φ0) = (1/(n + 1))(1X⊗X + W) = (2/(n + 1)) S,
J(Φ1) = (1/(n − 1))(1X⊗X − W) = (2/(n − 1)) R,
where W ∈ L (X ⊗ X ) is the swap operator and R, S ∈ L (X ⊗ X ) are the projections onto the
symmetric and antisymmetric subspaces of X ⊗ X , respectively.
Now, for any choice of a density operator ξ ∈ D (X ) we have
Φ0(ξ) − Φ1(ξ) = (1/(n + 1) − 1/(n − 1)) 1X + (1/(n + 1) + 1/(n − 1)) ξ^T
             = −(2/(n² − 1)) 1X + (2n/(n² − 1)) ξ^T.
The trace norm of such an operator is maximized when ξ has rank 1, in which case the value of
the trace norm is
(2n − 2)/(n² − 1) + (n − 1) · 2/(n² − 1) = 4/(n + 1).
Consequently
‖Φ0(ξ) − Φ1(ξ)‖1 ≤ 4/(n + 1)
for all ξ ∈ D (X ). For large n, this quantity is small, which is not surprising because both Φ0 (ξ )
and Φ1 (ξ ) are almost completely mixed for any choice of ξ ∈ D (X ).
However, suppose that we prepare two registers (X1 , X2 ) in the maximally entangled state
τ = (1/n) vec(1X) vec(1X)* ∈ D(X ⊗ X).
We have
(Φa ⊗ 1L(X))(τ) = (1/n) J(Φa)
for a ∈ {0, 1}, and therefore
(Φ0 ⊗ 1L(X))(τ) = (2/(n(n + 1))) S   and   (Φ1 ⊗ 1L(X))(τ) = (2/(n(n − 1))) R.
Because R and S are orthogonal, we have
‖(Φ0 ⊗ 1L(X))(τ) − (Φ1 ⊗ 1L(X))(τ)‖1 = 2,
meaning that the states (Φ0 ⊗ 1L(X ) )(τ ) and (Φ1 ⊗ 1L(X ) )(τ ), and therefore the super-operators
Φ0 and Φ1 , can be distinguished perfectly.
By applying Φ0 or Φ1 to part of a larger system, we have therefore completely eliminated the
large error that was present when the limited approach of choosing ξ ∈ D (X ) as an input to Φ0
and Φ1 was considered.
The previous example makes clear that auxiliary systems must be taken into account if we
are to understand the optimal probability with which admissible super-operators can be distin-
guished.
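Both claims made in Example 18.1 can be confirmed numerically. The sketch below (assuming numpy; not part of the original notes) checks the bound 4/(n + 1) on the trace distance of the outputs for random unentangled inputs when n = 3, and then verifies, through the Choi-Jamiołkowski representations, that the maximally entangled input yields outputs at trace distance 2.

import numpy as np

rng = np.random.default_rng(3)
n = 3

def trace_norm(M):
    # Trace norm as the sum of singular values.
    return np.linalg.svd(M, compute_uv=False).sum()

def Phi0(X):
    return (np.trace(X) * np.eye(n) + X.T) / (n + 1)

def Phi1(X):
    return (np.trace(X) * np.eye(n) - X.T) / (n - 1)

# Without an auxiliary register: || Phi0(xi) - Phi1(xi) ||_1 <= 4/(n+1).
for _ in range(20):
    G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    xi = G @ G.conj().T
    xi /= np.trace(xi).real
    assert trace_norm(Phi0(xi) - Phi1(xi)) <= 4 / (n + 1) + 1e-9

# With the maximally entangled input tau, (Phi_a (x) 1)(tau) = J(Phi_a)/n,
# so the trace distance of the two outputs is || J(Phi0) - J(Phi1) ||_1 / n.
def choi(Phi):
    E = lambda i, j: np.outer(np.eye(n)[:, i], np.eye(n)[:, j])
    return sum(np.kron(Phi(E(i, j)), E(i, j))
               for i in range(n) for j in range(n))

assert np.isclose(trace_norm(choi(Phi0) - choi(Phi1)) / n, 2.0)
print("an entangled input distinguishes the two channels perfectly")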

18.2 Definition and properties of the diamond norm

With the discussion from the previous section in mind, we will now discuss two super-operator
norms: the super-operator trace norm and the diamond norm. The precise relationship these norms
have to the notion of super-operator distinguishability will be made clear later in the section.

18.2.1 The super-operator trace norm


We will begin with the super-operator trace norm. For any choice of complex Euclidean spaces X
and Y , and for a given super-operator Φ ∈ T (X , Y ), this norm is defined as

k Φ k1 = max {k Φ( X )k1 : X ∈ L (X ) , k X k1 ≤ 1} .

The super-operator trace norm is an example of an induced super-operator norm—in general,


one may consider the norm obtained by replacing the two trace norms in this definition with
any other choice of norms that are defined on L (X ) and L (Y ). The use of the maximum, rather
than the supremum, is justified in this context by the observation that every norm defined on a
complex Euclidean space is continuous and its corresponding unit ball is compact. The fact that
these norms really are norms is straightforward to verify.
Let us note two simple properties of the super-operator trace norm that will be useful for our
purposes. First, for every choice of complex Euclidean spaces X , Y , and Z , and super-operators
Ψ ∈ T (X , Y ) and Φ ∈ T (Y , Z), it holds that

k ΦΨ k 1 ≤ k Φ k1 k Ψ k1 . (18.1)

This is a general property of every induced super-operator norm, provided that the same norm on
L (Y ) is taken for both super-operator norms.
Second, the super-operator trace norm can be expressed as follows for a given super-operator
Φ ∈ T (X , Y ):
k Φ k1 = max{k Φ(uv∗ )k1 : u, v ∈ S(X )}. (18.2)
This is because the trace norm (like every other norm) is a convex function, and the unit ball with
respect to the trace norm can be represented as

{X ∈ L (X ) : k X k1 ≤ 1} = conv{uv∗ : u, v ∈ S(X )}.

18.2.2 The diamond norm


For any choice of complex Euclidean spaces X and Y , and a super-operator Φ ∈ T (X , Y ), we
define the diamond norm of Φ to be
‖Φ‖⋄ = ‖Φ ⊗ 1L(X)‖1.

This norm is also called the completely bounded trace norm, as well as some other variants of this
name. The principle behind its definition is that tensoring Φ with the identity super-operator
has the effect of stabilizing its super-operator trace norm. This sort of stabilization endows the
diamond norm with many nice properties, and allows it to be used in contexts where the super-
operator trace norm is not sufficient. It is also clear that this sort of stabilization is related to the
phenomenon illustrated in the previous section, where tensoring with the identity super-operator
had the effect of amplifying the difference between admissible super-operators.
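A standard illustration of this stabilization effect, not discussed explicitly at this point in the notes, is the transpose map T on L(X): its induced trace norm is 1, while tensoring with the identity and feeding in the maximally entangled state already certifies that its diamond norm is at least n (in fact it equals n). A minimal numpy sketch:

import numpy as np

n = 3
trace_norm = lambda M: np.linalg.svd(M, compute_uv=False).sum()
rng = np.random.default_rng(4)

# The transpose leaves singular values unchanged, so ||T(X)||_1 = ||X||_1
# and the induced trace norm ||T||_1 equals 1.
for _ in range(20):
    X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    assert np.isclose(trace_norm(X.T), trace_norm(X))

# Applying T (x) 1 to the maximally entangled state gives the swap operator
# divided by n, whose trace norm is n, so ||T||_diamond >= n.
e = np.eye(n)
psi = sum(np.kron(e[:, [i]], e[:, [i]]) for i in range(n)) / np.sqrt(n)
tau = psi @ psi.conj().T
pt = tau.reshape(n, n, n, n).transpose(2, 1, 0, 3).reshape(n * n, n * n)
print(trace_norm(pt))   # prints 3.0 for n = 3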

The first thing we must do is to explain why the definition of the diamond norm tensors Φ with
the identity super-operator on X , rather than some other space. The answer is that X has suffi-
ciently large dimension—and tensoring with any complex Euclidean space with larger dimension
would not change anything. The following lemma allows us to prove this fact, and includes a
special case that will be useful later in the lecture.

Lemma 18.2. Let Φ ∈ T (X , Y ), and let Z be a complex Euclidean space. For every choice of unit vectors
u, v ∈ X ⊗ Z there exist unit vectors x, y ∈ X ⊗ X such that
‖(Φ ⊗ 1L(Z))(uv*)‖1 = ‖(Φ ⊗ 1L(X))(xy*)‖1.

In case u = v, we may in addition take x = y.

Proof. The lemma is straightforward when dim(Z) ≤ dim(X ); for any choice of a linear isometry
U ∈ U (Z , X ) the vectors x = (1 X ⊗ U )u and y = (1 X ⊗ U )v satisfy the required conditions. Let
us therefore consider the case where dim(Z) > dim(X ) = n.
Consider the vector u ∈ X ⊗ Z . Given that dim(X ) ≤ dim(Z), there must exist an orthogonal
projection Π ∈ L (Z) having rank at most n = dim(X ) that satisfies u = (1 X ⊗ Π)u and therefore
there must exist a linear isometry U ∈ U (X , Z) such that u = (1 X ⊗ UU ∗ )u. Likewise there must
exist a linear isometry V ∈ U (X , Z) such that v = (1 X ⊗ VV ∗ )v. Such linear isometries U and V
can be obtained from the singular value decompositions of u and v.
Let x = (1 ⊗ U ∗ )u and y = (1 ⊗ V ∗ )v. Notice that we therefore have u = (1 ⊗ U ) x and
v = (1 ⊗ V )y, which shows that x and y are unit vectors. Moreover, given that k U k = k V k = 1,
we have

‖(Φ ⊗ 1L(Z))(uv*)‖1 = ‖(Φ ⊗ 1L(Z))((1 ⊗ U)xy*(1 ⊗ V*))‖1
                    = ‖(1 ⊗ U)(Φ ⊗ 1L(X))(xy*)(1 ⊗ V*)‖1
                    = ‖(Φ ⊗ 1L(X))(xy*)‖1

as required. In case u = v, we may take U = V, implying that x = y.

The following theorem, which gives a precise statement of the fact we wish to prove, is imme-
diate from Lemma 18.2 together with (18.2).

Theorem 18.3. Let X and Y be complex Euclidean spaces, Φ ∈ T (X , Y ) be any super-operator, and let
Z be any complex Euclidean space for which dim(Z) ≥ dim(X ). Then

‖Φ ⊗ 1L(Z)‖1 = ‖Φ‖⋄.

One simple but important consequence of this theorem is that the diamond norm is multiplica-
tive with respect to tensor products.

Theorem 18.4. Let Φ1 ∈ T (X1 , Y1 ) and Φ2 ∈ T (X2 , Y2 ) be arbitrary super-operators. Then

k Φ1 ⊗ Φ2 k ⋄ = k Φ1 k ⋄ k Φ2 k ⋄ .

Proof. To make the proof more clear, let us take W1 and W2 to be complex Euclidean spaces with
dim(W1 ) = dim(X1 ) and dim(W2 ) = dim(X2 ), so that

‖Φ1‖⋄ = ‖Φ1 ⊗ 1L(W1)‖1,
‖Φ2‖⋄ = ‖Φ2 ⊗ 1L(W2)‖1,
‖Φ1 ⊗ Φ2‖⋄ = ‖Φ1 ⊗ Φ2 ⊗ 1L(W1⊗W2)‖1.

We have

‖Φ1 ⊗ Φ2‖⋄ = ‖Φ1 ⊗ Φ2 ⊗ 1L(W1⊗W2)‖1
           = ‖(Φ1 ⊗ 1L(Y2) ⊗ 1L(W1⊗W2))(1L(X1) ⊗ Φ2 ⊗ 1L(W1⊗W2))‖1
           ≤ ‖Φ1 ⊗ 1L(Y2) ⊗ 1L(W1⊗W2)‖1 ‖1L(X1) ⊗ Φ2 ⊗ 1L(W1⊗W2)‖1
           = ‖Φ1‖⋄ ‖Φ2‖⋄,

where the inequality follows from (18.1) and the final equality follows from Theorem 18.3.


For the reverse inequality, choose operators X1 ∈ L (X1 ⊗ W1 ) and X2 ∈ L (X2 ⊗ W2 ) such
that k X1 k1 = k X2 k1 = 1, k Φ1 k⋄ = k(Φ1 ⊗ 1 W1 )( X1 )k1 , and k Φ2 k⋄ = k(Φ2 ⊗ 1 W2 )( X2 )k1 . Then
k X1 ⊗ X2 k1 = 1, and so

‖Φ1 ⊗ Φ2‖⋄ = ‖Φ1 ⊗ Φ2 ⊗ 1L(W1⊗W2)‖1
           ≥ ‖(Φ1 ⊗ 1L(W1) ⊗ Φ2 ⊗ 1L(W2))(X1 ⊗ X2)‖1
           = ‖(Φ1 ⊗ 1L(W1))(X1) ⊗ (Φ2 ⊗ 1L(W2))(X2)‖1
           = ‖(Φ1 ⊗ 1L(W1))(X1)‖1 ‖(Φ2 ⊗ 1L(W2))(X2)‖1
           = ‖Φ1‖⋄ ‖Φ2‖⋄

as required.

The final fact that we will establish about the diamond norm in this lecture concerns input
operators for which the value of the diamond norm is achieved. In particular, we will establish a
simple condition on super-operators Φ ∈ T(X, Y) under which there exists a unit vector u ∈ X ⊗ X
such that
‖Φ‖⋄ = ‖(Φ ⊗ 1L(X))(uu*)‖1.
This fact will give us the final piece we need to connect the diamond norm to the distinguishability
problem discussed in the beginning of the lecture.

Theorem 18.5. Suppose that Φ ∈ T (X , Y ) satisfies Φ( X ∗ ) = (Φ( X ))∗ for every X ∈ L (X ). Then
‖Φ‖⋄ = max{‖(Φ ⊗ 1L(X))(xx*)‖1 : x ∈ S(X ⊗ X)}.

Remark 18.6. The following conditions are easily shown to be equivalent for a given super-
operator Φ ∈ T (X , Y ):

1. Φ( X ∗ ) = (Φ( X ))∗ for every X ∈ L (X ).


2. J (Φ) is Hermitian.
3. Φ = Φ0 − Φ1 for completely positive super-operators Φ0 , Φ1 ∈ T (X , Y ).
4. Φ preserves Hermiticity, meaning that Φ( H ) ∈ Herm (Y ) for every choice of H ∈ Herm (X ).
The third characterization in this list is particularly helpful when applying this theorem.
Proof of Theorem 18.5. Let X ∈ L(X ⊗ X) be an operator with ‖X‖1 = 1 that satisfies
‖Φ‖⋄ = ‖(Φ ⊗ 1L(X))(X)‖1.

Let Z = C {0,1} and let
Y = (1/2) X ⊗ E0,1 + (1/2) X* ⊗ E1,0 ∈ Herm(X ⊗ X ⊗ Z).
Then ‖Y‖1 = ‖X‖1 = 1 and
‖(Φ ⊗ 1L(X⊗Z))(Y)‖1 = ‖(1/2)(Φ ⊗ 1L(X))(X) ⊗ E0,1 + (1/2)(Φ ⊗ 1L(X))(X*) ⊗ E1,0‖1
                    = ‖(1/2)(Φ ⊗ 1L(X))(X) ⊗ E0,1 + (1/2)((Φ ⊗ 1L(X))(X))* ⊗ E1,0‖1
                    = ‖(Φ ⊗ 1L(X))(X)‖1
                    = ‖Φ‖⋄.
Now, because Y is Hermitian, we may consider a spectral decomposition
Y = ∑_j λj uj uj*.

By the triangle inequality we have
‖Φ‖⋄ = ‖(Φ ⊗ 1L(X⊗Z))(Y)‖1 ≤ ∑_j |λj| ‖(Φ ⊗ 1L(X⊗Z))(uj uj*)‖1.

As ‖Y‖1 = 1, we have ∑_j |λj| = 1, and thus
‖(Φ ⊗ 1L(X⊗Z))(uj uj*)‖1 ≥ ‖Φ‖⋄
for some index j. We may therefore apply Lemma 18.2 to obtain a unit vector x ∈ X ⊗ X such that

‖(Φ ⊗ 1L(X))(xx*)‖1 = ‖(Φ ⊗ 1L(X⊗Z))(uj uj*)‖1 ≥ ‖Φ‖⋄.
Thus
‖Φ‖⋄ ≤ max{‖(Φ ⊗ 1L(X))(xx*)‖1 : x ∈ S(X ⊗ X)}.
It is clear that the maximum cannot exceed k Φ k⋄ , and so the proof is complete.
Let us note that at this point we have established the relationship that we wanted, which is
that the maximum probability with which two admissible super-operators can be distinguished
in the sense described in the beginning of the lecture is
(1/2) + (1/4)‖Φ0 − Φ1‖⋄.
We also see that it is sufficient to restrict the space Z associated with the auxiliary system used to
help distinguish admissible super-operators Φ0, Φ1 ∈ T(X, Y) to have dimension equal to that of X.

18.3 Distinguishing unitary super-operators

We started the lecture with an example of two admissible super-operators Φ0 , Φ1 ∈ T (X , Y ) for


which an auxiliary system was necessary to optimally distinguish Φ0 and Φ1 . Let us conclude the
lecture by observing that this phenomenon does not arise for super-operators induced by linear
isometries.

Theorem 18.7. Let X and Y be complex Euclidean spaces, let U, V ∈ U (X , Y ) be linear isometries, and
suppose
Φ0 ( X ) = UXU ∗ and Φ1 ( X ) = VXV ∗
for all X ∈ L (X ). Then there exists a unit vector u ∈ X such that

k Φ0 − Φ1 k⋄ = k Φ0 (uu∗ ) − Φ1 (uu∗ )k1 .

Proof. Recall that the numerical range of an operator A ∈ L (X ) is defined as

N ( A) = {u∗ Au : u ∈ S(X )} .

Let us define ν( A) to be the smallest absolute value of any element of N ( A):

ν( A) = min {| α | : α ∈ N ( A)} .

Now, for any choice of a unit vector u ∈ X we have
‖Φ0(uu*) − Φ1(uu*)‖1 = ‖Uuu*U* − Vuu*V*‖1 = 2√(1 − |u*U*Vu|²).

Maximizing this quantity over u ∈ S(X) gives
max{‖Φ0(uu*) − Φ1(uu*)‖1 : u ∈ S(X)} = 2√(1 − ν(U*V)²).

Along similar lines, we have
‖Φ0 − Φ1‖⋄ = max_{u∈S(X⊗X)} ‖(Φ0 ⊗ 1L(X))(uu*) − (Φ1 ⊗ 1L(X))(uu*)‖1 = 2√(1 − ν(U*V ⊗ 1X)²),

where here we have also made use of Theorem 18.5.


To complete the proof it therefore suffices to prove that

ν( A ⊗ 1 X ) = ν( A )

for every operator A ∈ L (X ). It is obvious that ν( A ⊗ 1 X ) ≤ ν( A), so we just need to prove


ν( A ⊗ 1 X ) ≥ ν( A). Let u ∈ X ⊗ X be any unit vector, and let
u = ∑_{j=1}^{r} √(pj) xj ⊗ yj

be a Schmidt decomposition of u. Then


u*(A ⊗ 1X)u = ∑_{j=1}^{r} pj xj* A xj.

For each j we have xj* A xj ∈ N(A), and therefore
u*(A ⊗ 1X)u = ∑_{j=1}^{r} pj xj* A xj ∈ N(A)

by the Toeplitz–Hausdorff theorem. Consequently

| u ∗ ( A ⊗ 1 X ) u | ≥ ν( A )

for every u ∈ S(X ⊗ X ), which completes the proof.
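Theorem 18.7 is easy to explore numerically for qubit unitaries. Since U*V is unitary and hence normal, its numerical range is the convex hull of its eigenvalues (a standard fact not proved in these notes), so ν(U*V) is simply the distance from 0 to the segment joining the two eigenvalues. The sketch below (assuming numpy; all names are illustrative) compares 2√(1 − ν(U*V)²) with the value obtained by sampling single-system pure-state inputs; the two printed numbers agree up to the sampling resolution.

import numpy as np

rng = np.random.default_rng(5)

def random_unitary(n):
    # QR of a complex Gaussian matrix yields a matrix with orthonormal columns.
    G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return np.linalg.qr(G)[0]

n = 2
U, V = random_unitary(n), random_unitary(n)
M = U.conj().T @ V                      # unitary, hence normal

# nu(M): distance from 0 to the segment joining the two eigenvalues of M,
# evaluated here on a dense grid over the segment.
lam = np.linalg.eigvals(M)
ts = np.linspace(0.0, 1.0, 100001)
nu = np.abs(ts * lam[0] + (1 - ts) * lam[1]).min()
diamond = 2 * np.sqrt(max(0.0, 1 - nu ** 2))

# Theorem 18.7: the same value is reached by a single-system pure-state input.
best = 0.0
for _ in range(5000):
    u = rng.normal(size=n) + 1j * rng.normal(size=n)
    u /= np.linalg.norm(u)
    val = 2 * np.sqrt(max(0.0, 1 - abs(np.vdot(u, M @ u)) ** 2))
    best = max(best, val)

print(diamond, best)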



Lecture 19
Alternate characterizations of the diamond norm

In the previous lecture we discussed the diamond norm, its connection to the problem of distin-
guishing admissible super-operators, and some of its basic properties. Recall that for X and Y
complex Euclidean spaces and Φ ∈ T (X , Y ) a super-operator, we have
‖Φ‖⋄ = max{‖(Φ ⊗ 1L(X))(uv*)‖1 : u, v ∈ S(X ⊗ X)}.

In this lecture we will discuss two alternate ways in which this norm may be characterized.

19.1 Maximum output fidelity characterization

Suppose X and Y are complex Euclidean spaces and Φ, Ψ ∈ T (X , Y ) are completely positive (but
not necessarily trace-preserving) super-operators. Let us define the maximum output fidelity of Φ
and Ψ as
Fmax (Φ, Ψ) = max {F(Φ(ρ), Ψ(ξ )) : ρ, ξ ∈ D (X )} .
In other words, this is the maximum fidelity between an output of Φ and an output of Ψ, ranging
over all pairs of density operator inputs. Our first characterization of the diamond norm is based
on the maximum output fidelity, and is given by the following theorem.

Theorem 19.1. Let X and Y be complex Euclidean spaces and let Φ ∈ T (X , Y ) be an arbitrary super-
operator. Suppose further that Z is a complex Euclidean space and A, B ∈ L (X , Y ⊗ Z) satisfy

Φ( X ) = TrZ ( AXB∗ )

for all X ∈ L (X ). Then, for completely positive super-operators Ψ A , ΨB ∈ T (X , Z) defined as

Ψ A ( X ) = TrY ( AX A∗ ) ,
ΨB ( X ) = TrY ( BXB∗ ) ,

for all X ∈ L (X ), we have k Φ k⋄ = Fmax (Ψ A , ΨB ).

Remark 19.2. Note that it is the space Y that is traced-out in the definition of Ψ A and ΨB , rather
than the space Z .

To prove this theorem, we will begin with the following lemma that establishes a simple rela-
tionship between the fidelity and the trace norm. (This appeared as a problem on problem set 1.)

Lemma 19.3. Let X and Y be complex Euclidean spaces and let P, Q ∈ Pos (X ). Assume that u, v ∈ X ⊗
Y are arbitrary purifications of P and Q, respectively, meaning that TrY (uu∗ ) = P and TrY (vv∗ ) = Q.
Then
F( P, Q) = k TrX (uv∗ )k1 .


Proof. For any choice of Y ∈ L (Y ) we have

k Y k1 = max {|hU, Y i| : U ∈ U (Y )} ,

and therefore

k TrX (uv∗ )k1 = max {|hU, TrX (uv∗ )i| : U ∈ U (Y )}


= max {| Tr ((1 X ⊗ U ∗ ) uv∗ )| : U ∈ U (Y )}
= max {|hv, (1 X ⊗ U ∗ ) ui| : U ∈ U (Y )} .

As U ranges over all possible unitary operators on Y , the vector (1 X ⊗ U ∗ )u ranges over all pu-
rifications of P in X ⊗ Y . The above quantity is therefore equal to F( P, Q) by Uhlmann’s Theo-
rem.
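Lemma 19.3 can be checked directly with numpy. In the sketch below (not from the notes; vec denotes the row-major reshape, and all names are illustrative) the purifications are taken to be u = vec(√P) and v = vec(√Q), and the trace norm of TrX(uv*) is compared against the fidelity computed as ‖√P √Q‖1.

import numpy as np

rng = np.random.default_rng(6)
n = 3
trace_norm = lambda M: np.linalg.svd(M, compute_uv=False).sum()
vec = lambda M: M.reshape(-1, 1)                 # row-major vec
ptrace_X = lambda M: np.einsum('ijik->jk', M.reshape(n, n, n, n))

def psd_sqrt(P):
    # Square root of a positive semidefinite operator via its spectral decomposition.
    w, V = np.linalg.eigh(P)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def random_psd(n):
    G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return G @ G.conj().T

P, Q = random_psd(n), random_psd(n)

# F(P, Q) computed directly as || sqrt(P) sqrt(Q) ||_1 ...
fidelity = trace_norm(psd_sqrt(P) @ psd_sqrt(Q))

# ... and via Lemma 19.3, using the purifications u = vec(sqrt(P)), v = vec(sqrt(Q)).
u, v = vec(psd_sqrt(P)), vec(psd_sqrt(Q))
lemma_value = trace_norm(ptrace_X(u @ v.conj().T))

assert np.isclose(fidelity, lemma_value)
print(fidelity, lemma_value)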

Proof of Theorem 19.1. For clarity, let us take W to be a complex Euclidean space with the same
dimension as X , so that
‖Φ‖⋄ = max{‖(Φ ⊗ 1L(W))(uv*)‖1 : u, v ∈ S(X ⊗ W)}
     = max{‖TrZ[(A ⊗ 1W)uv*(B* ⊗ 1W)]‖1 : u, v ∈ S(X ⊗ W)}.

For any choice of vectors u, v ∈ X ⊗ W we have

TrY ⊗W [( A ⊗ 1 W )uu∗ ( A∗ ⊗ 1 W )] = Ψ A (TrW (uu∗ )) ,


TrY ⊗W [( B ⊗ 1 W )vv∗ ( B∗ ⊗ 1 W )] = ΨB (TrW (vv∗ )) ,

and therefore by Lemma 19.3 it follows that

k TrZ [( A ⊗ 1 W )uv∗ ( B∗ ⊗ 1 W )]k1 = F(Ψ A (TrW uu∗ ), ΨB (TrW vv∗ )).

Consequently

k Φ k⋄ = max {F (Ψ A (TrW (uu∗ )), ΨB (TrW (vv∗ ))) : u, v ∈ S(X ⊗ W )}


= max {F(Ψ A (ρ), ΨB (ξ )) : ρ, ξ ∈ D (X )}
= Fmax (Ψ A , ΨB )

as required.

The following corollary follows immediately from this characterization along with the fact that
the diamond norm is multiplicative with respect to tensor products.

Corollary 19.4. Let Φ1 , Ψ1 ∈ T (X1 , Y1 ) and Φ2 , Ψ2 ∈ T (X2 , Y2 ) be completely positive. Then

Fmax (Φ1 ⊗ Φ2 , Ψ1 ⊗ Ψ2 ) = Fmax (Φ1 , Ψ1 ) · Fmax (Φ2 , Ψ2 ).

This is a simple but not obvious fact—it says that the maximum fidelity between the outputs of
any two completely positive product super-operators is achieved for product state inputs. Several
other quantities of interest based on quantum operations fail to respect tensor products in this
way.

19.2 Spectral norm characterization

The second characterization of the diamond norm that we will consider will be more difficult
to establish than the first. Consider any super-operator Φ ∈ T (X , Y ) for complex Euclidean
spaces X and Y . For a given choice of a complex Euclidean space Z , we have that there exists a
Stinespring representation
Φ( X ) = TrZ ( AXB∗ )
for Φ if and only if dim(Z) ≥ rank( J (Φ)). Under the assumption that dim(Z) ≥ rank( J (Φ)), we
may therefore consider the non-empty set of pairs ( A, B) that represent Φ in this way:
SΦ = {(A, B) : A, B ∈ L(X, Y ⊗ Z), Φ(X) = TrZ(AXB*) for all X ∈ L(X)}.
The characterization of the diamond norm that is established in this section concerns the spectral
norm of the operators in this set, and is given by the following theorem.
Theorem 19.5. Let X and Y be complex Euclidean spaces, let Φ ∈ T (X , Y ), and let Z be a complex
Euclidean space with dimension at least rank( J (Φ)). Then for SΦ as defined above we have
k Φ k ⋄ = inf {k A k k B k : ( A, B) ∈ SΦ } .
It will require some work to prove this theorem, but in the process we will learn something
interesting about the maximum fidelity between compact, convex sets of positive semidefinite
operators. This fact generalizes the characterization of the fidelity function given by Alberti’s
theorem, which we discussed in Lecture 4.

19.2.1 A theorem about fidelity and convex sets


Recall from Lecture 4 that Alberti’s theorem states that for any complex Euclidean space X and
any choice of P, Q ∈ Pos (X ), we have

(F(P, Q))² = inf_{R∈Pd(X)} ⟨R, P⟩⟨R⁻¹, Q⟩.

The generalization we require is as follows.


Theorem 19.6. Let X be a complex Euclidean space and let A, B ⊂ Pos (X ) be nonempty, compact and
convex. Then
max_{P∈A, Q∈B} (F(P, Q))² = inf_{R∈Pd(X)} max_{P∈A, Q∈B} ⟨R, P⟩⟨R⁻¹, Q⟩.

The interesting aspect of this theorem is the ordering of the infimum and maximum—for if they
were reversed, the theorem would follow trivially from Alberti’s theorem. The assumption that
allows for this reversal is, of course, the convexity of A and B .
To prove Theorem 19.6, we will start with the following lemma. This lemma concerns only
convex sets of density operators, rather than arbitrary positive semidefinite operators, but this
will be sufficient for our needs (and makes for a simpler technical statement).
Lemma 19.7. Let A, B ⊆ D (X ) be closed and convex sets of density operators. Then A and B are disjoint
if and only if there exists a Hermitian operator H ∈ Herm (X ) and a positive real number δ > 0 such that
⟨e^{tH}, ρ⟩⟨e^{−tH}, ξ⟩ < 1

for all ρ ∈ A, ξ ∈ B , and t ∈ (0, δ).



Proof. For any density operator σ ∈ D (X ), Alberti’s theorem implies that


⟨e^{tH}, σ⟩⟨e^{−tH}, σ⟩ ≥ (F(σ, σ))² = 1

for all choices of t ∈ R and H ∈ Herm (X ). It is therefore clear that if H ∈ Herm (X ), δ > 0, and
⟨e^{tH}, ρ⟩⟨e^{−tH}, ξ⟩ < 1

for all ρ ∈ A, ξ ∈ B , and t ∈ (0, δ), then A and B must be disjoint.


Assume, on the other hand, that A and B are disjoint. From this assumption, and using the
convexity of A and B , we know that there exists a Hermitian operator H, which we may scale so
that k H k ≤ 1, and a positive real number α > 0, such that

h H, ξ − ρi > α
for all ρ ∈ A and ξ ∈ B . Let us fix such a choice of H and α for the rest of the proof.
Now consider the Taylor series expansion of the exponential function. As k H k ≤ 1, we have
that
‖e^{−tH} − (1 − tH)‖ ≤ t²,   ‖e^{tH} − (1 + tH)‖ ≤ t²,   ‖e^{−tH}‖ ≤ 1 + 2t,   and   ‖e^{tH}‖ ≤ 1 + 2t
for t ∈ [0, 1]. (These bounds are not tight, but are simple to state and are sufficient for our needs.)
Using the fact that A and B are sets of density operators, we may therefore conclude that for all
t ∈ [0, 1] we have
⟨e^{tH}, ρ⟩⟨e^{−tH}, ξ⟩ ≤ 1 − t⟨H, ξ − ρ⟩ + ct²,
for some constant c > 0. (We may choose c = 7, for instance.) For a sufficiently small choice of
δ > 0, we therefore have
⟨e^{tH}, ρ⟩⟨e^{−tH}, ξ⟩ ≤ 1 − αt + ct² < 1

for all t ∈ (0, δ).

Proof of Theorem 19.6. We will prove the equality by proving inequalities in both directions. One
direction is easy, because the infimum of a maximum is at least the maximum of the infimum, i.e.,

inf_{R∈Pd(X)} max_{P∈A, Q∈B} ⟨R, P⟩⟨R⁻¹, Q⟩ ≥ max_{P∈A, Q∈B} inf_{R∈Pd(X)} ⟨R, P⟩⟨R⁻¹, Q⟩ = max_{P∈A, Q∈B} (F(P, Q))²,

where the equality is by Alberti’s theorem.


For the other direction, let us choose an arbitrary real number ε > 0, and consider the quantity

inf_{R∈Pd(X)} max_{P∈A, Q∈B} ⟨R, P + ε1⟩⟨R⁻¹, Q + ε1⟩.   (19.1)

Later we will take the limit as ε goes to 0, but for now we will view ε as being fixed. The reason
for adding ε1 to both P and Q is that it guarantees that the infimum in the expression is achieved
for some R.
To see this, it suffices to argue that the infimum may without loss of generality be restricted to
a compact set. In particular, consider the compact set K defined as

K = { R ∈ Pd (X ) : 1 X /K ≤ R ≤ K 1 X } ,

for some choice of a positive real number K to be chosen shortly. Now, there is clearly no change in
the infimum if we restrict it to operators R ∈ Pd (X ) that satisfy Tr( R) = Tr( R−1 ), given that the
expression (19.2) is invariant under positive scaling of R. There is also no change in the infimum
if we restrict R further to satisfy

Tr(R) Tr(R⁻¹) ≤ (1/ε²) max_{P∈A, Q∈B} Tr(P + ε1) Tr(Q + ε1),

for any R that does not satisfy this bound will surely not produce a value for expression (19.1) that
is smaller than the choice R = 1. Any R that satisfies these properties is certainly contained K for
the choice
1
K = 2 max Tr( P + ε1 ) Tr( Q + ε1 ),
ε P∈A
Q ∈B

and so we have

inf_{R∈Pd(X)} max_{P∈A, Q∈B} ⟨R, P + ε1⟩⟨R⁻¹, Q + ε1⟩ = inf_{R∈K} max_{P∈A, Q∈B} ⟨R, P + ε1⟩⟨R⁻¹, Q + ε1⟩.

Given that
max_{P∈A, Q∈B} ⟨R, P + ε1⟩⟨R⁻¹, Q + ε1⟩   (19.2)

is a continuous function of R, we have that the infimum is achieved for some choice of R ∈ K. Let
us fix such a choice of R that achieves this infimum, and let us again rescale R so that

sup_{P∈A} ⟨R, P + ε1⟩ = sup_{Q∈B} ⟨R⁻¹, Q + ε1⟩ = α   (19.3)

for some positive real number α.


Next, define sets C and D as follows:
C = {(1/α) R^{1/2}(P + ε1)R^{1/2} : P ∈ A} ∩ D(X),
D = {(1/α) R^{−1/2}(Q + ε1)R^{−1/2} : Q ∈ B} ∩ D(X).

We have that C and D are compact and convex, given that the same is true of A and B . We also
have that C and D are non-empty given that we have scaled R in accordance with the equation
(19.3) above. Finally, we have by Lemma 19.7 that C and D have a nonempty intersection. If this
were not the case, then there would exist an operator S = R1/2 etH R1/2 > 0 for which

hS, P + ε1 i hS−1 , Q + ε1 i < α2

for all P ∈ A and Q ∈ B, contradicting the fact that the minimum of the above expression (19.1) is
achieved by R and equals α².
Now, because C and D have a nonempty intersection, we may conclude that there exists P ∈ A
and Q ∈ B such that
Z = R1/2 ( P + ε1 ) R1/2 = R−1/2 ( Q + ε1 ) R−1/2

is positive definite and has trace α. Therefore
(F(P + ε1, Q + ε1))² = (F(R^{1/2}(P + ε1)R^{1/2}, R^{−1/2}(Q + ε1)R^{−1/2}))²
                     = (F(Z, Z))²
                     = (Tr(Z))²
                     = α²
                     = inf_{R∈Pd(X)} max_{P∈A, Q∈B} ⟨R, P + ε1⟩⟨R⁻¹, Q + ε1⟩.

We may now take the limit as ε goes to 0 to obtain
sup_{P∈A, Q∈B} (F(P, Q))² ≥ inf_{R∈Pd(X)} max_{P∈A, Q∈B} ⟨R, P⟩⟨R⁻¹, Q⟩

as required.

19.2.2 Proof of the characterization


Now that we have Theorem 19.6 in hand, we are prepared to prove Theorem 19.5. Let us first note
that, for any choice of operators A, B ∈ L (X , Y ⊗ Z) and unit vectors u, v ∈ X ⊗ W , we have

‖TrZ[(A ⊗ 1W)uv*(B* ⊗ 1W)]‖1 ≤ ‖(A ⊗ 1W)uv*(B* ⊗ 1W)‖1
                             ≤ ‖A ⊗ 1W‖ ‖uv*‖1 ‖B ⊗ 1W‖
                             = ‖A‖ ‖B‖,

which implies that k Φ k ⋄ ≤ k A k k B k for all ( A, B) ∈ SΦ , and consequently

k Φ k ⋄ ≤ inf {k A k k B k : ( A, B) ∈ SΦ } .

It remains to establish the reverse inequality.


First let us fix an arbitrary choice of ( A, B) ∈ SΦ . Notice that for every choice of an invertible
operator Z ∈ L (Z) we have

Φ( X ) = TrZ ( AXB∗ ) = TrZ ((1 Y ⊗ Z ) AXB∗ (1 Y ⊗ Z −1 )),

which implies that


((1 Y ⊗ Z ) A, (1 Y ⊗ Z −1 ) B) ∈ SΦ .
It will therefore suffice to prove that

k Φ k ⋄ = inf{k(1 Y ⊗ Z ) A k k(1 Y ⊗ Z −1 ) Bk : Z ∈ L (Z) invertible}.

Next, consider the two quantities ‖(1Y ⊗ Z)A‖ and ‖(1Y ⊗ Z⁻¹)B‖. The spectral norm has
the property that ‖X‖² = ‖X*X‖ for every operator X. Using this property we see that

k(1 Y ⊗ Z ) A k2 = k A∗ (1 Y ⊗ Z ∗ Z ) A k
= max {hρ, A∗ (1 Y ⊗ Z ∗ Z ) Ai : ρ ∈ D (X )}
= max {h Z ∗ Z, TrY ( AρA∗ )i : ρ ∈ D (X )} ,

and likewise
k(1 Y ⊗ Z −1 ) Bk2 = max{h( Z ∗ Z )−1 , TrY ( BξB∗ )i : ξ ∈ D (X )}.
We now define two completely positive super-operators Ψ A , ΨB ∈ T (X , Z) precisely as we did
for the maximum fidelity characterization:

Ψ A ( X ) = TrY ( AX A∗ ) ,
ΨB ( X ) = TrY ( BXB∗ ) ,

for all X ∈ L (X ). It therefore remains to prove that

‖Φ‖⋄² = inf_{P∈Pd(Z)} max_{ρ,ξ∈D(X)} ⟨P, ΨA(ρ)⟩⟨P⁻¹, ΨB(ξ)⟩.

But given that the sets

A = {Ψ A (ρ) : ρ ∈ D (X )},
B = {ΨB (ξ ) : ξ ∈ D (X )}

are nonempty, compact and convex, we may apply Theorem 19.6 to obtain

inf_{P∈Pd(Z)} max_{ρ,ξ∈D(X)} ⟨P, ΨA(ρ)⟩⟨P⁻¹, ΨB(ξ)⟩ = max_{ρ,ξ∈D(X)} F(ΨA(ρ), ΨB(ξ))² = ‖Φ‖⋄²,

which completes the proof.



Lecture 20
Semidefinite programming

This lecture is concerned with semidefinite programming and a few (among many) of its applications
to quantum information. We will begin with some basic definitions, and then move on to a discus-
sion of the duality theory of semidefinite programs—which is a powerful tool for reasoning about
them. Finally, two applications are discussed: one concerns optimal measurements for identifying
elements of finite sets of quantum states and the other concerns the diamond norm.

20.1 Definitions

Let X and Y be complex Euclidean spaces. Recall that the following conditions are equivalent for
a given super-operator Φ ∈ T (X , Y ):
1. It holds that Φ( X ) ∈ Herm (Y ) for all Hermitian operators X ∈ Herm (X ).
2. It holds that (Φ( X ))∗ = Φ( X ∗ ) for every X ∈ L (X ).
3. It holds that J (Φ) ∈ Herm (Y ⊗ X ).
4. It holds that Φ = Φ0 − Φ1 for completely positive super-operators Φ0 , Φ1 ∈ T (X , Y ).
We will hereafter refer to super-operators that satisfy these conditions as Hermiticity preserving
super-operators.
Now, a semidefinite program over X and Y is specified by a triple (Φ, A, B) where Φ ∈ T (X , Y )
is a Hermiticity preserving super-operator and A ∈ Herm (X ) and B ∈ Herm (Y ) are Hermitian
operators. We associate with this triple the following two optimization problems:
Primal problem
    minimize:    ⟨A, X⟩
    subject to:  Φ(X) ≥ B,
                 X ∈ Pos(X).

Dual problem
    maximize:    ⟨B, Y⟩
    subject to:  Φ*(Y) ≤ A,
                 Y ∈ Pos(Y).

The primal and dual problems are closely related and turn out to say a great deal about one
another, as we will see shortly.
A few additional definitions will be helpful when discussing semidefinite programs. For a
given semidefinite program (Φ, A, B) as above, we define the primal feasible set P and the dual
feasible set D as

P = {X ∈ Pos (X ) : Φ( X ) ≥ B} ,
D = {Y ∈ Pos (Y ) : Φ∗ (Y ) ≤ A} .
We also say that operators X ∈ P and Y ∈ D are primal feasible and dual feasible, respectively,
and that (Φ, A, B) is primal feasible or dual feasible when it is the case that P 6= ∅ or D 6= ∅,
respectively.


The functions X 7→ h A, X i and Y 7→ h B, Y i are typically called the primal and dual objective
functions. We also define

α = inf{h A, X i : X ∈ P },
β = sup{h B, Y i : Y ∈ D},

which are the correct solutions to the primal and dual problems, or in other words the optimal
values of the objective functions taken over the feasible sets for the two problems. (If it is the case
that P = ∅ or D = ∅, then the above definitions should be interpreted as α = ∞ and β = −∞,
respectively.) It is important at this stage to note that we cannot always replace the infimum and
supremum by the minimum and maximum—in some cases, the values α and β might not actually
be achieved for any choice of X ∈ P and Y ∈ D, respectively.

20.2 Semidefinite programming duality

One interpretation of the word “duality”, when used in ordinary life, is that it refers to pairs of
concepts that are directly opposite or complementary in some way: like good and evil, or matter
and anti-matter. In semidefinite programming, duality refers to the special relationship between
the primal and dual problems associated with a given semidefinite program that is reminiscent of
this everyday interpretation.
We will start with a theorem that establishes the property of weak duality, which holds for all
semidefinite programs.
Theorem 20.1 (Weak duality for semidefinite programs). Let X and Y be complex Euclidean spaces,
let (Φ, A, B) be a semidefinite program over X and Y , and let α and β be as defined above. Then α ≥ β.
Proof. The theorem is obvious if either P = ∅ or D = ∅, so let us assume this is not the case. For
any choice of X ∈ P and Y ∈ D we have that X ≥ 0 and Φ∗ (Y ) ≤ A, and therefore

h A, X i ≥ hΦ∗ (Y ), X i .
Likewise, given that Y ≥ 0 and Φ( X ) ≥ B we have

hΦ( X ), Y i ≥ h B, Y i .
Thus
h A, X i ≥ hΦ∗ (Y ), X i = hΦ( X ), Y i ≥ h B, Y i . (20.1)
Taking the infimum over X ∈ P and the supremum over Y ∈ D establishes the theorem.

Notice that the above equation (20.1) expresses a simple but important fact, which is that every
dual feasible operator Y ∈ D provides a lower bound of h B, Y i on the value h A, X i that is achiev-
able over all choices of a primal feasible X ∈ P ; and likewise every primal feasible operator X ∈ P
provides an upper bound of h A, X i on the value h B, Y i that is achievable over all choices of a dual
feasible Y ∈ D .
Now, it may not always be the case that α = β for a given semidefinite program (Φ, A, B). One
can easily invent an example where P = ∅ = D , which gives α = ∞ and β = −∞; but this is
just a triviality based on our choice of how to interpret this case. But even when both P and D are
nonempty, it may still be that α 6= β. However, in most cases that come up naturally, it will be the
case that α = β; a situation called strong duality. We will now establish some specific conditions
under which this happens.

Lemma 20.2. The following two implications hold for a given semidefinite program (Φ, A, B) over com-
plex Euclidean spaces X and Y . The two implications are symmetric with respect to the primal and dual
problems associated with (Φ, A, B).
1. Suppose that α is finite (i.e., neither ∞ nor −∞) and that the set

{(h A, X i , Φ( X ) − Q) : X ∈ Pos (X ) , Q ∈ Pos (Y )} ⊆ R ⊕ Herm (Y )

is closed. Then it holds that α = β, and moreover there exists a primal feasible operator X ∈ P such
that h A, X i = α.
2. Suppose that β is finite and that the set

{(h B, Y i , Φ∗ (Y ) + P) : Y ∈ Pos (Y ) , P ∈ Pos (X )} ⊆ R ⊕ Herm (X )

is closed. Then it holds that α = β, and moreover there exists a dual feasible operator Y ∈ D such that
h B, Y i = β.
Proof. The two implications are symmetric and proved in precisely the same way, so we will just
prove the first implication. For convenience, let us write

C = {(h A, X i , Φ( X ) − Q) : X ∈ Pos (X ) , Q ∈ Pos (Y )} .

Let us choose a value ε > 0. By the definition of α, it holds that the point (α − ε, B) is not
contained in the set C . Given that C is obviously convex, and is closed by assumption, we have
that there exists a hyperplane that (strongly) separates (α − ε, B) from C . To be more precise, there
exists a point (λ, Y ) ∈ R ⊕ Herm (Y ) such that

λ h A, X i + hY, Φ( X ) − Qi < λ(α − ε) + hY, Bi (20.2)

for all X ∈ Pos (X ) and Q ∈ Pos (Y ).


We will now derive a series of facts based on the inequality (20.2). First, we note that Y must be
positive semidefinite, for otherwise the left-hand-side of the inequality (20.2) could be made arbi-
trarily large by making an appropriate choice of Q. Next, let us choose X ∈ Pos (X ) to be primal
feasible, which is possible given the assumption that α is finite. Setting Q = 0 and rearranging the
inequality (20.2), we obtain

λ(h A, X i − α + ε) < hY, B − Φ( X )i .

Given that h A, X i − α + ε is positive and hY, B − Φ( X )i is non-positive for every primal feasible
X, we conclude that λ < 0. We may therefore rescale (λ, Y ) so that λ = −1 without changing
any of the observations we have made so far. After this rescaling, we may again set Q = 0 and
rearrange the inequality (20.2) to obtain

h X, Φ∗ (Y ) − Ai < hY, Bi − α + ε

for all X ∈ Pos (X ). The right-hand-side is bounded while X is free to range over all positive
semidefinite operators on the left-hand-side, which implies that Φ∗ (Y ) ≤ A. We have therefore
proved that Y is dual feasible. Setting X = 0, we conclude that hY, Bi > α − ε. Given that ε > 0
was chosen arbitrarily, we have that β ≥ α, from which it follows that α = β by weak duality.
It remains to prove that there exists a primal feasible X ∈ P such that h A, X i = α. This is
done by a similar argument to the one above, except that we consider the point (α, B) rather than

(α − ε, B). If it is the case that (α, B) ∈ C , then there is nothing more to do—for we must have that
a primal feasible X ∈ P exists for which h A, X i = α. If, however, it is not the case that (α, B) ∈ C ,
then it may be separated from C by a hyperplane precisely as above. Similar reasoning establishes
Y ∈ Pos (Y ) and that λ < 0, after which we rescale to obtain a dual feasible Y ∈ D . This time,
however, Y must satisfy h B, Y i < α, which contradicts weak duality and completes the proof.

While the lemma we have just proved establishes basic conditions under which strong duality
holds for semidefinite programs, it is not very useful as a tool—because it is not obvious how
one checks that the sets described are closed for a given semidefinite program of interest. The
following theorem establishes a more useful and often easy-to-check condition (sometimes called
the Slater condition) under which strong duality holds.

Theorem 20.3. Let (Φ, A, B) be a semidefinite program over complex Euclidean spaces X and Y . Then the
following two implications, which are symmetric with respect to the primal and dual problems associated
with (Φ, A, B), hold.

1. (Strict dual feasibility.) Suppose that α is finite and that there exists an operator Y ∈ Pd (Y ) such that
Φ∗ (Y ) < A. Then α = β and there exists X ∈ P such that h A, X i = α.
2. (Strict primal feasibility.) Suppose that β is finite and that there exists an operator X ∈ Pd (X ) such
that Φ( X ) > B. Then α = β and there exists Y ∈ D such that h B, Y i = β.

Proof. The set of operators Pos (X ⊕ Y ) consists of all operators that may be expressed in block
form as
[ P, Z ; Z*, Q ]          (20.3)
for operators P ∈ Pos (X ), Q ∈ Pos (Y ), and Z ∈ L (Y , X ) ranging over a restricted set of opera-
tors that depends in some way on P and Q that does not concern us for the sake of this proof.
Now let us define a function φ : Herm(X ⊕ Y) → R ⊕ Herm(Y) as
φ([ P, Z ; Z*, Q ]) = (⟨A, P⟩, Φ(P) − Q).

Our goal is to prove that the assumptions of the theorem guarantee that the image of the set
Pos (X ⊕ Y ) under φ is closed.
Given that φ is linear and the set D (X ⊕ Y ) of density operators on X ⊕ Y is convex and
compact, it is the case that φ(D (X ⊕ Y )) is convex and compact as well. Moreover, it holds that
φ(Pos (X ⊕ Y )) is the convex cone generated by φ(D (X ⊕ Y )). This is a closed set provided that
0 is not contained in φ(D (X ⊕ Y )), and therefore it suffices to prove 0 6∈ φ(D (X ⊕ Y )).
Now, the assumptions of the theorem imply that the operator
[ A − Φ*(Y), 0 ; 0, Y ] ∈ Herm(X ⊕ Y)

is positive definite. For any density operator of the above form (20.3) we have that
0 < ⟨[ A − Φ*(Y), 0 ; 0, Y ], [ P, Z ; Z*, Q ]⟩ = ⟨A, P⟩ − ⟨Y, Φ(P) − Q⟩.

It is therefore not the case that (h A, Pi , Φ( P) − Q) = 0, which completes the proof.



20.3 Optimal measurements

We will now consider an application of semidefinite programming duality to the problem of opti-
mally identifying sets of density operators. To be more precise, let us suppose that X is a complex
Euclidean space and
{( p( a), ρa ) : a ∈ Γ}
is an ensemble of density operators over X , meaning that Γ is a finite and nonempty set, p ∈ R Γ
is a probability vector, and ρa ∈ D (X ) is a density operator for each a ∈ Γ. We are interested in
measurements of the form
{ Pa : a ∈ Γ} ⊂ Pos (X ) ,
and specifically in the probability
∑_{a∈Γ} p(a)⟨Pa, ρa⟩          (20.4)
that such a measurement correctly identifies a sample from the above ensemble.
The main result that we will establish is a necessary and sufficient condition on a given mea-
surement { Pa : a ∈ Γ} for it to achieve the optimal probability of correctness, among all possible
measurements with outcome set Γ.
There is no simple formula known for the maximum value of the expression (20.4), over all
possible choices of the measurement { Pa : a ∈ Γ}. It can, however, be expressed as a semidefinite
program. Specifically, let Z = C Γ , let Y = Z ⊗ X , and consider the semidefinite program (Φ, A, B)
where Φ ∈ T (X , Y ), A ∈ Herm (X ), and B ∈ Herm (Y ) are defined as
Φ( X ) = 1 Z ⊗ X (for all X ∈ Herm (X )),
A = 1X ,
B = ∑_{a∈Γ} p(a) Ea,a ⊗ ρa.

The primal and dual problems associated with this semidefinite program can be expressed as
follows.
Primal problem
    minimize:    Tr(X)
    subject to:  1Z ⊗ X ≥ B,
                 X ∈ Pos(X).

Dual problem
    maximize:    ⟨B, Y⟩
    subject to:  TrZ(Y) ≤ 1X,
                 Y ∈ Pos(Z ⊗ X).
(Notice that the constraint 1 Z ⊗ X ≥ B in the primal problem may equivalently be expressed as
X ≥ p( a)ρa for all a ∈ Γ.)
Let us begin by verifying that the dual problem indeed expresses the optimal value of the
expression (20.4), optimized over all measurements of the form { Pa : a ∈ Γ}. First, for any
measurement { Pa : a ∈ Γ} ⊂ Pos (X ), it holds that the operator
Y = ∑_{a∈Γ} Ea,a ⊗ Pa

is dual feasible, and the value of h B, Y i is equal to the value expressed in the above equation (20.4).
This implies that the optimal value β of the dual problem is at least as large as the optimal value
of the expression (20.4). On the other hand, for any dual feasible operator
Y = ∑_{a,b∈Γ} E_{a,b} ⊗ X_{a,b}

we have that X_{a,a} ∈ Pos(X) and

∑_{a∈Γ} X_{a,a} = Tr_Z(Y) ≤ 1_X .

By taking

Q = (1/|Γ|) ( 1_X − ∑_{a∈Γ} X_{a,a} )

and P_a = X_{a,a} + Q for each a ∈ Γ, we obtain a valid measurement {P_a : a ∈ Γ} on X for which the expression (20.4) is at least ⟨B, Y⟩. As this holds for every dual feasible Y, the optimal value of the expression (20.4) is at least the optimal value β associated with the above dual problem. Thus, the two quantities are equal as claimed.
It is straightforward to prove that strong duality holds for this semidefinite program, and that
the optimal value α = β is achieved in both the primal and dual problems. In particular, the values
α and β are clearly finite, and one may (for instance) take X = (1 + δ) 1_X and Y = ((1 − δ)/|Γ|) 1_Z ⊗ 1_X for any choice of δ ∈ (0, 1) in Theorem 20.3 to establish this claim.
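As an aside, the primal problem above is simple enough to hand directly to a numerical solver. The following sketch is an illustration only (it is not part of the original notes): it assumes the Python packages numpy and cvxpy are available, and the function name optimal_guessing_probability is a hypothetical choice made here.

```python
import numpy as np
import cvxpy as cp

def optimal_guessing_probability(p, rhos):
    """Solve the primal problem: minimize Tr(X) subject to X >= p(a) * rho_a
    for every a, over Hermitian X.  By the discussion above, the optimal value
    equals the best probability of correctly identifying a sample drawn from
    the ensemble {(p(a), rho_a)}."""
    d = rhos[0].shape[0]
    X = cp.Variable((d, d), hermitian=True)
    constraints = [X >> p_a * rho_a for p_a, rho_a in zip(p, rhos)]
    problem = cp.Problem(cp.Minimize(cp.real(cp.trace(X))), constraints)
    return problem.solve()

# Example: two equiprobable pure states |0><0| and |+><+|.  The result should
# agree with the familiar expression 1/2 + (1/4)||rho_0 - rho_1||_1 for two
# equiprobable states (approximately 0.8536 in this case).
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)
rho1 = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)
print(optimal_guessing_probability([0.5, 0.5], [rho0, rho1]))
```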
Now, based on the above discussion, we are prepared to state and prove the following theorem
that establishes a necessary and sufficient condition on a given measurement for it to optimally
identify samples from a given ensemble.

Theorem 20.4. Let X be a complex Euclidean space, let {( p( a), ρa ) : a ∈ Γ} be an ensemble of density
operators on X , and let { Pa : a ∈ Γ} be a measurement on X . Then the measurement { Pa : a ∈ Γ} is
optimal for {( p( a), ρa ) : a ∈ Γ} if and only if the operator

∑_{a∈Γ} p(a) ρ_a P_a

is positive semidefinite and satisfies


∑_{a∈Γ} p(a) P_a ρ_a ≥ p(b) ρ_b

for every b ∈ Γ.

Proof. First, let


X = ∑_{a∈Γ} p(a) P_a ρ_a

and assume that X is positive semidefinite and satisfies X ≥ p(b)ρb for every b ∈ Γ. Then X is
primal feasible for the semidefinite program discussed above. As is the case for all valid measure-
ments, the operator
Y = ∑_{a∈Γ} E_{a,a} ⊗ P_a

is dual feasible. Given that


Tr(X) = ∑_{a∈Γ} p(a) ⟨P_a, ρ_a⟩ = ⟨B, Y⟩ ,

we have that X is optimal for the primal problem and Y is optimal for the dual problem, and
therefore the measurement { Pa : a ∈ Γ} is optimal for the ensemble {( p( a), ρa ) : a ∈ Γ}.
Suppose on the other hand that the measurement { Pa : a ∈ Γ} is optimal for the ensemble
{( p( a), ρa ) : a ∈ Γ}. Then there exists a primal feasible operator X ∈ Pos (X ) such that

Tr(X) = ∑_{a∈Γ} p(a) ⟨P_a, ρ_a⟩ .

Given that ∑_{a∈Γ} P_a = 1_X we therefore have

∑_{a∈Γ} ⟨P_a, X − p(a) ρ_a⟩ = 0.

As X is primal feasible we have that X ≥ p(a) ρ_a for every a ∈ Γ, so each term in this sum is nonnegative and therefore equal to zero. It follows that

∑_{a∈Γ} P_a (X − p(a) ρ_a) = 0,

where we have used the fact that Tr(PQ) = 0 implies PQ = 0 for every choice of positive semidefinite operators P, Q ≥ 0. Thus,

X = ∑_{a∈Γ} p(a) P_a ρ_a ,
and therefore the required conditions on this operator hold.
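As a brief aside, the condition in Theorem 20.4 is easy to test numerically for a concrete ensemble and measurement. The sketch below is only an illustration, assuming numpy; the function name is_optimal_measurement and the tolerance are choices made here.

```python
import numpy as np

def is_optimal_measurement(p, rhos, Ps, tol=1e-8):
    """Check the condition of Theorem 20.4: the operator
    X = sum_a p(a) rho_a P_a must be (Hermitian) positive semidefinite
    and must satisfy X >= p(b) rho_b for every b."""
    X = sum(p_a * rho_a @ P_a for p_a, rho_a, P_a in zip(p, rhos, Ps))
    if not np.allclose(X, X.conj().T, atol=tol):
        return False
    if np.linalg.eigvalsh(X).min() < -tol:
        return False
    return all(np.linalg.eigvalsh(X - p_b * rho_b).min() >= -tol
               for p_b, rho_b in zip(p, rhos))
```

For instance, for two equiprobable states, the projective measurement onto the nonnegative and negative eigenspaces of ρ_0 − ρ_1 should pass this check.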

20.4 One more look at the diamond norm

The square of the diamond norm of an arbitrary super-operator can be expressed as the optimal
value of a semidefinite program, as we will now verify. This provides a simplification of the spec-
tral norm characterization of the diamond norm from the previous lecture, and has the pleasant
side-effect of giving a means to efficiently compute the diamond norm of a given super-operator—
because there exist efficient algorithms to approximate the optimal value of very general classes
of semidefinite programs to high precision.
Let us begin by describing the semidefinite program, starting first with its associated primal
and dual problems. After doing this we will verify that its value corresponds to the square of the
diamond norm. Throughout this discussion we assume that a Stinespring representation
Φ( X ) = TrZ ( AXB∗ )
of an arbitrary super-operator Φ ∈ T (X , Y ) has been fixed. It will be helpful to assume that
dim(Z) = rank( J (Φ)), which is the smallest dimension for which such a Stinespring representa-
tion exists, although this is not critical for most of our discussion. Let us also agree hereafter to
disregard the trivial case where Φ = 0.

20.4.1 Description of a semidefinite program for the diamond norm


The primal and dual problems for the semidefinite program we wish to consider are as follows:
Primal problem

    minimize:    λ
    subject to:  λ 1_X ≥ A^*(1_Y ⊗ Z)A,
                 1_Y ⊗ Z ≥ BB^*,
                 λ ≥ 0, Z ∈ Pos(Z).

Dual problem

    maximize:    ⟨BB^*, W⟩
    subject to:  Tr_Y(W) ≤ Tr_Y(AXA^*),
                 Tr(X) ≤ 1,
                 X ∈ Pos(X), W ∈ Pos(Y ⊗ Z).
These problems are associated with the semidefinite program, defined over complex Euclidean
spaces C ⊕ Z and X ⊕ (Y ⊗ Z), that is formally specified as (Ψ, C, D ), where the super-operator
Ψ ∈ T (C ⊕ Z , X ⊕ (Y ⊗ Z)) is given by
Ψ\begin{pmatrix} λ & u^* \\ v & Z \end{pmatrix} = \begin{pmatrix} λ 1_X − A^*(1_Y ⊗ Z)A & 0 \\ 0 & 1_Y ⊗ Z \end{pmatrix}

and the operators C ∈ Herm (C ⊕ Z) and D ∈ Herm (X ⊕ (Y ⊗ Z)) are


   
C = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}    and    D = \begin{pmatrix} 0 & 0 \\ 0 & BB^* \end{pmatrix}.

It is straightforward to verify that the primal problem given above corresponds to this formal
description, while the dual problem may be verified by calculating the adjoint of Ψ, which is

   
Ψ^*\begin{pmatrix} X & M \\ N & W \end{pmatrix} = \begin{pmatrix} Tr(X) & 0 \\ 0 & Tr_Y(W − AXA^*) \end{pmatrix}.

(In the above expressions of Ψ and Ψ∗ , the vectors u, v ∈ Z and the operators M, N ∈ L (X , Y ⊗ Z)
essentially act as place-holders that do not contribute to the output of these super-operators.)
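Before analyzing this semidefinite program, it may help to see the primal problem in computational form. The following sketch is an illustration only (again assuming numpy and cvxpy; the function name diamond_norm_squared, and the convention that the rows of A and B are indexed by Y ⊗ Z in that order, are assumptions made here). Its optimal value is the quantity that the analysis below identifies with ‖Φ‖²⋄.

```python
import numpy as np
import cvxpy as cp

def diamond_norm_squared(A, B, dim_Y, dim_Z):
    """Primal problem: minimize lambda subject to
    lambda * 1_X >= A* (1_Y (x) Z) A  and  1_Y (x) Z >= B B*,
    over lambda >= 0 and Z in Pos(Z).  Here A and B are the
    (dim_Y * dim_Z) x dim_X matrices of a Stinespring pair for Phi."""
    dim_X = A.shape[1]
    lam = cp.Variable(nonneg=True)
    Z = cp.Variable((dim_Z, dim_Z), hermitian=True)
    one_Y_Z = cp.kron(np.eye(dim_Y), Z)      # the operator 1_Y (x) Z
    constraints = [
        Z >> 0,
        lam * np.eye(dim_X) >> A.conj().T @ one_Y_Z @ A,
        one_Y_Z >> B @ B.conj().T,
    ]
    return cp.Problem(cp.Minimize(lam), constraints).solve()
```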

20.4.2 Analysis of the semidefinite program


We will now analyze the semidefinite program given above. Before we discuss its relationship to
the diamond norm, let us verify that it satisfies strong duality.
The primal problem obviously results in a finite optimal value. For µ > ‖B‖² and λ > µ ‖A‖², we have

\begin{pmatrix} λ & 0 \\ 0 & µ 1_Z \end{pmatrix} > 0    and    Ψ\begin{pmatrix} λ & 0 \\ 0 & µ 1_Z \end{pmatrix} = \begin{pmatrix} λ 1_X − µ A^*A & 0 \\ 0 & µ 1_Y ⊗ 1_Z \end{pmatrix} > \begin{pmatrix} 0 & 0 \\ 0 & BB^* \end{pmatrix},

which illustrates strict primal feasibility. Thus, the optimal value β associated with the dual prob-
lem is equal to the optimal primal value α, and is achieved for some choice of a dual feasible
operator.
The same reasoning can be applied to show that there exists a primal feasible operator for
which the optimal value α is achieved, but here is where we require an assumption on the dimen-
sion of Z . If it is the case that dim(Z) = rank( J (Φ)), then necessarily we have that TrY ( AX A∗ )
is positive definite for any positive definite X. Therefore, by choosing any positive definite X for
which Tr( X ) < 1, there must exist a sufficiently small ε > 0 for which
   
Ψ^*\begin{pmatrix} X & 0 \\ 0 & ε 1_Y ⊗ 1_Z \end{pmatrix} < \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.

This shows that we have strict dual feasibility, and therefore the optimal value is achieved for
some primal feasible operator.
Now let us verify that the optimal value associated with this semidefinite program corre-
sponds to ‖Φ‖²⋄. We will do this by considering the dual problem, and because our focus is on the
dual problem we will refer to the optimal value of the semidefinite program as β.
Let us define a set

A = {W ∈ Pos (Y ⊗ Z) : TrY (W ) = TrY ( AρA∗ ) for some ρ ∈ D (X )} .

It is easy to see that


β = max_{W∈A} ⟨BB^*, W⟩ ,

because an optimal solution


\begin{pmatrix} X & M^* \\ M & W \end{pmatrix}

to the dual problem could be “inflated” (meaning that we add some positive semidefinite oper-
ator to it) to achieve equality in the constraints Tr( X ) ≤ 1 and TrY (W ) ≤ TrY ( AX A∗ ), without
decreasing the value of the objective function.
Now we need to prove that
‖Φ‖²⋄ = max_{W∈A} ⟨BB^*, W⟩ .
For any choice of a complex Euclidean space W for which dim(W ) ≥ dim(X ), we have
‖Φ‖²⋄ = max_{u,v∈S(X⊗W)} ‖ Tr_Z[(A ⊗ 1_W) u v^* (B ⊗ 1_W)^*] ‖₁²

    = max_{u,v∈S(X⊗W), U∈U(Y⊗W)} | Tr[(U ⊗ 1_Z)(A ⊗ 1_W) u v^* (B ⊗ 1_W)^*] |²

    = max_{u,v∈S(X⊗W), U∈U(Y⊗W)} | v^*(B ⊗ 1_W)^*(U ⊗ 1_Z)(A ⊗ 1_W) u |²

    = max_{u∈S(X⊗W), U∈U(Y⊗W)} ‖ (B ⊗ 1_W)^*(U ⊗ 1_Z)(A ⊗ 1_W) u ‖²

    = max_{u∈S(X⊗W), U∈U(Y⊗W)} u^*(A ⊗ 1_W)^*(U ⊗ 1_Z)^*(B ⊗ 1_W)(B ⊗ 1_W)^*(U ⊗ 1_Z)(A ⊗ 1_W) u

    = max_{u∈S(X⊗W), U∈U(Y⊗W)} Tr[(BB^* ⊗ 1_W)(U ⊗ 1_Z)(A ⊗ 1_W) u u^*(A ⊗ 1_W)^*(U ⊗ 1_Z)^*]

    = max_{u∈S(X⊗W), U∈U(Y⊗W)} ⟨BB^*, Tr_W[(U ⊗ 1_Z)(A ⊗ 1_W) u u^*(A ⊗ 1_W)^*(U ⊗ 1_Z)^*]⟩ .

(If this calculation looks nasty, it is only because I have taken small steps; the equalities are all applications of simple facts that we have probably seen several times previously in the course.)
It now remains to prove that
A = {TrW [(U ⊗ 1 Z )( A ⊗ 1 W )uu∗ ( A ⊗ 1 W )∗ (U ⊗ 1 Z )∗ ] : u ∈ S(X ⊗ W ) , U ∈ U (Y ⊗ W )}
for some choice of W with dim(W ) ≥ dim(X ). We will choose W such that
dim(W ) = max{dim(X ), dim(Y ⊗ Z)}.
First consider an arbitrary choice of u ∈ S(X ⊗ W ) and U ∈ U (Y ⊗ W ), and let
W = TrW [(U ⊗ 1 Z )( A ⊗ 1 W )uu∗ ( A ⊗ 1 W )∗ (U ⊗ 1 Z )∗ ] .
Then TrY (W ) = TrY ( A TrW (uu∗ ) A∗ ), and so it is clear that W ∈ A. Now consider an arbitrary
element W ∈ A, and let ρ ∈ D (X ) satisfy TrY (W ) = TrY ( AρA∗ ). Let u ∈ S(X ⊗ W ) purify ρ and
let w ∈ Y ⊗ Z ⊗ W purify W. Then
TrY ⊗W (ww∗ ) = TrY ⊗W (( A ⊗ 1 W )uu∗ ( A ⊗ 1 W )∗ ) ,
so there exists U ∈ U (Y ⊗ W ) such that (U ⊗ 1 Z )( A ⊗ 1 W )u = w, and therefore
W = TrW (ww∗ ) = TrW [(U ⊗ 1 Z )( A ⊗ 1 W )uu∗ ( A ⊗ 1 W )∗ (U ⊗ 1 Z )∗ ] .
We have therefore proved that
A = {TrW [(U ⊗ 1 Z )( A ⊗ 1 W )uu∗ ( A ⊗ 1 W )∗ (U ⊗ 1 Z )∗ ] : u ∈ S(X ⊗ W ) , U ∈ U (Y ⊗ W )} ,
and so we have that β = k Φ k2⋄ as required.

20.4.3 Connection to the spectral norm characterization


Finally, we will relate the semidefinite program from the previous section to the spectral norm
characterization of the diamond norm from the previous lecture.
Recall that the spectral norm characterization of the diamond norm is that

‖Φ‖⋄ = inf{ ‖A‖ ‖B‖ : (A, B) ∈ SΦ },    (20.5)

where SΦ denotes the collection of all pairs ( A, B) that give Stinespring representations of Φ. The
easy part of our proof of this characterization was establishing

‖Φ‖⋄ ≤ ‖A‖ ‖B‖

for all (A, B) ∈ SΦ, while the more difficult task of showing that the infimum equals ‖Φ‖⋄ required a much more complicated argument relating to Alberti’s theorem.
Given the description of ‖Φ‖²⋄ by the semidefinite program above, we may now simplify this part of the proof. Notice first that the primal problem may alternately be expressed as

    minimize:    ‖A^*(1_Y ⊗ Z)A‖
    subject to:  1_Y ⊗ Z ≥ BB^*,
                 Z ∈ Pos(Z).

The optimal value of this problem does not change if we restrict Z to be positive definite, provided
we accept that an optimal solution may not be achieved. We therefore have

‖Φ‖²⋄ = inf{ ‖A^*(1_Y ⊗ Z)A‖ : 1_Y ⊗ Z ≥ BB^*, Z ∈ Pd(Z) }.

For any ε > 0 we may therefore choose Z ∈ Pd (Z) such that

‖A^*(1_Y ⊗ Z)A‖ ≤ (‖Φ‖⋄ + ε)²

and 1 Y ⊗ Z ≥ BB∗ . This second inequality is equivalent to


   
‖ (1_Y ⊗ Z^{−1/2}) BB^* (1_Y ⊗ Z^{−1/2}) ‖ ≤ 1.

So now we have that


   
‖ (1_Y ⊗ Z^{1/2}) A ‖ ‖ (1_Y ⊗ Z^{−1/2}) B ‖ ≤ ‖Φ‖⋄ + ε,

and it is clear that     


( (1_Y ⊗ Z^{1/2}) A, (1_Y ⊗ Z^{−1/2}) B ) ∈ SΦ .

This establishes that the infimum equals ‖Φ‖⋄ in the expression (20.5), which completes the char-
acterization.

Lecture 21
The quantum de Finetti theorem

The main goal of this lecture is to prove a theorem known as the quantum de Finetti theorem. There
are in fact several known variants of this theorem, so to be more precise it may be said that we
will prove a theorem of the quantum de Finetti type. This type of theorem states, in effect, that if a
collection of identical quantum registers has a reduced state that is invariant under permutations,
then the reduced state of a comparatively small number of these registers must be close to a convex
combination of identical product states. Variants of this type of theorem may differ with respect
to the precise bounds that are obtained.
There are three main parts of the lecture. First we will introduce various concepts concerning
quantum states of multiple register systems that are invariant under permutations of these reg-
isters. Then we will very briefly discuss integrals defined with respect to the unitarily invariant
measure on the unit sphere of a given complex Euclidean space, simply because it supplies us
with a tool we need for the last part of the lecture. The last part of the lecture is the statement and
proof of the quantum de Finetti theorem.
This lecture will be somewhat different from the others in the course in that I will skip many
of the details of the proofs. This is inevitable for two reasons: one is because we have very limited
time remaining in the course, and certainly not enough time for a proper discussion of the details;
and the other is that the background knowledge required to formalize some of the details is sig-
nificantly more than for the other lectures. Nevertheless, I hope there will be enough information
for you to follow up on this lecture on your own, in case you choose to do this.

21.1 Symmetric subspaces and exchangeable operators

Let us fix a finite, nonempty set Σ, and let d = | Σ | for the remainder of this lecture. Also let n
be a positive integer, and let X1 , . . . , Xn be identical quantum registers, with associated complex
Euclidean spaces X1 , . . . , Xn taking the form Xk = C Σ for 1 ≤ k ≤ n.

21.1.1 Permutation operators


For each permutation π ∈ Sn , we define a unitary operator
Wπ ∈ U (X1 ⊗ · · · ⊗ Xn )
by the action
Wπ (u_1 ⊗ · · · ⊗ u_n) = u_{π^{−1}(1)} ⊗ · · · ⊗ u_{π^{−1}(n)}
for every choice of vectors u1 , . . . , un ∈ C Σ . In other words, Wπ permutes the contents of the
registers X1 , . . . , Xn according to π. It holds that
Wπ Wσ = Wπσ    and    Wπ^{−1} = Wπ^* = W_{π^{−1}}    (21.1)
for all π, σ ∈ Sn .
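As a concrete illustration of these operators (not part of the original notes), the following numpy sketch builds the matrix of Wπ by permuting the tensor legs of the identity; the function name permutation_operator and the 0-indexed convention perm[k] = π(k) are assumptions made here.

```python
import numpy as np

def permutation_operator(d, perm):
    """Matrix of W_pi acting on n copies of C^d, where perm is the 0-indexed
    list with perm[k] = pi(k), so that W_pi places u_{pi^{-1}(k)} in the
    k-th tensor factor."""
    n = len(perm)
    inv = np.argsort(perm)                      # the inverse permutation
    W = np.eye(d ** n).reshape((d,) * (2 * n))  # identity with 2n tensor legs
    W = W.transpose(tuple(inv) + tuple(range(n, 2 * n)))
    return W.reshape(d ** n, d ** n)

# A quick check of the relation W_pi W_sigma = W_{pi sigma}, where the
# composition is (pi sigma)(k) = pi(sigma(k)):
d, pi, sigma = 2, [1, 2, 0], [2, 0, 1]
lhs = permutation_operator(d, pi) @ permutation_operator(d, sigma)
rhs = permutation_operator(d, [pi[sigma[k]] for k in range(len(pi))])
assert np.allclose(lhs, rhs)
```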


21.1.2 The symmetric subspace


Some vectors in X1 ⊗ · · · ⊗ Xn are invariant under the action of Wπ for every choice of π ∈ Sn , and
it is clear that the set of all such vectors forms a subspace. This subspace is called the symmetric
subspace, and will be denoted in these notes as X1 ∨ · · · ∨ Xn . In more precise terms, this subspace
is defined as

X1 ∨ · · · ∨ Xn = {u ∈ X1 ⊗ · · · ⊗ Xn : u = Wπ u for every π ∈ Sn } .

It is not difficult to verify that the orthogonal projection operator that projects onto this subspace
is given by
Π_{X1 ∨ ··· ∨ Xn} = (1/n!) ∑_{π∈Sn} Wπ .

Let us now construct an orthonormal basis for the symmetric subspace. First, consider the set
Urn(n, Σ) of functions of the form φ : Σ → N (where N = {0, 1, 2, . . . }) that satisfy

∑_{a∈Σ} φ(a) = n.

The elements of this set describe urns containing n marbles, where each marble is labelled by
an element of Σ. (There is no order associated with the marbles—all that matters is how many
marbles with each possible label are contained in the urn.)
Now, to say that a string a_1 · · · a_n ∈ Σ^n is consistent with a particular function φ ∈ Urn(n, Σ)
means simply that a1 · · · an is one possible ordering of the marbles in the urn described by φ. One
can express this formally by defining a function

f_{a_1···a_n}(b) = |{ j ∈ {1, . . . , n} : b = a_j }| ,

and by defining that a1 · · · an is consistent with φ if and only if f a1 ··· an = φ. The number of distinct
strings a1 · · · an ∈ Σn that are consistent with a given function φ ∈ Urn(n, Σ) is given by the
multinomial coefficient

\binom{n}{φ} := n! / ∏_{a∈Σ} φ(a)! .

Finally, we define an orthonormal basis {u_φ : φ ∈ Urn(n, Σ)} of X1 ∨ · · · ∨ Xn , where

u_φ = \binom{n}{φ}^{−1/2} ∑_{a_1···a_n ∈ Σ^n : f_{a_1···a_n} = φ} e_{a_1} ⊗ · · · ⊗ e_{a_n} .

In other words, uφ is the uniform superposition over all of the strings that are consistent with the
function φ.
For example, taking n = 3 and Σ = {0, 1}, we obtain the following four vectors:

u_0 = e_0 ⊗ e_0 ⊗ e_0
u_1 = (1/√3) (e_0 ⊗ e_0 ⊗ e_1 + e_0 ⊗ e_1 ⊗ e_0 + e_1 ⊗ e_0 ⊗ e_0)
u_2 = (1/√3) (e_0 ⊗ e_1 ⊗ e_1 + e_1 ⊗ e_0 ⊗ e_1 + e_1 ⊗ e_1 ⊗ e_0)
u_3 = e_1 ⊗ e_1 ⊗ e_1 ,

where the vectors are indexed by integers rather than functions φ ∈ Urn(3, {0, 1}) in a straight-
forward way.
Using simple combinatorics, it can be shown that |Urn(n, Σ)| = \binom{n+d−1}{d−1}, and therefore

dim(X1 ∨ · · · ∨ Xn) = \binom{n+d−1}{d−1} .

Notice that for small d and large n, the dimension of the symmetric subspace is therefore very
small compared with the entire space X1 ⊗ · · · ⊗ Xn .
It is also the case that
X1 ∨ · · · ∨ Xn = span{ u^{⊗n} : u ∈ C^Σ } .

This follows from an elementary fact concerning the theory of symmetric functions, but (like many
things in this lecture) I will not prove it here.

21.1.3 Exchangeable operators and their relation to the symmetric subspace


Along similar lines to vectors in the symmetric subspace, we say that a positive semidefinite op-
erator P ∈ Pos (X1 ⊗ · · · ⊗ Xn ) is exchangeable if it is the case that

P = Wπ PWπ∗

for every π ∈ Sn .
There is not an immediate relationship between exchangeable operators and the symmetric
subspace. It is the case that every positive semidefinite operator whose image is contained in
the symmetric subspace is exchangeable, but this is not a necessary condition—for instance, the
identity operator 1 X1 ⊗···⊗Xn is exchangeable and its image is all of X1 ⊗ · · · ⊗ Xn . We may, however,
relate exchangeable operators and the symmetric subspace by means of the following lemma.

Lemma 21.1. Let X1 , . . . , Xn and Y1 , . . . , Yn be copies of the complex Euclidean space C Σ , and suppose
that
P ∈ Pos (X1 ⊗ · · · ⊗ Xn )
is an exchangeable operator. Then there exists a symmetric vector

u ∈ (X1 ⊗ Y1) ∨ · · · ∨ (Xn ⊗ Yn)

that purifies P, i.e., TrY1 ⊗···⊗Yn (uu∗ ) = P.

Proof. Consider a spectral decomposition


k
P= ∑ λj Qj ,
j =1

where λ1 , . . . , λk are the distinct eigenvalues of P and Q1 , . . . , Qk are orthogonal projection opera-
tors onto the associated eigenspaces. Such a decomposition is unique, and given that Wπ PWπ∗ = P
for each permutation π ∈ Sn it is therefore clear that Wπ Q j Wπ∗ = Q j for each j = 1, . . . , k.
Let us view each of the above projection operators as taking the form

Q1 , . . . , Qk ∈ L (Y1 ⊗ · · · ⊗ Yn , X1 ⊗ · · · ⊗ Xn ) ,

which we may do because the spaces Y1 , . . . , Yn are isomorphic copies of X1 , . . . , Xn . Having done
this, we define
u = ∑_{j=1}^{k} √λ_j vec(Q_j).

Formally this is a vector in the space X1 ⊗ · · · ⊗ Xn ⊗ Y1 ⊗ · · · ⊗ Yn , but we may view it as an


element of the space (X1 ⊗ Y1) ⊗ · · · ⊗ (Xn ⊗ Yn) by rearranging its tensor factors in a manner
that is consistent with their names (as we have done repeatedly throughout the course).
It remains to verify that u has the properties we require of it. We have that
Tr_{Y1⊗···⊗Yn}(u u^*) = ∑_{1≤i,j≤k} √(λ_i λ_j) Q_i Q_j^* = ∑_{j=1}^{k} λ_j Q_j = P.

Moreover, for every π ∈ Sn we have


(Wπ ⊗ Wπ) u = (Wπ ⊗ Wπ) ∑_{j=1}^{k} √λ_j vec(Q_j) = ∑_{j=1}^{k} √λ_j vec(Wπ Q_j Wπ^*) = ∑_{j=1}^{k} √λ_j vec(Q_j) = u,

which shows that u ∈ (X1 ⊗ Y1) ∨ · · · ∨ (Xn ⊗ Yn) as required.

21.2 Integrals and unitarily invariant measure

For the proof of the main result in the next section, we will need to be able to express certain linear
operators as integrals. Here is a very simple expression that will serve as an example for the sake
of this discussion:

∫ u u^* dµ(u).

Now, there are two very different questions one may have about such an operator:

1. What does it mean in formal terms?


2. How is it calculated?

The answer to the first question is a bit complicated—and although we will not have time to
discuss it in detail, I would like to say enough to at least give you some clues and keywords in
case you wish to learn more on your own.
In the above expression, µ refers to the normalized unitarily invariant measure defined on the
Borel sets of the unit sphere S = S(X ) in some chosen complex Euclidean space X . (The space
X is implicit in the above expression, and generally would be determined by the context of the
expression.) To say that µ is normalized means that µ(S) = 1, and to say that µ is unitarily invariant
means that µ(A) = µ(U (A)) for every Borel set A ⊆ S and every unitary operator U ∈ U (X ).
It turns out that there is only one measure with the properties that have just been described.
Sometimes this measure is called Haar measure, although this term is considerably more general
than what we have just described. (There is a uniquely defined Haar measure on many different
sorts of measure spaces with groups acting on them in a particular way.)
Informally speaking, you may think of the measure described above as a way of assigning an
infinitesimally small probability to each point on the unit sphere in such a way that no one vector
is weighted more or less than any other. So, in an integral like the one above, we may view that it
is an average of operators uu∗ over the entire unit sphere, with each u being given equal weight.

Of course it does not really work this way, which is why we must speak of Borel sets rather than
arbitrary sets—but it is a reasonable guide for the simple uses of it in this lecture.
In formal terms, there is a process involving several steps for building up the meaning of an
integral like the one above starting from the measure µ. It starts with characteristic functions for
Borel sets (where the value of the integral is simply the set’s measure), then it defines integrals
for positive linear combinations of characteristic functions in the obvious way, then it introduces
limits to define integrals of more functions, and continues for a few more steps until we finally
have integrals of operators. Needless to say, this process does not provide an efficient means to
calculate a given integral.
This leads us to the second question, which is how to calculate such integrals. There is certainly
no general method: just like ordinary integrals you are lucky when there is a closed form. For
some, however, the fact that the measure is unitarily invariant leads to a simple answer. For
instance, the integral above must satisfy
∫ u u^* dµ(u) = U ( ∫ u u^* dµ(u) ) U^*

for every unitary operator U ∈ U (X ), and must also satisfy


Tr( ∫ u u^* dµ(u) ) = ∫ dµ(u) = µ(S) = 1.

There is only one possibility, since an operator that commutes with every unitary operator must be a scalar multiple of the identity:

∫ u u^* dµ(u) = (1/dim(X)) 1_X .
Now, what we need for the next part of the lecture is a generalization of this fact—which is
that for every n ≥ 1 we have
\binom{n+d−1}{d−1} ∫ (u u^*)^{⊗n} dµ(u) = Π_{X1 ∨ ··· ∨ Xn} ,
the projection onto the symmetric subspace. This is yet another fact for which a complete proof
would be too much of a diversion at this point in the course. The main result we need is a fact from
algebra that states that every operator in the space L(X1 ⊗ · · · ⊗ Xn) that commutes with U^{⊗n} for every unitary operator U ∈ U(C^Σ) must be a linear combination of the operators {Wπ : π ∈ Sn }.
Given this fact, along with the fact that the operator expressed by the integral has the correct trace
and is invariant under multiplication by every Wπ , the proof follows easily.
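As a rough numerical check of the first of these integrals (an illustration only, assuming numpy; normalizing a vector of independent complex Gaussian entries is one standard way to sample from the unitarily invariant measure):

```python
import numpy as np

def haar_unit_vector(d, rng):
    """Draw a unit vector in C^d from the unitarily invariant measure by
    normalizing a vector of independent complex Gaussian entries."""
    v = rng.standard_normal(d) + 1j * rng.standard_normal(d)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
d, samples = 3, 20000
estimate = sum(np.outer(u, u.conj())
               for u in (haar_unit_vector(d, rng) for _ in range(samples))) / samples
# Up to statistical error, the estimate should be close to (1/d) times the identity.
print(np.round(estimate, 2))
```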

21.3 The quantum de Finetti theorem

Now we are ready to state and prove one variant of the quantum de Finetti theorem, which is the
main goal of this lecture. The statement and proof follow.
Theorem 21.2. Let X1 , . . . , Xn be identical quantum registers, each having associated space C Σ for | Σ | =
d, and let ρ ∈ D (X1 ⊗ · · · ⊗ Xn ) be an exchangeable density operator representing the reduced state of
these registers. Then for any choice of k with 1 < k < n, there exists a finite set Γ, a probability vector p ∈ R^Γ, and a collection of density operators {ξ_a : a ∈ Γ} ⊂ D(C^Σ) such that

‖ ρ^{X1···Xk} − ∑_{b∈Γ} p(b) ξ_b^{⊗k} ‖₁ < 4d²k/n .

Proof. First we will prove a stronger bound for the case where ρ = vv∗ is pure (which requires
v ∈ X1 ∨ · · · ∨ Xn ). This will then be combined with Lemma 21.1 to complete the proof.
For the sake of clarity, let us write Y = X1 ⊗ · · · ⊗ Xk and Z = Xk+1 ⊗ · · · ⊗ Xn . Let us also
write
S^{(m)} = \binom{m+d−1}{d−1} ∫ (u u^*)^{⊗m} dµ(u),
which is the projection onto the symmetric subspace of m copies of C Σ for any choice of m ≥ 1.
Now consider a unit vector v ∈ X1 ∨ · · · ∨ Xn . As v is invariant under every permutation of its tensor factors, it holds that

v = (1_Y ⊗ S^{(n−k)}) v.

Therefore, for σ ∈ D (Y ) defined as σ = TrZ (vv∗ ) we must have


  
σ = Tr_Z[ (1_Y ⊗ S^{(n−k)}) v v^* ].

Defining a super-operator Φu ∈ T (Y ⊗ Z , Y ) for each vector u ∈ C Σ as


Φ_u(X) = (1_Y ⊗ u^{⊗(n−k)})^* X (1_Y ⊗ u^{⊗(n−k)})

for every X ∈ L (Y ⊗ Z), we have

σ = \binom{n−k+d−1}{d−1} ∫ Φ_u(v v^*) dµ(u).

Now, our goal is to approximate σ by a density operator taking the form

∑_{b∈Γ} p(b) ξ_b^{⊗k} ,

so we will guess a suitable approximation:

τ = \binom{n+d−1}{d−1} ∫ ⟨(u u^*)^{⊗k}, Φ_u(v v^*)⟩ (u u^*)^{⊗k} dµ(u).

It certainly holds that τ has trace 1, because

Tr(τ) = \binom{n+d−1}{d−1} ∫ ⟨(u u^*)^{⊗n}, v v^*⟩ dµ(u) = ⟨S^{(n)}, v v^*⟩ = 1,

and τ also has the correct form:


τ ∈ conv{ (w w^*)^{⊗k} : w ∈ S(C^Σ) } .

(Some details are being swept under the rug here, but because we are working inside a compact
set this claim should not be surprising.)
We will now place an upper bound on ‖σ − τ‖₁. To make the proof more readable, let us write

c_m = \binom{m+d−1}{d−1}

for each m ≥ 0. We begin by noting that


 
‖σ − τ‖₁ ≤ ‖ σ − (c_{n−k}/c_n) τ ‖₁ + ‖ (c_{n−k}/c_n) τ − τ ‖₁ = c_{n−k} ‖ (1/c_{n−k}) σ − (1/c_n) τ ‖₁ + ( 1 − c_{n−k}/c_n ).    (21.2)

Next, by making use of the operator equality

A − BAB = A(1 − B) + (1 − B) A − (1 − B) A(1 − B),

and writing ∆u = (uu∗ )⊗k , we obtain


‖ (1/c_{n−k}) σ − (1/c_n) τ ‖₁ = ‖ ∫ ( Φ_u(v v^*) − ∆_u Φ_u(v v^*) ∆_u ) dµ(u) ‖₁

    ≤ ‖ ∫ Φ_u(v v^*)(1 − ∆_u) dµ(u) ‖₁ + ‖ ∫ (1 − ∆_u) Φ_u(v v^*) dµ(u) ‖₁ + ‖ ∫ (1 − ∆_u) Φ_u(v v^*)(1 − ∆_u) dµ(u) ‖₁ .

It is clear that

‖ ∫ Φ_u(v v^*)(1 − ∆_u) dµ(u) ‖₁ = ‖ ∫ (1 − ∆_u) Φ_u(v v^*) dµ(u) ‖₁

(the two operators are adjoints of one another, and the trace norm is invariant under taking adjoints),

while
‖ ∫ (1 − ∆_u) Φ_u(v v^*)(1 − ∆_u) dµ(u) ‖₁ = Tr( ∫ (1 − ∆_u) Φ_u(v v^*)(1 − ∆_u) dµ(u) )

    = Tr( ∫ (1 − ∆_u) Φ_u(v v^*) dµ(u) )

    ≤ ‖ ∫ (1 − ∆_u) Φ_u(v v^*) dµ(u) ‖₁ .

Therefore we have
‖ (1/c_{n−k}) σ − (1/c_n) τ ‖₁ ≤ 3 ‖ ∫ (1 − ∆_u) Φ_u(v v^*) dµ(u) ‖₁ .

At this point we note that


∫ Φ_u(v v^*) dµ(u) = (1/c_{n−k}) σ,

while

∫ ∆_u Φ_u(v v^*) dµ(u) = ∫ Tr_Z[ (u u^*)^{⊗n} v v^* ] dµ(u) = (1/c_n) σ.
Therefore we have  
‖ (1/c_{n−k}) σ − (1/c_n) τ ‖₁ ≤ 3 ( 1/c_{n−k} − 1/c_n ),

and so

‖σ − τ‖₁ ≤ ( 1 − c_{n−k}/c_n ) + 3 c_{n−k} ( 1/c_{n−k} − 1/c_n ) ≤ 4 ( 1 − c_{n−k}/c_n ).

To finish off the upper bound, we observe that

c_{n−k}/c_n = [ (n−k+d−1)(n−k+d−2) · · · (n−k+1) ] / [ (n+d−1)(n+d−2) · · · (n+1) ] ≥ ( (n−k+1)/(n+1) )^{d−1} > 1 − dk/n,

and so
‖σ − τ‖₁ < 4dk/n.
This establishes essentially the bound given in the statement of the theorem, albeit only for pure
states, but with d² replaced by d.
To prove the bound in the statement of the theorem for an arbitrary exchangeable density
operator ρ ∈ D (X1 ⊗ · · · ⊗ Xn ), we first apply Lemma 21.1 to obtain a symmetric purification

v ∈ (X1 ⊗ Y1) ∨ · · · ∨ (Xn ⊗ Yn)

of ρ, where Y1 , . . . , Yn represent isomorphic copies of X1 , . . . , Xn . By the argument above, we have

‖σ − τ‖₁ < 4d²k/n ,
where σ = TrZ (vv∗ ) for Z = (Xk+1 ⊗ Yk+1 ) ⊗ · · · ⊗ (Xn ⊗ Yn ) and where
τ ∈ conv{ (u u^*)^{⊗k} : u ∈ S(C^Σ ⊗ C^Σ) } .

Taking the partial trace over Y1 ⊗ · · · ⊗ Yk then gives the result.
