John Watrous
Institute for Quantum Computing
University of Waterloo
Lecture contents
1 Mathematical Preliminaries – Part I
  1.1 Complex Euclidean spaces
    1.1.1 Definition of complex Euclidean spaces
    1.1.2 Inner product and norms of vectors
    1.1.3 Orthogonal and orthonormal sets
  1.2 Linear operators and super-operators
    1.2.1 Matrices and their association with operators
    1.2.2 The entry-wise conjugate, transpose, and adjoint
    1.2.3 Algebras of operators
    1.2.4 Trace and determinant
    1.2.5 Eigenvectors and eigenvalues
  1.3 Important classes of operators
  1.4 The spectral and singular value theorems
    1.4.1 The spectral theorem
    1.4.2 Functions of normal operators
    1.4.3 The singular-value theorem
    1.4.4 The Moore–Penrose pseudo-inverse
12 Separable operators
  12.1 Definition and basic properties of separable operators
  12.2 The Woronowicz–Horodecki Criterion
  12.3 Separable ball around the identity
Lecture 1
Mathematical Preliminaries – Part I
The goal of the first two lectures of the course is to present an overview of basic mathematical
concepts and tools that are required throughout the course. In particular, we will discuss various
facts about linear algebra in finite-dimensional vector spaces.
This section introduces the simple notion of a complex Euclidean space, which is used throughout
this course. In particular, with every discrete and finite physical system we will associate some
complex Euclidean space; and fundamental notions such as states and measurements of systems
are represented in linear-algebraic terms that refer to these spaces.
The convention to write u_i rather than u(i) in such expressions is simply a matter of typographic appeal, and is avoided when it is not helpful or would lead to confusion (such as when vectors are subscripted for another purpose). Similar conventions are used as convenient for elements of spaces of the more general form C^Σ, assuming that some natural ordering is placed on the index set Σ.
The inner product ⟨u, v⟩ of two vectors u, v ∈ C^Σ is defined as

    ⟨u, v⟩ = ∑_{a∈Σ} \overline{u(a)} v(a).
It may be verified that the inner product satisfies the following properties:
1. Linearity in the second argument: ⟨u, αv + βw⟩ = α⟨u, v⟩ + β⟨u, w⟩ for all u, v, w ∈ C^Σ and α, β ∈ C.
2. Conjugate symmetry: ⟨u, v⟩ = \overline{⟨v, u⟩} for all u, v ∈ C^Σ.
3. Positive definiteness: ⟨u, u⟩ ≥ 0 for all u ∈ C^Σ, with ⟨u, u⟩ = 0 if and only if u = 0.
One typically refers to any function satisfying these three properties as an inner product, but this is the only inner product for vectors in complex Euclidean spaces that is considered in this course.
The Euclidean norm of a vector u ∈ C^Σ is defined as

    ‖u‖ = √(⟨u, u⟩) = √(∑_{a∈Σ} |u(a)|^2).
The Euclidean norm satisfies the following properties, which are the defining properties of any function that is called a norm:
1. Positive definiteness: ‖u‖ ≥ 0 for all u ∈ C^Σ, with ‖u‖ = 0 if and only if u = 0.
2. Positive scalability: ‖αu‖ = |α| ‖u‖ for all u ∈ C^Σ and α ∈ C.
3. The triangle inequality: ‖u + v‖ ≤ ‖u‖ + ‖v‖ for all u, v ∈ C^Σ.
The Euclidean norm also satisfies the Cauchy–Schwarz inequality:

    |⟨u, v⟩| ≤ ‖u‖ ‖v‖

for all u, v ∈ C^Σ, with equality if and only if u and v are linearly dependent.
The Euclidean norm corresponds to the special case p = 2 of the class of p-norms, defined for each u ∈ C^Σ as

    ‖u‖_p = (∑_{a∈Σ} |u(a)|^p)^{1/p}.
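As a small numerical aside (not part of the original notes), the vector p-norms above are easy to check with numpy; the names `p_norm` and `u` below are ours, chosen only for illustration.

```python
import numpy as np

def p_norm(u: np.ndarray, p: float) -> float:
    """The vector p-norm (sum over a of |u(a)|^p)^(1/p)."""
    return float(np.sum(np.abs(u) ** p) ** (1.0 / p))

u = np.array([3.0 + 4.0j, 0.0, 1.0])

# The Euclidean norm is the p = 2 case, and equals sqrt(<u, u>).
euclidean = float(np.sqrt(np.vdot(u, u).real))
assert np.isclose(p_norm(u, 2), euclidean)
```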
Given complex Euclidean spaces X and Y , one writes L (X , Y ) to refer to the collection of all linear
mappings of the form
A : X → Y. (1.1)
Such mappings will be referred to as linear operators, or simply operators, from X to Y in this course.
Parentheses are typically omitted when expressing the action of linear operators on vectors when
there is little chance of confusion in doing so. For instance, one typically writes Au rather than
A(u) to denote the vector resulting from the application of an operator A ∈ L (X , Y ) to a vector
u ∈ X.
The set L (X , Y ) forms a vector space, where addition and scalar multiplication are defined as follows:
1. Addition: given A, B ∈ L (X , Y ), the operator A + B ∈ L (X , Y ) is defined by the equation

    (A + B)u = Au + Bu

for all u ∈ X .
2. Scalar multiplication: given A ∈ L (X , Y ) and α ∈ C, the operator αA ∈ L (X , Y ) is defined
by the equation
(αA)u = α Au
for all u ∈ X .
Let us assume that X = C^Σ and Y = C^Γ. For each choice of a ∈ Σ and b ∈ Γ, the operator E_{b,a} ∈ L (X , Y ) is defined by the condition that

    E_{b,a} e_c = e_b  if c = a,    E_{b,a} e_c = 0  if c ≠ a,

for each c ∈ Σ.
The kernel and image of an operator A ∈ L (X , Y ) are the subspaces defined as

    ker(A) = {u ∈ X : Au = 0},
    im(A) = {Au : u ∈ X }.

The rank of A, denoted rank(A), is the dimension of the subspace im(A). For every operator A ∈ L (X , Y ) it holds that

    dim(ker(A)) + rank(A) = dim(X ).
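The rank-nullity identity above can be checked numerically; the following sketch (our own, not from the notes) extracts a kernel basis from the singular value decomposition of an arbitrary rank-deficient example.

```python
import numpy as np

# A rank-deficient example: the third row is the sum of the first two.
A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [1., 3., 1., 1.]])

dim_X = A.shape[1]
rank = np.linalg.matrix_rank(A)

# A basis of ker(A): right singular vectors with zero singular value.
_, s, Vh = np.linalg.svd(A)
kernel = Vh[np.sum(s > 1e-10):]          # rows spanning ker(A)

assert np.allclose(A @ kernel.T, 0)      # these vectors really lie in ker(A)
assert len(kernel) + rank == dim_X       # rank-nullity
```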
Addition and scalar multiplication of matrices M, N ∈ M_{Γ,Σ}(C) are defined entry-wise, so that in particular

    (M + N)(a, b) = M(a, b) + N(a, b).

As a vector space, M_{Γ,Σ}(C) is therefore essentially the same as the complex Euclidean space C^{Γ×Σ}.
Multiplication of matrices is defined in the following standard way. Given matrices M ∈
MΓ,∆ (C ) and N ∈ M∆,Σ (C ), for finite non-empty sets Γ, ∆, and Σ, the matrix MN ∈ MΓ,Σ (C ) is
defined as
    (MN)(a, b) = ∑_{c∈∆} M(a, c) N(c, b)
for each a ∈ Γ and b ∈ Σ. Conversely, to each matrix M ∈ M_{Γ,Σ}(C) one associates a linear operator A_M ∈ L (X , Y ) defined by

    (A_M u)(a) = ∑_{b∈Σ} M(a, b) u(b)

for each a ∈ Γ. It holds that the mappings A ↦ M_A and M ↦ A_M are linear and inverses of one another, and that compositions of linear operators are represented by matrix multiplications:

    M_{AB} = M_A M_B.
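Numerically, this correspondence is exactly the statement that applying two linear maps in sequence agrees with multiplying their matrices first; a quick illustration (with arbitrary matrices of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
M_A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
M_B = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
u = rng.standard_normal(5)

# Applying B then A to u agrees with applying the product matrix M_A M_B.
assert np.allclose((M_A @ M_B) @ u, M_A @ (M_B @ u))
```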
1. The entry-wise conjugate of an operator A ∈ L (X , Y ) is the operator \overline{A} ∈ L (X , Y ) defined by

    \overline{A}(a, b) = \overline{A(a, b)}

for all a ∈ Γ and b ∈ Σ.
2. The transpose of A is the operator A^T ∈ L (Y , X ) defined by

    A^T(b, a) = A(a, b)

for all a ∈ Γ and b ∈ Σ.
3. The adjoint of A is the unique operator A^* ∈ L (Y , X ) that satisfies

    ⟨v, Au⟩ = ⟨A^* v, u⟩

for all u ∈ X and v ∈ Y . It may be obtained by performing both of the operations described in items 1 and 2:

    A^* = \overline{A}^T.
The operators \overline{A}, A^T, and A^* will be called the entry-wise conjugate, transpose, and adjoint operators to A, respectively.
The mappings A ↦ \overline{A} and A ↦ A^* are conjugate linear and the mapping A ↦ A^T is linear:

    \overline{αA + βB} = \overline{α} \overline{A} + \overline{β} \overline{B},
    (αA + βB)^* = \overline{α} A^* + \overline{β} B^*,
    (αA + βB)^T = α A^T + β B^T,

for all A, B ∈ L (X , Y ) and α, β ∈ C. These mappings are bijections, each being its own inverse.
Every vector u ∈ X in a complex Euclidean space X may be identified with the linear operator in L (C, X ) that maps α ↦ αu. Through this identification the linear mappings \overline{u} ∈ L (C, X ) and u^T, u^* ∈ L (X , C) are defined as above. As an element of X , the vector \overline{u} is of course simply the entry-wise complex conjugate of u, i.e., if X = C^Σ then

    \overline{u}(a) = \overline{u(a)}

for every a ∈ Σ. For each vector u ∈ X the mapping u^* ∈ L (X , C) satisfies u^* v = ⟨u, v⟩ for all v ∈ X . The space of linear operators L (X , C) is called the dual space of X , and is typically denoted by X ^* rather than L (X , C).
For any choice of vectors u ∈ X and v ∈ Y , the operator v u^* ∈ L (X , Y ) satisfies

    (v u^*) w = ⟨u, w⟩ v

for all w ∈ X . For nonzero vectors u and v the operator v u^* has rank equal to one, and every rank one operator in L (X , Y ) can be expressed in this form for vectors u and v that are unique up to scalar multiples.
Composition of operators is associative and bilinear:

    (AB)C = A(BC),
    C(αA + βB) = αCA + βCB,
    (αA + βB)C = αAC + βBC,

for all compatible operators A, B, and C, and all α, β ∈ C.
The determinant of an operator A ∈ L (X ), for X = C^Σ, may be defined by the equation

    det(A) = ∑_{π∈Sym(Σ)} sign(π) ∏_{a∈Σ} A(a, π(a)),

where Sym(Σ) is the group of permutations on the set Σ and sign(π) is the sign of the permutation π (which is +1 if π is expressible as a product of an even number of transpositions of elements of the set Σ, and −1 if π is expressible as a product of an odd number of transpositions).
The trace satisfies the important property

    Tr(AB) = Tr(BA)

for any choice of operators A ∈ L (X , Y ) and B ∈ L (Y , X ), for arbitrary complex Euclidean spaces X and Y .
By means of the trace, we define an inner product on the space L (X , Y ), for any choice of
complex Euclidean spaces X and Y , as
    ⟨A, B⟩ = Tr(A^* B)
for all A, B ∈ L (X , Y ). True to its name, this inner product satisfies the requisite properties of
being an inner product:
The determinant is multiplicative, det(AB) = det(A) det(B) for all A, B ∈ L (X ), and its value is nonzero if and only if its argument is invertible.
The characteristic polynomial p_A of an operator A ∈ L (X ) is defined as

    p_A(z) = det(z 1_X − A),

and the spectrum spec(A) of A is the multiset of roots of p_A, that is, the eigenvalues of A counted with multiplicity. It holds that

    Tr(A) = ∑_{λ∈spec(A)} λ    and    det(A) = ∏_{λ∈spec(A)} λ

for every A ∈ L (X ).
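These two identities are convenient sanity checks in computation; the following sketch (an illustration of ours, with an arbitrary random operator) verifies them numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
eig = np.linalg.eigvals(A)               # spectrum, with multiplicity

assert np.isclose(np.trace(A), np.sum(eig))        # Tr(A) = sum of eigenvalues
assert np.isclose(np.linalg.det(A), np.prod(eig))  # det(A) = product of eigenvalues
```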
Several classes of operators that have importance in quantum information are discussed in this
section.
An operator A ∈ L (X ) is normal if and only if it commutes with its adjoint: [ A, A∗ ] = 0, or
equivalently AA∗ = A∗ A. The importance of this collection of operators, for the purposes of this
course, is mainly derived from two facts: (1) the normal operators are those for which the spectral
theorem (discussed later in Section 1.4.1) holds, and (2) most of the special classes of operators
that are discussed below are subsets of the normal operators.
An operator A ∈ L (X ) is Hermitian if A = A∗ . The set of Hermitian operators acting on a
given complex Euclidean space X will hereafter be denoted Herm (X ) in this course:
Herm (X ) = { A ∈ L (X ) : A = A∗ }.
Every eigenvalue of a Hermitian operator is necessarily real. For a Hermitian operator A ∈ Herm (X ), with n = dim(X ), the vector of eigenvalues (λ_1(A), . . . , λ_n(A)) ∈ R^n is defined so that

    spec(A) = {λ_1(A), λ_2(A), . . . , λ_n(A)}

and

    λ_1(A) ≥ λ_2(A) ≥ · · · ≥ λ_n(A).
The sum of two Hermitian operators is obviously Hermitian, as is any real scalar multiple of a Hermitian operator. The inner product of two Hermitian operators is real as well: ⟨A, B⟩ ∈ R for all A, B ∈ Herm (X ). The set Herm (X ) therefore forms a real inner product space, the dimension of which is dim(X )^2. Assuming that X = C^Σ, and that the elements of Σ are ordered in some fixed way, the following collection of operators is one example of a basis of Herm (X ):
    E_{a,a}  (a ∈ Σ),
    E_{a,b} + E_{b,a}  (a, b ∈ Σ, a < b),
    i E_{a,b} − i E_{b,a}  (a, b ∈ Σ, a < b).
An operator P ∈ L (X ) is positive semidefinite if it may be expressed as P = B^* B for some choice of B ∈ L (X ). The set of such operators is denoted

    Pos (X ) = {B^* B : B ∈ L (X )}.
The notation P ≥ 0 is also used to mean that P is positive semidefinite, while A ≥ B means
that A − B is positive semidefinite. (Usually this notation is only used when A and B are both
Hermitian.) The inner product of two positive semidefinite operators is always a nonnegative real number: ⟨P, Q⟩ ≥ 0 for P, Q ∈ Pos (X ).
There are alternate ways to describe positive semidefinite operators that are useful in different
situations. In particular, the following items are equivalent for a given operator P ∈ L (X ):
1. P is positive semidefinite.
2. ⟨u, Pu⟩ is a nonnegative real number for every choice of u ∈ X .
3. P is Hermitian and every eigenvalue of P is nonnegative.
4. There exists a collection of vectors {u_a : a ∈ Σ} such that P(a, b) = ⟨u_a, u_b⟩ for all a, b ∈ Σ.
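As a numerical illustration of ours (not from the notes), one can construct P = B^* B for an arbitrary B and check the first three equivalent conditions directly:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
P = B.conj().T @ B                       # positive semidefinite by construction

# P is Hermitian with nonnegative eigenvalues.
assert np.allclose(P, P.conj().T)
assert np.all(np.linalg.eigvalsh(P) >= -1e-10)

# <u, Pu> is a nonnegative real number for any u.
u = rng.standard_normal(3) + 1j * rng.standard_normal(3)
val = np.vdot(u, P @ u)
assert abs(val.imag) < 1e-10 and val.real >= 0
```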
An operator is positive definite if it is both positive semidefinite and invertible. One writes

    Pd (X ) = {A ∈ Pos (X ) : det(A) ≠ 0}

to denote the set of such operators for a given complex Euclidean space X . The property of being positive definite is equivalent to ⟨u, Au⟩ being a positive (and therefore nonzero) real number for every choice of nonzero u ∈ X .
Positive semidefinite operators having trace equal to 1 are called density operators, and it is
conventional to use lowercase Greek letters such as ρ, ξ, and σ to denote such operators. The
notation
D (X ) = {ρ ∈ Pos (X ) : Tr(ρ) = 1}
is used to denote the collection of density operators acting on a given complex Euclidean space.
A positive semidefinite operator P ∈ Pos (X ) is an orthogonal projection if, in addition to being positive semidefinite, it satisfies P^2 = P. Equivalently, an orthogonal projection is any Hermitian operator whose only eigenvalues are 0 and 1. For each subspace V ⊆ X , we write Π_V to denote the unique orthogonal projection whose image is equal to the subspace V. It is typical that the term projection refers to an operator A ∈ L (X ) that satisfies A^2 = A, but which might not be Hermitian. Given that there is no discussion of such operators in this course, we will use the term projection to mean orthogonal projection.
An operator A ∈ L (X , Y ) is a linear isometry if it preserves the Euclidean norm, meaning that ‖Au‖ = ‖u‖ for all u ∈ X . This condition is equivalent to A^* A = 1_X. The notation

    U (X , Y ) = {A ∈ L (X , Y ) : A^* A = 1_X}
is used throughout this course. Every linear isometry preserves not only the Euclidean norm, but inner products as well: ⟨Au, Av⟩ = ⟨u, v⟩ for all u, v ∈ X .
The set of linear isometries mapping X to itself is denoted U (X ), and operators in this set
are called unitary operators. The letters U, V, and W are conventionally used to refer to unitary
operators. Every unitary operator U ∈ U (X ) is invertible and satisfies UU^* = U^* U = 1_X, which implies that every unitary operator is normal.
We now discuss the spectral and singular value theorems. These theorems establish the existence of
the spectral and singular value decompositions of operators, which are indispensable tools that are
used repeatedly throughout this course. The spectral theorem also provides us with the important
notion that functions defined on scalars can be extended to normal operators in a robust way.
Theorem 1.1 (Spectral Theorem). Let X be a complex Euclidean space, let A ∈ L (X ) be a normal operator, and assume that the distinct eigenvalues of A are λ_1, . . . , λ_k. Then there exists a unique choice of orthogonal projection operators P_1, . . . , P_k ∈ L (X ), with P_1 + · · · + P_k = 1_X and P_i P_j = 0 for i ≠ j, such that

    A = ∑_{i=1}^{k} λ_i P_i.    (1.3)

For each i ∈ {1, . . . , k}, it holds that the rank of P_i is equal to the multiplicity of λ_i as an eigenvalue of A.
As suggested above, the expression of a normal operator A in the form of the above equation (1.3)
is called a spectral decomposition of A.
A simple corollary of the spectral theorem follows. It expresses essentially the same fact as the
spectral theorem, but in a slightly different form that will be useful to refer to later in the course.
Corollary 1.2. Let X be a complex Euclidean space, let A ∈ L (X ) be a normal operator, and assume that spec(A) = {λ_1, . . . , λ_n}. Then there exists an orthonormal basis {x_1, . . . , x_n} of X such that

    A = ∑_{i=1}^{n} λ_i x_i x_i^*.    (1.4)
It is clear from the expression (1.4), along with the requirement that the set { x1 , . . . , xn } is an
orthonormal basis, that each xi is an eigenvector of A whose corresponding eigenvalue is λi . It is
also clear that any operator A that is expressible in such a form as (1.4) is normal—implying that
the condition of normality is equivalent to the existence of an orthonormal basis of eigenvectors.
We will often refer to expressions of operators in the form (1.4) as spectral decompositions, de-
spite the fact that it differs slightly from the form (1.3). It must be noted that unlike the form (1.3),
the form (1.4) is generally not unique (unless each eigenvalue of A has multiplicity one, in which
case the expression is unique up to scalar multiples of the vectors { x1 , . . . , xn } and the ordering of
the eigenvalues).
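A spectral decomposition of the form (1.4) can be computed numerically for a Hermitian (hence normal) operator; the sketch below (our own illustration) rebuilds A from its orthonormal eigenvectors and eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(3)
H = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = H + H.conj().T                       # Hermitian, hence normal

lam, X = np.linalg.eigh(A)               # columns of X: orthonormal eigenvectors
recon = sum(lam[i] * np.outer(X[:, i], X[:, i].conj()) for i in range(4))

assert np.allclose(recon, A)             # A = sum_i lambda_i x_i x_i^*
```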
Finally, let us mention one more important theorem regarding spectral decompositions of nor-
mal operators, which states that the same orthonormal basis of eigenvectors { x1 , . . . , xn } may be
chosen for any two normal operators, provided that they commute.
Theorem 1.3. Let X be a complex Euclidean space and let A, B ∈ L (X ) be normal operators for which [A, B] = 0. Then there exists an orthonormal basis {x_1, . . . , x_n} of X such that

    A = ∑_{i=1}^{n} λ_i x_i x_i^*    and    B = ∑_{i=1}^{n} μ_i x_i x_i^*

for some choice of complex numbers λ_1, . . . , λ_n and μ_1, . . . , μ_n.
Naturally, functions defined only on subsets of scalars may be extended to normal operators
whose eigenvalues are restricted accordingly.
The following examples of scalar functions extended to operators will be important later in
this course.
1. The exponential function α ↦ exp(α) is defined for all α ∈ C, and may therefore be extended to a function A ↦ exp(A) for any normal operator A ∈ L (X ) by defining

    exp(A) = ∑_{j=1}^{k} exp(λ_j) P_j,

where A = λ_1 P_1 + · · · + λ_k P_k is a spectral decomposition of A.
2. For r > 0 the function λ ↦ λ^r is defined for nonnegative real values λ ∈ [0, ∞). For a given positive semidefinite operator Q ∈ Pos (X ), having spectral decomposition

    Q = ∑_{j=1}^{k} λ_j P_j,    (1.5)

one defines

    Q^r = ∑_{j=1}^{k} λ_j^r P_j.

For integer values of r, it is clear that Q^r coincides with the usual meaning of this expression given by the multiplication of operators.
The case that r = 1/2 is particularly common, and in this case we also write √Q to denote Q^{1/2}. The operator √Q is the unique positive semidefinite operator that satisfies

    √Q √Q = Q.
3. Along similar lines to the previous example, for any real number r ∈ R the function λ 7→ λr is
defined for positive real values λ ∈ (0, ∞). For a given positive definite operator Q ∈ Pd (X ),
one defines Qr in a similar way to above.
4. The function λ 7→ log(λ) is defined for every positive real number λ ∈ (0, ∞). For a given
positive definite operator Q ∈ Pd (X ), having a spectral decomposition (1.5) as above, one
defines
    log(Q) = ∑_{j=1}^{k} log(λ_j) P_j.
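The pattern behind all of these examples is the same: apply the scalar function to the eigenvalues and keep the spectral projections. A minimal numerical sketch of ours, using the square root of a positive semidefinite Q:

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((3, 3))
Q = B.T @ B                              # positive semidefinite

# Apply sqrt to the eigenvalues, keeping the eigenvectors fixed.
lam, X = np.linalg.eigh(Q)
sqrt_Q = (X * np.sqrt(np.clip(lam, 0, None))) @ X.conj().T

assert np.allclose(sqrt_Q @ sqrt_Q, Q)                # sqrt(Q) sqrt(Q) = Q
assert np.all(np.linalg.eigvalsh(sqrt_Q) >= -1e-10)   # and sqrt(Q) >= 0
```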
Theorem 1.4 (Singular Value Theorem). Let X and Y be complex Euclidean spaces, let A ∈ L (X , Y ) be
a nonzero operator, and let r = rank( A). Then there exist positive real numbers s1 , . . . , sr and orthonormal
sets { x1 , . . . , xr } ⊂ X and {y1 , . . . , yr } ⊂ Y such that
    A = ∑_{i=1}^{r} s_i y_i x_i^*.    (1.6)
An expression of a given matrix A in the form of Equation (1.6) is said to be a singular value
decomposition of A. The numbers s1 , . . . , sr are called singular values and the vectors x1 , . . . , xr and
y1 , . . . , yr are called right and left singular vectors, respectively.
The singular values s1 , . . . , sr of an operator A are uniquely determined, up to their ordering.
Hereafter we will assume, without loss of generality, that the singular values are ordered from
largest to smallest: s1 ≥ · · · ≥ sr . When it is necessary to indicate the dependence of these
singular values on A, we denote them s_1(A), . . . , s_r(A). Although technically speaking 0 is never a singular value of any operator, it will be convenient to also define s_k(A) = 0 for k > rank(A). The notation s(A) is used to refer to the vector of singular values s(A) = (s_1(A), . . . , s_r(A)), or to an extension s(A) = (s_1(A), . . . , s_k(A)) of this vector when it is convenient to view it as an element of R^k for some k > rank(A).
There is a close relationship between singular value decompositions of an operator A and
spectral decompositions of the operators A∗ A and AA∗ . In particular, it will necessarily hold that
    s_k(A) = √(λ_k(AA^*)) = √(λ_k(A^* A))    (1.7)
for 1 ≤ k ≤ rank( A), and moreover the right singular vectors of A will be eigenvectors of A∗ A
and the left singular vectors of A will be eigenvectors of AA∗ . One is free, in fact, to choose
the left singular vectors of A to be any orthonormal collection of eigenvectors of AA∗ for which
the corresponding eigenvalues are nonzero—and once this is done the right singular vectors will
be uniquely determined. Alternately, the right singular vectors of A may be chosen to be any or-
thonormal collection of eigenvectors of A∗ A for which the corresponding eigenvalues are nonzero,
which uniquely determines the left singular vectors.
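Relation (1.7) is easy to confirm numerically; the following sketch (our own, with an arbitrary random operator) compares the singular values of A with the eigenvalues of A^* A sorted in decreasing order.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))

s = np.linalg.svd(A, compute_uv=False)           # decreasing order
lam = np.linalg.eigvalsh(A.conj().T @ A)[::-1]   # decreasing order

# s_k(A) = sqrt(lambda_k(A*A)) for 1 <= k <= rank(A).
assert np.allclose(s, np.sqrt(np.clip(lam[: len(s)], 0, None)))
```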
Lecture 2
Mathematical Preliminaries – Part II
This lecture continues the overview of basic mathematical concepts and tools from linear algebra
that are required throughout the course.
The notion of a tensor product is fundamental in quantum information; for it is the mathematical
construct that is used when two or more quantum states, operations, or measurements are to
be considered as one. Tensor products are discussed in this section, beginning with a concrete
formulation based on Kronecker products of vectors, followed by a more abstract formulation that
carries valuable intuition about the meaning and uses of tensor products. Tensor products of
operators and super-operators may then be defined directly in terms of their actions on vectors
and operators, respectively.
Assume that X = C^Σ and Y = C^Γ are complex Euclidean spaces. The tensor product space X ⊗ Y is then defined as

    X ⊗ Y = C^{Σ×Γ},

and for vectors u ∈ X and v ∈ Y , the vector u ⊗ v ∈ X ⊗ Y is defined as

    (u ⊗ v)(a, b) = u(a) v(b)

for all a ∈ Σ and b ∈ Γ. It holds that the space X ⊗ Y contains vectors that cannot be written in the form u ⊗ v for u ∈ X and v ∈ Y , as linear combinations of such vectors will typically not take this form.
The following rules for addition and scalar multiplication of Kronecker products hold:
u1 ⊗ v + u2 ⊗ v = (u1 + u2 ) ⊗ v,
u ⊗ v1 + u ⊗ v2 = u ⊗ ( v1 + v2 ) ,
and
α u ⊗ v = (αu) ⊗ v = u ⊗ (αv)
for all u, u1 , u2 ∈ X , v, v1 , v2 ∈ Y , and α ∈ C.
Kronecker products of three or more vectors, or of three or more complex Euclidean spaces, are
formed by using the ordinary (binary) Kronecker product operation multiple times. In particular,
we may define u1 ⊗ u2 ⊗ · · · ⊗ uk inductively as
u1 ⊗ u2 ⊗ · · · ⊗ u k = u1 ⊗ ( u2 ⊗ · · · ⊗ u k ) (2.1)
It may be verified that

    (u ⊗ v) ⊗ w = u ⊗ (v ⊗ w)

for any choice of vectors u, v and w, or in other words that the Kronecker product is associative.
For this reason, the ordering in which the Kronecker products are taken in (2.1) is irrelevant.
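Both associativity and the bilinearity rules above can be checked with `numpy.kron`; a small illustration of ours:

```python
import numpy as np

rng = np.random.default_rng(6)
u, v, w = (rng.standard_normal(d) + 1j * rng.standard_normal(d) for d in (2, 3, 4))

# Associativity: (u ⊗ v) ⊗ w = u ⊗ (v ⊗ w).
assert np.allclose(np.kron(np.kron(u, v), w), np.kron(u, np.kron(v, w)))

# Scalars move freely across the tensor product.
assert np.allclose(np.kron(2.0 * u, v), np.kron(u, 2.0 * v))
```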
To say that a bilinear function φ : X × Y → X ⊗ Y is universal means two things. First, for every
choice of a vector space V and a bilinear function ψ of the above form (2.2), there must exist a
linear mapping A : X ⊗ Y → V such that ψ = A ◦ φ. Second, the space X ⊗ Y must be minimal
in the sense that
X ⊗ Y = span{φ(u, v) : u ∈ X , v ∈ Y }.
This second property implies that for a given choice of ψ, the mapping A for which ψ = A ◦ φ is
necessarily unique. Once the universal bilinear function φ has been fixed, one simply writes u ⊗ v
to denote φ(u, v).
The defining properties of tensor products can be understood as meaning that they give a
perfect translation from bilinear to linear mappings. Everything there is to know about a given
bilinear mapping ψ : X × Y → V is captured by a uniquely determined linear map A : X ⊗ Y →
V . In the other direction, every linear map A : X ⊗ Y → V uniquely determines a bilinear function
ψ : X × Y → V defined by ψ(u, v) = A(u ⊗ v).
Now, the definition of tensor products suggests that tensor products are not really uniquely
defined, as there could be any number of specific choices for the space X ⊗ Y and universal bilin-
ear mapping φ. Although this is true, one typically speaks of the tensor product of X and Y with
the understanding that all tensor products are essentially equivalent. Specifically, going from one
to another amounts to composing one universal bilinear mapping with a linear bijection to the im-
age of the other; which is a change that never has any effect in arguments or calculations involving
tensor products.
The existence of a tensor product can be proved for any choice of vector spaces X and Y ,
by means of an extremely general construction. We do not need to concern ourselves with that
construction, however, as our interest is primarily with complex Euclidean spaces—and here, as
our notation would suggest, the Kronecker product operates as a universal bilinear mapping.
One may define tensor products of three or more spaces inductively, as is done for Kronecker
products, or the entire definition can be generalized by speaking of multilinear rather than bilinear
functions.
The Kronecker product of matrices M ∈ M_{Γ_1,Γ_2}(C) and N ∈ M_{Σ_1,Σ_2}(C) is the matrix M ⊗ N ∈ M_{Γ_1×Σ_1, Γ_2×Σ_2}(C) defined as

    (M ⊗ N)((a_1, b_1), (a_2, b_2)) = M(a_1, a_2) N(b_1, b_2)

for all a_1 ∈ Γ_1, a_2 ∈ Γ_2, b_1 ∈ Σ_1, and b_2 ∈ Σ_2.
Naturally, for complex Euclidean spaces X1 = C Σ1 , X2 = C Σ2 , Y1 = C Γ1 , and Y2 = C Γ2 , the
Kronecker product of operators A ∈ L (X1 , X2 ) and B ∈ L (Y1 , Y2 ) may be defined in the way that
Lecture 2. Mathematical Preliminaries – Part II 17
is consistent with their representations as matrices. One may check that the following identities
hold for Kronecker products of operators:
A ⊗ B + C ⊗ B = ( A + C ) ⊗ B,
A ⊗ B + A ⊗ D = A ⊗ ( B + D ),
and
α A ⊗ B = (αA) ⊗ B = A ⊗ (αB)
for any choice of α ∈ C and operators A, C ∈ L (X1 , X2 ) and B, D ∈ L (Y1 , Y2 ).
2. For all choices of complex Euclidean spaces X_1, X_2, X_3, Y_1, Y_2, and Y_3, and all operators A ∈ L (X_2, X_3), B ∈ L (Y_2, Y_3), C ∈ L (X_1, X_2), and D ∈ L (Y_1, Y_2), it holds that

    (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).    (2.3)
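The identity (2.3) is often called the mixed-product property; a quick numerical check of ours, with arbitrarily shaped matrices:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((2, 3))
C = rng.standard_normal((3, 2))
B = rng.standard_normal((4, 2))
D = rng.standard_normal((2, 4))

# (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))
```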
To define tensor products of operators in abstract terms, we first note that the general notion
of tensor products applies equally well to vector spaces of operators as for any other vector space;
so tensor products of operators may be defined by using the same definition as above, except that
the spaces X and Y are replaced, for instance, with L (X1 , X2 ) and L (Y1 , Y2 ) for some choice of
complex Euclidean spaces X1 , X2 , Y1 , and Y2 .
Without further qualifications, this leaves one with the impression that a tensor product space
of the form
L (X1 , X2 ) ⊗ L (Y1 , Y2 ) (2.4)
is merely a vector space and not a space of linear operators. To recover the operator structures of
these spaces, it suffices to make a special choice for the universal bilinear mapping defining the
tensor product. In particular, supposing that A ∈ L (X1 , X2 ) and B ∈ L (Y1 , Y2 ) are operators, we
define the linear operator
A ⊗ B ∈ L (X1 ⊗ Y1 , X2 ⊗ Y2 )
to be the unique linear mapping that satisfies

    (A ⊗ B)(u ⊗ v) = (Au) ⊗ (Bv)

for all vectors u ∈ X_1 and v ∈ Y_1. Such an operator must exist and be unique, given that the
mapping (u, v) 7→ ( Au) ⊗ ( Bv) is bilinear. As is to be expected, the mapping
( A, B) 7→ A ⊗ B
is a universal bilinear mapping, which allows for the identification of the tensor product space
(2.4) with the space L (X1 ⊗ Y1 , X2 ⊗ Y2 ). This, naturally, is consistent with the concrete definition
based on Kronecker products.
Finally, let us note that tensor products of super-operators may be defined in a similar way as
tensor products of operators. We have, at this point, not discussed the representation of super-
operators by matrices, although this will be done in a later lecture in a way that allows for a
concrete definition based on Kronecker products. In the abstract formulation of tensor products,
however, the situation is completely analogous to the tensor product of operators. That is, the
tensor product space T (X1 , X2 ) ⊗ T (Y1 , Y2 ) is identified with the space T (X1 ⊗ Y1 , X2 ⊗ Y2 ) by
means of the universal bilinear mapping (Φ, Ψ) 7→ Φ ⊗ Ψ, where Φ ⊗ Ψ is the unique mapping
that satisfies
(Φ ⊗ Ψ)( A ⊗ B) = Φ( A) ⊗ Ψ( B)
for all A ∈ L (X1 ) and B ∈ L (Y1 ), assuming that Φ ∈ T (X1 , X2 ) and Ψ ∈ T (Y1 , Y2 ) are given
super-operators. Analogous properties to those listed above for tensor products of operators
emerge.
If it is the case that { xa : a ∈ Σ} is a basis of X , then there is exactly one choice of vectors
{ya : a ∈ Σ} ⊂ Y such that (2.5) holds. The same fact holds when vectors are replaced by
operators or super-operators.
2. Let X and Y be complex Euclidean spaces, and let { xa : a ∈ Σ} ⊂ X and {yb : b ∈ Γ} ⊂ Y
be linearly independent sets. Then
{ xa ⊗ yb : ( a, b) ∈ Σ × Γ} (2.6)
is a linearly independent set of vectors in X ⊗ Y . If it is the case that { xa : a ∈ Σ} ⊂ X and
{yb : b ∈ Γ} ⊂ Y are bases of X and Y , then (2.6) is a basis of X ⊗ Y . Again, similar facts hold
when vectors are replaced by operators or super-operators.
3. The inner product on X ⊗ Y satisfies

    ⟨x_1 ⊗ y_1, x_2 ⊗ y_2⟩ = ⟨x_1, x_2⟩ ⟨y_1, y_2⟩

for all x_1, x_2 ∈ X and y_1, y_2 ∈ Y . Similarly, the inner product on L (X_1 ⊗ X_2, Y_1 ⊗ Y_2) satisfies

    ⟨A_1 ⊗ A_2, B_1 ⊗ B_2⟩ = ⟨A_1, B_1⟩ ⟨A_2, B_2⟩

for all A_1, B_1 ∈ L (X_1, Y_1) and A_2, B_2 ∈ L (X_2, Y_2).
4. Let A ∈ L (X ) and B ∈ L (Y ) for complex Euclidean spaces X and Y , and let n = dim(X ) and m = dim(Y ). Assume that spec(A) = {λ_1, . . . , λ_n} and spec(B) = {μ_1, . . . , μ_m}. Then

    spec(A ⊗ B) = {λ_i μ_j : 1 ≤ i ≤ n, 1 ≤ j ≤ m}

(as a multiset). This implies, for instance, that

    Tr(A ⊗ B) = Tr(A) Tr(B)

and

    det(A ⊗ B) = det(A)^m det(B)^n.
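These spectral facts can be seen concretely on small triangular matrices, whose eigenvalues are just the diagonal entries; the example below is ours.

```python
import numpy as np

A = np.array([[1., 2.],
              [0., 3.]])                 # eigenvalues 1 and 3
B = np.array([[2., 1.],
              [0., 5.]])                 # eigenvalues 2 and 5

# spec(A ⊗ B) is the multiset of products {2, 5, 6, 15}.
products = np.sort(np.outer(np.linalg.eigvals(A), np.linalg.eigvals(B)).ravel().real)
kron_eigs = np.sort(np.linalg.eigvals(np.kron(A, B)).real)
assert np.allclose(products, kron_eigs)

# Consequences for trace and determinant (here n = m = 2).
assert np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B))
assert np.isclose(np.linalg.det(np.kron(A, B)),
                  np.linalg.det(A) ** 2 * np.linalg.det(B) ** 2)
```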
5. For any choice of A ∈ L (X_1, Y_1) and B ∈ L (X_2, Y_2), for complex Euclidean spaces X_1, X_2, Y_1, and Y_2, we have

    (A ⊗ B)^* = A^* ⊗ B^*,    (A ⊗ B)^T = A^T ⊗ B^T,    and    \overline{A ⊗ B} = \overline{A} ⊗ \overline{B}.

The first identity implies, for instance, that if A and B are both Hermitian, then so too is A ⊗ B. (In fact, all of the special classes of operators discussed in Lecture 1 are preserved by taking tensor products.)
Example 2.1 (The partial trace). Let X be a complex Euclidean space. We may of course view the
trace
Tr : L (X ) → C
as being a super-operator Tr ∈ T (X , C ) by making the identification C = L (C ). For a second
complex Euclidean space Y , we may therefore consider the super-operator
    Tr ⊗ 1_{L(Y)} ∈ T (X ⊗ Y , Y ).

This super-operator satisfies

    (Tr ⊗ 1_{L(Y)})(A ⊗ B) = Tr(A) B

for all A ∈ L (X ) and B ∈ L (Y ). This super-operator is called the partial trace, and is more commonly denoted Tr_X. In general, the subscript refers to whatever space the trace is applied to, while the space or spaces that remain (Y in the case above) are implicit from the context in which the super-operator is used.
We may alternately express the partial trace on X as follows, assuming that {x_a : a ∈ Σ} is any orthonormal basis of X :

    Tr_X(A) = ∑_{a∈Σ} (x_a^* ⊗ 1_Y) A (x_a ⊗ 1_Y)

for every A ∈ L (X ⊗ Y ).
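A minimal numerical sketch of the partial trace, assuming the standard basis of X and the row-major index ordering that matches the Kronecker product; the function name `partial_trace_X` is ours.

```python
import numpy as np

def partial_trace_X(A: np.ndarray, dim_X: int, dim_Y: int) -> np.ndarray:
    """Trace out the first tensor factor of an operator on X ⊗ Y."""
    A = A.reshape(dim_X, dim_Y, dim_X, dim_Y)
    return np.trace(A, axis1=0, axis2=2)     # contract the two X indices

rng = np.random.default_rng(9)
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

# On product operators, Tr_X(A ⊗ B) = Tr(A) B.
assert np.allclose(partial_trace_X(np.kron(A, B), 2, 3), np.trace(A) * B)
```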
It will be enormously useful throughout this course to make use of a simple correspondence be-
tween the spaces L (X , Y ) and Y ⊗ X , for given complex Euclidean spaces X and Y .
We define the mapping
vec : L (X , Y ) → Y ⊗ X
to be the linear mapping that represents a change of bases from the standard basis of L (X , Y ) to
the standard basis of Y ⊗ X . Specifically, we define

    vec(E_{b,a}) = e_b ⊗ e_a
for all a ∈ Σ and b ∈ Γ, at which point the mapping is determined for every operator A ∈ L (X , Y ) by linearity. In the Dirac notation, this mapping amounts to flipping a bra to a ket:

    vec(|b⟩⟨a|) = |b⟩|a⟩.
The vec mapping is a linear bijection, which implies that every vector u ∈ Y ⊗ X uniquely
determines an operator A ∈ L (X , Y ) that satisfies vec( A) = u. It is also an isometry, in the sense
that
    ⟨A, B⟩ = ⟨vec(A), vec(B)⟩
for all A, B ∈ L (X , Y ).
The following properties of the vec mapping are easily verified:
1. For every choice of complex Euclidean spaces X_1, X_2, Y_1, and Y_2, and every choice of operators A ∈ L (X_1, Y_1), B ∈ L (X_2, Y_2), and X ∈ L (X_2, X_1), it holds that

    (A ⊗ B) vec(X) = vec(A X B^T).
2. For every choice of complex Euclidean spaces X and Y , and every choice of operators A, B ∈ L (X , Y ), the following equations hold:

    Tr_X(vec(A) vec(B)^*) = A B^*,
    Tr_Y(vec(A) vec(B)^*) = A^T \overline{B}.
Example 2.2 (The Schmidt decomposition). Suppose u ∈ Y ⊗ X for given complex Euclidean spaces X and Y . Let A ∈ L (X , Y ) be the unique operator for which u = vec(A). Then there exists a singular value decomposition

    A = ∑_{i=1}^{r} s_i y_i x_i^*

of A. Consequently

    u = vec(A) = vec(∑_{i=1}^{r} s_i y_i x_i^*) = ∑_{i=1}^{r} s_i vec(y_i x_i^*) = ∑_{i=1}^{r} s_i y_i ⊗ \overline{x_i}.

Every vector u ∈ Y ⊗ X may therefore be expressed as

    u = ∑_{i=1}^{r} s_i y_i ⊗ z_i

for positive real numbers s_1, . . . , s_r and orthonormal sets {y_1, . . . , y_r} ⊂ Y and {z_1, . . . , z_r} ⊂ X , taking z_i = \overline{x_i} (entry-wise conjugation preserves orthonormality).
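Computationally, this Schmidt decomposition is exactly an SVD of the reshaped vector; the sketch below (an illustration of ours, relying on numpy's row-major flattening matching vec) rebuilds u from its Schmidt form.

```python
import numpy as np

dim_Y, dim_X = 3, 4
rng = np.random.default_rng(10)
u = rng.standard_normal(dim_Y * dim_X) + 1j * rng.standard_normal(dim_Y * dim_X)

A = u.reshape(dim_Y, dim_X)              # the operator with vec(A) = u
Y_mat, s, Xh = np.linalg.svd(A, full_matrices=False)

# u = sum_i s_i (y_i ⊗ z_i), where z_i = conj(x_i) is the i-th row of Xh.
rebuilt = sum(s[i] * np.kron(Y_mat[:, i], Xh[i]) for i in range(len(s)))
assert np.allclose(rebuilt, u)
```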
The last topic for this lecture concerns norms of operators. As for all vector spaces, a norm on the
space of operators L (X , Y ), for any choice of complex Euclidean spaces X and Y , is a function
‖·‖ satisfying the following properties:
1. Positive definiteness: ‖A‖ ≥ 0 for all A ∈ L (X , Y ), with ‖A‖ = 0 if and only if A = 0.
2. Positive scalability: ‖αA‖ = |α| ‖A‖ for all A ∈ L (X , Y ) and α ∈ C.
3. The triangle inequality: ‖A + B‖ ≤ ‖A‖ + ‖B‖ for all A, B ∈ L (X , Y ).
Many interesting and useful norms can be defined on spaces of operators, but in this course
we will mostly be concerned with a single family of norms called Schatten p-norms. This family
includes the three most commonly used norms in quantum information theory: the spectral norm,
the Frobenius norm, and the trace norm.
For any operator A ∈ L (X , Y ) and any real number p ≥ 1, one defines the Schatten p-norm of A as

    ‖A‖_p = (Tr[(A^* A)^{p/2}])^{1/p}.
We also define

    ‖A‖_∞ = max{‖Au‖ : u ∈ X , ‖u‖ = 1},    (2.11)

which happens to coincide with lim_{p→∞} ‖A‖_p, and therefore explains why the subscript ∞ is used.
For a given non-zero operator A ∈ L(X, Y) with rank r ≥ 1, the vector s(A) of singular values of A was defined in the previous lecture. For each real number p ∈ [1, ∞], it holds that the Schatten p-norm of A coincides with the ordinary (vector) p-norm of s(A):
‖A‖_p = ‖s(A)‖_p.
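This identity is easy to verify directly. The helper below is an illustrative sketch (assuming NumPy; the function name schatten is ours): it evaluates ‖A‖_p from the definition and compares it against the vector p-norm of s(A):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 5)) + 1j * rng.normal(size=(3, 5))
s = np.linalg.svd(A, compute_uv=False)            # singular values s(A)

def schatten(A, p):
    # ||A||_p = (Tr[(A*A)^(p/2)])^(1/p); the eigenvalues of A*A are s(A)^2.
    w = np.clip(np.linalg.eigvalsh(A.conj().T @ A), 0, None)
    return np.sum(w ** (p / 2)) ** (1 / p)

# The Schatten p-norm coincides with the vector p-norm of s(A).
for p in (1, 2, 3):
    assert np.isclose(schatten(A, p), np.linalg.norm(s, ord=p))

# p = infinity: the spectral norm is the largest singular value.
assert np.isclose(np.linalg.norm(A, ord=2), s.max())

# Monotonicity: ||A||_p is non-increasing in p.
assert schatten(A, 1) >= schatten(A, 2) >= s.max() - 1e-12
```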
The Schatten p-norms satisfy many nice properties, some of which are summarized in the
following list:
1. The Schatten p-norms are non-increasing in p. In other words, for any operator A ∈ L(X, Y) and for 1 ≤ p ≤ q ≤ ∞ we have
‖A‖_p ≥ ‖A‖_q.
2. For every p ∈ [1, ∞], the Schatten p-norm is isometrically invariant (and therefore unitarily invariant). This means that
‖A‖_p = ‖UAV*‖_p
for any choice of linear isometries U and V (which include unitary operators U and V) for which the product UAV* makes sense.
3. For each p ∈ [1, ∞], one defines p* ∈ [1, ∞] by the equation
1/p + 1/p* = 1.
For every operator A ∈ L(X, Y), it holds that
‖A‖_p = max{ |⟨B, A⟩| : B ∈ L(X, Y), ‖B‖_{p*} ≤ 1 }.
This implies that
|⟨B, A⟩| ≤ ‖A‖_p ‖B‖_{p*},
which is known as the Hölder Inequality for Schatten norms.
4. For any choice of linear operators A ∈ L(X3, X4), B ∈ L(X2, X3), and C ∈ L(X1, X2), and any choice of p ∈ [1, ∞], we have
‖ABC‖_p ≤ ‖A‖_∞ ‖B‖_p ‖C‖_∞.
It follows that
‖AB‖_p ≤ ‖A‖_p ‖B‖_p  (2.12)
for all choices of p ∈ [1, ∞] and operators A and B for which the product AB exists. The property (2.12) is known as submultiplicativity.
5. It holds that
‖A‖_p = ‖A*‖_p = ‖Aᵀ‖_p = ‖Ā‖_p
for every A ∈ L(X, Y).
In this course we will generally write ‖·‖ rather than ‖·‖_∞ for the spectral norm, and we will not use the alternate notations ‖·‖_tr and ‖·‖_F that are sometimes seen for the trace norm ‖·‖_1 and the Frobenius norm ‖·‖_2.
To conclude this lecture, we will note just a few special properties of these three particular
norms:
1. The spectral norm. The spectral norm ‖·‖ = ‖·‖_∞, also called the operator norm, is special in several respects. It is the norm induced by the Euclidean norm, which is its defining property (2.11). It satisfies the property
‖A*A‖ = ‖A‖²
for every A ∈ L(X, Y).
2. The Frobenius norm. Substituting p = 2 into the definition of ‖·‖_p we see that the Frobenius norm ‖·‖_2 is given by
‖A‖_2 = [Tr(A*A)]^{1/2} = √⟨A, A⟩.
It is therefore the norm defined by the Hilbert-Schmidt inner product on L(X, Y). In essence, it is the norm that one obtains by thinking of elements of L(X, Y) as ordinary vectors and forgetting that they are operators:
‖A‖_2 = ‖vec(A)‖ = ( ∑_{a,b} |A(a, b)|² )^{1/2}.
3. The trace norm. Substituting p = 1 into the definition of ‖·‖_p we see that the trace norm ‖·‖_1 is given by
‖A‖_1 = Tr √(A*A).
For every square operator A ∈ L(X) it admits the alternative characterization
‖A‖_1 = max{ |⟨A, U⟩| : U ∈ U(X) }.
This characterization implies that the trace norm is non-increasing under the partial trace: for every A ∈ L(X ⊗ Y),
‖Tr_Y(A)‖_1 ≤ ‖A‖_1,
because
‖Tr_Y(A)‖_1 = max{ |⟨A, U ⊗ 1_Y⟩| : U ∈ U(X) } while ‖A‖_1 = max{ |⟨A, U⟩| : U ∈ U(X ⊗ Y) };
and the inequality follows from the fact that the first maximum is taken over a subset of the unitary operators for the second.
Finally, let us note that for any complex Euclidean space X and any choice of unit vectors u, v ∈ X, we have
‖uu* − vv*‖_1 = 2√(1 − |⟨u, v⟩|²).  (2.13)
To see this, we note that the operator A = uu* − vv* is Hermitian and therefore normal, so its singular values are the absolute values of its nonzero eigenvalues. It will therefore suffice to prove that the nonzero eigenvalues of A are ±√(1 − |⟨u, v⟩|²), along with the eigenvalue 0 occurring with multiplicity n − 2, where n = dim(X). Given that Tr(A) = 0 and rank(A) ≤ 2, it is evident that the eigenvalues of A are of the form ±λ for some λ ≥ 0, along with the eigenvalue 0 with multiplicity n − 2. As
2λ² = Tr(A²) = 2 − 2|⟨u, v⟩|²,
we conclude λ = √(1 − |⟨u, v⟩|²), from which Equation (2.13) follows.
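Equation (2.13) is straightforward to confirm numerically for random unit vectors (an illustrative NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
u = rng.normal(size=n) + 1j * rng.normal(size=n); u /= np.linalg.norm(u)
v = rng.normal(size=n) + 1j * rng.normal(size=n); v /= np.linalg.norm(v)

# A = uu* - vv* is Hermitian, so its trace norm is the sum of the
# absolute values of its eigenvalues.
A = np.outer(u, u.conj()) - np.outer(v, v.conj())
trace_norm = np.abs(np.linalg.eigvalsh(A)).sum()

overlap = abs(np.vdot(u, v))                      # |<u, v>|
assert np.isclose(trace_norm, 2 * np.sqrt(1 - overlap**2))
```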
Theory of Quantum Information Lecture Notes
John Watrous (Institute for Quantum Computing, University of Waterloo)
Lecture 3
Registers, reduced states, and measurements
In this lecture we will begin to discuss the basic objects of quantum information, starting with
registers, states, and measurements.
The set of density operators on a complex Euclidean space X, meaning the positive semidefinite operators having unit trace, is denoted
D(X) = {ρ ∈ Pos(X) : Tr(ρ) = 1}.
A reduced state of either a register or a register array is a density operator acting on the complex Euclidean space associated with the register or register array. When we refer to a state of a given register, it should be interpreted that we are referring to the register's reduced state.
A register array (X1, ..., Xn) is said to be in a product reduced state, or just a product state, if its reduced state has the form ρ1 ⊗ ··· ⊗ ρn for ρ1 ∈ D(X1), ..., ρn ∈ D(Xn).
Lecture 3. Registers, reduced states, and measurements 25
3.1.3 Measurements
A measurement on a complex Euclidean space X is a function of the form
µ : Γ → Pos(X) : a ↦ P_a,  (3.1)
where Γ is a finite, nonempty set of measurement outcomes. This function must satisfy the constraint
∑_{a∈Γ} P_a = 1_X.
Often we express such a measurement in the form {P_a : a ∈ Γ}, with the understanding that it is a collection indexed by Γ and not an unordered set of operators. Each operator P_a is the measurement operator associated with the outcome a ∈ Γ.
When a measurement of the form (3.1) is applied to a register, or a register array, whose associ-
ated complex Euclidean space is X and whose reduced state is ρ ∈ D (X ), two things happen:
1. An element of Γ is selected at random, according to a distribution described by the probability vector p ∈ R^Γ defined by
p(a) = ⟨P_a, ρ⟩
for each a ∈ Γ.
2. The register, or register array, that was measured ceases to exist.
The assumption that the register, or register array, that is measured ceases to exist after the mea-
surement takes place creates no loss of generality. The notion of a nondestructive measurement,
which supposes that registers are not destroyed by being measured, and specifies their reduced
state after the measurements on them take place, can be formulated by combining measurements
as just defined with quantum operations (as we will discuss later).
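The rule p(a) = ⟨P_a, ρ⟩ can be made concrete numerically. The sketch below (assuming NumPy; random_density is an ad hoc helper) builds a projective measurement from the eigenvectors of a random Hermitian operator and checks that the resulting p is a probability vector:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3

def random_density(n, rng):
    G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

rho = random_density(n, rng)

# A projective measurement {P_a}: rank-one projections onto the
# eigenvectors of a random (real, symmetric) Hermitian operator.
H = rng.normal(size=(n, n)); H = H + H.T
_, V = np.linalg.eigh(H)
P = [np.outer(V[:, a], V[:, a]) for a in range(n)]
assert np.allclose(sum(P), np.eye(n))               # completeness

# Outcome distribution p(a) = <P_a, rho> = Tr(P_a* rho).
p = np.array([np.trace(Pa.conj().T @ rho).real for Pa in P])
assert np.all(p >= -1e-12) and np.isclose(p.sum(), 1.0)
```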
A measurement of the form
µ : Γ1 × ··· × Γn → Pos(X1 ⊗ ··· ⊗ Xn)
is a product measurement if there exist measurements
µ1 : Γ1 → Pos(X1), ..., µn : Γn → Pos(Xn)
such that µ(a1, ..., an) = µ1(a1) ⊗ ··· ⊗ µn(an) for every (a1, ..., an) ∈ Γ1 × ··· × Γn.
Each operator Q_{a,b} is positive semidefinite, and the set {Q_{a,b} : (a, b) ∈ Σ × Σ} spans the space L(X). With the exception of the trivial case |Σ| = 1, the operator
Q = ∑_{(a,b)∈Σ×Σ} Q_{a,b}
differs from the identity operator, which means that {Q_{a,b} : (a, b) ∈ Σ × Σ} is not generally a measurement. The operator Q is, however, positive definite, and by defining
P_{a,b} = Q^{−1/2} Q_{a,b} Q^{−1/2}
we have that {P_{a,b} : (a, b) ∈ Σ × Σ} is an information-complete measurement.
As this holds for all measurements on X1 , we have that the reduced state of X1 must be given
by TrX2 (ρ).
One may extend this second line of reasoning in the following way. Suppose a1 ∈ Γ1 satisfies p1(a1) ≠ 0, and consider the distribution of outcomes on Γ2 that is obtained by conditioning on the outcome of the measurement µ1 being a1. This distribution is given by
p2^{(a1)}(a2) = p(a1, a2) / p1(a1) = ⟨P_{a1} ⊗ P_{a2}, ρ⟩ / ⟨P_{a1} ⊗ 1_{X2}, ρ⟩ = ⟨P_{a2}, ρ2^{(a1)}⟩
for
ρ2^{(a1)} = Tr_{X1}[(P_{a1} ⊗ 1_{X2})ρ] / ⟨P_{a1} ⊗ 1_{X2}, ρ⟩,  (3.2)
and where we have denoted P_{a1} = µ1(a1) and P_{a2} = µ2(a2) for every a1 ∈ Γ1 and a2 ∈ Γ2. If the measurement µ1 is performed on X1, and the outcome a1 ∈ Γ1 is obtained, we therefore have that the reduced state of X2 conditioned on this outcome must be given by (3.2).
For probability vectors p, q ∈ R^Γ, the distance with respect to the 1-norm is
‖p − q‖_1 = ∑_{a∈Γ} |p(a) − q(a)|.
This is a natural measure of distance because it quantifies the optimal probability that two known probability vectors can be distinguished, given a single sample from the distributions they specify.
As an example, let us consider a thought experiment involving two hypothetical people: Alice
and Bob. Two probability vectors p0 , p1 ∈ R Γ are fixed, and are considered to be known by both
Alice and Bob. Alice privately chooses a bit b ∈ {0, 1} uniformly at random, and uses the value b to randomly choose an element a ∈ Γ: if b = 0, she samples a according to p0, and if b = 1, she samples a according to p1. The sampled element a ∈ Γ is given to Bob, whose goal is to
identify the value of Alice’s random bit b.
It is clear from Bayes’ Theorem what Bob should do to maximize his probability to correctly
guess the value of b: if p0 ( a) > p1 ( a), he should guess that b = 0, while if p0 ( a) < p1 ( a) he should
guess that b = 1. In case p0 ( a) = p1 ( a), Bob may as well guess that b = 0 or b = 1 arbitrarily, for
he has learned nothing at all about the value of b from such an element a ∈ Γ. Bob’s probability to
correctly identify the value of b using this strategy is
1/2 + (1/4)‖p0 − p1‖_1,
which can be verified by first calculating the probability Bob is correct minus the probability he is incorrect, which is (1/2)‖p0 − p1‖_1. Bob can do no better than this.
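A small worked example of this guessing strategy (illustrative NumPy sketch; the vectors p0 and p1 are arbitrary choices):

```python
import numpy as np

# Two fixed, known distributions over a three-element set.
p0 = np.array([0.5, 0.3, 0.2])
p1 = np.array([0.1, 0.2, 0.7])

# Bob guesses b = 0 on outcomes where p0(a) > p1(a) and b = 1 otherwise,
# so his success probability is (1/2) * sum_a max(p0(a), p1(a)).
success = 0.5 * np.sum(np.maximum(p0, p1))

# This equals 1/2 + (1/4) * ||p0 - p1||_1 (here, 0.75).
assert np.isclose(success, 0.5 + 0.25 * np.abs(p0 - p1).sum())
```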
Proposition 3.3. Let X be a complex Euclidean space, let A ∈ Herm(X) be a Hermitian operator, and let {P_a : a ∈ Γ} be a measurement on X. Define v ∈ R^Γ as v(a) = ⟨P_a, A⟩ for each a ∈ Γ. Then
‖v‖_1 ≤ ‖A‖_1.
Proof. We have
‖v‖_1 = ∑_{a∈Γ} |⟨P_a, A⟩| = ⟨ ∑_{a∈Γ} α_a P_a, A ⟩  (3.3)
for some choice of scalars α_a ∈ {−1, +1}. For every unit vector u ∈ X it holds that
u*( ∑_{a∈Γ} α_a P_a )u = ∑_{a∈Γ} α_a ⟨P_a, uu*⟩
is a convex combination of the values {−1, +1}, and therefore lies in the interval [−1, 1]. As ∑_{a∈Γ} α_a P_a is Hermitian, it follows that ‖ ∑_{a∈Γ} α_a P_a ‖ ≤ 1, and therefore
⟨ ∑_{a∈Γ} α_a P_a, A ⟩ ≤ ‖ ∑_{a∈Γ} α_a P_a ‖ ‖A‖_1 ≤ ‖A‖_1.
2. Define
P0 = ∑_{j : λ_j ≥ 0} u_j u_j*  and  P1 = ∑_{j : λ_j < 0} u_j u_j*.
(The choice to include the terms u_j u_j* for which λ_j = 0 in P0 is arbitrary; they could be included in P1 or split between P0 and P1 arbitrarily without changing the optimality of the resulting measurement.)
as required.
To illustrate the connection we have just established, let us consider a thought experiment
similar to the one above, except with density operators ρ0 , ρ1 ∈ D (X ) in place of the probability
vectors p0 and p1 . Alice chooses a random bit b uniformly; and if b = 0, she prepares a register X in
the reduced state ρ0 ∈ D (X ) and hands it to Bob, and if b = 1, she does likewise with ρ1 ∈ D (X ).
To make a guess for Alice's random bit b, Bob performs a measurement of X to extract classical information that helps him make a decision.
Regardless of which measurement Bob performs, he will obtain outcomes distributed according to some probability vectors p0 and p1 in the two cases, and these satisfy ‖p0 − p1‖_1 ≤ ‖ρ0 − ρ1‖_1 by Proposition 3.3. His probability to correctly identify Alice's random bit will therefore be at most
1/2 + (1/4)‖ρ0 − ρ1‖_1.  (3.4)
By choosing the particular measurement { P0 , P1 } defined above, and letting the outcome of the
measurement be his guess for b, we find that Bob’s probability of correctness achieves the value
(3.4), and is therefore optimal.
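The optimal measurement {P0, P1} defined above can be constructed directly from a spectral decomposition of ρ0 − ρ1. An illustrative NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(4)

def random_density(n, rng):
    G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

n = 4
rho0, rho1 = random_density(n, rng), random_density(n, rng)

# Spectral decomposition of rho0 - rho1; P0 and P1 project onto the
# nonnegative and negative eigenspaces respectively, as constructed above.
w, V = np.linalg.eigh(rho0 - rho1)
P0 = V[:, w >= 0] @ V[:, w >= 0].conj().T
P1 = V[:, w < 0] @ V[:, w < 0].conj().T
assert np.allclose(P0 + P1, np.eye(n))

success = 0.5 * np.trace(P0 @ rho0).real + 0.5 * np.trace(P1 @ rho1).real
trace_dist = np.abs(w).sum()                        # ||rho0 - rho1||_1
assert np.isclose(success, 0.5 + 0.25 * trace_dist)  # the bound (3.4), achieved
```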
The observations made above concerning the optimal probability with which two density opera-
tors can be distinguished can be generalized to arbitrary convex sets of density operators, as the
following theorem establishes.
Theorem 3.4. Let A0, A1 ⊆ D(X) be nonempty convex sets of density operators. Then there exists a measurement {P0, P1} ⊂ Pos(X) with the property that
⟨P0 − P1, ρ0 − ρ1⟩ ≥ dist(A0, A1)
for all choices of ρ0 ∈ A0 and ρ1 ∈ A1.
In the statement of the theorem, we have used the notation
dist(A0, A1) = inf{ ‖ρ0 − ρ1‖_1 : ρ0 ∈ A0, ρ1 ∈ A1 }.
What is important about the theorem is that the choice of a measurement { P0 , P1 } comes first, after
which the density operators ρ0 ∈ A0 and ρ1 ∈ A1 may be chosen arbitrarily.
To prove the theorem, we require a fundamental fact from convex analysis. It is one form of a
general type of fact that states that disjoint convex sets can be separated by a hyperplane.
Fact. Let A, B ⊆ Herm(X) be nonempty, disjoint, convex sets such that B is open. Then there exists a Hermitian operator H ∈ Herm(X) and a real number α ∈ R such that
⟨H, A⟩ ≥ α > ⟨H, B⟩
for all A ∈ A and B ∈ B.
Proof of Theorem 3.4. Let d = dist(A0, A1). If d = 0, the theorem is trivially satisfied by the measurement defined by P0 = P1 = (1/2)1. We will therefore consider just the case d > 0 for the remainder of the proof.
Define
A = A0 − A1 = {ρ0 − ρ1 : ρ0 ∈ A0, ρ1 ∈ A1},
and let
B = {B ∈ Herm(X) : ‖B‖_1 < d}.
The sets A and B are nonempty, disjoint sets of Hermitian operators, both are convex, and B is open. Using the fact from above, we have that there exists an operator H ∈ Herm(X) and a scalar α ∈ R such that
⟨H, A⟩ ≥ α > ⟨H, B⟩
for all A ∈ A and B ∈ B.
For every choice of B ∈ B we have that −B ∈ B as well, and therefore |⟨H, B⟩| < α for all B ∈ B. This implies first that α > 0, and second (using the fact that the trace norm and operator norm are dual to one another) that ‖H‖ ≤ α/d.
Now let K = (d/α)H, which is a Hermitian operator with ‖K‖ ≤ 1. Using a spectral decomposition of K, we may write
K = K0 − K1
for K0, K1 ∈ Pos(X) satisfying K0K1 = 0. (Sometimes this is called a Jordan decomposition of K. It is different from and should not be confused with the Jordan Normal Form.) Given that ‖K‖ ≤ 1, we have that K0 + K1 ≤ 1.
Finally, define
P0 = K0 + (1/2)(1 − K0 − K1)  and  P1 = K1 + (1/2)(1 − K0 − K1).
We have that { P0 , P1 } is a measurement on X , and it remains to verify that it has the properties
required to satisfy the theorem: for every choice of ρ0 ∈ A0 and ρ1 ∈ A1 we have
⟨P0 − P1, ρ0 − ρ1⟩ = ⟨K, ρ0 − ρ1⟩ = (d/α)⟨H, ρ0 − ρ1⟩ ≥ d
as required.
We may apply this theorem to another variant of the thought experiment from above. As
before, Alice chooses b ∈ {0, 1} uniformly, but now is permitted to prepare a register X in any
reduced state ρ0 ∈ A0 in case b = 0 and any reduced state ρ1 ∈ A1 in case b = 1. Bob knows
A0 and A1 , but does not know which state ρ0 or ρ1 was selected by Alice. Nevertheless, he may
perform a measurement { P0 , P1 } which correctly determines the value of b with probability at
least
1/2 + (1/4) dist(A0, A1),
even if Alice selected ρ0 or ρ1 with knowledge of { P0 , P1 } and in an attempt to minimize Bob’s
probability of correctness.
Lecture 4
Pure states, purifications, and fidelity
Lecture 4. Pure states, purifications, and fidelity 33
Lemma 4.1. Let X and Y be complex Euclidean spaces and let P ∈ Pos (X ) be a non-zero positive
semidefinite operator. Then the following facts hold.
1. There exists an operator A ∈ L (Y , X ) such that AA∗ = P if and only if dim(Y ) ≥ rank( P).
2. For every operator A ∈ L(Y, X) that satisfies AA* = P, there exists an operator B ∈ L(Y, X) such that BB* = Π_{im(P)} and A = √P B.
3. For any choice of operators A1 , A2 ∈ L (Y , X ) satisfying A1 A1∗ = A2 A2∗ = P, there exists a unitary
operator U ∈ L (Y ) such that A1 U = A2 .
Now let us prove the first item. If it holds that dim(Y) ≥ rank(P) = r, then there exists an orthonormal set {y_1, ..., y_r} in Y. Defining
A = ∑_{i=1}^{r} √λ_i x_i y_i* ∈ L(Y, X),
we have AA* = ∑_{i=1}^{r} λ_i x_i x_i* = P.
Finally, let us prove the last item. As before, we have that there must exist orthonormal sets {y_1, ..., y_r} and {z_1, ..., z_r} in Y for which
A1 = ∑_{i=1}^{r} √λ_i x_i y_i*  and  A2 = ∑_{i=1}^{r} √λ_i x_i z_i*
are singular value decompositions of A1 and A2. These orthonormal sets may be extended to orthonormal bases {y_1, ..., y_n} and {z_1, ..., z_n} of Y, where n = dim(Y). It follows that
U = ∑_{i=1}^{n} y_i z_i* ∈ U(Y)
satisfies A1 U = A2.
Now, with the above lemma in hand, it will be straightforward to prove the following theorems
about purifications.
Theorem 4.2. Let X and Y be complex Euclidean spaces and let P ∈ Pos (X ) be a positive semidefinite
operator. Then a purification of P exists in X ⊗ Y if and only if dim(Y ) ≥ rank( P).
Proof. The assumption dim(Y) ≥ rank(P) implies, by item 1 of Lemma 4.1, that there exists an operator A ∈ L(Y, X) such that AA* = P. The vector vec(A) ∈ X ⊗ Y satisfies
Tr_Y(vec(A) vec(A)*) = AA* = P,
and is therefore a purification of P.
The following theorem demonstrates that for any choice of u, these are the only possible purifica-
tions of P in X ⊗ Y .
Theorem 4.4 (Unitary equivalence of purifications). Suppose X and Y are complex Euclidean spaces
and P ∈ Pos (X ) is a positive semidefinite operator for which rank( P) ≤ dim(Y ). Then for any choice of
purifications u1 , u2 ∈ X ⊗ Y of P there exists a unitary operator V ∈ U (Y ) such that (1 ⊗ V )u1 = u2 .
Proof. Let A1, A2 ∈ L(Y, X) be the unique operators satisfying u1 = vec(A1) and u2 = vec(A2). The assumptions of the theorem imply that A1A1* = A2A2*. It follows from item 3 of Lemma 4.1 that there exists a unitary operator U ∈ U(Y) such that A1U* = A2. Thus,
u2 = vec(A2) = vec(A1U*) = (1_X ⊗ Ū) vec(A1) = (1_X ⊗ Ū)u1,
and so the theorem is satisfied by taking V = Ū.
4.2 Fidelity
There are several ways that one may quantify the similarity or difference between density oper-
ators. One way that relates closely to the notion of purifications is known as the fidelity. This
will turn out to be an extremely useful notion throughout the course, and you will see it used
extensively in the theory of quantum information.
For positive semidefinite operators P, Q ∈ Pos(X), the fidelity is defined as
F(P, Q) = ‖√P √Q‖_1.
Equivalently,
F(P, Q) = Tr √(√P Q √P).
Similar to purifications, it is common to see the fidelity defined only for density operators as opposed to arbitrary positive semidefinite operators. It is, however, useful to extend the definition to all positive semidefinite operators as we have done.
In particular, F(uu*, vv*) = |⟨u, v⟩| for any choice of unit vectors u, v ∈ X.
One nice property of the fidelity that we will utilize several times is that it is multiplicative
with respect to tensor products. This fact is stated in the following proposition.
Proposition 4.6. Let P1 , Q1 ∈ Pos (X1 ) and P2 , Q2 ∈ Pos (X2 ) be positive semidefinite operators. Then
F( P1 ⊗ P2 , Q1 ⊗ Q2 ) = F( P1 , Q1 ) F( P2 , Q2 ).
Proof. We have
F(P1 ⊗ P2, Q1 ⊗ Q2) = ‖√(P1 ⊗ P2) √(Q1 ⊗ Q2)‖_1
= ‖(√P1 ⊗ √P2)(√Q1 ⊗ √Q2)‖_1
= ‖√P1 √Q1 ⊗ √P2 √Q2‖_1
= ‖√P1 √Q1‖_1 ‖√P2 √Q2‖_1
= F(P1, Q1) F(P2, Q2)
as claimed.
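Multiplicativity is easy to test numerically. In the sketch below (assuming NumPy), fidelity is an ad hoc helper computing F(P, Q) = ‖√P √Q‖_1 as a sum of singular values:

```python
import numpy as np

rng = np.random.default_rng(5)

def psd(n, rng):
    G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return G @ G.conj().T

def sqrtm_psd(P):
    # Operator square root of a positive semidefinite operator.
    w, V = np.linalg.eigh(P)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def fidelity(P, Q):
    # F(P, Q) = ||sqrt(P) sqrt(Q)||_1, the sum of singular values.
    return np.linalg.svd(sqrtm_psd(P) @ sqrtm_psd(Q), compute_uv=False).sum()

P1, Q1, P2, Q2 = (psd(3, rng) for _ in range(4))
lhs = fidelity(np.kron(P1, P2), np.kron(Q1, Q2))
rhs = fidelity(P1, Q1) * fidelity(P2, Q2)
assert np.isclose(lhs, rhs)
```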
Theorem 4.7 (Uhlmann’s Theorem). Let X and Y be complex Euclidean spaces, let P, Q ∈ Pos (X ) be
positive semidefinite operators, both having rank at most dim(Y ), and let u ∈ X ⊗ Y be any purification
of P. Then
F(P, Q) = max{ |⟨u, v⟩| : v ∈ X ⊗ Y is a purification of Q }.
Proof. It follows from Lemma 4.1 that there exists an operator A ∈ L(X, Y) such that A*A = Π_{im(P)} and such that the purification u ∈ X ⊗ Y of P takes the form u = vec(√P A*). Along similar lines, there exists an operator B ∈ L(X, Y) satisfying B*B = Π_{im(Q)} such that vec(√Q B*) ∈ X ⊗ Y purifies Q; and moreover every purification of Q in X ⊗ Y takes the form
v = vec(√Q B* U*)
for some unitary operator U ∈ U(Y). Thus,
max{ |⟨u, v⟩| : v ∈ X ⊗ Y, Tr_Y(vv*) = Q } = max_{U∈U(Y)} |Tr(A √P √Q B* U*)| = ‖A √P √Q B*‖_1 = ‖√P √Q‖_1 = F(P, Q),
where the final norm equality holds by the isometric invariance of the trace norm, as A and B act as isometries on im(P) and im(Q) respectively. This completes the proof.
Various properties of the fidelity follow from Uhlmann's Theorem. For example, it is clear from the theorem that 0 ≤ F(ρ, ξ) ≤ 1 for density operators ρ and ξ. Moreover F(ρ, ξ) = 1 if and only if ρ = ξ. It is also evident (from the definition) that F(ρ, ξ) = 0 if and only if √ρ √ξ = 0, which is equivalent to ρξ = 0 (i.e., to ρ and ξ having orthogonal images).
Another property of the fidelity that follows from Uhlmann's Theorem is as follows.
Proposition 4.8. Let P_1, ..., P_k, Q_1, ..., Q_k ∈ Pos(X) be positive semidefinite operators. Then
F(P_1 + ··· + P_k, Q_1 + ··· + Q_k) ≥ F(P_1, Q_1) + ··· + F(P_k, Q_k).
Proof. Let Y be a complex Euclidean space having dimension at least that of X, and choose vectors u_1, ..., u_k, v_1, ..., v_k ∈ X ⊗ Y satisfying Tr_Y(u_i u_i*) = P_i, Tr_Y(v_i v_i*) = Q_i, and ⟨u_i, v_i⟩ = F(P_i, Q_i) for each i = 1, ..., k. Such vectors exist by Uhlmann's Theorem. Let Z be a complex Euclidean space with dimension k and define u, v ∈ X ⊗ Y ⊗ Z as
u = ∑_{i=1}^{k} u_i ⊗ e_i  and  v = ∑_{i=1}^{k} v_i ⊗ e_i.
Then
Tr_{Y⊗Z}(uu*) = ∑_{i=1}^{k} P_i  and  Tr_{Y⊗Z}(vv*) = ∑_{i=1}^{k} Q_i,
so u and v purify the two sums. By Uhlmann's Theorem it follows that
F( ∑_{i=1}^{k} P_i, ∑_{i=1}^{k} Q_i ) ≥ |⟨u, v⟩| = ∑_{i=1}^{k} F(P_i, Q_i)
as required.
It follows from this proposition that the fidelity function is concave in the first argument:
F(λρ1 + (1 − λ)ρ2, ξ) ≥ λF(ρ1, ξ) + (1 − λ)F(ρ2, ξ)
for all ρ1, ρ2, ξ ∈ D(X) and λ ∈ [0, 1], and by symmetry it is concave in the second argument as well. In fact, the fidelity is jointly concave:
F(λρ0 + (1 − λ)ρ1, λξ0 + (1 − λ)ξ1) ≥ λF(ρ0, ξ0) + (1 − λ)F(ρ1, ξ1)
for all ρ0, ρ1, ξ0, ξ1 ∈ D(X) and λ ∈ [0, 1].
Theorem 4.9 (Alberti). Let X be a complex Euclidean space and let P, Q ∈ Pos(X) be positive semidefinite operators. Then
(F(P, Q))² = inf_{R∈Pd(X)} ⟨R, P⟩⟨R⁻¹, Q⟩.
To prove this theorem, it is helpful to start first with the special case that P = Q, which is represented by the following lemma.
Lemma 4.10. Let X be a complex Euclidean space and let P ∈ Pos(X). Then
inf_{R∈Pd(X)} ⟨R, P⟩⟨R⁻¹, P⟩ = (Tr(P))².
Proof. The infimum is at most (Tr(P))², given that R = 1 is positive definite. To establish the reverse inequality, it suffices to prove that
⟨R, P⟩⟨R⁻¹, P⟩ ≥ (Tr(P))²
for any choice of R ∈ Pd(X). This will follow from the simple observation that, for any choice of positive real numbers α and β, we have α² + β² ≥ 2αβ and therefore αβ⁻¹ + βα⁻¹ ≥ 2. With this fact in mind, consider a spectral decomposition
R = ∑_{i=1}^{n} λ_i u_i u_i*.
We have
⟨R, P⟩⟨R⁻¹, P⟩ = ∑_{1≤i,j≤n} λ_i λ_j⁻¹ (u_i* P u_i)(u_j* P u_j)
= ∑_{1≤i<j≤n} (λ_i λ_j⁻¹ + λ_j λ_i⁻¹)(u_i* P u_i)(u_j* P u_j) + ∑_{i=1}^{n} (u_i* P u_i)²
≥ ∑_{1≤i,j≤n} (u_i* P u_i)(u_j* P u_j)
= (Tr(P))²
as required.
Proof of Theorem 4.9. We will first prove the theorem for P and Q positive definite. Let us define S ∈ Pd(X) to be
S = (√P Q √P)^{−1/4} (√P R √P) (√P Q √P)^{−1/4}.
Notice that as R ranges over all positive definite operators, so too does S. We have
⟨S, (√P Q √P)^{1/2}⟩ = ⟨R, P⟩  and  ⟨S⁻¹, (√P Q √P)^{1/2}⟩ = ⟨R⁻¹, Q⟩.
It therefore follows from Lemma 4.10 that
inf_{R∈Pd(X)} ⟨R, P⟩⟨R⁻¹, Q⟩ = inf_{S∈Pd(X)} ⟨S, (√P Q √P)^{1/2}⟩⟨S⁻¹, (√P Q √P)^{1/2}⟩ = ( Tr (√P Q √P)^{1/2} )² = (F(P, Q))².
To prove the general case, let us first note that, for any choice of R ∈ Pd(X) and ε > 0, we have
⟨R, P⟩⟨R⁻¹, Q⟩ ≤ ⟨R, P + ε1⟩⟨R⁻¹, Q + ε1⟩.
Thus,
inf_{R∈Pd(X)} ⟨R, P⟩⟨R⁻¹, Q⟩ ≤ (F(P + ε1, Q + ε1))².
As this bound holds for every ε > 0, and the fidelity is continuous, we have
inf_{R∈Pd(X)} ⟨R, P⟩⟨R⁻¹, Q⟩ ≤ (F(P, Q))².
Lemma 4.11. Let X be a complex Euclidean space and let P, Q ∈ Pos(X) be positive semidefinite operators on X. Then
‖P − Q‖_1 ≥ ‖√P − √Q‖_2².
Proof. Let
√P − √Q = ∑_{i=1}^{n} λ_i u_i u_i*
be a spectral decomposition of √P − √Q. Given that √P − √Q is Hermitian, it follows that
∑_{i=1}^{n} |λ_i|² = ‖√P − √Q‖_2².
Now, define
U = ∑_{i=1}^{n} sign(λ_i) u_i u_i*,
where sign(λ) = 1 if λ ≥ 0 and sign(λ) = −1 if λ < 0.
We have
‖P − Q‖_1 ≥ |Tr((P − Q)U)|
= | (1/2) Tr((√P − √Q)(√P + √Q)U) + (1/2) Tr((√P + √Q)(√P − √Q)U) |
= Tr( |√P − √Q| (√P + √Q) )
≥ ∑_{i=1}^{n} |λ_i|² = ‖√P − √Q‖_2²
as required. (The second equality uses the facts that U commutes with √P − √Q and (√P − √Q)U = |√P − √Q|; the final inequality holds because u_i*(√P + √Q)u_i ≥ |u_i*(√P − √Q)u_i| = |λ_i| for each i.)
Theorem 4.12 (Fuchs–van de Graaf Inequalities). Let X be a complex Euclidean space and assume that ρ, ξ ∈ D(X) are density operators on X. Then
1 − (1/2)‖ρ − ξ‖_1 ≤ F(ρ, ξ) ≤ √(1 − (1/4)‖ρ − ξ‖_1²).
Proof. The operators ρ and ξ have unit trace, and therefore
‖√ρ − √ξ‖_2² = Tr((√ρ − √ξ)²) = 2 − 2 Tr(√ρ √ξ) ≥ 2 − 2 F(ρ, ξ),
where the inequality uses Tr(√ρ √ξ) ≤ ‖√ρ √ξ‖_1 = F(ρ, ξ). Combining this bound with Lemma 4.11 yields ‖ρ − ξ‖_1 ≥ 2 − 2F(ρ, ξ), which is equivalent to the first inequality.
For the second inequality, let u, v ∈ X ⊗ Y be purifications of ρ and ξ satisfying F(ρ, ξ) = |⟨u, v⟩|, as guaranteed by Uhlmann's Theorem. The trace norm is non-increasing under the partial trace, so by Equation (2.13) we have
‖ρ − ξ‖_1 ≤ ‖uu* − vv*‖_1 = 2√(1 − (F(ρ, ξ))²),
and therefore
F(ρ, ξ) ≤ √(1 − (1/4)‖ρ − ξ‖_1²)
as required.
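Both inequalities can be checked on random density operators (illustrative NumPy sketch; fidelity and random_density are ad hoc helpers):

```python
import numpy as np

rng = np.random.default_rng(6)

def random_density(n, rng):
    G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def sqrtm_psd(P):
    w, V = np.linalg.eigh(P)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def fidelity(P, Q):
    # F(P, Q) = ||sqrt(P) sqrt(Q)||_1, the sum of singular values.
    return np.linalg.svd(sqrtm_psd(P) @ sqrtm_psd(Q), compute_uv=False).sum()

rho, xi = random_density(4, rng), random_density(4, rng)
d = np.abs(np.linalg.eigvalsh(rho - xi)).sum()      # ||rho - xi||_1
F = fidelity(rho, xi)

assert 1 - d / 2 <= F + 1e-9                        # lower bound
assert F <= np.sqrt(1 - (d / 2) ** 2) + 1e-9        # upper bound
```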
Lecture 5
Naimark’s Theorem; Characterization of quantum operations
Most of this lecture will be devoted to establishing characterizations of quantum operations, based
on different ways of representing super-operators. First, though, we will discuss one more impor-
tant fact about measurements, known as Naimark’s Theorem, that relates general measurements to
projective measurements.
Theorem 5.1 (Naimark’s Theorem). Let X be a complex Euclidean space, let { Pa : a ∈ Γ} ⊂ Pos (X )
be a measurement, and let Y = C Γ . Then there exists a linear isometry A ∈ U (X , X ⊗ Y ) such that
Pa = A∗ (1 X ⊗ Ea,a ) A
for every a ∈ Γ.
Proof. Define A ∈ L(X, X ⊗ Y) as
A = ∑_{a∈Γ} √P_a ⊗ e_a.
We have that
A*A = ∑_{a∈Γ} P_a = 1_X,
so A is a linear isometry. Moreover, a direct calculation shows that
A*(1_X ⊗ E_{a,a})A = √P_a √P_a = P_a
for each a ∈ Γ, as required.
The theorem is also sometimes called Neumark’s Theorem. The two names refer to the same
individual, Mark Naimark, whose name has been transliterated in these two different ways.
It should be noted that the term Naimark’s Theorem typically refers to a more general theorem
than the one above, which only has this extremely simple proof for finite dimensional vector
spaces and measurements with a finite number of outcomes. Despite the theorem’s simplicity, as
stated above, it is important for understanding the relationship between general measurements
and projective measurements.
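The construction in the proof of Naimark's Theorem translates directly into code. The sketch below (assuming NumPy) builds a measurement by normalizing random positive semidefinite operators, in the spirit of the information-complete construction above, then forms the isometry A = ∑_a √P_a ⊗ e_a and checks the theorem's conclusion:

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 3, 4        # dim(X) = 3, four outcomes in Γ

def sqrtm_psd(P):
    w, V = np.linalg.eigh(P)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

# Build a measurement {P_a} by normalizing random PSD operators Q_a,
# via P_a = Q^(-1/2) Q_a Q^(-1/2) with Q = sum_a Q_a.
G = [rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)) for _ in range(k)]
Q = [g @ g.conj().T for g in G]
N = np.linalg.inv(sqrtm_psd(sum(Q)))
P = [N @ q @ N for q in Q]
assert np.allclose(sum(P), np.eye(n))

# Naimark isometry A = sum_a sqrt(P_a) ⊗ e_a, mapping X into X ⊗ Y.
e = np.eye(k)
A = sum(np.kron(sqrtm_psd(P[a]), e[:, [a]]) for a in range(k))
assert np.allclose(A.conj().T @ A, np.eye(n))       # A*A = 1_X

# P_a = A*(1_X ⊗ E_aa)A for every outcome a.
for a in range(k):
    Eaa = np.outer(e[:, a], e[:, a])
    assert np.allclose(A.conj().T @ np.kron(np.eye(n), Eaa) @ A, P[a])
```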
Lecture 5. Naimark’s Theorem; Characterization of quantum operations 42
Now let us return to the discussion from above, where an arbitrary measurement
{P_a : a ∈ Γ} ⊂ Pos(X)
is considered. By Naimark's Theorem there exists a linear isometry A ∈ U(X, X ⊗ Y), for Y = C^Γ, such that
P_a = A*(1_X ⊗ E_{a,a})A
for every a ∈ Γ. For an arbitrary, but fixed, choice of a unit vector u ∈ Y, we may choose a unitary operator U ∈ U(X ⊗ Y) for which
A = U(1_X ⊗ u).
Consider the projective measurement {Q a : a ∈ Γ} ⊂ Pos (X ⊗ Y ) defined as
Q a = U ∗ (1 X ⊗ Ea,a )U
for each a ∈ Γ. If a density operator ρ ∈ D (X ) is given, and the registers (X, Y ) are prepared in
the reduced state ρ ⊗ uu∗ , then this projective measurement results in each outcome a ∈ Γ with
probability
⟨Q_a, ρ ⊗ uu*⟩ = ⟨A*(1_X ⊗ E_{a,a})A, ρ⟩ = ⟨P_a, ρ⟩.
The probability for each measurement outcome is therefore in agreement with the original mea-
surement { Pa : a ∈ Γ}.
for all X ∈ L(X). We will call the operator K(Φ) the natural representation of Φ. (It is also sometimes called the linear representation of Φ.) A concrete expression for K(Φ) is as follows:
K(Φ) = ∑_{a,b} vec(Φ(E_{a,b})) vec(E_{a,b})*,  (5.1)
from which it follows that K(Φ) vec(X) = vec(Φ(X)) for all X by linearity.
Notice that the mapping K : T (X , Y ) → L (X ⊗ X , Y ⊗ Y ) is itself linear:
K (αΦ + βΨ) = αK (Φ) + βK (Ψ)
for all choices of α, β ∈ C and Φ, Ψ ∈ T (X , Y ). In fact, K is a linear bijection, as the above
form (5.1) makes clear that for every operator A ∈ L (X ⊗ X , Y ⊗ Y ) there exists a unique choice
of Φ ∈ T (X , Y ) for which A = K (Φ).
Recall that the adjoint of a super-operator Φ ∈ T (X , Y ) is the unique super-operator Φ∗ ∈
T (Y , X ) for which
⟨Y, Φ(X)⟩ = ⟨Φ*(Y), X⟩
for all X ∈ L (X ) and Y ∈ L (Y ). The natural representation perfectly reflects the notion of
adjoints: we have K (Φ∗ ) = (K (Φ))∗ for all super-operators Φ ∈ T (X , Y ).
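With the row-major convention vec(X) = X.reshape(-1), a super-operator of the form X ↦ ∑_j A_j X B_j* has natural representation K(Φ) = ∑_j A_j ⊗ B̄_j, since (A ⊗ B) vec(X) = vec(A X Bᵀ). An illustrative NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(8)
n, m, k = 2, 3, 2   # dim(X), dim(Y), number of terms

Aops = [rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n)) for _ in range(k)]
Bops = [rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n)) for _ in range(k)]

def Phi(X):
    # The super-operator X -> sum_j A_j X B_j*.
    return sum(A @ X @ B.conj().T for A, B in zip(Aops, Bops))

# Natural representation: K(Phi) = sum_j A_j ⊗ conj(B_j), because
# A X B* = A X conj(B)^T and (A ⊗ B) vec(X) = vec(A X B^T) for row-major vec.
K = sum(np.kron(A, B.conj()) for A, B in zip(Aops, Bops))

X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
assert np.allclose(K @ X.reshape(-1), Phi(X).reshape(-1))
```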
The term Kraus representation is sometimes reserved for the case that A j = Bj for each j =
1, . . . , k. Such a representation exists, as we will see shortly, if and only if Φ is completely positive.
Suppose that Z is a complex Euclidean space and A, B ∈ L(X, Y ⊗ Z) are operators for which
Φ(X) = Tr_Z(AXB*)  (5.3)
for all X ∈ L(X). Then the expression (5.3) is said to be a Stinespring representation of Φ. Similar to the operator-sum representation, a Stinespring representation always exists for a given Φ and is never unique.
If Φ ∈ T(X, Y) is given by the above equation (5.3), then
Φ*(Y) = A*(Y ⊗ 1_Z)B.
for all X ∈ L (X ).
2. (Stinespring representations.) For Z = C^k and A, B ∈ L(X, Y ⊗ Z) defined as
A = ∑_{j=1}^{k} A_j ⊗ e_j  and  B = ∑_{j=1}^{k} B_j ⊗ e_j,
it holds that Φ(X) = Tr_Z(AXB*) for all X ∈ L(X).
Proof. The equivalence between items 1 and 2 is a straightforward calculation. The equivalence
between items 1 and 3 follows from the identity
for each j = 1, . . . , k and every X ∈ L (X ). Finally, the equivalence between items 1 and 4 follows
from the expression
J (Φ) = (Φ ⊗ 1L(X ) ) (vec(1 X ) vec(1 X )∗ )
along with
for each j = 1, . . . , k.
Various facts may be derived from the above proposition. For instance, it follows that every
super-operator Φ ∈ T (X , Y ) has a Kraus representation in which k = rank( J (Φ)) ≤ dim(X ⊗ Y ),
and similarly that every such Φ has a Stinespring representation in which dim(Z) = rank( J (Φ)).
Now we are ready to characterize quantum operations in terms of their Choi-Jamiołkowski, Kraus,
and Stinespring representations. (The natural representation does not happen to help us with
respect to these particular characterizations. This is not surprising, because it essentially throws
away the operator structure of the input and output of super-operators.) We will begin with a
characterization of completely positive super-operators in terms of these representations.
Theorem 5.3. Let Φ ∈ T(X, Y). Then the following are equivalent:
1. Φ is completely positive.
2. Φ ⊗ 1L(X ) is positive.
3. J (Φ) ∈ Pos (Y ⊗ X ).
4. There exists a positive integer k and operators A_1, ..., A_k ∈ L(X, Y) such that
Φ(X) = ∑_{i=1}^{k} A_i X A_i*  (5.4)
for all X ∈ L(X).
5. Item 4 holds for k = rank( J (Φ)).
6. There exists a complex Euclidean space Z and an operator A ∈ L (X , Y ⊗ Z) such that
Φ( X ) = TrZ ( AX A∗ )
for all X ∈ L (X ).
7. Item 6 holds for Z having dimension equal to the rank of J (Φ).
Proof. Let us first note the following simple implications: item 1 implies item 2 by the definition
of complete positivity, item 5 implies item 7 by Proposition 5.2 (and likewise item 4 implies item
6), item 5 trivially implies item 4, and item 7 trivially implies item 6.
Super-operators of the form X 7→ AX A∗ are easily seen to be completely positive, and non-
negative linear combinations of completely positive super-operators are completely positive as
well. Item 4 therefore implies item 1. Along similar lines, the partial trace is completely positive
and completely positive super-operators are closed under composition. Item 6 therefore implies
item 1.
Now assume that Φ ⊗ 1_{L(X)} is positive. As vec(1_X) vec(1_X)* is positive semidefinite, we have
J(Φ) = (Φ ⊗ 1_{L(X)})(vec(1_X) vec(1_X)*) ∈ Pos(Y ⊗ X),
so item 2 implies item 3. Next assume J(Φ) ∈ Pos(Y ⊗ X), and write
J(Φ) = ∑_{i=1}^{k} vec(A_i) vec(A_i)*
for k = rank(J(Φ)), which is possible by a spectral decomposition of J(Φ). By Proposition 5.2 we then have that equation (5.4) holds for every X ∈ L(X). Item 3 therefore implies item 5.
The above implications prove the equivalence of items 1 through 7, and so the theorem is
proved.
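The criterion that Φ is completely positive if and only if J(Φ) is positive semidefinite gives a practical numerical test. An illustrative NumPy sketch, which also shows that the transpose map fails it:

```python
import numpy as np

rng = np.random.default_rng(9)
n, m = 2, 3   # dim(X) = 2, dim(Y) = 3

def choi(Phi, n, m):
    # J(Phi) = sum_{a,b} Phi(E_ab) ⊗ E_ab, an operator on Y ⊗ X.
    J = np.zeros((m * n, m * n), dtype=complex)
    for a in range(n):
        for b in range(n):
            E = np.zeros((n, n)); E[a, b] = 1.0
            J += np.kron(Phi(E), E)
    return J

# A map in Kraus form X -> sum_j A_j X A_j* is completely positive ...
Aops = [rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n)) for _ in range(2)]
J = choi(lambda X: sum(A @ X @ A.conj().T for A in Aops), n, m)
assert np.linalg.eigvalsh(J).min() >= -1e-10        # ... so J(Phi) is PSD

# The transpose map is positive but not completely positive: its Choi
# operator is the swap operator, which has eigenvalue -1.
Jt = choi(lambda X: X.T, n, n)
assert np.linalg.eigvalsh(Jt).min() < 0
```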
Theorem 5.4. Let Φ ∈ T(X, Y). Then the following are equivalent:
1. Φ is trace-preserving.
2. Tr_Y(J(Φ)) = 1_X.
3. There exists a Kraus representation
Φ(X) = ∑_{i=1}^{k} A_i X B_i*
of Φ for which ∑_{i=1}^{k} A_i* B_i = 1_X.
4. Every Kraus representation of Φ, written as above, satisfies ∑_{i=1}^{k} A_i* B_i = 1_X.
5. There exists a Stinespring representation
Φ(X) = Tr_Z(AXB*)
of Φ for which A*B = 1_X.
6. Every Stinespring representation of Φ, written as above, satisfies A*B = 1_X.
The equality A*B = 1_X follows, and so we have proved that item 1 implies item 6. Of course, item 6 trivially implies item 5. Finally, assume that
Φ(X) = Tr_Z(AXB*)
is a Stinespring representation of Φ for which A*B = 1_X. Then B*A = (A*B)* = 1_X, so that
Tr(Φ(X)) = Tr(AXB*) = Tr(B*AX) = Tr(X)
for every X ∈ L(X). Therefore item 5 implies item 1, and we have proved the equivalence of items 1, 5, and 6.
Along similar lines we may prove the equivalence of items 1, 2, 3, and 4. Specifically, if Φ is trace-preserving, and if
Φ(X) = ∑_{j=1}^{k} A_j X B_j*
is any Kraus representation of Φ, then the equality
∑_{j=1}^{k} A_j* B_j = 1_X
follows, and so we have proved that item 1 implies item 4. Like before, we have that item 4 trivially implies item 3. Now suppose
Φ(X) = ∑_{j=1}^{k} A_j X B_j*
is a Kraus representation of Φ for which ∑_{j=1}^{k} A_j* B_j = 1_X. Then ∑_{j=1}^{k} B_j* A_j = 1_X, and by Proposition 5.2 we have
J(Φ) = ∑_{j=1}^{k} vec(A_j) vec(B_j)*.
Thus
Tr_Y(J(Φ)) = ( ∑_{j=1}^{k} B_j* A_j )ᵀ = 1_X,
and so item 3 implies item 2.
We conclude that
Tr(Φ(E_{a,b})) = 1 if a = b, and Tr(Φ(E_{a,b})) = 0 if a ≠ b,
so by linearity we have that Φ preserves trace. Thus item 2 implies item 1, and the theorem is proved.
The two theorems are now easily combined to give the following characterization of admissible
super-operators.
Corollary 5.5. Let Φ ∈ T (X , Y ). Then the following are equivalent:
1. Φ is admissible.
2. J (Φ) ∈ Pos (Y ⊗ X ) and TrY ( J (Φ)) = 1 X .
3. There exists a positive integer k and operators A_1, ..., A_k ∈ L(X, Y) satisfying ∑_{i=1}^{k} A_i* A_i = 1_X such that
Φ(X) = ∑_{i=1}^{k} A_i X A_i*
for all X ∈ L(X).
4. Item 3 holds for k = rank(J(Φ)).
5. There exists a complex Euclidean space Z and an operator A ∈ L(X, Y ⊗ Z) satisfying A*A = 1_X such that
Φ(X) = Tr_Z(AXA*)
for all X ∈ L(X).
6. Item 5 holds for a complex Euclidean space Z with dim(Z) = rank(J(Φ)).
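Corollary 5.5 gives a concrete recipe for checking that a super-operator is admissible. The sketch below (assuming NumPy) builds a channel from normalized random Kraus operators and verifies item 2: J(Φ) is positive semidefinite with Tr_Y(J(Φ)) = 1_X:

```python
import numpy as np

rng = np.random.default_rng(10)
n, m, k = 2, 2, 3   # dim(X), dim(Y), number of Kraus operators

# Random Kraus operators, normalized so that sum_i A_i* A_i = 1_X.
G = [rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n)) for _ in range(k)]
S = sum(g.conj().T @ g for g in G)
w, V = np.linalg.eigh(S)
S_inv_half = (V / np.sqrt(w)) @ V.conj().T          # S^(-1/2)
Aops = [g @ S_inv_half for g in G]
assert np.allclose(sum(A.conj().T @ A for A in Aops), np.eye(n))

def Phi(X):
    return sum(A @ X @ A.conj().T for A in Aops)

# Choi operator J(Phi) = sum_{a,b} Phi(E_ab) ⊗ E_ab.
J = np.zeros((m * n, m * n), dtype=complex)
for a in range(n):
    for b in range(n):
        E = np.zeros((n, n)); E[a, b] = 1.0
        J += np.kron(Phi(E), E)

# Corollary 5.5, item 2: J(Phi) is PSD and Tr_Y(J(Phi)) = 1_X.
assert np.linalg.eigvalsh(J).min() >= -1e-10
TrY = J.reshape(m, n, m, n).trace(axis1=0, axis2=2)  # partial trace over Y
assert np.allclose(TrY, np.eye(n))
```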
Lecture 6
Examples of states, measurements, and operations
The purpose of this lecture is to discuss a selection of examples of states, measurements, and oper-
ations. We will see some of these examples a bit later in the course, and the last one (teleportation)
is very important in quantum information theory. You should, however, not read too much into
the particular selection of examples in this lecture—it is convenient to discuss these examples now,
for the sake of lectures coming up, but there are of course many important examples that we will
not discuss at the present time.
I assume that you are familiar with convexity and the basic notions that concern it. Recall that
a subset C ⊆ V of some vector space V is convex if, for all choices of x, y ∈ C and λ ∈ [0, 1], it
holds that λx + (1 − λ)y ∈ C . When we refer to a convex combination of elements from some subset
A ⊆ V of a vector space V , we are referring to a (finite) linear combination of the form
∑_{j=1}^{N} p_j x_j,
where x1 , . . . , x N ∈ A and ( p1 , . . . , p N ) is a probability vector. The set of all such convex combina-
tions is called the convex hull of A, and is denoted conv(A). The set conv(A) is obviously convex,
and is the smallest convex set that contains A (meaning that any other convex set that contains A
must also contain conv(A)).
We have, of course, seen several examples of convex sets thus far in the course. For example,
for any choice of complex Euclidean space X , the set X itself, as well as L (X ), Herm (X ), Pos (X ),
and D (X ), are all obviously convex.
Convex combinations of admissible super-operators are also admissible: for any choice of ad-
missible super-operators Φ, Ψ ∈ T (X , Y ), and any real number λ ∈ [0, 1], it holds that the super-
operator
λΦ + (1 − λ)Ψ ∈ T (X , Y )
is also admissible. One can verify this in many ways, one of which is to consider the Choi-
Jamiołkowski representation:
Given that Φ and Ψ are completely positive, we have J (Φ), J (Ψ) ∈ Pos (Y ⊗ X ), and because
Pos (Y ⊗ X ) is convex it holds that J (λΦ + (1 − λ)Ψ) ∈ Pos (Y ⊗ X ). Thus, λΦ + (1 − λ)Ψ is
completely positive. The fact that λΦ + (1 − λ)Ψ preserves trace is immediate by linearity.
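The Choi–Jamiołkowski argument above is easy to check numerically. The following is a minimal numpy sketch (not part of the notes) that builds Choi matrices via the convention J(Φ) = ∑_{a,b} Φ(E_{a,b}) ⊗ E_{a,b}, forms a convex combination of a unitary channel and the completely depolarizing channel, and verifies that the resulting Choi matrix is positive semidefinite with partial trace equal to the identity; all helper names are illustrative.

```python
import numpy as np

def apply_channel(kraus, X):
    """Apply Phi(X) = sum_j A_j X A_j^* for a list of Kraus operators."""
    return sum(A @ X @ A.conj().T for A in kraus)

def choi(kraus, d_in):
    """Choi matrix J(Phi) = sum_{a,b} Phi(E_ab) (x) E_ab, acting on Y (x) X."""
    d_out = kraus[0].shape[0]
    J = np.zeros((d_out * d_in, d_out * d_in), dtype=complex)
    for a in range(d_in):
        for b in range(d_in):
            E = np.zeros((d_in, d_in), dtype=complex)
            E[a, b] = 1.0
            J += np.kron(apply_channel(kraus, E), E)
    return J

def trace_out_Y(J, d_out, d_in):
    """Partial trace over the output factor Y of an operator on Y (x) X."""
    J4 = J.reshape(d_out, d_in, d_out, d_in)
    return np.trace(J4, axis1=0, axis2=2)

d = 2
# Phi: a unitary channel; Psi: the completely depolarizing channel.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
phi_kraus = [H]
psi_kraus = [np.outer(np.eye(d)[:, a], np.eye(d)[b, :]) / np.sqrt(d)
             for a in range(d) for b in range(d)]

lam = 0.3
J_mix = lam * choi(phi_kraus, d) + (1 - lam) * choi(psi_kraus, d)

# Complete positivity: J is positive semidefinite.
assert np.min(np.linalg.eigvalsh(J_mix)) > -1e-12
# Trace preservation: Tr_Y(J) is the identity on X.
assert np.allclose(trace_out_Y(J_mix, d, d), np.eye(d))
```

The same two checks characterize admissibility for any channel, matching item 2 of Corollary 5.5.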
For any choice of a complex Euclidean space X and a finite, non-empty set Γ, the collection of
all measurements of the form
µ : Γ → Pos (X )
is convex, assuming that addition and scalar multiplication are defined point-wise. That is, if
µ1, µ2 : Γ → Pos(X) are measurements and λ ∈ [0, 1], then the function λµ1 + (1 − λ)µ2, defined by (λµ1 + (1 − λ)µ2)(a) = λµ1(a) + (1 − λ)µ2(a) for each a ∈ Γ, is also a measurement.
Suppose X and Y are arbitrary complex Euclidean spaces. It is clear that every linear isometry
A ∈ U (X , Y ) gives rise to an admissible super-operator Φ ∈ T (X , Y ) defined as
Φ( X ) = AX A∗
for all X ∈ L (X ). Such a super-operator represents an isometric quantum operation. Along similar
lines, an admissible super-operator Φ ∈ T(X) defined as

Φ(X) = U X U^*

for all X ∈ L(X), for a unitary operator U ∈ U(X), represents a unitary operation. More generally, a super-operator of the form

Φ(X) = ∑_{j=1}^N p_j U_j X U_j^*    (6.1)

for unitary operators U_1, . . . , U_N ∈ U(X) and a probability vector (p_1, . . . , p_N) is said to be mixed unitary.
A natural question to ask about mixed unitary super-operators is: for a given mixed-unitary
super-operator Φ, how large does N need to be in an expression of Φ in the form (6.1)? Initially it
is tempting to answer that N = dim(X )2 suffices, given what we learned in the previous lecture.
This is not necessarily so, however—for although any super-operator acting on L (X ) has a Kraus
representation with at most dim(X )2 terms, we cannot guarantee that the corresponding Kraus
operators are unitary.
Instead, we will need the following fundamental theorem about convex hulls, which places
a bound on the number of elements over which convex combinations must be taken to express
elements in convex hulls.
Theorem 6.1 (Carathéodory’s Theorem). Let V be an n-dimensional real vector space and let A ⊆ V be any non-empty subset of V. Then every vector u ∈ conv(A) can be written as

u = ∑_{j=1}^{n+1} p_j x_j

for some choice of vectors x_1, . . . , x_{n+1} ∈ A and a probability vector (p_1, . . . , p_{n+1}).
To apply this theorem to the case at hand, we must identify a real vector space in which we
are working. For each unitary operator U ∈ U (X ), let us write ΦU ∈ T (X ) to denote the super-
operator defined by
ΦU ( X ) = UXU ∗
for all X ∈ L(X). Then we have

J(Φ_U) = vec(U) vec(U)^*.
The set Herm (X ⊗ X ) is a real vector space with dimension n4 , for n = dim(X ), and is (essen-
tially) the real vector space we are looking for.
Now, for any mixed-unitary super-operator Φ, we have

J(Φ) = ∑_{j=1}^N p_j vec(U_j) vec(U_j)^*,    (6.2)

which is equivalent to Φ = ∑_{j=1}^N p_j Φ_{U_j}. The set {vec(U) vec(U)^* : U ∈ U(X)} is contained in an affine subspace of Herm(X ⊗ X) that in fact has dimension n^4 − 2n^2 + 1, owing to the fact that there are linear constraints that each operator vec(U) vec(U)^* must satisfy. An expression of the form (6.2) is therefore possible for every mixed-unitary super-operator Φ ∈ T(X) with N = n^4 − 2n^2 + 2.
Sometimes it is convenient to consider measurements that do not destroy registers, but rather
leave them in some reduced state that may depend on the measurement outcome that is obtained
from the measurement. We will refer to such processes as non-destructive measurements.
Formally, a non-destructive measurement on a space X is a function
ν : Γ → L (X ) : a 7→ Ma ,
for some finite, non-empty set of measurement outcomes Γ, that satisfies the constraint
∑_{a∈Γ} M_a^* M_a = 1_X.
When a non-destructive measurement of this form is applied to a register X that has reduced state
ρ ∈ D(X), two things happen:
1. One of the outcomes a ∈ Γ is produced, with each outcome a appearing with probability Tr(M_a ρ M_a^*).
2. Conditioned on the outcome a being produced, the state of X becomes M_a ρ M_a^* / Tr(M_a ρ M_a^*).
Such a process is indeed physically realizable. Define an operator A ∈ L(X, X ⊗ C^Γ) as

A = ∑_{a∈Γ} M_a ⊗ e_a.

We have that

A^* A = ∑_{a∈Γ} M_a^* M_a = 1_X,

so A is a linear isometry; performing the isometric operation X ↦ A X A^* and then measuring the register corresponding to C^Γ with respect to the standard basis has precisely the effect described above.
For a fixed positive integer n, let

Z_n = {0, 1, . . . , n − 1},
and view this set as a ring with respect to addition and multiplication defined modulo n. Let us
also define
ωn = exp(2πi/n)
to be a principal n-th root of unity, which will typically be denoted ω rather than ωn when n has
been fixed or is clear from the context.
Now, for a fixed choice of n, let X = CZn , and define two unitary operators X, Z ∈ U (X ) as
follows:
X = ∑_{a∈Z_n} E_{a+1,a}   and   Z = ∑_{a∈Z_n} ω^a E_{a,a}.
Here, and throughout this section, the expression a + 1 refers to addition in Z n , and similar for
other arithmetic expressions involving elements of Z_n. For any choice of (a, b) ∈ Z_n^2 we may consider the operator X^a Z^b. Such operators are known as generalized Pauli operators (or discrete
Weyl operators).
Let us note a few basic facts about the collection
{ X^a Z^b : (a, b) ∈ Z_n^2 }.    (6.3)
First, the commutation relation

Z X = ω X Z

holds, and more generally Z^b X^a = ω^{ab} X^a Z^b for every choice of a, b ∈ Z_n. This implies that the collection (6.3) forms an orthogonal basis for L(X), because

⟨X^a Z^b, X^c Z^d⟩ = Tr( Z^{−b} X^{−a} X^c Z^d ) = ω^{−b(c−a)} Tr( X^{c−a} Z^{d−b} ),

which equals n if (a, b) = (c, d) and 0 otherwise.
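These algebraic facts are simple to confirm numerically. The sketch below (illustrative only, with n = 5 an arbitrary choice) constructs X and Z and checks the commutation relation and the orthogonality of a few operators X^a Z^b under the Hilbert-Schmidt inner product.

```python
import numpy as np

# Generalized Pauli operators for n = 5 (an illustrative choice).
n = 5
omega = np.exp(2j * np.pi / n)

X = np.zeros((n, n), dtype=complex)
for a in range(n):
    X[(a + 1) % n, a] = 1.0          # X e_a = e_{a+1 (mod n)}
Z = np.diag(omega ** np.arange(n))   # Z e_a = omega^a e_a

# Commutation relation ZX = omega XZ.
assert np.allclose(Z @ X, omega * X @ Z)

def hs_inner(A, B):
    """Hilbert-Schmidt inner product <A, B> = Tr(A^* B)."""
    return np.trace(A.conj().T @ B)

# Orthogonality: <X^a Z^b, X^c Z^d> equals n if (a,b) = (c,d) and 0 otherwise.
mp = np.linalg.matrix_power
for (a, b) in [(0, 0), (1, 2), (3, 4)]:
    for (c, d) in [(0, 0), (1, 2), (2, 4)]:
        val = hs_inner(mp(X, a) @ mp(Z, b), mp(X, c) @ mp(Z, d))
        expected = n if (a, b) == (c, d) else 0.0
        assert abs(val - expected) < 1e-9
```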
The generalized Bell basis is an orthonormal basis of X ⊗ X that is derived from the generalized
Pauli operators by means of the vec mapping:
{ (1/√n) vec(X^a Z^b) : (a, b) ∈ Z_n^2 }.    (6.4)

Orthonormality follows from the orthogonality of the operators X^a Z^b together with the fact that ⟨vec(A), vec(B)⟩ = ⟨A, B⟩ for all A, B ∈ L(X).
Consider now the super-operator Φ ∈ T(X) defined as

Φ(A) = (1/n^2) ∑_{(a,b)∈Z_n^2} (X^a Z^b) A (X^a Z^b)^*

for all A ∈ L(X). We claim that Φ = Ω, the completely depolarizing super-operator.
We have

J(Φ) = (1/n^2) ∑_{(a,b)∈Z_n^2} vec(X^a Z^b) vec(X^a Z^b)^* = (1/n) 1_{X⊗X},
where the second equality follows from the fact that (6.4) is an orthonormal basis of X ⊗ X . By
(6.5) it follows that Φ = Ω as claimed.
Another way to verify this, which was suggested in the lecture, is to check that

Φ(X^c Z^d) = (1/n^2) ∑_{a,b∈Z_n} X^a Z^b X^c Z^d Z^{−b} X^{−a} = (1/n^2) ∑_{a,b∈Z_n} ω^{bc−ad} X^c Z^d,

which equals 1 if (c, d) = (0, 0) and 0 otherwise.
6.6 Teleportation
The last example for this lecture is quantum teleportation, with which I assume you are familiar.
Here let us note that teleportation works exactly as you would expect for registers of arbitrary
dimension n, by using the generalized Pauli operators and generalized Bell basis. So, just in case
you were wondering if teleportation works for 8,675,309-dimensional registers, the answer is yes.
Suppose that X1 , X2 , and X3 are registers whose associated complex Euclidean spaces X1 , X2 ,
and X3 are isomorphic copies of X = CZn . Assume that the pair of registers (X2 , X3 ) is initialized
to a maximally entangled pure state
(1/√n) vec(1_X) = (1/√n) ∑_{a∈Z_n} e_a ⊗ e_a,
while X1 has some reduced state ρ. It is our assumption that one party, Alice, possesses X1 and
X2 , while a second party, Bob, possesses X3 . The goal is for Alice to teleport the state of X1 to X3 by
making use of the above shared state of (X2 , X3 ) along with classical communication. The registers
X1 and X2 are destroyed in the process.
Suppose that Alice measures the pair (X1 , X2 ) with respect to the generalized Bell basis
{ (1/√n) vec(X^a Z^b) : (a, b) ∈ Z_n^2 }.
We wish to describe the reduced state of X3 conditioned on the outcome ( a, b). Let us first note
that
(vec(1_X)^* ⊗ 1_X)(u ⊗ vec(1_X)) = ∑_{a,b∈Z_n} (e_a^* ⊗ e_a^* ⊗ 1_X)(u ⊗ e_b ⊗ e_b) = ∑_{a∈Z_n} ⟨e_a, u⟩ e_a = u

for every vector u ∈ X.
Now, by including the normalization factors, we see that Alice’s measurement results in each outcome (a, b) ∈ Z_n^2 with probability

(1/n^2) Tr( (X^a Z^b)^* ρ (X^a Z^b) ) = 1/n^2,

and conditioned on the outcome (a, b), the reduced state of X3 becomes

(X^a Z^b)^* ρ (X^a Z^b).
Alice may therefore transmit the measurement outcome ( a, b) to Bob, who then applies the unitary
operation X a Z b to X3 to recover the original reduced state ρ of X1 .
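The outcome probabilities and Bob’s correction step can be sanity-checked directly from the formulas above. The sketch below (illustrative only; n = 3 and the random seed are arbitrary) verifies, for a random density operator ρ, that each outcome occurs with probability exactly 1/n² and that applying X^a Z^b recovers ρ.

```python
import numpy as np

n = 3
omega = np.exp(2j * np.pi / n)
X = np.roll(np.eye(n), 1, axis=0).astype(complex)   # X e_a = e_{a+1 (mod n)}
Z = np.diag(omega ** np.arange(n))

# A random density operator rho on C^{Z_n}.
rng = np.random.default_rng(7)
G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = G @ G.conj().T
rho /= np.trace(rho).real

mp = np.linalg.matrix_power
for a in range(n):
    for b in range(n):
        U = mp(X, a) @ mp(Z, b)
        # Outcome probability (1/n^2) Tr(U^* rho U) = 1/n^2 for every (a, b).
        p = np.trace(U.conj().T @ rho @ U).real / n**2
        assert abs(p - 1 / n**2) < 1e-12
        # Bob's correction: applying U to the conditional state U^* rho U recovers rho.
        recovered = U @ (U.conj().T @ rho @ U) @ U.conj().T
        assert np.allclose(recovered, rho)
```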
Notice that the above process does more than set the reduced state of X3 to ρ. In fact, it
functions as a quantum channel that effectively acts as the identity super-operator mapping L (X1 )
to L (X3 ). In other words, if Alice’s register X1 was initially entangled or correlated with some
other register Y, the register X3 will have exactly the same relationship to Y after teleportation.
This must be so because (i) the above process can be described as an admissible super-operator of
the form
Φ ∈ T (X1 , X3 ) ,
and (ii) this super-operator Φ is in perfect agreement with the identity super-operator for all
choices of ρ ∈ D (X1 ). (Super-operators that agree precisely on all density operator inputs must
of course be equal as super-operators.)
Lecture 7
Entropy and compression
For the next several lectures we will be discussing the von Neumann entropy and various concepts
relating to it. This lecture is intended to introduce the notion of entropy and its connection to
compression.
Before we discuss the von Neumann entropy, we will take a few moments to discuss the Shannon
entropy. This is a purely classical notion, but it is appropriate to start here. The Shannon entropy of
a probability vector p ∈ R Σ is defined as follows:
H(p) = − ∑_{a∈Σ, p(a)>0} p(a) log(p(a)).
Here, and always in this course, the base of the logarithm is 2. (We will write ln(α) if we wish
to refer to the natural logarithm of a real number α.) It is typical to express the Shannon entropy
slightly more concisely as
H(p) = − ∑_{a∈Σ} p(a) log(p(a)),
which is meaningful if we make the interpretation 0 log(0) = 0. This is sensible given that
lim_{α↓0} α log(α) = 0.
There is no reason why we cannot extend the definition of the Shannon entropy to arbitrary vectors
with nonnegative entries if it is useful to do this—but mostly we will focus on probability vectors.
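As a quick illustration (not part of the notes), the definition, including the 0 log(0) = 0 convention and the base-2 logarithm, translates directly into code:

```python
import numpy as np

def shannon_entropy(p):
    """H(p) = -sum_a p(a) log2 p(a), with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                    # drop zero entries: 0 log 0 = 0
    return float(-np.sum(nz * np.log2(nz)))

assert shannon_entropy([0.5, 0.5]) == 1.0        # one fair coin: 1 bit
assert shannon_entropy([1.0, 0.0]) == 0.0        # deterministic: no uncertainty
assert abs(shannon_entropy([0.25] * 4) - 2.0) < 1e-12
```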
There are standard ways to interpret the Shannon entropy. For instance, the quantity H ( p) can
be viewed as a measure of the amount of uncertainty in a random experiment described by the
probability vector p, or as a measure of the amount of information one gains by learning the value
of such an experiment. Indeed, it is possible to start with simple axioms for what a measure of
uncertainty or information should satisfy, and to derive from these axioms that such a measure
must be equivalent to the Shannon entropy.
Something to keep in mind, however, when using these interpretations as a guide, is that the
Shannon entropy is usually only a meaningful measure of uncertainty in an asymptotic sense—as
the number of experiments becomes large. When a small number of samples from some experi-
ment is considered, the Shannon entropy may not conform to your intuition about uncertainty, as
the following example is meant to demonstrate.
Example 7.1. Let Σ = {0, 1, . . . , 2^{m^2}}, and define a probability vector p ∈ R^Σ as follows:

p(a) = 1 − 1/m   if a = 0,   and   p(a) = 2^{−m^2}/m   if 1 ≤ a ≤ 2^{m^2}.
Then H ( p) > m, and yet the outcome 0 appears with probability 1 − 1/m. So, as m grows, we be-
come more and more “certain” that the outcome will be 0, and yet the “uncertainty” (as measured
by the entropy) goes to infinity.
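The entropy of this distribution can be computed analytically rather than by enumerating the 2^{m²}+1 outcomes. The sketch below (an illustration, assuming the reconstruction of the example given above) confirms that H(p) exceeds m even as the probability of outcome 0 approaches 1.

```python
import numpy as np

def example_entropy(m):
    """H(p) for p(0) = 1 - 1/m and p(a) = 2^{-m^2}/m on 2^{m^2} tail outcomes,
    computed in closed form."""
    q = 1 - 1 / m
    head = -q * np.log2(q)
    # Each of the 2^{m^2} tail outcomes contributes -(2^{-m^2}/m) log2(2^{-m^2}/m);
    # summed, this gives (m^2 + log2(m)) / m.
    tail = (1 / m) * (m**2 + np.log2(m))
    return head + tail

for m in [2, 5, 10, 100]:
    assert example_entropy(m) > m   # entropy exceeds m, yet Pr[outcome 0] = 1 - 1/m
```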
The above example does not, of course, represent a paradox. The issue is simply that the Shannon entropy can only be interpreted as measuring uncertainty when the number of random experiments grows and the probability vector remains fixed, whereas in the example the probability vector changes with m.
Let us now focus on an important use of the Shannon entropy, which involves the notion of a
compression scheme. This will allow us to attach a concrete meaning to the Shannon entropy.
Let Σ be a finite, non-empty set and let p ∈ R^Σ be a probability vector. For a positive integer n, a real number α > 0, and δ ∈ (0, 1), a pair of functions f : Σ^n → {0,1}^{⌊αn⌋} and g : {0,1}^{⌊αn⌋} → Σ^n is said to be an (n, α, δ)-compression scheme for p if

Pr[ g(f(a_1 ⋯ a_n)) = a_1 ⋯ a_n ] > 1 − δ,    (7.1)
where the probability is over random choices of a1 , . . . , an ∈ Σ, each chosen independently accord-
ing to the probability vector p.
To understand what a compression scheme means at an intuitive level, let us imagine the fol-
lowing situation between two people: Alice and Bob. Alice has a device of some sort with a button
on it, and when she presses the button she gets an element of Σ, distributed according to p, inde-
pendent of any prior outputs of the device. She presses the button n times, obtaining outcomes
a1 · · · an , and she wants to communicate these outcomes to Bob using as few bits of communica-
tion as possible. So, what Alice does is to compress a1 · · · an into a string of ⌊αn⌋ bits by computing
f ( a1 · · · an ). She sends the resulting bitstring f ( a1 · · · an ) to Bob, who then decompresses by ap-
plying g, therefore obtaining g( f ( a1 · · · an )). Naturally they hope that g( f ( a1 · · · an )) = a1 · · · an ,
which means that Bob will have obtained the correct sequence a1 · · · an .
The quantity δ is a bound on the probability that the compression scheme makes an error. We say that the pair (f, g) works correctly for a string a_1 ⋯ a_n ∈ Σ^n if g(f(a_1 ⋯ a_n)) = a_1 ⋯ a_n, so the condition (7.1) is equivalent to the condition that the pair (f, g) works correctly with high probability (assuming δ is small).
Theorem 7.2 (Shannon’s Source Coding Theorem). Let Σ be a finite, non-empty set, let p ∈ R Σ be a
probability vector, let α > 0, and let δ ∈ (0, 1). Then the following statements hold.
1. If α > H ( p), then there exists an (n, α, δ)-compression scheme for p for all but finitely many choices of
n ≥ 1.
2. If α < H ( p), then there exists an (n, α, δ)-compression scheme for p for at most finitely many choices
of n ≥ 1.
It is not a mistake, by the way, that both statements hold for any fixed choice of δ ∈ (0, 1), re-
gardless of whether it is close to 0 or 1 for instance. This will make perfect sense when we see the
proof.
It should be mentioned that the above statement of Shannon’s Source Coding Theorem is spe-
cific to the somewhat simplified (fixed-length) notion of compression that we have defined. It is
more common, in fact, to consider variable-length compressions and to state Shannon’s Source
Coding Theorem in terms of the average length of compressed strings. The reason why we restrict
our attention to fixed-length compression schemes is that this sort of scheme will be more natural
when we turn to the quantum setting.
For a probability vector p ∈ R^Σ and ε > 0, let us say that a string a_1 ⋯ a_n ∈ Σ^n is ε-typical (with respect to p) if

2^{−n(H(p)+ε)} < p(a_1) ⋯ p(a_n) < 2^{−n(H(p)−ε)},

and let T_{n,ε}(p) ⊆ Σ^n denote the set of all ε-typical strings of length n. When the probability vector p is understood from context we write T_{n,ε} rather than T_{n,ε}(p).
The following lemma establishes that a random selection of a string a1 · · · an is very likely to
be ε-typical as n gets large.
Lemma 7.3. Let p ∈ R^Σ be a probability vector and let ε > 0. Then

lim_{n→∞} ∑_{a_1⋯a_n ∈ T_{n,ε}(p)} p(a_1) ⋯ p(a_n) = 1.
Proof. Let Y_1, . . . , Y_n be independent and identically distributed random variables defined as follows: we choose a ∈ Σ randomly according to the probability vector p, and then let the output value be the real number −log(p(a)) for whichever value of a was selected. It holds that the expected value of each Y_j is

E[Y_j] = − ∑_{a∈Σ} p(a) log(p(a)) = H(p).

The condition that a_1 ⋯ a_n is ε-typical is precisely the condition that |(Y_1 + ⋯ + Y_n)/n − H(p)| < ε for the corresponding choices of Y_1, . . . , Y_n, so the lemma follows from the weak law of large numbers.
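The convergence in Lemma 7.3 is visible even in small experiments. The Monte Carlo sketch below (illustrative only; the distribution, seed, and parameters are arbitrary choices) estimates the probability that a random string is ε-typical for small and large n.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.6, 0.3, 0.1])
H = float(-np.sum(p * np.log2(p)))
eps = 0.1

def typical_fraction(n, trials=2000):
    """Estimate Pr[a_1...a_n is eps-typical] by sampling i.i.d. strings from p."""
    samples = rng.choice(len(p), size=(trials, n), p=p)
    # -log2(p(a_1)...p(a_n)) / n for each sampled string:
    surprisal = -np.log2(p[samples]).sum(axis=1) / n
    return float(np.mean(np.abs(surprisal - H) < eps))

f_small, f_large = typical_fraction(10), typical_fraction(2000)
assert f_large > f_small   # typicality becomes overwhelmingly likely as n grows
assert f_large > 0.9
```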
Based on the previous lemma, it is straightforward to place upper and lower bounds on the
number of ε-typical strings, as shown in the following lemma.
Lemma 7.4. Let p ∈ R^Σ be a probability vector and let ε be a positive real number. Then for all but finitely many positive integers n it holds that

(1 − ε) 2^{n(H(p)−ε)} < |T_{n,ε}(p)| < 2^{n(H(p)+ε)}.

Proof. The upper bound holds for all n. Specifically, by the definition of ε-typical, we have

1 ≥ ∑_{a_1⋯a_n ∈ T_{n,ε}} p(a_1) ⋯ p(a_n) > |T_{n,ε}| 2^{−n(H(p)+ε)},

and therefore |T_{n,ε}| < 2^{n(H(p)+ε)}. For the lower bound, choose n_0 so that

∑_{a_1⋯a_n ∈ T_{n,ε}} p(a_1) ⋯ p(a_n) > 1 − ε

for all n ≥ n_0, which is possible by Lemma 7.3. Then for all n ≥ n_0 we have

1 − ε < ∑_{a_1⋯a_n ∈ T_{n,ε}} p(a_1) ⋯ p(a_n) < |T_{n,ε}| 2^{−n(H(p)−ε)},

and therefore |T_{n,ε}| > (1 − ε) 2^{n(H(p)−ε)}, which completes the proof.
Proof of Theorem 7.2. First assume that α > H(p), and choose ε > 0 so that α > H(p) + 2ε. For every choice of n > 1/ε we therefore have that

⌊αn⌋ > αn − 1 > n(H(p) + 2ε) − εn = n(H(p) + ε).

Now, because

|T_{n,ε}| < 2^{n(H(p)+ε)} < 2^{⌊αn⌋},

we may define a function f : Σ^n → {0,1}^{⌊αn⌋} that is 1-to-1 when restricted to T_{n,ε}, and we may define g : {0,1}^{⌊αn⌋} → Σ^n appropriately so that g(f(a_1 ⋯ a_n)) = a_1 ⋯ a_n for every a_1 ⋯ a_n ∈ T_{n,ε}.
As

Pr[ g(f(a_1 ⋯ a_n)) = a_1 ⋯ a_n ] ≥ Pr[ a_1 ⋯ a_n ∈ T_{n,ε} ] = ∑_{a_1⋯a_n ∈ T_{n,ε}} p(a_1) ⋯ p(a_n),

and this quantity is greater than 1 − δ for all sufficiently large n by Lemma 7.3, the pair (f, g) is an (n, α, δ)-compression scheme for all but finitely many n.
Now assume that α < H(p), and choose ε > 0 so that α < H(p) − 2ε. Any function g : {0,1}^{⌊αn⌋} → Σ^n takes at most 2^{⌊αn⌋} distinct values, so any (n, α, δ)-compression scheme (f, g) can work correctly on at most 2^{⌊αn⌋} strings a_1 ⋯ a_n. Let us suppose such a scheme is given for each n, and let G_n ⊆ Σ^n be the collection of strings on which the appropriate scheme works correctly, so that |G_n| ≤ 2^{⌊αn⌋}. If we can show that

lim_{n→∞} Pr[ a_1 ⋯ a_n ∈ G_n ] = 0,

then it follows that the schemes fail to be (n, α, δ)-compression schemes for all but finitely many n. Each string in G_n either is ε-typical or is not. Every ε-typical string has probability smaller than 2^{−n(H(p)−ε)}, and therefore

Pr[ a_1 ⋯ a_n ∈ G_n ∩ T_{n,ε} ] < 2^{⌊αn⌋} 2^{−n(H(p)−ε)} ≤ 2^{−εn},

which approaches 0 as n → ∞. As

lim_{n→∞} Pr[ a_1 ⋯ a_n ∉ T_{n,ε} ] = 0

by Lemma 7.3, we conclude that lim_{n→∞} Pr[a_1 ⋯ a_n ∈ G_n] = 0, as required.
Next we will discuss the von Neumann entropy, which may be viewed as a quantum information-
theoretic analogue of the Shannon entropy. We will spend the next few lectures after this one
discussing the properties of the von Neumann entropy as well as some of its uses—but for now
let us just focus on the definition.
Let X be a complex Euclidean space, let n = dim(X ), and let ρ ∈ D (X ) be a density operator.
The von Neumann entropy of ρ is defined as
S(ρ) = H (λ(ρ)),
where λ(ρ) = (λ_1(ρ), . . . , λ_n(ρ)) is the vector of eigenvalues of ρ. An equivalent expression is

S(ρ) = −Tr(ρ log(ρ)),

where log(ρ) is the Hermitian operator that has exactly the same eigenvectors as ρ, and we take the
base 2 logarithm of the corresponding eigenvalues. Technically speaking, log(ρ) is only defined
for ρ positive definite, but ρ log(ρ) may be defined for all positive semidefinite ρ by interpreting
0 log(0) as 0, just like in the definition of the Shannon entropy.
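The definition S(ρ) = H(λ(ρ)) translates directly into a few lines of numpy; the following sketch (illustrative, not part of the notes) checks the two extreme cases: a pure state and the maximally mixed state.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = H(lambda(rho)): Shannon entropy of the eigenvalue vector."""
    eigs = np.linalg.eigvalsh(rho)
    eigs = eigs[eigs > 1e-12]         # convention: 0 log 0 = 0
    return float(-np.sum(eigs * np.log2(eigs)))

n = 4
pure = np.zeros((n, n))
pure[0, 0] = 1.0                      # a pure state: entropy 0
assert abs(von_neumann_entropy(pure)) < 1e-9
# The maximally mixed state on C^4 has entropy log2(4) = 2.
assert abs(von_neumann_entropy(np.eye(n) / n) - 2.0) < 1e-9
```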
There are some ways in which the von Neumann entropy is similar to the Shannon entropy and
some ways in which it is very different. One way in which they are quite similar is in their rela-
tionships to notions of compression.
Suppose that X_1, . . . , X_n are registers, each having the same associated complex Euclidean space X, and assume that the reduced state of (X_1, . . . , X_n) is ρ^{⊗n} for some ρ ∈ D(X). For m = ⌊αn⌋, let Y_1, . . . , Y_m be qubit registers. A quantum compression scheme consists of a compression super-operator and a decompression super-operator,

Φ ∈ T(X_1 ⊗ · · · ⊗ X_n, Y_1 ⊗ · · · ⊗ Y_m),
Ψ ∈ T(Y_1 ⊗ · · · ⊗ Y_m, X_1 ⊗ · · · ⊗ X_n),

both of which are admissible.
Now, we need to be careful about how we measure the accuracy of quantum compression
schemes. Our assumption on the reduced state of X1 , X2 , . . . , Xn does not rule out the existence of
other registers that they may be entangled or otherwise correlated with—so let us imagine that
there exists another register Z, and that the initial state of (X1 , X2 , . . . , Xn , Z) is
ξ ∈ D (X1 ⊗ · · · ⊗ Xn ⊗ Z) .
When Alice compresses and Bob decompresses X1 , . . . , Xn , the resulting state of (X1 , X2 , . . . , Xn , Z)
is given by
(ΨΦ ⊗ 1_{L(Z)})(ξ).
For the compression to be successful, we require that this density operator is close to ξ. This must
in fact hold for all choices of Z and ξ, provided of course that the assumption TrZ (ξ ) = ρ⊗n is
met. There is nothing unreasonable about this assumption—it is the natural quantum analogue to
requiring that g( f ( a1 · · · an )) = a1 · · · an for classical compression.
It might seem complicated that we have to worry about all possible spaces Z and all ξ ∈
D (X1 ⊗ · · · ⊗ Xn ⊗ Z) that satisfy TrZ (ξ ) = ρ⊗n , but in fact it will be simple if we make use of the
notion of channel fidelity.
Suppose that W and Z are complex Euclidean spaces, that Ξ ∈ T(W) is an admissible super-operator, and that σ ∈ D(W) is a density operator. For a purification u ∈ W ⊗ Z of σ, the fidelity

F( (Ξ ⊗ 1_{L(Z)})(u u^*), u u^* )

represents one way of measuring the quality with which Ξ acts trivially on the state u u^*.
Now, any purification u ∈ W ⊗ Z of σ must take the form

u = vec(√σ B)

for some operator B satisfying B B^* = Π_{im(σ)}, the projection onto the image of σ. Writing

Ξ(X) = ∑_{j=1}^k A_j X A_j^*

for a Kraus representation of Ξ, a direct calculation shows that the fidelity above is equal to

√( ∑_{j=1}^k |⟨σ, A_j⟩|^2 ).

It is not surprising that this quantity is independent of the particular purification of σ that was chosen. We define the quantity to be the channel fidelity of Ξ with σ:

F_channel(Ξ, σ) = √( ∑_{j=1}^k |⟨σ, A_j⟩|^2 ).
The channel fidelity Fchannel (Ξ, σ) places a lower bound on the fidelity of the input and output
of a given channel Ξ provided that it acts on a part of a larger system whose reduced state is σ.
Although the above discussion only considered pure states of a larger system, it is clear from
Uhlmann’s Theorem that the minimal input/output fidelity is achieved on a pure state.
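The channel fidelity formula is a one-liner given a Kraus representation. The following sketch (illustrative; the chosen channels are assumptions, not part of the notes) checks two easy cases: the identity channel has channel fidelity 1 with every state, and the completely depolarizing channel on C^d has channel fidelity 1/d with the maximally mixed state.

```python
import numpy as np

def channel_fidelity(kraus, sigma):
    """F_channel(Xi, sigma) = sqrt(sum_j |<sigma, A_j>|^2), with
    <sigma, A> = Tr(sigma^* A) the Hilbert-Schmidt inner product."""
    return float(np.sqrt(sum(abs(np.trace(sigma.conj().T @ A))**2
                             for A in kraus)))

d = 3
sigma = np.eye(d) / d

# The identity channel acts perfectly on any state.
assert abs(channel_fidelity([np.eye(d)], sigma) - 1.0) < 1e-12

# Completely depolarizing channel: Kraus operators E_ab / sqrt(d).
dep = [np.outer(np.eye(d)[:, a], np.eye(d)[b, :]) / np.sqrt(d)
       for a in range(d) for b in range(d)]
assert abs(channel_fidelity(dep, sigma) - 1 / d) < 1e-12
```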
With this definition in hand, quantum compression schemes may be defined precisely. For m = ⌊αn⌋, a pair of admissible super-operators

Φ ∈ T(X_1 ⊗ · · · ⊗ X_n, Y_1 ⊗ · · · ⊗ Y_m),
Ψ ∈ T(Y_1 ⊗ · · · ⊗ Y_m, X_1 ⊗ · · · ⊗ X_n)

is said to be an (n, α, δ)-quantum compression scheme for ρ if F_channel(ΨΦ, ρ^{⊗n}) > 1 − δ.
The following theorem, which is the quantum analogue to Shannon’s Source Coding Theorem,
establishes conditions on α for which quantum compression is possible and impossible.
Theorem 7.5 (Schumacher). Let ρ ∈ D (X ) be a density operator, let α > 0 and let δ ∈ (0, 1). Then the
following statements hold.
1. If α > S(ρ), then there exists an (n, α, δ)-quantum compression scheme for ρ for all but finitely many
choices of n ≥ 1.
2. If α < S(ρ), then there exists an (n, α, δ)-quantum compression scheme for ρ for at most finitely many
choices of n ≥ 1.
Proof. Assume first that α > S(ρ). We begin by defining a quantum analogue of the set of typical
strings, which is the typical subspace. This notion is based on a spectral decomposition
ρ = ∑_{a∈Σ} p(a) u_a u_a^*.
As p is a probability vector, we may consider for each n ≥ 1 the set of ε-typical strings Tn,ε ⊆ Σn
for this distribution. In particular, we form the projection onto the typical subspace:

Π_{n,ε} = ∑_{a_1⋯a_n ∈ T_{n,ε}} u_{a_1} u_{a_1}^* ⊗ · · · ⊗ u_{a_n} u_{a_n}^*.
Notice that

⟨Π_{n,ε}, ρ^{⊗n}⟩ = ∑_{a_1⋯a_n ∈ T_{n,ε}} p(a_1) ⋯ p(a_n),

and therefore

lim_{n→∞} ⟨Π_{n,ε}, ρ^{⊗n}⟩ = 1

by Lemma 7.3.
As in the proof of Theorem 7.2, choose ε > 0 small enough that α > S(ρ) + 2ε, along with functions f : Σ^n → {0,1}^{⌊αn⌋} and g : {0,1}^{⌊αn⌋} → Σ^n (for each sufficiently large n) such that f is 1-to-1 on T_{n,ε} and

g(f(a_1 ⋯ a_n)) = a_1 ⋯ a_n

for every a_1 ⋯ a_n ∈ T_{n,ε}. Identifying Y_1 ⊗ · · · ⊗ Y_m with C^{{0,1}^m} for m = ⌊αn⌋, define an operator

A ∈ L(X_1 ⊗ · · · ⊗ X_n, Y_1 ⊗ · · · ⊗ Y_m)

as

A = ∑_{a_1⋯a_n ∈ T_{n,ε}} e_{f(a_1⋯a_n)} u_{a_1⋯a_n}^*,
where we have used the shorthand u a1 ··· an = u a1 ⊗ · · · ⊗ u an for each a1 · · · an ∈ Σn . Notice that
A∗ A = Πn,ε .
Now, the super-operator defined by X ↦ A X A^* is completely positive but generally not admissible, as it need not preserve trace. However, it is sub-admissible, by which it is meant that there must exist a completely positive super-operator Ξ for which

Φ(X) = A X A^* + Ξ(X)    (7.2)

is admissible; for instance, one may take

Ξ(X) = ⟨1 − Π_{n,ε}, X⟩ σ

for any fixed density operator σ. Let Φ be the compression super-operator obtained in this way, and let the decompression super-operator Ψ be defined analogously, as Ψ(Y) = A^* Y A + Ξ′(Y) for a suitable completely positive Ξ′. The composition ΨΦ then has a Kraus representation whose operators include A^* A = Π_{n,ε} along with some collection of operators B_1, . . . , B_k that we do not really care about. It follows that

F_channel(ΨΦ, ρ^{⊗n}) ≥ |⟨ρ^{⊗n}, Π_{n,ε}⟩| = ⟨Π_{n,ε}, ρ^{⊗n}⟩.
This quantity approaches 1 in the limit, as we have observed, and therefore for sufficiently large n
it must hold that (Φ, Ψ) is an (n, α, δ) quantum compression scheme.
Now consider the case where α < S(ρ), and choose ε > 0 so that α < S(ρ) − ε. Note that if Π_n ∈ Pos(X_1 ⊗ · · · ⊗ X_n) is a projection with rank at most 2^{n(S(ρ)−ε)} for each n ≥ 1, then

lim_{n→∞} ⟨Π_n, ρ^{⊗n}⟩ = 0.    (7.4)
This is because, for any positive semidefinite operator P, the maximum value of hΠ, Pi over all
choices of orthogonal projections Π with rank(Π) ≤ r is precisely the sum of the r largest eigen-
values of P. The eigenvalues of ρ⊗n are the values p( a1 ) · · · p( an ) over all choices of a1 · · · an ∈ Σn ,
so for each n we have
⟨Π_n, ρ^{⊗n}⟩ ≤ ∑_{a_1⋯a_n ∈ G_n} p(a_1) ⋯ p(a_n)
for some set G_n of size at most 2^{n(S(ρ)−ε)}. At this point the equation (7.4) follows by reasoning similar to the proof of Theorem 7.2.
Now let us suppose, for each n ≥ 1 and for m = ⌊αn⌋, that
Φn ∈ T (X1 ⊗ · · · ⊗ Xn , Y1 ⊗ · · · ⊗ Ym ) ,
Ψn ∈ T (Y1 ⊗ · · · ⊗ Ym , X1 ⊗ · · · ⊗ Xn )
are admissible super-operators. Our goal is to prove that (Φn , Ψn ) fails as a quantum compression
scheme for all sufficiently large values of n.
Write

Φ_n(X) = ∑_{j=1}^k A_j X A_j^*   and   Ψ_n(X) = ∑_{j=1}^k B_j X B_j^*,
where
A1 , . . . , Ak ∈ L (X1 ⊗ · · · ⊗ Xn , Y1 ⊗ · · · ⊗ Ym ) ,
B1 , . . . , Bk ∈ L (Y1 ⊗ · · · ⊗ Ym , X1 ⊗ · · · ⊗ Xn ) ,
and where the assumption that they have the same number of terms is easily made without loss
of generality. Let Π_j be the projection onto the range of B_j for each j = 1, . . . , k, and note that it
obviously holds that
rank(Π j ) ≤ dim(Y1 ⊗ · · · ⊗ Ym ) = 2⌊αn⌋ .
By the Cauchy-Schwarz inequality, we have

F_channel(Ψ_n Φ_n, ρ^{⊗n})^2 = ∑_{i,j} |⟨ρ^{⊗n}, B_j A_i⟩|^2
= ∑_{i,j} |⟨Π_j √(ρ^{⊗n}), B_j A_i √(ρ^{⊗n})⟩|^2
≤ ∑_{i,j} ⟨Π_j, ρ^{⊗n}⟩ Tr( B_j A_i ρ^{⊗n} A_i^* B_j^* ).

As

Tr( B_j A_i ρ^{⊗n} A_i^* B_j^* ) ≥ 0

for every choice of i and j, and these quantities sum to Tr( (Ψ_n Φ_n)(ρ^{⊗n}) ) = 1, it follows that F_channel(Ψ_n Φ_n, ρ^{⊗n})^2 is bounded above by a convex combination of the quantities

⟨Π_j, ρ^{⊗n}⟩,   j = 1, . . . , k.
As each Π_j has rank at most 2^{⌊αn⌋} ≤ 2^{n(S(ρ)−ε)}, it follows from (7.4) that

lim_{n→∞} F_channel(Ψ_n Φ_n, ρ^{⊗n}) = 0.

So, for all but finitely many choices of n, the pair (Φ_n, Ψ_n) fails to be an (n, α, δ)-quantum compression scheme.
Lecture 8
Properties of the von Neumann entropy
In the previous lecture we defined the Shannon and von Neumann entropy functions, and estab-
lished the fundamental connection between these functions and the notion of compression. In this
lecture and the next we will look more closely at the von Neumann entropy in order to establish
some basic properties of this function, as well as an important related function called the quantum
relative entropy.
The first property we will establish about the von Neumann entropy is that it is continuous every-
where on its domain.
Define a function η : [0, ∞) → R as η(λ) = −λ ln(λ) for λ > 0 and η(0) = 0. (Figure 8.1 shows a plot of η, which increases from the point (0, 0) to a maximum at (1/e, 1/e), then decreases to (1, 0).)
The fact that η is continuous on [0, ∞) implies that for every finite, nonempty set Σ the Shannon
entropy is continuous at every point of [0, ∞)^Σ, as

H(p) = (1/ln(2)) ∑_{a∈Σ} η(p(a)).
(Figure 8.2 shows a plot of the derivative η′(λ) = −(1 + ln(λ)), which decreases monotonically and crosses zero at the point (1/e, 0).)
We are usually only interested in H ( p) for probability vectors p, but of course the function is
defined on vectors having nonnegative real entries.
The continuity of the von Neumann entropy relies on the following fact: for Hermitian operators A, B ∈ Herm(X), where n = dim(X), it holds that

‖λ(A) − λ(B)‖₁ ≤ ‖A − B‖₁.

To prove this, note first that there exist positive semidefinite operators P, Q ∈ Pos(X) with the following two properties:
1. PQ = 0.
2. P − Q = A − B.
(An expression of a given Hermitian operator as P − Q for such a choice of P and Q is called a Jordan decomposition of that operator.) Notice that ‖A − B‖₁ = Tr(P) + Tr(Q).
Now, define one more Hermitian operator

X = P + B = Q + A.
Given that X ≥ A and X ≥ B, it holds that λ_j(X) ≥ max{λ_j(A), λ_j(B)}, and hence |λ_j(A) − λ_j(B)| ≤ 2λ_j(X) − λ_j(A) − λ_j(B), for 1 ≤ j ≤ n. Thus,

‖λ(A) − λ(B)‖₁ = ∑_{j=1}^n |λ_j(A) − λ_j(B)| ≤ Tr(2X − A − B) = Tr(P + Q) = ‖A − B‖₁

as required.
With the above fact in hand, it is immediate from the expression S( P) = H (λ( P)) that the
von Neumann entropy is continuous (as it is a composition of two continuous functions).
Theorem 8.3. For every complex Euclidean space X , the von Neumann entropy S( P) is continuous at
every point P ∈ Pos (X ).
Lemma 8.4. Suppose α and β are real numbers satisfying 0 ≤ α ≤ β ≤ 1 and β − α ≤ 1/2. Then
| η ( β) − η (α)| ≤ η ( β − α).
Proof. Consider the function η ′ (λ) = −(1 + ln(λ)), which is plotted in Figure 8.2. Given that η ′ is
monotonically decreasing on its domain (0, ∞), it holds that the function
f(λ) = ∫_λ^{λ+γ} η′(t) dt = η(λ + γ) − η(λ)
is monotonically non-increasing for any choice of γ ≥ 0. This means that the maximum value of
|f(λ)| over the range λ ∈ [0, 1 − γ] must occur at either λ = 0 or λ = 1 − γ, and so for λ in this
range we have
| η (λ + γ) − η (λ)| ≤ max{η (γ), η (1 − γ)}.
Here we have used the fact that η (1) = 0 and η (λ) ≥ 0 for λ ∈ [0, 1].
To complete the proof it suffices to prove that η (γ) ≥ η (1 − γ) for γ ∈ [0, 1/2]. This claim
is certainly supported by the plot in Figure 8.1, but we can easily prove it analytically. Define a
function g(λ) = η (λ) − η (1 − λ). We see that g happens to have zeroes at λ = 0 and λ = 1/2, and
were there an additional zero λ of g in the range (0, 1/2), then we would have two distinct values
δ1 , δ2 ∈ (0, 1/2) for which g′ (δ1 ) = g′ (δ2 ) = 0 by the Mean Value Theorem. This, however, is
in contradiction with the fact that the second derivative g″(λ) = 1/(1 − λ) − 1/λ of g is strictly negative
in the range (0, 1/2). As g(1/4) > 0, for instance, we have that g(λ) ≥ 0 for λ ∈ [0, 1/2] as
required.
Theorem 8.5 (Fannes Inequality). Let X be a complex Euclidean space and let n = dim(X). Then for all density operators ρ, ξ ∈ D(X) such that ‖ρ − ξ‖₁ ≤ 1/e it holds that

|S(ρ) − S(ξ)| ≤ log(n) ‖ρ − ξ‖₁ + (1/ln(2)) η(‖ρ − ξ‖₁).
Lecture 8. Properties of the von Neumann entropy 69
Proof. Define

ε_i = |λ_i(ρ) − λ_i(ξ)|

and let ε = ε_1 + · · · + ε_n. Note that ε_i ≤ ‖ρ − ξ‖₁ ≤ 1/e < 1/2 for each i, and therefore

|S(ρ) − S(ξ)| = (1/ln(2)) | ∑_{i=1}^n ( η(λ_i(ρ)) − η(λ_i(ξ)) ) | ≤ (1/ln(2)) ∑_{i=1}^n η(ε_i)

by Lemma 8.4.
For any positive α and β we have β η(α/β) = η(α) + α ln(β), so

(1/ln(2)) ∑_{i=1}^n η(ε_i) = (1/ln(2)) ∑_{i=1}^n ( ε η(ε_i/ε) − ε_i ln(ε) ) = (ε/ln(2)) ∑_{i=1}^n η(ε_i/ε) + (1/ln(2)) η(ε) ≤ ε log(n) + (1/ln(2)) η(ε),

where the final inequality holds because (ε_1/ε, . . . , ε_n/ε) is a probability vector, so that ∑_{i=1}^n η(ε_i/ε) ≤ ln(n).
We have that ε ≤ ‖ρ − ξ‖₁, and that η is monotone increasing on the interval [0, 1/e], so

|S(ρ) − S(ξ)| ≤ log(n) ‖ρ − ξ‖₁ + (1/ln(2)) η(‖ρ − ξ‖₁),
which completes the proof.
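The Fannes bound is easy to test empirically. The sketch below (an illustration, with arbitrary dimension and seed, not part of the notes) draws random density operators, perturbs them slightly so that the trace distance stays below 1/e, and checks the inequality.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy via eigenvalues, with 0 log 0 = 0."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-15]
    return float(-np.sum(w * np.log2(w)))

def eta(x):
    return -x * np.log(x) if x > 0 else 0.0

rng = np.random.default_rng(1)
n = 4
for _ in range(50):
    # A random density operator, and a nearby one with ||rho - xi||_1 <= 0.1 < 1/e.
    G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = G @ G.conj().T
    rho /= np.trace(rho).real
    xi = 0.95 * rho + 0.05 * np.eye(n) / n

    t = float(np.sum(np.abs(np.linalg.eigvalsh(rho - xi))))  # trace norm of rho - xi
    bound = np.log2(n) * t + eta(t) / np.log(2)
    assert abs(entropy(rho) - entropy(xi)) <= bound + 1e-9
```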
Next we will introduce a new function, which is indispensable as a tool for studying the von Neu-
mann entropy: the quantum relative entropy (or relative von Neumann entropy).
For two positive definite operators P, Q ∈ Pd (X ) we define the quantum relative entropy of
P with Q as follows:
S(P‖Q) = Tr(P log(P)) − Tr(P log(Q)).    (8.1)
We usually only care about the quantum relative entropy for density operators, but there is noth-
ing that prevents us from allowing the definition to hold for all positive definite operators.
We may also define the quantum relative entropy for positive semidefinite operators that are
not positive definite, provided we are willing to have an extended real-valued function. Specifi-
cally, when
ker(Q) is not orthogonal to im(P)

(or, equivalently, when im(P) ⊈ im(Q)), we define S(P‖Q) = ∞. Otherwise, there is no difficulty
in evaluating the above expression (8.1) by following the usual convention of setting 0 log(0) = 0.
Nevertheless, it will typically not be necessary for us to give up the convenience of restricting our
attention to positive definite operators. This is because we already know that the von Neumann
entropy function is continuous, and we will mostly use the relative von Neumann entropy in this
course to establish facts about the von Neumann entropy.
The quantum relative entropy S( PkQ) can be negative for some choices of P and Q, but not
when they are density operators (or more generally when Tr( P) = Tr( Q)). The following theorem
establishes that this is so, and in fact that the value of the quantum relative entropy of two density
operators is zero if and only if they are equal.
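The definition, including the support condition for the extended-real-valued case, can be implemented in a few lines; the sketch below is illustrative and the tolerance handling is an assumption of this example, not part of the notes.

```python
import numpy as np

def relative_entropy(P, Q, tol=1e-12):
    """S(P||Q) = Tr(P log P) - Tr(P log Q), with S(P||Q) = +infinity when
    im(P) is not contained in im(Q)."""
    wp, Vp = np.linalg.eigh(P)
    wq, Vq = np.linalg.eigh(Q)
    # Support condition: ker(Q) must be orthogonal to im(P).
    ker_q = Vq[:, wq <= tol]
    if ker_q.size and np.linalg.norm(ker_q.conj().T @ P @ ker_q) > tol:
        return np.inf
    term1 = sum(w * np.log2(w) for w in wp if w > tol)       # Tr(P log P)
    log_wq = np.where(wq > tol, np.log2(np.maximum(wq, tol)), 0.0)
    logQ = (Vq * log_wq) @ Vq.conj().T                        # log(Q) on im(Q)
    term2 = np.trace(P @ logQ).real                           # Tr(P log Q)
    return float(term1 - term2)

rho = np.diag([0.7, 0.3])
xi = np.diag([0.5, 0.5])
assert relative_entropy(rho, rho) < 1e-9     # zero iff the operators are equal
assert relative_entropy(rho, xi) > 0         # nonnegative for density operators
# Support violation gives +infinity.
assert relative_entropy(np.diag([1.0, 0.0]), np.diag([0.0, 1.0])) == np.inf
```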
Theorem 8.6. Let X be a complex Euclidean space and let ρ, ξ ∈ D(X) be density operators. Then

S(ρ‖ξ) ≥ (1/(2 ln 2)) ‖ρ − ξ‖₂².

In particular, S(ρ‖ξ) ≥ 0, with equality if and only if ρ = ξ.
Proof. Let us first note that for every choice of α, β ∈ (0, 1), Taylor’s theorem applied to the function η implies that

α ln(α) − α ln(β) = α − β − (1/2) η″(γ)(α − β)²

for some γ lying between α and β. Let

ρ = ∑_{i=1}^n p_i x_i x_i^*   and   ξ = ∑_{j=1}^n q_j y_j y_j^*

be spectral decompositions of ρ and ξ. The assumption that ρ and ξ are positive definite density operators implies that p_i and q_i are positive for 1 ≤ i ≤ n. Applying the facts observed above, we have that

S(ρ‖ξ) = (1/ln 2) ∑_{1≤i,j≤n} |⟨x_i, y_j⟩|² ( p_i ln(p_i) − p_i ln(q_j) )
= (1/ln 2) ∑_{1≤i,j≤n} |⟨x_i, y_j⟩|² ( p_i − q_j − (1/2) η″(γ_{ij})(p_i − q_j)² )
= −(1/(2 ln 2)) ∑_{1≤i,j≤n} |⟨x_i, y_j⟩|² η″(γ_{ij})(p_i − q_j)²

for some choice of real numbers {γ_{ij}}, where each γ_{ij} lies between p_i and q_j. (The last equality uses ∑_{1≤i,j≤n} |⟨x_i, y_j⟩|² (p_i − q_j) = Tr(ρ) − Tr(ξ) = 0.) In particular, this means that 0 < γ_{ij} ≤ 1, implying that −η″(γ_{ij}) = 1/γ_{ij} ≥ 1, for each choice of i and j. Consequently we have

S(ρ‖ξ) ≥ (1/(2 ln 2)) ∑_{1≤i,j≤n} |⟨x_i, y_j⟩|² (p_i − q_j)² = (1/(2 ln 2)) ‖ρ − ξ‖₂²

as required.
The following corollary represents a simple application of this fact. (We could just as easily
prove it using analogous facts about the Shannon entropy, but the proof is essentially the same.)
Corollary 8.7. Let X be a complex Euclidean space and let n = dim(X ). Then 0 ≤ S(ρ) ≤ log(n) for all
ρ ∈ D (X ). Furthermore, ρ = 1/n is the unique density operator in D (X ) having von Neumann entropy
equal to log(n).
Proof. It is obvious that 0 ≤ S(ρ) for every density operator ρ. To prove the upper bound, let us first assume ρ is positive definite, and consider the relative entropy S(ρ‖1/n). We have

0 ≤ S(ρ‖1/n) = Tr(ρ log(ρ)) − Tr(ρ log(1/n)) = −S(ρ) + log(n).

Therefore S(ρ) ≤ log(n), and when ρ is not equal to 1/n the inequality is strict, as S(ρ‖1/n) > 0 in that case by Theorem 8.6. For ρ that are not positive definite, the result follows from continuity.
Now let us prove two simple properties of the von Neumann entropy: subadditivity and concavity.
These properties also hold for the Shannon entropy—and while it is not difficult to prove them
directly for the Shannon entropy, we get the properties for free once they are established for the
von Neumann entropy.
When we refer to the von Neumann entropy of some collection of registers, we mean the
von Neumann entropy of the reduced state of those registers at some instant. For example, if X
and Y are registers and ρ ∈ D(X ⊗ Y) is the reduced state of the pair (X, Y) at some instant, then we write

S(X, Y) = S(ρ),   S(X) = S(ρ_X),   and   S(Y) = S(ρ_Y),

where, in accordance with conventions we have already discussed, we have written ρ_X = Tr_Y(ρ) and ρ_Y = Tr_X(ρ). We often state properties of the von Neumann entropy in terms of registers,
with the understanding that whatever statement is being discussed holds for all or some speci-
fied subset of the possible states of these registers. A similar convention is used for the Shannon
entropy (for classical registers).
Theorem 8.8 (Subadditivity of von Neumann entropy). Let X and Y be quantum registers. Then for every state of the pair (X, Y) we have

S(X, Y) ≤ S(X) + S(Y).
Proof. Assume that the reduced state of the pair (X, Y ) is ρ ∈ D (X ⊗ Y ). We will prove the
theorem for ρ positive definite, from which the general case follows by continuity.
Consider the relative entropy S(ρXY ‖ ρX ⊗ ρY). Using the formula
log(ρX ⊗ ρY) = log(ρX) ⊗ 1 + 1 ⊗ log(ρY),
we find that
S(ρXY ‖ ρX ⊗ ρY) = −S(ρXY) + S(ρX) + S(ρY).
By Theorem 8.6 we have S(ρXY kρX ⊗ ρY ) ≥ 0, which completes the proof.
In the next lecture we will prove a much stronger version of subadditivity, which is aptly
named: strong subadditivity. It will imply the truth of the previous theorem, but it is instructive to
compare the very easy proof above with the much more difficult proof of strong subadditivity.
Subadditivity also holds for the Shannon entropy:
H (X, Y ) ≤ H (X) + H (Y )
for any choice of classical registers X and Y. This is simply a special case of the above theorem,
where the density operator ρ is diagonal with respect to the standard basis of X ⊗ Y .
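Subadditivity is also easy to spot-check numerically. In this sketch (the helper names and the `einsum` partial-trace patterns are our own choices, not from the notes) a random density operator on C² ⊗ C³ is generated and the inequality S(ρXY) ≤ S(ρX) + S(ρY) is verified.

```python
import numpy as np

def vn_entropy(rho):
    """von Neumann entropy in bits."""
    evs = np.linalg.eigvalsh(rho)
    evs = evs[evs > 1e-12]
    return float(-np.sum(evs * np.log2(evs)))

rng = np.random.default_rng(1)
dx, dy = 2, 3
G = rng.normal(size=(dx*dy, dx*dy)) + 1j*rng.normal(size=(dx*dy, dx*dy))
rho = G @ G.conj().T
rho /= np.trace(rho).real

R = rho.reshape(dx, dy, dx, dy)     # indices (x, y, x', y')
rho_x = np.einsum('ajbj->ab', R)    # Tr_Y
rho_y = np.einsum('jajb->ab', R)    # Tr_X

# subadditivity: S(rho) <= S(rho_x) + S(rho_y)
assert vn_entropy(rho) <= vn_entropy(rho_x) + vn_entropy(rho_y) + 1e-9
```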
Subadditivity implies that the von Neumann entropy is concave, as is established by the proof
of the following theorem.
Theorem 8.9 (Concavity of von Neumann entropy). Let ρ, ξ ∈ D(X) and λ ∈ [0, 1]. Then
S(λρ + (1 − λ)ξ) ≥ λ S(ρ) + (1 − λ) S(ξ).
Proof. Let Y be a register corresponding to a single qubit, so that its associated space is Y = C {0,1} .
Consider the density operator
σ = λ ρ ⊗ E0,0 + (1 − λ) ξ ⊗ E1,1 ∈ D(X ⊗ Y),
and suppose that the reduced state of the registers (X, Y) is described by σ. We have
S(X, Y) = H(λ, 1 − λ) + λ S(ρ) + (1 − λ) S(ξ), S(X) = S(λρ + (1 − λ)ξ), and S(Y) = H(λ, 1 − λ),
so the subadditivity inequality S(X, Y) ≤ S(X) + S(Y) yields λ S(ρ) + (1 − λ) S(ξ) ≤ S(λρ + (1 − λ)ξ).
Concavity also holds for the Shannon entropy as a simple consequence of this theorem, as we
may take ρ and ξ to be diagonal with respect to the standard basis.
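The register construction in the proof can also be checked directly. The sketch below (helper names are our own) builds σ = λρ ⊗ E0,0 + (1 − λ)ξ ⊗ E1,1, confirms the block-diagonal entropy formula S(σ) = H(λ, 1 − λ) + λS(ρ) + (1 − λ)S(ξ), and confirms the concavity inequality itself.

```python
import numpy as np

def vn_entropy(rho):
    evs = np.linalg.eigvalsh(rho)
    evs = evs[evs > 1e-12]
    return float(-np.sum(evs * np.log2(evs)))

def rand_density(n, rng):
    G = rng.normal(size=(n, n)) + 1j*rng.normal(size=(n, n))
    P = G @ G.conj().T
    return P / np.trace(P).real

rng = np.random.default_rng(2)
n, lam = 3, 0.3
rho, xi = rand_density(n, rng), rand_density(n, rng)

E00, E11 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
sigma = lam*np.kron(rho, E00) + (1-lam)*np.kron(xi, E11)

h = -lam*np.log2(lam) - (1-lam)*np.log2(1-lam)   # binary entropy H(lam)
# block structure: S(sigma) = H(lam) + lam S(rho) + (1-lam) S(xi)
assert abs(vn_entropy(sigma) - (h + lam*vn_entropy(rho) + (1-lam)*vn_entropy(xi))) < 1e-8
# concavity, as implied by subadditivity applied to sigma:
assert lam*vn_entropy(rho) + (1-lam)*vn_entropy(xi) <= vn_entropy(lam*rho + (1-lam)*xi) + 1e-9
```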
Let us finish off the lecture by defining a few more quantities associated with the von Neumann
entropy. We will not be able to say very much about these quantities until after we prove strong
subadditivity in the next lecture.
Classically we define the conditional Shannon entropy as follows for two classical registers X
and Y:
H (X|Y ) = ∑ Pr[Y = a] H (X|Y = a).
a
This quantity represents the expected value of the entropy of X given that you know the value
of Y. It is not hard to prove that
H(X|Y) = H(X, Y) − H(Y).
The quantum conditional entropy is defined analogously for registers X and Y:
S(X|Y) = S(X, Y) − S(Y).
Now we start to see some strangeness: we can have S(Y) > S(X, Y), as we will if (X, Y) is in a
pure, non-product state. This means that S(X|Y) can be negative, but such is life.
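A concrete instance of this strangeness: for the pure state (|00⟩ + |11⟩)/√2 of a pair of qubits we have S(X, Y) = 0 while S(Y) = 1, so S(X|Y) = −1. A quick numerical confirmation (helper names are our own):

```python
import numpy as np

def vn_entropy(rho):
    evs = np.linalg.eigvalsh(rho)
    evs = evs[evs > 1e-12]
    return float(-np.sum(evs * np.log2(evs)))

phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)
rho = np.outer(phi, phi.conj())
rho_y = np.einsum('jajb->ab', rho.reshape(2, 2, 2, 2))      # Tr_X

cond = vn_entropy(rho) - vn_entropy(rho_y)   # S(X|Y) = S(X,Y) - S(Y)
assert abs(cond - (-1.0)) < 1e-9             # negative conditional entropy
```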
Next, the (classical) mutual information between two classical registers X and Y is defined as
I(X : Y) = H(X) + H(Y) − H(X, Y).
Equivalently,
I(X : Y) = H(Y) − H(Y|X) = H(X) − H(X|Y).
We view this quantity as representing the amount of information in X about Y and vice versa. The
quantum mutual information is defined similarly:
S(X : Y) = S(X) + S(Y) − S(X, Y).
At least we know from subadditivity that this quantity is always nonnegative. We will, however,
need to further develop our understanding before we can safely associate any intuition with this
quantity.
Theory of Quantum Information Lecture Notes
John Watrous (Institute for Quantum Computing, University of Waterloo)
Lecture 9
Strong subadditivity of the von Neumann entropy
In this lecture we will prove a fundamental fact about the von Neumann entropy, known as strong
subadditivity. Let us begin with a precise statement of this fact.
Theorem 9.1 (Strong subadditivity of von Neumann entropy). Let X, Y, and Z be quantum registers.
Then for every reduced state ρ ∈ D(X ⊗ Y ⊗ Z) of these registers we have
S(X, Y, Z) + S(Z) ≤ S(X, Z) + S(Y, Z).
There are multiple known ways to prove this theorem. The approach we will take is to first estab-
lish a property of the quantum relative entropy, known as joint convexity. Once we establish this
property, it will be straightforward to prove strong subadditivity.
We will now prove that the quantum relative entropy is jointly convex, as is stated by the following
theorem.
Theorem 9.2 (Joint convexity of the quantum relative entropy). Let X be a complex Euclidean space,
let ρ1, ρ2, ξ1, ξ2 ∈ D(X) be positive definite density operators, and let λ ∈ [0, 1]. Then
S(λρ1 + (1 − λ)ρ2 ‖ λξ1 + (1 − λ)ξ2) ≤ λ S(ρ1‖ξ1) + (1 − λ) S(ρ2‖ξ2).
The proof of Theorem 9.2 that we will study is fairly standard and has the nice property of being
elementary. It is, however, relatively complicated, so we will need to break it up into a few pieces.
The first step is to consider, for any choice of positive definite density operators ρ, ξ ∈ D (X ),
a real-valued function f_{ρ,ξ} : R → R defined as
f_{ρ,ξ}(α) = Tr[ξ^α ρ^{1−α}]
for all α ∈ R. As ρ and ξ are both positive definite, we have that the function f_{ρ,ξ} is well defined,
and is in fact differentiable (and therefore continuous) everywhere on its domain. In particular,
we have
f′_{ρ,ξ}(α) = Tr[ξ^α ρ^{1−α} (ln(ξ) − ln(ρ))].   (9.1)
To verify that this expression is correct, we may consider spectral decompositions
ρ = ∑_{i=1}^n p_i x_i x_i*   and   ξ = ∑_{i=1}^n q_i y_i y_i*.
We have
Tr[ξ^α ρ^{1−α}] = ∑_{1≤i,j≤n} q_j^α p_i^{1−α} |⟨x_i, y_j⟩|²
and so
f′_{ρ,ξ}(α) = ∑_{1≤i,j≤n} (ln(q_j) − ln(p_i)) q_j^α p_i^{1−α} |⟨x_i, y_j⟩|² = Tr[ξ^α ρ^{1−α} (ln(ξ) − ln(ρ))]
as claimed.
The main reason we are interested in the function f_{ρ,ξ} is that its derivative has an interesting
value at 0:
f′_{ρ,ξ}(0) = −(1/log(e)) S(ρ‖ξ).
We may therefore write
S(ρ‖ξ) = −log(e) f′_{ρ,ξ}(0) = −log(e) lim_{α↓0} (Tr[ξ^α ρ^{1−α}] − 1)/α,
where the second equality follows by substituting f_{ρ,ξ}(0) = 1 into the definition of the derivative.
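This limit formula can be tested numerically by comparing a small-α difference quotient of f_{ρ,ξ} against a direct computation of S(ρ‖ξ). In the sketch below (helper names are ours; the density operators are kept well-conditioned so that matrix logarithms and powers are safe) the two agree up to the discretization error.

```python
import numpy as np

def mpow(H, t):
    """Power of a positive definite Hermitian matrix via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * w**t) @ V.conj().T

def rel_entropy(rho, xi):
    """S(rho||xi) = Tr rho (log2 rho - log2 xi) for positive definite inputs."""
    def mlog2(H):
        w, V = np.linalg.eigh(H)
        return (V * np.log2(w)) @ V.conj().T
    return float(np.trace(rho @ (mlog2(rho) - mlog2(xi))).real)

def rand_density(n, rng):
    G = rng.normal(size=(n, n)) + 1j*rng.normal(size=(n, n))
    P = G @ G.conj().T + 0.1*np.eye(n)   # bounded away from singular
    return P / np.trace(P).real

rng = np.random.default_rng(3)
rho, xi = rand_density(3, rng), rand_density(3, rng)

alpha = 1e-6
f = float(np.trace(mpow(xi, alpha) @ mpow(rho, 1 - alpha)).real)
approx = -np.log2(np.e) * (f - 1.0) / alpha   # -log(e) (f(alpha) - f(0))/alpha

assert abs(approx - rel_entropy(rho, xi)) < 1e-3
```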
Now consider the following lemma that concerns the relationship among the functions f ρ,ξ for
various choices of ρ and ξ.
Lemma 9.3. Let ξ1, ξ2, ρ1, ρ2 ∈ Pd(X) be positive definite operators. Then for every choice of α, λ ∈ [0, 1]
we have
Tr[(λξ1 + (1 − λ)ξ2)^α (λρ1 + (1 − λ)ρ2)^{1−α}] ≥ λ Tr[ξ1^α ρ1^{1−α}] + (1 − λ) Tr[ξ2^α ρ2^{1−α}].
(The lemma happens to be true for all positive definite operators ρ1 , ρ2 , ξ 1 , and ξ 2 , but we will
really only need it for density operators.)
Before proving this lemma, let us note that it implies Theorem 9.2. Writing ρ = λρ1 + (1 − λ)ρ2
and ξ = λξ1 + (1 − λ)ξ2, the limit formula above together with Lemma 9.3 gives
S(ρ‖ξ) = −log(e) lim_{α↓0} (Tr[ξ^α ρ^{1−α}] − 1)/α
≤ −log(e) lim_{α↓0} (λ Tr[ξ1^α ρ1^{1−α}] + (1 − λ) Tr[ξ2^α ρ2^{1−α}] − 1)/α
= λ S(ρ1‖ξ1) + (1 − λ) S(ρ2‖ξ2)
as required.
Our goal has therefore shifted to proving Lemma 9.3. To prove Lemma 9.3 we require an-
other fact that is stated in the theorem that follows. It is equivalent to a theorem known as Lieb’s
Concavity Theorem, although it is stated in a somewhat different form than is most typical.
Theorem 9.4. Let A1, A2 ∈ Pd(X) and B1, B2 ∈ Pd(Y) be positive definite operators. Then for every
choice of α ∈ [0, 1] we have
(A1 + A2)^α ⊗ (B1 + B2)^{1−α} ≥ A1^α ⊗ B1^{1−α} + A2^α ⊗ B2^{1−α}.
Once again, before proving this theorem, let us note that it implies the main result we are
working toward.
Proof of Lemma 9.3 (assuming Theorem 9.4). The substitutions
A1 = λξ1, B1 = λρ1^T, A2 = (1 − λ)ξ2, B2 = (1 − λ)ρ2^T,
taken in Theorem 9.4 imply the operator inequality
(λξ1 + (1 − λ)ξ2)^α ⊗ (λρ1^T + (1 − λ)ρ2^T)^{1−α}
≥ λ ξ1^α ⊗ (ρ1^T)^{1−α} + (1 − λ) ξ2^α ⊗ (ρ2^T)^{1−α}
= λ ξ1^α ⊗ (ρ1^{1−α})^T + (1 − λ) ξ2^α ⊗ (ρ2^{1−α})^T.
The identity vec(1)* (X ⊗ Y^T) vec(1) = Tr(XY) then gives the desired result.
Now, toward the proof of Theorem 9.4, we require the following lemma.
Lemma 9.5. Let P1 , P2 , Q1 , Q2 , R1 , R2 ∈ Pd (X ) be positive definite operators that satisfy these conditions:
1. [ P1 , P2 ] = [Q1 , Q2 ] = [ R1 , R2 ] = 0,
2. P12 ≥ Q21 + R21 , and
3. P22 ≥ Q22 + R22 .
Then it holds that P1 P2 ≥ Q1 Q2 + R1 R2 .
Remark. Notice that in the conclusion of the lemma, P1 P2 is positive definite given the assumption
that [ P1 , P2 ] = 0, and likewise for Q1 Q2 and R1 R2 .
Proof. The conclusion of the lemma is equivalent to X ≤ 1 for
X = P1^{−1/2} P2^{−1/2} (Q1 Q2 + R1 R2) P2^{−1/2} P1^{−1/2}.
This in turn is equivalent to the condition that every eigenvalue of X is at most 1. To this end, let
us suppose that u is any eigenvector of X whose corresponding eigenvalue is λ.
As P1 and P2 are invertible and u is nonzero, we have that P1^{−1/2} P2^{1/2} u is nonzero as well, and
therefore we may define a unit vector v as follows:
v = P1^{−1/2} P2^{1/2} u / ‖P1^{−1/2} P2^{1/2} u‖.
We now have
v* P1^{−1} (Q1 Q2 + R1 R2) P2^{−1} v = λ.
Finally, using the fact that v is a unit vector, we can establish the required bound on λ as
follows:
λ = v* P1^{−1} (Q1 Q2 + R1 R2) P2^{−1} v
≤ |v* P1^{−1} Q1 Q2 P2^{−1} v| + |v* P1^{−1} R1 R2 P2^{−1} v|
≤ √(v* P1^{−1} Q1² P1^{−1} v) √(v* P2^{−1} Q2² P2^{−1} v) + √(v* P1^{−1} R1² P1^{−1} v) √(v* P2^{−1} R2² P2^{−1} v)
≤ √(v* P1^{−1} (Q1² + R1²) P1^{−1} v) √(v* P2^{−1} (Q2² + R2²) P2^{−1} v)
≤ 1.
Here we have used the triangle inequality once and the Cauchy-Schwarz inequality twice, along
with the given assumptions on the operators.
Finally, we can finish off the proof of Theorem 9.2 by proving Theorem 9.4.
Proof of Theorem 9.4. Define a continuous function f : [0, 1] → Herm(X ⊗ Y) as
f(α) = (A1 + A2)^α ⊗ (B1 + B2)^{1−α} − A1^α ⊗ B1^{1−α} − A2^α ⊗ B2^{1−α},
and let K = {α ∈ [0, 1] : f(α) ∈ Pos(X ⊗ Y)} be the pre-image under f of the set Pos(X ⊗ Y).
Notice that K is a closed set, given that f is continuous and Pos(X ⊗ Y) is closed. Our goal is to
prove that K = [0, 1].
It is obvious that 0 and 1 are elements of K. For an arbitrary choice of α, β ∈ K, consider the
following operators:
P1 = (A1 + A2)^{α/2} ⊗ (B1 + B2)^{(1−α)/2},
P2 = (A1 + A2)^{β/2} ⊗ (B1 + B2)^{(1−β)/2},
Q1 = A1^{α/2} ⊗ B1^{(1−α)/2},
Q2 = A1^{β/2} ⊗ B1^{(1−β)/2},
R1 = A2^{α/2} ⊗ B2^{(1−α)/2},
R2 = A2^{β/2} ⊗ B2^{(1−β)/2}.
These operators satisfy the hypotheses of Lemma 9.5: the commutation conditions are immediate,
while P1² ≥ Q1² + R1² and P2² ≥ Q2² + R2² are precisely the statements α ∈ K and β ∈ K. The
lemma therefore yields P1 P2 ≥ Q1 Q2 + R1 R2, which is the statement (α + β)/2 ∈ K. Thus K is
closed under taking midpoints; as K is also a closed set containing 0 and 1, it follows that
K = [0, 1], which completes the proof.
We have worked hard to prove that the quantum relative entropy is jointly convex, and now it is
time to reap the rewards. Let us begin by proving the following simple theorem, which states that
mixed unitary super-operators cannot increase the relative entropy of two density operators.
Theorem 9.6. Let X be a complex Euclidean space and let Φ ∈ T (X ) be a mixed unitary super-operator.
Then for any choice of positive definite density operators ρ, ξ ∈ D (X ) we have
S(Φ(ρ)kΦ(ξ )) ≤ S(ρkξ ).
Proof. Write Φ(A) = ∑_j p_j U_j A U_j* for unitary operators U_j and a probability vector p. The
quantum relative entropy is invariant under unitary conjugations, meaning S(UρU*‖UξU*) = S(ρ‖ξ)
for every unitary U, so joint convexity (Theorem 9.2, extended by induction to convex combinations
of more than two terms) yields
S(Φ(ρ)‖Φ(ξ)) ≤ ∑_j p_j S(U_j ρ U_j* ‖ U_j ξ U_j*) = S(ρ‖ξ).
Notice that for any choice of positive definite density operators ρ1 , ρ2 , ξ 1 , ξ 2 ∈ D (X ) we have
S ( ρ1 ⊗ ρ2 k ξ 1 ⊗ ξ 2 ) = S ( ρ1 k ξ 1 ) + S ( ρ2 k ξ 2 ) .
This fact follows easily from the identity log( P ⊗ Q) = log( P) ⊗ 1 + 1 ⊗ log( Q), which is valid
for all P, Q ∈ Pd (X ). Combining this observation with the previous theorem yields the following
corollary.
Corollary 9.7 (Monotonicity of the quantum relative entropy). Let X and Y be complex Euclidean
spaces. Then for any choice of positive definite density operators ρ, ξ ∈ D(X ⊗ Y) we have
S(TrY(ρ) ‖ TrY(ξ)) ≤ S(ρ‖ξ).
Proof. Let Ω ∈ T(Y) be the completely depolarizing super-operator, which is mixed unitary and
satisfies
(1_{L(X)} ⊗ Ω)(σ) = TrY(σ) ⊗ 1_Y/m
for every σ ∈ D(X ⊗ Y), where m = dim(Y). Therefore we have
S(TrY(ρ) ‖ TrY(ξ)) = S(TrY(ρ) ⊗ 1_Y/m ‖ TrY(ξ) ⊗ 1_Y/m)
= S((1_{L(X)} ⊗ Ω)(ρ) ‖ (1_{L(X)} ⊗ Ω)(ξ))
≤ S(ρ‖ξ)
as required.
Finally we are prepared to prove strong subadditivity, which turns out to be very easy now
that we have established that the quantum relative entropy is monotonic.
Proof of Theorem 9.1. We are to prove that the inequality
S(ρXYZ) + S(ρZ) ≤ S(ρXZ) + S(ρYZ)
holds for all choices of ρ ∈ D(X ⊗ Y ⊗ Z). It suffices to prove this inequality for all positive
definite ρ, as it then follows for arbitrary density operators ρ by the continuity of the von Neumann
entropy.
Let n = dim(X), and observe that the following two identities hold: the first is
S(ρXYZ ‖ (1_X/n) ⊗ ρYZ) = −S(ρXYZ) + S(ρYZ) + log(n),
and the second is
S(ρXZ ‖ (1_X/n) ⊗ ρZ) = −S(ρXZ) + S(ρZ) + log(n).
By the monotonicity of the quantum relative entropy (Corollary 9.7, applied by tracing out the
register Y), we have
S(ρXZ ‖ (1_X/n) ⊗ ρZ) ≤ S(ρXYZ ‖ (1_X/n) ⊗ ρYZ),
and therefore
S(ρXYZ) + S(ρZ) ≤ S(ρXZ) + S(ρYZ)
as required.
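Strong subadditivity can be spot-checked on random states of three qubits; in the sketch below the `einsum` index patterns implementing the partial traces are our own choices.

```python
import numpy as np

def vn_entropy(rho):
    evs = np.linalg.eigvalsh(rho)
    evs = evs[evs > 1e-12]
    return float(-np.sum(evs * np.log2(evs)))

rng = np.random.default_rng(4)
d = 8                                  # three qubits: X, Y, Z
G = rng.normal(size=(d, d)) + 1j*rng.normal(size=(d, d))
rho = G @ G.conj().T
rho /= np.trace(rho).real
R = rho.reshape(2, 2, 2, 2, 2, 2)      # indices (x, y, z, x', y', z')

S_xyz = vn_entropy(rho)
S_z   = vn_entropy(np.einsum('abiabj->ij', R))                # Tr_{XY}
S_xz  = vn_entropy(np.einsum('ajkbjn->akbn', R).reshape(4, 4)) # Tr_Y
S_yz  = vn_entropy(np.einsum('ijkimn->jkmn', R).reshape(4, 4)) # Tr_X

# strong subadditivity: S(XYZ) + S(Z) <= S(XZ) + S(YZ)
assert S_xyz + S_z <= S_xz + S_yz + 1e-9
```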
To conclude the lecture, let us prove a statement about quantum mutual information that is
equivalent to strong subadditivity.
Corollary 9.8. Let X, Y, and Z be registers. Then for every reduced state ρ ∈ D(X ⊗ Y ⊗ Z) of these
registers we have
S(X : Y) ≤ S(X : Y, Z).
Proof. Strong subadditivity, with the roles of the registers Y and Z exchanged, states that
S(X, Y, Z) + S(Y) ≤ S(X, Y) + S(Y, Z),
which is equivalent to
S(Y) − S(X, Y) ≤ S(Y, Z) − S(X, Y, Z).
Adding S(X) to both sides gives the desired inequality.
Lecture 10
Holevo’s Theorem and Nayak’s Bound
In this lecture we will prove Holevo’s Theorem. This is a famous theorem in quantum information
theory, which is often informally summarized by a statement along the lines of this:
It is not possible to communicate more than n classical bits of information by the transmission
of n qubits alone.
Although this is an implication of Holevo's Theorem, the theorem itself says more, and is stated
in more precise terms that bear little resemblance to the above statement. After stating and proving
Holevo’s Theorem, we will discuss an interesting application of this theorem to the notion of a
quantum random access code. In particular, we will prove Nayak’s Bound, which demonstrates that
quantum random access codes are surprisingly limited in power.
We will first discuss some of the concepts that Holevo’s Theorem concerns, and then state and
prove the theorem itself. Although the theorem is difficult to prove from first principles, it turns
out that there is a very simple proof that makes use of the strong subadditivity of the von Neu-
mann entropy. Having proved strong subadditivity in the previous lecture, we will naturally opt
for this simple proof.
Recall that the (classical) mutual information between two classical registers A and B is defined
as
I(A : B) = H(A) + H(B) − H(A, B).
The usual interpretation of this quantity is that it describes how many bits of information about B
are, on average, revealed by the value of A; or equivalently, given that the quantity is symmetric in
A and B, how many bits of information about A are revealed by the value of B. Like all quantities
involving the Shannon entropy, this interpretation should be understood in an asymptotic sense.
To illustrate the intuition behind this interpretation, let us suppose that A and B are distributed
in some particular way, and Alice looks at the value of A. As Bob does not know what value Alice
sees, he has H(A) bits of uncertainty about her value. After sampling B, Bob's average uncertainty
about Alice's value becomes H(A|B), which is always at most H(A) and is strictly smaller when A
and B are correlated. Therefore, by sampling B, Bob has decreased his uncertainty about Alice's
value by I(A : B) bits.
In analogy to the above formula we have defined the quantum mutual information between two
registers X and Y as
S(X : Y) = S(X) + S(Y) − S(X, Y).
Although Holevo’s Theorem does not directly concern the quantum mutual information, it is nev-
ertheless related and indirectly appears in the proof.
Suppose that Alice prepares a register X in one of the states {ρa : a ∈ Σ}, choosing ρa with
probability p(a) and recording her choice in a classical register A, and that Bob measures X with
respect to a measurement {Pb : b ∈ Γ}, storing the outcome in a classical register B. Then
Pr[(A, B) = ( a, b)] = p( a) h Pb , ρa i
for each ( a, b) ∈ Σ × Γ. The amount of information that Bob learns about Alice’s information by
means of this process is given by the mutual information I (A : B).
The accessible information is the maximum value of I (A : B) that can be achieved by Bob, over
all possible measurements. More precisely, the accessible information of the ensemble
E = {( p( a), ρa ) : a ∈ Σ}
is defined as the maximum of I (A : B) over all possible choices of the set Γ and the measurement
{ Pb : b ∈ Γ}, assuming that the pair (A, B) is distributed as described above for this choice of
a measurement. (Given that there is no upper bound on the size of Bob’s outcome set Γ, it is
not obvious that the accessible information of a given ensemble E is always achievable for some
fixed choice of a measurement { Pb : b ∈ Γ}. It turns out, however, that there is always an
achievable maximum value for I (A : B) that Bob reaches when his set of outcomes Γ has size at
most dim(X )2 .) We will write Iacc (E ) to denote the accessible information of the ensemble E .
The Holevo χ-quantity of the ensemble E = {(p(a), ρa) : a ∈ Σ} is defined as
χ(E) = S(∑_{a∈Σ} p(a) ρa) − ∑_{a∈Σ} p(a) S(ρa).
Notice that the quantity χ(E) is always nonnegative, which follows from the concavity of the von
Neumann entropy.
One way to think about this quantity is as follows. Consider the situation above where Al-
ice has prepared the register X depending on the value of A, and Bob has received (but not yet
measured) the register X. From Bob’s point of view, the reduced state of X is therefore
ρ= ∑ p( a) ρ a .
a∈Σ
If, however, Bob were to learn that the value of A is a ∈ Σ, he would then describe the reduced
state of X as ρa . The quantity χ(E ) therefore represents the average decrease in the von Neumann
entropy of X that Bob would expect from learning the value of A.
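As an illustration, the following sketch (the ensemble is our own choice, not one from the notes) computes χ(E) for the two-state ensemble {(1/2, |0⟩⟨0|), (1/2, |+⟩⟨+|)} and confirms that χ(E) is nonnegative and at most one bit for a single-qubit ensemble.

```python
import numpy as np

def vn_entropy(rho):
    evs = np.linalg.eigvalsh(rho)
    evs = evs[evs > 1e-12]
    return float(-np.sum(evs * np.log2(evs)))

ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
states = [np.outer(ket0, ket0), np.outer(ketp, ketp)]
p = [0.5, 0.5]

avg = sum(pi * ri for pi, ri in zip(p, states))
chi = vn_entropy(avg) - sum(pi * vn_entropy(ri) for pi, ri in zip(p, states))

# chi is nonnegative and bounded by log2(dim) = 1 for a qubit
assert 0.0 < chi <= 1.0
```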
Another way to view the quantity χ(E ) is to consider the reduced state of the pair (A, X) in the
situation just considered, which is
ξ = ∑_{a∈Σ} p(a) E_{a,a} ⊗ ρa.
We have
S(A, X) = H(p) + ∑_{a∈Σ} p(a) S(ρa),
S(A) = H(p),
S(X) = S(∑_{a∈Σ} p(a) ρa),
and therefore S(A : X) = χ(E).
Now let V be an isometry implementing Bob's measurement, mapping X into X ⊗ Y ⊗ B for an
auxiliary register Y and a register B that stores the measurement outcome, and consider the state
σ = (1_A ⊗ V) ξ (1_A ⊗ V)*.
It is clear that
S(σ_A) = S(ξ_A) = H(p)
and
S(σ_{XYB}) = S(ξ_X) = S(∑_{a∈Σ} p(a) ρa),
given that applying an isometry cannot change the spectrum of the state of the registers it acts
upon.
and from this expression we see that the quantity S(A : B) is equal to the accessible information
Iacc (E ) for some choice of a measurement { Pb : b ∈ Γ}.
The theorem therefore follows from the inequality
S(A : B) ≤ S(A : X, Y, B),
which in turn follows from strong subadditivity (as shown at the end of the previous lecture).
As discussed at the beginning of the lecture, this theorem implies that Alice can communicate
no more than n classical bits of information to Bob by sending n qubits alone. If the register X
comprises n qubits, and therefore X has dimension 2^n, then for any ensemble
E = {(p(a), ρa) : a ∈ Σ}
we have
Iacc(E) ≤ χ(E) ≤ S(∑_{a∈Σ} p(a) ρa) ≤ log(2^n) = n.
This means that for any choice of the register A, the ensemble E , and the measurement that deter-
mines the value of a classical register B, we must have I (A : B) ≤ n. In other words, Bob can learn
no more than n bits of information by means of the process he and Alice have performed.
We will now consider a related, but nevertheless different setting from the one that Holevo’s The-
orem concerns. Suppose now that Alice has m bits, and she wants to encode them into fewer than
n qubits in such a way that Bob can recover not the entire string of bits, but rather any single bit (or
small number of bits) of his choice. Given that Bob will only learn a very small amount of infor-
mation by means of this process, the possibility that Alice could do this does not violate Holevo’s
Theorem in any obvious way.
This idea has been described as a “quantum phone book”. Imagine a very compact phone book
implemented using quantum information. The user measures the qubits forming the phone book
using a measurement that is unique to the individual whose number is being sought. The user’s
measurement destroys the phone book, so no more than 7 digits of information are effectively
transmitted by sending the phone book. Perhaps it is not unreasonable to hope that such a phone
book containing 100,000 numbers could be constructed using, say, 1,000 qubits?
Here is an example showing that something nontrivial along these lines can be realized. Sup-
pose Alice wants to encode 2 bits a, b ∈ {0, 1} into one qubit so that when she sends this qubit to
Bob, he can pick a two-outcome measurement giving him either a or b with reasonable probability.
Define
|ψ(θ )i = cos(θ ) |0i + sin(θ ) |1i
for θ ∈ [0, 2π ]. Alice encodes ab as follows:
00 → |ψ(π/8)i
10 → |ψ(3π/8)i
11 → |ψ(5π/8)i
01 → |ψ(7π/8)i
Alice sends the qubit to Bob. If Bob wants to decode a, he measures in the {|0⟩, |1⟩} basis, and if
he wants to decode b he measures in the {|+⟩, |−⟩} basis. A simple calculation shows that Bob
will correctly decode the bit he has chosen with probability cos²(π/8) ≈ 0.85. There does not exist
an analogous classical scheme that allows Bob to do better than randomly guessing for at least one
of his choices.
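The claimed success probability is easy to verify by direct computation over all four encodings and both decoding choices:

```python
import numpy as np

def psi(theta):
    return np.array([np.cos(theta), np.sin(theta)])

# Alice's encoding of the two bits ab into one qubit
encode = {'00': np.pi/8, '10': 3*np.pi/8, '11': 5*np.pi/8, '01': 7*np.pi/8}
ket0, ket1 = np.eye(2)
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
ketm = np.array([1.0, -1.0]) / np.sqrt(2)

target = np.cos(np.pi/8)**2      # ~ 0.8536
for bits, theta in encode.items():
    a, b = int(bits[0]), int(bits[1])
    v = psi(theta)
    # decode first bit: measure in {|0>, |1>}
    p_a = abs(np.dot([ket0, ket1][a], v))**2
    # decode second bit: measure in {|+>, |->}
    p_b = abs(np.dot([ketp, ketm][b], v))**2
    assert abs(p_a - target) < 1e-9 and abs(p_b - target) < 1e-9
```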
In general, an m ↦^p n quantum random access encoding is a collection of density operators
{ρ_{a1···am} : a1 · · · am ∈ {0, 1}^m} ⊂ D(C^{{0,1}^n})
such that the following holds. For each j ∈ {1, . . . , m} there exists a measurement
{P_0^j, P_1^j} ⊂ Pos(C^{{0,1}^n})
such that
⟨P_{aj}^j, ρ_{a1···am}⟩ ≥ p
for every j ∈ {1, . . . , m} and every choice of a1 · · · am ∈ {0, 1}^m. For instance, the example above
is a 2 ↦^{0.85} 1 quantum random access encoding.
The following special case of Fano's Inequality, where Σ = {0, 1} and A is uniformly distributed,
will be useful.
Corollary 10.4. Let A be a uniformly distributed Boolean register and let B be any Boolean register. Then
for q = Pr(A = B) we have I(A : B) ≥ 1 − H(q).
Theorem 10.5 (Nayak's Bound). For every m ↦^p n quantum random access encoding it holds that
n ≥ (1 − H(p)) m.
To prove this theorem, we will first require the following lemma, which is a consequence of
Holevo’s Theorem and Fano’s Inequality.
Lemma 10.6. Suppose ρ0, ρ1 ∈ D(X) are density operators, {Q0, Q1} ⊆ Pos(X) is a measurement, and
q ∈ [1/2, 1]. If it is the case that
⟨Qb, ρb⟩ ≥ q
for b ∈ {0, 1}, then
S((ρ0 + ρ1)/2) − (S(ρ0) + S(ρ1))/2 ≥ 1 − H(q).
Proof. Let A and B be classical Boolean registers, let p ∈ R^{0,1} be the uniform probability vector
(meaning p(0) = p(1) = 1/2), and assume that
Pr[(A, B) = (a, b)] = p(a) ⟨Qb, ρa⟩
for each a, b ∈ {0, 1}. Then Pr(A = B) = (⟨Q0, ρ0⟩ + ⟨Q1, ρ1⟩)/2 ≥ q ≥ 1/2, so Corollary 10.4
implies I(A : B) ≥ 1 − H(q), while Holevo's Theorem implies
I(A : B) ≤ S((ρ0 + ρ1)/2) − (S(ρ0) + S(ρ1))/2,
from which the lemma follows.
Proof. Consider an m ↦^p n quantum random access encoding, viewed as a function
E : a1 · · · am ↦ ρ_{a1···am}.
For 0 ≤ k ≤ m − 1 let
ρ_{a1···ak} = (1/2^{m−k}) ∑_{a_{k+1}···a_m ∈ {0,1}^{m−k}} ρ_{a1···am}
(so that the case k = 0 defines the overall average ρ). Lemma 10.6, applied to the states
ρ_{a1···a_{k−1}0} and ρ_{a1···a_{k−1}1} together with the measurement that decodes the k-th bit, implies
S(ρ_{a1···a_{k−1}}) ≥ (1/2)(S(ρ_{a1···a_{k−1}0}) + S(ρ_{a1···a_{k−1}1})) + (1 − H(p))
for 1 ≤ k ≤ m and all choices of a1 · · · a_{k−1}. By applying this inequality repeatedly, we conclude
that
m(1 − H(p)) ≤ S(ρ) ≤ n,
which completes the proof.
It can be shown that there exists a classical random access encoding m ↦^p n for any p > 1/2
provided that n ∈ (1 − H(p))m + O(log m). Thus, asymptotically speaking, there is no significant
advantage to be gained from quantum random access codes over classical random access codes.
Lecture 11
Majorization for real vectors and Hermitian operators
This lecture discusses the notion of majorization and some of its connections to quantum informa-
tion. We will see later (in Lecture 14) that majorization plays a critical role in Nielsen’s Theorem,
which precisely characterizes when it is possible for two parties to transform one pure state to
another by means of local operations and classical communication.
Let Σ be a finite, nonempty set, and for the sake of this discussion let us focus on the real vector
space R^Σ. An operator A ∈ L(R^Σ) acting on this vector space is said to be stochastic if its entries
are nonnegative and each of its columns sums to 1; it is doubly stochastic if, in addition, each of its
rows sums to 1. For each permutation π ∈ Sym(Σ), the permutation operator Pπ defined by
Pπ e_a = e_{π(a)} is doubly stochastic, as is any convex combination of permutation operators. The
Birkhoff–von Neumann Theorem states that the converse holds as well.
Theorem 11.1 (The Birkhoff–von Neumann Theorem). Let Σ be a finite, nonempty set and let
A ∈ L(R^Σ) be a linear operator on R^Σ. Then A is a doubly stochastic operator if and only if there exists a
probability vector p ∈ R^{Sym(Σ)} such that
A = ∑_{π∈Sym(Σ)} p(π) Pπ.
Proof. The Krein-Milman Theorem states that every compact, convex set is equal to the convex
hull of its extreme points. As the set of doubly stochastic operators is compact and convex, the
theorem will therefore follow if we prove that every extreme point in this set is a permutation
operator.
To this end, let us consider any doubly stochastic operator A that is not a permutation operator.
Our goal is to prove that A is not an extreme point in the set of doubly stochastic operators.
Given that A is doubly stochastic but not a permutation operator, there must exist at least one pair
( a1 , b1 ) ∈ Σ × Σ such that
A( a1 , b1 ) ∈ (0, 1).
As ∑b A( a1 , b) = 1 and A( a1 , b1 ) ∈ (0, 1), we conclude that there must exist b2 6= b1 such that
A( a1 , b2 ) ∈ (0, 1).
Applying similar reasoning, but to the first index rather than the second, there must exist a2 6= a1
such that
A( a2 , b2 ) ∈ (0, 1).
This argument may be repeated, alternating between the first and second indices (i.e., between rows
and columns if we view A as a matrix), until eventually a closed loop of even length is formed
that alternates between horizontal and vertical moves among the entries of A. This process is
illustrated in Figure 11.1, where the loop is indicated by the dotted lines.
Figure 11.1: An example of a closed loop consisting of entries of A that are contained in the interval
(0, 1).
Now, let ε ∈ (0, 1) be equal to the minimum value over the entries in the closed loop, and
define B to be the operator obtained by setting each entry in the closed loop to be ±ε, alternating
sign along the entries as suggested in Figure 11.2. All of the other entries in B are set to 0.
Finally, consider the operators A + B and A − B. As A is doubly stochastic and the row and
column sums of B are all 0, we have that both A + B and A − B also have row and column sums
equal to 1. As ε was chosen to be no larger than the smallest entry within the chosen closed loop,
Figure 11.2: The operator B. Its entries along the closed loop alternate between ε and −ε; all
other entries are 0.
none of the entries of A + B or A − B are negative, and therefore A + B and A − B are doubly
stochastic. As B is nonzero, the operators A + B and A − B are distinct. Thus,
A = (1/2)(A + B) + (1/2)(A − B)
is a proper convex combination of doubly stochastic operators, and is therefore not an extreme
point in the set of doubly stochastic operators. This is what we needed to prove, and so we are
done.
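The proof above is effectively an algorithm. The sketch below (a greedy variant, with all names our own; it uses SciPy's assignment solver to locate a permutation inside the support of the residual matrix) decomposes a doubly stochastic matrix into a convex combination of permutation matrices and checks the reconstruction.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff(A, tol=1e-9):
    """Greedy Birkhoff decomposition of a doubly stochastic matrix A
    into a list of (weight, permutation matrix) pairs."""
    A = A.copy()
    terms = []
    while A.max() > tol:
        # a perfect matching inside the support exists by the theorem;
        # maximize the number of strictly positive entries chosen
        rows, cols = linear_sum_assignment(-(A > tol).astype(float))
        eps = A[rows, cols].min()
        if eps <= 0:          # should not occur for doubly stochastic input
            break
        P = np.zeros_like(A)
        P[rows, cols] = 1.0
        terms.append((float(eps), P))
        A -= eps * P          # residual keeps equal row/column sums
    return terms

n = 4
rng = np.random.default_rng(5)
# build a random doubly stochastic matrix as a mix of permutations
perms = [np.eye(n)[rng.permutation(n)] for _ in range(6)]
w = rng.random(6); w /= w.sum()
A = sum(wi * Pi for wi, Pi in zip(w, perms))

terms = birkhoff(A)
recon = sum(e * P for e, P in terms)
assert np.allclose(recon, A, atol=1e-8)
assert abs(sum(e for e, _ in terms) - 1.0) < 1e-8
```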
We will now define what it means for one real vector to majorize another, and we will discuss one
alternate characterization of this notion. As usual we take Σ to be a finite, nonempty set, and as
in the previous section we will focus on the real vector space R Σ . The definition is as follows: for
u, v ∈ R Σ , we say that u majorizes v if there exists a doubly stochastic operator A such that v = Au.
We denote this relation as v ≺ u (or u ≻ v if it is convenient to switch the ordering).
By the Birkhoff–von Neumann Theorem, this definition can intuitively be interpreted as saying
that v ≺ u if and only if there is a way to “randomly shuffle” the entries of u to obtain v, where
by “randomly shuffle” it is meant that one averages in some way a collection of vectors that is
obtained by permuting the entries of u to obtain v. Informally speaking, the relation v ≺ u means
that u is “more ordered” than v, because we can get from u to v by randomizing the order of the
vector indices.
An alternate characterization of majorization (which is in fact more frequently taken as the
definition) is based on a condition on various sums of the entries of the vectors involved. To state
the condition more precisely, let us introduce the following notation. For every vector u ∈ R Σ and
for n = | Σ |, we write
s(u) = (s1 (u), . . . , sn (u))
to denote the vector obtained by sorting the entries of u from largest to smallest. In other words,
we have
{u( a) : a ∈ Σ} = {s1 (u), . . . , sn (u)},
where the equality considers the two sides of the equation to be multisets, and moreover
s1 ( u ) ≥ · · · ≥ s n ( u ) .
The characterization is given by the equivalence of the first and second items in the following
theorem. (The equivalence of the third item in the theorem is an easy addition to the equivalence,
and will turn out to be useful for one application later in the lecture.)
Theorem 11.2. Let Σ be a finite, non-empty set, and let u, v ∈ R Σ . Then the following are equivalent.
1. v ≺ u.
2. For n = |Σ| and for 1 ≤ k < n we have
∑_{j=1}^k s_j(v) ≤ ∑_{j=1}^k s_j(u),   (11.1)
and moreover
∑_{j=1}^n s_j(v) = ∑_{j=1}^n s_j(u).   (11.2)
3. There exists a unitary operator U ∈ L(C^Σ), for which U(a, b) is a real number for each
(a, b) ∈ Σ × Σ, such that v = Au, where A ∈ L(R^Σ) is the operator defined by A(a, b) = |U(a, b)|²
for all (a, b) ∈ Σ × Σ.
Proof. First let us prove that item 1 implies item 2. Assume that A is a doubly stochastic opera-
tor satisfying v = Au. It is clear that the condition (11.2) holds, as doubly stochastic operators
preserve the sum of the entries in any vector, and so it remains to prove the condition (11.1) for
1 ≤ k < n.
To do this, for each subset S ⊆ Σ let e_S = ∑_{a∈S} e_a, and for each k let Δk ⊆ Σ be a set of k
indices on which u takes its k largest values, and let Γk ⊆ Σ be a set of k indices on which
v = Au takes its k largest values. Then
∑_{j=1}^k s_j(u) − ∑_{j=1}^k s_j(v) = ⟨e_{Δk}, u⟩ − ⟨e_{Γk}, Au⟩ = ⟨e_{Δk} − A* e_{Γk}, u⟩,
so it suffices to prove that this quantity is nonnegative. We may write
⟨e_{Δk} − A* e_{Γk}, u⟩ = ∑_{a∈Δk} α_a u(a) − ∑_{a∉Δk} α_a u(a),
where
α_a = (e_{Δk} − A* e_{Γk})(a) if a ∈ Δk, and α_a = −(e_{Δk} − A* e_{Γk})(a) if a ∉ Δk.
Each α_a is nonnegative: for a ∈ Δk this holds because (A* e_{Γk})(a) = ∑_{b∈Γk} A(b, a) ≤ 1, while
for a ∉ Δk it holds because e_{Δk}(a) = 0 and A* e_{Γk} has nonnegative entries. Moreover,
∑_{a∈Δk} α_a = ∑_{a∉Δk} α_a, given that the entries of e_{Δk} − A* e_{Γk} sum to k − k = 0. By the
choice of Δk we have u(a) ≥ u(b) for all choices of a ∈ Δk and b ∉ Δk, and therefore
∑_{a∈Δk} α_a u(a) ≥ ∑_{a∉Δk} α_a u(a),
which establishes (11.1).
D = λ1 + (1 − λ) P_{(1 k)},
where (1 k) ∈ S_n denotes the permutation that swaps 1 and k, leaving every other element of
{1, . . . , n} fixed.
Thus, we may again apply the induction hypothesis to obtain an (n − 1) × (n − 1) unitary matrix
W such that, for B the doubly stochastic matrix defined by B_{i,j} = |W_{i,j}|², we have y = Bx.
Now define
U = (1 ⊕ W) V,
where 1 ⊕ W denotes the block-diagonal matrix with blocks 1 and W. This is clearly a unitary
matrix, and to complete the proof it suffices to prove that the doubly stochastic matrix A defined
by A_{i,j} = |U_{i,j}|² satisfies v = Au. Computing the entries of A directly, for i and j ranging over
all choices of indices with i, j ∉ {1, k}, we find that
A = (1 ⊕ B) D,
which satisfies v = Au as required.
The final step in the proof is to observe that item 3 implies item 1, which holds given that the
operator A determined by item 3 must be doubly stochastic.
We will now define an analogous notion of majorization for Hermitian operators. For Hermitian
operators A, B ∈ Herm (X ) we say that A majorizes B, which we express as B ≺ A or A ≻ B, if
there exists a mixed unitary super-operator Φ ∈ T (X ) such that
B = Φ ( A ).
Inspiration for this definition partly comes from the Birkhoff–von Neumann Theorem, along with
the intuitive idea that randomizing the entries of a real vector is analogous to randomizing the
choice of an orthonormal basis for a Hermitian operator.
The following theorem gives an alternate characterization of this relationship that also connects
it with majorization for real vectors.
Theorem 11.3. Let X be a complex Euclidean space and let A, B ∈ Herm (X ). Then B ≺ A if and only
if λ( B) ≺ λ( A).
Proof. Suppose first that λ(B) ≺ λ(A). By the Birkhoff–von Neumann Theorem there exists a
probability vector p ∈ R^{S_n} such that
λ_j(B) = ∑_{π∈S_n} p(π) λ_{π(j)}(A)
for each j ∈ {1, . . . , n}. Fix a spectral decomposition B = ∑_{j=1}^n λ_j(B) u_j u_j*, and for each
π ∈ S_n choose a unitary operator Uπ mapping an eigenvector of A corresponding to λ_{π(j)}(A)
to u_j, for each j. Then
∑_{π∈S_n} p(π) Uπ A Uπ* = ∑_{j=1}^n ∑_{π∈S_n} p(π) λ_{π(j)}(A) u_j u_j* = B,
so B = Φ(A) for the mixed unitary super-operator Φ(X) = ∑_π p(π) Uπ X Uπ*, and therefore
B ≺ A.
Suppose on the other hand that there exists a probability vector (p_1, . . . , p_m) and unitary
operators U_1, . . . , U_m so that
B = ∑_{i=1}^m p_i U_i A U_i*.
Fix spectral decompositions B = ∑_j λ_j(B) u_j u_j* and A = ∑_k λ_k(A) v_k v_k*, and define an
n × n matrix D as
D_{j,k} = ∑_{i=1}^m p_i |u_j* U_i v_k|².
The matrix D is doubly stochastic, and a direct calculation shows that λ(B) = D λ(A); therefore
λ(B) ≺ λ(A).
11.4 Applications
Finally, let us note a few applications of the facts we have proved about majorization. Let us begin
with a simple one.
Proposition 11.4. Suppose that ρ, ξ ∈ D (X ) satisfy λ(ρ) ≺ λ(ξ ). Then S(ρ) ≥ S(ξ ).
Proof. By Theorem 11.3, the relation λ(ρ) ≺ λ(ξ) implies ρ = Φ(ξ) for some mixed unitary
super-operator Φ. The proposition then follows from the fact that S(ξ) ≤ S(Φ(ξ)) for every density
operator ξ and every mixed unitary super-operator Φ, which is a consequence of the concavity of
the von Neumann entropy together with its invariance under unitary conjugations.
Note that we could equally well have first observed that H(p) ≥ H(q) for probability vectors p and
q for which p ≺ q, and then applied Theorem 11.3.
The second application concerns a relationship between the diagonal entries of an operator
and its eigenvalues, which was proved by Schur. First, a lemma is required.
Lemma 11.5. Let X be a complex Euclidean space and let {x_a : a ∈ Σ} be any orthonormal basis of X.
Then the super-operator
Φ(A) = ∑_{a∈Σ} (x_a* A x_a) x_a x_a*
is mixed unitary.
Proof. We may assume without loss of generality that Σ = Z_n for some n ≥ 1. First consider the
standard basis {e_a : a ∈ Z_n}, and define a mixed unitary super-operator
Ψ(A) = (1/n) ∑_{a∈Z_n} Z^a A (Z^a)*,
where Z ∈ U(C^{Z_n}) is the unitary operator defined by Z e_b = ω^b e_b for ω = exp(2πi/n). A
direct calculation shows that Ψ averages away the off-diagonal entries of its argument, and
therefore
Ψ(A) = ∑_{a∈Z_n} ⟨E_{a,a}, A⟩ E_{a,a}.
For U ∈ U(X) defined as
U = ∑_{a∈Z_n} x_a e_a*
it follows that
Φ(A) = (1/n) ∑_{a∈Z_n} (U Z^a U*) A (U Z^a U*)*,
so Φ is mixed unitary as well.
Theorem 11.6. Let X be a complex Euclidean space, let A ∈ Herm (X ) be a Hermitian operator, and let
{ xa : a ∈ Σ} be an orthonormal basis of X . Define u ∈ R Σ as u( a) = x∗a Axa . Then u ≺ λ( A).
Proof. Immediate from Lemma 11.5 and Theorem 11.3.
Notice that this theorem implies that the probability distribution arising from any complete projec-
tive measurement of a density operator ρ must have Shannon entropy at least S(ρ).
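Both Theorem 11.6 and the partial-sum characterization of Theorem 11.2 are easy to exercise numerically. The sketch below (helper names are our own) checks that the diagonal of a random Hermitian operator is majorized by its eigenvalue vector.

```python
import numpy as np

def majorizes(u, v, tol=1e-9):
    """Test v ≺ u via the partial-sum characterization (Theorem 11.2, item 2)."""
    su, sv = np.sort(u)[::-1], np.sort(v)[::-1]
    if abs(su.sum() - sv.sum()) > tol:
        return False
    return bool(np.all(np.cumsum(sv)[:-1] <= np.cumsum(su)[:-1] + tol))

rng = np.random.default_rng(6)
n = 5
G = rng.normal(size=(n, n)) + 1j*rng.normal(size=(n, n))
A = (G + G.conj().T) / 2                 # random Hermitian operator

diag = np.diag(A).real                   # u(a) = x_a* A x_a for the standard basis
eigs = np.linalg.eigvalsh(A)

assert majorizes(eigs, diag)             # u ≺ λ(A), as Theorem 11.6 asserts
```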
Finally, we will prove a characterization of precisely which probability vectors are consistent
with a given density operator, meaning that the density operator could have arisen from a random
choice of pure states according to the distribution described by the probability vector. Again, we
will first need a lemma.
Lemma 11.7. Suppose Σ is a finite, nonempty set, let X = C Σ , and suppose that A ∈ Herm (X ) and
v ∈ R Σ satisfy v ≺ λ( A). Then there exists an orthonormal basis { xa : a ∈ Σ} of X such that
v( a) = h xa x∗a , Ai for each a ∈ Σ.
Proof. Let
A= ∑ w(a)ua u∗a
a∈Σ
be a spectral decomposition of A. It follows that there exists a unitary operator U with the following property: for D defined by
D(a, b) = |U(a, b)|²
for (a, b) ∈ Σ × Σ, we have v = Dw.
Next, define
V = ∑_{a∈Σ} e_a u_a^∗
and let x_a = V^∗ U^∗ V u_a for each a ∈ Σ. Then
⟨x_a x_a^∗, A⟩ = ∑_b |U(a, b)|² w(b) = (Dw)(a) = v(a),
and given that x_a^∗ ρ x_a = p(a) we have that ‖y_a‖² = p(a) for each a ∈ Σ. Thus, for each a ∈ Σ for
which p(a) > 0, we may take
u_a = y_a / √(p(a))
(and let u_a be an arbitrary unit vector if p(a) = 0), and we have that each u_a is a unit vector,
as required.
Theory of Quantum Information Lecture Notes
John Watrous (Institute for Quantum Computing, University of Waterloo)
Lecture 12
Separable operators
For the next several lectures we will be discussing various aspects and properties of entanglement.
Mathematically speaking, we define entanglement in terms of what it is not, rather than what it
is—we define the notion of a separable operator, and define that any density operator that is not
separable represents an entangled state.
Let X and Y be complex Euclidean spaces. Then a positive semidefinite operator P ∈ Pos(X ⊗ Y)
is separable if and only if there exist a positive integer m and positive semidefinite operators
Q_1, ..., Q_m ∈ Pos(X) and R_1, ..., R_m ∈ Pos(Y) such that
P = ∑_{j=1}^{m} Q_j ⊗ R_j. (12.1)
We will write Sep (X : Y ) to denote the collection of all such operators.1 It is clear that Sep (X : Y )
is a convex cone, and it will not be difficult to prove that Sep (X : Y ) is properly contained in
Pos (X ⊗ Y ). Operators P ∈ Pos (X ⊗ Y ) that are not contained in Sep (X : Y ) are said to be
entangled.
Let us also define
SepD (X : Y ) = Sep (X : Y ) ∩ D (X ⊗ Y )
to be the set of separable density operators acting on X ⊗ Y. By thinking about spectral decompositions, one sees immediately that the set of separable density operators is equal to the convex
hull of the pure product density operators:
SepD(X : Y) = conv{ xx^∗ ⊗ yy^∗ : x ∈ S(X), y ∈ S(Y) }.
The same bound m ≤ dim(X ⊗ Y )2 may therefore be taken for some expression (12.1) of any
P ∈ Sep (X : Y ).
Next, let us note that SepD (X : Y ) is a compact set. To see this, we first observe that the unit
spheres S(X ) and S(Y ) are compact, and therefore so too is the Cartesian product S(X ) × S(Y ).
The function
f : X × Y → L (X ⊗ Y ) : ( x, y) 7→ xx∗ ⊗ yy∗
is obviously continuous, and continuous functions map compact sets to compact sets, so the set
of pure product density operators is compact as well. Finally, it is a basic fact from convex analysis (that we will take as given) that
the convex hull of any compact set is compact.
The set Sep (X : Y) is of course not compact, given that it is not bounded. It is a closed,
convex cone, however, because it is the cone generated by a compact, convex set that does not
contain the origin.
Next we will discuss a necessary and sufficient condition for a given positive semidefinite operator
to be separable. Although this condition, sometimes known as the Woronowicz–Horodecki Criterion,
does not give us an efficiently computable method to determine whether or not an operator is
separable, it is useful nevertheless in an analytic sense.
The Woronowicz–Horodecki Criterion is based on the fundamental fact from convex analysis
that says that closed convex sets are completely determined by the closed half-spaces that contain
them. Here is one version of this fact that is well-suited to our needs.
Fact. Let X be a complex Euclidean space and let A ⊂ Herm(X) be a closed, convex cone. Then
for any choice of an operator B ∈ Herm(X) with B ∉ A, there exists an operator H ∈ Herm(X)
such that ⟨H, A⟩ ≥ 0 for every A ∈ A, while ⟨H, B⟩ < 0.
It should be noted that the particular statement above is only valid for closed convex cones, not
general closed convex sets. For a general closed, convex set, it may be necessary to replace 0 with
some other real scalar for each choice of B.
Theorem 12.1 (Woronowicz–Horodecki Criterion). Let X and Y be complex Euclidean spaces and let
P ∈ Pos(X ⊗ Y). Then P ∈ Sep(X : Y) if and only if
(Φ ⊗ 1_{L(Y)})(P) ∈ Pos(Y ⊗ Y)
for every positive and unital super-operator Φ ∈ T(X, Y).
Proof. Assume first that P ∈ Sep(X : Y), so that P = ∑_{j=1}^{m} Q_j ⊗ R_j for some choice of
Q_1, ..., Q_m ∈ Pos(X) and R_1, ..., R_m ∈ Pos(Y). Then for every positive super-operator
Φ ∈ T(X, Y) we have
(Φ ⊗ 1_{L(Y)})(P) = ∑_{j=1}^{m} Φ(Q_j) ⊗ R_j ∈ Sep(Y : Y) ⊂ Pos(Y ⊗ Y).
Let us now assume that P ∈ Pos(X ⊗ Y) is not separable. Then by the fact stated above there must
exist a Hermitian operator H ∈ Herm(X ⊗ Y) such that:
1. ⟨H, Q ⊗ R⟩ ≥ 0 for all Q ∈ Pos(X) and R ∈ Pos(Y);
2. ⟨H, P⟩ < 0.
Let Ψ ∈ T (Y , X ) be the unique super-operator for which J (Ψ) = H. For any Q ∈ Pos (X ) and
R ∈ Pos (Y ) we therefore have
0 ≤ ⟨H, Q ⊗ R⟩ = ⟨J(Ψ), Q ⊗ R⟩ = ⟨Ψ(R), Q⟩.
From this we conclude that Ψ (and therefore Ψ∗ ) is a positive super-operator. At this point we
could easily conclude that (Ψ∗ ⊗ 1L(Y ) )( P) is not positive semidefinite, but we need to work a
little bit more to obtain the unital condition in the statement of the theorem.
We have ⟨H, P⟩ < 0, so we may choose ε > 0 sufficiently small so that
⟨H, P⟩ + ε Tr(P) < 0.
Now define Ξ ∈ T(X, Y) as Ξ(X) = Ψ^∗(X) + ε Tr(X) 1_Y, which is again a positive super-operator,
and let A = Ξ(1_X), which is positive definite. Finally, define
Φ(X) = A^{−1/2} Ξ(X) A^{−1/2}
for every X ∈ L (X ).
It is clear that Φ is both positive and unital, and it remains to prove that (Φ ⊗ 1L(Y ) )( P) is not
positive semidefinite. This may be verified as follows:
⟨vec(√A) vec(√A)^∗, (Φ ⊗ 1_{L(Y)})(P)⟩
= ⟨vec(1_Y) vec(1_Y)^∗, (Ξ ⊗ 1_{L(Y)})(P)⟩
= ⟨(Ξ^∗ ⊗ 1_{L(Y)})(vec(1_Y) vec(1_Y)^∗), P⟩
= ⟨J(Ξ^∗), P⟩
= ⟨H, P⟩ + ε Tr(P)
< 0.
Here we have used the observation that J(Λ^∗) = 1_X ⊗ 1_Y for the super-operator Λ ∈ T(X, Y) defined by Λ(X) = Tr(X) 1_Y. It follows that (Φ ⊗ 1_{L(Y)})(P) is not
positive semidefinite, and so the proof is complete.
This theorem allows us to easily prove that certain positive semidefinite operators are not
separable. For example, consider the operator
P = (1/n) vec(1_X) vec(1_X)^∗
for X a complex Euclidean space of dimension n. As in Lecture 6, we consider the transposition
super-operator T ∈ T (X ), which is positive and unital. As
(T ⊗ 1_{L(X)})(P) = −(1/n) R + (1/n) S,
where R and S are the projections onto the antisymmetric and symmetric subspaces of X ⊗ X , we
have that ( T ⊗ 1L(X ) )( P) is not positive semidefinite, and therefore P is not separable.
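As a numerical supplement (our own sketch, not part of the notes), this computation is easy to reproduce: reshaping P into a four-index tensor and swapping the two row indices implements T ⊗ 1_{L(X)}, and the negative eigenvalue −1/n appears directly. The helper `partial_transpose` is defined here for illustration:

```python
import numpy as np

def partial_transpose(P, n):
    """Apply transposition to the first tensor factor of an operator
    acting on C^n ⊗ C^n (the map T ⊗ 1)."""
    P4 = P.reshape(n, n, n, n)            # indices (a, b, a', b')
    return P4.transpose(2, 1, 0, 3).reshape(n * n, n * n)

n = 3
v = np.eye(n).reshape(n * n)              # vec(1_X), row-major vec convention
P = np.outer(v, v) / n                    # the maximally entangled state
PT = partial_transpose(P, n)
print(np.linalg.eigvalsh(PT).min())       # ≈ −1/n < 0, so P is entangled
```

The output is the eigenvalue −1/n carried by the antisymmetric subspace, matching the decomposition above.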
Finally, we will prove that there exists a small region around the identity operator 1 X ⊗ 1 Y where
every Hermitian operator is separable. This fact gives us an intuitive connection between noise
and entanglement, which is that entanglement cannot exist in the presence of too much noise.
We will need two facts, beyond those we have already proved, to establish this result. The first
is straightforward, and is as follows.
Lemma 12.2. Let X and Y = C^Σ be complex Euclidean spaces, and consider an operator A ∈ L(X ⊗ Y)
given by
A = ∑_{a,b∈Σ} A_{a,b} ⊗ E_{a,b}
for a collection of operators {A_{a,b} : a, b ∈ Σ} ⊂ L(X). Then
‖A‖² ≤ ∑_{a,b∈Σ} ‖A_{a,b}‖².
Proof. For each a ∈ Σ define
B_a = ∑_{b∈Σ} A_{a,b} ⊗ E_{a,b}.
Now,
A = ∑_{a∈Σ} B_a,
and given that B_a^∗ B_b = 0 for a ≠ b, we have that
A^∗ A = ∑_{a∈Σ} B_a^∗ B_a.
Therefore
‖A‖² = ‖A^∗ A‖ ≤ ∑_{a∈Σ} ‖B_a^∗ B_a‖ ≤ ∑_{a,b∈Σ} ‖A_{a,b}‖²
as claimed.
The second fact that we need is a theorem about positive unital super-operators, which is
sometimes known as the Russo–Dye Theorem.
Theorem 12.3 (Russo–Dye). Let X and Y be complex Euclidean spaces and let Φ ∈ T (X , Y ) be positive
and unital. Then
‖Φ(X)‖ ≤ ‖X‖
for every X ∈ L (X ).
Proof. Let us first prove that ‖Φ(U)‖ ≤ 1 for every unitary operator U ∈ U(X). Assume X = C^Σ,
and let
U = ∑_{a∈Σ} λ_a u_a u_a^∗
be a spectral decomposition of U. Then
Φ(U) = ∑_{a∈Σ} λ_a Φ(u_a u_a^∗) = ∑_{a∈Σ} λ_a P_a,
where Pa = Φ ( u a u∗a ) for each a ∈ Σ. As Φ is positive, we have that Pa ∈ Pos (Y ) for each a ∈ Σ,
and given that Φ is unital we have
∑_{a∈Σ} P_a = 1_Y.
As the operators {P_a : a ∈ Σ} are positive semidefinite and sum to the identity, there exists a
linear isometry A ∈ U(Y, Y ⊗ C^Σ) such that
P_a = A^∗ (1_Y ⊗ E_{a,a}) A
for each a ∈ Σ. Consequently
Φ(U) = A^∗ (1_Y ⊗ V U V^∗) A
for
V = ∑_{a∈Σ} e_a u_a^∗.
As U and V are unitary and A is an isometry, the bound ‖Φ(U)‖ ≤ 1 follows from the submultiplicativity of the spectral norm.
For general X, it suffices to prove that ‖Φ(X)‖ ≤ 1 whenever ‖X‖ ≤ 1. Because every operator
X ∈ L(X) with ‖X‖ ≤ 1 can be expressed as a convex combination of unitary operators, the
required bound follows from the convexity of the spectral norm.
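As a quick check of the theorem (a supplementary sketch; the pinching map X ↦ ∑_a E_{a,a} X E_{a,a}, which keeps only the diagonal, serves as our example of a positive unital super-operator):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))

# Pinching: Phi(X) = sum_a E_aa X E_aa, a positive (indeed CP) and unital map
Phi_X = np.diag(np.diag(X))

spec = lambda M: np.linalg.norm(M, 2)   # spectral (operator) norm
print(spec(Phi_X) <= spec(X) + 1e-12)   # True, as Russo–Dye guarantees
```

For the pinching this is elementary (each diagonal entry satisfies |X_{aa}| ≤ ‖X‖), but the theorem asserts the same bound for every positive unital map.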
Theorem 12.4. Let X and Y be complex Euclidean spaces and suppose that A ∈ Herm (X ⊗ Y ) satisfies
‖A‖₂ ≤ 1. Then
1 X ⊗Y − A ∈ Sep (X : Y ) .
Proof. Let Φ ∈ T (X , Y ) be positive and unital. Assume Y = C Σ and write
A = ∑_{a,b∈Σ} A_{a,b} ⊗ E_{a,b}.
Then
(Φ ⊗ 1_{L(Y)})(A) = ∑_{a,b∈Σ} Φ(A_{a,b}) ⊗ E_{a,b},
and therefore
‖(Φ ⊗ 1_{L(Y)})(A)‖² ≤ ∑_{a,b∈Σ} ‖Φ(A_{a,b})‖² ≤ ∑_{a,b∈Σ} ‖A_{a,b}‖² ≤ ∑_{a,b∈Σ} ‖A_{a,b}‖₂² = ‖A‖₂² ≤ 1.
The first inequality is by Lemma 12.2 and the second inequality is by the Russo-Dye Theorem.
The positivity of Φ implies that (Φ ⊗ 1_{L(Y)})(A) is Hermitian, and thus (Φ ⊗ 1_{L(Y)})(A) ≤ 1_{Y⊗Y}.
Therefore we have
(Φ ⊗ 1_{L(Y)})(1_{X⊗Y} − A) = 1_{Y⊗Y} − (Φ ⊗ 1_{L(Y)})(A) ≥ 0.
As this holds for all positive unital Φ, we have that 1_{X⊗Y} − A is separable by Theorem 12.1.
Lecture 13
Separable super-operators and the LOCC paradigm
In the previous lecture we discussed separable operators. The focus of this lecture will be on
analogous concepts for super-operators. In particular, we will discuss separable super-operators,
as well as the important subclass of LOCC super-operators. The acronym LOCC is short for local
operations and classical communication, and plays a central role in the study of entanglement.
13.1 Min-rank
Before discussing separable and LOCC super-operators, it will be helpful to briefly discuss a gen-
eralization of the concept of separability for operators.
Suppose two complex Euclidean spaces X and Y are fixed, and for a given choice of a nonneg-
ative integer k let us consider the collection of operators
Rk (X : Y ) = conv {vec( A) vec( A)∗ : A ∈ L (Y , X ) , rank( A) ≤ k} .
In other words, a given positive semidefinite operator P ∈ Pos (X ⊗ Y ) is contained in Rk (X : Y )
if and only if it is possible to write
P = ∑_{j=1}^{m} vec(A_j) vec(A_j)^∗
for some choice of an integer m and operators A1 , . . . , Am ∈ L (Y , X ), each having rank at most k.
(It is important to note that this sort of expression does not require orthogonality of the operators
A1 , . . . , Am .)
Each of the sets Rk (X : Y ) is a closed convex cone. It is easy to see that
R0 (X : Y ) = {0},
R1 (X : Y ) = Sep (X : Y ) ,
and
Rn (X : Y ) = Pos (X ⊗ Y )
for n ≥ min{dim(X ), dim(Y )}. Moreover,
R_k(X : Y) ⊊ R_{k+1}(X : Y)
for 0 ≤ k < min{dim(X), dim(Y)}, as any operator vec(A) vec(A)^∗ is contained in
R_r(X : Y) but not R_{r−1}(X : Y) for r = rank(A) (assuming A ≠ 0).
Finally, for each positive semidefinite operator P ∈ Pos (X ⊗ Y ), we define the min-rank of P
as
min-rank( P) = min {k ≥ 0 : P ∈ Rk (X : Y )} .
(This quantity is more commonly known as the Schmidt number.)
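For a pure state vec(A) vec(A)^∗, the min-rank is simply rank(A), i.e. the Schmidt rank; numerically (a supplementary sketch, with `schmidt_rank` a helper defined here) one recovers A by reshaping the state vector:

```python
import numpy as np

def schmidt_rank(u, dim_X, dim_Y, tol=1e-10):
    """Min-rank (Schmidt number) of the pure state uu^*: the rank of the
    operator A in L(Y, X) with vec(A) = u, under the row-major vec convention."""
    A = np.asarray(u).reshape(dim_X, dim_Y)
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol))

e = lambda k, d: np.eye(d)[k]
prod = np.kron(e(0, 2), e(1, 2))                               # product state
ent = (np.kron(e(0, 2), e(0, 2)) + np.kron(e(1, 2), e(1, 2))) / np.sqrt(2)
print(schmidt_rank(prod, 2, 2), schmidt_rank(ent, 2, 2))       # 1 2
```

Product states have Schmidt rank 1, while the maximally entangled state on C² ⊗ C² has Schmidt rank 2, in agreement with R₁ = Sep and R_n = Pos above.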
A super-operator Φ ∈ T(X_A ⊗ X_B, Y_A ⊗ Y_B) is said to be separable if it can be written as
Φ = ∑_j Φ_j ⊗ Ψ_j for completely positive super-operators Φ_j ∈ T(X_A, Y_A) and Ψ_j ∈ T(X_B, Y_B);
we write SepT(X_A, Y_A : X_B, Y_B) to denote the collection of all such super-operators. In terms of
the Choi–Jamiołkowski representation, we have
Φ ∈ SepT(X_A, Y_A : X_B, Y_B)
if and only if
J(Φ) ∈ Sep(Y_A ⊗ X_A : Y_B ⊗ X_B).
Let us now observe the simple and yet useful fact that separable super-operators cannot in-
crease min-rank. This implies, in particular, that separable super-operators cannot create entan-
glement out of thin air—if a separable operator is input to a separable super-operator, the output
will also be separable.
As
rank(A_j Y B_j^T) ≤ rank(Y)
for each Kraus operator A_j ⊗ B_j of a separable super-operator Φ, and (A_j ⊗ B_j) vec(Y) = vec(A_j Y B_j^T),
it follows that
Φ(vec(Y) vec(Y)^∗) ∈ R_r(Y_A : Y_B)
for r = rank(Y).
Consider the situation in which two parties, Alice and Bob, collectively perform some sequence
of operations and/or measurements on a shared quantum system. One may of course consider
various restrictions on the sorts of actions Alice and Bob can perform—for instance they may be
forbidden to communicate, permitted to exchange classical information, or permitted to exchange
quantum information.
The paradigm of LOCC, which is an acronym for local operations and classical communication,
concerns the situation in which Alice and Bob are permitted to communicate, but only using
classical communication. More specifically, LOCC operations will be defined as those that can
be obtained as follows.
1. Alice and Bob can independently perform admissible operations on their own registers, inde-
pendently of the other player.
2. Alice can perform a measurement on one or more of her registers, and then transmit the clas-
sical information she obtains to Bob.
3. The roles of Alice and Bob can be reversed in item 2.
4. Alice and Bob can compose any finite number of operations that correspond to items 1, 2,
and 3.
Many problems and results in quantum information theory concern LOCC super-operators in
one form or another, often involving Alice and Bob’s ability to manipulate entangled states by
means of such operations.
Admissible product super-operators
An admissible product super-operator is any super-operator of the form
Φ_A ⊗ Φ_B ∈ T(X_A ⊗ X_B, Y_A ⊗ Y_B)
for admissible super-operators Φ_A ∈ T(X_A, Y_A) and Φ_B ∈ T(X_B, Y_B).
Measurement transmission super-operators
Given a measurement
{P_a : a ∈ Σ} ⊂ Pos(Z_A)
on a register Z_A held by Alice, and taking Z_B = C^Σ, one may consider the super-operator
Φ ∈ T(X_A ⊗ Z_A ⊗ X_B, X_A ⊗ X_B ⊗ Z_B)
defined by
Φ(X) = ∑_{a∈Σ} Tr_{Z_A}[(1_{X_A} ⊗ P_a ⊗ 1_{X_B}) X] ⊗ E_{a,a},
which describes Alice measuring Z_A and transmitting the classical result to Bob. Super-operators of
the symmetric form
Φ ∈ T(X_A ⊗ X_B ⊗ Z_B, X_A ⊗ Z_A ⊗ X_B),
in which Bob measures and transmits the result to Alice, are defined analogously.
Finite compositions
Finally, for complex Euclidean spaces X A , X B , Y A and YB , an LOCC super-operator is any super-
operator of the form
Φ ∈ T (X A ⊗ X B , Y A ⊗ YB )
that can be obtained from the composition of any finite number of admissible product super-
operators and measurement transmission super-operators. Note that by defining LOCC super-
operators in terms of finite compositions, we implicitly fix the number of messages exchanged by
Alice and Bob in the realization of any LOCC super-operator.
We will write
LOCC (X A , Y A : X B , YB )
to denote the collection of all LOCC super-operators as just defined.
Proposition. Every LOCC super-operator Φ ∈ LOCC(X_A, Y_A : X_B, Y_B) is separable:
Φ ∈ SepT(X_A, Y_A : X_B, Y_B).
Proof. The set of separable super-operators is closed under composition, so it suffices to note that
every admissible product super-operator and every measurement transmission super-operator is
separable. In both cases, it is straightforward to verify separability.
Note that the separable super-operators do not offer a perfect characterization of LOCC super-
operators: there exist super-operators that are both admissible and separable, but not LOCC.
Nevertheless, we will still be able to use this proposition to prove various things about LOCC
super-operators. One simple example follows.
For every LOCC super-operator Φ and every density operator ρ on which it acts,
min-rank(Φ(ρ)) ≤ min-rank(ρ).
Lecture 14
Pure state entanglement transformation
In this lecture we will consider pure-state entanglement transformation. The setting is as follows:
Alice and Bob share a pure state x ∈ X A ⊗ X B , but they would like to transform this state to
another state y ∈ Y A ⊗ YB by means of local operations and classical communication. This is
obviously possible in some situations and impossible in others—and what we would like is to
have a condition on x and y that tells us precisely when it is possible. The following theorem
provides such a condition.
Theorem 14.1 (Nielsen's Theorem). Assume that X_A, X_B, Y_A, and Y_B are complex Euclidean spaces,
and let x ∈ X_A ⊗ X_B and y ∈ Y_A ⊗ Y_B be unit vectors. Then there exists an LOCC super-operator
Φ ∈ LOCC(X_A, Y_A : X_B, Y_B) such that Φ(xx^∗) = yy^∗ if and only if
Tr_{X_B}(xx^∗) ≺ Tr_{Y_B}(yy^∗).
Remark 14.2. It may be that X A and Y A do not have the same dimension, and in this case the
condition TrXB ( xx∗ ) ≺ TrY B (yy∗ ) requires further explanation. To be more precise, the condition
should be interpreted as
V (TrXB ( xx∗ )) V ∗ ≺ W (TrY B (yy∗ )) W ∗
for some choice of a complex Euclidean space Z A and linear isometries V ∈ U (X A , Z A ) and
W ∈ U (Y A , Z A ). (If the condition holds for one such choice of isometries V and W, it holds for
all choices.) In essence, this interpretation is analogous to padding vectors with zeroes as we did
when we discussed the majorization relation between real vectors of different dimensions. Here,
the isometries V and W embed the operators Tr_{X_B}(xx^∗) and Tr_{Y_B}(yy^∗) into a single space so that
they may be related by our definition of majorization.
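The padding interpretation can be made concrete (a supplementary sketch, comparing the zero-padded spectra of the reduced states directly, which is equivalent to the isometric embedding; both helpers are our own):

```python
import numpy as np

def reduced_spectrum(u, dA, dB, dim):
    """Eigenvalues of Tr_B(uu^*), padded with zeros to length dim."""
    A = np.asarray(u).reshape(dA, dB)             # row-major vec convention
    s = np.linalg.svd(A, compute_uv=False) ** 2   # spectrum of Tr_B(uu^*)
    return np.pad(s, (0, dim - len(s)))

def majorized_by(u, v, tol=1e-9):
    """True if u ≺ v."""
    us, vs = np.sort(u)[::-1], np.sort(v)[::-1]
    return bool(np.all(np.cumsum(vs) >= np.cumsum(us) - tol))

# x: maximally entangled state on C^2 ⊗ C^2;  y: a product state on C^3 ⊗ C^3
x = np.array([1, 0, 0, 1]) / np.sqrt(2)
y = np.zeros(9); y[0] = 1.0
d = 3
print(majorized_by(reduced_spectrum(x, 2, 2, d), reduced_spectrum(y, 3, 3, d)))
# True: by Nielsen's Theorem, x can be converted to y by LOCC (but not conversely)
```

Here the spectra are (1/2, 1/2, 0) and (1, 0, 0), so the majorization holds in one direction only.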
The remainder of this lecture will be devoted to proving this theorem. The most difficult aspect
of the proof is that one must reason about general LOCC super-operators, which are sometimes
cumbersome. For this reason we will begin with a restricted definition of LOCC super-operators
that will be easier to reason about in this context. As we will see, it turns out that there is no loss of
generality in working with this restricted notion. Once this is done, we will prove the implications
that are necessary to establish the theorem.
The restricted type of LOCC super-operators we will work with are defined as follows for given
complex Euclidean spaces Z A and Z B .
1. A super-operator Φ ∈ T(Z_A ⊗ Z_B) will be said to be an A→B super-operator if there exists a
non-destructive measurement {M_a : a ∈ Σ} ⊂ L(Z_A) and a collection of unitary operators
{U_a : a ∈ Σ} ⊂ U(Z_B)
such that
Φ(X) = ∑_{a∈Σ} (M_a ⊗ U_a) X (M_a ⊗ U_a)^∗.
One imagines that such an operation represents the situation where Alice performs a non-
destructive measurement, transmits the result to Bob, and Bob applies a unitary operation to
his system that depends on Alice’s measurement result.
2. A super-operator Φ ∈ T (Z A ⊗ Z B ) will be said to be a B→A super-operator if there exists a
non-destructive measurement { Ma : a ∈ Σ} ⊂ L (Z B ) and a collection of unitary operators
{Ua : a ∈ Σ} ⊂ U (Z A ) such that
Φ(X) = ∑_{a∈Σ} (U_a ⊗ M_a) X (U_a ⊗ M_a)^∗.
Such a super-operator is analogous to an A→B super-operator, but where the roles of Alice
and Bob are reversed.
3. Finally, a super-operator Φ ∈ T (Z A ⊗ Z B ) is a restricted LOCC super-operator if it is a composi-
tion of A→B and B →A super-operators.
It is not difficult to see that every restricted LOCC super-operator is LOCC. As the following
theorem shows, restricted LOCC super-operators turn out to be as powerful as general LOCC
super-operators, provided they are free to act on sufficiently large spaces.
Theorem 14.3. Suppose Φ ∈ LOCC (X A , Y A : X B , YB ) is an LOCC super-operator. Then for any choice
of complex Euclidean spaces Z A and Z B having sufficiently large dimension, and for any choice of linear
isometries
V_A ∈ U(X_A, Z_A), W_A ∈ U(Y_A, Z_A), V_B ∈ U(X_B, Z_B), W_B ∈ U(Y_B, Z_B),
there exists a restricted LOCC super-operator Ψ ∈ T(Z_A ⊗ Z_B) such that
(W_A ⊗ W_B) Φ(X) (W_A ⊗ W_B)^∗ = Ψ((V_A ⊗ V_B) X (V_A ⊗ V_B)^∗) (14.1)
for all X ∈ L(X_A ⊗ X_B).
Remark 14.4. Before we prove this theorem, let us consider what it is saying. Alice has the in-
put space X A and the output space is Y A , which may have different dimensions—but we want
to view these two spaces as being embedded in a single space Z A . The isometries VA and WA
describe these embeddings. Likewise, VB and WB describe the embeddings of Bob’s input and
output spaces X B and YB in a single space Z B . The above equation (14.1) simply means that Ψ
correctly represents Φ in terms of these embeddings: Alice and Bob could either embed the input
X ∈ L (X A ⊗ X B ) in L (Z A ⊗ Z B ) as (VA ⊗ VB ) X (VA ⊗ VB )∗ , and then apply Ψ; or they could first
perform Φ and then embed the output Φ( X ) into L (Z A ⊗ Z B ) as (WA ⊗ WB )Φ( X )(WA ⊗ WB )∗ .
The equation (14.1) means that they obtain the same thing either way.
Proof. We will first consider just a limited class of super-operators Φ whose closure under compo-
sition gives the class of LOCC super-operators. Once this has been done we will observe that the
theorem holds for compositions of such super-operators, which will suffice to prove the theorem.
Let us first consider the case that Φ takes the form Φ = Φ A ⊗ 1L(XB ) for Φ A ∈ T (X A , Y A ) an
admissible super-operator, and let
Φ_A(X) = ∑_{a∈Σ} A_a X A_a^∗
be a Kraus representation of Φ A . We will establish that the claim in the statement of the the-
orem holds for any choice of Z A and Z B for which dim(Z A ) ≥ max{dim(X A ), dim(Y A )} and
dim(Z B ) ≥ dim(X B ). Linear isometries VA , VB , WA , and WB are given, and in the case at hand the
form of Φ dictates that VA ∈ U (X A , Z A ), WA ∈ U (Y A , Z A ), and VB , WB ∈ U (X B , Z B ).
Now, for each a ∈ Σ define
Ma = WA A a VA∗ ∈ L (Z A ) .
The collection { Ma : a ∈ Σ} may not quite give us a non-destructive measurement because
∑_{a∈Σ} M_a^∗ M_a = V_A V_A^∗,
which is an orthogonal projection operator that may not have full rank. So, let us assume without
loss of generality that 0 ∉ Σ, define Σ′ = Σ ∪ {0}, and define
M0 = 1 Z A − VA VA∗ .
Then
∑_{a∈Σ′} M_a^∗ M_a = 1_{Z_A}
as required. A symmetric argument also establishes the theorem for the case Φ = 1 X A ⊗ ΦB for
ΦB admissible.
Now suppose { Ma : a ∈ Σ} ⊂ L (X A ) is a non-destructive measurement and YB = X B ⊗ C Σ ,
and consider the super-operator Φ ∈ T (X A ⊗ X B , X A ⊗ YB ) defined by
Φ(X) = ∑_{a∈Σ} (M_a ⊗ 1_{X_B}) X (M_a ⊗ 1_{X_B})^∗ ⊗ E_{a,a}.
We will establish that the claim in the statement of the theorem holds for any choice of Z A and Z B
for which dim(Z A ) ≥ dim(X A ) and dim(Z B ) ≥ dim(YB ). Linear isometries VA , VB , WA , and WB
are given, and in this case the form of Φ dictates that VA , WA ∈ U (X A , Z A ), VB ∈ U (X B , Z B ), and
WB ∈ U (YB , Z B ).
Along similar lines to the first type of super-operator considered above we define
Na = WA Ma VA∗ ∈ L (Z A )
for each a ∈ Σ; and again we assume without loss of generality that 0 ∉ Σ, define Σ′ = Σ ∪ {0},
and define
N0 = 1 Z A − VA VA∗ .
Then { Na : a ∈ Σ′ } is a non-destructive measurement on Z A . Also define Ua ∈ U (Z B ) so that
U_a V_B = W_B (1_{X_B} ⊗ e_a)
for each a ∈ Σ, and define U_0 = 1_{Z_B} (which happens to be a completely arbitrary choice of a
unitary on Z_B that does not affect the proof). We then have
(WA ⊗ WB )Φ( X )(WA ⊗ WB )∗ = Ψ ((VA ⊗ VB ) X (VA ⊗ VB )∗ )
as required. Once again, a similar argument holds for Φ taking the symmetric form
Φ(X) = ∑_{a∈Σ} E_{a,a} ⊗ (1_{X_A} ⊗ M_a) X (1_{X_A} ⊗ M_a)^∗.
Now suppose that the theorem has been established for super-operators
Φ1 ∈ LOCC (X A , W A : X B , W B ) ,
Φ2 ∈ LOCC (W A , Y A : W B , YB ) ,
and consider any specific choice of complex Euclidean spaces Z A and Z B for which the claim in
the statement of the theorem has been shown to hold for these super-operators. Assume linear
isometries
VA ∈ U (X A , Z A ) , WA ∈ U (Y A , Z A ) , VB ∈ U (X B , Z B ) , WB ∈ U (YB , Z B )
are given, and let linear isometries U_A ∈ U(W_A, Z_A) and U_B ∈ U(W_B, Z_B) be chosen arbitrarily. We
have that there exist restricted LOCC super-operators Ψ1 , Ψ2 ∈ T (Z A ⊗ Z B ) such that
(U A ⊗ UB )Φ1 ( X )(U A ⊗ UB )∗ = Ψ1 ((VA ⊗ VB ) X (VA ⊗ VB )∗ )
(WA ⊗ WB )Φ2 (W )(WA ⊗ WB )∗ = Ψ2 ((U A ⊗ UB )W (U A ⊗ UB )∗ ) .
It follows immediately that
(WA ⊗ WB )Φ2 (Φ1 ( X ))(WA ⊗ WB )∗
= Ψ2 ((U A ⊗ UB )Φ1 ( X )(U A ⊗ UB )∗ )
= Ψ_2(Ψ_1((V_A ⊗ V_B) X (V_A ⊗ V_B)^∗)),
and so the theorem is established for the composition Φ2 Φ1 . Given that compositions of the types
of super-operators above include admissible product super-operators and measurement transmis-
sion super-operators, and therefore all LOCC super-operators, the proof is complete.
We will now prove Nielsen’s Theorem. Based on the remark following the statement of this the-
orem together with Theorem 14.3, we see that we may assume without loss of generality that we
have x, y ∈ Z A ⊗ Z B for complex Euclidean spaces Z A and Z B , and that we may restrict our
attention to restricted LOCC super-operators rather than general LOCC super-operators. More
specifically, our goal is to prove that the following two conditions are equivalent:
1. There exists a restricted LOCC super-operator Φ ∈ T (Z A ⊗ Z B ) such that Φ( xx∗ ) = yy∗ .
2. There exists a mixed unitary super-operator Ξ ∈ T (Z A ) such that TrZ B ( xx∗ ) = Ξ(TrZ B (yy∗ )).
Also note that there is no loss of generality in assuming that Z A and Z B have equal dimension.
We will first prove that the existence of a mixed unitary super-operator Ξ ∈ T(Z_A) satisfying
Tr_{Z_B}(xx^∗) = Ξ(Tr_{Z_B}(yy^∗)) implies the existence of a restricted LOCC super-operator
Φ ∈ T(Z_A ⊗ Z_B) that satisfies the condition Φ(xx^∗) = yy^∗.
Let X, Y ∈ L (Z B , Z A ) be the operators satisfying vec( X ) = x and vec(Y ) = y. The assumption
that there exists a mixed unitary super-operator Ξ ∈ T (Z A ) as above implies that
XX^∗ = ∑_{a∈Σ} p(a) U_a YY^∗ U_a^∗
for some choice of a finite, nonempty set Σ, a probability vector p ∈ R Σ , and a collection of
unitary operators {Ua : a ∈ Σ} ⊂ U (Z A ). We may of course assume without loss of generality
that p( a) > 0 for every a ∈ Σ.
Define
M_a = √(p(a)) (X^+ U_a Y)^∗
for each a ∈ Σ, where X^+ denotes the Moore–Penrose pseudo-inverse of X (which is discussed in the
Lecture 1 notes). For each a ∈ Σ we have M_a^∗ M_a = p(a) X^+ U_a YY^∗ U_a^∗ (X^+)^∗, and therefore
∑_{a∈Σ} M_a^∗ M_a = X^+ XX^∗ (X^+)^∗ = (X^+ X)(X^+ X)^∗ = 1 − Π_{ker(X)}.
So, { Ma : a ∈ Σ} is not quite a non-destructive measurement, but we can turn it into one by
adding an additional element in a similar way that we did in the proof of Theorem 14.3—so
let us assume 0 ∉ Σ, define Σ′ = Σ ∪ {0}, and define M_0 = Π_{ker(X)}. Then {M_a : a ∈ Σ′} is
a non-destructive measurement, and therefore so too is its element-wise complex conjugate
{M̄_a : a ∈ Σ′}.
Now let Φ ∈ T(Z_A ⊗ Z_B) be the B→A super-operator defined by
Φ(Z) = ∑_{a∈Σ′} (U_a^∗ ⊗ M̄_a) Z (U_a^∗ ⊗ M̄_a)^∗
(taking U_0 = 1 for the index 0). We have
Φ(vec(X) vec(X)^∗) = ∑_{a∈Σ′} vec(U_a^∗ X M_a^∗) vec(U_a^∗ X M_a^∗)^∗,
and as X Π_{ker(X)} = 0 we see that the term corresponding to a = 0 takes the value 0. As
XX^∗ = ∑_{a∈Σ} p(a) U_a YY^∗ U_a^∗,
we have p(a) U_a YY^∗ U_a^∗ ≤ XX^∗, and therefore im(U_a Y) ⊆ im(X), for each a ∈ Σ. This implies
that im(Y) ⊆ im(U_a^∗ X) for each a ∈ Σ, and therefore
U_a^∗ X M_a^∗ = √(p(a)) U_a^∗ X X^+ U_a Y = √(p(a)) U_a^∗ Π_{im(X)} U_a Y = √(p(a)) Π_{im(U_a^∗ X)} Y = √(p(a)) Y.
Consequently
Φ(vec( X ) vec( X )∗ ) = vec(Y ) vec(Y )∗ .
We have therefore proved that the assumption that there exists a mixed unitary super-operator
Ξ ∈ T (Z A ) such that TrZ B ( xx∗ ) = Ξ(TrZ B (yy∗ )) implies that there exists a restricted LOCC super-
operator Φ ∈ T (Z A ⊗ Z B ) such that Φ( xx∗ ) = yy∗ .
Lemma 14.5. For any choice of complex Euclidean spaces Z_A and Z_B having equal dimension, the following two statements hold:
1. For every B→A super-operator Φ ∈ T(Z_A ⊗ Z_B) and every unit vector u ∈ Z_A ⊗ Z_B, there exists an A→B super-operator Ψ ∈ T(Z_A ⊗ Z_B) such that Ψ(uu^∗) = Φ(uu^∗).
2. The analogous statement obtained by exchanging the roles of A→B and B→A super-operators.
Proof. The idea of the proof is to show that A→B and B→A super-operators can be interchanged
for fixed pure-state inputs, which allows any restricted LOCC super-operator to be “collapsed” to
a single A→B or B→A super-operator.
Suppose first that { Ma : a ∈ Σ} ⊂ L (Z B ) is a non-destructive measurement, {Ua : a ∈ Σ} ⊂
U (Z A ) is a collection of unitary operators, and
Φ( Z ) = ∑ (Ua ⊗ M a ) Z (Ua ⊗ M a ) ∗
a∈Σ
is the B→A super-operator that is described by these operators. Also let u ∈ Z A ⊗ Z B be a vector
and let X ∈ L (Z B , Z A ) satisfy vec( X ) = u. Thus
Φ(vec(X) vec(X)^∗) = ∑_{a∈Σ} vec(U_a X M_a^T) vec(U_a X M_a^T)^∗.
Our goal is to find a non-destructive measurement {N_a : a ∈ Σ} ⊂ L(Z_A) and a collection of
unitary operators {V_a : a ∈ Σ} ⊂ U(Z_B) for which
N_a X V_a^T = U_a X M_a^T
for every a ∈ Σ. With this condition satisfied we will have that the A→B super-operator Ψ defined
by
Ψ( Z ) = ∑ ( Na ⊗ Va ) Z ( Na ⊗ Va )∗
a∈Σ
satisfies
Ψ(vec(X) vec(X)^∗) = ∑_{a∈Σ} vec(N_a X V_a^T) vec(N_a X V_a^T)^∗
= ∑_{a∈Σ} vec(U_a X M_a^T) vec(U_a X M_a^T)^∗
= Φ(vec(X) vec(X)^∗)
= Φ(uu^∗).
for each a. Multiplying both sides of this equation on the left by Ua A∗ Wa∗ gives
Defining
V_a = W_a and N_a = U_a A^∗ W_a^∗ M_a A
for each a ∈ Σ therefore gives N_a X V_a^T = U_a X M_a^T. It is clear that each V_a is unitary and it is easily
checked that
checked that
∑_{a∈Σ} N_a^∗ N_a = 1_{Z_A},
We are now prepared to finish the proof. We assume that there exists a restricted LOCC super-
operator Φ ∈ T (Z A ⊗ Z B ) such that Φ( xx∗ ) = yy∗ , from which we conclude that there exists a
B→A super-operator Ψ ∈ T (Z A ⊗ Z B ) such that Ψ( xx∗ ) = yy∗ . Write
Ψ(Z) = ∑_{a∈Σ} (U_a ⊗ M_a) Z (U_a ⊗ M_a)^∗,
This shows that there exists a mixed unitary super-operator Ξ such that
XX^∗ = Ξ(YY^∗),
which is equivalent to
Tr_{Z_B}(xx^∗) = Ξ(Tr_{Z_B}(yy^∗))
as required.
Lecture 15
Measures of entanglement
The topic of this lecture is measures of entanglement. The underlying idea throughout this discussion
is that entanglement may be viewed as a resource that is useful for various communication-related
tasks, such as teleportation, and we would like to quantify the amount of entanglement that is
contained in different states.
We will begin with a very simple quantity that relates to the amount of entanglement in a given
state: the maximum inner product with a maximally entangled state. It is not necessarily an
interesting concept in its own right, but it will be useful as a mathematical tool in the sections of
this lecture that follow.
Suppose X and Y are complex Euclidean spaces, and assume for the moment that dim(X ) ≥
dim(Y ) = n. A maximally entangled state of a pair of registers (X, Y ) to which these spaces are
associated is any pure state uu∗ for which
Tr_X(uu^∗) = (1/n) 1_Y;
or, in other words, tracing out the larger space leaves the completely mixed state on the other
register. Equivalently, the maximally entangled states are those that may be expressed as
(1/n) vec(U) vec(U)^∗
for some choice of a linear isometry U ∈ U (Y , X ). If it is the case that n = dim(X ) ≤ dim(Y )
then the maximally entangled states are those states uu∗ such that
Tr_Y(uu^∗) = (1/n) 1_X,
or equivalently those that can be written
(1/n) vec(U^∗) vec(U^∗)^∗
where U ∈ L (X , Y ) is a linear isometry. Quite frequently, the term maximally entangled refers to
the situation in which dim(X ) = dim(Y ), where the two notions obviously coincide. As is to be
expected when discussing pure states, we will often refer to a unit vector u as being a maximally
entangled state, which simply means that uu∗ is maximally entangled.
Now, for an arbitrary density operator ρ ∈ D (X ⊗ Y ), let us define
M(ρ) = max{ ⟨uu^∗, ρ⟩ : u ∈ X ⊗ Y is maximally entangled }.
Clearly it holds that 0 < M (ρ) ≤ 1 for every density operator ρ ∈ D (X ⊗ Y ). The following
lemma establishes an upper bound on M (ρ) based on the min-rank of ρ.
Lemma 15.1. Let X and Y be complex Euclidean spaces, and let n = min{dim(X ), dim(Y )}. Then
M(ρ) ≤ min-rank(ρ) / n
for all ρ ∈ D (X ⊗ Y ).
Proof. Let us assume dim(X ) ≥ dim(Y ) = n, and note that the argument is equivalent in case the
inequality is reversed.
First let us note that M is a convex function on D (X ⊗ Y ). To see this, consider any choice of
σ, ξ ∈ D(X ⊗ Y) and p ∈ [0, 1]. Then for U ∈ U(Y, X) we have
(1/n) vec(U)^∗ (pσ + (1−p)ξ) vec(U)
= p (1/n) vec(U)^∗ σ vec(U) + (1−p) (1/n) vec(U)^∗ ξ vec(U)
≤ p M(σ) + (1−p) M(ξ).
Maximizing over all U ∈ U(Y, X) on the left-hand side establishes the convexity of M.
By convexity, it suffices to bound M on pure states. For a unit vector vec(A), where A ∈ L(Y, X), we have
M(vec(A) vec(A)^∗) = (1/n) max_{U∈U(Y,X)} |⟨U, A⟩|² = (1/n) ‖A‖₁².
Given that
‖A‖₁ ≤ √(rank(A)) ‖A‖₂
for every operator A, the lemma follows.
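The norm inequality used in the last step is Cauchy–Schwarz applied to the singular values, and is easy to check numerically (a supplementary sketch, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
s = np.linalg.svd(A, compute_uv=False)

trace_norm = s.sum()                     # ‖A‖₁, the sum of singular values
frob_norm = np.sqrt((s ** 2).sum())      # ‖A‖₂
rank = int(np.sum(s > 1e-10))
print(trace_norm <= np.sqrt(rank) * frob_norm + 1e-12)   # True (Cauchy–Schwarz)
```

Equality holds exactly when all nonzero singular values of A coincide, which is the case for maximally entangled vectors.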
We will now discuss two fundamental measures of entanglement: the entanglement cost and the
distillable entanglement. For the remainder of the lecture, let us take
τ = |φ⁺⟩⟨φ⁺| for |φ⁺⟩ = (1/√2)|00⟩ + (1/√2)|11⟩.
We view the state τ as representing one unit of entanglement, typically called an e-bit of entanglement.
Definition 15.2. The entanglement cost of a density operator ρ ∈ D (X A ⊗ X B ), denoted Ec (ρ), is the
infimum over all real numbers α ≥ 0 for which there exists a sequence of LOCC super-operators
{Φ_n : n ∈ N}, where
Φ_n ∈ LOCC(Y_A^{⊗⌊αn⌋}, X_A^{⊗n} : Y_B^{⊗⌊αn⌋}, X_B^{⊗n}),
such that
lim_{n→∞} F(Φ_n(τ^{⊗⌊αn⌋}), ρ^{⊗n}) = 1.
The interpretation of the definition is that, in the limit of large n, Alice and Bob are able to
convert ⌊αn⌋ e-bits into n copies of ρ with high fidelity for any α > Ec (ρ). It is not entirely obvious
that there should exist any value of α for which there exists a sequence of LOCC super-operators
{Φn : n ∈ N } as in the statement of the definition, but indeed there always does exist a suitable
choice of α.
Definition 15.3. The distillable entanglement of a density operator ρ ∈ D(X_A ⊗ X_B), denoted E_d(ρ),
is the supremum over all real numbers α ≥ 0 for which there exists a sequence of LOCC super-operators
{Ψ_n : n ∈ N}, where
Ψ_n ∈ LOCC(X_A^{⊗n}, Y_A^{⊗⌊αn⌋} : X_B^{⊗n}, Y_B^{⊗⌊αn⌋}),
such that
lim_{n→∞} F(Ψ_n(ρ^{⊗n}), τ^{⊗⌊αn⌋}) = 1.
The interpretation of the definition is that, in the limit of large n, Alice and Bob are able to
convert n copies of ρ into ⌊αn⌋ e-bits with high fidelity for any α < Ed (ρ). In order to clarify the
definition, let us state explicitly that the operator τ ⊗0 is interpreted to be the scalar 1, implying
that the condition in the definition is trivially satisfied for α = 0.
Theorem 15.4. For every density operator ρ ∈ D(X_A ⊗ X_B) it holds that E_d(ρ) ≤ E_c(ρ).
Proof. Let us assume that α and β are nonnegative real numbers such that the following two prop-
erties are satisfied:
1. There exists a sequence of LOCC super-operators {Φ_n : n ∈ N} such that
lim_{n→∞} F(Φ_n(τ^{⊗⌊αn⌋}), ρ^{⊗n}) = 1.
2. There exists a sequence of LOCC super-operators {Ψ_n : n ∈ N} such that
lim_{n→∞} F(Ψ_n(ρ^{⊗n}), τ^{⊗⌊βn⌋}) = 1.
Using the Fuchs–van de Graaf Inequalities, along with triangle inequality for the trace norm,
we conclude that
lim_{n→∞} F((Ψ_n Φ_n)(τ^{⊗⌊αn⌋}), τ^{⊗⌊βn⌋}) = 1. (15.1)
Because min-rank(τ ⊗k ) = 2k for every choice of k ≥ 1, and LOCC super-operators cannot increase
min-rank, we have that
F((Ψ_n Φ_n)(τ^{⊗⌊αn⌋}), τ^{⊗⌊βn⌋})² ≤ 2^{⌊αn⌋−⌊βn⌋}. (15.2)
Combining (15.1) and (15.2) shows that ⌊αn⌋ ≥ ⌊βn⌋ for all sufficiently large n, and therefore β ≤ α. Taking the infimum over all valid choices of α and the supremum over all valid choices of β establishes that E_d(ρ) ≤ E_c(ρ).
The remainder of the lecture will focus on the entanglement cost and distillable entanglement for
bipartite pure states. In this case, these measures turn out to be identical, and coincide precisely
with the von Neumann entropy of the reduced state of either subsystem.
Theorem 15.5. Let X_A and X_B be complex Euclidean spaces and let u ∈ X_A ⊗ X_B be a unit vector. Then

Ec(uu*) = Ed(uu*) = S(Tr_XB(uu*)).
Proof. The proof will start with some basic observations about the vector u that will be used to
calculate both the entanglement cost and the distillable entanglement. Let ρ = TrXB (uu∗ ), and let
ρ = ∑_{a∈Σ} p(a) v_a v_a*
be a spectral decomposition of ρ, for some arbitrary set Σ with | Σ | = dim(X A ). We may then
choose orthonormal vectors {w a : a ∈ Σ} ⊂ X B so that
u = ∑_{a∈Σ} √(p(a)) v_a ⊗ w_a.
For each n ∈ N and ε > 0, let T_{n,ε} ⊆ Σ^n denote the set of strings a_1···a_n that are ε-typical with respect to p, and define

x_{n,ε} = ∑_{a_1···a_n ∈ T_{n,ε}} √(p(a_1)···p(a_n)) v_{a_1···a_n} ⊗ w_{a_1···a_n}.
Here we are using the shorthand va1 ···an = va1 ⊗ · · · ⊗ van (and similar for w a1 ···an ). We have
‖x_{n,ε}‖² = ∑_{a_1···a_n ∈ T_{n,ε}} p(a_1)···p(a_n),
which is the probability that a random choice of a1 · · · an is ε-typical with respect to the probability
vector p. It follows that limn→∞ k xn,ε k = 1 for any choice of ε > 0. Let
y_{n,ε} = x_{n,ε} / ‖x_{n,ε}‖
denote the normalized versions of these vectors.
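The convergence ‖x_{n,ε}‖ → 1 is the law of large numbers at work on typical sequences. As a small illustration (not from the notes), the following sketch computes the ε-typical probability mass for a hypothetical two-outcome distribution; the names p, H, and typical_mass are our own.

```python
import math

# A hypothetical two-outcome eigenvalue distribution p and its entropy H(p).
p = [0.7, 0.3]
H = -sum(q * math.log2(q) for q in p)

def typical_mass(n, eps):
    """Total probability of the eps-typical strings of length n.

    For a two-symbol alphabet, a string's probability depends only on the
    number k of occurrences of the first symbol, so we sum over k.
    """
    total = 0.0
    for k in range(n + 1):
        pr = p[0] ** k * p[1] ** (n - k)          # probability of one such string
        if 2 ** (-n * (H + eps)) < pr < 2 ** (-n * (H - eps)):
            total += math.comb(n, k) * pr          # all strings with k occurrences
    return total

print([round(typical_mass(n, 0.2), 4) for n in (5, 10, 20, 40)])
```

The printed masses approach 1 as n grows, as the text asserts.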
Next, consider the vector of eigenvalues

λ(Tr_{X_B^⊗n}(x_{n,ε} x_{n,ε}*)).

The nonzero eigenvalues are given by the probabilities of the various ε-typical sequences, and so

2^(−n(H(p)+ε)) < λ_j(Tr_{X_B^⊗n}(x_{n,ε} x_{n,ε}*)) < 2^(−n(H(p)−ε))

for each nonzero eigenvalue λ_j.
Let us first consider the entanglement cost. Our goal is to prove that for every α > H(p) = S(Tr_XB(uu*)), there exists a sequence {Φ_n : n ∈ N} of LOCC super-operators such that

lim_{n→∞} F(Φ_n(τ^⊗⌊αn⌋), (uu*)^⊗n) = 1.    (15.3)
We will do this by means of Nielsen’s Theorem. Specifically, let us choose ε > 0 so that α >
H ( p) + 2ε, from which it follows that ⌊ αn⌋ ≥ n( H ( p) + ε) for sufficiently large n. We have
λ_j(Tr_{Y_B^⊗⌊αn⌋}(τ^⊗⌊αn⌋)) = 2^(−⌊αn⌋)

for every eigenvalue, while every nonzero eigenvalue of Tr_{X_B^⊗n}(y_{n,ε} y_{n,ε}*) satisfies

λ_j(Tr_{X_B^⊗n}(y_{n,ε} y_{n,ε}*)) > 2^(−n(H(p)+ε)) / ‖x_{n,ε}‖² ≥ 2^(−n(H(p)+ε)) ≥ 2^(−⌊αn⌋)

for sufficiently large n. It follows that

Tr_{Y_B^⊗⌊αn⌋}(τ^⊗⌊αn⌋) ≺ Tr_{X_B^⊗n}(y_{n,ε} y_{n,ε}*).    (15.4)
This means that τ ⊗⌊αn⌋ can be converted to yn,ε y∗n,ε by means of an LOCC super-operator Φn by
Nielsen’s Theorem. Given that
lim_{n→∞} F(y_{n,ε} y_{n,ε}*, (uu*)^⊗n) = 1,
this implies that the required equation (15.3) holds. Consequently Ec (uu∗ ) ≤ S (TrXB (uu∗ )).
Next let us consider the distillable entanglement, for which a similar argument is used. Our
goal is to prove that for every α < H(p), there exists a sequence {Φ_n : n ∈ N} of LOCC super-operators such that

lim_{n→∞} F(Φ_n((uu*)^⊗n), τ^⊗⌊αn⌋) = 1.    (15.5)
In this case, let us choose ε > 0 small enough so that α < H ( p) − 2ε. Then for sufficiently large n
we have
2^⌊αn⌋ ≤ ‖x_{n,ε}‖² · 2^(n(H(p)−ε)).
Similar to above, we therefore have
Tr_{X_B^⊗n}(y_{n,ε} y_{n,ε}*) ≺ Tr_{Y_B^⊗⌊αn⌋}(τ^⊗⌊αn⌋),
which implies that the state yn,ε y∗n,ε can be converted to the state τ ⊗⌊αn⌋ by means of an LOCC
super-operator Φ_n for sufficiently large n. Given that

lim_{n→∞} F(y_{n,ε} y_{n,ε}*, (uu*)^⊗n) = 1,

it follows that

lim_{n→∞} ‖Φ_n((uu*)^⊗n) − τ^⊗⌊αn⌋‖₁
  ≤ lim_{n→∞} ( ‖Φ_n((uu*)^⊗n) − Φ_n(y_{n,ε} y_{n,ε}*)‖₁ + ‖Φ_n(y_{n,ε} y_{n,ε}*) − τ^⊗⌊αn⌋‖₁ ) = 0,
which establishes the above equation (15.5). Consequently S (TrXB (uu∗ )) ≤ Ed (uu∗ ).
We have shown that
Ec (uu∗ ) ≤ S(TrXB (uu∗ )) ≤ Ed (uu∗ ).
As Ed (uu∗ ) ≤ Ec (uu∗ ), the equality in the statement of the theorem follows.
Theory of Quantum Information Lecture Notes
John Watrous (Institute for Quantum Computing, University of Waterloo)
Lecture 16
The partial transpose and bound-entangled states
In this lecture we will discuss the partial transpose and its connection to entanglement. The pri-
mary focus of the lecture will be to construct examples of bound-entangled states, which are mixed
states that are entangled and yet have zero distillable entanglement. The known constructions of
such states are closely connected to the partial transpose.
Recall the Woronowicz–Horodecki criterion for separability: for complex Euclidean spaces X
and Y , we have that a given operator P ∈ Pos (X ⊗ Y ) is separable if and only if
(Φ ⊗ 1L(Y ) )( P) ∈ Pos (Y ⊗ Y )
for every choice of a positive unital super-operator Φ ∈ T (X , Y ). We note, however, that the
restriction of the super-operator Φ to be both unital and to take the form Φ ∈ T (X , Y ) can be
relaxed. Specifically, the Woronowicz–Horodecki criterion implies the truth of the following two
facts:
1. If P ∈ Pos (X ⊗ Y ) is separable, then for every choice of a complex Euclidean space Z and a
positive super-operator Φ ∈ T (X , Z), we have
(Φ ⊗ 1_L(Y))(P) ∈ Pos(Z ⊗ Y).

2. If P ∈ Pos(X ⊗ Y) is not separable, then there exist a complex Euclidean space Z and a positive super-operator Φ ∈ T(X, Z) for which

(Φ ⊗ 1_L(Y))(P) ∉ Pos(Z ⊗ Y).

Moreover, there exists such a super-operator Φ that is unital and for which Z = Y.
It is clear that the criterion illustrates a connection between separability and positive super-operators that are not completely positive, for if Φ ∈ T(X, Z) is completely positive, then

(Φ ⊗ 1_L(Y))(P) ∈ Pos(Z ⊗ Y)

for every P ∈ Pos(X ⊗ Y), separable or not. The simplest example of a positive super-operator that is not completely positive is the transpose super-operator T ∈ T(X), defined by T(X) = X^T for all X ∈ L(X). The positivity of T is clear: X ∈ Pos(X) if and only if X^T ∈ Pos(X), for every X ∈ L(X). Assuming that X = C^Σ, we have

T(X) = ∑_{a,b∈Σ} E_{a,b} X E_{a,b},

and the Choi–Jamiołkowski representation of T is

J(T) = ∑_{a,b∈Σ} E_{a,b} ⊗ E_{b,a} = W,
where W ∈ U (X ⊗ X ) denotes the swap operator. The fact that W is not positive semidefinite
shows that T is not completely positive.
When we refer to the partial transpose, we mean that the transpose super-operator is tensored
with the identity super-operator on some other space. We will use a similar notation to the partial
trace: for given complex Euclidean spaces X and Y , we define
TX = T ⊗ 1L(Y ) ∈ T (X ⊗ Y ) .
More generally, the subscript refers to the space on which the transpose is performed.
Given that the transpose is positive, we may conclude the following from the Woronowicz–
Horodecki Criterion for any choice of P ∈ Pos (X ⊗ Y ):
1. If P is separable, then TX ( P) is necessarily positive semidefinite.
2. If P is not separable, then TX ( P) might not be positive semidefinite, although nothing defini-
tive can be concluded from the criterion.
We have seen a specific example where the transpose indeed does identify entanglement: if Σ is a finite, nonempty set of size n, and we take X_A = C^Σ and X_B = C^Σ, then

P = (1/n) ∑_{a,b∈Σ} E_{a,b} ⊗ E_{a,b} ∈ Pos(X_A ⊗ X_B)

is entangled, and T_XA(P) = (1/n)W fails to be positive semidefinite (for W the swap operator). Let us write PPT(X_A : X_B) for the set of operators P ∈ Pos(X_A ⊗ X_B) for which T_XA(P) is positive semidefinite. This set is closed under tensor products: if P ∈ PPT(X_A : X_B) and Q ∈ PPT(Y_A : Y_B), then

P ⊗ Q ∈ PPT(X_A ⊗ Y_A : X_B ⊗ Y_B).
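The partial-transpose test is easy to carry out numerically. The following sketch (the helper partial_transpose is our own name, not from the notes) applies T ⊗ 1 to the maximally entangled operator P above and exhibits a negative eigenvalue.

```python
import numpy as np

def partial_transpose(X, n):
    # Apply T (x) 1 -- transpose the first tensor factor of an operator on C^n (x) C^n.
    T = X.reshape(n, n, n, n)                              # indices (a, b, a', b')
    return T.transpose(2, 1, 0, 3).reshape(n * n, n * n)   # swap a <-> a'

n = 3
v = np.eye(n).reshape(n * n)            # sum_a e_a (x) e_a
P = np.outer(v, v) / n                  # (1/n) sum_{a,b} E_{a,b} (x) E_{a,b}

PT = partial_transpose(P, n)            # equals W/n, for W the swap operator
print(np.linalg.eigvalsh(PT).min())     # negative, so P is not PPT (hence entangled)
```

The minimum eigenvalue is −1/n, coming from the antisymmetric subspace of the swap operator.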
Finally, notice that the definition of PPT (X A : X B ) does not really depend on the fact that the
partial transpose is performed on X A as opposed to X B . This follows from the observation that
T(T_XA(X)) = T_XB(X)

for every X ∈ L(X_A ⊗ X_B), where T denotes the transpose on L(X_A ⊗ X_B).
In this section we will discuss two examples of operators that are both entangled and PPT. This shows that the partial transpose does not give a conclusive test for separability, and also implies something interesting about entanglement distillation, to be discussed in the next section.
For any integer n ≥ 2, define the following operators on C^Zn ⊗ C^Zn:

P_n = (1/n) ∑_{a,b∈Z_n} E_{a,b} ⊗ E_{a,b},   W_n = ∑_{a,b∈Z_n} E_{b,a} ⊗ E_{a,b},

Q_n = 1 − P_n,   R_n = (1/2)(1 − W_n),   S_n = (1/2)(1 + W_n).
It holds that Pn , Qn , Rn , and Sn are projection operators with Pn + Qn = Rn + Sn = 1. The operator
Wn is the swap operator, Rn is the projection onto the anti-symmetric subspace of CZn ⊗ CZn and Sn
is the projection onto the symmetric subspace of CZn ⊗ CZn .
It is clear that

(T ⊗ 1)(P_n) = (1/n) W_n   and   (T ⊗ 1)(1) = 1,

from which the following equations follow:

(T ⊗ 1)(P_n) = −(1/n) R_n + (1/n) S_n,     (T ⊗ 1)(R_n) = −((n−1)/2) P_n + (1/2) Q_n,

(T ⊗ 1)(Q_n) = ((n+1)/n) R_n + ((n−1)/n) S_n,     (T ⊗ 1)(S_n) = ((n+1)/2) P_n + (1/2) Q_n.
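The four identities above are easy to confirm numerically; the following sketch (with our own partial_transpose helper, not part of the notes) checks them entry-wise for n = 3.

```python
import numpy as np

def partial_transpose(X, n):
    # (T (x) 1): transpose the first tensor factor of an operator on C^n (x) C^n.
    return X.reshape(n, n, n, n).transpose(2, 1, 0, 3).reshape(n * n, n * n)

n = 3
v = np.eye(n).reshape(n * n)
P = np.outer(v, v) / n                  # maximally entangled projection P_n
I = np.eye(n * n)
W = partial_transpose(n * P, n)         # swap operator: (T (x) 1)(P_n) = W_n / n
Q = I - P
R = (I - W) / 2                         # antisymmetric projection
S = (I + W) / 2                         # symmetric projection

assert np.allclose(partial_transpose(P, n), -R / n + S / n)
assert np.allclose(partial_transpose(R, n), -(n - 1) / 2 * P + Q / 2)
assert np.allclose(partial_transpose(Q, n), (n + 1) / n * R + (n - 1) / n * S)
assert np.allclose(partial_transpose(S, n), (n + 1) / 2 * P + Q / 2)
print("all four identities hold for n =", n)
```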
Now let us suppose we have registers X2 , Y2 , X3 , and Y3 , where
X2 = C Z2 , Y2 = C Z 2 , X3 = C Z3 , Y3 = C Z 3 .
In other words, X2 and Y2 are qubit registers, while X3 and Y3 are qutrit registers. We will imagine
the situation in which Alice holds registers X2 and X3 , while Bob holds Y2 and Y3 .
For every choice of α > 0 it holds that

X_α ∉ Sep(X_3 ⊗ X_2 : Y_3 ⊗ Y_2),

as we will now show. Define Ψ ∈ T(X_2 ⊗ Y_2, X_3 ⊗ Y_3) to be the unique
super-operator for which J(Ψ) = X_α. One may verify that Ψ(P_2) = αP_3. So, for α > 0 we have that Ψ increases min-rank and is therefore not a separable super-operator. Thus, it is not the case that X_α is separable.
u_1 = |0⟩ ⊗ (|0⟩ − |1⟩)/√2,
u_2 = |2⟩ ⊗ (|1⟩ − |2⟩)/√2,
u_3 = (|0⟩ − |1⟩)/√2 ⊗ |2⟩,
u_4 = (|1⟩ − |2⟩)/√2 ⊗ |0⟩,
u_5 = (|0⟩ + |1⟩ + |2⟩)/√3 ⊗ (|0⟩ + |1⟩ + |2⟩)/√3.
There are three relevant facts about this set for the purpose of our discussion:

1. The set {u_1, ..., u_5} is orthonormal.

2. Each u_i is a product vector: u_i = x_i ⊗ y_i for unit vectors x_i ∈ X and y_i ∈ Y.

3. No nonzero product vector v ⊗ w is orthogonal to all of u_1, ..., u_5.
To verify the third property, note that in order for a product vector v ⊗ w to be orthogonal to a given u_i, it must be that ⟨v, x_i⟩ = 0 or ⟨w, y_i⟩ = 0. In order to have ⟨v ⊗ w, u_i⟩ = 0 for i = 1, . . . , 5 we must
therefore have ⟨v, x_i⟩ = 0 for at least three distinct choices of i, or ⟨w, y_i⟩ = 0 for at least three distinct choices of i. However, for any three distinct choices of indices i, j, k ∈ {1, . . . , 5} we have span{x_i, x_j, x_k} = X and span{y_i, y_j, y_k} = Y, which implies that either v = 0 or w = 0, and therefore v ⊗ w = 0.
Now, define a projection P ∈ Pos (X ⊗ Y ) as
P = 1_{X⊗Y} − ∑_{i=1}^5 u_i u_i*.
Thus,

T_X(P) = T_X(1_{X⊗Y}) − ∑_{i=1}^5 T_X(u_i u_i*) = 1_{X⊗Y} − ∑_{i=1}^5 u_i u_i* = P ∈ Pos(X ⊗ Y),

as claimed. (The second equality follows from the fact that each x_i has only real coefficients, so x̄_i = x_i.)
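The orthonormality of u_1, ..., u_5 and the identity T_X(P) = P are simple to confirm numerically. A minimal sketch (all helper names are ours):

```python
import numpy as np

e = np.eye(3)
s2, s3 = np.sqrt(2), np.sqrt(3)
# The tensor factors x_i and y_i of u_1, ..., u_5 from the list above.
x = [e[0], e[2], (e[0] - e[1]) / s2, (e[1] - e[2]) / s2, (e[0] + e[1] + e[2]) / s3]
y = [(e[0] - e[1]) / s2, (e[1] - e[2]) / s2, e[2], e[0], (e[0] + e[1] + e[2]) / s3]
u = [np.kron(xi, yi) for xi, yi in zip(x, y)]

G = np.array([[ui @ uj for uj in u] for ui in u])
assert np.allclose(G, np.eye(5))                 # the u_i are orthonormal

P = np.eye(9) - sum(np.outer(ui, ui) for ui in u)

def partial_transpose(X, n):
    return X.reshape(n, n, n, n).transpose(2, 1, 0, 3).reshape(n * n, n * n)

assert np.allclose(partial_transpose(P, 3), P)   # T_X(P) = P since each x_i is real
print("P is a rank-4 PPT projection")
```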
Now let us assume toward contradiction that P is separable. This implies that it is possible to write

P = ∑_{j=1}^m v_j v_j* ⊗ w_j w_j*

for nonzero vectors v_1, ..., v_m ∈ X and w_1, ..., w_m ∈ Y. Each product vector v_j ⊗ w_j then lies in the image of P, and is therefore orthogonal to each of u_1, ..., u_5. This contradicts the third fact above. Consequently P is entangled, despite being PPT.
The last part of this lecture concerns the relationship between the partial transpose and entangle-
ment distillation. Our goal will be to prove that PPT states cannot be distilled, meaning that the
distillable entanglement is zero.
Let us begin the discussion with some further properties of PPT states that will be needed.
First we will observe that separable super-operators respect the positivity of the partial transpose. To this end, suppose that P ∈ PPT(X_A : X_B), and that A ∈ L(X_A, Y_A) and B ∈ L(X_B, Y_B) are operators. As P is PPT, we have

T_XA(P) ∈ Pos(X_A ⊗ X_B),
and therefore
(1 X A ⊗ B) TX A ( P)(1 X A ⊗ B∗ ) ∈ Pos (X A ⊗ YB ) .
The partial transpose on X A commutes with the conjugation by B, and therefore
TX A ((1 X A ⊗ B) P(1 X A ⊗ B∗ )) ∈ Pos (X A ⊗ YB ) .
This implies that
T(T_XA((1_XA ⊗ B) P (1_XA ⊗ B*))) = T_YB((1_XA ⊗ B) P (1_XA ⊗ B*)) ∈ Pos(X_A ⊗ Y_B),
as remarked in the first section of the lecture. Using the fact that conjugation by A commutes with
the partial transpose on YB , we have that
(A ⊗ 1_YB) T_YB((1_XA ⊗ B) P (1_XA ⊗ B*)) (A* ⊗ 1_YB) = T_YB((A ⊗ B) P (A* ⊗ B*)) ∈ Pos(Y_A ⊗ Y_B).
We have therefore proved that ( A ⊗ B) P( A∗ ⊗ B∗ ) ∈ PPT (Y A : YB ).
Now, for Φ ∈ SepT (X A , Y A : X B , YB ), we have that Φ( P) ∈ PPT (Y A : YB ) by the above obser-
vation together with the fact that PPT (Y A : YB ) is a convex cone.
Next, let us note that PPT states cannot have a large inner product with a maximally entangled state.
Lemma 16.2. Let X and Y be complex Euclidean spaces and let n = min{dim(X ), dim(Y )}. For any
PPT density operator
ρ ∈ D (X ⊗ Y ) ∩ PPT (X : Y )
we have M(ρ) ≤ 1/n, where M(ρ) denotes the maximum of ⟨uu*, ρ⟩ over all maximally entangled pure states uu* on X ⊗ Y.
Proof. Let us assume, without loss of generality, that Y = CZn and dim(X ) ≥ n. Every maximally
entangled state on X ⊗ Y may therefore be written
(U ⊗ 1 Y ) Pn (U ⊗ 1 Y )∗
for U ∈ U (Y , X ) a linear isometry. We have that
⟨(U ⊗ 1_Y) P_n (U ⊗ 1_Y)*, ρ⟩ = ⟨P_n, (U ⊗ 1_Y)* ρ (U ⊗ 1_Y)⟩,
and that
(U ⊗ 1 Y )∗ ρ(U ⊗ 1 Y ) ∈ PPT (Y : Y )
is a PPT operator with trace at most 1. To prove the lemma it therefore suffices to prove that
⟨P_n, ξ⟩ ≤ 1/n
for every ξ ∈ D (Y ⊗ Y ) ∩ PPT (Y : Y ).
The partial transpose is its own adjoint and inverse, which implies that
⟨(T ⊗ 1)(A), (T ⊗ 1)(B)⟩ = ⟨A, B⟩
for any choice of operators A, B ∈ L (Y ⊗ Y ). It is also clear that the partial transpose preserves
trace, which implies that ( T ⊗ 1 )(ξ ) ∈ D (Y ⊗ Y ) for every ξ ∈ D (Y ⊗ Y ) ∩ PPT (Y : Y ). Given
that R_n and S_n are both projections, and therefore ⟨R_n, (T ⊗ 1)(ξ)⟩, ⟨S_n, (T ⊗ 1)(ξ)⟩ ∈ [0, 1], it
follows that
⟨P_n, ξ⟩ = ⟨(T ⊗ 1)(P_n), (T ⊗ 1)(ξ)⟩ = (1/n)⟨S_n, (T ⊗ 1)(ξ)⟩ − (1/n)⟨R_n, (T ⊗ 1)(ξ)⟩ ≤ 1/n,
as claimed.
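The bound of Lemma 16.2 can be probed numerically by sampling density operators and keeping those that happen to be PPT. The sketch below (our own sampling scheme, not from the notes) does this on two qubits, where 1/n = 1/2.

```python
import numpy as np

rng = np.random.default_rng(7)

def partial_transpose(X, n):
    return X.reshape(n, n, n, n).transpose(2, 1, 0, 3).reshape(n * n, n * n)

def random_density(d):
    # An arbitrary way to sample density operators (Wishart-style); any scheme works.
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = A @ A.conj().T
    return rho / np.trace(rho).real

n = 2
v = np.eye(n).reshape(n * n)
P = np.outer(v, v) / n                    # maximally entangled projection P_n

checked = 0
for _ in range(2000):
    xi = random_density(n * n)
    if np.linalg.eigvalsh(partial_transpose(xi, n)).min() >= 0:   # xi is PPT
        assert np.trace(P @ xi).real <= 1 / n + 1e-9              # <P_n, xi> <= 1/n
        checked += 1
print("PPT samples checked:", checked)
```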
Finally we are ready for the main result of the section, which states that PPT density operators
have no distillable entanglement.
Theorem 16.3. Let X A and X B be complex Euclidean spaces and let ρ ∈ D (X A ⊗ X B ) ∩ PPT (X A : X B ).
Then Ed (ρ) = 0.
Proof. Let Y A = C {0,1} and YB = C {0,1} be complex Euclidean spaces each corresponding to a
single qubit, as in the definition of distillable entanglement, and let τ ∈ D (Y A ⊗ YB ) be the density
operator corresponding to a perfect e-bit.
Let α > 0 and let

Φ_n ∈ LOCC(X_A^⊗n, Y_A^⊗⌊αn⌋ : X_B^⊗n, Y_B^⊗⌊αn⌋)

be an LOCC super-operator for each n ≥ 1. This implies that Φ_n is separable and admissible.
Now, if ρ ∈ PPT(X_A : X_B) then ρ^⊗n ∈ PPT(X_A^⊗n : X_B^⊗n), and therefore

Φ_n(ρ^⊗n) ∈ D(Y_A^⊗⌊αn⌋ ⊗ Y_B^⊗⌊αn⌋) ∩ PPT(Y_A^⊗⌊αn⌋ : Y_B^⊗⌊αn⌋).

As τ^⊗⌊αn⌋ is a maximally entangled pure state on spaces of dimension 2^⌊αn⌋, Lemma 16.2 implies that

F(Φ_n(ρ^⊗n), τ^⊗⌊αn⌋)² = ⟨τ^⊗⌊αn⌋, Φ_n(ρ^⊗n)⟩ ≤ 2^(−⌊αn⌋),

which does not converge to 1 as n → ∞ for any α > 0. It follows that Ed(ρ) = 0.
Lecture 17
LOCC and separable measurements
In this lecture we will discuss measurements that can be collectively performed by two parties by
means of local quantum operations and classical communication. Much of this discussion could
be generalized to measurements implemented by more than two parties, but as we did when
discussing separable states and operations, we will restrict our attention to the two-party case.
Informally speaking, an LOCC measurement is one that can be implemented by two (or more)
parties using only local quantum operations and classical communication. We must, however,
choose a more precise mathematical definition if we are to reason effectively about these objects.
There are many ways one could formally define LOCC measurements, and we will choose just
one way that makes the task simpler by making use of the definition we have already considered
for LOCC super-operators. Specifically, we will say that a measurement
{ Pa : a ∈ Σ} ⊂ Pos (X A ⊗ X B )
on a bipartite system having associated complex Euclidean spaces X A and X B is an LOCC measure-
ment if there exist complex Euclidean spaces Y A and YB , an LOCC super-operator
Φ ∈ LOCC (X A , Y A : X B , YB ) ,
and a measurement {Q a : a ∈ Σ} ⊆ Pos (Y A ) such that
h Pa , ρi = h Q a ⊗ 1 YB , Φ(ρ)i (17.1)
for every a ∈ Σ and every ρ ∈ D (X A ⊗ X B ). Let us note for future reference that the equation
(17.1) holds for all ρ ∈ D (X A ⊗ X B ) if and only if
Pa = Φ∗ ( Q a ⊗ 1 Y B ).
The interpretation of this definition is simple: Alice and Bob implement the measurement
{ Pa : a ∈ Σ} by first performing the LOCC super-operator Φ, after which Alice measures the
resulting system according to {Q a : a ∈ Σ}. Of course there is nothing special about letting
Alice perform the measurement rather than Bob; we are just making an arbitrary choice for the
sake of arriving at a definition, and it is easy to see that the definition would be equivalent if
Bob performed the measurement instead. The space YB has no significance in this definition—so
although the definition does not impose any particular choice of YB , there is no loss of generality
in choosing YB = C.
As we did for super-operators, we also consider a relaxation of LOCC measurements that is
often much easier to work with. A measurement
{ Pa : a ∈ Σ} ⊂ Pos (X A ⊗ X B )
is said to be separable if Pa ∈ Sep (X A : X B ) for every a ∈ Σ.
Expressed as a function, an LOCC measurement is therefore a measurement

µ : Σ → Pos(X_A ⊗ X_B)

of the form

µ(a) = P_a = Φ*(Q_a ⊗ 1_YB)

for an LOCC super-operator Φ and measurement {Q_a : a ∈ Σ} as above.
It is the case that there are separable measurements that are not LOCC measurements—we will
see such an example (albeit without a proof) later in the lecture. However, separable measure-
ments can be simulated by LOCC measurement in a probabilistic sense: the LOCC measurement
that simulates the separable measurement might fail, but it succeeds with nonzero probability,
and if it succeeds it generates the same output statistics as the original separable measurement.
This is the content of the following theorem.
Theorem. Let {P_a : a ∈ Σ} ⊂ Pos(X_A ⊗ X_B) be a separable measurement. Then there exists an LOCC measurement {Q_a : a ∈ Σ ∪ {fail}} such that ⟨Q_fail, ρ⟩ < 1 and

⟨Q_a, ρ⟩ / (1 − ⟨Q_fail, ρ⟩) = ⟨P_a, ρ⟩

for every a ∈ Σ and every ρ ∈ D(X_A ⊗ X_B).
Proof. Given that each P_a is separable, it is possible to write

P_a = ∑_{b∈Γ} A_{a,b} ⊗ B_{a,b}

for some finite set Γ and collections

{A_{a,b} : a ∈ Σ, b ∈ Γ} ⊂ Pos(X_A),
{B_{a,b} : a ∈ Σ, b ∈ Γ} ⊂ Pos(X_B).

An LOCC measurement that simulates the finer measurement

{A_{a,b} ⊗ B_{a,b} : (a, b) ∈ Σ × Γ}

can easily be transformed into one that simulates {P_a : a ∈ Σ} by adjusting Alice's final measurement outcome so that she effectively discards b and outputs a.
We may therefore assume without loss of generality that P_a = A_a ⊗ B_a for A_a ∈ Pos(X_A) and B_a ∈ Pos(X_B) for each a ∈ Σ. Choose scalars α, β > 0 such that

∑_a (α A_a) ≤ 1_XA   and   ∑_a (β B_a) ≤ 1_XB,

and define measurements

{R_a : a ∈ Σ ∪ {fail}} ⊂ Pos(X_A),
{S_a : a ∈ Σ ∪ {fail}} ⊂ Pos(X_B),

by taking R_a = α A_a and S_a = β B_a for each a ∈ Σ, along with R_fail = 1_XA − ∑_a α A_a and S_fail = 1_XB − ∑_a β B_a.
Now, consider the following LOCC protocol: Alice and Bob independently measure according
to { R a : a ∈ Σ ∪ {fail}} and {Sa : a ∈ Σ ∪ {fail}}, and then compare their measurement outcomes
by using classical communication. If either of their outcomes are fail, or if their outcomes do not
agree, then Alice and Bob decide on the measurement outcome fail. Otherwise, if they agree on
the outcome a ∈ Σ, then Alice and Bob decide on outcome a. We see that the measurement they
implement is described by

Q_a = αβ A_a ⊗ B_a

for a ∈ Σ and

Q_fail = 1_{XA⊗XB} − ∑_{a∈Σ} Q_a.

For every ρ ∈ D(X_A ⊗ X_B) we then have ⟨Q_a, ρ⟩ = αβ⟨P_a, ρ⟩ and 1 − ⟨Q_fail, ρ⟩ = αβ, from which the required equality follows.
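The simulation can be illustrated concretely. The sketch below uses a small hypothetical product measurement on two qubits (our own choice of {A_a}, {B_a}, α, and β, not from the notes) and checks that the conditional output statistics match {P_a}.

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical product measurement {P_a = A_a (x) B_a} on two qubits.
E00, E11, I2 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0]), np.eye(2)
A = [E00, E11, E11]
B = [I2, E00, E11]
P = [np.kron(Aa, Ba) for Aa, Ba in zip(A, B)]
assert np.allclose(sum(P), np.eye(4))               # {P_a} is a valid measurement

alpha, beta = 0.5, 0.5       # sum_a alpha*A_a <= 1 and sum_a beta*B_a <= 1
Q = [alpha * beta * Pa for Pa in P]                 # Q_a = alpha*beta*A_a (x) B_a
Qfail = np.eye(4) - sum(Q)
assert np.linalg.eigvalsh(Qfail).min() >= -1e-12    # Q_fail is positive semidefinite

# Conditioned on not failing, the LOCC measurement reproduces {P_a} exactly.
X = rng.normal(size=(4, 4)); rho = X @ X.T; rho /= np.trace(rho)
q = [np.trace(Qa @ rho) for Qa in Q]
p = [np.trace(Pa @ rho) for Pa in P]
for qa, pa in zip(q, p):
    assert abs(qa / (1 - np.trace(Qfail @ rho)) - pa) < 1e-12
print("conditional statistics match {P_a}")
```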
It is clear that certain measurements are not separable, and therefore cannot be performed by
means of local operations and classical communication. For instance, no LOCC measurement can
perfectly distinguish any fixed entangled pure state from all orthogonal states, given that one of
the required measurement operators would then necessarily be non-separable. This fact trivially
implies that Alice and Bob cannot perform a measurement with respect to any orthonormal basis
{u_a : a ∈ Σ} ⊂ X_A ⊗ X_B

that contains at least one entangled vector. Another example is given by the measurement {R_n, S_n}, where

R_n = (1/2)(1 − W_n)   and   S_n = (1/2)(1 + W_n),

for W_n denoting the swap operator (as was discussed in the previous lecture). The fact that R_n is not separable follows from the fact that it is not PPT:

(T ⊗ 1)(R_n) = −((n−1)/2) P_n + (1/2) Q_n,

where P_n and Q_n are as defined in the previous lecture.
When a measurement has non-separable measurement operators, it clearly cannot be performed using local operations and classical communication. Sometimes, however, we may be interested in a task that potentially allows for many different implementations as a measurement.
One interesting scenario along these lines that has been studied quite a bit is the task of dis-
tinguishing sets of pure states. Specifically, we suppose that X A and X B are complex Euclidean
spaces and {u1 , . . . , uk } ⊂ X A ⊗ X B is a set of orthogonal unit vectors. Alice and Bob are given a
pure state ui for i ∈ {1, . . . , k} and their goal is to determine the value of i. (It is assumed of course
that they have complete knowledge of the set {u1 , . . . , uk }.) Under the assumption that k is smaller
than the total dimension of the space X A ⊗ X B , there will be many measurements { P1 , . . . , Pk } that
correctly distinguish the elements of the set {u1 , . . . , uk }, and the general question we consider is
whether there is at least one such measurement that is LOCC.
A set of vectors of the form u_i = (1/√n) vec(U_i), for unitary operators U_1, ..., U_k ∈ U(X_B, X_A) with dim(X_A) = dim(X_B) = n, therefore represents a set of maximally entangled pure states. We will show that such a set cannot be perfectly distinguished under the assumption that k ≥ n + 1.
Consider any separable measurement { P1 , . . . , Pk } ⊂ Pos (X A ⊗ X B ). Given that this is a sepa-
rable measurement, we may write
P_i = ∑_{a∈Σ_i} Q_a ⊗ R_a

for each i, where Σ_1, ..., Σ_k are disjoint sets whose union is Σ, and where the collections {Q_a : a ∈ Σ} ⊂ Pos(X_A) and {R_a : a ∈ Σ} ⊂ Pos(X_B) satisfy

∑_{a∈Σ} Q_a ⊗ R_a = 1_{XA⊗XB}.
Now, given one of the above maximally entangled states chosen uniformly at random, the
probability that the measurement correctly determines the chosen index i is

p = (1/(nk)) ∑_{i=1}^k vec(U_i)* P_i vec(U_i)
  = (1/(nk)) ∑_{i=1}^k ∑_{a∈Σ_i} vec(U_i)* (Q_a ⊗ R_a) vec(U_i)
  = (1/(nk)) ∑_{i=1}^k ∑_{a∈Σ_i} Tr(U_i* Q_a U_i R_a^T).
For any two positive semidefinite operators X and Y we have Tr(XY) ≤ Tr(X) Tr(Y), and thus

p ≤ (1/(nk)) ∑_{i=1}^k ∑_{a∈Σ_i} Tr(U_i* Q_a U_i) Tr(R_a^T) = (1/(nk)) ∑_{i=1}^k ∑_{a∈Σ_i} Tr(Q_a) Tr(R_a) = (1/(nk)) ∑_{a∈Σ} Tr(Q_a) Tr(R_a).

Now,

n² = Tr(1_{XA⊗XB}) = ∑_{a∈Σ} Tr(Q_a ⊗ R_a) = ∑_{a∈Σ} Tr(Q_a) Tr(R_a),

and thus

p ≤ (1/(nk)) n² = n/k.
Consequently, if k > n then the measurement errs with some nonzero probability.
As any measurement implementable by an LOCC protocol is separable, it follows that no
LOCC protocol can distinguish more than n maximally entangled states in X A ⊗ X B in the case
that dim(X A ) = dim(X B ) = n.
A measurement with respect to this basis is an example of a measurement that is separable but
not LOCC, and this can also be translated into an example of an admissible and separable super-
operator that is not LOCC.
The proof that the above set is not perfectly LOCC distinguishable is very technical, and so a
reference will have to suffice in place of a proof:
C. H. Bennett, D. DiVincenzo, C. Fuchs, T. Mor, E. Rains, P. Shor, J. Smolin, and W. Wootters.
Quantum nonlocality without entanglement. Physical Review A, 59(2):1070–1091, 1999.
Another family of examples of product states (but not product bases) that cannot be distin-
guished by LOCC measurements comes from an unextendible product set. For instance, the set
discussed in the previous lecture cannot be distinguished by an LOCC measurement:
|0⟩ ⊗ (|0⟩ − |1⟩)/√2,
(|0⟩ − |1⟩)/√2 ⊗ |2⟩,
(|0⟩ + |1⟩ + |2⟩)/√3 ⊗ (|0⟩ + |1⟩ + |2⟩)/√3,
|2⟩ ⊗ (|1⟩ − |2⟩)/√2,
(|1⟩ − |2⟩)/√2 ⊗ |0⟩.
Finally, we will prove an interesting and fundamental result in this area, which is that any two
orthogonal pure states can always be distinguished by an LOCC measurement.
In order to prove this fact, we need a theorem known as the Toeplitz–Hausdorff Theorem,
which concerns the numerical range of an operator. The numerical range of an operator A ∈ L (X )
is the set N ( A) ⊆ C defined as follows:
N(A) = {u* A u : u ∈ X, ‖u‖ = 1}.

The Toeplitz–Hausdorff Theorem states that N(A) is a convex set for every choice of A ∈ L(X).
u* X u = 1,   v* X v = 0,   u* Y u = 0,   v* Y v = 0.
Without loss of generality we may also assume u∗ Yv is purely imaginary (i.e., has real part equal
to 0), for otherwise v may be replaced by eiθ v for an appropriate choice of θ without changing any
of the previously observed properties.
As u and v are linearly independent, we have that tu + (1 − t)v is a nonzero vector for every
choice of t. Thus, for each t ∈ [0, 1] we may define
tu + (1 − t)v
z( t) = ,
k tu + (1 − t)v k
which is of course a unit vector. Because u* Y u = v* Y v = 0 and u* Y v is purely imaginary, we have z(t)* Y z(t) = 0 for every t. Thus z(t)* (X + iY) z(t) = z(t)* X z(t), which varies continuously in t from v* X v = 0 to u* X u = 1, and therefore takes every value in [0, 1].
Corollary 17.4. For any complex Euclidean space X and any operator A ∈ L (X ) satisfying Tr( A) = 0,
there exists an orthonormal basis { x1 , . . . , xn } of X for which xi∗ Axi = 0 for i = 1, . . . , n.
Proof. The proof is by induction on n = dim(X ), and the base case n = 1 is trivial.
Suppose that n ≥ 2. It is clear that λ_1(A), ..., λ_n(A) ∈ N(A), and thus 0 ∈ N(A) because

0 = (1/n) Tr(A) = (1/n) ∑_{i=1}^n λ_i(A),
which is a convex combination of elements of N ( A). Therefore there exists a unit vector u ∈ X
such that u∗ Au = 0.
Now, let Y ⊆ X be the orthogonal complement of u in X , and let ΠY = 1 X − uu∗ be the
orthogonal projection onto Y . It holds that
Tr(ΠY AΠY ) = Tr( A) − u∗ Au = 0.
Moreover, because Im(ΠY AΠY ) ⊆ Y , we may regard ΠY AΠY as an element of L (Y ). By the
induction hypothesis, we therefore have that there exists an orthonormal basis {v1 , . . . , vn−1 } of Y
such that v∗i ΠY AΠY vi = 0 for i = 1, . . . , n − 1. It follows that {u, v1 , . . . , vn−1 } is an orthonormal
basis of X with the properties required by the statement of the corollary.
Now we are ready to return to the problem of distinguishing orthogonal states. Suppose that
x, y ∈ X A ⊗ X B
are orthogonal unit vectors. We wish to show that there exists an LOCC measurement that cor-
rectly distinguishes between x and y. Let X, Y ∈ L (X B , X A ) be operators satisfying x = vec( X )
and y = vec(Y ), so that the orthogonality of x and y is equivalent to Tr( X ∗ Y ) = 0. By Corol-
lary 17.4, we have that there exists an orthonormal basis {u1 , . . . , un } of X B with the property that
u∗i X ∗ Yui = 0 for i = 1, . . . , n.
Now, suppose that Bob measures his part of either xx∗ or yy∗ with respect to the orthonormal
basis {u1 , . . . , un } of X B and transmits the result of the measurement to Alice. Conditioned on Bob
obtaining the outcome i, the (unnormalized) reduced state of Alice’s system becomes
(1 X A ⊗ uTi ) vec( X ) vec( X )∗ (1 X A ⊗ ui ) = Xui u∗i X ∗
in case the original state was xx∗ , and Yui u∗i Y ∗ in case the original state was yy∗ .
A necessary and sufficient condition for Alice to be able to correctly distinguish these two
states, given knowledge of i, is that
h Xui u∗i X ∗ , Yui u∗i Y ∗ i = 0.
This condition is equivalent to u∗i X ∗ Yui = 0 for each i = 1, . . . , n. The basis {u1 , . . . , un } was cho-
sen to satisfy this condition, which implies that Alice can correctly distinguish the two possibilities
without error.
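The protocol can be illustrated with a small example of our own (not from the notes), under the simplifying assumption that X*Y is Hermitian, so that a basis meeting the condition of Corollary 17.4 is easy to write down directly. All operators here are real, so adjoints are plain transposes.

```python
import numpy as np

# Orthogonal states x = vec(X), y = vec(Y) on two qubits: here X*Y = Z/2 is
# traceless and Hermitian (a hypothetical, convenient special case).
X = np.eye(2) / np.sqrt(2)                   # x = vec(X) is a Bell state
Y = np.diag([1.0, -1.0]) / np.sqrt(2)        # y = vec(Y), orthogonal to x
assert abs(np.trace(X.T @ Y)) < 1e-12        # Tr(X*Y) = 0, i.e. <x, y> = 0

# A basis {u_1, u_2} with u_i* X* Y u_i = 0:
u = [np.array([1.0, 1.0]) / np.sqrt(2), np.array([1.0, -1.0]) / np.sqrt(2)]
for ui in u:
    assert abs(ui @ (X.T @ Y) @ ui) < 1e-12

# Alice's conditional (unnormalized) states X u u* X* and Y u u* Y* are
# orthogonal, so she finishes the discrimination with a local measurement:
# <X u u* X*, Y u u* Y*> = |u* X* Y u|^2 = 0.
for ui in u:
    assert abs(np.vdot(X @ ui, Y @ ui)) < 1e-12
print("Bob's basis renders Alice's conditional states orthogonal")
```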
Lecture 18
Super-operator norms and distinguishability
This lecture is primarily concerned with the distinguishability of quantum operations. We will first
discuss some basic aspects of this notion, and then discuss norms on super-operators that are
closely related to it. In particular, we will define a super-operator norm called the diamond norm
(or completely bounded trace norm) that plays an analogous role for super-operator distinguishability
that the ordinary trace norm plays for density operator distinguishability.
Recall from Lecture 3 that the trace norm has a close relationship to the optimal probability of
distinguishing between two quantum states. For instance, if a bit a ∈ {0, 1} is chosen uniformly at
random, and a register X is prepared in the reduced state ρa for a fixed choice of density operators
ρ0 , ρ1 ∈ D (X ), then the optimal probability to correctly guess a by means of a measurement on X
alone is
1/2 + (1/4)‖ρ_0 − ρ_1‖_1.
One may consider a similar situation involving admissible super-operators rather than density
operators. Specifically, let us suppose that Φ0 , Φ1 ∈ T (X , Y ) are admissible super-operators, and
that a bit a ∈ {0, 1} is chosen uniformly at random. A single evaluation of the operation Φa is
made available, and the goal is to determine a with maximal probability.
One approach to this problem is to choose ξ ∈ D (X ) in order to maximize the quantity
k ρ0 − ρ1 k1 for ρ0 = Φ0 (ξ ) and ρ1 = Φ1 (ξ ). If a register X is prepared so that its reduced state
is ξ, and X is input to the given operation Φa , then the output is a register Y that can be measured
using an optimal measurement to distinguish the two possible outputs ρ0 and ρ1 .
This, however, is not the most general approach. More generally, one may include an auxiliary
register Z in the process—meaning that a pair of registers (X, Z) is prepared in some reduced state
ξ ∈ D (X ⊗ Z), and the given super-operator Φa is applied to X. This results in a pair of registers
(Y, Z) that will be in either of the states ρ0 = (Φ0 ⊗ 1L(Z) )(ξ ) or ρ1 = (Φ1 ⊗ 1L(Z) )(ξ ), which may
then be distinguished by a measurement on Y ⊗ Z . Indeed, this more general approach can give
a sometimes striking improvement in the probability to distinguish Φ0 and Φ1 , as the following
example illustrates.
Example 18.1. Let X be a complex Euclidean space and let n = dim(X ). Define admissible super-
operators Φ0 , Φ1 ∈ T (X ) as follows:
Φ_0(X) = (1/(n+1)) ((Tr X) 1_X + X^T),

Φ_1(X) = (1/(n−1)) ((Tr X) 1_X − X^T).
It is clear from the definitions that both Φ0 and Φ1 are trace-preserving, while complete positivity
follows from a calculation of the Choi–Jamiołkowski representations of these super-operators:

J(Φ_0) = (1/(n+1)) (1_{X⊗X} + W) = (2/(n+1)) S,

J(Φ_1) = (1/(n−1)) (1_{X⊗X} − W) = (2/(n−1)) R,

where W ∈ U(X ⊗ X) is the swap operator and S, R ∈ L(X ⊗ X) are the projections onto the symmetric and antisymmetric subspaces of X ⊗ X, respectively.
Now, for any choice of a density operator ξ ∈ D(X) we have

Φ_0(ξ) − Φ_1(ξ) = (1/(n+1) − 1/(n−1)) 1_X + (1/(n+1) + 1/(n−1)) ξ^T
              = −(2/(n²−1)) 1_X + (2n/(n²−1)) ξ^T.

The trace norm of such an operator is maximized when ξ has rank 1, in which case the value of the trace norm is

(2n−2)/(n²−1) + (n−1) · 2/(n²−1) = 4/(n+1).
Consequently

‖Φ_0(ξ) − Φ_1(ξ)‖_1 ≤ 4/(n+1)
for all ξ ∈ D (X ). For large n, this quantity is small, which is not surprising because both Φ0 (ξ )
and Φ1 (ξ ) are almost completely mixed for any choice of ξ ∈ D (X ).
However, suppose that we prepare two registers (X1 , X2 ) in the maximally entangled state
τ = (1/n) vec(1_X) vec(1_X)* ∈ D(X ⊗ X).
We have
(Φ_a ⊗ 1_L(X))(τ) = (1/n) J(Φ_a)
for a ∈ {0, 1}, and therefore
(Φ_0 ⊗ 1_L(X))(τ) = (2/(n(n+1))) S   and   (Φ_1 ⊗ 1_L(X))(τ) = (2/(n(n−1))) R.

Because R and S are orthogonal, we have

‖(Φ_0 ⊗ 1_L(X))(τ) − (Φ_1 ⊗ 1_L(X))(τ)‖_1 = 2,
meaning that the states (Φ0 ⊗ 1L(X ) )(τ ) and (Φ1 ⊗ 1L(X ) )(τ ), and therefore the super-operators
Φ0 and Φ1 , can be distinguished perfectly.
By applying Φ0 or Φ1 to part of a larger system, we have therefore completely eliminated the
large error that was present when the limited approach of choosing ξ ∈ D (X ) as an input to Φ0
and Φ1 was considered.
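Example 18.1 is easy to reproduce numerically. The following sketch (all helper names are ours) computes both trace norms for n = 4: the best product-input value 4/(n+1), and the value 2 attained with a maximally entangled input.

```python
import numpy as np

n = 4

def phi(X, sign):
    # Phi_0 (sign = +1) and Phi_1 (sign = -1) from Example 18.1.
    return (np.trace(X) * np.eye(n) + sign * X.T) / (n + sign)

def phi_tensor_id(rho, sign):
    # Apply Phi_sign (x) 1 to an operator on X (x) X, slice by slice.
    R = rho.reshape(n, n, n, n)
    out = np.zeros_like(R)
    for b in range(n):
        for d in range(n):
            out[:, b, :, d] = phi(R[:, b, :, d], sign)
    return out.reshape(n * n, n * n)

def trace_norm(A):
    return np.abs(np.linalg.eigvalsh(A)).sum()   # valid since A is Hermitian here

# Best rank-one input without an auxiliary register: trace norm 4/(n+1).
xi = np.zeros((n, n)); xi[0, 0] = 1.0
print(trace_norm(phi(xi, +1) - phi(xi, -1)))     # 4/(n+1) = 0.8 for n = 4

# Maximally entangled input: orthogonal output supports, trace norm 2.
v = np.eye(n).reshape(n * n) / np.sqrt(n)
tau = np.outer(v, v)
print(trace_norm(phi_tensor_id(tau, +1) - phi_tensor_id(tau, -1)))
```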
The previous example makes clear that auxiliary systems must be taken into account if we
are to understand the optimal probability with which admissible super-operators can be distin-
guished.
With the discussion from the previous section in mind, we will now discuss two super-operator
norms: the super-operator trace norm and the diamond norm. The precise relationship these norms
have to the notion of super-operator distinguishability will be made clear later in the section.
The super-operator trace norm of a super-operator Φ ∈ T(X, Y) is defined as

‖Φ‖_1 = max{‖Φ(X)‖_1 : X ∈ L(X), ‖X‖_1 ≤ 1}.

This norm is submultiplicative with respect to composition:

‖ΦΨ‖_1 ≤ ‖Φ‖_1 ‖Ψ‖_1.    (18.1)
This is a general property of every induced super-operator norm, provided that the same norm on
L (Y ) is taken for both super-operator norms.
Second, the super-operator trace norm can be expressed as follows for a given super-operator
Φ ∈ T (X , Y ):
‖Φ‖_1 = max{‖Φ(uv*)‖_1 : u, v ∈ S(X)}.    (18.2)

This is because the trace norm (like every other norm) is a convex function, and the unit ball with respect to the trace norm can be represented as the convex hull of the set {uv* : u, v ∈ S(X)}.
The diamond norm of a super-operator Φ ∈ T(X, Y) is defined as

‖Φ‖_⋄ = ‖Φ ⊗ 1_L(X)‖_1.

This norm is also called the completely bounded trace norm, as well as some other variants of this
name. The principle behind its definition is that tensoring Φ with the identity super-operator
has the effect of stabilizing its super-operator trace norm. This sort of stabilization endows the
diamond norm with many nice properties, and allows it to be used in contexts where the super-
operator trace norm is not sufficient. It is also clear that this sort of stabilization is related to the
phenomenon illustrated in the previous section, where tensoring with the identity super-operator
had the effect of amplifying the difference between admissible super-operators.
The first thing we must do is to explain why the definition of the diamond norm tensors Φ with
the identity super-operator on X , rather than some other space. The answer is that X has suffi-
ciently large dimension—and tensoring with any complex Euclidean space with larger dimension
would not change anything. The following lemma allows us to prove this fact, and includes a
special case that will be useful later in the lecture.
Lemma 18.2. Let Φ ∈ T (X , Y ), and let Z be a complex Euclidean space. For every choice of unit vectors
u, v ∈ X ⊗ Z there exist unit vectors x, y ∈ X ⊗ X such that
‖(Φ ⊗ 1_L(Z))(uv*)‖_1 = ‖(Φ ⊗ 1_L(X))(xy*)‖_1.
Proof. The lemma is straightforward when dim(Z) ≤ dim(X ); for any choice of a linear isometry
U ∈ U (Z , X ) the vectors x = (1 X ⊗ U )u and y = (1 X ⊗ U )v satisfy the required conditions. Let
us therefore consider the case where dim(Z) > dim(X ) = n.
Consider the vector u ∈ X ⊗ Z . Given that dim(X ) ≤ dim(Z), there must exist an orthogonal
projection Π ∈ L (Z) having rank at most n = dim(X ) that satisfies u = (1 X ⊗ Π)u and therefore
there must exist a linear isometry U ∈ U (X , Z) such that u = (1 X ⊗ UU ∗ )u. Likewise there must
exist a linear isometry V ∈ U (X , Z) such that v = (1 X ⊗ VV ∗ )v. Such linear isometries U and V
can be obtained from the singular value decompositions of u and v.
Let x = (1 ⊗ U*)u and y = (1 ⊗ V*)v. Notice that we therefore have u = (1 ⊗ U)x and v = (1 ⊗ V)y, which shows that x and y are unit vectors. Moreover, given that ‖U‖ = ‖V‖ = 1, we have
‖(Φ ⊗ 1_{L(Z)})(uv*)‖₁ = ‖(Φ ⊗ 1_{L(Z)})((1 ⊗ U)xy*(1 ⊗ V*))‖₁
= ‖(1 ⊗ U)(Φ ⊗ 1_{L(X)})(xy*)(1 ⊗ V*)‖₁
= ‖(Φ ⊗ 1_{L(X)})(xy*)‖₁,
as required.
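The construction in this proof is easy to test numerically. The NumPy sketch below is our own illustration (the completely positive map Φ, its Kraus operators, and all helper names are illustrative choices, not objects from the text); the isometries U and V are read off the singular value decompositions of u and v exactly as in the proof:

```python
import numpy as np

rng = np.random.default_rng(7)
n, dy, dz = 2, 2, 4      # dim(X), dim(Y), dim(Z), with dz > n

# A completely positive map Phi(X) = sum_s K_s X K_s^* from random Kraus operators.
kraus = [rng.normal(size=(dy, n)) + 1j * rng.normal(size=(dy, n)) for _ in range(3)]

def on_second(op, vec, d_first):
    """Apply 1 ⊗ op to a vector on C^{d_first} ⊗ C^{d_in}."""
    return (vec.reshape(d_first, -1) @ op.T).reshape(-1)

def phi_tensor_id(X, d_aux):
    """(Phi ⊗ 1_{L(C^{d_aux})})(X)."""
    return sum(np.kron(k, np.eye(d_aux)) @ X @ np.kron(k, np.eye(d_aux)).conj().T
               for k in kraus)

def trace_norm(X):
    return np.linalg.svd(X, compute_uv=False).sum()

def unit(dim):
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

u, v = unit(n * dz), unit(n * dz)

def shrink(w):
    """Isometry U in U(X, Z) with w = (1 ⊗ U U^*) w, read off the SVD of w."""
    _, _, vh = np.linalg.svd(w.reshape(n, dz), full_matrices=False)
    return vh.T        # dz-by-n matrix whose columns span supp(Tr_X(w w^*))

U, V = shrink(u), shrink(v)
x = on_second(U.conj().T, u, n)     # x = (1 ⊗ U^*) u, a unit vector in X ⊗ X
y = on_second(V.conj().T, v, n)

lhs = trace_norm(phi_tensor_id(np.outer(u, v.conj()), dz))
rhs = trace_norm(phi_tensor_id(np.outer(x, y.conj()), n))
print(np.isclose(lhs, rhs))  # True, as Lemma 18.2 predicts
```

The two trace norms agree because the map X ↦ (1 ⊗ U)X(1 ⊗ V)* preserves singular values when U and V are isometries.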
The following theorem, which gives a precise statement of the fact we wish to prove, is immediate from Lemma 18.2 together with (18.2).

Theorem 18.3. Let X and Y be complex Euclidean spaces, Φ ∈ T(X, Y) be any super-operator, and let Z be any complex Euclidean space for which dim(Z) ≥ dim(X). Then
‖Φ ⊗ 1_{L(Z)}‖₁ = ‖Φ‖⋄.
One simple but important consequence of this theorem is that the diamond norm is multiplicative with respect to tensor products:
‖Φ1 ⊗ Φ2‖⋄ = ‖Φ1‖⋄ ‖Φ2‖⋄.
Proof. To make the proof more clear, let us take W1 and W2 to be complex Euclidean spaces with dim(W1) = dim(X1) and dim(W2) = dim(X2), so that
‖Φ1‖⋄ = ‖Φ1 ⊗ 1_{L(W1)}‖₁,
‖Φ2‖⋄ = ‖Φ2 ⊗ 1_{L(W2)}‖₁,
‖Φ1 ⊗ Φ2‖⋄ = ‖Φ1 ⊗ Φ2 ⊗ 1_{L(W1⊗W2)}‖₁.
We have
‖Φ1 ⊗ Φ2‖⋄ = ‖Φ1 ⊗ Φ2 ⊗ 1_{L(W1⊗W2)}‖₁
= ‖(Φ1 ⊗ 1_{L(Y2)} ⊗ 1_{L(W1⊗W2)})(1_{L(X1)} ⊗ Φ2 ⊗ 1_{L(W1⊗W2)})‖₁
≤ ‖Φ1 ⊗ 1_{L(Y2)} ⊗ 1_{L(W1⊗W2)}‖₁ ‖1_{L(X1)} ⊗ Φ2 ⊗ 1_{L(W1⊗W2)}‖₁
= ‖Φ1‖⋄ ‖Φ2‖⋄.
For the reverse inequality, it suffices to restrict the maximization defining ‖Φ1 ⊗ Φ2 ⊗ 1_{L(W1⊗W2)}‖₁ to tensor products of vectors achieving ‖Φ1 ⊗ 1_{L(W1)}‖₁ and ‖Φ2 ⊗ 1_{L(W2)}‖₁, using the fact that the trace norm is multiplicative with respect to tensor products, as required.
The final fact that we will establish about the diamond norm in this lecture concerns input operators for which the value of the diamond norm is achieved. In particular, we will establish a simple condition on super-operators Φ ∈ T(X, Y) under which there exists a unit vector u ∈ X ⊗ X for which
‖Φ‖⋄ = ‖(Φ ⊗ 1_{L(X)})(uu*)‖₁.
This fact will give us the final piece we need to connect the diamond norm to the distinguishability
problem discussed in the beginning of the lecture.
Theorem 18.5. Suppose that Φ ∈ T(X, Y) satisfies Φ(X*) = (Φ(X))* for every X ∈ L(X). Then
‖Φ‖⋄ = max{‖(Φ ⊗ 1_{L(X)})(xx*)‖₁ : x ∈ S(X ⊗ X)}.
Remark 18.6. The following conditions are easily shown to be equivalent for a given super-
operator Φ ∈ T (X , Y ):
‖(Φ ⊗ 1_{L(X)})(X)‖₁ = ‖Φ‖⋄.
Now, because Y is Hermitian, we may consider a spectral decomposition
Y = ∑_j λ_j u_j u_j*.
Theorem 18.7. Let X and Y be complex Euclidean spaces, let U, V ∈ U(X, Y) be linear isometries, and suppose
Φ0(X) = UXU*  and  Φ1(X) = VXV*
for all X ∈ L(X). Then there exists a unit vector u ∈ X such that
‖(Φ0 − Φ1)(uu*)‖₁ = ‖Φ0 − Φ1‖⋄.
The proof makes use of the numerical range
N(A) = {u*Au : u ∈ S(X)}
of an operator A ∈ L(X), along with the quantity
ν(A) = min{|α| : α ∈ N(A)}.
The key fact is that
ν(A ⊗ 1_X) = ν(A),
which holds because every unit vector u ∈ X ⊗ X gives, via its Schmidt decomposition,
u*(A ⊗ 1_X)u = ∑_{j=1}^{r} p_j x_j* A x_j ∈ N(A)
by the convexity of the numerical range, and therefore
|u*(A ⊗ 1_X)u| ≥ ν(A).
Lecture 19
Alternate characterizations of the diamond norm
In the previous lecture we discussed the diamond norm, its connection to the problem of distin-
guishing admissible super-operators, and some of its basic properties. Recall that for X and Y
complex Euclidean spaces and Φ ∈ T (X , Y ) a super-operator, we have
‖Φ‖⋄ = max{‖(Φ ⊗ 1_{L(X)})(uv*)‖₁ : u, v ∈ S(X ⊗ X)}.
In this lecture we will discuss two alternate ways in which this norm may be characterized.
Suppose X and Y are complex Euclidean spaces and Φ, Ψ ∈ T (X , Y ) are completely positive (but
not necessarily trace-preserving) super-operators. Let us define the maximum output fidelity of Φ
and Ψ as
Fmax (Φ, Ψ) = max {F(Φ(ρ), Ψ(ξ )) : ρ, ξ ∈ D (X )} .
In other words, this is the maximum fidelity between an output of Φ and an output of Ψ, ranging
over all pairs of density operator inputs. Our first characterization of the diamond norm is based
on the maximum output fidelity, and is given by the following theorem.
Theorem 19.1. Let X and Y be complex Euclidean spaces and let Φ ∈ T(X, Y) be an arbitrary super-operator. Suppose further that Z is a complex Euclidean space and A, B ∈ L(X, Y ⊗ Z) satisfy
Φ(X) = TrZ(AXB*)
for all X ∈ L(X), and define completely positive super-operators ΨA, ΨB ∈ T(X, Z) as
ΨA(X) = TrY(AXA*),
ΨB(X) = TrY(BXB*)
for all X ∈ L(X). Then ‖Φ‖⋄ = Fmax(ΨA, ΨB).
Remark 19.2. Note that it is the space Y that is traced-out in the definition of Ψ A and ΨB , rather
than the space Z .
To prove this theorem, we will begin with the following lemma that establishes a simple rela-
tionship between the fidelity and the trace norm. (This appeared as a problem on problem set 1.)
Lemma 19.3. Let X and Y be complex Euclidean spaces and let P, Q ∈ Pos (X ). Assume that u, v ∈ X ⊗
Y are arbitrary purifications of P and Q, respectively, meaning that TrY (uu∗ ) = P and TrY (vv∗ ) = Q.
Then
F(P, Q) = ‖TrX(uv*)‖₁.
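Lemma 19.3 is easy to test numerically. The NumPy sketch below is our own (helper names are illustrative) and uses the standard purification u = vec(√P):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3

def rand_density(dim):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def sqrtm_psd(P):
    w, V = np.linalg.eigh(P)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def fidelity(P, Q):
    # F(P, Q) = || sqrt(P) sqrt(Q) ||_1
    return np.linalg.svd(sqrtm_psd(P) @ sqrtm_psd(Q), compute_uv=False).sum()

P, Q = rand_density(d), rand_density(d)

# Purifications in X ⊗ Y with dim(Y) = d: u = vec(sqrt(P)), v = vec(sqrt(Q)).
Mu, Mv = sqrtm_psd(P), sqrtm_psd(Q)
assert np.allclose(Mu @ Mu.conj().T, P)          # Tr_Y(u u^*) = P

# Lemma 19.3:  F(P, Q) = || Tr_X(u v^*) ||_1.
TrX_uv = np.einsum('ij,ik->jk', Mu, Mv.conj())
print(np.isclose(fidelity(P, Q),
                 np.linalg.svd(TrX_uv, compute_uv=False).sum()))  # True
```

The lemma holds for any choice of purifications, since two purifications differ only by a unitary on Y, which the trace norm absorbs.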
For every operator Y ∈ L(Y) it holds that
‖Y‖₁ = max{|⟨U, Y⟩| : U ∈ U(Y)},
and therefore
‖TrX(uv*)‖₁ = max{|⟨1_X ⊗ U, uv*⟩| : U ∈ U(Y)} = max{|⟨v, (1_X ⊗ U*)u⟩| : U ∈ U(Y)}.
As U ranges over all possible unitary operators on Y, the vector (1_X ⊗ U*)u ranges over all purifications of P in X ⊗ Y. The above quantity is therefore equal to F(P, Q) by Uhlmann's Theorem.
Proof of Theorem 19.1. For clarity, let us take W to be a complex Euclidean space with the same dimension as X, so that
‖Φ‖⋄ = max{‖(Φ ⊗ 1_{L(W)})(uv*)‖₁ : u, v ∈ S(X ⊗ W)}
= max{‖TrZ[(A ⊗ 1_W)uv*(B ⊗ 1_W)*]‖₁ : u, v ∈ S(X ⊗ W)}.
The vectors (A ⊗ 1_W)u and (B ⊗ 1_W)v purify ΨA(TrW(uu*)) and ΨB(TrW(vv*)), respectively, and every pair ρ, ξ ∈ D(X) arises as ρ = TrW(uu*) and ξ = TrW(vv*) for some choice of u, v ∈ S(X ⊗ W). Consequently, by Lemma 19.3,
‖Φ‖⋄ = max{F(ΨA(ρ), ΨB(ξ)) : ρ, ξ ∈ D(X)} = Fmax(ΨA, ΨB),
as required.
The following corollary follows immediately from this characterization along with the fact that the diamond norm is multiplicative with respect to tensor products.

Corollary 19.4. Let Φ1, Ψ1 ∈ T(X1, Y1) and Φ2, Ψ2 ∈ T(X2, Y2) be completely positive super-operators. Then
Fmax(Φ1 ⊗ Φ2, Ψ1 ⊗ Ψ2) = Fmax(Φ1, Ψ1) Fmax(Φ2, Ψ2).

This is a simple but not obvious fact: it says that the maximum fidelity between the outputs of any two completely positive product super-operators is achieved for product state inputs. Several other quantities of interest based on quantum operations fail to respect tensor products in this way.
The second characterization of the diamond norm that we will consider will be more difficult
to establish than the first. Consider any super-operator Φ ∈ T (X , Y ) for complex Euclidean
spaces X and Y . For a given choice of a complex Euclidean space Z , we have that there exists a
Stinespring representation
Φ( X ) = TrZ ( AXB∗ )
for Φ if and only if dim(Z) ≥ rank( J (Φ)). Under the assumption that dim(Z) ≥ rank( J (Φ)), we
may therefore consider the non-empty set of pairs ( A, B) that represent Φ in this way:
SΦ = {(A, B) : A, B ∈ L(X, Y ⊗ Z), Φ(X) = TrZ(AXB*) for all X ∈ L(X)}.
The characterization of the diamond norm that is established in this section concerns the spectral
norm of the operators in this set, and is given by the following theorem.
Theorem 19.5. Let X and Y be complex Euclidean spaces, let Φ ∈ T (X , Y ), and let Z be a complex
Euclidean space with dimension at least rank( J (Φ)). Then for SΦ as defined above we have
‖Φ‖⋄ = inf{‖A‖ ‖B‖ : (A, B) ∈ SΦ}.
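For channels the infimum in this theorem is attained by a Stinespring isometry: if A is an isometry then ‖A‖ = 1, matching the standard fact that ‖Φ‖⋄ = 1 for a trace-preserving completely positive Φ. A NumPy sketch of our own construction:

```python
import numpy as np

rng = np.random.default_rng(3)
n, dy, dz = 2, 2, 3      # dim(X), dim(Y), dim(Z)

# A random channel via a Stinespring isometry A in L(X, Y ⊗ Z):
# Phi(X) = Tr_Z(A X A^*) is completely positive and trace-preserving.
g = rng.normal(size=(dy * dz, n)) + 1j * rng.normal(size=(dy * dz, n))
A, _ = np.linalg.qr(g)               # reduced QR: A^* A = 1_X

def phi(X):
    Y = A @ X @ A.conj().T                                       # operator on Y ⊗ Z
    return np.trace(Y.reshape(dy, dz, dy, dz), axis1=1, axis2=3)  # Tr_Z

rho = np.eye(n) / n
assert np.isclose(np.trace(phi(rho)).real, 1.0)   # trace is preserved

# With the representation (A, B) = (A, A), the easy inequality of Theorem 19.5
# reads ||Phi||_diamond <= ||A||^2 = 1; equality holds because Phi is a channel.
spec_norm = np.linalg.svd(A, compute_uv=False).max()
print(np.isclose(spec_norm, 1.0))  # True
```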
It will require some work to prove this theorem, but in the process we will learn something
interesting about the maximum fidelity between compact, convex sets of positive semidefinite
operators. This fact generalizes the characterization of the fidelity function given by Alberti’s
theorem, which we discussed in Lecture 4.
Theorem 19.6. Let X be a complex Euclidean space and let A, B ⊆ Pos(X) be nonempty, compact, and convex sets of positive semidefinite operators. Then
inf_{R ∈ Pd(X)} max_{P∈A, Q∈B} ⟨R, P⟩⟨R⁻¹, Q⟩ = max_{P∈A, Q∈B} F(P, Q)².

The interesting aspect of this theorem is the ordering of the infimum and maximum, for if they were reversed, the theorem would follow trivially from Alberti's theorem. The assumption that allows for this reversal is, of course, the convexity of A and B.
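Alberti's theorem itself (from Lecture 4, restated here as an assumption) says F(P, Q)² = inf over R > 0 of ⟨R, P⟩⟨R⁻¹, Q⟩, so every positive definite R gives a value at least F(P, Q)². A NumPy spot check of our own:

```python
import numpy as np

rng = np.random.default_rng(5)
d = 3

def rand_pd(dim):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return g @ g.conj().T + 1e-6 * np.eye(dim)

def sqrtm_psd(P):
    w, V = np.linalg.eigh(P)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def fidelity(P, Q):
    return np.linalg.svd(sqrtm_psd(P) @ sqrtm_psd(Q), compute_uv=False).sum()

P = rand_pd(d); P /= np.trace(P).real
Q = rand_pd(d); Q /= np.trace(Q).real
f2 = fidelity(P, Q) ** 2

# Alberti: F(P,Q)^2 = inf over R > 0 of <R,P><R^{-1},Q>,
# so every positive definite R certifies a value at least F(P,Q)^2.
for _ in range(200):
    R = rand_pd(d)
    val = (np.trace(R @ P) * np.trace(np.linalg.inv(R) @ Q)).real
    assert val >= f2 - 1e-9
print("every sampled R respects the bound")
```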
To prove Theorem 19.6, we will start with the following lemma. This lemma concerns only
convex sets of density operators, rather than arbitrary positive semidefinite operators, but this
will be sufficient for our needs (and makes for a simpler technical statement).
Lemma 19.7. Let A, B ⊆ D(X) be closed and convex sets of density operators. Then A and B are disjoint if and only if there exists a Hermitian operator H ∈ Herm(X) and a positive real number δ > 0 such that
⟨e^{δH}, ρ⟩⟨e^{−δH}, ξ⟩ < 1
for all ρ ∈ A and ξ ∈ B.

Proof. For every density operator σ ∈ D(X), the Cauchy–Schwarz inequality implies that
⟨e^{tH}, σ⟩⟨e^{−tH}, σ⟩ ≥ 1
for all choices of t ∈ R and H ∈ Herm(X). It is therefore clear that if H ∈ Herm(X) and δ > 0 are as in the statement of the lemma, then A and B must be disjoint.

Suppose now that A and B are disjoint. As they are compact and convex, there exists a separating hyperplane: a Hermitian operator H ∈ Herm(X) with ‖H‖ ≤ 1 and a positive real number α > 0 such that
⟨H, ξ − ρ⟩ > α
for all ρ ∈ A and ξ ∈ B. Let us fix such a choice of H and α for the rest of the proof.
Now consider the Taylor series expansion of the exponential function. As ‖H‖ ≤ 1, we have that
‖e^{−tH} − (1 − tH)‖ ≤ t²,
‖e^{tH} − (1 + tH)‖ ≤ t²,
‖e^{−tH}‖ ≤ 1 + 2t, and
‖e^{tH}‖ ≤ 1 + 2t
for t ∈ [0, 1]. (These bounds are not tight, but are simple to state and are sufficient for our needs.) Using the fact that A and B are sets of density operators, we may therefore conclude that for all t ∈ [0, 1] we have
⟨e^{tH}, ρ⟩⟨e^{−tH}, ξ⟩ ≤ 1 − t⟨H, ξ − ρ⟩ + ct²
for some constant c > 0. (We may choose c = 7, for instance.) For a sufficiently small choice of δ > 0, taking t = δ therefore gives
⟨e^{δH}, ρ⟩⟨e^{−δH}, ξ⟩ ≤ 1 − αδ + cδ² < 1
for all ρ ∈ A and ξ ∈ B, which completes the proof.
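The operator-norm Taylor bounds used in this proof are easy to confirm numerically. A NumPy sketch of our own (the matrix exponential is computed from an eigendecomposition, which is valid because H is Hermitian):

```python
import numpy as np

rng = np.random.default_rng(11)
d = 4

# Random Hermitian H normalized so that ||H|| <= 1.
g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = (g + g.conj().T) / 2
H = H / np.abs(np.linalg.eigvalsh(H)).max()
w, V = np.linalg.eigh(H)

def expm_h(t):
    # exp(tH) via the eigendecomposition of the Hermitian operator H
    return (V * np.exp(t * w)) @ V.conj().T

def spec(X):
    return np.linalg.svd(X, compute_uv=False).max()

I = np.eye(d)
for t in np.linspace(0.0, 1.0, 21):
    assert spec(expm_h(-t) - (I - t * H)) <= t ** 2 + 1e-12
    assert spec(expm_h(t) - (I + t * H)) <= t ** 2 + 1e-12
    assert spec(expm_h(-t)) <= 1 + 2 * t + 1e-12
    assert spec(expm_h(t)) <= 1 + 2 * t + 1e-12
print("all four bounds hold for t in [0, 1]")
```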
Proof of Theorem 19.6. We will prove the equality by proving inequalities in both directions. One
direction is easy, because the infimum of a maximum is at least the maximum of the infimum, i.e.,
Later we will take the limit as ε goes to 0, but for now we will view ε as being fixed. The reason
for adding ε1 to both P and Q is that it guarantees that the infimum in the expression is achieved
for some R.
To see this, it suffices to argue that the infimum may without loss of generality be restricted to
a compact set. In particular, consider the compact set K defined as
K = { R ∈ Pd (X ) : 1 X /K ≤ R ≤ K 1 X } ,
for some choice of a positive real number K to be chosen shortly. Now, there is clearly no change in
the infimum if we restrict it to operators R ∈ Pd (X ) that satisfy Tr( R) = Tr( R−1 ), given that the
expression (19.2) is invariant under positive scaling of R. There is also no change in the infimum
if we restrict R further to satisfy
Tr(R) Tr(R⁻¹) ≤ (1/ε²) max_{P∈A, Q∈B} Tr(P + ε1) Tr(Q + ε1),
for any R that does not satisfy this bound will surely not produce a value for expression (19.1) that is smaller than the choice R = 1. Any R that satisfies these properties is certainly contained in K for the choice
K = (2/ε²) max_{P∈A, Q∈B} Tr(P + ε1) Tr(Q + ε1),
and so we have
Given that
max_{P∈A, Q∈B} ⟨R, P + ε1⟩⟨R⁻¹, Q + ε1⟩   (19.2)
is a continuous function of R, we have that the infimum is achieved for some choice of R ∈ K. Let us fix such a choice of R that achieves this infimum, and let us again rescale R so that
We have that C and D are compact and convex, given that the same is true of A and B . We also
have that C and D are non-empty given that we have scaled R in accordance with the equation
(19.3) above. Finally, we have by Lemma 19.7 that C and D have a nonempty intersection. If this
were not the case, then there would exist an operator S = R^{1/2} e^{tH} R^{1/2} > 0 for which
⟨S, P + ε1⟩⟨S⁻¹, Q + ε1⟩ < α²
for all P ∈ A and Q ∈ B, contradicting the fact that the minimum of the above expression (19.1) is achieved by R and equals α².
Now, because C and D have a nonempty intersection, we may conclude that there exists P ∈ A
and Q ∈ B such that
Z = R1/2 ( P + ε1 ) R1/2 = R−1/2 ( Q + ε1 ) R−1/2
as required.
‖Φ‖⋄ ≤ inf{‖A‖ ‖B‖ : (A, B) ∈ SΦ}.
Next, consider the two quantities ‖(1_Y ⊗ Z)A‖ and ‖(1_Y ⊗ Z⁻¹)B‖. The spectral norm has the property that ‖X‖² = ‖X*X‖ for every operator X. Using this property we see that
‖(1_Y ⊗ Z)A‖² = ‖A*(1_Y ⊗ Z*Z)A‖
= max{⟨ρ, A*(1_Y ⊗ Z*Z)A⟩ : ρ ∈ D(X)}
= max{⟨Z*Z, TrY(AρA*)⟩ : ρ ∈ D(X)},
and likewise
‖(1_Y ⊗ Z⁻¹)B‖² = max{⟨(Z*Z)⁻¹, TrY(BξB*)⟩ : ξ ∈ D(X)}.
We now define two completely positive super-operators Ψ A , ΨB ∈ T (X , Z) precisely as we did
for the maximum fidelity characterization:
Ψ A ( X ) = TrY ( AX A∗ ) ,
ΨB(X) = TrY(BXB*).
Given that the sets
A = {ΨA(ρ) : ρ ∈ D(X)},
B = {ΨB(ξ) : ξ ∈ D(X)}
are nonempty, compact, and convex, we may apply Theorem 19.6 to obtain
Lecture 20
Semidefinite programming
This lecture is concerned with semidefinite programming and a few (among many) of its applications
to quantum information. We will begin with some basic definitions, and then move on to a discus-
sion of the duality theory of semidefinite programs—which is a powerful tool for reasoning about
them. Finally, two applications are discussed: one concerns optimal measurements for identifying
elements of finite sets of quantum states and the other concerns the diamond norm.
20.1 Definitions
Let X and Y be complex Euclidean spaces. Recall that the following conditions are equivalent for
a given super-operator Φ ∈ T (X , Y ):
1. It holds that Φ( X ) ∈ Herm (Y ) for all Hermitian operators X ∈ Herm (X ).
2. It holds that (Φ( X ))∗ = Φ( X ∗ ) for every X ∈ L (X ).
3. It holds that J (Φ) ∈ Herm (Y ⊗ X ).
4. It holds that Φ = Φ0 − Φ1 for completely positive super-operators Φ0 , Φ1 ∈ T (X , Y ).
We will hereafter refer to super-operators that satisfy these conditions as Hermiticity preserving
super-operators.
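Conditions 1 and 3 above can be exercised numerically: build Φ from a randomly chosen Hermitian Choi operator and confirm that it maps Hermitian inputs to Hermitian outputs. A NumPy sketch of our own, using the standard recovery formula Φ(X) = TrX[J(Φ)(1_Y ⊗ Xᵀ)] (an identity assumed from earlier lectures, not derived here):

```python
import numpy as np

rng = np.random.default_rng(2)
dx, dy = 2, 3

# Condition 3: a Hermitian Choi operator J(Phi) in Herm(Y ⊗ X).
g = rng.normal(size=(dy * dx, dy * dx)) + 1j * rng.normal(size=(dy * dx, dy * dx))
J = (g + g.conj().T) / 2

def phi(X):
    """Recover Phi from its Choi operator: Phi(X) = Tr_X[ J (1_Y ⊗ X^T) ]."""
    op = J @ np.kron(np.eye(dy), X.T)
    return np.trace(op.reshape(dy, dx, dy, dx), axis1=1, axis2=3)

# Condition 1 follows: Phi maps Hermitian operators to Hermitian operators.
h = rng.normal(size=(dx, dx)) + 1j * rng.normal(size=(dx, dx))
X = (h + h.conj().T) / 2
print(np.allclose(phi(X), phi(X).conj().T))  # True
```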
Now, a semidefinite program over X and Y is specified by a triple (Φ, A, B) where Φ ∈ T (X , Y )
is a Hermiticity preserving super-operator and A ∈ Herm (X ) and B ∈ Herm (Y ) are Hermitian
operators. We associate with this triple the following two optimization problems:
Primal problem                         Dual problem
  minimize: ⟨A, X⟩                       maximize: ⟨B, Y⟩
  subject to: Φ(X) ≥ B,                  subject to: Φ*(Y) ≤ A,
              X ∈ Pos(X).                            Y ∈ Pos(Y).
The primal and dual problems are closely related and turn out to say a great deal about one
another, as we will see shortly.
A few additional definitions will be helpful when discussing semidefinite programs. For a
given semidefinite program (Φ, A, B) as above, we define the primal feasible set P and the dual
feasible set D as
P = {X ∈ Pos (X ) : Φ( X ) ≥ B} ,
D = {Y ∈ Pos (Y ) : Φ∗ (Y ) ≤ A} .
We also say that operators X ∈ P and Y ∈ D are primal feasible and dual feasible, respectively,
and that (Φ, A, B) is primal feasible or dual feasible when it is the case that P 6= ∅ or D 6= ∅,
respectively.
The functions X 7→ h A, X i and Y 7→ h B, Y i are typically called the primal and dual objective
functions. We also define
α = inf{h A, X i : X ∈ P },
β = sup{h B, Y i : Y ∈ D},
which are the correct solutions to the primal and dual problems, or in other words the optimal
values of the objective functions taken over the feasible sets for the two problems. (If it is the case
that P = ∅ or D = ∅, then the above definitions should be interpreted as α = ∞ and β = −∞,
respectively.) It is important at this stage to note that we cannot always replace the infimum and supremum by the minimum and maximum: in some cases, the values α and β might not actually be achieved for any choice of X ∈ P and Y ∈ D, respectively.
One interpretation of the word “duality”, when used in ordinary life, is that it refers to pairs of concepts that are directly opposite or complementary in some way: like good and evil, or matter and anti-matter. In semidefinite programming, duality refers to the special relationship between the primal and dual problems associated with a given semidefinite program that is reminiscent of this everyday interpretation.
We will start with a theorem that establishes the property of weak duality, which holds for all
semidefinite programs.
Theorem 20.1 (Weak duality for semidefinite programs). Let X and Y be complex Euclidean spaces,
let (Φ, A, B) be a semidefinite program over X and Y , and let α and β be as defined above. Then α ≥ β.
Proof. The theorem is obvious if either P = ∅ or D = ∅, so let us assume this is not the case. For
any choice of X ∈ P and Y ∈ D we have that X ≥ 0 and Φ∗ (Y ) ≤ A, and therefore
h A, X i ≥ hΦ∗ (Y ), X i .
Likewise, given that Y ≥ 0 and Φ( X ) ≥ B we have
hΦ( X ), Y i ≥ h B, Y i .
Thus
h A, X i ≥ hΦ∗ (Y ), X i = hΦ( X ), Y i ≥ h B, Y i . (20.1)
Taking the infimum over X ∈ P and the supremum over Y ∈ D establishes the theorem.
Notice that the above equation (20.1) expresses a simple but important fact, which is that every
dual feasible operator Y ∈ D provides a lower bound of h B, Y i on the value h A, X i that is achiev-
able over all choices of a primal feasible X ∈ P ; and likewise every primal feasible operator X ∈ P
provides an upper bound of h A, X i on the value h B, Y i that is achievable over all choices of a dual
feasible Y ∈ D .
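Weak duality is easy to observe on concrete data. The NumPy sketch below is a toy instance of our own: it takes Φ to be the identity map on L(C^d), samples primal and dual feasible operators, and checks ⟨A, X⟩ ≥ ⟨B, Y⟩ on every pair:

```python
import numpy as np

rng = np.random.default_rng(9)
d = 3

def rand_herm(dim):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (g + g.conj().T) / 2

def rand_psd(dim):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return g @ g.conj().T

def inner(A, B):
    return np.trace(A.conj().T @ B).real

# A toy SDP (Phi, A, B) with Phi the identity map on L(C^d):
#  primal: min <A, X>  s.t. X >= B, X >= 0;  dual: max <B, Y>  s.t. Y <= A, Y >= 0.
A = rand_psd(d) + np.eye(d)
B = rand_herm(d)

for _ in range(100):
    P = rand_psd(d)
    shift = max(0.0, -np.linalg.eigvalsh(B + P).min())
    X = B + P + shift * np.eye(d)      # primal feasible: X >= B and X >= 0
    Y = rng.uniform(0.0, 1.0) * A      # dual feasible:   0 <= Y <= A
    assert inner(A, X) >= inner(B, Y) - 1e-9   # weak duality
print("weak duality holds on every sampled feasible pair")
```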
Now, it may not always be the case that α = β for a given semidefinite program (Φ, A, B). One
can easily invent an example where P = ∅ = D , which gives α = ∞ and β = −∞; but this is
just a triviality based on our choice of how to interpret this case. But even when both P and D are
nonempty, it may still be that α 6= β. However, in most cases that come up naturally, it will be the
case that α = β; a situation called strong duality. We will now establish some specific conditions
under which this happens.
Lemma 20.2. The following two implications hold for a given semidefinite program (Φ, A, B) over com-
plex Euclidean spaces X and Y . The two implications are symmetric with respect to the primal and dual
problems associated with (Φ, A, B).
1. Suppose that α is finite (i.e., neither ∞ nor −∞) and that the set
{(⟨A, X⟩, Φ(X) − Q) : X ∈ Pos(X), Q ∈ Pos(Y)}
is closed. Then it holds that α = β, and moreover there exists a primal feasible operator X ∈ P such that ⟨A, X⟩ = α.

2. Suppose that β is finite and that the set
{(⟨B, Y⟩, Φ*(Y) + P) : Y ∈ Pos(Y), P ∈ Pos(X)}
is closed. Then it holds that α = β, and moreover there exists a dual feasible operator Y ∈ D such that ⟨B, Y⟩ = β.
Proof. The two implications are symmetric and proved in precisely the same way, so we will just prove the first implication. For convenience, let us write
C = {(⟨A, X⟩, Φ(X) − Q) : X ∈ Pos(X), Q ∈ Pos(Y)} ⊆ R ⊕ Herm(Y).
Let us choose a value ε > 0. By the definition of α, it holds that the point (α − ε, B) is not contained in the set C. Given that C is obviously convex, and is closed by assumption, we have that there exists a hyperplane that (strongly) separates (α − ε, B) from C. To be more precise, there exists a point (λ, Y) ∈ R ⊕ Herm(Y) such that
λ(α − ε) + ⟨Y, B⟩ > λ⟨A, X⟩ + ⟨Y, Φ(X) − Q⟩   (20.2)
for all X ∈ Pos(X) and Q ∈ Pos(Y).
Given that h A, X i − α + ε is positive and hY, B − Φ( X )i is non-positive for every primal feasible
X, we conclude that λ < 0. We may therefore rescale (λ, Y ) so that λ = −1 without changing
any of the observations we have made so far. After this rescaling, we may again set Q = 0 and
rearrange the inequality (20.2) to obtain
h X, Φ∗ (Y ) − Ai < hY, Bi − α + ε
for all X ∈ Pos (X ). The right-hand-side is bounded while X is free to range over all positive
semidefinite operators on the left-hand-side, which implies that Φ∗ (Y ) ≤ A. We have therefore
proved that Y is dual feasible. Setting X = 0, we conclude that hY, Bi > α − ε. Given that ε > 0
was chosen arbitrarily, we have that β ≥ α, from which it follows that α = β by weak duality.
It remains to prove that there exists a primal feasible X ∈ P such that h A, X i = α. This is
done by a similar argument to the one above, except that we consider the point (α, B) rather than
(α − ε, B). If it is the case that (α, B) ∈ C, then there is nothing more to do, for we must have that a primal feasible X ∈ P exists for which ⟨A, X⟩ = α. If, however, it is not the case that (α, B) ∈ C, then it may be separated from C by a hyperplane precisely as above. Similar reasoning establishes
Y ∈ Pos (Y ) and that λ < 0, after which we rescale to obtain a dual feasible Y ∈ D . This time,
however, Y must satisfy h B, Y i < α, which contradicts weak duality and completes the proof.
While the lemma we have just proved establishes basic conditions under which strong duality
holds for semidefinite programs, it is not very useful as a tool—because it is not obvious how
one checks that the sets described are closed for a given semidefinite program of interest. The
following theorem establishes a more useful and often easy-to-check condition (sometimes called
the Slater condition) under which strong duality holds.
Theorem 20.3. Let (Φ, A, B) be a semidefinite program over complex Euclidean spaces X and Y . Then the
following two implications, which are symmetric with respect to the primal and dual problems associated
with (Φ, A, B), hold.
1. (Strict dual feasibility.) Suppose that α is finite and that there exists an operator Y ∈ Pd (Y ) such that
Φ∗ (Y ) < A. Then α = β and there exists X ∈ P such that h A, X i = α.
2. (Strict primal feasibility.) Suppose that β is finite and that there exists an operator X ∈ Pd (X ) such
that Φ( X ) > B. Then α = β and there exists Y ∈ D such that h B, Y i = β.
Proof. The set of operators Pos(X ⊕ Y) consists of all operators that may be expressed in block form as
  [ P    Z ]
  [ Z*   Q ]                                  (20.3)
for operators P ∈ Pos(X), Q ∈ Pos(Y), and Z ∈ L(Y, X) ranging over a restricted set of operators that depends in some way on P and Q that does not concern us for the sake of this proof.
Now let us define a function φ : Herm(X ⊕ Y) → R ⊕ Herm(Y) as
  φ( [ P    Z ] ) = (⟨A, P⟩, Φ(P) − Q).
     [ Z*   Q ]
Our goal is to prove that the assumptions of the theorem guarantee that the image of the set
Pos (X ⊕ Y ) under φ is closed.
Given that φ is linear and the set D (X ⊕ Y ) of density operators on X ⊕ Y is convex and
compact, it is the case that φ(D (X ⊕ Y )) is convex and compact as well. Moreover, it holds that
φ(Pos (X ⊕ Y )) is the convex cone generated by φ(D (X ⊕ Y )). This is a closed set provided that
0 is not contained in φ(D (X ⊕ Y )), and therefore it suffices to prove 0 6∈ φ(D (X ⊕ Y )).
Now, the assumptions of the theorem imply that the operator
  [ A − Φ*(Y)   0 ]
  [ 0           Y ]   ∈ Herm(X ⊕ Y)
is positive definite. For any density operator of the above form (20.3), its inner product with this positive definite operator satisfies
  0 < ⟨A − Φ*(Y), P⟩ + ⟨Y, Q⟩ = ⟨A, P⟩ − ⟨Y, Φ(P) − Q⟩,
and therefore 0 ∉ φ(D(X ⊕ Y)), as required.
We will now consider an application of semidefinite programming duality to the problem of opti-
mally identifying sets of density operators. To be more precise, let us suppose that X is a complex
Euclidean space and
{( p( a), ρa ) : a ∈ Γ}
is an ensemble of density operators over X, meaning that Γ is a finite and nonempty set, p ∈ R^Γ is a probability vector, and ρa ∈ D(X) is a density operator for each a ∈ Γ. We are interested in
measurements of the form
{ Pa : a ∈ Γ} ⊂ Pos (X ) ,
and specifically in the probability
∑ p(a) h Pa , ρa i (20.4)
a∈Γ
that such a measurement correctly identifies a sample from the above ensemble.
The main result that we will establish is a necessary and sufficient condition on a given mea-
surement { Pa : a ∈ Γ} for it to achieve the optimal probability of correctness, among all possible
measurements with outcome set Γ.
There is no simple formula known for the maximum value of the expression (20.4), over all
possible choices of the measurement { Pa : a ∈ Γ}. It can, however, be expressed as a semidefinite
program. Specifically, let Z = C^Γ, let Y = Z ⊗ X, and consider the semidefinite program (Φ, A, B) where Φ ∈ T(X, Y), A ∈ Herm(X), and B ∈ Herm(Y) are defined as
Φ(X) = 1_Z ⊗ X   (for all X ∈ Herm(X)),
A = 1_X,
B = ∑_{a∈Γ} p(a) E_{a,a} ⊗ ρa.
The primal and dual problems associated with this semidefinite program can be expressed as
follows.
Primal problem                         Dual problem
  minimize: Tr(X)                        maximize: ⟨B, Y⟩
  subject to: 1_Z ⊗ X ≥ B,               subject to: TrZ(Y) ≤ 1_X,
              X ∈ Pos(X).                            Y ∈ Pos(Z ⊗ X).
(Notice that the constraint 1 Z ⊗ X ≥ B in the primal problem may equivalently be expressed as
X ≥ p( a)ρa for all a ∈ Γ.)
Let us begin by verifying that the dual problem indeed expresses the optimal value of the
expression (20.4), optimized over all measurements of the form { Pa : a ∈ Γ}. First, for any
measurement { Pa : a ∈ Γ} ⊂ Pos (X ), it holds that the operator
Y= ∑ Ea,a ⊗ Pa
a∈Γ
is dual feasible, and the value of h B, Y i is equal to the value expressed in the above equation (20.4).
This implies that the optimal value β of the dual problem is at least as large as the optimal value
of the expression (20.4). On the other hand, for any dual feasible operator
Y= ∑ Ea,b ⊗ Xa,b
a,b ∈Γ
it holds that
∑_{a∈Γ} X_{a,a} = TrZ(Y) ≤ 1_X.
By taking
Q = (1/|Γ|) ( 1_X − ∑_{a∈Γ} X_{a,a} )
and Pa = Xa,a + Q for each a ∈ Γ, we obtain a valid measurement { Pa : a ∈ Γ} on X for which the
expression (20.4) is at least as large as the optimal value β associated with the above dual problem.
Thus, the two quantities are equal as claimed.
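The correspondence between dual feasible operators and measurements can be checked directly. The NumPy sketch below is our own: it builds B and Y = ∑_a E_{a,a} ⊗ P_a for a two-element ensemble with an arbitrary (generally suboptimal) projective measurement:

```python
import numpy as np

rng = np.random.default_rng(4)
d, k = 2, 2      # dim(X) and |Gamma|

def rand_density(dim):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

p = [0.3, 0.7]
rhos = [rand_density(d) for _ in range(k)]

# A random projector and its complement form the measurement {P_0, P_1}.
q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
P0 = np.outer(q[:, 0], q[:, 0].conj())
meas = [P0, np.eye(d) - P0]

E = lambda a: np.diag(np.eye(k)[a])                    # E_{a,a} on Z = C^Gamma
B = sum(p[a] * np.kron(E(a), rhos[a]) for a in range(k))
Y = sum(np.kron(E(a), meas[a]) for a in range(k))

assert np.allclose(sum(meas), np.eye(d))               # Tr_Z(Y) = sum_a P_a = 1_X

# The dual objective <B, Y> equals the probability of a correct identification.
success = sum(p[a] * np.trace(meas[a] @ rhos[a]).real for a in range(k))
print(np.isclose(np.trace(B.conj().T @ Y).real, success))  # True
```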
It is straightforward to prove that strong duality holds for this semidefinite program, and that the optimal value α = β is achieved in both the primal and dual problems. In particular, the values α and β are clearly finite, and one may (for instance) take X = (1 + δ)1_X and Y = ((1 − δ)/|Γ|) 1_Z ⊗ 1_X for any choice of δ ∈ (0, 1) in Theorem 20.3 to establish this claim.
Now, based on the above discussion, we are prepared to state and prove the following theorem
that establishes a necessary and sufficient condition on a given measurement for it to optimally
identify samples from a given ensemble.
Theorem 20.4. Let X be a complex Euclidean space, let {(p(a), ρa) : a ∈ Γ} be an ensemble of density operators on X, and let {Pa : a ∈ Γ} be a measurement on X. Then the measurement {Pa : a ∈ Γ} is optimal for {(p(a), ρa) : a ∈ Γ} if and only if the operator
∑_{a∈Γ} p(a) ρa Pa
is Hermitian and satisfies
∑_{a∈Γ} p(a) ρa Pa ≥ p(b) ρb
for every b ∈ Γ.
Proof. Let us first write
X = ∑_{a∈Γ} p(a) ρa Pa,
and assume that X is positive semidefinite and satisfies X ≥ p(b)ρb for every b ∈ Γ. Then X is primal feasible for the semidefinite program discussed above. As is the case for all valid measurements, the operator
Y = ∑_{a∈Γ} E_{a,a} ⊗ Pa
is dual feasible. Given that
⟨A, X⟩ = Tr(X) = ∑_{a∈Γ} p(a) ⟨Pa, ρa⟩ = ⟨B, Y⟩,
we have that X is optimal for the primal problem and Y is optimal for the dual problem, and therefore the measurement {Pa : a ∈ Γ} is optimal for the ensemble {(p(a), ρa) : a ∈ Γ}.
Suppose on the other hand that the measurement {Pa : a ∈ Γ} is optimal for the ensemble {(p(a), ρa) : a ∈ Γ}. Then there exists a primal feasible operator X ∈ Pos(X) such that
Tr(X) = ∑_{a∈Γ} p(a) ⟨Pa, ρa⟩.
Given that Y = ∑_{a∈Γ} E_{a,a} ⊗ Pa is dual feasible with ⟨B, Y⟩ = Tr(X), equality must hold throughout (20.1), and therefore
∑_{a∈Γ} ⟨Pa, X − p(a)ρa⟩ = 0.
As each term in this sum is the inner product of two positive semidefinite operators, each term must be zero, and consequently
∑_{a∈Γ} Pa (X − p(a)ρa) = 0,
where we have used the fact that Tr(PQ) = 0 implies PQ = 0 for every choice of positive semidefinite operators P, Q ≥ 0. Thus,
X = ∑_{a∈Γ} p(a) Pa ρa,
and therefore the required conditions on this operator hold.
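For an ensemble of two states the optimal measurement is known in closed form (the Holevo–Helstrom measurement from Lecture 4, assumed here rather than re-derived), which makes the condition of Theorem 20.4 easy to verify numerically. A NumPy sketch of our own:

```python
import numpy as np

rng = np.random.default_rng(6)
d = 3

def rand_density(dim):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

p0, p1 = 0.4, 0.6
rho0, rho1 = rand_density(d), rand_density(d)

# Helstrom measurement: project onto the nonnegative/negative eigenspaces of
# D = p0*rho0 - p1*rho1.
D = p0 * rho0 - p1 * rho1
w, V = np.linalg.eigh(D)
P0 = (V[:, w >= 0]) @ (V[:, w >= 0]).conj().T
P1 = np.eye(d) - P0

# Theorem 20.4's condition: X = sum_a p(a) rho_a P_a satisfies X >= p(b) rho_b.
X = p0 * rho0 @ P0 + p1 * rho1 @ P1
assert np.allclose(X, X.conj().T)                       # X is Hermitian
for pb, rb in [(p0, rho0), (p1, rho1)]:
    assert np.linalg.eigvalsh(X - pb * rb).min() >= -1e-10

# Its success probability matches the Holevo-Helstrom value 1/2 + ||D||_1 / 2.
succ = (p0 * np.trace(P0 @ rho0) + p1 * np.trace(P1 @ rho1)).real
print(np.isclose(succ, 0.5 + 0.5 * np.abs(np.linalg.eigvalsh(D)).sum()))  # True
```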
The square of the diamond norm of an arbitrary super-operator can be expressed as the optimal
value of a semidefinite program, as we will now verify. This provides a simplification of the spec-
tral norm characterization of the diamond norm from the previous lecture, and has the pleasant
side-effect of giving a means to efficiently compute the diamond norm of a given super-operator—
because there exist efficient algorithms to approximate the optimal value of very general classes
of semidefinite programs to high precision.
Let us begin by describing the semidefinite program, starting first with its associated primal
and dual problems. After doing this we will verify that its value corresponds to the square of the
diamond norm. Throughout this discussion we assume that a Stinespring representation
Φ( X ) = TrZ ( AXB∗ )
of an arbitrary super-operator Φ ∈ T (X , Y ) has been fixed. It will be helpful to assume that
dim(Z) = rank( J (Φ)), which is the smallest dimension for which such a Stinespring representa-
tion exists, although this is not critical for most of our discussion. Let us also agree hereafter to
disregard the trivial case where Φ = 0.
It is straightforward to verify that the primal problem given above corresponds to this formal
description, while the dual problem may be verified by calculating the adjoint of Ψ, which is
      [ X   M ]     [ Tr(X)   0                ]
  Ψ*( [ N   W ] ) = [ 0       TrY(W − A X A*)  ] .
(In the above expressions of Ψ and Ψ∗ , the vectors u, v ∈ Z and the operators M, N ∈ L (X , Y ⊗ Z)
essentially act as place-holders that do not contribute to the output of these super-operators.)
which illustrates strict primal feasibility. Thus, the optimal value β associated with the dual prob-
lem is equal to the optimal primal value α, and is achieved for some choice of a dual feasible
operator.
The same reasoning can be applied to show that there exists a primal feasible operator for
which the optimal value α is achieved, but here is where we require an assumption on the dimen-
sion of Z . If it is the case that dim(Z) = rank( J (Φ)), then necessarily we have that TrY ( AX A∗ )
is positive definite for any positive definite X. Therefore, by choosing any positive definite X for
which Tr( X ) < 1, there must exist a sufficiently small ε > 0 for which
      [ X   0          ]     [ 1   0 ]
  Ψ*( [ 0   ε 1_{Y⊗Z}  ] ) < [ 0   0 ] .
This shows that we have strict dual feasibility, and therefore the optimal value is achieved for
some primal feasible operator.
Now let us verify that the optimal value associated with this semidefinite program corre-
sponds to k Φ k2⋄ . We will do this by considering the dual problem, and because our focus is on the
dual problem we will refer to the optimal value of the semidefinite program as β.
Let us define a set
A = {W ∈ Pos(Y ⊗ Z) : TrY(W) = TrY(AρA*) for some ρ ∈ D(X)}.
The optimal value of the dual problem is the maximum of ⟨BB*, W⟩ over all W ∈ A, because any feasible solution to the dual problem could be “inflated” (meaning that we add some positive semidefinite operator to it) to achieve equality in the constraints Tr(X) ≤ 1 and TrY(W) ≤ TrY(AXA*), without decreasing the value of the objective function.
Now we need to prove that
‖Φ‖²⋄ = max{⟨BB*, W⟩ : W ∈ A}.
For any choice of a complex Euclidean space W for which dim(W) ≥ dim(X), we have
‖Φ‖²⋄ = max{‖TrZ[(A ⊗ 1_W)uv*(B ⊗ 1_W)*]‖₁² : u, v ∈ S(X ⊗ W)}
= max{|v*(B ⊗ 1_W)*(U ⊗ 1_Z)(A ⊗ 1_W)u|² : u, v ∈ S(X ⊗ W), U ∈ U(Y ⊗ W)}
= max{‖(B ⊗ 1_W)*(U ⊗ 1_Z)(A ⊗ 1_W)u‖² : u ∈ S(X ⊗ W), U ∈ U(Y ⊗ W)}
= max{u*(A ⊗ 1_W)*(U ⊗ 1_Z)*(B ⊗ 1_W)(B ⊗ 1_W)*(U ⊗ 1_Z)(A ⊗ 1_W)u : u ∈ S(X ⊗ W), U ∈ U(Y ⊗ W)}.
(If this calculation looks nasty, it is only because I have taken small steps: the equalities are all applications of simple facts that we have probably seen several times previously in the course.)
It now remains to prove that
A = {TrW [(U ⊗ 1 Z )( A ⊗ 1 W )uu∗ ( A ⊗ 1 W )∗ (U ⊗ 1 Z )∗ ] : u ∈ S(X ⊗ W ) , U ∈ U (Y ⊗ W )}
for some choice of W with dim(W ) ≥ dim(X ). We will choose W such that
dim(W ) = max{dim(X ), dim(Y ⊗ Z)}.
First consider an arbitrary choice of u ∈ S(X ⊗ W ) and U ∈ U (Y ⊗ W ), and let
W = TrW [(U ⊗ 1 Z )( A ⊗ 1 W )uu∗ ( A ⊗ 1 W )∗ (U ⊗ 1 Z )∗ ] .
Then TrY (W ) = TrY ( A TrW (uu∗ ) A∗ ), and so it is clear that W ∈ A. Now consider an arbitrary
element W ∈ A, and let ρ ∈ D (X ) satisfy TrY (W ) = TrY ( AρA∗ ). Let u ∈ S(X ⊗ W ) purify ρ and
let w ∈ Y ⊗ Z ⊗ W purify W. Then
TrY ⊗W (ww∗ ) = TrY ⊗W (( A ⊗ 1 W )uu∗ ( A ⊗ 1 W )∗ ) ,
so there exists U ∈ U (Y ⊗ W ) such that (U ⊗ 1 Z )( A ⊗ 1 W )u = w, and therefore
W = TrW (ww∗ ) = TrW [(U ⊗ 1 Z )( A ⊗ 1 W )uu∗ ( A ⊗ 1 W )∗ (U ⊗ 1 Z )∗ ] .
We have therefore proved that
A = {TrW [(U ⊗ 1 Z )( A ⊗ 1 W )uu∗ ( A ⊗ 1 W )∗ (U ⊗ 1 Z )∗ ] : u ∈ S(X ⊗ W ) , U ∈ U (Y ⊗ W )} ,
and so we have that β = k Φ k2⋄ as required.
‖Φ‖⋄ = inf{‖A‖ ‖B‖ : (A, B) ∈ SΦ},   (20.5)
where SΦ denotes the collection of all pairs (A, B) that give Stinespring representations of Φ. The easy part of our proof of this characterization was establishing
‖Φ‖⋄ ≤ ‖A‖ ‖B‖
for all (A, B) ∈ SΦ, while the more difficult task of showing that the infimum equals ‖Φ‖⋄ required a much more complicated argument relating to Alberti's theorem.
Given the description of k Φ k2⋄ by the semidefinite program above, we may now simplify this
part of the proof. Notice first that the primal problem may alternately be expressed as
minimize: k A ∗ (1 Y ⊗ Z ) A k
subject to: 1 Y ⊗ Z ≥ BB∗ ,
Z ∈ Pos (Z) .
The optimal value of this problem does not change if we restrict Z to be positive definite, provided
we accept that an optimal solution may not be achieved. We therefore have
k A∗ (1 Y ⊗ Z ) A k ≤ (k Φ k⋄ + ε)2
This establishes that the infimum equals k Φ k ⋄ in the expression (20.5), which completes the char-
acterization.
Theory of Quantum Information Lecture Notes
John Watrous (Institute for Quantum Computing, University of Waterloo)
Lecture 21
The quantum de Finetti theorem
The main goal of this lecture is to prove a theorem known as the quantum de Finetti theorem. There
are in fact several known variants of this theorem, so to be more precise it may be said that we
will prove a theorem of the quantum de Finetti type. This type of theorem states, in effect, that if a
collection of identical quantum registers have a reduced state that is invariant under permutations,
then the reduced state of a comparatively small number of these registers must be close to a convex
combination of identical product states. Variants of this type of theorem may differ with respect
to the precise bounds that are obtained.
There are three main parts of the lecture. First we will introduce various concepts concerning
quantum states of multiple register systems that are invariant under permutations of these reg-
isters. Then we will very briefly discuss integrals defined with respect to the unitarily invariant
measure on the unit sphere of a given complex Euclidean space, simply because it supplies us
with a tool we need for the last part of the lecture. The last part of the lecture is the statement and
proof of the quantum de Finetti theorem.
This lecture will be somewhat different from the others in the course in that I will skip many
of the details of the proofs. This is inevitable for two reasons: one is because we have very limited
time remaining in the course, and certainly not enough time for a proper discussion of the details;
and the other is that the background knowledge required to formalize some of the details is significantly more than for the other lectures. Nevertheless, I hope there will be enough information
for you to follow up on this lecture on your own, in case you choose to do so.
Let us fix a finite, nonempty set Σ, and let d = | Σ | for the remainder of this lecture. Also let n
be a positive integer, and let X1 , . . . , Xn be identical quantum registers, with associated complex
Euclidean spaces X1 , . . . , Xn taking the form Xk = C Σ for 1 ≤ k ≤ n.
For each permutation π ∈ Sn , let Wπ ∈ U (X1 ⊗ · · · ⊗ Xn ) denote the unitary operator that permutes the tensor factors according to π. The symmetric subspace is then defined as
X1 ∨ · · · ∨ Xn = {u ∈ X1 ⊗ · · · ⊗ Xn : u = Wπ u for every π ∈ Sn } .
It is not difficult to verify that the orthogonal projection operator that projects onto this subspace
is given by
ΠX1 ∨···∨Xn = (1/n!) ∑π∈Sn Wπ .
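As a quick numerical check of this formula, here is a short NumPy sketch (my own addition, not part of the notes; the helper name `perm_operator` is invented) that builds each Wπ by permuting tensor axes and confirms that the average over Sn is a projection whose rank matches the dimension of the symmetric subspace computed below.

```python
import itertools
import numpy as np
from math import comb, factorial

def perm_operator(pi, d, n):
    """Permutation operator W_pi on (C^d)^{tensor n}: permutes the n tensor factors."""
    T = np.eye(d ** n).reshape((d,) * n + (d ** n,))
    T = np.transpose(T, tuple(pi) + (n,))
    return T.reshape(d ** n, d ** n)

d, n = 2, 3
perms = list(itertools.permutations(range(n)))

# The projection onto the symmetric subspace is the average of all W_pi.
Pi = sum(perm_operator(p, d, n) for p in perms) / factorial(n)

assert np.allclose(Pi @ Pi, Pi)     # Pi is idempotent, hence a projection
assert np.allclose(Pi, Pi.T)        # ...and Hermitian (real symmetric here)
assert np.isclose(np.trace(Pi), comb(n + d - 1, d - 1))  # rank = C(n+d-1, d-1)
```

Averaging over a group of unitaries always yields the projection onto the subspace of fixed vectors, which is why the idempotence check succeeds for any d and n.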
Let us now construct an orthonormal basis for the symmetric subspace. First, consider the set
Urn(n, Σ) of functions of the form φ : Σ → N (where N = {0, 1, 2, . . . }) that satisfy
∑ φ(a) = n.
a∈Σ
The elements of this set describe urns containing n marbles, where each marble is labelled by
an element of Σ. (There is no order associated with the marbles—all that matters is how many
marbles with each possible label are contained in the urn.)
Now, to say that a string a1 · · · an ∈ Σn is consistent with a particular function φ ∈ Urn(n, Σ)
means simply that a1 · · · an is one possible ordering of the marbles in the urn described by φ. One
can express this formally by defining a function
f a1 ··· an (b) = |{ j ∈ {1, . . . , n} : b = a j }| ,
and by defining that a1 · · · an is consistent with φ if and only if f a1 ··· an = φ. The number of distinct
strings a1 · · · an ∈ Σn that are consistent with a given function φ ∈ Urn(n, Σ) is given by the
multinomial coefficient
(n choose φ) = n! / ∏a∈Σ (φ( a)!) .
Finally, we define an orthonormal basis {uφ : φ ∈ Urn(n, Σ)} of X1 ∨ · · · ∨ Xn , where
uφ = (n choose φ)−1/2 ∑ e a1 ⊗ · · · ⊗ e an ,
the sum ranging over all strings a1 · · · an ∈ Σn satisfying f a1 ···an = φ.
In other words, uφ is the uniform superposition over all of the strings that are consistent with the
function φ.
For example, taking n = 3 and Σ = {0, 1}, we obtain the following four vectors:
u0 = e0 ⊗ e0 ⊗ e0
u1 = (1/√3) ( e0 ⊗ e0 ⊗ e1 + e0 ⊗ e1 ⊗ e0 + e1 ⊗ e0 ⊗ e0 )
u2 = (1/√3) ( e0 ⊗ e1 ⊗ e1 + e1 ⊗ e0 ⊗ e1 + e1 ⊗ e1 ⊗ e0 )
u3 = e1 ⊗ e1 ⊗ e1 ,
where the vectors are indexed by integers rather than functions φ ∈ Urn(3, {0, 1}) in a straight-
forward way.
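The example can be verified directly; the following sketch (my own, assuming NumPy) builds the four vectors by grouping the strings of {0, 1}3 according to their number of 1s, and checks that they form an orthonormal set.

```python
import itertools
import numpy as np

d, n = 2, 3
e = np.eye(d)

def ket(string):
    """The standard basis vector e_{a_1} tensor ... tensor e_{a_n}."""
    v = np.array([1.0])
    for a in string:
        v = np.kron(v, e[a])
    return v

# For Sigma = {0, 1}, an urn function phi is determined by its count of 1s,
# so we index the basis vectors u_0, ..., u_3 by that count.
basis = []
for ones in range(n + 1):
    consistent = [s for s in itertools.product(range(d), repeat=n) if sum(s) == ones]
    basis.append(sum(ket(s) for s in consistent) / np.sqrt(len(consistent)))

gram = np.array([[u @ v for v in basis] for u in basis])
assert np.allclose(gram, np.eye(n + 1))   # the four vectors are orthonormal
```

Orthogonality is immediate here because distinct urn functions are consistent with disjoint sets of strings.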
Using simple combinatorics, it can be shown that | Urn(n, Σ)| = (n + d − 1 choose d − 1), and therefore
dim(X1 ∨ · · · ∨ Xn ) = (n + d − 1 choose d − 1) .
Notice that for small d and large n, the dimension of the symmetric subspace is therefore very
small compared with the entire space X1 ⊗ · · · ⊗ Xn .
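To make this comparison concrete, here is a small worked check (the values d = 2, n = 20 are my own choice for illustration):

```python
from math import comb

d, n = 2, 20
sym_dim = comb(n + d - 1, d - 1)   # dimension of the symmetric subspace
full_dim = d ** n                  # dimension of X_1 tensor ... tensor X_n

assert sym_dim == 21               # C(21, 1) = 21
assert full_dim == 1048576         # 2**20
```

For d = 2 the symmetric subspace grows only linearly in n, while the full tensor product space grows exponentially.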
It is also the case that
X1 ∨ · · · ∨ Xn = span{ u⊗n : u ∈ CΣ } .
This follows from an elementary fact concerning the theory of symmetric functions, but (like many
things in this lecture) I will not prove it here.
Next, let us say that an operator P ∈ L (X1 ⊗ · · · ⊗ Xn ) is exchangeable if
P = Wπ PWπ∗
for every π ∈ Sn .
There is not an immediate relationship between exchangeable operators and the symmetric
subspace. It is the case that every positive semidefinite operator whose image is contained in
the symmetric subspace is exchangeable, but this is not a necessary condition—for instance, the
identity operator 1 X1 ⊗···⊗Xn is exchangeable and its image is all of X1 ⊗ · · · ⊗ Xn . We may, however,
relate exchangeable operators and the symmetric subspace by means of the following lemma.
Lemma 21.1. Let X1 , . . . , Xn and Y1 , . . . , Yn be copies of the complex Euclidean space C Σ , and suppose
that
P ∈ Pos (X1 ⊗ · · · ⊗ Xn )
is an exchangeable operator. Then there exists a symmetric vector
u ∈ (X1 ⊗ Y1 ) ∨ · · · ∨ (Xn ⊗ Yn )
such that TrY1 ⊗···⊗Yn (uu∗ ) = P.
Proof. Consider a spectral decomposition
P = λ1 Q1 + · · · + λk Qk ,
where λ1 , . . . , λk are the distinct eigenvalues of P and Q1 , . . . , Qk are the orthogonal projection operators onto the associated eigenspaces. Such a decomposition is unique, and given that Wπ PWπ∗ = P
for each permutation π ∈ Sn it is therefore clear that Wπ Q j Wπ∗ = Q j for each j = 1, . . . , k.
Let us now view each of the above projection operators as taking the form
Q1 , . . . , Qk ∈ L (Y1 ⊗ · · · ⊗ Yn , X1 ⊗ · · · ⊗ Xn ) ,
which we may do because the spaces Y1 , . . . , Yn are isomorphic copies of X1 , . . . , Xn . Having done
this, we define
u = ∑kj=1 √λ j vec( Q j ) .
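The formula for u relies on the identity TrY [vec( A)vec( B)∗ ] = AB∗ together with Qi Q j = 0 for i ≠ j. Here is a numerical sketch of my own confirming that u purifies P; it uses rank-one spectral projectors (which suffices, as each Q j is a sum of these) and assumes the convention vec( Ea,b ) = ea ⊗ eb , which matches row-major flattening.

```python
import numpy as np

rng = np.random.default_rng(0)

def partial_trace_second(M, dim):
    """Trace out the second tensor factor of an operator on C^dim tensor C^dim."""
    return np.trace(M.reshape(dim, dim, dim, dim), axis1=1, axis2=3)

dim = 4
A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
P = A @ A.conj().T                       # an arbitrary positive semidefinite P
lam, V = np.linalg.eigh(P)
lam = np.clip(lam, 0.0, None)            # guard against tiny negative round-off

# u = sum_j sqrt(lambda_j) vec(q_j q_j*), with rank-one spectral projectors;
# row-major flattening of q q* realizes vec in the convention above.
u = sum(np.sqrt(l) * np.outer(V[:, j], V[:, j].conj()).reshape(-1)
        for j, l in enumerate(lam))

# Tracing out the second factor of uu* recovers P exactly.
assert np.allclose(partial_trace_second(np.outer(u, u.conj()), dim), P)
```

The cross terms vanish because the spectral projectors are mutually orthogonal, which is exactly what the partial trace of uu∗ needs in order to collapse to ∑ j λ j Q j = P.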
For the proof of the main result in the next section, we will need to be able to express certain linear
operators as integrals. Here is a very simple expression that will serve as an example for the sake
of this discussion:
∫ uu∗ dµ(u) .
Now, there are two very different questions one may have about such an operator: first, what does this expression formally mean; and second, how can one calculate it?
The answer to the first question is a bit complicated—and although we will not have time to
discuss it in detail, I would like to say enough to at least give you some clues and key-words in
case you wish to learn more on your own.
In the above expression, µ refers to the normalized unitarily invariant measure defined on the
Borel sets of the unit sphere S = S(X ) in some chosen complex Euclidean space X . (The space
X is implicit in the above expression, and generally would be determined by the context of the
expression.) To say that µ is normalized means that µ(S) = 1, and to say that µ is unitarily invariant
means that µ(A) = µ(U (A)) for every Borel set A ⊆ S and every unitary operator U ∈ U (X ).
It turns out that there is only one measure with the properties that have just been described.
Sometimes this measure is called Haar measure, although this term is considerably more general
than what we have just described. (There is a uniquely defined Haar measure on many different
sorts of measure spaces with groups acting on them in a particular way.)
Informally speaking, you may think of the measure described above as a way of assigning an
infinitesimally small probability to each point on the unit sphere in such a way that no one vector
is weighted more or less than any other. So, we may view an integral like the one above as an
average of the operators uu∗ over the entire unit sphere, with each u being given equal weight.
Of course it does not really work this way, which is why we must speak of Borel sets rather than
arbitrary sets—but it is a reasonable guide for the simple uses of it in this lecture.
In formal terms, there is a process involving several steps for building up the meaning of an
integral like the one above starting from the measure µ. It starts with characteristic functions for
Borel sets (where the value of the integral is simply the set’s measure), then it defines integrals
for positive linear combinations of characteristic functions in the obvious way, then it introduces
limits to define integrals of more functions, and continues for a few more steps until we finally
have integrals of operators. Needless to say, this process does not provide an efficient means to
calculate a given integral.
This leads us to the second question, which is how to calculate such integrals. There is certainly
no general method: just as with ordinary integrals, one is lucky when a closed form exists. For
some, however, the fact that the measure is unitarily invariant leads to a simple answer. For
instance, the integral above must satisfy
∫ uu∗ dµ(u) = U ( ∫ uu∗ dµ(u) ) U ∗
for every unitary operator U ∈ U (X ), by the unitary invariance of µ. The only operators that commute with every unitary operator are scalar multiples of the identity, and because the integral has trace 1, we conclude that ∫ uu∗ dµ(u) = 1X / dim(X ).
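A simple way to check this numerically (my own sketch, not part of the notes) is to sample u by normalizing complex Gaussian vectors, which produces exactly the unitarily invariant distribution on the unit sphere; the average of uu∗ should then approach 1X / dim(X ).

```python
import numpy as np

rng = np.random.default_rng(7)
d, samples = 2, 20000

avg = np.zeros((d, d), dtype=complex)
for _ in range(samples):
    g = rng.normal(size=d) + 1j * rng.normal(size=d)
    u = g / np.linalg.norm(g)          # uniformly distributed unit vector
    avg += np.outer(u, u.conj())
avg /= samples

# Monte Carlo estimate of the integral: close to (1/d) * identity.
assert np.allclose(avg, np.eye(d) / d, atol=0.02)
```

The Gaussian trick works because the standard complex Gaussian distribution is itself invariant under every unitary, so the normalized samples inherit that invariance.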
Now we are ready to state and prove one variant of the quantum de Finetti theorem, which is the
main goal of this lecture. The statement and proof follow.
Theorem 21.2. Let X1 , . . . , Xn be identical quantum registers, each having associated space CΣ for | Σ | =
d, and let ρ ∈ D (X1 ⊗ · · · ⊗ Xn ) be an exchangeable density operator representing the state of
these registers. Then for any choice of k with 1 < k < n, there exists a finite set Γ, a probability vector
p ∈ RΓ , and a collection of density operators {ξ b : b ∈ Γ} ⊂ D (CΣ ) such that
‖ ρ[X1 · · · Xk ] − ∑b∈Γ p(b) ξ b⊗k ‖1 < 4d2 k / n ,
where ρ[X1 · · · Xk ] denotes the reduced state of ρ on the registers X1 , . . . , Xk .
Proof. First we will prove a stronger bound for the case where ρ = vv∗ is pure (which requires
v ∈ X1 ∨ · · · ∨ Xn ). This will then be combined with Lemma 21.1 to complete the proof.
For the sake of clarity, let us write Y = X1 ⊗ · · · ⊗ Xk and Z = Xk+1 ⊗ · · · ⊗ Xn . Let us also
write
S(m) = (m + d − 1 choose d − 1) ∫ (uu∗ )⊗m dµ(u) ,
which is the projection onto the symmetric subspace of m copies of CΣ for any choice of m ≥ 1.
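As a sanity check of this identity (again a Monte Carlo sketch of my own, for m = 2 and d = 2): the projection onto the symmetric subspace of two qubits is (1 + SWAP)/2, and the coefficient (m + d − 1 choose d − 1) equals 3 here.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(11)
d, m, samples = 2, 2, 50000

avg = np.zeros((d ** m, d ** m), dtype=complex)
for _ in range(samples):
    g = rng.normal(size=d) + 1j * rng.normal(size=d)
    u = g / np.linalg.norm(g)          # uniformly distributed unit vector
    P1 = np.outer(u, u.conj())
    avg += np.kron(P1, P1)             # (uu*)^{tensor 2}
avg *= comb(m + d - 1, d - 1) / samples

# The SWAP operator on C^2 tensor C^2, built by permuting tensor indices.
SWAP = np.eye(d * d).reshape(d, d, d, d).transpose(1, 0, 2, 3).reshape(d * d, d * d)

# c_2 times the sampled average approximates the symmetric projection.
assert np.allclose(avg, (np.eye(d * d) + SWAP) / 2, atol=0.05)
```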
Now consider a unit vector v ∈ X1 ∨ · · · ∨ Xn . As v is invariant under every permutation of
its tensor factors, it holds that
v = (1 Y ⊗ S(n−k) ) v .
σ = (n − k + d − 1 choose d − 1) ∫ Φu (vv∗ ) dµ(u) .
∑b∈Γ p(b) ξ b⊗k ,
τ = (n + d − 1 choose d − 1) ∫ ⟨(uu∗ )⊗k , Φu (vv∗ )⟩ (uu∗ )⊗k dµ(u) .
Tr(τ ) = (n + d − 1 choose d − 1) ∫ ⟨(uu∗ )⊗n , vv∗ ⟩ dµ(u) = ⟨S(n) , vv∗ ⟩ = 1,
(Some details are being swept under the rug here, but because we are working inside a compact
set this claim should not be surprising.)
We will now place an upper bound on ‖ σ − τ ‖1 . To make the proof more readable, let us write
cm = (m + d − 1 choose d − 1)
for each m ≥ 1, and let us also write ∆u = (uu∗ )⊗k for each unit vector u, so that ∆u X∆u = ⟨(uu∗ )⊗k , X ⟩ (uu∗ )⊗k for every operator X, as ∆u is a rank-one projection.
It is clear that
‖ ∫ Φu (vv∗ )(1 − ∆u ) dµ(u) ‖1 = ‖ ∫ (1 − ∆u )Φu (vv∗ ) dµ(u) ‖1 ,
while
‖ ∫ (1 − ∆u )Φu (vv∗ )(1 − ∆u ) dµ(u) ‖1 = Tr [ ∫ (1 − ∆u )Φu (vv∗ )(1 − ∆u ) dµ(u) ]
= Tr [ ∫ (1 − ∆u )Φu (vv∗ ) dµ(u) ]
≤ ‖ ∫ (1 − ∆u )Φu (vv∗ ) dµ(u) ‖1 .
Therefore, writing Φu (vv∗ ) − ∆u Φu (vv∗ )∆u = (1 − ∆u )Φu (vv∗ ) + Φu (vv∗ )(1 − ∆u ) − (1 − ∆u )Φu (vv∗ )(1 − ∆u ) and applying the triangle inequality together with the two observations above, we have
‖ (cn /cn−k ) σ − τ ‖1 ≤ 3 cn ‖ ∫ (1 − ∆u )Φu (vv∗ ) dµ(u) ‖1 ,
and so
‖ σ − τ ‖1 < 4dk / n .
This is essentially the bound given in the statement of the theorem, albeit only for pure states,
and with d2 replaced by d.
To prove the bound in the statement of the theorem for an arbitrary exchangeable density
operator ρ ∈ D (X1 ⊗ · · · ⊗ Xn ), we first apply Lemma 21.1 to obtain a symmetric purification
v ∈ (X1 ⊗ Y1 ) ∨ · · · ∨ (Xn ⊗ Yn )
of ρ. Applying the bound just proved, with CΣ ⊗ CΣ (which has dimension d2 ) in place of CΣ , we obtain
‖ σ − τ ‖1 < 4d2 k / n ,
where σ = TrZ (vv∗ ) for Z = (Xk+1 ⊗ Yk+1 ) ⊗ · · · ⊗ (Xn ⊗ Yn ) and where
τ ∈ conv{ (uu∗ )⊗k : u ∈ S (CΣ ⊗ CΣ ) } .