2362 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 61, NO. 6, JUNE 2013

Weighted Sum Rate Maximization for MIMO Broadcast Channels Using Dirty Paper Coding and Zero-forcing Methods
Le-Nam Tran, Member, IEEE, Markku Juntti, Senior Member, IEEE, Mats Bengtsson, Senior Member, IEEE, and Björn Ottersten, Fellow, IEEE
Abstract: We consider precoder design for maximizing the weighted sum rate (WSR) of successive zero-forcing dirty paper coding (SZF-DPC). For this problem, the existing precoder designs often assume a sum power constraint (SPC) and rely on the singular value decomposition (SVD). The SVD-based designs are known to be optimal but require high complexity. We first propose a low-complexity optimal precoder design for SZF-DPC under SPC, using the QR decomposition. Then, we propose an efficient numerical algorithm to find the optimal precoders subject to per-antenna power constraints (PAPCs). To this end, the precoder design for PAPCs is formulated as an optimization problem with a rank constraint on the covariance matrices. A well-known approach to solve this problem is to relax the rank constraints and solve the relaxed problem. Interestingly, for SZF-DPC, we are able to prove that the rank relaxation is tight. Consequently, the optimal precoder design for PAPCs is computed by solving the relaxed problem, for which we propose a customized interior-point method that exhibits a superlinear convergence rate. Two suboptimal precoder designs are also presented and compared to the optimal ones. We also show that the proposed numerical method is applicable for finding the optimal precoders for the block diagonalization scheme.

Index Terms: MIMO systems, broadcast channels, dirty paper coding, multiuser multi-antenna communication, zero-forcing.

Manuscript received February 1, 2013; revised April 5, 2013. The editor coordinating the review of this paper and approving it for publication was D. Gunduz.
L.-N. Tran and M. Juntti are with the Centre for Wireless Communications and the Department of Communications Engineering, University of Oulu, Finland (e-mail: {ltran, markku.juntti}@ee.oulu.fi). L.-N. Tran was with the Signal Processing Laboratory, ACCESS Linnaeus Center, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden.
M. Bengtsson and B. Ottersten are with the Signal Processing Laboratory, ACCESS Linnaeus Center, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden (e-mail: {mats.bengtsson, bjorn.ottersten}@ee.kth.se). B. Ottersten is also with the Interdisciplinary Center for Security, Reliability and Trust (SnT), University of Luxembourg, L-1359 Luxembourg-Kirchberg, Luxembourg (e-mail: bjorn.ottersten@uni.lu).
This research has been supported by Tekes, the Finnish Funding Agency for Technology and Innovation, Nokia Siemens Networks, Renesas Mobile Europe, Elektrobit, Xilinx, Academy of Finland, and by the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement n° 228044. Parts of this paper were presented at the IEEE International Conference on Communications, Ottawa, Canada, June, 2012 [1], [2].
Digital Object Identifier 10.1109/TCOMM.2013.043013.130100
0090-6778/13$31.00 © 2013 IEEE

I. INTRODUCTION

Over the last decade, multiple-input multiple-output (MIMO) transmission techniques have drawn a lot of attention due to their capability of boosting the channel capacity without the need for additional bandwidth or power [3]-[5]. An important scenario is the transmission from a multi-antenna transmitter, e.g., a BS in cellular networks, to several receivers, possibly with multiple antennas. This scenario is referred to as the MIMO broadcast channel (BC) (alternatively, downlink channel). It is now well known that dirty paper coding (DPC) achieves the capacity region of the MIMO BC [6]. Although DPC is the optimal transmission scheme for MIMO BCs, finding the transmit covariances that achieve a point in the capacity region is generally of high complexity. For example, to find the sum-capacity point, several numerical algorithms were proposed, e.g., in [7]-[9], which are basically based on the iterative water-filling algorithm. Thus, it is of great interest to develop simplified DPC-based precoding techniques, where optimal precoders are easy to find.

Successive zero-forcing dirty paper coding (SZF-DPC), introduced in [10] for single-antenna receivers, and later generalized in [11] for multiple-antenna receivers, combines the zero-forcing (ZF) technique with DPC. In the SZF-DPC scheme, for the kth user, the interference caused by users 1 to $k-1$ is canceled by DPC, and that caused by users $k+1$ to $K$ is eliminated by the ZF technique, where $K$ is the number of users. To be specific, let $W_k$ be the precoder of the kth user. Then, the ZF constraints impose that $H_j W_k = 0$ for all $j < k$, where $H_j$ is the channel matrix between the BS and the jth user. In this way, SZF-DPC decomposes a MIMO BC into a group of parallel interference-free point-to-point channels, which simplifies the problem of precoder design. It was shown in [10], [11] that SZF-DPC performs very close to DPC, and that the optimal precoders can be computed analytically. Thus, SZF-DPC can also be used as a benchmark for comparison purposes.

It is convenient to consider a cascade structure for $W_k$, i.e., $W_k = B_k D_k$, where $B_k$ is designed to satisfy the ZF constraints, and $D_k$ is optimized under the power constraints. The precoder design methods for SZF-DPC differ in how $B_k$ is calculated. Since $W_k$ must lie in $\mathcal{N}(\bar{H}_k)$ to satisfy the zero-interference constraints, where $\bar{H}_k = [H_1^H \; H_2^H \; \cdots \; H_{k-1}^H]^H$, and $\mathcal{N}(H)$ denotes the null space of $H$, it is optimal to design $B_k$ as an orthonormal basis of $\mathcal{N}(\bar{H}_k)$. This fact was exploited in [11], where $B_k$ is an orthonormal basis of $\mathcal{N}(\bar{H}_k)$, which is obtained from the singular value decomposition (SVD) of $\bar{H}_k$. However, finding a basis of $\mathcal{N}(\bar{H}_k)$ via SVD is computationally costly, since the size of $\bar{H}_k$ grows with the user index. We note that the SVD-based design in [11] needs to calculate a series of SVDs to find the precoders for all users.
Herein, we propose a low-complexity precoder design for SZF-DPC, using only a single QR decomposition (QRD). First, we note that it is computationally cheaper to calculate the null space of a matrix using a QRD instead of an SVD. Thus, a natural way to reduce the complexity of the SVD-based method is to find a basis of $\mathcal{N}(\bar{H}_k)$ using the QRD, instead of the SVD. This approach is referred to as the natural QRD-based design (NQRD-based design). Still, the NQRD-based design computes several QR decompositions. As one of our main contributions, we propose a precoder design that results from applying a QRD to a matrix composed of the channel matrices of all users. More explicitly, only a single QRD is performed to find all $B_k$'s, instead of separately computing $\mathcal{N}(\bar{H}_k)$ to find $B_k$ for each $k$ as in the SVD-based design. Thus, the proposed method requires much lower complexity, compared to the SVD- and NQRD-based designs. Even though the columns of $B_k$ in the proposed method do not span $\mathcal{N}(\bar{H}_k)$, we will prove that it is also an optimal precoder design for SZF-DPC. Particularly, the proposed method reduces to the QRD-based design in [10] for multiple-input single-output (MISO) BCs, meaning that the QRD-based design in [10] is optimal for MISO BCs.

The precoder designs for SZF-DPC mentioned above assume a sum-power constraint (SPC), for which the optimal precoders admit a water-filling solution. In practice, individual per-antenna power constraints (PAPCs) are more useful than the SPC, since each antenna is equipped with its own power amplifier. We notice that the precoder designs for channel inversion or block diagonalization with PAPCs in [12], [13] are applicable to SZF-DPC since SZF-DPC can be viewed as a relaxation of these schemes. Particularly, a numerical algorithm based on a dual decomposition method was proposed in [13], using a subgradient method to find the optimal precoders for block diagonalized systems.

Since subgradient methods in general show slow convergence, a second contribution of this paper is to propose a more efficient solution. To the best of our knowledge, no analytical form for the optimal precoder design for SZF-DPC with PAPCs has been reported. Hence, we resort to numerical algorithms. To this end, we first formulate the precoder design as a rank-constrained optimization problem, to which the rank relaxation technique is a popular approach [14]. Basically, this technique drops the rank constraints to form a relaxed problem, which is (very often) convex and, thus, easier to solve. In general, however, such an approach may yield suboptimal solutions to the original problem since the rank constraints are not guaranteed when solving the relaxed problem. Interestingly, due to the special structure of SZF-DPC, we are able to show that all optimal solutions of the relaxed problem always satisfy the rank constraints. In other words, the relaxation is tight, and both the original and the relaxed problems are equivalent. Then, we propose a numerical algorithm to solve the relaxed problem, based on the barrier method. As a part of the proposed algorithm, we recognize that all the power constraints must be active at the optimum. This facilitates finding the optimal solutions numerically since equality constraints are easier to cope with. By numerical examples, we demonstrate that the proposed algorithm achieves a remarkably fast convergence rate. In addition, we present two suboptimal precoder designs which have lower computational complexity, and perform very close to the optimal one.

As mentioned before, the precoder design for SZF-DPC is closely related to that for block diagonalization (BD) [15], [16]. Of the two precoding schemes, BD suffers from a stricter ZF condition than SZF-DPC. More explicitly, the ZF constraints for BD force the precoder of a user to lie in the intersection of null spaces of all other users' channel matrices. Subsequently, in contrast to SZF-DPC, not all the antennas in the BD transmit scheme will use full power, since the number of degrees of freedom could be too low. In other words, some of the PAPCs are nonbinding. In spite of this difference, we show that the proposed precoder design method for SZF-DPC can be slightly modified to solve the problem of the precoder design for BD, leading to a numerical method that converges faster than the existing design using the subgradient method.

The remainder of the paper is organized as follows. In Section II, we briefly review the system model and the precoder design for SZF-DPC schemes. Section III deals with the precoder design with the SPC. In this section, we present an optimal precoder design, and analyze the computational complexity. The precoder design with PAPCs is addressed in Section IV, where we present a specialized numerical algorithm to find the optimal precoders. We also introduce two suboptimal designs which have lower computational complexity, but perform close to the optimal one. In Section V, we address a precoder design for BD under PAPCs, extending the proposed algorithm for SZF-DPC. Numerical results are given in Section VI, followed by some conclusions in Section VII.

Notation: Standard notations are used in this paper. Bold lower and upper case letters represent vectors and matrices, respectively; $H^H$ and $H^T$ are Hermitian and normal transpose of $H$, respectively; $\|H\|_F$ and $|H|$ are the Frobenius norm and determinant of $H$, respectively. $I_M$ represents an $M \times M$ identity matrix. $\mathcal{R}(H)$ and $\mathcal{N}(H)$ denote the column space and the null space of $H$, respectively. $\operatorname{diag}(x)$, where $x$ is a vector, denotes a diagonal matrix with elements $x$; $\operatorname{diag}(H)$, where $H$ is a square matrix, denotes a vector of its diagonal elements. $[x]_i$ is the ith entry of vector $x$; $[H]_{i,j}$ is the entry at the ith row and jth column of $H$.

II. MIMO BCs AND SZF-DPC

Consider a single-cell MIMO BC with a base station (BS) and $K$ multiple-antenna users. The channel between the BS and the kth user is generally modeled by a matrix $H_k \in \mathbb{C}^{n_k \times N}$, where $N$ and $n_k \ge 1$ are the number of antennas at the BS and the kth user, respectively. The received signal at the kth user is given by

$$y_k = H_k x_k + \sum_{j \ne k} H_k x_j + n_k \tag{1}$$

where $x_k \in \mathbb{C}^{N \times 1}$ denotes the transmitted signals for the kth user, and $n_k \in \mathbb{C}^{n_k \times 1}$ is assumed to be a complex-Gaussian noise vector with zero mean and covariance matrix $I_{n_k}$. We can further write

$$x_k = W_k s_k \tag{2}$$
where $s_k \in \mathbb{C}^{L_k \times 1}$, $L_k \le \min(N, n_k)$ with $E[s_k s_k^H] = I$, and $W_k \in \mathbb{C}^{N \times L_k}$ are the vector of transmitted symbols and the precoder of the kth user, respectively. Accordingly, (1) can be rewritten as

$$y_k = H_k W_k s_k + \sum_{j<k} H_k W_j s_j + \sum_{j>k} H_k W_j s_j + n_k. \tag{3}$$

It is well known that DPC is a capacity achieving transmission strategy for MIMO BCs. For the kth user, the BS views the interference term $\sum_{j<k} H_k W_j s_j$ as known non-causally, so it can be perfectly eliminated using DPC. As a result, the resulting data rate of the kth user is given by

$$R_k^{\mathrm{DPC}} = \log \frac{\left| I + \sum_{j \ge k} H_k S_j H_k^H \right|}{\left| I + \sum_{j > k} H_k S_j H_k^H \right|} \tag{4}$$

where $S_j = W_j W_j^H$ is the transmit covariance matrix for the jth user. Since (4) is non-convex with respect to $S_j$, it is generally difficult to deal with. For the sum rate maximization problem for MIMO BCs under a total power constraint, optimal $S_j$'s cannot be found analytically, and iterative numerical algorithms are involved [7]-[9].

To overcome the difficulty in finding optimal covariance matrices in DPC, the authors in [11] proposed SZF-DPC, which admits a closed-form solution for optimal precoders. In fact, SZF-DPC is a generalization of the zero-forcing DPC (ZF-DPC) in [10], devised for single-antenna receivers. For SZF-DPC, the interference term $\sum_{j<k} H_k W_j s_j$ in (3) is canceled by DPC, and the interference term $\sum_{j>k} H_k W_j s_j$ is eliminated by designing $W_j$ such that

$$H_k W_j = 0 \quad \text{for all } j > k. \tag{5}$$

Accordingly, the resulting data rate of the kth user for SZF-DPC is given by

$$R_k^{\mathrm{SZF\text{-}DPC}} = \log \left| I + H_k S_k H_k^H \right|. \tag{6}$$

The goal of the precoder design for SZF-DPC schemes is to find $W_k$'s that maximize a performance measure under the ZF constraints in (5), and additional constraints on the transmit power. In this paper, we consider the precoder design for SZF-DPC to maximize the WSR under a sum power and per-antenna power constraints.

III. PRECODER DESIGNS FOR SZF-DPC WITH SPC

In this section, we address the maximization problem for SZF-DPC under a SPC, which is mathematically formulated as

$$\begin{aligned} \underset{W_k}{\text{maximize}} \quad & \sum_{k=1}^{K} \mu_k \log \left| I + H_k W_k W_k^H H_k^H \right| \\ \text{subject to} \quad & H_j W_k = 0, \ \forall j < k \\ & \sum_{k=1}^{K} \operatorname{tr}(W_k W_k^H) \le P \end{aligned} \tag{7}$$

where $P$ is the maximum transmit power at the BS, and $\mu_k$ is the weighting factor associated with the rate of user $k$, which is typically included to achieve a certain fairness index among users. For example, to obtain the proportional fairness, we can set $\mu_k = 1/\bar{R}_k$, where $\bar{R}_k$ is the average data rate of user $k$ in the previous time slots. Moreover, by choosing $\mu_k$ appropriately, we can also find the full Pareto boundary, i.e., the outer boundary of the achievable rate region of SZF-DPC. In [11], an approach to find the optimal precoders $W_k$'s for (7) was proposed using the SVD, which is described next.

A. SVD-based Design

To solve (7), we can write $W_k = B_k D_k$, where $B_k$ is designed to remove the interference, and $D_k$ is adjusted to maximize the WSR under some power constraints. Obviously, the ZF constraints in (7) mean that $B_k$ must lie in $\mathcal{N}(\bar{H}_k)$,¹ where

$$\bar{H}_k = [H_1^T \; H_2^T \; \cdots \; H_{k-1}^T]^T \in \mathbb{C}^{\left(\sum_{i=1}^{k-1} n_i\right) \times N}. \tag{8}$$

¹ $\bar{H}_k$ is only defined for $k \ge 2$ since we just need to find the precoding matrices $B_k$ for $k \ge 2$. The precoding matrix $B_1$ for the first user is set to be an identity matrix, i.e., $B_1 = I_N$.

Like DPC, the user ordering also affects the achievable sum rate of SZF-DPC [10], [11]. Note that a different user ordering results in a different $\bar{H}_k$ in (8), and accordingly a different $\mathcal{N}(\bar{H}_k)$. More specifically, the singular values of the effective channel for the kth user are changed with a new user ordering, providing higher or lower sum rate. Obviously, the best sum rate can be obtained by searching all possible combinations of user ordering. For the sake of simplicity, we assume a natural user ordering in this paper. In [11], $B_k$ is chosen to be an orthonormal basis of $\mathcal{N}(\bar{H}_k)$, which can be found by an SVD of $\bar{H}_k$. To be specific, denote the full-size SVD of $\bar{H}_k$ as

$$\bar{H}_k = \bar{U}_k \bar{\Sigma}_k [\bar{V}_k \; B_k]^H \tag{9}$$

where $\bar{\Sigma}_k$ has the same dimensions as $\bar{H}_k$. Then the columns of $B_k \in \mathbb{C}^{N \times \bar{n}_k}$, $\bar{n}_k = N - \sum_{i=1}^{k-1} n_i$, form an orthonormal basis of $\mathcal{N}(\bar{H}_k)$. The condition for the BS to support all users is that $\mathcal{N}(\bar{H}_k)$ has a dimension larger than zero, for all $k$. Assuming the rows of all $H_k$'s to be linearly independent, this requirement is equivalent to $N > \sum_{i=1}^{K-1} n_i$. When the number of users is large, a user scheduling algorithm is needed to choose a set of users that satisfies the above condition and can exploit the multiuser diversity gain [17]. In this paper, we assume $N > \sum_{i=1}^{K-1} n_i$ and focus on the precoder design.

Since $\operatorname{tr}(W_k W_k^H) = \operatorname{tr}(B_k D_k D_k^H B_k^H) = \operatorname{tr}(D_k D_k^H)$, the maximization problem in (7) reduces to

$$\begin{aligned} \underset{D_k}{\text{maximize}} \quad & \sum_{k=1}^{K} \mu_k \log \left| I + \tilde{H}_k D_k D_k^H \tilde{H}_k^H \right| \\ \text{subject to} \quad & \sum_{k=1}^{K} \operatorname{tr}(D_k D_k^H) \le P \end{aligned} \tag{10}$$

where $\tilde{H}_k = H_k B_k$. Then, $D_k$ can be easily found with water-filling over the non-zero singular values of $\tilde{H}_k$. More specifically, define a compact SVD of $\tilde{H}_k$ as

$$\tilde{H}_k = H_k B_k = \tilde{U}_k \tilde{\Sigma}_k \tilde{V}_k^H \in \mathbb{C}^{n_k \times \bar{n}_k} \tag{11}$$

where $\tilde{\Sigma}_k$ is an $L_k \times L_k$ diagonal matrix that contains all non-zero singular values of $\tilde{H}_k$, $L_k = \min(n_k, \bar{n}_k)$, and $\tilde{V}_k$ contains the $L_k$ singular vectors of $\tilde{H}_k$. To maximize the WSR, $D_k$ is found as $D_k = \tilde{V}_k \Lambda_k^{1/2}$, where $\Lambda_k \in \mathbb{C}^{L_k \times L_k}$ is a diagonal matrix, which is a solution to the following problem

$$R_{\mathrm{SZF\text{-}DPC}} = \underset{\Lambda_k:\ \sum_{k} \operatorname{tr}(\Lambda_k) \le P}{\text{maximize}} \ \sum_{k=1}^{K} \mu_k \log_2 \left| I + \tilde{\Sigma}_k^2 \Lambda_k \right|. \tag{12}$$
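As a rough illustration of the SVD-based design, the following NumPy sketch builds each $B_k$ from the null space of the stacked channels in (8) and then allocates power by a weighted water-filling, which is one standard way to solve a problem of the form (12). The function names, the bisection on the Lagrange multiplier, and the numerical rank tolerance are assumptions made for this example; they are not taken from [11].

```python
import numpy as np

def null_space_svd(A, tol=1e-10):
    """Orthonormal basis of the null space of A, read off a full SVD as in (9)."""
    _, s, Vh = np.linalg.svd(A, full_matrices=True)
    rank = int(np.sum(s > tol * s[0]))
    return Vh[rank:].conj().T  # trailing right singular vectors

def weighted_waterfill(gains, weights, total_power, iters=200):
    """Maximize sum_i w_i*log(1 + g_i*p_i) s.t. sum_i p_i = total_power, p_i >= 0."""
    gains = np.asarray(gains, dtype=float)
    weights = np.asarray(weights, dtype=float)
    alloc = lambda nu: np.maximum(0.0, weights / nu - 1.0 / gains)
    hi = np.max(weights * gains)       # multiplier at which no power is allocated
    lo = hi
    while alloc(lo).sum() < total_power:
        lo /= 2.0                      # lower the multiplier until the budget is exceeded
    for _ in range(iters):             # bisection on the multiplier nu
        mid = 0.5 * (lo + hi)
        if alloc(mid).sum() > total_power:
            lo = mid
        else:
            hi = mid
    return alloc(hi)

def svd_based_design(H_list, weights, total_power, tol=1e-10):
    """Precoders W_k = B_k V_k diag(sqrt(p_k)) for natural user ordering."""
    N = H_list[0].shape[1]
    B, V, gains, wts = [], [], [], []
    for k, Hk in enumerate(H_list):
        # B_1 = I_N; for k >= 2, B_k spans the null space of [H_1; ...; H_{k-1}]
        Bk = np.eye(N, dtype=complex) if k == 0 else null_space_svd(np.vstack(H_list[:k]))
        _, s, Vh = np.linalg.svd(Hk @ Bk, full_matrices=False)
        keep = s > tol * s[0]          # non-zero singular values of the effective channel
        B.append(Bk)
        V.append(Vh[keep].conj().T)
        gains.append(s[keep] ** 2)
        wts.append(np.full(int(keep.sum()), weights[k]))
    p = weighted_waterfill(np.concatenate(gains), np.concatenate(wts), total_power)
    W, start = [], 0
    for Bk, Vk, g in zip(B, V, gains):
        pk = p[start:start + g.size]
        start += g.size
        W.append(Bk @ (Vk * np.sqrt(pk)))  # scale column i of V_k by sqrt(p_i)
    return W
```

For each user the effective channel is $H_k B_k$, so the zero-forcing constraints hold by construction and only the power allocation couples the users.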
The set of optimal $\{\Lambda_k\}$ to (12) can be easily calculated using the water-filling algorithm. The resulting precoder for the kth user is given by $W_k = B_k \tilde{V}_k \Lambda_k^{1/2}$. Since the columns of $B_k$ span $\mathcal{N}(\bar{H}_k)$, it is not difficult to see that the SVD-based method is optimal for (7). However, this method employs an SVD to find the null space of a $(\sum_{i=1}^{k-1} n_i) \times N$ matrix for the kth user, which is computationally costly.

A simple method to reduce the complexity of the SVD-based design is to replace the SVD by the QRD, which has lower and deterministic complexity. Specifically, applying a QRD to $\bar{H}_k$ gives

$$\bar{H}_k = [\bar{L}_k \; 0][\bar{Q}_k \; B_k]^H \tag{13}$$

where $\bar{L}_k \in \mathbb{C}^{\sum_{i=1}^{k-1} n_i \times \sum_{i=1}^{k-1} n_i}$ is a lower triangular matrix, $\bar{Q}_k \in \mathbb{C}^{N \times \sum_{i=1}^{k-1} n_i}$ contains an orthonormal basis of $\mathcal{R}(\bar{H}_k^H)$, and $B_k \in \mathbb{C}^{N \times \bar{n}_k}$ forms an orthonormal basis of $\mathcal{N}(\bar{H}_k)$, i.e., $\bar{H}_k B_k = 0$, and $B_k^H B_k = I$. This simple method is referred to as the natural QRD (NQRD)-based design in the sequel. Note that $B_k$ in (13) and $B_k$ in (9) are both orthonormal bases of $\mathcal{N}(\bar{H}_k)$, and, thus, the NQRD- and SVD-based precoder designs are equivalent.

Apparently, the NQRD-based design can lower the complexity of the SVD-based method, but its complexity is still high because, like the SVD-based design, the QRD is sequentially applied to a matrix of large dimensions. In the following, we propose another optimal precoder design, which is derived directly from a single QRD of $H$ in (14).

B. Generalized QRD-based Design (GQRD-based design)

The precoding matrix $B_k$ in the NQRD- and SVD-based designs is found to be a basis of $\mathcal{N}(\bar{H}_k)$ for each $k$. Due to the concatenated structure of $\bar{H}_k$, the complexity of these methods increases with $k$. Note that $\operatorname{rank}(B_k) = \dim(\mathcal{N}(\bar{H}_k)) = \bar{n}_k = N - \sum_{i=1}^{k-1} n_i$. Since $\operatorname{rank}(H_k B_k) = \operatorname{rank}(H_k) = n_k \le \operatorname{rank}(B_k)$, we can reduce the dimension of $B_k$ as long as its singular values are aligned with those of $H_k$. This fact suggests that $B_k$ need not be a basis of $\mathcal{N}(\bar{H}_k)$ to be an optimal design. In what follows, we propose a precoder design based on a single QRD, where the columns of $B_k$ do not span $\mathcal{N}(\bar{H}_k)$. For notational convenience, let us stack the channel matrices of all users in a matrix $H$ defined as

$$H = [H_1^H \; H_2^H \; \cdots \; H_K^H]^H \in \mathbb{C}^{n_R \times N} \tag{14}$$

where $n_R = \sum_{k=1}^{K} n_k$ is the total number of receive antennas, and all precoders in a matrix $W$ given by

$$W = [W_1 \; W_2 \; \cdots \; W_K] \in \mathbb{C}^{N \times L_R} \tag{15}$$

where $L_R = \sum_{k=1}^{K} L_k$ is the total number of data streams that the BS is able to transmit to all users in the system. The ZF constraints in (5) can be equivalently expressed as

$$HW = \begin{bmatrix} H_1 W_1 & & & \\ \times & H_2 W_2 & & \\ \vdots & \ddots & \ddots & \\ \times & \cdots & \times & H_K W_K \end{bmatrix} \tag{16}$$

which is a lower block triangular matrix, where the diagonal block $H_k W_k \in \mathbb{C}^{n_k \times L_k}$ is the effective channel matrix of the kth user and $\times$ marks a possibly nonzero block. In the generalized QRD (GQRD)-based design, we further force $HW$ to be a lower triangular matrix. Specifically, consider a QRD of $H$ given in (14) as

$$H = LQ \tag{17}$$

and partition $L$ into

$$L = \begin{bmatrix} L_1 & & & \\ \times & L_2 & & \\ \vdots & \ddots & \ddots & \\ \times & \cdots & \times & L_K \end{bmatrix} \tag{18}$$

where $L_k \in \mathbb{C}^{n_k \times n_k}$ is a lower triangular matrix, and $Q$ into

$$Q = [\hat{B}_1 \; \hat{B}_2 \; \cdots \; \hat{B}_K]^H \tag{19}$$

where $\hat{B}_k \in \mathbb{C}^{N \times n_k}$ satisfies $H_j \hat{B}_k = 0$, $\forall j < k$, i.e., $\bar{H}_k \hat{B}_k = 0$, and $\hat{B}_k^H \hat{B}_k = I_{n_k}$. Note that the columns of $\hat{B}_k$ do not form a basis of $\mathcal{N}(\bar{H}_k)$. Let

$$B_k = [\hat{B}_k \; \hat{B}_{k+1} \; \cdots \; \hat{B}_K] \in \mathbb{C}^{N \times \bar{n}_k}. \tag{20}$$

Clearly, the columns of $B_k$ in (20) form an orthogonal basis of $\mathcal{N}(\bar{H}_k)$, and thus the precoder for user $k$ can always be written as $W_k = B_k D_k$. In this way, an orthogonal basis for each $\mathcal{N}(\bar{H}_k)$ is computed by a single QRD of $H$ in (14). To gain further insights into the structure of optimal precoders under SPC, let us analyze the effective channel of the kth user, which is given by

$$H_k W_k = H_k B_k D_k = [L_k \; 0] \begin{bmatrix} D_{k,1} \\ D_{k,0} \end{bmatrix} = L_k D_{k,1} \tag{21}$$

where $D_{k,1}$ contains the top $n_k$ rows of $D_k$ and $D_{k,0}$ the remaining $(\bar{n}_k - n_k)$ rows. From (21), we can easily see that the cost function of (7) stays the same no matter how $D_{k,0}$ is chosen. We now show that it is optimal to set $D_{k,0} = 0$, which follows from

$$\begin{aligned} \operatorname{tr}(W_k W_k^H) &= \operatorname{tr}(B_k D_k D_k^H B_k^H) = \operatorname{tr}(B_k^H B_k D_k D_k^H) \\ &= \operatorname{tr}(D_k D_k^H) = \operatorname{tr}(D_{k,1} D_{k,1}^H) + \operatorname{tr}(D_{k,0} D_{k,0}^H) \\ &\ge \operatorname{tr}(D_{k,1} D_{k,1}^H) \end{aligned} \tag{22}$$

with equality if and only if $D_{k,0} = 0$. In fact, we have proved the following theorem.

Theorem 1. The optimal precoders $W_k$'s for SZF-DPC under SPC are of the form $W_k = \hat{B}_k D_{k,1}$, where $D_{k,1}$ is the solution to the following problem

$$\begin{aligned} \underset{D_{k,1}}{\text{maximize}} \quad & \sum_{k=1}^{K} \mu_k \log \left| I + L_k D_{k,1} D_{k,1}^H L_k^H \right| \\ \text{subject to} \quad & \sum_{k=1}^{K} \operatorname{tr}(D_{k,1} D_{k,1}^H) \le P. \end{aligned} \tag{23}$$

Theorem 1 leads to the GQRD-based design, which is summarized in Algorithm 1.

Algorithm 1 GQRD-based precoder design for SZF-DPC.
1: Compute the QR decomposition as $H = LQ$.
2: Partition $Q = [\hat{B}_1 \; \hat{B}_2 \; \cdots \; \hat{B}_K]^H$, where $\hat{B}_k \in \mathbb{C}^{N \times n_k}$.
3: Solve (23) using the water-filling algorithm over the effective channel matrices $H_k \hat{B}_k = L_k$. Denote the optimal solution by $D_{k,1}$.
4: The optimal precoder for user $k$ is found as $W_k = \hat{B}_k D_{k,1}$.

Remark 1. As a special case when $n_k = 1$ for all $k$, i.e., single-antenna receivers, the GQRD-based method is the same as the precoder design based on QRD proposed in [10]. Indeed, this design is widely used in current works [18]-[20] without recognizing its optimality. As an immediate consequence of Theorem 1, the precoder design based on QRD in [10] is shown to be optimal for SZF-DPC in MISO BCs under a SPC. We note that the optimality of the QRD design for SZF-DPC with single-antenna receivers under a SPC was also established in our earlier works using a different approach [21]-[23].
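Steps 1 and 2 of Algorithm 1 can be sketched with NumPy as follows. NumPy's `qr` returns a factorization with an upper triangular factor, so the sketch factorizes $H^H$ and conjugate-transposes the result to obtain the lower triangular form $H = LQ$ of (17). The function name `gqrd_bases` is hypothetical, and the sketch assumes $N \ge n_R$ with linearly independent channel rows so that every diagonal block $L_k$ is $n_k \times n_k$.

```python
import numpy as np

def gqrd_bases(H_list):
    """Single-QRD bases: returns ([B_hat_1..B_hat_K], [L_1..L_K]) with H_k B_hat_k = L_k."""
    H = np.vstack(H_list)                      # stacked channel matrix, as in (14)
    Q_tilde, R = np.linalg.qr(H.conj().T)      # H^H = Q~ R  =>  H = R^H Q~^H = L Q
    L = R.conj().T                             # lower triangular factor
    edges = np.cumsum([0] + [Hk.shape[0] for Hk in H_list])
    spans = list(zip(edges[:-1], edges[1:]))
    B_hat = [Q_tilde[:, a:b] for a, b in spans]    # column blocks of Q^H, as in (19)
    L_blocks = [L[a:b, a:b] for a, b in spans]     # diagonal blocks of L, as in (18)
    return B_hat, L_blocks
```

Because $L$ is lower triangular, $H_j \hat{B}_k$ is the $(j,k)$ block of $L$ and vanishes for $j < k$, so the zero-forcing constraints are met without any per-user null-space computation. Step 3 can then run a water-filling over the singular values of each $L_k$.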
C. Complexity Comparison

In this subsection, we analyze the complexity of the precoder designs for SZF-DPC schemes presented above. Since the process of finding $D_k$ is the same for all methods, we only consider the complexity of calculating $B_k$. The complexity is measured by the number of flops as in [24], [25], denoted as $\psi$.² Although flop counting is a crude measurement of the true computational complexity, it captures the order of the computation load.

² A flop is equal to a real floating point operation [26]. A real addition, multiplication, or division is counted as one flop. A complex addition and multiplication have two flops and six flops, respectively.

For simplicity, we show the complexity for the case where all users have the same number of antennas, i.e., $n_k = \bar{n}$, for all $k$. The number of supportable users is then given by $K = N/\bar{n}$, where $N$ is the number of transmit antennas at the BS, assumed to be a multiple of $\bar{n}$. The number of flops of some typical operations is given as follows. Multiplication of an $m \times p$ matrix and a $p \times n$ matrix requires $8mpn$ flops. The number of flops needed to compute the QRD of a real matrix of size $m \times n$, $m \ge n$, with fast Givens transformations is $2n^2(m - n/3)$ [26]. For a complex matrix of the same size, we approximate the number of flops involved in a QRD by $4n^2(3m - n)$, i.e., treating every operation as a complex multiplication. Similarly, the number of flops for the SVD of an $m \times n$ complex-valued matrix is approximated by $24mn^2 + 48m^2 n + 54m^3$ [24].

1) SVD-based design: For the kth user, $k \ge 2$, the number of flops needed to compute the SVD of $\bar{H}_k \in \mathbb{C}^{(k-1)\bar{n} \times N}$ is $24(k-1)\bar{n}N^2 + 48(k-1)^2\bar{n}^2 N + 54(k-1)^3\bar{n}^3$. Thus, the total number of flops of the SVD-based precoder design is

$$\psi_{\mathrm{SVD}} = \sum_{k=2}^{K} \left\{ 24(k-1)\bar{n}N^2 + 48(k-1)^2\bar{n}^2 N + 54(k-1)^3\bar{n}^3 \right\} \approx 41KN^3. \tag{24}$$

2) GQRD-based Design: The complexity of the generalized QRD-based design is equivalent to the number of flops needed to compute a QRD of $H$, which is given by

$$\psi_{\mathrm{GQRD}} = 4N^2(3N - N) = 8N^3. \tag{25}$$

The complexity of the precoder designs mentioned above is compared in Fig. 1, where we plot the number of flops versus the number of transmit antennas, $N$. In Fig. 1, the number of receive antennas is the same for all users, $n_k = \bar{n} = 2$, for all $k$, and the number of users is $K = N/\bar{n}$. As we can see, the NQRD-based design requires significantly lower complexity than the SVD-based method. Noticeably, the GQRD-based precoder design greatly reduces the complexity compared to other precoder designs.

Fig. 1. Complexity comparison of various precoder designs for SZF-DPC schemes, $n_k = \bar{n} = 2$. (Plot of the number of flops, in units of $10^6$, versus the number of transmit antennas $N$ from 4 to 32, for the SVD-based, NQRD-based, and GQRD-based designs.)

IV. PRECODER DESIGNS FOR SZF-DPC WITH PAPCs

While the precoder designs for SZF-DPC with a sum power constraint can be found easily using the water-filling algorithm [11], the design with PAPCs has not been extensively studied. In practice, PAPCs are more relevant since each antenna has its own power amplifier [12], [27], [28]. The maximization for SZF-DPC under PAPCs is formulated as

$$\begin{aligned} \underset{W_k}{\text{maximize}} \quad & \sum_{k=1}^{K} \mu_k \log \left| I + H_k W_k W_k^H H_k^H \right| \\ \text{subject to} \quad & H_j W_k = 0, \ \forall j < k, \\ & \sum_{k=1}^{K} [W_k W_k^H]_{n,n} \le P_n, \ n = 1, \ldots, N \end{aligned} \tag{26}$$

where $P_n$ is the power constraint for the nth transmit antenna. Let $B_k$ be a basis of $\mathcal{N}(\bar{H}_k)$, with $\bar{H}_k$ as given in (8). Then, solving (26) amounts to solving the following optimization

$$\begin{aligned} \underset{\Phi_k \succeq 0}{\text{maximize}} \quad & \sum_{k=1}^{K} \mu_k \log \left| I + \tilde{H}_k \Phi_k \tilde{H}_k^H \right| \\ \text{subject to} \quad & \sum_{k=1}^{K} [B_k \Phi_k B_k^H]_{n,n} \le P_n, \ \forall n \\ & \operatorname{rank}(\Phi_k) \le n_k \end{aligned} \tag{27}$$

where $\Phi_k \triangleq D_k D_k^H \in \mathbb{C}^{\bar{n}_k \times \bar{n}_k}$, and $\tilde{H}_k = H_k B_k \in \mathbb{C}^{n_k \times \bar{n}_k}$.

A. Optimal Precoder Design

We begin by omitting the rank constraints in (27), and consider the following relaxed problem

$$\begin{aligned} \underset{\Phi_k \succeq 0}{\text{maximize}} \quad & \sum_{k=1}^{K} \mu_k \log \left| I + \tilde{H}_k \Phi_k \tilde{H}_k^H \right| \\ \text{subject to} \quad & \sum_{k=1}^{K} [B_k \Phi_k B_k^H]_{n,n} \le P_n, \ \forall n. \end{aligned} \tag{28}$$
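To make the quantities in (26)-(28) concrete, the following NumPy sketch evaluates the weighted sum rate and the per-antenna powers for given covariance matrices in the reduced space. Both helper names are hypothetical, and natural logarithms are assumed. With $W_k = B_k D_k$ and covariance $D_k D_k^H$, the two helpers return the same objective and per-antenna powers as the precoder form in (26).

```python
import numpy as np

def wsr_objective(H_eff, Phi, weights):
    """Weighted sum of log|I + H_eff_k Phi_k H_eff_k^H| over users (natural log)."""
    total = 0.0
    for Hk, Pk, wk in zip(H_eff, Phi, weights):
        M = np.eye(Hk.shape[0]) + Hk @ Pk @ Hk.conj().T
        _, logdet = np.linalg.slogdet(M)   # numerically stable log-determinant
        total += wk * logdet
    return total

def per_antenna_power(B, Phi):
    """Length-N vector whose n-th entry is the sum over k of [B_k Phi_k B_k^H]_{n,n}."""
    return sum(np.real(np.diag(Bk @ Pk @ Bk.conj().T)) for Bk, Pk in zip(B, Phi))
```

A candidate solution is feasible for the relaxed problem when every entry of `per_antenna_power` is at most the corresponding $P_n$ and every covariance matrix is positive semidefinite.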
Now problem (28) is a convex program, and thus can be solved by numerical optimization tools, for example SDPT3 [29]. Regarding the relaxation technique, an important question to ask is whether the optimal solution to the relaxed problem is also optimal for the original problem. Interestingly, we show that (27) and (28) are equivalent, and thus the optimal solution to (28) is also optimal for (27). The proof is an immediate consequence of the following lemma.

Lemma 1. The optimal solutions $\Phi_k$ to (28) satisfy $\operatorname{rank}(\Phi_k) \le L_k = \min(n_k, \bar{n}_k)$.

Proof: Please refer to Appendix A.

Although problem (28) can be solved by a general purpose optimization package, developing a specialized algorithm, if possible, that exploits the problem structure is always of great interest. Herein, we present a customized interior-point algorithm to solve (28), using the barrier method. Before proceeding, it is worth pointing out that a two-stage iterative algorithm was proposed for the precoder design of the block diagonalization scheme in [13] using a subgradient method, which can be applied to solve (28). More explicitly, consider the partial Lagrangian function of (28), which is given by

$$L(\{\Phi_k\}, \lambda) = \sum_{k=1}^{K} \mu_k \log \left| I + \tilde{H}_k \Phi_k \tilde{H}_k^H \right| - \sum_{n=1}^{N} \lambda_n \left( \sum_{k=1}^{K} [B_k \Phi_k B_k^H]_{n,n} - P_n \right) \tag{29}$$

where $\lambda_n$ is the dual variable associated with the power constraint on the nth antenna. Since strong duality holds for (28), its optimal solution can be found via the following convex-concave optimization problem

$$\underset{\lambda \ge 0}{\text{minimize}} \ \underset{\Phi_k \succeq 0}{\text{maximize}} \ L(\{\Phi_k\}, \lambda) = \underset{\lambda \ge 0}{\text{minimize}} \ g(\lambda) \tag{30}$$

where $g(\lambda) \triangleq \max_{\Phi_k \succeq 0} L(\{\Phi_k\}, \lambda)$ is the dual function of problem (28). The two-stage iterative algorithm in [13] works as follows. For fixed $\lambda$, the set of covariance matrices $\{\Phi_k\}$ that maximizes $L(\{\Phi_k\}, \lambda)$ can be obtained by the water-filling algorithm. Next, for a set of given $\{\Phi_k\}$, the iterative algorithm updates $\lambda$ to minimize the dual function $g(\lambda)$ based on the subgradient method. Generally, however, the subgradient method converges slowly to the optimum. It is known that for a minimax optimization problem, the infeasible-start Newton's method [30] that solves the maximization and the minimization at the same time has a faster convergence rate [28], [30], [31]. In the following, we propose a numerical algorithm to solve (28), which exhibits a better convergence behavior.

First, we observe that the constraints in (28) are active at the optimum. As proof, suppose the ith constraint is inactive, i.e., $\sum_{k=1}^{K} [B_k \Phi_k B_k^H]_{i,i} = [\Phi_1]_{i,i} + \sum_{k=2}^{K} [B_k \Phi_k B_k^H]_{i,i} < P_i$, where the first equality uses $B_1 = I_N$. There exists $\epsilon > 0$ small enough such that $[\Phi_1]_{i,i} + \epsilon + \sum_{k=2}^{K} [B_k \Phi_k B_k^H]_{i,i} < P_i$. Replacing $[\Phi_1]_{i,i}$ by $([\Phi_1]_{i,i} + \epsilon)$ yields a larger objective value in (28), which contradicts the assumption that $\{\Phi_k\}$ are optimal. This observation is computationally useful since the inequality constraints in (28) can be replaced by equality ones, which facilitates a Newton-type method. In other words, (28) is equivalent to

$$\begin{aligned} \text{maximize} \quad & \sum_{k=1}^{K} \mu_k \log \left| I + \tilde{H}_k \Phi_k \tilde{H}_k^H \right| \\ \text{subject to} \quad & \sum_{k=1}^{K} [B_k \Phi_k B_k^H]_{n,n} = P_n, \ \forall n \\ & \Phi_k \succeq 0, \ \forall k. \end{aligned} \tag{31}$$

The proposed algorithm is based on the barrier method [30] to solve (31). As a standard step, we define a modified objective function

$$f(t, \{\Phi_k\}) = -\left( t \sum_{k=1}^{K} \mu_k \log \left| I + \tilde{H}_k \Phi_k \tilde{H}_k^H \right| + \sum_{k=1}^{K} \log |\Phi_k| \right) \tag{32}$$

where $\log |\Phi_k|$ is the logarithmic barrier function to account for the positive semidefinite constraint $\Phi_k \succeq 0$, and $t > 0$ is a parameter that controls the logarithmic barrier terms. For mathematical convenience, let

$$A_k^{(n)} = B_k^H \operatorname{diag}(\underbrace{0, \ldots, 0}_{n-1}, 1, \underbrace{0, \ldots, 0}_{N-n}) B_k \in \mathbb{C}^{\bar{n}_k \times \bar{n}_k} \tag{33}$$

and consider a standard equality constrained minimization problem

$$\begin{aligned} \text{minimize} \quad & f(t, \{\Phi_k\}) \\ \text{subject to} \quad & \sum_{k=1}^{K} \operatorname{tr}[A_k^{(n)} \Phi_k] = P_n, \ \forall n. \end{aligned} \tag{34}$$

The general idea of a barrier method is that, for a fixed $t$, we find the optimal solutions $\{\Phi_k^\star(t)\}$ to (34) (which is known as the centering step), and increase $t$ until the algorithm converges. In this paper, we employ the infeasible start Newton method to find the optimal solutions to (34). The purpose of using the infeasible start Newton method is to avoid having to initialize $\{\Phi_k\}$ at a point that satisfies the equality constraints. We start with the optimality conditions (i.e., KKT conditions) for (34), which are given by

$$-t\mu_k \tilde{H}_k^H (I + \tilde{H}_k \Phi_k \tilde{H}_k^H)^{-1} \tilde{H}_k - \Phi_k^{-1} + \sum_{n=1}^{N} \lambda_n A_k^{(n)} = 0, \ \forall k \tag{35}$$

$$\sum_{k=1}^{K} \operatorname{tr}[A_k^{(n)} \Phi_k] = P_n, \ \forall n \tag{36}$$

where $\{\lambda_n\}$ are the dual variables. In (35) we have used the fact that the gradient of $\log |I + \tilde{H}_k \Phi_k \tilde{H}_k^H|$ with respect to $\Phi_k$ is given by $\nabla_{\Phi_k} \log |I + \tilde{H}_k \Phi_k \tilde{H}_k^H| = \tilde{H}_k^H (I + \tilde{H}_k \Phi_k \tilde{H}_k^H)^{-1} \tilde{H}_k$. The main effort in a Newton method is to calculate the Newton step. To do this, we replace $\Phi_k$ by $\Phi_k + \Delta\Phi_k$ and $\lambda_n$ by $\lambda_n + \Delta\lambda_n$ in the KKT conditions, which yields (after a change of sign in the first condition) the KKT system for the Newton step

$$t\mu_k \tilde{H}_k^H (I + \tilde{H}_k \Phi_k \tilde{H}_k^H + \tilde{H}_k \Delta\Phi_k \tilde{H}_k^H)^{-1} \tilde{H}_k + (\Phi_k + \Delta\Phi_k)^{-1} - \sum_{n=1}^{N} (\lambda_n + \Delta\lambda_n) A_k^{(n)} = 0, \ \forall k \tag{37}$$

$$\sum_{k=1}^{K} \operatorname{tr}[A_k^{(n)} \Delta\Phi_k] = P_n - \sum_{k=1}^{K} \operatorname{tr}[A_k^{(n)} \Phi_k], \ \forall n. \tag{38}$$
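Two ingredients of the KKT conditions can be checked numerically with a short NumPy sketch: the gradient expression used in (35), and the fact that the trace form $\operatorname{tr}[A_k^{(n)} \Phi_k]$ built from (33) reproduces the per-antenna power $[B_k \Phi_k B_k^H]_{n,n}$. The function names are hypothetical, and the antenna index is 0-based in the code, whereas the text counts antennas from 1.

```python
import numpy as np

def grad_logdet(H_eff, Phi):
    """Gradient of log|I + H_eff Phi H_eff^H| w.r.t. Phi: H_eff^H Psi^{-1} H_eff."""
    Psi = np.eye(H_eff.shape[0]) + H_eff @ Phi @ H_eff.conj().T
    return H_eff.conj().T @ np.linalg.solve(Psi, H_eff)

def constraint_matrix(B, n):
    """A^(n) = B^H e_n e_n^T B from (33); n is a 0-based antenna index here."""
    row = B[n:n + 1, :]          # n-th row of B, kept two-dimensional
    return row.conj().T @ row    # rank-one Hermitian matrix
```

For a Hermitian perturbation `Delta`, the directional derivative of the log-determinant along `Delta` equals the real part of `np.trace(grad_logdet(H_eff, Phi) @ Delta)`, which gives a convenient finite-difference check when implementing the Newton step.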
Denote $\Theta_k = I + \bar{H}_k \bar{\Phi}_k \bar{H}_k^H$. Since $\Theta_k$ is invertible, and $(A + B)^{-1} \approx A^{-1} - A^{-1} B A^{-1}$ for small $B$, we can write (37) as

$$-t w_k \bar{H}_k^H (\Theta_k^{-1} - \Theta_k^{-1} \bar{H}_k \Delta\bar{\Phi}_k \bar{H}_k^H \Theta_k^{-1}) \bar{H}_k - (\bar{\Phi}_k^{-1} - \bar{\Phi}_k^{-1} \Delta\bar{\Phi}_k \bar{\Phi}_k^{-1}) + \sum_{n=1}^{N} (\nu_n + \Delta\nu_n) A_k^{(n)} = 0, \quad \forall k \tag{39}$$

or equivalently, after multiplying from the left and from the right by $\bar{\Phi}_k$, as

$$t w_k \bar{\Phi}_k \hat{H}_k \Delta\bar{\Phi}_k \hat{H}_k \bar{\Phi}_k + \Delta\bar{\Phi}_k = t w_k \bar{\Phi}_k \hat{H}_k \bar{\Phi}_k + \bar{\Phi}_k - \sum_{n=1}^{N} (\nu_n + \Delta\nu_n) \bar{\Phi}_k A_k^{(n)} \bar{\Phi}_k, \quad \forall k \tag{40}$$

where $\hat{H}_k = \bar{H}_k^H \Theta_k^{-1} \bar{H}_k$. One possibility to find $\{\Delta\bar{\Phi}_k\}$ and $\{\Delta\nu_n\}$ is to vectorize each $\Delta\bar{\Phi}_k$ into a vector of length $\bar{n}_k(\bar{n}_k + 1)/2$, transform (38) and (40) into a linear system of $(N + \sum_{k=1}^{K} \bar{n}_k(\bar{n}_k + 1)/2)$ variables, and use a generic method to solve the resulting linear system. However, the complexity of such a method is $O(N^6)$. In this paper, we present a low-complexity method to find $\{\Delta\bar{\Phi}_k\}$ and $\{\Delta\nu_n\}$, using block elimination [30, Section 10.4]. Specifically, we express $\Delta\bar{\Phi}_k$ as

$$\Delta\bar{\Phi}_k = \Delta\bar{\Phi}_k^{(0)} + \sum_{i=1}^{N} \Delta\nu_i \, \Delta\bar{\Phi}_k^{(i)}. \tag{41}$$

Substituting (41) into (40) yields a system of $(N + 1)$ discrete-time Sylvester equations

$$\begin{aligned}
t w_k \bar{\Phi}_k \hat{H}_k \Delta\bar{\Phi}_k^{(0)} \hat{H}_k \bar{\Phi}_k + \Delta\bar{\Phi}_k^{(0)} &= t w_k \bar{\Phi}_k \hat{H}_k \bar{\Phi}_k + \bar{\Phi}_k - \sum_{n=1}^{N} \nu_n \bar{\Phi}_k A_k^{(n)} \bar{\Phi}_k \\
t w_k \bar{\Phi}_k \hat{H}_k \Delta\bar{\Phi}_k^{(i)} \hat{H}_k \bar{\Phi}_k + \Delta\bar{\Phi}_k^{(i)} &= -\bar{\Phi}_k A_k^{(i)} \bar{\Phi}_k, \quad i = 1, \ldots, N.
\end{aligned} \tag{42}$$

Numerical methods to solve the discrete-time Sylvester equations in (42) with complexity $O(\bar{n}_k^3)$ can be found, e.g., in [32]. We note that the solution to each discrete-time Sylvester equation in (42) is a Hermitian matrix, and thus $\Delta\bar{\Phi}_k$ in (41) is Hermitian as well. To compute the Newton step for the dual variable $\nu$, we plug $\Delta\bar{\Phi}_k$ from (41) into (38), which results in a linear system

$$\Omega \, \Delta\nu = \omega \tag{43}$$

where $[\Omega]_{i,j} = \sum_{k=1}^{K} \operatorname{tr}(A_k^{(i)} \Delta\bar{\Phi}_k^{(j)})$ for $i, j = 1, 2, \ldots, N$, and $[\omega]_i = -\sum_{k=1}^{K} \operatorname{tr}(A_k^{(i)} \Delta\bar{\Phi}_k^{(0)}) + P_i - \sum_{k=1}^{K} \operatorname{tr}(A_k^{(i)} \bar{\Phi}_k)$. We define the residual norm at $\{\bar{\Phi}_k\}$ and $\nu$, which is used in the backtracking line search, as

$$r(\{\bar{\Phi}_k\}, \nu) = \sum_{k=1}^{K} \Big\| -t w_k \bar{H}_k^H (I + \bar{H}_k \bar{\Phi}_k \bar{H}_k^H)^{-1} \bar{H}_k - \bar{\Phi}_k^{-1} + \sum_{n=1}^{N} \nu_n A_k^{(n)} \Big\|_F + \|v\|_2 \tag{44}$$

where $v_i = P_i - \sum_{k=1}^{K} \operatorname{tr}[A_k^{(i)} \bar{\Phi}_k]$ for $i = 1, 2, \ldots, N$. The proposed algorithm to solve (28) is summarized in Algorithm 2. The proposed algorithm can be easily modified into a primal-dual interior-point method, which often shows a faster convergence rate. However, we skip the details for the sake of simplicity.

Algorithm 2 The proposed numerical algorithm to solve (28)
Initialization: $\bar{\Phi}_k = I_{\bar{n}_k}$, $\nu = 0$, $t = t_0$, $\alpha$ and $\beta$, and tolerance $\epsilon > 0$
1: repeat {Outer iteration}
2:   repeat {Inner iteration (centering step)}
3:     Compute the Newton step $\Delta\bar{\Phi}_k$ and dual step $\Delta\nu$ from (41) and (43), respectively
4:     Backtracking line search:
5:     $s = 1$
6:     while $r(\{\bar{\Phi}_k\} + s\{\Delta\bar{\Phi}_k\}, \nu + s\Delta\nu) > (1 - \alpha s)\, r(\{\bar{\Phi}_k\}, \nu)$ or $(\{\bar{\Phi}_k\} + s\{\Delta\bar{\Phi}_k\} \nsucc 0)$ do
7:       $s = \beta s$
8:     end while
9:     Update primal and dual variables: $\bar{\Phi}_k = \bar{\Phi}_k + s\Delta\bar{\Phi}_k$; $\nu = \nu + s\Delta\nu$
10:   until $r(\{\bar{\Phi}_k\}, \nu) < \epsilon$
11:   Increase $t$: $t = \mu t$
12: until $t$ is sufficiently large to tolerate the duality gap.

Remark 2. Here, we provide a rough comparison of the computational cost of an iteration for the proposed algorithm and the two-stage iterative method. Note that, to update $\bar{\Phi}_k$ in each iteration of the two-stage iterative method presented in [13], we need to compute the singular value decomposition (SVD) of an $n_k \times \bar{n}_k$ matrix and the inverse of an $\bar{n}_k \times \bar{n}_k$ matrix, of which the complexity is $O(\bar{n}_k^2 n_k)$ and $O(\bar{n}_k^3)$, respectively. As mentioned earlier, the complexity of solving (42) is reduced to $O(\bar{n}_k^3)$. That is to say, the complexity of each iteration of the two-stage iterative method is of the same order as that of Algorithm 2. Moreover, Algorithm 2 is actually a customized interior-point method using the barrier method, and thus it shows a superlinear convergence rate, as we show in the numerical results section. As a result, for the proposed algorithm, less computational effort is required to obtain the optimal precoders.

B. Suboptimal Designs

In practice, it is always interesting to find suboptimal designs that can achieve a significant fraction of the optimal performance but require lower complexity. In this subsection we propose two suboptimal precoder designs and briefly discuss their computational complexity compared to (28).

1) QRD-based PAPC Precoder Design: The first suboptimal design, which is referred to as the QRD-based PAPC design, is derived from the GQRD-based design for the SPC. Specifically, let $Q_k \in \mathbb{C}^{N \times n_k}$ be the result of applying the QRD as given in (19). By Theorem 1, it holds that $H_k Q_k Q_k^H H_k^H = H_k B_k B_k^H H_k^H$. This equality suggests that precoders based on a linear combination of the columns of $Q_k$ are expected to work well. In this way, the transmit covariance matrices of the first suboptimal design are given by

$$S_k = Q_k \Phi_k Q_k^H. \tag{45}$$
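To make the link between the covariance in (45) and the per-antenna powers concrete, the sketch below forms $S_k = Q_k \Phi_k Q_k^H$ for a toy real-valued example and reads the power radiated by antenna $n$ from the diagonal entry $[S_k]_{n,n}$. The matrices `Q` and `Phi` are hypothetical; `Q` is only assumed to have orthonormal columns, as a QR decomposition would give, and the plain transpose stands in for the Hermitian transpose because the data are real.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


# Hypothetical toy data: N = 3 antennas, n_k = 2 streams.
r = 1 / math.sqrt(2)
Q = [[r, 0.0],
     [r, 0.0],
     [0.0, 1.0]]      # 3 x 2 with orthonormal columns (Q^T Q = I)
Phi = [[3.0, 0.5],
       [0.5, 1.0]]    # 2 x 2 symmetric positive definite

# Transmit covariance S = Q Phi Q^T, cf. (45).
S = matmul(matmul(Q, Phi), transpose(Q))

# Power on antenna n is the n-th diagonal entry of S.
per_antenna = [S[n][n] for n in range(3)]
total = sum(per_antenna)
trace_phi = Phi[0][0] + Phi[1][1]

print("per-antenna powers:", [round(p, 6) for p in per_antenna])
print("total power:", round(total, 6), "tr(Phi):", trace_phi)
```

In this example the total power equals $\operatorname{tr}(\Phi_k)$, while the individual antennas carry unequal shares, which is why a sum power budget does not by itself keep every antenna within its own limit.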

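In (45), $\Phi_k \in \mathbb{C}^{n_k \times n_k}$ is the solution to the following problem

$$\begin{aligned}
&\underset{\Phi_k \succeq 0}{\text{maximize}} && \sum_{k=1}^{K} w_k \log|I + H_k Q_k \Phi_k Q_k^H H_k^H| \\
&\text{subject to} && \sum_{k=1}^{K} [Q_k \Phi_k Q_k^H]_{n,n} \le P_n, \quad \forall n.
\end{aligned} \tag{46}$$

We note that the problem formulation in (46) is analogous to that in (31). Hence, Algorithm 2 is used to compute the optimal $\Phi_k$ in (46). Recall that the dimension of each $\Phi_k$ in (46) is $n_k(n_k + 1)/2$, which is smaller than $\bar{n}_k(\bar{n}_k + 1)/2$ for each optimal $\bar{\Phi}_k$ in (28). Thus, solving (46) requires much lower complexity, especially when $N$ is large and $n_k$ is small. The simulation results show that the sum-rate gap between the optimal design in (28) and the suboptimal design in (46) is negligible.

2) Rescaled SPC Precoder Design: The second suboptimal design is obtained by scaling each column of the optimal precoders for the SPC to meet the PAPCs. To be specific, let $W_k$ be the optimal precoder for the $k$th user with the SPC, which is presented in Section III. Then a suboptimal precoder for the $k$th user can be given by $\tilde{W}_k = W_k \operatorname{diag}(\sqrt{\rho_k})$, where $\rho_k = [\rho_{k,1}\ \rho_{k,2}\ \cdots\ \rho_{k,n_k}]$, and $\rho_{k,i}$ is a factor used to scale the $i$th column of $W_k$ to satisfy the PAPCs. Mathematically, $\rho_k$ is found through solving the following problem

$$\begin{aligned}
&\underset{\rho_k \ge 0}{\text{maximize}} && \sum_{k=1}^{K} w_k \log|I + H_k W_k \operatorname{diag}(\rho_k) W_k^H H_k^H| \\
&\text{subject to} && \sum_{k=1}^{K} [W_k \operatorname{diag}(\rho_k) W_k^H]_{n,n} \le P_n, \quad \forall n.
\end{aligned} \tag{47}$$

We can particularize Algorithm 2 to solve (47), and the required computational complexity is lower compared to the first design, since the dimension of each $\rho_k$ is just $n_k$.

V. PRECODER DESIGN FOR BLOCK DIAGONALIZATION WITH PAPCS

As mentioned earlier, an optimal precoder design was proposed in [13] for BD under PAPCs, using the subgradient method. In this section, we illustrate how to modify Algorithm 2 to attain the optimal precoder design for BD with a significantly faster convergence rate. Let $T_k \in \mathbb{C}^{N \times L_k}$ be the precoder of the $k$th user in the BD scheme, where $L_k \le n_k$ is the number of data streams that the BS can allocate to user $k$. Then the ZF constraints for BD impose that $H_i T_k = 0$ for all $i \ne k$. That is, the precoder of a user is designed to cancel the interference induced to all other users. The maximization for BD with PAPCs is formulated as

$$\begin{aligned}
&\underset{T_k}{\text{maximize}} && \sum_{k=1}^{K} w_k \log|I + H_k T_k T_k^H H_k^H| \\
&\text{subject to} && H_j T_k = 0, \quad \forall j \ne k \\
& && \sum_{k=1}^{K} [T_k T_k^H]_{n,n} \le P_n, \quad \forall n.
\end{aligned} \tag{48}$$

To remove the ZF constraints in (48), for user $k$, we define a matrix $\tilde{H}_k$ comprising all other users' channel matrices as [15]

$$\tilde{H}_k = [H_1^T\ H_2^T\ \cdots\ H_{k-1}^T\ H_{k+1}^T\ \cdots\ H_K^T]^T \in \mathbb{C}^{(\sum_{i \ne k} n_i) \times N}. \tag{49}$$

Similarly, let $V_k \in \mathbb{C}^{N \times \bar{n}_k}$, where $\bar{n}_k = N - \sum_{i \ne k} n_i$, be a basis of $\mathcal{N}(\tilde{H}_k)$. Then we can write $T_k = V_k \bar{T}_k$, $\bar{T}_k \in \mathbb{C}^{\bar{n}_k \times L_k}$, and problem (48) is equivalent to

$$\begin{aligned}
&\underset{\bar{T}_k}{\text{maximize}} && \sum_{k=1}^{K} w_k \log|I + H_k V_k \bar{T}_k \bar{T}_k^H V_k^H H_k^H| \\
&\text{subject to} && \sum_{k=1}^{K} [V_k \bar{T}_k \bar{T}_k^H V_k^H]_{n,n} \le P_n, \quad \forall n.
\end{aligned} \tag{50}$$

We can further rewrite (50) as

$$\begin{aligned}
&\underset{\bar{\Phi}_k \succeq 0}{\text{maximize}} && \sum_{k=1}^{K} w_k \log|I + \bar{H}_k \bar{\Phi}_k \bar{H}_k^H| \\
&\text{subject to} && \sum_{k=1}^{K} [V_k \bar{\Phi}_k V_k^H]_{n,n} \le P_n, \quad \forall n \\
& && \operatorname{rank}(\bar{\Phi}_k) \le \min(n_k, L_k)
\end{aligned} \tag{51}$$

where $\bar{\Phi}_k = \bar{T}_k \bar{T}_k^H \in \mathbb{C}^{\bar{n}_k \times \bar{n}_k}$, and $\bar{H}_k = H_k V_k$ is the effective channel matrix for user $k$ in the BD scheme. Similarly, the relaxed problem is equivalently given by

$$\begin{aligned}
&\underset{\bar{\Phi}_k \succeq 0}{\text{maximize}} && \sum_{k=1}^{K} w_k \log|I + \bar{H}_k \bar{\Phi}_k \bar{H}_k^H| \\
&\text{subject to} && \sum_{k=1}^{K} [V_k \bar{\Phi}_k V_k^H]_{n,n} \le P_n, \quad \forall n.
\end{aligned} \tag{52}$$

We have shown in Section IV that the relaxation when designing the optimal precoders for SZF-DPC with PAPCs is tight. The same result also holds for BD, as shown in the following lemma.

Lemma 2. The optimal solutions to (52) satisfy $\operatorname{rank}(\bar{\Phi}_k) \le \min(n_k, \bar{n}_k)$ for all $k = 1, 2, \ldots, K$.

Proof: The proof of Lemma 2 adapts that of Lemma 1. Further details are given in Appendix B.

Different from SZF-DPC, the inequality constraints in (52) are not necessarily active at the optimum. However, by introducing slack variables $\{\gamma_n\}$, we can transform (52) into an easier-to-handle formulation, which is expressed as

$$\begin{aligned}
&\text{maximize} && \sum_{k=1}^{K} w_k \log|I + \bar{H}_k \bar{\Phi}_k \bar{H}_k^H| \\
&\text{subject to} && \sum_{k=1}^{K} [V_k \bar{\Phi}_k V_k^H]_{n,n} + \gamma_n = P_n, \quad \forall n \\
& && \gamma_n \ge 0, \quad \forall n \\
& && \bar{\Phi}_k \succeq 0, \quad \forall k.
\end{aligned} \tag{53}$$

The proposed algorithm for BD follows the same steps as those for SZF-DPC, with some modifications in the centering step. First, the modified objective function is changed to

$$f(\{\bar{\Phi}_k\}, \gamma, t) = -t \sum_{k=1}^{K} w_k \log|I + \bar{H}_k \bar{\Phi}_k \bar{H}_k^H| - \sum_{k=1}^{K} \log|\bar{\Phi}_k| - \sum_{n=1}^{N} \log \gamma_n \tag{54}$$

and the problem for the centering step is

$$\begin{aligned}
&\text{minimize} && f(\{\bar{\Phi}_k\}, \gamma, t) \\
&\text{subject to} && \sum_{k=1}^{K} [V_k \bar{\Phi}_k V_k^H]_{n,n} + \gamma_n = P_n, \quad \forall n.
\end{aligned} \tag{55}$$

Similar to (35) and (36), the KKT conditions for (55) are expressed as

$$-t w_k \bar{H}_k^H (I + \bar{H}_k \bar{\Phi}_k \bar{H}_k^H)^{-1} \bar{H}_k - \bar{\Phi}_k^{-1} + \sum_{n=1}^{N} \nu_n A_k^{(n)} = 0, \quad \forall k \tag{56}$$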
$$-\gamma_n^{-1} + \nu_n = 0, \quad \forall n \tag{57}$$

$$\sum_{k=1}^{K} [V_k \bar{\Phi}_k V_k^H]_{n,n} + \gamma_n = P_n, \quad \forall n \tag{58}$$

and the resulting KKT system to find the Newton direction is given by

$$t w_k \bar{\Phi}_k \hat{H}_k \Delta\bar{\Phi}_k \hat{H}_k \bar{\Phi}_k + \Delta\bar{\Phi}_k = t w_k \bar{\Phi}_k \hat{H}_k \bar{\Phi}_k + \bar{\Phi}_k - \sum_{n=1}^{N} (\nu_n + \Delta\nu_n) \bar{\Phi}_k A_k^{(n)} \bar{\Phi}_k, \quad \forall k \tag{59}$$

$$\gamma_n^{-2} \Delta\gamma_n + \Delta\nu_n = \gamma_n^{-1} - \nu_n, \quad \forall n \tag{60}$$

$$\sum_{k=1}^{K} \operatorname{tr}[A_k^{(n)} \Delta\bar{\Phi}_k] + \Delta\gamma_n = P_n - \sum_{k=1}^{K} \operatorname{tr}[A_k^{(n)} \bar{\Phi}_k] - \gamma_n, \quad \forall n. \tag{61}$$

The Newton step $\Delta\bar{\Phi}_k$ is computed analogously by a system of $(N + 1)$ discrete-time Sylvester equations as in (42). Plugging (60) into (61), and using (41), we can find the Newton step for the dual variables as the solution to the following

$$\Omega \, \Delta\nu = \omega \tag{62}$$

where $[\Omega]_{i,j} = \sum_{k=1}^{K} \operatorname{tr}(A_k^{(i)} \Delta\bar{\Phi}_k^{(j)}) - \gamma_i^2 \delta_{i,j}$ for $i, j = 1, 2, \ldots, N$, where $\delta_{i,j}$ denotes the Kronecker delta, i.e., $\delta_{i,j} = 1$ if $i = j$ and $\delta_{i,j} = 0$ otherwise, and $[\omega]_i = P_i - \sum_{k=1}^{K} \operatorname{tr}(A_k^{(i)} \Delta\bar{\Phi}_k^{(0)} + A_k^{(i)} \bar{\Phi}_k) + \gamma_i^2 \nu_i - 2\gamma_i$. The Newton direction for the slack variables is computed using (60), namely as

$$\Delta\gamma_n = -\gamma_n^2 (\Delta\nu_n + \nu_n) + \gamma_n. \tag{63}$$

Remark 3. We have treated the cases of sum-power and per-antenna power constraints separately, and their specific properties are exploited to arrive at a computationally efficient algorithm for each case. However, it could also be practical to consider both types of power constraints simultaneously. For example, in addition to PAPCs due to the physical limitation of the power amplifiers, we can impose an SPC on transmitted data to meet, e.g., a regulatory body requirement for health factors or to reduce the overall interference situation [31]. We note that the proposed algorithms for SZF-DPC and BD schemes presented in the paper can be easily modified to handle SPC and PAPCs simultaneously. For such a case, not all the constraints will be binding, in general. However, we can introduce some slack variables to convert all constraints to equality ones, as done in (53). Then, the steps from (54) to (63) can be slightly changed to solve the new problem.

VI. NUMERICAL RESULTS

In this section, we provide numerical examples to demonstrate the results in this paper. In the first numerical experiment, we compare the convergence rate of Algorithm 2 and the two-stage iterative method in [13]. A MIMO BC with $N = 6$ transmit antennas and $K = 3$ users, each with 2 receive antennas, is simulated. The tolerance (for each centering step) is set to be $\epsilon = 10^{-5}$. The barrier method parameters $t_0$ and $\mu$ are set to 50 and 1, respectively. The effects of the initial value of $\mu$ and $t_0$ are discussed in [30]. The backtracking line search parameters in Algorithm 2 are $\alpha = 0.01$ and $\beta = 0.5$. Fig. 2 plots the error in the sum rate versus the number of iterations for a random realization of channel matrices. For the first iterations, the two-stage iterative method converges faster than the proposed method to the optimal solution. However, the proposed method performs better when the precoders approach the optimum. Simulation results obtained with other randomly generated channel matrices illustrate the same convergence behavior of the two methods. Recall that a subgradient need not be a descent direction. Thus, an iteration can even decrease the objective function. Moreover, the convergence rate of subgradient methods relies strongly on the problem size. For example, we observe that the two-stage iterative method fails to converge within three thousand iterations when $N = 16$ and $K = 8$, while Algorithm 2 still converges to the optimal solution within tens of iterations. In conclusion, the proposed numerical algorithm demonstrates a better convergence rate than the two-stage iterative algorithm in [13]. It is worth mentioning again that, since the proposed algorithm and the two-stage iterative method are alternatives to each other to find the optimal solution of the resulting convex problem given in (28), they will converge to the same optimal objective value, i.e., the proposed algorithm and the two-stage iterative approach yield the same sum-rate performance. Thus, it is sufficient to provide the sum rate offered by the proposed algorithm in Figs. 3 and 4 to follow.

[Figure 2: error in the sum rate (logarithmic scale) versus iteration, for the proposed method and the two-stage iterative method.]
Fig. 2. Convergence behavior, $N = 6$, $n_k = 2$ for $k = 1, 2, 3$.

In Fig. 3, we plot the average sum rate, i.e., $w_k = 1$ for all $k$, of optimal and suboptimal precoder design methods for SZF-DPC schemes as a function of $P$, the total transmit power. The resulting power constraint for each antenna (when considering the PAPCs) is $P_n = P/N$ for all $n$. A quasi-static fading model is used in our simulation. The channel for user $k$ is generated as $H_k = d_k \check{H}_k$, where $d_k$ is a given parameter to capture the path loss, and the entries of $\check{H}_k$ follow zero mean and unit variance complex Gaussian random variables for each snapshot. Fig. 3 considers a scenario with $d_k = 1$, $N = 8$, $K = 4$, and $n_k = 2$ for $k = 1, 2, \ldots, 4$. That is, we ignore the effect of path loss and simply consider small scale fading in Fig. 3. We can see that the optimal precoder designs with PAPCs yield a slightly lower sum rate than those
with the corresponding SPC. The sum-rate gap between the optimal and suboptimal designs is small, and decreases as $P$ increases. The suboptimal design based on QRD performs slightly better than the suboptimal design based on heuristic rescaling. We can expect that these precoder designs give the same performance as $P$ approaches infinity, since the equal power allocation is proved to be optimal in the high SNR regime.

[Figure 3: average sum rate (b/s/Hz) versus total transmit power, P (dB), for the sum-power constraint, the per-antenna optimal design, the QRD-based PAPC precoder design, and the rescaled SPC precoder design.]
Fig. 3. Sum rate comparison of optimal and suboptimal designs for SZF-DPC schemes with $d_k = 1$, $N = 8$, $K = 4$, $n_k = 2$, $k = 1, 2, \ldots, 4$.

In Fig. 4, we investigate the sum rate performance of the precoder designs for SZF-DPC where users have different path loss. The simulation scenario for Fig. 4 is the same as that for Fig. 3, but now we set $d_k = k^2 / \sum_{l=1}^{K} l^2$. Due to the non-uniformly distributed locations of the users, the gap between the optimal precoder designs with SPC and PAPCs becomes larger.

[Figure 4: average sum rate (b/s/Hz) versus total transmit power, P (dB), for the sum-power constraint, the per-antenna optimal design, the QRD-based PAPC precoder design, and the rescaled SPC precoder design.]
Fig. 4. Sum rate comparison of optimal and suboptimal designs for SZF-DPC schemes with $d_k = k^2 / \sum_{l=1}^{K} l^2$, $N = 8$, $K = 4$, $n_k = 2$, $k = 1, 2, \ldots, 4$.

VII. CONCLUSIONS

This paper addresses the precoder design of SZF-DPC for MIMO BCs. For the SPC, we propose a precoder design, which is shown to be optimal and has much lower complexity than the existing method using the SVD. For PAPCs, the precoder design for SZF-DPC is first formulated as a rank-constrained optimization problem. Then we consider a relaxed problem, which is obtained by dropping the rank constraints. Exploiting the special features of SZF-DPC, we prove that the relaxed and original problems are equivalent. More explicitly, we show that optimal solutions of the relaxed problem always satisfy the rank constraints. Next, we propose an efficient numerical method based on a barrier method to solve the relaxed problem. The proposed numerical method is shown to have a superior convergence behavior compared with the two-stage iterative method based on the dual subgradient method. In addition, we illustrate that the proposed precoder design for SZF-DPC can be slightly modified to solve the problem of precoder design for BD.

APPENDIX

A. Proof of Lemma 1

In this appendix, we prove that the rank of optimal solutions to (28) is less than or equal to $L_k$. The proof follows similar arguments as in [23], [33]. We begin by forming the Lagrangian function of (28), which is given by

$$L(\bar{\Phi}_k, \lambda, \Psi_k) = \sum_{k=1}^{K} w_k \log|I + \bar{H}_k \bar{\Phi}_k \bar{H}_k^H| - \sum_{n=1}^{N} \lambda_n \Big[ \sum_{k=1}^{K} \operatorname{tr}(\bar{\Phi}_k A_k^{(n)}) - P_n \Big] + \sum_{k=1}^{K} \operatorname{tr}(\Psi_k \bar{\Phi}_k) \tag{64}$$

where $\bar{H}_k = H_k B_k \in \mathbb{C}^{n_k \times \bar{n}_k}$, $A_k^{(n)}$ is defined in (33), $\{\lambda_n\}$ are dual variables associated with the PAPCs, and $\Psi_k \succeq 0$ is the dual variable for the positive semidefinite constraint. Denote $P = \operatorname{diag}(P_1, P_2, \ldots, P_N)$, $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$, and $\Xi_k = B_k^H \Lambda B_k$. We can then rewrite (64) as

$$L(\bar{\Phi}_k, \lambda, \Psi_k) = \sum_{k=1}^{K} w_k \log|I + \bar{H}_k \bar{\Phi}_k \bar{H}_k^H| - \sum_{k=1}^{K} \operatorname{tr}(\Xi_k \bar{\Phi}_k) + \sum_{k=1}^{K} \operatorname{tr}(\Psi_k \bar{\Phi}_k) + \operatorname{tr}(\Lambda P). \tag{65}$$

At the optimum, we have

$$w_k \bar{H}_k^H (I + \bar{H}_k \bar{\Phi}_k \bar{H}_k^H)^{-1} \bar{H}_k - \Xi_k + \Psi_k = 0. \tag{66}$$

Using the complementary slackness property $\Psi_k \bar{\Phi}_k = 0$, we obtain

$$w_k \bar{H}_k^H (I + \bar{H}_k \bar{\Phi}_k \bar{H}_k^H)^{-1} \bar{H}_k \bar{\Phi}_k = \Xi_k \bar{\Phi}_k. \tag{67}$$

We now show that the dual optimal variables of (28) are strictly positive, i.e., $\lambda_n > 0$ for all $1 \le n \le N$. As proof, consider the dual objective of (28), which can be expressed as

$$g(\lambda, \Psi_k) = \max L(\bar{\Phi}_k, \lambda, \Psi_k). \tag{68}$$
By contradiction, suppose $\lambda_i = 0$ for some $1 \le i \le N$. We construct a set of $\bar{\Phi}_k$ such that $\bar{\Phi}_1 = \operatorname{diag}(\underbrace{0, \ldots, 0}_{i-1}, \eta, \underbrace{0, \ldots, 0}_{N-i})$, and $\bar{\Phi}_k = 0$ for $2 \le k \le K$. Then, the objective function in (65) becomes

$$L(\bar{\Phi}_k, \lambda, \Psi_k) = w_1 \log(1 + \eta \|[\bar{H}_1]_i\|_2^2) + \operatorname{tr}(\Psi_1 \bar{\Phi}_1) \tag{69}$$

where $[\bar{H}_1]_i$ is the $i$th column of $\bar{H}_1$. We can see that the objective function in (69) is unbounded above if $\eta \to \infty$. Since we are only interested in the case where $g(\lambda, \Psi_k)$ is finite, we conclude that $\lambda_i > 0$ for all $1 \le i \le N$. Thus, $\Lambda$ must be positive definite, and $\Xi_k$ is invertible. It follows from (67) that $\operatorname{rank}(\bar{\Phi}_k) \le \operatorname{rank}(\bar{H}_k) = L_k$, which completes the proof.

B. Proof of Lemma 2

In this appendix, we slightly modify the proof of Lemma 1 to show that the optimal solutions to problem (52) satisfy $\operatorname{rank}(\bar{\Phi}_k) \le \min(n_k, \bar{n}_k)$. Let $\{\lambda_n\}$ be dual variables associated with the PAPCs in (52). Following the same derivations from (64) to (67) as in Appendix A, we have

$$w_k \bar{H}_k^H (I + \bar{H}_k \bar{\Phi}_k \bar{H}_k^H)^{-1} \bar{H}_k \bar{\Phi}_k = \Xi_k \bar{\Phi}_k \tag{70}$$

where $\bar{\Phi}_k$ and $\bar{H}_k$ are defined in (49)-(52), $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$, and $\Xi_k$ is now defined as $\Xi_k = V_k^H \Lambda V_k$. Unlike SZF-DPC, where we are able to prove that $\lambda_n > 0$ for all $n$, the dual variables associated with the PAPCs for BD are not necessarily positive. That is to say, not all the power constraints are necessarily tight at the optimum. However, $\Xi_k$ in (70) is still invertible. To see this, let $x$ be a vector lying in $\mathcal{N}(\Xi_k)$, i.e., $\Xi_k x = 0$. Due to the assumption of independence among the $H_k$'s, it is guaranteed with probability one that $\bar{H}_k x = H_k V_k x \ne 0$. Let $\bar{\Phi}_k = \eta x x^H$. Then, $\lim_{\eta \to \infty} L(\bar{\Phi}_k, \lambda, \Psi_k) = \infty$, i.e., the objective function in (68) is unbounded above, which is not of interest. For $\Xi_k$ to be invertible, the number of non-zero $\lambda_n$ is not less than $\bar{n}_k$, which is a special case of [13, Lemma 3.2].

REFERENCES

[1] L.-N. Tran, M. Juntti, M. Bengtsson, and B. Ottersten, "Successive zero-forcing DPC with sum power constraint: low-complexity optimal precoders," in Proc. 2012 IEEE ICC, pp. 4857-4861.
[2] L.-N. Tran, M. Juntti, M. Bengtsson, and B. Ottersten, "Successive zero-forcing DPC with per-antenna power constraint: optimal and suboptimal designs," in Proc. 2012 IEEE ICC, pp. 3746-3751.
[3] E. Telatar, "Capacity of multi-antenna Gaussian channels," Eur. Trans. Telecommun, vol. 10, pp. 585-598, Nov. 1999.
[4] G. J. Foschini and M. J. Gans, "On limits of wireless communications in a fading environment when using multiple antennas," Wireless Pers. Commun, vol. 6, pp. 311-335, Mar. 1998.
[5] A. Goldsmith, S. Jafar, N. Jindal, and S. Vishwanath, "Capacity limits of MIMO channels," IEEE J. Sel. Areas Commun., vol. 21, no. 5, pp. 684-702, Jun. 2003.
[6] H. Weingarten, Y. Steinberg, and S. Shamai, "The capacity region of the Gaussian multiple-input multiple-output broadcast channel," IEEE Trans. Inf. Theory, vol. 52, no. 9, pp. 3936-3964, Sep. 2006.
[7] N. Jindal, W. Rhee, S. Vishwanath, S. Jafar, and A. Goldsmith, "Sum power iterative water-filling for multi-antenna Gaussian broadcast channels," IEEE Trans. Inf. Theory, vol. 51, no. 4, pp. 1570-1580, Apr. 2005.
[8] W. Yu, "Sum-capacity computation for the Gaussian vector broadcast channel via dual decomposition," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 754-759, Feb. 2006.
[9] M. Kobayashi and G. Caire, "An iterative water-filling algorithm for maximum weighted sum-rate of Gaussian MIMO-BC," IEEE J. Sel. Areas Commun., vol. 24, no. 8, pp. 1640-1646, Aug. 2006.
[10] G. Caire and S. Shamai, "On the achievable throughput of a multi-antenna Gaussian broadcast channel," IEEE Trans. Inf. Theory, vol. 49, no. 7, pp. 1691-1706, Jul. 2003.
[11] A. Dabbagh and D. Love, "Precoding for multiple antenna Gaussian broadcast channels with successive zero-forcing," IEEE Trans. Signal Process., vol. 55, no. 7, pp. 3837-3850, Jul. 2007.
[12] A. Wiesel, Y. Eldar, and S. Shamai, "Zero-forcing precoding and generalized inverses," IEEE Trans. Signal Process., vol. 56, no. 9, pp. 4409-4418, 2008.
[13] R. Zhang, "Cooperative multi-cell block diagonalization with per-base-station power constraints," IEEE J. Sel. Areas Commun., vol. 28, no. 9, pp. 1435-1445, Dec. 2010.
[14] Z. Q. Luo, W. K. Ma, A.-C. So, Y. Ye, and S. Zhang, "Semidefinite relaxation of quadratic optimization problems," IEEE Signal Process. Mag., vol. 27, no. 3, pp. 20-34, May 2010.
[15] Q. Spencer, A. Swindlehurst, and M. Haardt, "Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels," IEEE Trans. Signal Process., vol. 52, no. 2, pp. 461-471, Feb. 2004.
[16] L.-N. Tran, M. Juntti, and E.-K. Hong, "On the precoder design for block diagonalized MIMO broadcast channels," IEEE Commun. Lett., vol. 16, no. 8, pp. 1165-1168, Aug. 2012.
[17] L.-N. Tran and E.-K. Hong, "Multiuser diversity for successive zero-forcing dirty paper coding: greedy scheduling algorithms and asymptotic performance analysis," IEEE Trans. Signal Process., vol. 58, no. 6, pp. 3411-3416, Jun. 2010.
[18] J. Jiang, R. Buehrer, and W. Tranter, "Greedy scheduling performance for a zero-forcing dirty-paper coded system," IEEE Trans. Commun., vol. 54, no. 5, pp. 789-793, May 2006.
[19] Z. Tu and R. Blum, "Multiuser diversity for a dirty paper approach," IEEE Commun. Lett., vol. 7, no. 8, pp. 370-372, Aug. 2003.
[20] M. Maddah-Ali, M. Sadrabadi, and A. Khandani, "Broadcast in MIMO systems based on a generalized QR decomposition: signaling and performance analysis," IEEE Trans. Inf. Theory, vol. 54, no. 3, pp. 1124-1138, Mar. 2008.
[21] L.-N. Tran, M. Juntti, M. Bengtsson, and B. Ottersten, "Beamformer designs for zero-forcing dirty paper coding," in Proc. 2011 International Conference on Wireless Communications and Signal Processing, Nov. 2011, pp. 1-5, invited paper.
[22] L.-N. Tran, M. Juntti, M. Bengtsson, and B. Ottersten, "On the optimality of beamformer design for zero-forcing DPC with QR decomposition," in Proc. 2012 IEEE ICC, pp. 2536-2541.
[23] L.-N. Tran, M. Juntti, M. Bengtsson, and B. Ottersten, "Beamformer designs for MISO broadcast channels with zero-forcing dirty paper coding," IEEE Trans. Wireless Commun., vol. 12, no. 3, pp. 1173-1185, Mar. 2013.
[24] Z. Shen, R. Chen, J. Andrews, R. W. Heath Jr., and B. Evans, "Low complexity user selection algorithms for multiuser MIMO systems with block diagonalization," IEEE Trans. Signal Process., vol. 54, no. 9, pp. 3658-3663, Sep. 2006.
[25] X. Zhang and J. Lee, "Low complexity MIMO scheduling with channel decomposition using capacity upperbound," IEEE Commun. Lett., vol. 56, no. 6, pp. 871-876, Jun. 2008.
[26] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd edition. The Johns Hopkins University Press, 1996.
[27] F. Boccardi and H. Huang, "Zero-forcing precoding for the MIMO broadcast channel under per-antenna power constraints," in Proc. 2006 IEEE SPAWC, pp. 1-5.
[28] W. Yu and T. Lan, "Transmitter optimization for the multi-antenna downlink with per-antenna power constraints," IEEE Trans. Signal Process., vol. 55, no. 6, pp. 2646-2660, Jun. 2007.
[29] K. C. Toh, M. J. Todd, and R. Tutuncu, "SDPT3 - a Matlab software package for semidefinite programming," Optimization Methods and Software, Nov. 1999.
[30] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[31] H. Huh, H. Papadopoulos, and G. Caire, "Multiuser MISO transmitter optimization for intercell interference mitigation," IEEE Trans. Signal Process., vol. 58, no. 8, pp. 4272-4285, Aug. 2010.
[32] N. J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd edition. Society for Industrial and Applied Mathematics, 2002.
[33] M. Vu, "MISO capacity with per-antenna power constraint," IEEE Trans. Commun., vol. 59, no. 5, pp. 1268-1274, May 2011.
Le-Nam Tran received the B.S. degree in Electrical Engineering from Ho Chi Minh National University of Technology, Vietnam in 2003, and M.S and PhD in Radio Engineering from Kyung Hee University, Republic of Korea, in 2006 and 2009, respectively. In 2009, he joined the Department of Electrical Engineering, Kyung Hee University, Republic of Korea, as a lecturer. From September 2010 to July 2011, he was a postdoc fellow at the Signal Processing Laboratory, ACCESS Linnaeus Centre, KTH Royal Institute of Technology, Sweden. Since August 2011, he has been with Centre for Wireless Communications and Department of Communications Engineering, University of Oulu, Finland. His current research interests include multiuser MIMO systems, energy efficient communications, and full duplex transmission. He received the Best Paper Award from IITA in August 2005.

Markku Juntti (S'93-M'98-SM'04) received his M.Sc. (Tech.) and Dr.Sc. (Tech.) degrees in Electrical Engineering from University of Oulu, Oulu, Finland in 1993 and 1997, respectively. Dr. Juntti was with University of Oulu in 1992-98. In academic year 1994-95 he was a Visiting Scholar at Rice University, Houston, Texas. In 1999-2000 he was a Senior Specialist with Nokia Networks. Dr. Juntti has been a professor of communications engineering at University of Oulu, Department of Communication Engineering and Centre for Wireless Communications (CWC) since 2000. His research interests include signal processing for wireless networks as well as communication and information theory. He is an author or co-author in some 200 papers published in international journals and conference records as well as in book WCDMA for UMTS published by Wiley. Dr. Juntti is also an Adjunct Professor at Department of Electrical and Computer Engineering, Rice University, Houston, Texas, USA. Dr. Juntti is an Editor of IEEE TRANSACTIONS ON COMMUNICATIONS and was an Associate Editor for IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY in 2002-2008. He was Secretary of IEEE Communication Society Finland Chapter in 1996-97 and the Chairman for years 2000-01. He has been Secretary of the Technical Program Committee (TPC) of the 2001 IEEE International Conference on Communications (ICC'01), and the Co-Chair of the Technical Program Committee of 2004 Nordic Radio Symposium and 2006 IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC 2006). He is the General Chair of 2011 IEEE Communication Theory Workshop (CTW 2011).

Mats Bengtsson (M'00-SM'06) received the M.S. degree in computer science from Linköping University, Linköping, Sweden, in 1991 and the Tech. Lic and Ph.D. degrees in electrical engineering from the Royal Institute of Technology (KTH), Stockholm, Sweden, in 1997 and 2000, respectively. From 1991 to 1995, he was with Ericsson Telecom AB Karlstad. He currently holds a position as Associate Professor at the Signal Processing Laboratory, School of Electrical Engineering, KTH. His research interests include statistical signal processing and its applications to communications, multi-antenna processing, cooperative communication, radio resource management, and propagation channel modelling. Dr. Bengtsson served as Associate Editor for the IEEE Transactions on Signal Processing 2007-2009 and was a member of the IEEE SPCOM Technical Committee 2007-2012.

Björn Ottersten (S'87-M'89-SM'99-F'04) was born in Stockholm, Sweden, 1961. He received the M.S. degree in electrical engineering and applied physics from Linköping University, Linköping, Sweden, in 1986. In 1989 he received the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA. Dr. Ottersten has held research positions at the Department of Electrical Engineering, Linköping University, the Information Systems Laboratory, Stanford University, the Katholieke Universiteit Leuven, Leuven, and the University of Luxembourg. During 96/97 Dr. Ottersten was Director of Research at ArrayComm Inc, a start-up in San Jose, California based on Ottersten's patented technology. He has co-authored journal papers that received the IEEE Signal Processing Society Best Paper Award in 1993, 2001, and 2006 and 3 IEEE conference papers receiving Best Paper Awards. In 1991 he was appointed Professor of Signal Processing at the Royal Institute of Technology (KTH), Stockholm. From 1992 to 2004 he was head of the department for Signals, Sensors, and Systems at KTH and from 2004 to 2008 he was dean of the School of Electrical Engineering at KTH. Currently, Dr. Ottersten is Director for the Interdisciplinary Centre for Security, Reliability and Trust at the University of Luxembourg. As Digital Champion of Luxembourg, he acts as an adviser to European Commissioner Neelie Kroes. Dr. Ottersten has served as Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING and on the editorial board of IEEE Signal Processing Magazine. He is currently editor in chief of EURASIP Signal Processing Journal and a member of the editorial boards of EURASIP Journal of Applied Signal Processing and Foundations and Trends in Signal Processing. Dr. Ottersten is a Fellow of the IEEE and EURASIP. In 2011 he received the IEEE Signal Processing Society Technical Achievement Award. He is a first recipient of the European Research Council advanced research grant. His research interests include security and trust, reliable wireless communications, and statistical signal processing.
