
Distributed Power System State Estimation

Raffi Sevlian1 and Umnouy Ponsukcharoen2

1. Department of Electrical Engineering, Stanford University, Stanford, CA
2. Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA
{raffisev, umnouyp}@stanford.edu

June 17, 2012


Abstract

The need for real-time power system state estimation to be performed on a large number of smart grid components makes the use of distributed algorithms very attractive. This report explores the current state of the art in distributed power system state estimation algorithms and evaluates their performance on multiple IEEE benchmark transmission and distribution systems as well as on randomly generated graphs. Three distributed estimation algorithms are evaluated in terms of convergence, communication cost, computational cost and robustness to communication errors on the IEEE 14-bus network. Convergence of one of the algorithms is also analyzed for general graphs.

1 Introduction

Power system state estimation (PSSE) refers to obtaining the voltage phasors of all system buses at a given moment. This is generally performed by making many redundant observations of the power flows through the network and then performing an inference to determine the underlying phasor values. In the early days, PSSE was performed in a centralized data processing center which aggregated all the observations and computed a global solution for the entire network [8, 7]. However, there is a move towards expanded system sensing capabilities as well as higher rates of estimation, making decentralized estimation very attractive. In addition, the new grid structure is open to two-way electricity flow and to distributed generation down at the distribution level. Hence, power system state estimation needs to be performed at a large scale, where centralized operation might not be as effective. Distributed power system state estimation algorithms allow system operators to deal with large-scale problems by dividing the measurements and buses into control areas. Each control area collects its measurements, performs its own state estimation and exchanges information with other control areas.

Recently proposed methods for solving for the state estimates come from the sensor networking literature [9, 4]. This report provides an experimental study of these two techniques. It also evaluates the performance of recent work in distributed subgradient-based optimization [1]. The study is divided into two parts. In the first part, three algorithms are implemented and their performance is compared with a centralized estimator. Comparisons are made on four performance indicators of interest: (1) convergence behavior, (2) computational complexity, (3) communication cost and (4) local observability requirements. In the second part, we analyze the performance of these techniques on different tree networks and present an empirical model for one of the algorithms.

2 Power System State Estimation

The centralized state estimation problem is the following. Given an underlying state of the network

    X = \{ (V_0), (\theta_1, V_1), \ldots, (\theta_N, V_N) \}

which represents the voltage magnitude and angle at every bus except the reference, the state estimator uses a set of measurements of the form z_k = h_k(X) + \epsilon_k to construct its state estimate \hat{x}. Assuming normally distributed errors on each measurement, \epsilon \sim N(0, R) with R_{ii} = \sigma_i^2, the general estimator is

    \hat{x} = \arg\min_X \sum_{k=1}^{M} \left( \frac{z_k - h_k(X)}{\sigma_k} \right)^2        (1)

For the sake of simplicity, we only consider power measurements for the entire system. In the fully nonlinear AC power flow, a state estimator receives the following sets of measurements. Real and reactive power injection at bus i:

    P_i = V_i \sum_{j \in N(i)} V_j \left( G_{ij} \cos\theta_{ij} + B_{ij} \sin\theta_{ij} \right)        (2)

    Q_i = V_i \sum_{j \in N(i)} V_j \left( G_{ij} \sin\theta_{ij} - B_{ij} \cos\theta_{ij} \right)        (3)

Real and reactive branch flow from bus i to bus j:

    P_{ij} = V_i^2 g_{ij} - V_i V_j \left( g_{ij} \cos\theta_{ij} + b_{ij} \sin\theta_{ij} \right)        (4)

    Q_{ij} = -V_i^2 b_{ij} - V_i V_j \left( g_{ij} \sin\theta_{ij} - b_{ij} \cos\theta_{ij} \right)        (5)

where G_{ij} + jB_{ij} is the (i, j) entry of the bus admittance matrix, y_{ij} = g_{ij} + j b_{ij} is the admittance of branch (i, j), and \theta_{ij} = \theta_i - \theta_j.

In this paper, however, we solve the state estimation problem based on the linearized DC power flow. In the linearized DC power flow we assume, first, that all bus voltage magnitudes are close to 1.0 per unit and, second, that all transmission lines are lossless and the nodal voltage phase angle differences are small. Applying these assumptions to the nonlinear measurements of eqs. (2) and (4), we have the following equations relating bus angles to the real bus injection and branch flow powers:

    P_i = \sum_{j \in N(i)} B_{ij} \left( \theta_i - \theta_j \right)        (6)

    P_{ij} = b_{ij} \left( \theta_i - \theta_j \right)        (7)

The linearized measurement model allows us to represent the measurement set as well as the central estimator as a simple weighted least squares formulation. From the relations in eqs. (6)-(7), the measurement vector and system state vector are related by Z = HX + \epsilon. Here Z \in R^{n_{obs}} is the column vector containing all bus injection and branch flow measurements (M = n_{bus} + n_{branch}), and H \in R^{n_{obs} \times n_{bus}} is the matrix containing the coefficients of eqs. (6)-(7). From the DC modeling assumptions, the state vector only contains the bus angles x = \{\theta_1, \ldots, \theta_N\}. From now on, the state variable x and the bus angles \theta will be used interchangeably. We can therefore simplify eq. (1) to the following centralized weighted linear least squares estimate:

    \hat{x}_c = \left( H^T R^{-1} H \right)^{-1} H^T R^{-1} Z        (8)

Note that in order to be able to estimate the state vector, n_{obs} must be at least N - 1 and rank(H^T H) = n_{bus} - 1. To make the system of equations full rank, we assume that bus 0 is always the slack bus, with phase angle zero. We solve for each angle relative to \theta_0; this leads to a reduced sensing matrix H_r \in R^{n_{obs} \times (n_{bus} - 1)}, which is H with the first column removed. We can apply eq. (8) with H_r, which results in the following:

    \left[ \hat{\theta}_1 - \theta_0, \ldots, \hat{\theta}_N - \theta_0 \right]^T = \left( H_r^T R^{-1} H_r \right)^{-1} H_r^T R^{-1} Z        (9)
3 Distributed Estimation Algorithms for Power System State Estimation

Distributed algorithms for estimation work by splitting large problems into many locally computable problems. This can be illustrated with the model of the IEEE 14-bus network shown in Figure 1. As opposed to the centralized estimator, which requires each sensor to transmit information to a single location where the inference is performed, decentralized algorithms work by distributing the observations and computation into different domains. Here, each area receives observations of variables local only to itself as well as of variables it shares with neighboring areas. For the 14-bus example, there are four domains, where each domain has access to only specific observations. Each domain then uses specifically designed messages from its neighbors to compute estimates of its unknown variables and possibly the unknown variables contained in other areas. Local observability of an area is an important assumption in many distributed techniques: if an area is locally observable, then estimation can occur without the rest of the network. Many early and heuristic methods required local observability; however, this is not necessary, and the algorithms presented here do not require it. In the following sections we introduce the various algorithms and illustrate their behavior using the 14-bus network.

Figure 1: Computational abstraction of the IEEE 14-bus benchmark network, splitting nodes into different computational domains. Block markers on edges represent branch flow measurements. Circle markers on vertices represent bus injection measurements. The network contains 26 measurements and is overdetermined.

3.1 CSE

The algorithm introduced in [9] for use in distributed power system state estimation originates from work on distributed estimation in sensor networks [3]. In CSE, each area builds an estimate of the entire state of the system. In each step, an area computes a local estimate of the global state and then transmits the full estimate to all neighbors in its communication network. Note that in CSE, each area's state vector is the full set of unknown variables. Therefore the message sent from area k to area l at time t, for example, is the estimate \hat{x}_k^t \in R^{n_{bus}}, provided l \in N_k is in the communication set of k. With a given set of messages from its neighbors, area k performs the following update:

    \hat{x}_k^{t+1} = \hat{x}_k^t - a \left[ b \sum_{l \in N_k} \left( \hat{x}_k^t - \hat{x}_l^t \right) - H_k^T \left( z_k - H_k \hat{x}_k^t \right) \right]        (10)

This method puts no limitations on the required communication architecture; that is, the structure of N_k is independent of the topology of the power system. Also, since the second term of eq. (10) does not involve any matrix inversion, the algorithm does not assume any local observability. Therefore, it can be used in a fully decentralized manner, in which each bus can be a computing area. Note that there is no implicit inversion; the update term is similar to a gradient descent direction.
Analysis of the algorithm is based mainly on introducing an aggregate state x(i) = [\hat{x}_1^T(i), \ldots, \hat{x}_{n_{area}}^T(i)]^T. With this, the evolution of the estimate vector becomes

    x(i+1) = \left( I_{n_{bus} n_{area}} - ab \left( L \otimes I_{n_{bus}} \right) \right) x(i) + a D_H^T \left( Z - D_H x(i) \right)        (11)

Here L is the graph Laplacian of the communication topology, \otimes represents the Kronecker product over matrices, and I_M represents the identity matrix in R^{M \times M}. Finally, the matrix D_H is the block diagonal matrix

    D_H = \mathrm{blkdiag}\left( H_1, \ldots, H_{n_{area}} \right)
Note the similarity of the update step to that of recursive least squares (RLS) filters in standard filter theory [2], as well as to gradient descent. In standard recursive filter theory, state updates are of the form

    x(i+1) = x(i) + A_k \left( y_k - H x_k \right)        (12)

from which one can show that the error process evolves as e_{k+1} = (I - A_k H) e_k. In the case of the CSE algorithm, we can use a similar method to show that the error process evolves as

    e_{k+1} = \left( I_{n_{bus} n_{area}} - a \left( b \left( L \otimes I_{n_{bus}} \right) + D_H^T D_H \right) \right) e_k

Like RLS filters, this technique has an associated Riccati equation relating the error covariance as a function of iteration.
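As a concrete illustration of the update in eq. (10), the following is a minimal sketch for a two-area toy system; the partition, the slack-reference pseudo-measurement that anchors the absolute angle, and the gains a and b are illustrative assumptions, not the tuned values reported later.

    import numpy as np

    # Sketch of the CSE iteration (10): every area keeps a copy of the full
    # state and mixes a consensus term with a local innovation term.
    n_bus = 3
    H = {0: np.array([[2.0, -2.0, 0.0],    # area 0: branch flow on (0, 1)
                      [1.0,  0.0, 0.0]]),  # plus a slack-reference pseudo-measurement
         1: np.array([[0.0, 3.0, -3.0]])}  # area 1: branch flow on (1, 2)
    theta = np.array([0.0, -0.1, -0.2])    # true angles
    z = {k: H[k] @ theta for k in H}       # noiseless measurements
    neighbors = {0: [1], 1: [0]}           # fully connected communication graph
    a, b = 0.05, 1.0                       # illustrative step size and consensus gain

    x = {k: np.zeros(n_bus) for k in H}    # each area estimates the full state
    for _ in range(20000):
        x = {k: x[k] - a * (b * sum(x[k] - x[l] for l in neighbors[k])
                            - H[k].T @ (z[k] - H[k] @ x[k]))
             for k in x}
    print({k: x[k].round(3) for k in x})   # both copies approach theta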

3.2 ADMM

The ADMM algorithm was first introduced to power system state estimation in [4]. It relies on formulating an augmented Lagrangian of the combined estimation problem. Given the global optimization problem that needs to be solved in eq. (1), we can decompose the objective function into L separate objective functions, each dependent on its own set of variables, by introducing an extra set of constraints:

    \min_x \sum_{k=1}^{L} \| z_k - H_k x_k \|^2
    \mathrm{s.t.} \quad x_k[l] = x_{kl}, \quad l \in N_k
From this we can formulate the augmented Lagrangian for the system. This is given for the problem as

    L(\{x_k\}, \{x_{kl}\}, \{v_{k,l}\}) = \sum_{k=1}^{n_{area}} \left[ \| z_k - H_k x_k \|^2 + \sum_{l \in N_k} \left( v_{k,l}^T \left( x_k[l] - x_{kl} \right) + c \| x_k[l] - x_{kl} \|^2 \right) \right]
                                       = \sum_{k=1}^{n_{area}} L_k(\{x_k\}, \{x_{kl}\}, \{v_{k,l}\})        (13)
The alternating direction term comes from the fact that the augmented Lagrangian in eq. (13) is optimized by each area partially minimizing its local Lagrangian L_k(\{x_k\}, \{x_{kl}\}, \{v_{k,l}\}) and then exchanging dual variables in a distributed fashion:

    x_k^{t+1} = \arg\min_{x_k} \; L_k( \{x_k\}, \{x_{kl}^t\}, \{v_{k,l}^t\} )        (14)

    x_{kl}^{t+1} = \arg\min_{x_{kl}} \; L( \{x_k^{t+1}\}, \{x_{kl}\}, \{v_{k,l}^t\} )        (15)

    v_{k,l}^{t+1} = v_{k,l}^t + c \left( x_k^{t+1}[l] - x_{kl}^{t+1} \right)        (16)

Note that in ADMM, each area's state vector is the local set of unknown variables as well as the variables that affect the observation set in the area. For example, the state vector for area 1 contains the local unknowns [\theta_1, \theta_2, \theta_5] as well as [\theta_4, \theta_6], since the bus power observed at bus 5 is also a function of the neighboring buses 4 and 6. As shown in [4], this reduces to the following set of recursions:

    x_k^{t+1} = \left( H_k^T H_k + c D_k \right)^{-1} \left( H_k^T z_k + c D_k p_k^t \right)        (17)

    s_k^{t+1}(i) = \frac{1}{|N_k^i|} \sum_{l \in N_k^i} x_l^{t+1}[i]        (18)

    p_k^{t+1}(i) = p_k^t(i) + s_k^{t+1}(i) - \frac{ x_k^t(i) + s_k^t(i) }{2}        (19)

Here x_l[i] denotes the entry of x_l corresponding to x_k(i), defined for all l \in N_k^i, where N_k^i is defined as the set of all areas which share the i-th element of area k's state vector (x_k[i]). D_k is a diagonal matrix whose (i, i) entry is |N_k^i|. With an appropriate choice of c, each control area k's iterate x_k^t converges to the estimate of its subset of the whole system state. Combining values from all areas, one can obtain the whole system estimate.
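The mechanics of the overlapping-area updates can be sketched with a generic consensus-ADMM recursion in the spirit of eqs. (17)-(19). Note that this follows the textbook consensus form rather than the exact recursion derived in [4], and the two-area partition, measurement matrices and penalty c are illustrative assumptions.

    import numpy as np

    # Consensus-ADMM sketch: two areas with overlapping state vectors agree
    # on a shared boundary angle (global index 1). Generic consensus form,
    # not the exact recursion of [4].
    idx = {0: [0, 1], 1: [1, 2]}           # global state indices of each area
    H = {0: np.array([[1.0, -1.0], [0.0, 2.0]]),
         1: np.array([[3.0, -3.0], [0.0, 1.0]])}
    theta = np.array([0.2, -0.1, 0.3])     # true global state
    z = {k: H[k] @ theta[idx[k]] for k in H}
    c = 1.0                                # illustrative penalty parameter

    x = {k: np.zeros(2) for k in H}        # local estimates
    u = {k: np.zeros(2) for k in H}        # scaled dual variables
    avg = np.zeros(3)                      # averaged copies of each global variable
    for _ in range(200):
        for k in H:                        # local WLS-plus-penalty solve, cf. eq. (17)
            A = 2 * H[k].T @ H[k] + c * np.eye(2)
            rhs = 2 * H[k].T @ z[k] + c * (avg[idx[k]] - u[k])
            x[k] = np.linalg.solve(A, rhs)
        avg[:] = 0.0                       # average the shared copies, cf. eq. (18)
        cnt = np.zeros(3)
        for k in H:
            avg[idx[k]] += x[k]
            cnt[idx[k]] += 1
        avg /= cnt
        for k in H:                        # dual ascent step, cf. eq. (16)
            u[k] += x[k] - avg[idx[k]]
    print({k: x[k].round(3) for k in H})   # overlapping entries agree across areas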
3.3 Distributed Dual Averaging

In distributed dual averaging (DDA), estimates are constructed by each node calculating a local estimate of the global subgradient and then sharing it with neighbors defined by a communication graph. In the case of the power system estimation problem, the communication graph is independent of the topology of the power network, as shown in Figure 1. This makes DDA similar to the CSE algorithm in that regard. Specifically, we have a neighbor set N(i) = \{ j \in V \mid (i, j) \in E \} for each node i \in V.

The core algorithm is defined for a general optimization problem as follows. Consider a generic convex objective function of the form

    \min_{x \in X} \; \frac{1}{n} \sum_{k=1}^{n} f_k(x)

We must assume that each f_k(x) is convex but not necessarily smooth. At iteration t, each node k \in V computes an element g_k^t \in \partial f_k(x_k^t) of the subdifferential of the local function f_k and receives information about the parameters w_l^t, l \in N(k). It updates its estimate of the solution x_k^t based on a combination of the current subgradient and the messages from its neighbors, via a projection operation. The algorithm is therefore

    w_k^{t+1} = \sum_{l \in N(k)} p_{kl} w_l^t + g_k^t        (21)

    x_k^{t+1} = \Pi_X^{\psi} \left( w_k^{t+1}, \alpha(t) \right)        (22)

The projection operator is defined as

    \Pi_X^{\psi}(z, \alpha) = \arg\min_{x \in X} \left\{ \langle z, x \rangle + \frac{1}{\alpha} \psi(x) \right\}        (23)

Here \psi is the proximal function, and in this study it is set to the canonical proximal function \frac{1}{2} \| x \|_2^2 as stated in [1].

In the case of distributed linear estimation for power system state estimation, the algorithm reduces to the following. The objective function is now f_k(x) = ( z_k - H_k x )^T R_k^{-1} ( z_k - H_k x ), which gives us g_k^t = -2 H_k^T R_k^{-1} ( z_k - H_k x_k^t ). Next, \Pi_X^{\psi}(z, \alpha) = \arg\min_{x \in X} \{ z^T x + \frac{1}{2\alpha} x^T x \}. In the unconstrained case, the solution becomes

    x_k^{t+1} = -\alpha(t+1) \, w_k^{t+1}        (24)

Since x \in [-\pi/4, \pi/4]^N we need to constrain the update. The final recursion becomes

    w_k^{t+1} = \sum_{l \in N(k)} p_{kl} w_l^t + g_k^t        (25)

    x_k^{t+1} = \min \left\{ \max \left\{ -\alpha(t) w_k^{t+1}, \; -\frac{\pi}{4} \right\}, \; \frac{\pi}{4} \right\}        (26)
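A minimal sketch of the recursion (25)-(26) on the same style of toy system follows; the doubly stochastic weight matrix P, the step-size schedule and the iteration budget are illustrative assumptions (as the experiments below show, DDA needs a very large number of iterations).

    import numpy as np

    # Sketch of the DDA recursion (25)-(26): accumulate the local gradient on
    # top of the mixed neighbor duals, then project with a decaying step size.
    n_bus = 3
    H = {0: np.array([[2.0, -2.0, 0.0], [1.0, 0.0, 0.0]]),
         1: np.array([[0.0, 3.0, -3.0]])}
    R = {k: np.eye(H[k].shape[0]) for k in H}
    theta = np.array([0.0, -0.1, -0.2])
    z = {k: H[k] @ theta for k in H}
    P = np.array([[0.5, 0.5],              # doubly stochastic mixing weights
                  [0.5, 0.5]])

    w = {k: np.zeros(n_bus) for k in H}
    x = {k: np.zeros(n_bus) for k in H}
    for t in range(1, 100001):
        g = {k: -2 * H[k].T @ np.linalg.solve(R[k], z[k] - H[k] @ x[k]) for k in H}
        w = {k: sum(P[k, l] * w[l] for l in H) + g[k] for k in H}          # eq. (25)
        alpha = 0.01 / np.sqrt(t)          # assumed decaying step-size schedule
        x = {k: np.clip(-alpha * w[k], -np.pi / 4, np.pi / 4) for k in H}  # eq. (26)
    print({k: x[k].round(3) for k in x})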

4 Numerical Experiments with the IEEE 14-Bus Network

The centralized and the three distributed algorithms are numerically tested using MATLAB. The power system used in the tests is the IEEE 14-bus system shown in Figure 1. The associated admittance matrix and the true underlying power states/measurements are obtained using MATPOWER and verified against the benchmark source. In the IEEE 14-bus grid, the measurement sites and types are shown in the figure. An abstraction of the buses, measurements and control areas is also shown in Figure 1. The four rectangles represent local control areas with a total of 23 measurements in p.u., where boxes on an edge represent branch flow measurements. The redundancy ratio is 23/14 = 1.64; therefore, the system is very likely to have a unique solution. The communication between the areas forms a fully connected graph.

4.1 Preliminary test and calibration

Figure 2: Performance of all three algorithms under no observation noise and no communication noise. The MSE shown is between the current estimate and the underlying true power state. Titles on the plots indicate the levels of measurement and communication noise in the simulation.

In the preliminary test, we draw the measurement values shown in the figure above (IEEE 14-bus system) from the true measurement values, without adding simulated errors. Hence, we expect all algorithms to give the true underlying power state as the solution. Here we keep iterating until the difference between every estimate and the true underlying value is less than a specified threshold. The results are shown in Figure 2.

- Centralized algorithm: using eq. (9), we found that \hat{x}_c agrees with the true underlying state vector obtained from MATPOWER. This means all pieces of the centralized algorithm and related information are consistent.

- CSE algorithm: using formula (10), with tuning until a = 8 \times 10^{-10} and b = 1 \times 10^{10}, we found that the iterations terminated in 13,626 iterations, and the estimates from all control areas agree with the true underlying state vector obtained from MATPOWER.

- Distributed dual averaging algorithm: using formula (26), with \alpha_0 = 2.66 \times 10^{-10} as computed from the formula discussed above and with extra tuning until \alpha_{NEW} = 2000 \, \alpha_0, we found that the algorithm terminated in 495,536 iterations, and the estimates from all control areas agree with the true underlying state vector obtained from MATPOWER.

- ADMM-based algorithm: using formula (16), with tuning until c = 9, we found that the iterations terminate in 34 iterations, and the estimates from all control areas agree with the true underlying state vector obtained from MATPOWER.

4.2 Convergence under Measurement and Communication Noise

To study the convergence behavior of the three iterative algorithms, we simulate measurements as above from the IEEE 14-bus system with observation noise as well as communication noise. The convergence behavior is shown by calculating the error with respect to the centralized solution \hat{x}_c, defined as e_{k,c}^r = \frac{1}{N} \| \hat{x}_c - x_k^r \|_2, and the error with respect to the true underlying state x_t, defined as e_{k,t}^r = \frac{1}{N} \| x_t - x_k^r \|_2. Here N is the size of the state vector. Note that we do not use stopping criteria here, in order to see the convergence behavior across a large number of iterations. The error curves obtained from the IEEE 14-bus network are shown below.

4.2.1 Measurement Noise Only

Figure 3: Performance of the three algorithms under only observation noise. Measurement noise indicates values of \sigma_{HIGH} = 0.01 and \sigma_{LOW} = 0.00001. Titles on the plots indicate the levels of measurement and communication noise in the simulation.

Convergence behavior for the measurement-noise-only experiments is shown in Figure 3. In terms of convergence behavior, the ADMM algorithm is the most effective method. Its convergence rate is high and the number of steps needed to reach the cut-off level is about 35 iterations, while the other two methods require three to four more orders of magnitude of iterations. Moreover, the convergence rate of the ADMM algorithm in terms of e_{k,c} tends to be linear after reaching the cut-off level of MSE. The CSE algorithm tends to flatten out after reaching this cut-off level, while the dual averaging algorithm has a very flat convergence rate long before reaching the cut-off level. It would be interesting to see why the convergence behavior of the dual averaging algorithm shows a sudden increase and then becomes flat.

4.2.2 Communication Noise Only

Figure 4: Performance of the three algorithms under only communication noise. Communication noise indicates values of \sigma_{HIGH} = 0.01 and \sigma_{LOW} = 0.001. Titles on the plots indicate the levels of measurement and communication noise in the simulation.

We now simulate the three methods using perfect observations and additive Gaussian noise in the messages transmitted between the areas. The results are shown in Figure 4. In terms of convergence behavior, the ADMM algorithm is still the most effective method since, for both low and high communication noise, the MSE reaches its lowest point quickly. It is interesting to note, however, that when the technique is run for the same number of iterations as DDA and CSE, its accuracy tends to decrease. In these experiments we also see the benefit of a decreasing step size in the estimate updates: in the high-communication-noise case with noise-free measurements, as the iterations tend to infinity, the DDA and CSE algorithms attain higher accuracy since they weight down more recent (noisier) messages, as opposed to standard ADMM, which weighs them equally with previous estimates. However, for a practical distributed estimation technique, messaging times are finite, so this advantage seems of only theoretical interest for this application.

4.2.3 Measurement and Communication Noise

Here we simulate the three methods with noise in the messages as well as in the observations. The results are shown in Figure 5, using the same variances as in Sections 4.2.1 and 4.2.2. We only present the cases where both noise sources are at the low level and where both are at the high level. In the first experiment we see again that, given enough time, the CSE algorithm will outperform ADMM and DDA in terms of forming consensus; however, as in the other situations, ADMM converges many orders of magnitude faster.

Figure 5: Performance of the three algorithms under both measurement and communication noise, with \sigma_{HIGH} = 0.01 and \sigma_{LOW} = 0.001. Titles on the plots indicate the levels of measurement and communication noise in the simulation.

5 Computational and Communication Cost

In this section we give an overview of the computational and communication costs and times of the three algorithms based on their formulations. To assess computation and communication cost (time), we simulate measurements from the IEEE 14-bus system with no noise, as in the experiment of Section 4.1. Here we estimate the computation cost and communication cost (time) for this experiment and then extend the estimates to general settings.

To measure the computational complexity, we count the number of basic mathematical operations needed until the algorithm converges with the criterion set above. Note that we assume there is no computational cost for inverting R, since we assume R is diagonal. We separate the analysis of computational complexity into pre-operational (fixed) and operational (marginal) computation cost.


To measure the communication cost, we define the maximum cost of sending one number from a local control area to the central control center to be c_lc, and the maximum cost of sending one number from a local control area to a neighboring local area to be c_ll. We also assume there is no communication cost to collect local measurements within a local control area. Notice that the communication cost is proportional to the number and size of the messages sent during the algorithm.

Similarly, to measure the communication time, we define the maximum time to send one number from a local control area to the central control center to be t_lc, and the maximum time to send one number from a local control area to a neighboring local area to be t_ll. We also assume there is no communication time to collect local measurements within a local control area, and that there is no bandwidth congestion within the communication network. Notice that, absent bandwidth congestion, the communication time is not necessarily proportional to the number and size of the messages sent during the algorithm. We also define d to be the maximum degree in the communication network graph.

Centralized algorithm: we use eq. (9) for the analysis.

- Computational complexity: the pre-operational computational cost is dominated by the matrix-matrix multiplication and matrix inversion, and is therefore O(n_bus M^2 + n_bus^3). The operational computational complexity for each run is dominated by matrix-vector multiplication: O(n_bus M + n_bus^2).

- Communication cost: the communication cost here is the cost of sending all local measurements to the central control area. Hence, the communication cost is bounded above by O(M c_lc).

- Communication time: the communication time is bounded above by O(t_lc).

CSE algorithm: we use eq. (10) with a = 8 \times 10^{-10} and b = 1 \times 10^{10}. We found that n_iterate = 14,184 for low observation noise. It would be interesting to find a complexity bound on the number of iterations required for a given accuracy \epsilon.

- Computational complexity: there is no pre-operational computational cost for this algorithm. The operational complexity for each run is dominated by matrix-vector multiplication and the summations of information from neighbors: O(n_area n_iterate max_i|z_i| + d n_bus).

- Communication cost: the communication cost here is the cost of sending the whole vector \hat{x}_k^t to the neighbors. Hence, the communication cost is bounded above by O(n_iterate d n_area n_bus c_ll).

- Communication time: the communication time is bounded above by O(n_iterate t_ll).

Distributed dual averaging algorithm: we use eq. (26), with \alpha_{NEW} = 2000 \, \alpha_0 as obtained in Section 4.1; the algorithm terminated after 509,334 iterations.

- Computational complexity: there is no pre-operational computational cost for this algorithm. The operational complexity for each run is dominated by matrix-vector multiplication and the summations of information from neighbors: O(n_area n_iterate n_bus max_i|z_i| + d n_bus).

- Communication cost: the communication cost here is the cost of sending the whole vector w_k^t to the neighbors. Hence, the communication cost grows as O(n_iterate d n_bus n_area c_ll).

- Communication time: the communication time grows as O(n_iterate t_ll).

ADMM-based algorithm: we use formula (16), with c = 9 as obtained in the previous part. We found that the iterations now terminate in n_iterate = 33.

- Computational complexity: the pre-operational computational cost is dominated by the matrix-matrix multiplication and matrix inversion: O(n_area (max_i|z_i|)^2 + (max_i|x_i|)^3). The operational complexity for each run is dominated by matrix-vector multiplication and the summations of information from neighbors: O(n_area n_iterate max_i|x_i| (max_i|z_i| + d)).

- Communication cost: the communication cost here is the cost of sending components (not the whole vector) of x_k^t to the neighbors, which is bounded above by O(n_iterate d n_area max_i|x_i| c_ll). However, to make every control area know the whole system state, extra communication is required to send incomplete state information to all other control areas through local communication channels. For this four-control-area system, careful counting shows that this extra communication cost is bounded by O(n_area^2 max_i|x_i| c_ll). So the total communication cost is O(n_iterate d n_area max_i|x_i| c_ll + n_area^2 max_i|x_i| c_ll).

- Communication time: the communication time is bounded by O(n_area t_ll). However, to make every control area know the whole system state, extra time is required to send incomplete state information to all other control areas through local communication channels; this extra communication time is also bounded by O(n_area t_ll). The total communication time is therefore still bounded by O(n_area t_ll).

Algorithm      Pre-Operational                                Operational
Centralized    O(n_bus M^2 + n_bus^3)                         O(n_bus M + n_bus^2)
CSE            0                                              O(n_area n_iterate max_i|z_i| + d n_bus)
DDA            0                                              O(n_area n_iterate n_bus max_i|z_i| + d n_bus)
ADMM           O(n_area (max_i|z_i|)^2 + (max_i|x_i|)^3)      O(n_area n_iterate max_i|x_i| (max_i|z_i| + d))

Table 1: Computational complexity of the four algorithms, separated into pre-operational and operational complexity.

Algorithm      Communication Cost                                                 Communication Time
Centralized    O(M c_lc)                                                          O(t_lc)
CSE            O(n_iterate d n_area n_bus c_ll)                                   O(n_iterate t_ll)
DDA            O(n_iterate d n_bus n_area c_ll)                                   O(n_iterate t_ll)
ADMM           O(n_iterate d n_area max_i|x_i| c_ll + n_area^2 max_i|x_i| c_ll)   O(n_area t_ll)

Table 2: Communication cost and time of the four algorithms.


The computational complexity and communication time/cost for each algorithm are summarized in Tables 1 and 2, based on the numerical results from the IEEE 14-bus system.

5.1 Discussion

Pre-operational computational complexity: the CSE and dual averaging algorithms are the best among the four algorithms, while the centralized algorithm is the worst. This is because the centralized algorithm requires a large matrix inversion.

Operational computational complexity: the centralized algorithm is the best among the four algorithms, while the dual averaging algorithm is the worst. This is because the dual averaging algorithm requires a very high number of iterations.

Overall computational complexity: in real-world applications, the operational computational complexity is more critical than the pre-operational computational complexity. Once the pre-operational computation is done, the result can be stored and reused unless there is an update in the system (e.g. a bus connection line is cut). The numerical results here imply that, in terms of overall computational complexity, the centralized algorithm is more advantageous than the distributed algorithms. Among the distributed algorithms, the ADMM algorithm is the best. In the general setting, especially when the problem is large, the ADMM algorithm may be as good as or better than the centralized algorithm if n_iterate is of the same order as the problem size. In [4], for the larger IEEE 118-bus system, n_iterate does not grow as the problem size grows: the 118-bus system reaches 10^{-4} accuracy in e_{k,c} after 10 iterations. A theoretical justification is needed to show how n_iterate in the ADMM algorithm actually relates to the problem size.

Communication time (cost): in a small communication network, we may assume that c_lc \approx c_ll and t_lc \approx t_ll. Then the centralized algorithm is preferable. When the communication network is large, it is possible that c_lc \gg c_ll and t_lc \gg t_ll; then the distributed algorithms might be preferable. According to the numbers we computed, the CSE and dual averaging algorithms are at a disadvantage due to their large numbers of iterations and large messages. The ADMM algorithm is a good candidate whose communication time/cost might be comparable to that of the centralized algorithm. We need a model of the relationship between the local-to-local parameters and the local-to-central parameters when the problem is large, as well as of how n_iterate in the ADMM algorithm relates to the problem size. Roughly speaking, if the local-to-central parameters grow as fast as n_iterate, the ADMM algorithm is even with the centralized algorithm.

6 Local Observability

None of the algorithms presented here require local observability. In CSE no explicit inversion takes place; therefore we do not require H_k^T H_k to be full rank for all k. In [9] the authors illustrate numerically how a lack of local observability still leads to the global solution. The CSE algorithm only requires global observability, that is, \sum_k H_k^T H_k needs to be full rank. ADMM does not require local observability either, since its matrix inversion step is performed with pseudo-measurements from an area's neighbors. For the dual averaging algorithm, this is a new result whose mathematical proof one may explore.

Convergence on Trees Networks

Due to proliferation of sensing and computation capabilities on the distribution grid. Exploring the convergence properties of distributed estimation on tree like graphs seems and
interesting direction of study. We simulated the PSSE problem using the ADMM only. It
would be fruitful to explore the other techniques as well, however the large converegence
times limited study to faster converging techniques. In the simulations, we tested randomly
generated trees of size

N = 251020.

The algorithm used for random tree generation is

documented in [6]. For these simulations, the ground truth state vectors were randomly generated as well as the observation matrices. We assumed that each bus made complete power
observations of the entire system. That is, each bus had a bus power measurement as well
as a branch ow measurement. This was chosen to reduce the variability of the convergence
n
results. Convergence was measured by the number of iterations taken for the ec,k < . We

14

chose an

value of

0.01

and

0.001.
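The experimental setup can be sketched as follows: sample a random labeled tree (here via a Prufer sequence, a standard generator; the report itself used the method of [6]) and compute the algebraic connectivity \lambda_2 of its graph Laplacian, which the empirical model below relates to the iteration count.

    import bisect
    import numpy as np

    # Sample a random labeled tree from a Prufer sequence and return the
    # graph Laplacian. This is a stand-in for the generator of [6].
    def random_tree_laplacian(n, rng):
        prufer = rng.integers(0, n, size=n - 2)
        degree = np.ones(n, dtype=int)
        for v in prufer:
            degree[v] += 1
        L = np.zeros((n, n))
        def add_edge(i, j):
            L[i, i] += 1; L[j, j] += 1
            L[i, j] -= 1; L[j, i] -= 1
        leaves = sorted(i for i in range(n) if degree[i] == 1)
        for v in prufer:
            leaf = leaves.pop(0)           # smallest current leaf
            add_edge(leaf, v)
            degree[v] -= 1
            if degree[v] == 1:             # v has just become a leaf
                bisect.insort(leaves, v)
        add_edge(leaves[0], leaves[1])     # join the two remaining nodes
        return L

    rng = np.random.default_rng(0)
    L = random_tree_laplacian(20, rng)
    lam2 = np.sort(np.linalg.eigvalsh(L))[1]
    print("algebraic connectivity lambda_2:", lam2)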
Figure 6: (Top left) Number of iterations required for e_c^k < 0.001 to be satisfied vs. tree size. (Top right) Number of iterations required for e_c^k < 0.001 to be satisfied vs. \lambda_2. (Bottom left) Number of iterations required for e_c^k < 0.01 to be satisfied vs. tree size. (Bottom right) Number of iterations required for e_c^k < 0.01 to be satisfied vs. \lambda_2.
Figure 6 illustrates the convergence properties of ADMM on randomly generated tree networks. The first apparent result is that there is a large variation in termination time for a given graph size. In the second set of plots, we see the variation of the iteration counts vs. the second-smallest eigenvalue \lambda_2 of the communication graph Laplacian. The log-log scale of the plots shows a linear relationship. There is also high variation in the termination time conditioned on \lambda_2. This is because each randomly generated tree had a randomly generated bus matrix as well as randomly generated observations; the variation of bus matrices and observations might be causing the additional variation. At this time, there has been no study of the effect of graph properties on the ADMM algorithm assuming single-bus areas. However, the experimental results point towards a polynomial relationship O(\lambda_2(L)^{\gamma}). The simulation data is used to fit a simple regression model. In the two experiments, the exponent values are \gamma_{\epsilon=0.01} = -0.52143 and \gamma_{\epsilon=0.001} = -0.51850. The 95% confidence intervals under linear regression are \gamma^{95}_{\epsilon=0.001} = [-0.56, -0.48] and \gamma^{95}_{\epsilon=0.01} = [-0.56, -0.47]. It would be of theoretical interest to derive this bound from first principles; however, that is left to future work.
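The exponent fit itself is a simple regression of the logarithm of the iteration count on the logarithm of \lambda_2. The sketch below shows the procedure; the data arrays are placeholders standing in for the simulated (\lambda_2, iteration count) pairs.

    import numpy as np

    # Power-law fit: regress log(iterations) on log(lambda_2). The arrays
    # below are placeholder values, not the actual simulation data.
    lam2 = np.array([0.01, 0.02, 0.05, 0.10])
    iters = np.array([2100, 1500, 900, 640])
    gamma, log_c = np.polyfit(np.log(lam2), np.log(iters), 1)
    print("fitted exponent gamma:", gamma)  # close to -1/2 in our experiments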
Table 3 shows the mean number of iterations required before termination for different-sized trees. The results hint towards a linear relationship; however, there is a very large variance, as shown in Figure 6.

Tree Size           2      5      10      20
\epsilon = 0.01     32     125    542     2198
\epsilon = 0.001    20     116    1083    2931

Table 3: Mean number of iterations required for convergence for trees of size N.
Figure 7: Mean computation time for various sizes of trees (\epsilon = 0.01 and \epsilon = 0.001).

An interesting question is how much of the variation is explainable by the sensing matrix (randomly generated for each test), the observations (randomly generated for each test), and the noise (randomly generated for each test). More study is needed to understand the fundamental evolution of the ADMM algorithm. One avenue of interest would be interpreting the entire algorithm as a linear dynamical system. It can be shown that, using the following state vector, the recursion can be written as

    \left[ x_1(t+1)^T, \, s_1(t)^T, \, s_1(t+1)^T, \, p_1(t+1)^T, \, \ldots, \, x_M(t+1)^T, \, s_M(t)^T, \, s_M(t+1)^T, \, p_M(t+1)^T \right]^T
        = A \left[ x_1(t)^T, \, s_1(t-1)^T, \, s_1(t)^T, \, p_1(t)^T, \, \ldots, \, x_M(t)^T, \, s_M(t-1)^T, \, s_M(t)^T, \, p_M(t)^T \right]^T + B Z

This system can be treated in the same manner as other stochastic approximation problems. It would be interesting to apply methodologies similar to those used in [5] to uncover the empirically determined results relating convergence and graph spectrum, as well as to introduce damped update schemes to combat communication noise.


8 Conclusion

In Section 4 we investigated the CSE and ADMM algorithms, which have appeared in the distributed power system estimation literature. In addition, we investigated the distributed dual averaging method. It turns out that CSE and the dual averaging algorithm suffer from a large number of iterations and a large volume of messages that need to be sent. The ADMM algorithm is experimentally justified to be a good algorithm for power system state estimation for several reasons: it converges fastest among the three algorithms; it requires small computational complexity, communication cost and communication time; and it does not require local observability. The only poor performance indicator for the ADMM algorithm is its tolerance to communication noise: it is less tolerant to communication noise than the CSE algorithm, yet the effect is small for moderate numbers of iterations. In comparison with the centralized algorithm, the ADMM algorithm may be more efficient in terms of computational complexity, communication time and cost when the problem size is large. While [4] expects the number of iterations in power system state estimation to grow more slowly than the size of the whole network, we found a large increase as the network size grew in the case of trees. Future work should also include experimental results on general connected graphs to correspond with future theoretical findings.

References

[1] J. Duchi, A. Agarwal, and M. Wainwright. Dual averaging for distributed optimization: convergence analysis and network scaling. IEEE Transactions on Automatic Control, (99):1-1, 2010.

[2] S. Haykin. Adaptive Filter Theory (ISE). 2003.

[3] S. Kar, J.M.F. Moura, and K. Ramanan. Distributed parameter estimation in sensor networks: Nonlinear observation models and imperfect communication. arXiv preprint arXiv:0809.0009, 2008.

[4] V. Kekatos and G.B. Giannakis. Distributed robust power system state estimation. arXiv preprint arXiv:1204.0991, 2012.

[5] R. Rajagopal and M.J. Wainwright. Network-based consensus averaging with general noisy channels. IEEE Transactions on Signal Processing, 59(1):373-385, 2011.

[6] A. Rodionov and H. Choo. On generating random network structures: Trees. In Computational Science - ICCS 2003, pages 677-677, 2003.

[7] F.C. Schweppe. Power system static-state estimation, Part III: Implementation. IEEE Transactions on Power Apparatus and Systems, (1):130-135, 1970.

[8] F.C. Schweppe and D.B. Rom. Power system static-state estimation, Part II: Approximate model. IEEE Transactions on Power Apparatus and Systems, (1):125-130, 1970.

[9] L. Xie, D.H. Choi, and S. Kar. Cooperative distributed state estimation: Local observability relaxed. In 2011 IEEE Power and Energy Society General Meeting, pages 1-11. IEEE, 2011.