
An Experimental Study on Agent Learning for Market-based Sensor Management
Viswanath Avasarala, Member, IEEE, Tracy Mullen, Member, IEEE,
David Hall, Senior Member, IEEE, and Sudheer Tumu

Abstract—Distributed sensor management, the process of managing or coordinating the use of sensing resources in a distributed environment, is a multi-objective optimization problem. In our earlier work, we proposed MASM (Market-Architecture for Sensor Management), a market-based approach to allocating sensor resources in real time to various resource requestors. MASM models the multi-objective sensor management problem as a combinatorial-auction-based market in which the network resources sell goods to the resource requestors. To allow the resource requestors to participate in the market, MASM grants "budgets" to these resource requestors based on their priority to the overall mission. However, for a given budget, self-interested resource requestors, or buyers, can learn from market data and adapt their bidding behavior. This paper presents results of an initial experimental study in which the learning behavior of resource requestors is modeled and its effect on market performance is examined.

I. INTRODUCTION

Sensor management may be defined as "a process which seeks to manage or coordinate the use of sensing resources in a manner that improves the process of data fusion and ultimately that of perception, synergistically" [1]. Recent advances in sensor and communication technologies have led to a proliferation of sensor network applications [2][3]. For example, distributed sensor networks that cater to multiple users who query the sensor manager using web-based communication are now feasible. Examples of this multi-sensor, multi-user sensor management scenario include (1) network-centric warfare, in which multiple sensing platforms, sensor nets, and individual soldiers with sensors interact to allow rapid tactical situation assessment and threat assessment [3][4], and (2) condition monitoring of the environment via ground-based, airborne, and space-based sensing systems for assessing issues such as weather and pollution. In these scenarios, the various users could be trying to accomplish different tasks using network resources. Distributed sensor management is a multi-objective optimization problem. For example, assume that two resource requestors A and B are trying to use a sensor network to accomplish two different tasks. A is interested in estimating the position and velocity of a target (task Ta). B is interested in searching for the presence of new targets in a certain region (task Tb). The sensor manager should allocate its resources, including sensor schedules and communication bandwidth, to optimize performance on both Ta and Tb. Approaches such as dynamic goal lattices [5], decision theory [6], and Bayesian networks [7] have been used to find the priorities or weights of the various objectives in the sensor management optimization. However, a comprehensive paradigm that takes mission objectives and the current environment state simultaneously and directly into consideration during resource allocation is not available.

In our earlier work, we proposed a market-based resource allocation method for sensor management called MASM (Market-Architecture for Sensor Management) [8]. MASM is based on the market-oriented programming technique, which uses market algorithms for resource allocation in distributed environments [9]. MASM models the multi-objective sensor management problem as a combinatorial-auction-based market in which the network resources sell goods to the resource requestors. To allow the resource requestors to participate in the market, MASM grants "budgets" to these resource requestors based on their priority to the overall mission. However, for a given budget, self-interested resource requestors, or buyers, can learn from market data and adapt their bidding behavior.

The adaptive nature of agent behavior, due to learning from market history, has a direct impact on market-based resource allocation. In this paper, we use a simple representation of agent learning to model this adaptive behavior and study its impact on market-based resource allocation. The paper is organized as follows. Section II introduces the MASM architecture and briefly explains the combinatorial-auction-based mechanism used for sensor management. Section III explains our approach to modeling agent learning based on market data. Section IV shows the results and provides some concluding remarks.

II. MASM OVERVIEW

The design of MASM is shown in Figure 1. The Mission Manager (MM) is responsible for allocating task responsibilities and budgets to the various agents in the
market. The actual details of the sensor network are invisible to the consumer agents. Instead, consumers are allowed to bid on high-level tasks, such as "track target X within an accuracy limit Y". The sensor manager (SM) stipulates the types of tasks that the consumers can bid on, so that consumer bids are restricted to tasks the SM knows how to accomplish. Consumer bids are of the type <t, p>, where t is the task description, which includes the task type and the final task quality desired by the consumer, and p is the price that the consumer is willing to pay. For example, the task description for a bid to search a particular grid, x, so that the uncertainty of target presence (as measured by entropy) is below 0.001 is as follows:

(type: search
entity: grid no x
quality: (entropy < 0.001))

The sensor manager uses a combinatorial auction mechanism to find the optimal allocation given the consumer bids. However, the consumer bids are on high-level tasks, whereas the sensor resources are raw sensor schedules, communication bandwidth, etc. Thus, some method for bundling the goods produced by the various sellers is essential to create commodities that consumers are interested in and can bid for. The sensor manager uses a special-purpose protocol, CCA (continuous combinatorial auction), to decompose the high-level consumer bids into bids for sensor resources. CCA has been designed to decrease the computational and communication complexity of the market. The details of CCA are available in [10]; we briefly describe its salient features here. CCA uses discrete time slots to schedule resources. For each time slot, a combinatorial auction is held to determine the optimal allocation of resources.

Figure 1 Architecture of MASM (the market mediates between consumers, who bid for finished goods such as area scans and target tracks, and the sellers of sensor resources: sensor schedules, pointing, battery power, processing power, and transmission bandwidth; the Mission Manager assigns budgets, and consumers receive market reports)

As explained earlier, there is a dichotomy between what the consumers bid on (high-level tasks) and the resources the sensor network has. To address this dichotomy, CCA uses a bid translation function to create low-level resource bids from high-level task bids. For each time slot t, the auctioneer constructs bids on each resource set, S, that can be allotted to task X. To construct these bids, the auctioneer needs to calculate the price associated with the bid on resource set S for task X, based on the consumer bid price P. For this purpose, a novel price-calculation mechanism has been devised. For task X, the auctioneer computes the bid price for a resource set S as the percentage of the consumer task completed by the resource set, multiplied by the actual consumer bid price P. The percentage of the task completed by a resource set S is computed in terms of readings of a canonical sensor, A, as follows. Let a particular task X require, on average, na consecutive schedules of the standard sensor A to be completed. If a resource bundle S is used at time t = 1 instead, the expected number of standard sensor readings required is reduced to n'a. The percentage of the task completed by resource set S is then equal to the percentage savings in the required number of sensor A readings:

fS,X,t = (na - n'a) / na

Equation 1: Fraction of the task completed, computed from readings of the canonical sensor

Then, the bid price for resource set S at time t for task X with bid price P is

pS,X,t = fS,X,t * P

Equation 2: Bid price for resource set S at time t for task X

The number of resource combinations that can be allocated to a task is exponential in the number of resources. For large sensor networks, we have formulated a special genetic algorithm that solves the bid translation problem in polynomial time [10]. The representation schema used for the algorithm resembles the SGA algorithm that we developed for determining combinatorial auction winners [11]. Let the number of bids be k and the number of sensors be n. The representation schema is a real-valued string of length n, where each string element is a real-valued number between 1 and k. The jth string member essentially represents the bid to which sensor j is allocated. We obtain a string's overall utility as the sum of utilities obtained from the individual bids, calculated as the sum of the individual bid prices computed as shown in Equation 2. The resulting algorithm is anytime and has polynomial complexity.
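As a concrete illustration of this bid-translation mechanism, the sketch below computes the Equation 1 task fraction and the Equation 2 bid price, and scores one candidate assignment string of the kind used by the genetic algorithm. This is our own illustration, not code from the MASM implementation; the function names and toy numbers are invented for the example.

```python
def task_fraction(n_a, n_a_reduced):
    # Equation 1: fraction of task X completed by resource set S,
    # i.e., the fractional savings in canonical-sensor readings.
    return (n_a - n_a_reduced) / n_a

def bid_price(n_a, n_a_reduced, consumer_price):
    # Equation 2: the auctioneer's bid on resource set S for task X
    # is the completed fraction times the consumer bid price P.
    return task_fraction(n_a, n_a_reduced) * consumer_price

def assignment_utility(assignment, bids):
    """Utility of one GA candidate string.

    assignment[j] is the index of the bid to which sensor j is
    allocated; bids[i] = (n_a, n_a_reduced, consumer_price).
    Treating the reduced reading count as a fixed property of each
    winning bid is a simplification made for this illustration.
    """
    winners = set(assignment)
    return sum(bid_price(n_a, n_r, price)
               for i, (n_a, n_r, price) in enumerate(bids)
               if i in winners)

# A task needing 10 canonical-sensor schedules is cut to 4 by a
# bundle S; with a consumer bid of 100, the auctioneer bids 60 on S.
print(bid_price(10, 4, 100.0))  # 60.0
# Sensors 0 and 1 go to bid 0, sensor 2 to bid 1.
print(assignment_utility([0, 0, 1], [(10, 4, 100.0), (8, 6, 50.0)]))  # 72.5
```

The genetic algorithm of [10][11] searches over such assignment strings, using the summed bid price as the fitness of a candidate allocation.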
III. AGENT LEARNING FOR TASK PRIORITIZATION

Several methods based on techniques such as dynamic goal lattices [5], decision theory [6], and Bayesian networks [7] have been proposed to determine the weights or priorities of the objectives in the multi-objective sensor management problem. However, none of these methods is directly applicable to a market-based scenario where the auctioneer uses bid prices to differentiate task priorities. In MASM, the SM accepts bids only on a set of pre-defined tasks. The consumer agent is responsible for deconstructing its high-level tasks or goals into a sequence of SM-acceptable subtasks on which it can bid. Furthermore, it is responsible for assigning appropriate budgets to these subtasks in order to bid in the sensor manager's market. The resource requestors have utility for the tasks they are trying to accomplish, but not necessarily for the tasks the SM accepts bids for. Therefore, the consumer agents have to formulate a pricing strategy to decide the optimal bid prices for their auction bids. To help consumer agents formulate their bids, the SM periodically provides market reports, which contain statistics about auction outcomes in the previous schedules. Consumer agents, which are software entities that may be under human supervision, are required to analyze this information and formulate optimal bidding strategies. For example, assume that a consumer has utility Ut for destroying a target T. The high-level task of destroying T might consist of the following two subtasks, for which the SM accepts bids: (a) search for target T, and (b) track target T so that the uncertainty about its position is reduced. The consumer then has to divide its overall utility Ut into utilities for the two subtasks so that it can formulate bid prices. The optimal weights of the individual subtasks depend on the system conditions. For example, it might be difficult to search for targets in some environments, whereas in others, tracking targets to high accuracy might be the bottleneck. Also, utilities for the subtasks should take into consideration the competition for resources. For example, sensors that can be used for velocity estimation, such as Doppler sensors, might be abundant in the network, making the track task a less competitive one. In this paper, a simple agent-learning approach is developed that allows consumer agents to use market information to adapt dynamically to the system conditions and adjust their bidding strategies so as to improve their performance.

The agent learning approach that has been developed requires two modules:

a. Data interpretation module: In MASM, consumers can use two key sources of data: 1) the periodic market reports generated by the SM, which provide the current price trends in the market, and 2) historic consumer performance data, which record the market response to previous consumer bids. The consumer agent uses this data to create price-QoS relationships for each subtask. The constructed relationship should be in the form of a probability distribution, Prp,t,i(QoS), where Prp,t,i(QoS) is the probability that service at level QoS is offered by the SM at price p, at time t, for task i. Clearly, estimating this distribution is a very difficult problem, since the distribution depends on a number of unobservable factors such as the bidding strategies of other customers, the future state of the sensor network, etc. Methods like reinforcement learning, which learn by interacting with the environment and monitoring the results of these interactions, can be used by the consumer agents to modify the price-QoS mapping [12]. Reinforcement learning is a trial-and-error based search procedure that attempts to discover the action that will result in the highest reward, given a system state. The agent evolves while analyzing the consequences of its actions during its interactions with the environment.

b. Response formulation: The agent learning technique should have an optimization routine to determine the optimal bid prices, based on the price-QoS mappings generated by the data interpretation module. To formulate the optimization problem, it is assumed that a consumer agent's overall performance can be expressed in terms of its performance on the actionable tasks. Then, based on the mapping between price and quality of service generated by the data interpretation module, the overall agent performance can be expressed as a function of the bid prices of the individual tasks. The formal definition of the problem is available in [10]. The best possible bid prices, those that maximize the expected consumer utility, can be calculated using stochastic programming techniques. However, in dynamic environments, estimating the underlying probability distributions is difficult. The estimated distributions are likely to have large variance, and hence the optimized values obtained from the above equations may not be useful.

We make the following simplifying assumptions to calculate the optimal bid prices for an agent in a given round of scheduling. We assume that agents are myopic: they consider the current estimated values of the probability distributions to be true for all future schedules. Furthermore, we assume that the market is a fixed-price market. In a fixed-price market, the different commodities have fixed prices and the only choice consumers have is whether to buy them or not [13]. For fixed-price markets, consumers can construct the price-to-QoS mapping as a deterministic relationship, independent of time, based on approximations from historic data. Under these assumptions, consumer agents use the same bid price whenever they bid for resources for the i-th subtask. To calculate the bid prices for their tasks, consumers solve the following optimization problem:

max_bi E[performance] = max_bi h(γ)

Equation 3: Optimization of expected performance

with the budgetary constraint

∑i=1:n bi*Pi < B

Equation 4: Budget constraint for a consumer

Here, bi is the consumer bid price for subtask i; γ is the deterministic price-to-QoS mapping constructed from the historic market data; n is the total number of tasks; h is the expected performance of the consumer, given the price-QoS mapping and the bid prices; Pi is the expected number of times the consumer has to perform task i, estimated using historic data; and B is the total consumer budget.

IV. EXPERIMENTAL STUDY

A. Simulation Platform

We developed a multi-sensor, multi-consumer, multi-target platform to serve as a test bed for MASM. The design of the sensor network and the communication channel is derived from [14][15]. The complete details of the simulation environment are available in [10]. The simulation environment represents a two-dimensional area in which targets are uniformly distributed. These targets move around the search area with constant velocity. The search area has several different kinds of sensors, including sensors that provide range and bearing, bearings-only sensors, and electronic support measure (ESM) sensors. The simulation has a set of software agents that search for and destroy targets. The agents are not provided any sensing resources of their own; they depend on the sensor network for obtaining information about the environment. They bid for sensor resources and update their status based on information provided by the sensor manager. They use the sensor network's resources to search for potential targets and, if the probability of target existence within their range exceeds a certain threshold, initialize target tracks. Once a target track is initialized, the agents attack the target if their confidence in the target position is greater than a certain threshold. This is again accomplished by buying sensing resources from the sensor network. Agents are assumed to have a utility ut for destroying a target. To divide the overall utility into utilities for the search and track tasks, agents initially use equal priorities. During the simulation run, agents update the search-to-track budget ratio using the learning described in Section III.

B. Implementation Details

1) Market Reports
In the current simulation, the SM generates market reports once every ten rounds. The market report contains the mean and the upper and lower quantiles of bid prices for both search and track tasks, and the corresponding quality of service (QoS) offered to the consumers. The QoS at a particular price is the percentage of the task that is completed at that price in that particular round.

2) Agent Learning
As explained in Section III, consumer agent learning involves constructing a price-to-QoS mapping for each subtask based on market conditions. This simulation implements a simple technique in which each consumer agent uses the market report provided by the SM to construct an approximate relationship between QoS and bid price for each task. A linear interpolation of the values provided in the market report generates an approximate QoS-price mapping. For example, Figure 2 shows the price-to-QoS mapping constructed by a consumer for the search task during a simulation experiment. Based on the QoS-price mapping, the consumers calculate the optimal decomposition of the task budget into budgets for the subtasks so that their expected performance is maximized.

Figure 2 Approximate price-QoS mapping generated by a consumer for the search task, during a simulation experiment

This study's experiments used the simplified version of the optimization problem stated in Equation 3 and Equation 4, obtained under the myopic assumptions. The optimization problem faced by the consumers can be formalized as follows. The task of the consumers is destroying targets, which involves the subtasks of searching for targets (tsearch) and tracking targets (ttrack). The agents have a budget Ut for destroying a target. In this environment, the consumers are assumed to be interested in maximizing the number of targets destroyed, while not
exceeding the budget constraints. Let γsearch(p) (γtrack(p)) be the approximate QoS obtained at price p for the search (track) task. Let δsearch(q) (δtrack(q)) be the average time associated with completing the search (track) task when service at QoS q is available during each round of scheduling. A search task is complete when a new target is found. A tracking task is complete when the target track uncertainty achieves the required threshold. The best bidding prices, such that the expected time for completing T is minimized, are given by

performance = min over p1, p2 of: δsearch(γsearch(p1)) + δtrack(γtrack(p2))

subject to the constraint

δsearch(γsearch(p1))*p1 + δtrack(γtrack(p2))*p2 <= Ut

Equation 5: Optimization problem for finding the best bidding prices

Equation 5 minimizes the expected time taken for destroying a target under the constraint that the total bid amount of the consumer is less than the prescribed budgetary limit. This optimization problem was solved using exhaustive enumeration. The value of p1 is varied between [0, Ut] and the corresponding value of p2 (the highest value of p2 meeting the budgetary constraint) is calculated. The values of p1 and p2 that give the highest performance are selected. Directly using the values obtained by maximizing overall performance leads to excessive reliance on the sparse market data. For example, a particular consumer agent that is tracking a highly important target might bid aggressively for tracking resources. As a result, the QoS-price mapping for the track subtask might show a temporary shift. If all the consumer agents adapt rapidly to the new auction outcomes, they cause a permanent price increase in the market.

To avoid such speculative behavior, consumers use Widrow-Hoff learning to buffer their responses, similar to Cliff and Bruten's ZIP traders [14]. ZIP traders were originally designed to extend the zero-intelligence agent-based simulations of Gode and Sunder [15]. The Widrow-Hoff learning rule with momentum [16] is used by the consumers to adapt their bid prices, based on their current bid prices and the calculated optimal prices. If the current search budget is βcurrent and the optimal search budget is calculated to be βoptimal using the optimization routine explained above, the search budget of the consumer for the next round of bidding is updated to βupdated as follows:

e = βoptimal - βcurrent
τupdated = γ*τcurrent + (1 - γ)*e
βupdated = βcurrent + λ*τupdated

Equation 6: Widrow-Hoff based learning

The learning rate parameter, λ, determines the rate at which the budget is changed. During each iteration, the search budget is updated using the current momentum, τcurrent, which is a weighted sum of the momentum during the previous iteration and the current error. The current error is defined as the difference between the current search budget and the optimal search budget. The momentum rate parameter, γ, determines the weight given to past changes in the calculation of the momentum.

3) Results
To determine the optimal search-to-track task budget ratios, simulation experiments were conducted, varying the percentage of the total budget used for the search task. Since all the consumer agents are similar in the simulation experiment, they will have similar optimal track-to-search budget ratios. Figure 3 shows the average number of targets killed for different values of the percentage of the search budget in the overall task budget. It is clear that the optimal search budget percentage is around 0.05.

Figure 3 Number of targets destroyed versus search budget (averaged over 10 simulation experiments)

To determine the efficiency of the learning algorithm, the search budgets of the five consumers in the simulation were initialized to different values. The learning algorithm was allowed to modify the search budget after each market report became available. Since the overall utility of target destruction in the simulation is equal to one, the budgets were initialized to the following values:

search-budget of i-th consumer = 0.1*i;
track-budget of i-th consumer = 1 - 0.1*i;

Figure 4 shows the pattern in which the learning algorithm of each consumer adapts the search budget as the simulation progresses, for a sample run. Search budgets of the consumers converge to the optimal value
for all consumers within 350 rounds of scheduling. Thus, the learning technique is very efficient in finding the optimal search-to-track budget ratios.

Figure 4 Convergence of search budget to optimal value, based on Widrow-Hoff learning

Clearly, the convergence behavior of the algorithm cannot be generalized. The simulation environment has many simplifying design elements, including (a) the symmetry of the market consumers, (b) the small number of tasks, and (c) the myopic assumptions used by the consumers. In spite of its simplicity, the learning approach proposed in this paper has some important advantages compared to conventional static approaches such as goal lattices. The learning algorithm requires the agent to analyze market data and compute the optimal weights (prices) for the subtasks, based on market conditions and the state of the environment in which the agent operates, during each scheduling round. As a result, the subtask weights adapt dynamically to changed system conditions. For example, assume that the target density in the environment decreases suddenly, due to some targets leaving the simulation area. The search task becomes more difficult, and the optimization routine automatically increases the search budget. Hard-coded subtask weights, on the other hand, do not respond adequately to the new situation. We are currently investigating the design of agent learning for a real distributed sensor network environment.

The agent-learning technique introduced in this paper is intended as a preliminary study. A more rigorous approach should relax the strong myopic assumptions that we used. By design, these assumptions are true in our simulation environment, but in a more general environment the myopic assumptions could prove to be a critical issue. Also, we have ignored the effects of strategic bidding in this paper by assuming that the agents bid honestly. In a non-cooperative environment with self-interested agents, agents could indulge in dishonest utility revelations. Game-theoretic analysis is required to understand market performance under strategic agent behavior. However, we note that since MASM hides the actual details of task execution from the consumers, the impact of strategic behavior could be reduced. A preliminary study of the effect of strategic behavior on MASM has been conducted in [10]. We are currently investigating the design of agent learning in more rigorous sensor network platforms, based on reinforcement-learning approaches.

References:

[1] J. M. Manyika and H. F. Durrant-Whyte, Data Fusion and Sensor Management: A Decentralized Information-Theoretic Approach. New York: Ellis Horwood, 1994.
[2] G. A. McIntyre and K. J. Hintz, "Sensor management simulation and comparative study," Proc. SPIE Signal Processing, Sensor Fusion, and Target Recognition VI, vol. 3068, Orlando, FL, pp. 250-260, Apr. 1997.
[3] D. L. Hall and S. A. H. McMullen, Mathematical Techniques in Multisensor Data Fusion, 2nd ed. Norwood, MA: Artech House, 2004.
[4] D. L. Hall and J. Llinas, Handbook of Multisensor Data Fusion. CRC Press, 2001.
[5] K. J. Hintz and J. Malachowski, "Dynamic Goal Instantiation in Goal Lattices for Sensor Management," Proc. SPIE Signal Processing, Sensor Fusion, and Target Recognition XIV, vol. 5809, pp. 93-99, 2005.
[6] F. Dormon, V. Leung, D. Nicholson, E. Siva, and M. I. Williams, "Information-Based Decision Making over a Data Fusion Network," Proc. SPIE Signal Processing, Sensor Fusion, and Target Recognition XIV, vol. 5809, pp. 100-110, 2005.
[7] K. Veeramachaneni, L. Osadciw, and P. K. Varshney, "An Adaptive Multimodal Biometric Management Algorithm," IEEE Trans. Systems, Man, and Cybernetics, pp. 344-356, Aug. 2005.
[8] T. Mullen, V. Avasarala, and D. L. Hall, "Customer-Driven Sensor Management," IEEE Intelligent Systems, vol. 21, no. 2, pp. 41-49, 2006.
[9] M. Wellman, W. Walsh, P. Wurman, and J. MacKie-Mason, "Auction protocols for decentralized scheduling," Games and Economic Behavior, vol. 35, pp. 271-303, 2001.
[10] V. Avasarala, "Multi-agent systems for data-rich, information-poor environments," Ph.D. dissertation, College of Information Sciences and Technology, The Pennsylvania State University, University Park, 2006.
[11] V. Avasarala, H. Polavarapu, and T. Mullen, "An Approximate Algorithm for Resource Allocation Using Combinatorial Auctions," Proc. IAT 2006, pp. 571-578, 2006.
[12] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement Learning: A Survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.
[13] M. Karlsson, F. Ygge, and A. Andersson, "Market-based approaches to optimization," Computational Intelligence, vol. 23, no. 1, pp. 92-109, Feb. 2007.
[13] D. Cliff and J. Bruten, "Minimal-intelligence agents for bargaining behaviors in market-based environments," Technical Report HPL-97-91, HP Labs, 1997.
[14] V. Lesser, C. L. Ortiz, and M. Tambe, Distributed Sensor Networks: A Multiagent Perspective. Boston: Kluwer Academic Publishers, 2003.
[15] G. A. McIntyre and K. J. Hintz, "Sensor management simulation and comparative study," Proc. SPIE Signal Processing, Sensor Fusion, and Target Recognition VI, 1997 SPIE International Symposium on Aerospace/Defense Sensing & Control, Orlando, FL, 1997.
[16] D. K. Gode and S. Sunder, "Allocative efficiency of markets with zero-intelligence traders: Market as a partial substitute for individual rationality," Journal of Political Economy, vol. 101, pp. 119-137, 1993.
[17] B. Widrow and M. E. Hoff, "Adaptive switching circuits," IRE WESCON Conv. Rec., vol. 4, pp. 96-104, 1960.
