Abstract— Zero-shot learning (ZSL) is of critical importance for practical synthetic aperture radar (SAR) automatic target recognition (ATR), as training samples are not always available for all targets and all observation configurations. We propose a novel generative deep neural network framework for ZSL in SAR ATR. The key component of the framework is a generative deconvolutional neural network, referred to as the generator. It learns a faithful hierarchical representation of known targets while automatically constructing a continuous SAR target feature space spanned by orientation-invariant features and orientation angle. This space is then used as a reference to design and initialize an interpreter convolutional neural network, which is inversely symmetric to the generator network. The interpreter network is trained to map any input SAR image, including those of unseen targets, into the target feature space. In a preliminary experiment with the Moving and Stationary Target Acquisition and Recognition data set, seven targets are used to train the generator and interpreter networks. The eighth target is then used to test the interpreter, which correctly maps it to a reasonable spot in the space spanned by the previous seven targets; its orientation can also be estimated.

Index Terms— Deep generative neural network, orientation-invariant feature space, synthetic aperture radar (SAR).

Manuscript received July 21, 2017; revised September 5, 2017 and September 27, 2017; accepted September 28, 2017. Date of publication November 10, 2017; date of current version December 4, 2017. This work was supported in part by the National Key R&D Program of China under Grant 2017YFB0502700 and in part by NSFC under Grant 61571134. (Corresponding author: Feng Xu.) The authors are with the Key Laboratory for Information Science of Electromagnetic Waves (MoE), Fudan University, Shanghai 200433, China (e-mail: fengxu@fudan.edu.cn). Digital Object Identifier 10.1109/LGRS.2017.2758900

I. INTRODUCTION

SYNTHETIC aperture radar (SAR) images are very different from optical images and are difficult to interpret because of the microwave wavelengths employed and the phase-coherent nature of SAR imaging. Nevertheless, a SAR image contains rich information about the target and scene under observation, e.g., geometry, material, and structure. Human interpretation of SAR imagery, which requires experienced experts to find small targets in massive SAR images, is challenging, time consuming, and impractical in the big-data era of remote sensing [1]. After being trained on enough samples, deep learning can imitate the mechanisms of the human brain, learn the latent features of targets, and help machines interpret SAR images automatically. It has revolutionized first the computer vision area and then many other machine learning areas, including SAR image interpretation, e.g., automatic target recognition (ATR) [2], terrain surface classification [3], and parameter inversion [4]. Chen et al. [2] first applied a convolutional neural network (CNN) to SAR ATR and achieved a state-of-the-art accuracy of 99% on the ten-class supervised classification task of the Moving and Stationary Target Acquisition and Recognition (MSTAR) benchmark data set.

Although powerful in classification, a CNN as a supervised discriminative network is highly sensitive to the selection and size of the training samples and is thus susceptible to overfitting. It is particularly well known for its weakness in generalization: its performance degrades rapidly when applied to different observation configurations or when tasked to discriminate variant target types. It becomes completely useless for new target types that have never been seen in the training samples, which is a practical need in many applications. This is a major hindrance to the practical application of CNNs to SAR ATR.

In the deep learning regime, this is an important research topic, which has motivated the so-called zero-shot learning (ZSL) [5], [6] and one-shot learning (OSL) [7]. ZSL requires no training samples of the target class, while OSL requires only one or a few training samples. ZSL/OSL is often achieved by semantic representation, which maps low-level image features into a mid-to-high-level space where the distance between two samples reveals their relationship. Semantic representations can usually be categorized into two types: attribute-based binary vectors [8] and word-based continuous vectors [9]. The former requires manually defining the attributes of targets and then assigning the corresponding labels to each training sample, which is too costly for annotation. The latter works only for common label words with simple meanings, e.g., car, airplane, and dog. This is not the case for a specific class of SAR targets such as the eight vehicles in the MSTAR data set, e.g., T72 and T62.

A generative neural network [10], [11], as opposed to a discriminative neural network, is often used for unsupervised learning such as representation learning. This letter addresses the ZSL problem of SAR ATR using deep generative neural networks. It learns to mimic SAR images by training a generative deep neural network (DNN), which automatically learns a hierarchical representation of SAR target features from given SAR images. Meanwhile, it constructs a continuous SAR target feature space in which target orientation and orientation-invariant intrinsic features are disentangled. This feature space is spanned by known target samples and is used as the reference to interpret targets that have never been seen in the training samples. Thereafter, a deep CNN is trained to continuously map any input SAR image to the learned target feature space.
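The nearest-prototype decision rule that underlies such semantic representations can be sketched in a few lines. The class names, the 3-D embedding, and the prototype vectors below are illustrative assumptions, not values from this letter; the point is only that an unseen class can be recognized once it has a position in the shared space.

```python
import numpy as np

# Hypothetical 3-D semantic embeddings: two seen classes and one unseen class
# whose prototype is defined without any training images of it.
prototypes = {
    "seen_A":   np.array([1.0, 0.0, 0.0]),
    "seen_B":   np.array([0.0, 1.0, 0.0]),
    "unseen_C": np.array([0.6, 0.6, 0.0]),
}

def classify(embedding):
    """Assign the class whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda k: np.linalg.norm(prototypes[k] - embedding))

# A sample mapped near the unseen prototype is labeled correctly even though
# no image of that class was ever seen during training.
print(classify(np.array([0.55, 0.65, 0.1])))  # -> unseen_C
```

This is exactly the sense in which distances in the mid-to-high-level space "reveal relationships": recognition reduces to geometry in the embedding space rather than to a trained discrete classifier.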
2246 IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 14, NO. 12, DECEMBER 2017
II. OVERALL ARCHITECTURE

Suppose we have c_s known targets with n_s samples S = {c_s, v_s, F_s, X_s}, and c_t unseen targets with n_t samples T = {v_t, F_t, X_t}. X denotes the SAR images, and c and v are the label and orientation information of the targets. The key idea of this letter is to construct a feature space and learn the mapping from the SAR images to orientations and orientation-invariant features F_s using c_s, v_s, and X_s of S. The learned mapping can then be used to infer v_t and F_t of the unseen targets T given X_t.

Fig. 1 illustrates the overall framework of DNNs for the proposed ZSL SAR ATR. In the forward generative direction, the input is the discrete labels and orientation information of targets. Labels are first fed into a fully connected constructor NN, which constructs a continuous target space spanned by orientation-invariant features; these are then concatenated with the orientation information to form the complete feature vector. Subsequently, this complete feature vector is fed into a deep DNN, i.e., the generator DNN, which generates pseudo-SAR images of the corresponding target at a particular orientation. The generative networks, both the constructor and the generator DNNs, are trained using SAR images of known targets. They learn from the data to construct a continuous feature space in which every known target can find its corresponding position.

Once the orientation-invariant feature vector is constructed, it is used as the goal to train the inverse direction, i.e., from an input SAR image to its feature space. The inverse direction is a simple CNN which maps a SAR image to the orientation-invariant feature space and at the same time extracts its orientation information. Such a CNN is referred to as the interpreter DNN.

After obtaining p(F|c), i.e., the feature distribution that the constructor associates with each label, the interpreter aims at minimizing the loss defined between p(F|c) and p(F|X), X ∼ D. The whole network is trained on only a limited number of target types, i.e., the c_s classes of S; however, the trained interpreter DNN can also be used to interpret other types of targets, such as the c_t types of T, which is the principle of ZSL.

Compared to conventional generative neural networks, the proposed framework includes a supervised constructor NN which maps known labels to a continuous intrinsic feature space. The constructor NN is trained simultaneously with the deconvolutional generator network. The key is to take advantage of hierarchical representation learning to construct the desired feature space.

III. CONSTRUCTOR–GENERATOR DNN OF SAR IMAGES

The proposed framework is implemented on the famous SAR ATR benchmark MSTAR data set [12], which is widely used for SAR target recognition and classification tasks. In this letter, eight types of vehicle targets, observed by an X-band one-foot-resolution SAR over 360° of orientation, are used. For each type, there are ∼300 images. The orientation (i.e., aspect angle) of each SAR image can be obtained from the header of its binary file. To test the ZSL capability, only seven types (i.e., c_s = 7) of targets are used in training; the eighth type is used for testing. This mimics the fact that not all targets are available as training samples, which is the key challenge for ZSL.

The forward-pass constructor–generator network architecture is illustrated in Fig. 2. The inputs of the network are the one-hot target label c and the orientation vector v of the target. In this case, the constructor is a two-layer fully connected NN.
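As a toy sketch of this forward direction: a two-layer fully connected constructor feeds a generator, with the orientation encoded here as (cos θ, sin θ). All layer sizes are assumed for illustration (the letter does not specify them in this section), and the deconvolutional generator stack is collapsed into a single dense layer for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES, FEAT_DIM, IMG = 7, 32, 16  # assumed sizes, not from the letter

# Constructor: two-layer fully connected net,
# one-hot label -> orientation-invariant feature vector.
W1 = rng.normal(size=(NUM_CLASSES, 64))
W2 = rng.normal(size=(64, FEAT_DIM))
def constructor(c):
    return np.maximum(c @ W1, 0.0) @ W2  # ReLU hidden layer

# Generator: concatenate features with orientation, decode to a pseudo-image.
# (One dense layer stands in for the deconvolutional generator DNN.)
W3 = rng.normal(size=(FEAT_DIM + 2, IMG * IMG))
def generator(f, v):
    return (np.concatenate([f, v], axis=1) @ W3).reshape(-1, IMG, IMG)

c = np.eye(NUM_CLASSES)[:3]                           # three one-hot labels
theta = np.deg2rad([0.0, 45.0, 90.0])
v = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # orientation as (cos, sin)
print(generator(constructor(c), v).shape)  # (3, 16, 16)
```

The essential design point survives the simplification: the label enters only through the constructor, so everything the generator needs to vary with orientation must come from the concatenated orientation channel, which is what disentangles the two factors.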
SONG AND XU: ZSL OF SAR TARGET FEATURE SPACE 2247
reflecting the mutual distances among different targets should be stable.

To demonstrate the efficacy of the generator DNN, we sample arbitrarily from the feature space and then examine the generated fake SAR images. For example, we can sample from the orientation-invariant feature space a point between any two targets. Then, full-aspect SAR images of such an imaginary target can be generated. Fig. 7 shows examples of generated 0°–360° SAR images of an imaginary target between D7 and T62, between T62 and 2S1, and between 2S1 and ZIL131, respectively. We believe that the generative DNN is crucial in constructing a continuous feature space which faithfully reflects the features of each target and the distances among them.

IV. INTERPRETER DNN FOR UNSEEN SAR TARGET

Once the target feature space has been constructed by training the constructor–generator DNN, the interpreter DNN can be easily designed and trained. The interpreter DNN is symmetric to the generator network, but runs in the backward direction (Fig. 8). It is a CNN, as opposed to the generative DNN. Its architecture is the same as that of the generator DNN except that the input and output are swapped at each layer. In addition, the weights of the interpreter DNN are initialized with the same weights as the generator DNN. This serves as a weak regularization that binds the inverse interpreter to the generator, whose SAR representative power has already been tested.

The key difference between the interpreter DNN and a conventional supervised classifier DNN is that the goal output is a continuous target feature space instead of discrete labels. Another key property of the interpreter DNN is that it has to be the inverse network of a successfully trained generator network. This guarantees that the interpreter will correctly map the input SAR image into the continuous feature space. Again, the generative DNN plays a critical role in constructing and training the interpreter DNN.

The interpreter DNN is trained using 80% of the real SAR images of the seven known types, and the rest (20%) is used to validate the fitness of the network. The GPU takes 36 s to train the network and 0.2 s to test one SAR image. Fig. 9 shows the distribution of the 20% validation samples as interpreted in the orientation-invariant feature space. It can be seen that the seven known types are well separated and clustered around their respective centers. An overall classification accuracy of 96.8% can be achieved by simply applying nearest neighbor clustering. Fig. 10 shows the estimated orientation angle versus the true orientation angle. The standard deviation of the orientation-estimation error is 16°.

Finally, let us examine the ZSL capability of the interpreter network. We pick the T72 target from the remaining target types, which are not included in the training samples. This means that the interpreter has never seen any SAR image of a T72. In an ideal case, we want the interpreter network to be able to