
Exploration of Deep Learning Network Architectures for Volumetric Semantic Segmentation

Akmalul Khairi Nazaruddin
Electrical and Electronic Engineering Department
Universiti Teknologi PETRONAS
Bandar Seri Iskandar, Malaysia

Dr. Eric Ho Tatt Wei
Electrical and Electronic Engineering Department
Universiti Teknologi PETRONAS
Bandar Seri Iskandar, Malaysia

Abstract—As an active area of research, segmentation of volumetric data, the form in which most medical data is represented, faces the challenge of an insufficient amount of training data. The purpose of this project is to implement and characterize deep learning neural network architectures for three-dimensional image segmentation. Artificial data was used to circumvent the scarcity of training examples when characterizing the neural networks for volumetric semantic segmentation. Additionally, Gaussian noise at varying SNR was introduced to the images to determine to what extent noise affects the performance of deep learning networks. Three architectures were successfully implemented: Fully Convolutional[1], Residual Blocks[2], and Inception Blocks[3]. The performance of the neural networks was measured by computing the accuracy and visualizing the segmentation results. Among the three implemented architectures, the Inception implementation was found to be the most robust, in that it could, under certain conditions, overcome the class imbalance issue found during the preliminary work of the project. Moreover, the Inception network was also found to perform relatively well on noisy images. Although the performance of the networks is not expected to generalize to real data, the project provides some insight into the performance of deep learning neural networks for volumetric semantic segmentation.

Keywords—Deep Learning; Semantic Segmentation;

I. INTRODUCTION

Machine Learning is one of the technologies with the potential to revolutionize the way information is utilized in various fields. At its core, a Machine Learning algorithm is a data-driven decision model requiring minimal human intervention. With the advent of the Age of Big Data and innovations in hardware technology, a subset of machine learning called Deep Learning has risen to significant popularity due to its accuracy, efficiency, efficacy and reliability. Deep Learning is an advanced and powerful class of neural network algorithms able to learn complex mappings between input and output. The learning process occurs during the training phase of the development lifecycle of a Deep Learning neural network. These mappings are learnt directly from the data used to train the neural network, in contrast to classical Machine Learning algorithms which require manual feature extraction and feature engineering.

As of today, Deep Learning underlies many advanced pattern recognition features such as naming objects in pictures, segmentation, and natural language processing. Although originating from the field of applied artificial intelligence research, Deep Learning has found many applications in various fields including, but not limited to, computer vision[4], control engineering[5], medicine[6], the Internet-of-Things and the automotive industry[7].

Deep Learning neural networks have been shown to be able to address the more complex tasks in image processing, such as semantic segmentation. The challenge with training a deep learning neural network end-to-end is the scarcity of labeled data. Although there has been research proposing to address some issues of volumetric segmentation[8]–[12], there is not yet an approach that provides quantitative measurements of the segmentation results. Additionally, there is no clear relationship between the performance of the neural network and the choice of architecture, the effect of noise, or resource constraints. As such, the application of deep learning neural networks for volumetric semantic segmentation is still an active area of research. Therefore, there is a need to implement and explore architectural choices for deep learning neural networks that can perform efficient semantic segmentation on volumetric data.

II. BACKGROUND

The project focuses on implementing and characterizing three deep learning convolutional neural network architectures for volumetric segmentation.

A. Convolutional Neural Network

A convolutional neural network[13] is a class of artificial neural network inspired by the vision process in the visual cortex. CNNs have been the architecture of choice for many deep learning applications involving computer vision tasks, such as classification and segmentation. The basic structure of a convolutional neural network is a succession of convolutions of the input layers, which extract features at each convolutional layer, with filters producing feature maps. After the convolution, the feature maps are passed through an activation function, usually a ReLU, to introduce non-linearity[14]. Afterwards, the activated feature maps are either down-sampled through max pooling, which increases abstraction, or passed through another convolution. After several convolutions and max pooling operations, the extracted features are followed by fully-connected layers, where high-level reasoning occurs and the outputs of the previous layer are connected to every input of the next layer. The output of the fully-connected layers is then connected to a softmax layer, producing a probability distribution over N classes. Fig. 1 shows an example of a convolutional neural network architecture.

Fig. 1. Convolutional Neural Network, AlexNet[13]
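As an illustration of the layer pattern just described (convolution, ReLU activation, max pooling, fully-connected layers and a final softmax), the following is a minimal sketch assuming a tf.keras-style API; the filter counts, input shape and class count are illustrative and are not taken from the paper.

```python
# Minimal illustration of the conv -> ReLU -> max-pool -> fully-connected -> softmax
# pattern described above, assuming a tf.keras API. Filter counts and the input
# shape are illustrative only and are not taken from the paper.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_toy_cnn(input_shape=(64, 64, 3), num_classes=10):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu", padding="same"),  # convolution + ReLU
        layers.MaxPooling2D(2),                                   # down-sampling, more abstraction
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),                      # fully-connected "reasoning" layer
        layers.Dense(num_classes, activation="softmax"),          # probability distribution over N classes
    ])
    return model

model = build_toy_cnn()
model.summary()
```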

B. Training of a Deep Learning Network

A classification task such as semantic segmentation falls under the supervised learning techniques of Deep Learning neural networks. Supervised here means that the network is trained by being provided labeled training data. In supervised learning, the network is given example pairs, each consisting of an input (tensors, images) and a desired output value (label or ground truth image). The learning algorithm is fed with batches of these examples and infers a mapping derived from the data to suit the task at hand. The forward propagation of the training data through the network's initialized weights tries to find the function, while the backward propagation, working with a loss function and an optimizer (e.g. Stochastic Gradient Descent or the Adam[15] optimizer), works to minimize the loss with each prediction made after the forward propagations. There are several loss functions that can be used to guide the learning of the neural network; the most commonly used for image classification is the cross-entropy loss. Cross-entropy, put simply, measures the distance between two probability distributions, so the predicted distribution that places its mass on the correct class has a shorter distance than one that favors incorrect classes. The optimal scenario is when the algorithm can classify previously unseen objects or images into the correct classes.
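The cross-entropy described above can be written in a few lines; the following NumPy sketch uses made-up probability vectors purely for illustration.

```python
# Sketch of the categorical cross-entropy described above, using plain NumPy.
# The predicted distributions and one-hot label are made-up values for illustration.
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i); smaller when the prediction matches the label."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred))

label = np.array([0.0, 1.0, 0.0])   # one-hot ground truth for class 1
good  = np.array([0.1, 0.8, 0.1])   # confident, correct prediction
bad   = np.array([0.7, 0.2, 0.1])   # confident, wrong prediction
print(cross_entropy(label, good))   # small loss (~0.22)
print(cross_entropy(label, bad))    # larger loss (~1.61)
```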
C. Volumetric Semantic Segmentation

Recently, there have been several approaches[12], [16]–[18] to automatic image segmentation that preserve the information in the depth dimension of volumetric images. As mentioned in the introduction, directly predicting the segmentation of a 3D image is computationally expensive[8]. [17] proposed a patch-based[19] approach to convolutional neural network segmentation for automatic 3D segmentation of brain MRI images, where segmentation is done by taking patches of an image and predicting the center voxel along with its close neighbors. DeepNAT[17] used two hierarchically arranged networks, one to separate the background from the foreground and the other, applied to the foreground, to anatomically classify 25 brain structures by implementing multi-task learning. The hierarchical arrangement of the networks was used to counter the class imbalance problem in patch-based segmentation methods. Additionally, the approach tried to reconcile the issue of the depth dimension by augmenting the model with spectral coordinates.

The results of the different methods used during evaluation are compared based on performance when tested on the dataset of the MICCAI Multi-Atlas Labeling challenge[20] (https://masi.vuse.vanderbilt.edu/workshop2012). [17] used the mean Dice volume overlap score, the Sørensen–Dice coefficient, to measure the accuracy of the neural network. Increased segmentation performance was observed when the network implemented an efficient fully connected Conditional Random Field (CRF).
D. Fully Convolutional Neural Network

In the context of semantic segmentation, [16] mentions the drawbacks of a patch-based approach to segmentation as in [19]: the network is relatively slow, and the overlapping patches result in redundancy. A novel approach introduced by [1], the fully convolutional neural network, is a recent CNN architecture proposed to perform semantic segmentation on 2D images. One of the key advantages of the proposed architecture is that it can receive images of arbitrary input size without requiring pre- or post-processing. The architecture is constructed by adapting classifier networks to perform semantic segmentation. This is achieved, as remarked by [1], by viewing fully connected layers as convolutions with kernel windows covering the input region. By this transformation the network is cast into a fully convolutional neural network.

Building upon the work of [1], [16] proposed the "U-Net", a fully convolutional architecture, to reconcile the trade-off between localization accuracy and context, and also to extend the architecture to require less training data through on-the-fly data augmentation, while increasing the precision of the segmentation. The U-Net architecture has two paths, a contracting path and an expansive path. A classical CNN architecture of successive convolutions with ReLU activations is employed in the contracting path, down-sampled with a 2x2 max pooling after two convolutions with 3x3 kernels. Mirroring the max pooling operations in the contracting path, a 2x2 up-convolution, or more correctly a transposed convolution, increases the resolution with a reduced number of feature maps. The output of the transposed convolution is concatenated with cropped features at the corresponding level of the contracting path.

Similar to the FCNN architecture discussed earlier, a key idea of the U-Net architecture is that it uses a 1x1 convolutional layer at the output layer in lieu of classical fully connected layers. This technique gives the network the added advantage of being able to segment images of arbitrary sizes[16]. The PhC-U373 and DIC-HeLa datasets were used to evaluate the performance of the U-Net CNN; these datasets are part of the segmentation task of the ISBI cell tracking challenges in 2014 and 2015 (http://www.codesolorzano.com/Challenges/CTC/Welcome.html, http://brainiac2.mit.edu/isbi_challenge/).

The U-Net architecture, however, can only be used to segment 2D images or low-depth 3D images. An extension is necessary to apply the architecture to 3D image segmentation. Research work by [12] proposed an architecture which extended the U-Net architecture to use three-dimensional convolutions. This fully convolutional neural network is called the V-Net; although bearing similar features to its 2D counterpart, the U-Net, the V-Net not only extended the U-Net architecture to process 3D images but also introduced a novel loss layer to better detect desired features in small regions. Evaluation metrics such as the Jaccard index (IoU) or the Sørensen–Dice coefficient are common metrics for evaluating the performance of a segmentation; both, however, are non-differentiable and thus not usually optimized directly. A standard way to resolve this is to use a proxy loss function such as the multinomial logistic function or cross-entropy loss for semantic segmentation. Using a proxy loss function can lead to decent performance provided that the network parameters have been set up correctly, but using differing functions to guide the algorithm during training and evaluation can lead to poor segmentation results, especially where there is heavy class imbalance in the training examples. [12] alleviate the problem by providing a loss layer, along with a gradient proof, based directly on the Sørensen–Dice coefficient. [12] concluded that the Dice-based loss introduced in the paper provides a significant boost in segmentation performance while also addressing the class imbalance problem by not requiring sample re-weighting, when compared to the multinomial logistic loss. An important limitation of the Dice-based loss layer is that the loss can only be used for binary classification, foreground or background.
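To make the idea concrete, the following is a sketch of a soft Dice coefficient of the kind used as a loss layer in [12] for a binary foreground/background prediction; it is written in plain NumPy for illustration and is not the original implementation.

```python
# Sketch of a soft Dice coefficient of the kind used as a loss layer in [12],
# for a binary foreground/background prediction. Written in plain NumPy for
# illustration; in practice it would be expressed in the framework's tensor ops
# so that gradients can flow through it.
import numpy as np

def soft_dice(pred, target, eps=1e-7):
    """pred: predicted foreground probabilities in [0, 1]; target: binary ground truth.
    Returns the Dice overlap; 1 - soft_dice(...) would serve as the loss."""
    p = pred.ravel().astype(np.float64)
    g = target.ravel().astype(np.float64)
    intersection = np.sum(p * g)
    return (2.0 * intersection + eps) / (np.sum(p * p) + np.sum(g * g) + eps)

# Toy 3D volumes: a perfect prediction gives Dice ~1, an empty one gives ~0.
gt = np.zeros((8, 8, 8)); gt[2:6, 2:6, 2:6] = 1.0
print(soft_dice(gt, gt))                 # ~1.0
print(soft_dice(np.zeros_like(gt), gt))  # ~0.0
```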


Furthermore, the V-Net incorporated the use of residual blocks, introduced by [2], the winner of ILSVRC2015, to address vanishing gradients in convolutional neural networks of significant depth. Another architectural difference between the V-Net and its inspirations is that, instead of max pooling operations to down-sample the images, it uses a convolutional operation with a stride of 2 and valid padding, which in essence performs a down-sampling with no noticeable effect on the performance of the network while also reducing computational resources; this architectural consideration is motivated by the work of [21].

III. METHODOLOGY

The project methodology follows the general workflow of implementing a deep learning neural network: architecture definition, training, and evaluation of network performance.

A. Network Architecture

One of the first changes made to accommodate 3D data was to change the input layer of the network. Another important consideration for segmentation is that in a conventional classifier network the image or data is down-sampled for increased abstraction, so the size of the image or data is reduced relative to the input. A segmentation task is a pixel-wise, or voxel-wise in 3D, classification which must output a mapping of the pixels or voxels of the input data to predetermined classes, called a segmentation map.

Table I shows the base architecture that was used in the project. Notice that, in lieu of max pooling layers, a convolution of size 2 and stride 2 is used to down-sample the 3D image; this allows the network to do further learning in the down-sampling layers, instead of max pooling layers, which have no trainable parameters. This technique is found in the work of [21], which shows an improvement in the accuracy of the network. This architecture serves as the base architecture that was modified to see the effect of changing the channel width and depth of the network.

TABLE I. BASE ARCHITECTURE FOR VOLUMETRIC SEMANTIC SEGMENTATION

Layers | Description | Output Dimensionality
1. Input | Dimension: 128 x 128 x 128 x 1 | 128 x 128 x 128 x 1
2. Convolution 7 x 7 x 7 | 4 filters, Activation=ReLU | 128 x 128 x 128 x 4
3. Convolution 2 x 2 x 2 | 4 filters, Stride: 2 | 64 x 64 x 64 x 4
4. Convolution 5 x 5 x 5 | 4 filters, Activation=ReLU | 64 x 64 x 64 x 4
5. Convolution 2 x 2 x 2 | 4 filters, Stride: 2 | 32 x 32 x 32 x 4
6. Convolution 5 x 5 x 5 | 4 filters, Activation=ReLU | 32 x 32 x 32 x 4
7. Convolution 2 x 2 x 2 | 4 filters, Stride: 2 | 16 x 16 x 16 x 4
8. Convolution 5 x 5 x 5 | 4 filters, Activation=ReLU | 16 x 16 x 16 x 4
9. Convolution 2 x 2 x 2 | 4 filters, Stride: 2 | 8 x 8 x 8 x 4
10. Deconvolution | Size: 4 x 4 x 4, Stride: 4 | 32 x 32 x 32 x 4
11. Deconvolution | Size: 4 x 4 x 4, Stride: 4 | 128 x 128 x 128 x 4
12. Convolution 1 x 1 x 1 | 4 filters, 1 x 1 x 1 Convolution | 128 x 128 x 128 x 4
13. Softmax Layer | Decision Layer | 128 x 128 x 128 x 4
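The base architecture in Table I can be expressed compactly with 3D convolution layers; the following is a sketch assuming a tf.keras-style API (the paper does not specify the framework used, and the "same" padding choice is our assumption to reproduce the listed output dimensions).

```python
# Sketch of the Table I base architecture, assuming a tf.keras-style API.
# Kernel sizes, strides and filter counts follow Table I; padding choices are
# our assumption ("same" padding keeps the listed output dimensions).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_base_architecture(num_classes=4):
    inp = layers.Input(shape=(128, 128, 128, 1))                      # 1. Input
    x = layers.Conv3D(4, 7, padding="same", activation="relu")(inp)   # 2.
    x = layers.Conv3D(4, 2, strides=2, padding="same")(x)             # 3. strided down-sampling
    x = layers.Conv3D(4, 5, padding="same", activation="relu")(x)     # 4.
    x = layers.Conv3D(4, 2, strides=2, padding="same")(x)             # 5.
    x = layers.Conv3D(4, 5, padding="same", activation="relu")(x)     # 6.
    x = layers.Conv3D(4, 2, strides=2, padding="same")(x)             # 7.
    x = layers.Conv3D(4, 5, padding="same", activation="relu")(x)     # 8.
    x = layers.Conv3D(4, 2, strides=2, padding="same")(x)             # 9. -> 8 x 8 x 8 x 4
    x = layers.Conv3DTranspose(4, 4, strides=4, padding="same")(x)    # 10. -> 32 x 32 x 32 x 4
    x = layers.Conv3DTranspose(4, 4, strides=4, padding="same")(x)    # 11. -> 128 x 128 x 128 x 4
    x = layers.Conv3D(num_classes, 1)(x)                              # 12. 1x1x1 voxel-wise logits
    out = layers.Softmax(axis=-1)(x)                                  # 13. per-voxel class probabilities
    return models.Model(inp, out)

model = build_base_architecture()
model.summary()
```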
The three implemented neural network architectures were the classical fully convolutional network[1], the fully convolutional network with residual blocks[2], and the fully convolutional network with inception blocks[3]. Empirical evidence shows that deep neural networks almost always outperform shallow ones. The residual block[2] was motivated by an issue in very deep networks called vanishing gradients. Vanishing gradients is the occurrence whereby the gradient updates of the neurons in earlier hidden layers become very small, which essentially means that the earlier layers do not learn. Another problem is that arbitrarily adding depth to a neural network leads, counter-intuitively, to the performance of the network actually degrading during training, with the evaluation error actually increasing. Fig. 2 shows the architectures that were implemented in the project.

Fig. 2. Implemented CNN Architecture

There are a total of seven architectures: five classical fully convolutional, one architecture implementing residual blocks and one implementing inception blocks. The input convolution of the first three architectures was varied to investigate the effect of increasing the width of the network at the initial layers, while the last two of the classical fully convolutional implementations have additional up-convolutional layers to investigate the effect of more intermediate up-sampling before performing voxel-wise classification. The residual implementation incorporates residual blocks by adding skip connections between the input convolution of each block and the output convolution, before the strided convolution used for down-sampling the volumes.
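As a concrete illustration of the residual arrangement just described (a skip connection from the block's input to its output convolution, placed before the strided down-sampling convolution), here is a sketch assuming the same tf.keras-style API as above; the filter count and kernel sizes are illustrative, not the project's exact configuration.

```python
# Sketch of a 3D residual block of the kind described above: two convolutions
# with a skip connection added before the strided down-sampling convolution.
# Assumes a tf.keras-style API; filter count and kernel sizes are illustrative.
import tensorflow as tf
from tensorflow.keras import layers

def residual_down_block(x, filters=4):
    shortcut = layers.Conv3D(filters, 1, padding="same")(x)            # match channels for the skip
    y = layers.Conv3D(filters, 5, padding="same", activation="relu")(x)
    y = layers.Conv3D(filters, 5, padding="same")(y)
    y = layers.Add()([y, shortcut])                                     # skip connection (residual sum)
    y = layers.Activation("relu")(y)
    return layers.Conv3D(filters, 2, strides=2, padding="same")(y)      # strided conv down-samples the volume

inp = layers.Input(shape=(32, 32, 32, 4))
out = residual_down_block(inp)
print(out.shape)  # (None, 16, 16, 16, 4)
```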
B. Training

To circumvent the difficulty of training a deep learning neural network for volumetric segmentation, the project utilized artificial data to train and characterize the performance of the networks. This is motivated by the fact that artificial data can, in theory, be generated infinitely, provides access to perfect ground truth, and allows noise to be added easily for observing the effect of noise on the performance of the deep learning neural network. Using generated data for training has some implications for the term epoch. In the context of machine learning, one epoch is defined as one pass over the entirety of the examples in the dataset, which means that when an epoch is complete the network has seen every example in the dataset. This definition of epoch is invalid when the data being used can be generated infinitely, which implies that the network will never see or process the entirety of the training examples. The project therefore limited the amount of training examples to 10,000 examples for all architectures and compared the performance of each. The training accuracy was logged every 10 steps and the evaluation accuracy every 2,000 steps through the network. Validation was done on 1,000 examples, and the average accuracy over these examples is taken as the accuracy of the network at validation. A batch size of 1 was used so that the network weights are updated at every step through the network, and also to reduce the overhead of processing 3D images. For the loss function, cross-entropy loss was used as a proxy objective function to guide the segmentation, while the gradient optimizer was the Adam[15] optimizer due to its effectiveness and required time for convergence.
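The training configuration above can be summarized as a short loop. The following is a sketch only, assuming a Keras-style compiled model with an accuracy metric and a hypothetical generate_example() helper that returns one volume and its label; neither is code from the project.

```python
# Sketch of the on-the-fly training scheme described above: 10,000 generated
# training examples, batch size 1, accuracy logged every 10 steps and a
# 1,000-example validation pass every 2,000 steps. `generate_example()` and
# `model` are hypothetical placeholders; `model` is assumed to be a compiled
# Keras-style model whose test_on_batch returns [loss, accuracy].
import numpy as np

def run_training(model, generate_example, train_steps=10_000,
                 log_every=10, eval_every=2_000, val_examples=1_000):
    for step in range(1, train_steps + 1):
        volume, label = generate_example()                    # batch size 1: one fresh volume per step
        metrics = model.train_on_batch(volume[None], label[None])
        if step % log_every == 0:
            print(f"step {step}: train metrics {metrics}")
        if step % eval_every == 0:
            accs = []
            for _ in range(val_examples):                     # freshly generated validation volumes
                v, l = generate_example()
                accs.append(model.test_on_batch(v[None], l[None])[-1])
            print(f"step {step}: validation accuracy {np.mean(accs):.4f}")
```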
1) Artificial Data

The data is generated in the form of a 5-dimensional tensor, with the dimensions corresponding to batch size, height, width, depth, and channel number of the data sample respectively. The synthetic data is populated with values forming a sphere inside the bounds of the specified height, width and depth. This same data is used as the sparsely annotated label. Another advantage of using synthetic training data is the theoretically limitless variety of shapes and transformations that can be generated, which can potentially allow the network to be trained to achieve perfect accuracy. Fig. 3 shows a data sample with three instances of spheres; each sphere has a radius of 20 units in Cartesian coordinates, with origin points generated randomly.

Fig. 3. Sample Data
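A minimal NumPy sketch of this kind of generator is given below. The volume size, number of spheres and radius follow the description above, but the project's exact generation code is not available, so the details (in particular giving each sphere its own class index) are assumptions made for illustration.

```python
# Sketch of a synthetic-sphere volume generator matching the description above
# (spheres of radius 20 voxels at random centers inside a 128^3 volume, with the
# same data serving as the label). Illustration only, not the project's code.
import numpy as np

def generate_sphere_volume(size=128, n_spheres=3, radius=20, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    z, y, x = np.ogrid[:size, :size, :size]
    label = np.zeros((size, size, size), dtype=np.int32)
    for i in range(n_spheres):
        cz, cy, cx = rng.integers(radius, size - radius, size=3)   # keep spheres inside the bounds
        mask = (z - cz) ** 2 + (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2
        label[mask] = i + 1        # assumption: each sphere is its own class, class 0 is background
    volume = label.astype(np.float32)[..., None]                   # add a channel axis
    return volume, label

vol, lab = generate_sphere_volume()
print(vol.shape, (lab > 0).sum())  # volume shape and the number of foreground voxels
```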
Using artificial data for training allows for informative insight into the effect of the chosen architecture on the performance of the neural network, with less emphasis on overfitting. Additionally, using artificial data allows a study of how the performance of the neural network is affected by varying degrees of noise in the images. The noise added to the images was chosen to be Gaussian noise. Although there are more accurate representations of noise in digital images, such as Rician noise in medical images, Gaussian noise was chosen due to its ubiquity in digital images and because it is relatively simple to implement. The three SNR values generated for the training of the neural networks were noiseless images (SNR: ∞), images with 0 dB noise and images with -10 dB noise. The formulas used to calculate the amount of noise to be added are shown in equations (1) and (2) from digital signal processing, where equation (2) is used to calculate the SNR in dB, taking the ratio of pixel intensities in digital images.

SNR = P_signal / P_noise    (1)

SNR_dB = 20 log10(A_signal / A_noise)    (2)
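A NumPy sketch of adding Gaussian noise at a target SNR in dB, following equations (1) and (2), is shown below. Treating the mean squared intensity of the clean volume as the signal power is our assumption; the paper does not spell out this detail.

```python
# Sketch of adding Gaussian noise at a target SNR in dB, following equations (1)
# and (2). Using the mean squared intensity of the clean volume as the signal
# power is our assumption; the paper does not spell out this detail.
import numpy as np

def add_gaussian_noise(volume, snr_db, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(volume.astype(np.float64) ** 2)
    # Invert SNR_dB = 10*log10(Ps/Pn), equivalent to 20*log10 of the amplitude ratio.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=volume.shape)
    return volume + noise

clean = np.zeros((64, 64, 64)); clean[16:48, 16:48, 16:48] = 1.0
noisy_0db   = add_gaussian_noise(clean, 0.0)    # noise power equal to the signal power
noisy_m10db = add_gaussian_noise(clean, -10.0)  # noise power ten times the signal power
```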
C. Evaluation

In order to provide a clear result on whether there are improvements or advantages of one neural network over another, the evaluation metrics must be chosen carefully. [22] mentioned the importance of using common or standard evaluation metrics to provide correct inferences of the results when comparing performance with other methods. Additionally, [22] asserted that the value of the chosen metrics may vary depending on the task, or the context for which the neural network is made.

For the semantic segmentation task there are several standard metrics to choose from; the most common is accuracy, but the accuracy used to describe the performance of a segmentation is quite different from the accuracy used for normal classification. The difference lies in that the prediction of each pixel or voxel falls into one of four cardinalities, TP, FP, TN and FN, which are True Positives, False Positives, True Negatives and False Negatives[23]. [1] calculated the accuracy of the network prediction by dividing the intersection between the prediction made by the network and the ground truth of the data sample by the number of voxels of each class, as in equation (3)[1]:

IoU_i = n_ii / (t_i + Σ_j n_ji − n_ii)    (3)

where i indexes the classes, n_ii is the number of voxels of class i correctly predicted as class i, t_i is the total number of voxels of class i in the ground truth, and Σ_j n_ji is the total number of voxels predicted as class i. This evaluation metric is also called the Jaccard coefficient index. Class imbalance is a common issue with neural networks for semantic segmentation, and this is especially true with volumetric data. In order to observe how severe an effect the class imbalance has on the performance of the neural network, the accuracy is computed both for the whole image and per class. This is achieved by extracting class masks from the ground truth of the sample and computing the Intersection-over-Union for each.
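A NumPy sketch of the per-class Intersection-over-Union computation described above follows; it assumes integer class labels in both the prediction and ground-truth volumes, and the toy volumes are illustrative only.

```python
# Sketch of the per-class Intersection-over-Union (Jaccard index) evaluation
# described above, computed from integer-labelled prediction and ground-truth
# volumes. The toy volumes are illustrative only.
import numpy as np

def per_class_iou(pred, gt, num_classes):
    ious = {}
    for c in range(num_classes):
        pred_c, gt_c = (pred == c), (gt == c)            # class masks
        intersection = np.logical_and(pred_c, gt_c).sum()
        union = np.logical_or(pred_c, gt_c).sum()
        ious[c] = intersection / union if union > 0 else float("nan")
    return ious

gt = np.zeros((32, 32, 32), dtype=int); gt[8:24, 8:24, 8:24] = 1
pred = np.zeros_like(gt); pred[10:24, 8:24, 8:24] = 1    # slightly shifted prediction
print(per_class_iou(pred, gt, num_classes=2))            # e.g. {0: ~0.98, 1: 0.875}
```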
In addition to numerical evaluation, visualizing the results of the segmentation provides a clearer overview of the network's performance. This is achieved by plotting the predicted voxels with a color scheme that differentiates between the predicted classes. Furthermore, visualization also helps in checking whether there are any issues with the numerical calculation itself.
IV. RESULTS AND DISCUSSION

In characterizing the performance of deep learning networks, the accuracy during training does not represent the actual performance of the neural network. However, it is a useful heuristic tool to observe whether the neural network is learning to perform the task. Moreover, the training accuracy is used to determine whether the neural network exhibits overfitting. This section discusses the findings from the characterization of the deep learning neural networks for volumetric semantic segmentation.

A. Overfitting

Overfitting of the network can be observed by comparing the training accuracy with the validation accuracy. This is shown in Figs. 4, 5 and 6, where the training accuracy is in the first subgraph and the validation accuracy is in the second subgraph. Overfitting occurs where there is a significant bias between the training and evaluation accuracy. The results show that overfitting is not observed for any of the seven architectures across the three different SNR values. This is consistent with the assumption that, with enough training examples, the networks will not overfit. Additionally, overfitting was avoided because the data was generated on the fly in a random fashion.

Fig. 4. Overall Training and Validation Accuracy (Noiseless)

Fig. 5. Overall Training and Validation Accuracy (0 dB Noise)

Fig. 6. Overall Training and Validation Accuracy (-10 dB Noise)

It is an interesting finding that the comparison between training and evaluation with 10,000 training examples shows negligible bias. The evaluation accuracy was calculated by taking the sample mean over 1,000 images generated every 2,000 training steps through the network.

B. Class Imbalance

With overfitting circumvented by using the artificial data, another issue was found during characterization of the neural networks. Class imbalance is a common issue in semantic segmentation where there are more examples of one class compared to the others, skewing the network to be more tuned to detect the class with more abundant examples, such as the background. This can be observed from the results of the network accuracy, where even for the classical fully convolutional network with relatively small network capacity (A1) the accuracy reaches over 95%. The effect of class imbalance can be inferred by comparing the class-wise accuracy of each network. Fig. 7 shows the accuracy of each class for the neural networks trained with noiseless images.

Fig. 7. Class-wise Validation Accuracy (Noiseless)

Fig. 8. Class-wise Validation Accuracy (0 dB Noise)

Fig. 9. Class-wise Validation Accuracy (-10 dB Noise)
The results imply two important points. The first is that the heavy class imbalance in the training examples prevents the networks from learning to segment classes other than the background class (Class 0) when guided by the proxy loss function, cross-entropy loss for multi-class classification. The loss function used to guide the learning of the neural network does not provide sufficient performance for the architectures which implement the classical convolutional neural network; varying the width at the input and adding more up-sampling layers shows a negligible effect on the accuracy of the other classes. The second is that deeper networks, specifically the implementations of inception and residual blocks, seem to be able to overcome the class imbalance, as can be seen across the three different SNR values of the images. However, the results by no means conclusively imply that shallower neural networks are unable to perform volumetric segmentation; they only show that performing multi-class classification using cross-entropy loss on shallower networks results in neural networks that are more tuned to classify the classes with more examples in the data, which in the case of this project is the background class. The implication is that using a more suitable loss function to guide the learning might allow shallower networks to perform volumetric semantic segmentation. Cross-entropy loss was used due to the non-differentiable nature of the accuracy measure selected to measure the performance of the neural networks. Optimization led directly by the accuracy metric with which the performance of the network is measured should provide better segmentation results, as with the loss function based on the Dice-Sørensen coefficient proposed by [12]; however, the proposed loss is only able to perform binary volumetric semantic segmentation. Figs. 10, 11 and 12 show the segmentation results for volumetric segmentation from three different implementations of the fully convolutional neural network, A1, A6 and A7 respectively, across all three different SNR values.

Fig. 10. Segmentation results on noiseless images

Fig. 11. Segmentation results on images with 0 dB noise

Fig. 12. Segmentation results on images with -10 dB noise
C. SNR Study

The effect of noise on the performance of the networks for volumetric segmentation can be observed in Figs. 7 to 12. The amount of noise does not seem to adversely affect the performance of the network with inception blocks. Architecture A6, which was shown to overcome the class imbalance issue on noiseless images, seems to be affected quite heavily by the noise introduced for the SNR study, in that it failed to learn to classify the classes which are not the background class, where it was able to achieve more than 99% accuracy. This might imply that the addition of noise to the images increases the severity of the class imbalance, or that it shifts the mean and variance of the data sample, thereby increasing the difficulty for the network to find the optimal mapping. Architecture A7 shows a more interesting result in that it is able to achieve similar performance on very noisy images (-10 dB) as on noiseless images, yet fails to perform segmentation on images with 0 dB noise. The cause of this failure might be that the optimizer is stuck in a local minimum and is not able to find a way out, possibly caused by the addition of noise. For the classical implementations of the fully convolutional neural network it is more difficult to provide a general interpretation of how the amount of noise in the images affects their performance, due to the class imbalance issue.

V. CONCLUSION

In conclusion, the project has successfully implemented and characterized three fully convolutional architectures which include implementations of state-of-the-art architectural choices shown to allow deep neural networks to achieve significantly better performance. Deeper neural networks are, under certain conditions, robust enough to overcome the issue of class imbalance while also performing remarkably well on images with a signal-to-noise ratio of less than one. However, the performance shown in this paper is limited to applications of deep learning neural networks trained and evaluated using artificial data with access to an abundance of examples with perfect ground truth. Although similar performance cannot be expected when dealing with real data with high variability, this project provides an informative exercise in the characterization of deep learning neural networks for volumetric semantic segmentation.

REFERENCES

[1] J. Long, E. Shelhamer, and T. Darrell, "Fully Convolutional Networks for Semantic Segmentation," IEEE Conf. Comput. Vis. Pattern Recognit., pp. 3431–3440, 2015.
[2] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," arXiv:1512.03385v1, pp. 1–12, 2015.
[3] C. Szegedy et al., "Going Deeper with Convolutions," in Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference, pp. 1–9, 2015.
[4] D. K. Nithin and P. B. Sivakumar, "Generic Feature Learning in Computer Vision," Procedia Comput. Sci., vol. 58, pp. 202–209, 2015.
[5] K. Cheon, J. Kim, M. Hamadache, and D. Lee, "On Replacing PID Controller with Deep Learning Controller for DC Motor System," J. Autom. Control Eng., vol. 3, no. 6, pp. 452–456, 2015.
[6] J. Schmidhuber, "Deep Learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[7] A. Luckow, M. Cook, N. Ashcraft, E. Weill, E. Djerekarov, and B. Vorster, "Deep Learning in the Automotive Industry: Applications and Tools," 2016 IEEE Int. Conf. Big Data (Big Data), pp. 3759–3768, 2017.
[8] K. Kamnitsas et al., "Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation," Med. Image Anal., vol. 36, pp. 61–78, Feb. 2017.
[9] J. Dolz, C. Desrosiers, and I. Ben Ayed, "3D fully convolutional networks for subcortical segmentation in MRI: A large-scale study," Neuroimage, Apr. 2017.
[10] Q. Dou et al., "3D deeply supervised network for automated segmentation of volumetric medical images," Med. Image Anal., vol. 41, no. 4, pp. 40–54, 2017.
[11] J. Kleesiek et al., "Deep MRI brain extraction: A 3D convolutional neural network for skull stripping," Neuroimage, vol. 129, pp. 460–469, 2016.
[12] F. Milletari, N. Navab, and S.-A. Ahmadi, "V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation," ArXiv, pp. 1–11, 2016.
[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Adv. Neural Inf. Process. Syst., pp. 1–9, 2012.
[14] V. Nair and G. E. Hinton, "Rectified Linear Units Improve Restricted Boltzmann Machines," Proc. 27th Int. Conf. Mach. Learn., no. 3, pp. 807–814, 2010.
[15] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," arXiv:1412.6980v9, pp. 1–15, 2014.
[16] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," in Medical Image Computing and Computer-Assisted Intervention -- MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, Eds. Cham: Springer International Publishing, pp. 234–241, 2015.
[17] C. Wachinger, M. Reuter, and T. Klein, "DeepNAT: Deep convolutional neural network for segmenting neuroanatomy," Neuroimage, vol. 170, pp. 434–445, 2017.
[18] B. Kayalibay, G. Jensen, and P. van der Smagt, "CNN-based Segmentation of Medical Imaging Data," arXiv:1701.03056v2, 2017.
[19] D. Ciresan, A. Giusti, L. Gambardella, and J. Schmidhuber, "Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images," Adv. Neural Inf. Process. Syst. 25, pp. 2843–2851, 2012.
[20] A. J. Asman and B. A. Landman, "Formulating spatially varying performance in the statistical fusion framework," IEEE Trans. Med. Imaging, vol. 31, no. 6, pp. 1326–1336, 2012.
[21] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, "Striving for Simplicity: The All Convolutional Net," pp. 1–14, 2014.
[22] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, and J. Garcia-Rodriguez, "A Review on Deep Learning Techniques Applied to Semantic Segmentation," pp. 1–23, 2017.
[23] A. A. Taha and A. Hanbury, "Metrics for evaluating 3D medical image segmentation: Analysis, selection, and tool," BMC Med. Imaging, vol. 15, no. 1, 2015.
