
1 Introduction to Image Processing

This work presents a new structure-based interest region detector called Principal Curvature-Based Regions (PCBR), which we use for object class recognition. The PCBR interest operator detects stable watershed regions within the multi-scale principal curvature image. To detect robust watershed regions, we clean a principal curvature image by combining a grayscale morphological close with our new eigenvector-flow hysteresis threshold. Robustness across scales is achieved by selecting the maximally stable regions across consecutive scales. PCBR typically detects distinctive patterns distributed evenly on the objects, and it shows significant robustness to local intensity perturbations and intra-class variations. We evaluate PCBR both qualitatively (through visual inspection) and quantitatively (by measuring repeatability and classification accuracy in real-world object-class recognition problems). Experiments on different benchmark datasets show that PCBR is comparable or superior to state-of-the-art detectors for both feature matching and object recognition. Moreover, we demonstrate the application of PCBR to symmetry detection.

In many object recognition tasks, within-class changes in pose, lighting, color, and texture can cause considerable variation in local intensities. Consequently, local intensity no longer provides a stable detection cue. As such, intensity-based interest operators (e.g., Harris, Kadir) and the object recognition systems based on them often fail to identify discriminative features. An alternative to local intensity cues is to capture semi-local structural cues such as edges and curvilinear shapes [25]. These structural cues tend to be more robust to intensity, color, and pose variations. As such, they provide the basis for a more stable interest operator, which in turn improves object recognition accuracy.

This paper introduces a new detector that exploits curvilinear structures to reliably detect interesting regions. The detector, called the Principal Curvature-Based Region (PCBR) detector, identifies stable watershed regions within the multi-scale principal curvature image. Curvilinear structures are lines (either curved or straight) such as roads in aerial or satellite images or blood vessels in medical scans. These curvilinear structures can be detected over a range of viewpoints, scales, and illumination changes. The PCBR detector employs the first steps of Steger's curvilinear detector algorithm [25]. It forms an image of the maximum or minimum eigenvalue of the Hessian matrix at each pixel. We call this the principal curvature image, as it measures the principal curvature of the image intensity surface. This process generates a single response for both lines and edges, producing a clearer structural sketch of an image than is usually provided by the gradient magnitude image.

We develop a process that detects structural regions efficiently and robustly using the watershed transform of the principal curvature image across scale space. The watershed algorithm provides a more efficient mechanism for defining structural regions than previous methods that fit circles, ellipses, and parallelograms [8, 27].

To improve the watershed's robustness to noise and other small image perturbations, we first clean the principal curvature image with a grayscale morphological close operation followed by a new hysteresis thresholding method based on local eigenvector flow. The watershed transform is then applied to the cleaned principal curvature image, and the resulting watershed regions (i.e., the catchment basins) define the PCBR regions. To achieve robust detections across multiple scales, the watershed is applied to the maxima of three consecutive images in the principal curvature scale space (similar to the local scale-space extrema used by Lowe [13], Mikolajczyk and Schmid [17], and others), and we further search for stable PCBR regions across consecutive scales, an idea adapted from the stable regions detected across multiple threshold levels used by the MSER detector [15].

While PCBR shares similar ideas with previous detectors, it represents a very different approach to detecting interest regions. Many prior intensity-based detectors search for points with distinctive local differential geometry, such as corners, while ignoring image features such as lines and edges. Conversely, PCBR utilizes line and edge features to construct structural interest regions. Compared to MSER, PCBR differs in two important aspects. First, MSER does not analyze regions in scale space, so it does not provide different levels of region abstraction. Second, MSER's intensity-based threshold process cannot overcome local intensity variations within regions. PCBR, however, overcomes this difficulty by focusing on region boundaries rather than the appearance of region interiors.

This work makes two contributions. First, we develop a new interest operator that utilizes principal curvature to extract robust and invariant region structures based on both edge and curvilinear features. Second, we introduce an enhanced principal-curvature-based watershed segmentation and robust region selection process that is robust to intra-class variations and is more efficient than previous structure-based detectors. We demonstrate the value of our PCBR detector by applying it to object-class recognition problems and symmetry detection.

Image processing is a form of signal processing where images and their properties can be used to gather and analyze information about the objects in the image. Digital image processing uses digital images and computer algorithms to enhance, manipulate, or transform images to obtain the necessary information and make decisions accordingly. Examples of digital image processing include the improvement and analysis of the images of the Surveyor missions to the moon [15], magnetic resonance imaging scans of the brain, and electronic face recognition packages. These techniques can be used to assist humans with complex tasks and make them easier. A detailed analysis of an X-ray can help a radiologist to decide whether a bone is fractured or not.

Digital image processing can increase the credibility of the decisions made by humans.

1.2 Introduction to Medical Imaging

Image processing techniques have developed and are applied to various fields like space programs, aerial and satellite imagery, and medicine [15]. Medical imaging is the set of digital image processing techniques that create and analyze images of the human body to assist doctors and medical scientists. In medicine, imaging is used for planning surgeries, X-ray imaging of bones, magnetic resonance imaging, endoscopy, and many other useful applications [31]. Digital X-ray imaging is used in this thesis project. Figure 1.1 shows the applications of digital imaging in medical imaging.

Since Wilhelm Roentgen discovered X-rays in 1895 [14], X-ray technology has improved considerably. In medicine, X-rays help doctors to see inside a patient's body without surgery or any physical damage. X-rays can pass through solid objects without altering the physical state of the object because they have a small wavelength. When this radiation is passed through a patient's body, objects of different density cast shadows of different intensities, resulting in black-and-white images. Bone, for example, is shown in white as it is opaque, and air is shown in black; the other tissues in the body appear in gray. A detailed analysis of the bone structure can be performed using X-rays, and any fractures can be detected.

Conventionally, X-rays were taken on special photographic films based on silver salts [28]. Digital X-rays can be taken using crystal photodiodes, which contain cadmium tungstate or bismuth germanate to capture light as electrical pulses. The signals are then converted from analogue to digital and can be viewed on computers. Digital X-rays are very advantageous as they are portable, require less energy than conventional X-rays, are less expensive, and are environmentally friendly [28].

A radiologist would look at the X-rays and determine whether a bone is fractured. This manual process is time-consuming and, because fractures are comparatively rare, unreliable. Some fractures are easy to detect, and a system can be developed to detect them automatically. This will assist doctors and radiologists in their work and will improve the accuracy of the results [28]. According to the observations of [27], only 11% of femur X-rays showed fractured bones, so the radiologist has to look at a lot of X-rays to find a fractured one. An algorithm to automatically detect bone fractures could help the radiologist to find the fractured bones, or at least confidently sort out the healthy ones. However, no single algorithm can be used for the whole body because of the complexity of different bone structures. Even though a lot of research has been done in this field, there is no system that completely solves the problem [14]. This is because there are several complicated parts to the problem of fracture detection. Digital X-rays are very detailed and complicated to interpret.

Bones have different sizes and can differ in characteristics from person to person, so finding a general method to locate the bone and decide whether it is fractured is a complex problem. Some of the main aspects of the problem of automatic bone fracture detection are bone orientation in the X-ray, extraction of bone contour information, bone segmentation, and extraction of relevant features.

1.3 Description of the Problem

This thesis investigates different ways of separating a bone from an X-ray. Methods like edge detection and Active Shape Models are experimented with. The aim of this thesis is to find an efficient and reasonably fast way of separating the bone from the rest of the X-ray. The bone used for the analysis is the tibia. The tibia, also known as the shinbone or shankbone, is the larger and stronger of the two bones in the leg below the knee in vertebrates, and it connects the knee with the ankle bones. Details of the X-ray data used are provided in the next section.

2.1 Theory Development

A typical digital image processing system consists of image segmentation, feature extraction, pattern recognition, thresholding, and error classification. Image processing aims at extracting the necessary information from the image. The image needs to be reduced to certain defining characteristics, and the analysis of these characteristics gives the relevant information. Figure 2.1 shows a process flow diagram of a typical digital image processing system, showing the sequence of the operations. Image segmentation is the main focus of this thesis. The other processes are briefly described for completeness and to inform the reader of the processes in the whole system.

2.1.1 Image Segmentation

Image segmentation is the process of extracting the regions of interest from an image. There are many operations to segment images, and their usage depends on the nature of the region to be extracted. For example, if an image has strong edges, edge detection techniques can be used to partition the image into its components using those edges. Image segmentation is the central theme of this thesis and is done using several techniques. Figure 2.2 shows how one of the

coins can be separated from the image: the original image is shown with the boundary of one of the coins highlighted. These techniques are analyzed, and the best technique to separate bones from X-rays is suggested. When dealing with bone X-ray images, contour detection is an important step in image segmentation. According to [31], classical image segmentation and contour detection can be different: contour detection algorithms extract the contour of objects, whereas image segmentation separates homogeneous sections of the image. A detailed literature review and history of the image segmentation techniques used for different applications is given in Chapter 3.
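As an illustration, the following is a minimal MATLAB sketch of this kind of segmentation, using the Image Processing Toolbox's built-in coins.png demo image; it is a stand-in for Figure 2.2, not the exact code behind it.

I = imread('coins.png');              % built-in demo image: coins on a dark background
BW = imbinarize(I, graythresh(I));    % Otsu threshold separates coins from background
BW = imfill(BW, 'holes');             % fill interior holes so each coin is one solid region
L = bwlabel(BW);                      % label the connected regions, one label per coin
B = bwboundaries(L == 1);             % trace the contour of one coin
imshow(I); hold on
plot(B{1}(:,2), B{1}(:,1), 'r', 'LineWidth', 2)   % highlight its boundary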

2 Segmentation of Images - An Overview

Image segmentation can proceed in three different ways: manually, automatically, or semi-automatically.

2.1 Manual Segmentation

The pixels belonging to the same intensity range could be pointed out manually, but clearly this is a very time-consuming method if the image is large. A better choice would be to mark the contours of the objects. This could be done point by point from the keyboard, giving high accuracy but low speed, or it could be done with the mouse, with higher speed but less accuracy. The manual techniques all have in common the amount of time spent in tracing the objects, and human resources are expensive. Tracing algorithms can also make use of geometrical figures like ellipses to approximate the boundaries of the objects. This has been done a lot for medical purposes, but the approximations may not be very good.

2.2 Automatic Segmentation

Fully automatic segmentation is difficult to implement due to the high complexity and variation of images. Most algorithms need some a priori information to carry out the segmentation, and for a method to be automatic, this a priori information must be available to the computer. The needed a priori information could, for instance, be the noise level or the knowledge that the objects have a particular distribution.

2.3 Semiautomatic Segmentation

Semiautomatic segmentation combines the benefits of both manual and automatic segmentation. By giving some initial information about the structures, we can proceed with

automatic methods.

Thresholding. If the distribution of intensities is known, thresholding divides the image into two regions, separated by a manually chosen threshold value a, as follows: for all pixels (i, j) of the image B,

    B(i, j) = 1 (object) if B(i, j) >= a, and B(i, j) = 0 (background) otherwise [YGV].

This can be repeated for each region, dividing them by a further threshold value, which results in four regions, and so on. However, a successful segmentation requires that some properties of the image are known beforehand. This method has the drawback of including separated regions which correctly lie within the limits specified but regionally do not belong to the selected region. These pixels could, for instance, appear from noise. The simplest way of choosing the threshold value would be a fixed value, for instance the mean value of the image. A better choice would be a histogram-derived threshold. This method includes some knowledge of the distribution of the image and will result in less misclassification.

The isodata algorithm is an iterative process for finding the threshold value [YGV]. First segment the image into two regions according to a temporarily chosen threshold value. Then calculate the mean values of the image over the two segmented regions and compute a new threshold value as

    threshold_new = (mean_region1 + mean_region2) / 2.

Repeat until the threshold value does not change any more, and finally choose this value for the threshold segmentation.

To implement the triangle algorithm, construct a histogram of intensities vs. number of pixels as in Figure 2.1. Draw a line between the maximum value of the histogram h_max and the minimum value h_min, and calculate the distance d between the line and the histogram. Increase h_min and repeat for all h until h = h_max. The threshold value becomes the h for which the distance d is maximized. This method is particularly effective when the pixels of the object we seek form a weak peak in the histogram.

Boundary tracking. Edge-finding by gradients is the method of selecting a boundary point manually and automatically following the gradient until returning to the same point [YGV]. Returning to the same point can be a major problem of this method. Boundary tracking will wrongly include all interior holes in the region, and it will meet problems if the gradient specifying the boundary varies or is very small. A way to overcome this problem is first to calculate the gradient and then apply a threshold segmentation. This will exclude some wrongly included pixels compared to the threshold method only.
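A minimal MATLAB sketch of the isodata iteration described above (the demo image and the convergence tolerance are illustrative choices):

I = double(imread('coins.png'));
T = mean(I(:));                      % initial threshold: the image mean
T_old = -Inf;
while abs(T - T_old) > 0.5           % repeat until the threshold stops changing
    T_old = T;
    m1 = mean(I(I >  T_old));        % mean of the region above the threshold
    m2 = mean(I(I <= T_old));        % mean of the region below the threshold
    T = (m1 + m2) / 2;               % new threshold: mean of the two region means
end
BW = I > T;                          % final threshold segmentation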

Zero-crossing based procedure is a method based on the Laplacian. Assume the boundaries of an object have the property that the Laplacian changes sign across them. Consider a 1D problem where the Laplacian is Δ = ∂²/∂x². Assume the boundary is blurred; the gradient will then have a shape like that in Figure 2.2, and the Laplacian will change sign just around the assumed edge at position 0. For noisy images the noise will produce large second derivatives around zero crossings, and the zero-crossing based procedure needs a smoothing filter to produce satisfactory results.

Clustering methods. Clustering methods group pixels into larger regions using colour codes. The colour code for each pixel is usually given as a 3D vector.

2.1.2 Feature Extraction

Feature extraction is the process of reducing the segmented image into a few numbers, or sets of numbers, that define the relevant features of the image. These features must be carefully chosen in such a way that they are a good representation of the image and encapsulate the necessary information. Some examples of features are image properties like the mean, standard deviation, gradient, and edges. Generally, a combination of features is used to generate a model for the images. Cross-validation is done on the images to see which features represent the image well, and those features are used. Features can sometimes be assigned weights to signify their relative importance. For example, the mean in a certain image may be given a weight of 0.9 because it is more important than the standard deviation, which may have a weight of 0.3 assigned to it. Weights generally range from 0 to 1, and they define how important the features are. These features and their respective weights are then used on a test image to get the relevant information.

To classify the bone as fractured or not, [27] measures the neck-shaft angle from the segmented femur contour as a feature. Texture features of the image such as Gabor orientation (GO), Markov Random Field (MRF) and intensity gradient direction (IGD) are used by [22] to generate a combination of classifiers to detect fractures in bones. These techniques are also used in [20] to look at femur fractures specifically. The best parameter values for the features can be found using various techniques.

2.1.3 Classifiers and Pattern Recognition

After the feature extraction stage, the features have to be analyzed and a pattern needs to be recognized. For example, the features mentioned above, like the neck-shaft angle in a femur X-ray image, need to be plotted. The patterns can be recognized if the neck-shaft angles of healthy femurs are different from those of fractured femurs. Classifiers like Bayesian classifiers and Support Vector Machines are used to classify features and find the best values for them. For example, [22] used a support vector machine called the Gini-SVM [22] and found the feature values for GO, MRF and IGD that gave the best overall performance. Clustering and nearest-neighbour approaches can also be used for pattern recognition and classification of images. For example, the gradient vector of a healthy long-bone X-ray may point in a certain direction that is very different from the gradient vector of a fractured long-bone X-ray. By observing this fact, a bone in an unknown X-ray image can be classified as healthy or fractured using the gradient vector of the image.

2.1.4 Thresholding and Error Classification

Thresholding and error classification is the final stage in the digital image processing system. Thresholding an image is a simple technique and can be done at any stage in the process. It can be used at the start to reduce the noise in the image, or it can be used to separate certain sections of an image that have distinct variations in pixel values. Thresholding is done by comparing the value of each pixel in an image to a threshold. The image can be separated into regions or pixels that are greater or less than the threshold value, and multiple thresholds can be used to achieve thresholding with many levels. Otsu's method [21] is a way of automatically thresholding any image. Thresholding is used at different stages in this thesis; it is a simple and useful tool in image processing. The following figures show the effects of thresholding.

Thresholding of an image can be done manually by using the histogram of the intensities in the image. It is difficult to threshold noisy images, as the background intensity and the foreground intensity may not be distinctly separate. Figure 2.3 shows an example of an image and its histogram, which has the pixel intensities on the horizontal axis and the number of pixels on the vertical axis.
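A short MATLAB sketch of histogram inspection and Otsu's automatic threshold [21], assuming a grayscale input (the demo image is illustrative):

I = imread('coins.png');
imhist(I);                     % pixel intensities vs. number of pixels, as in Figure 2.3
level = graythresh(I);         % Otsu's method returns a normalized threshold in [0, 1]
BW = imbinarize(I, level);     % pixels above the threshold become foreground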

Figure 2.3: (a) the original image; (b) the histogram of the image [23]

IMAGE ENHANCEMENT TECHNIQUES

Image enhancement techniques improve the quality of an image as perceived by a human. These techniques are most useful because many satellite images, when examined on a colour display, give inadequate information for image interpretation. There is no conscious effort to improve the fidelity of the image with regard to some ideal form of the image. There exists a wide variety of techniques for improving image quality; contrast stretching, density slicing, edge enhancement, and spatial filtering are the more commonly used. Image enhancement is attempted after the image is corrected for geometric and radiometric distortions. Image enhancement methods are applied separately to each band of a multispectral image. Digital techniques have been found to be more satisfactory than photographic techniques for image enhancement, because of the precision and wide variety of digital processes.

Contrast

Contrast generally refers to the difference in luminance or grey-level values in an image and is an important characteristic. It can be defined as the ratio of the maximum intensity to the minimum intensity over an image. The contrast ratio has a strong bearing on the resolving power and detectability of an image. The larger this ratio, the easier it is to interpret the image. Satellite images often lack adequate contrast and require contrast improvement.

Contrast Enhancement

Contrast enhancement techniques expand the range of brightness values in an image so that the image can be efficiently displayed in a manner desired by the analyst. The density values in a scene are literally pulled farther apart, that is, expanded over a greater range. The effect is to increase the visual contrast between two areas of different uniform densities. This enables the analyst to discriminate easily between areas initially having a small difference in density.

Linear Contrast Stretch

This is the simplest contrast stretch algorithm. The grey values in the original image and the modified image follow a linear relation in this algorithm. A density number in the low range of the original histogram is assigned to extremely black, and a value at the high end is assigned to extremely white. The remaining pixel values are distributed linearly between these extremes. Features or details that were obscure in the original image will be clear in the contrast-stretched image. The linear contrast stretch operation can be represented graphically as shown in Fig. 4. To provide optimal contrast and colour variation in colour composites, the small range of grey values in each band is stretched to the full brightness range of the output or display unit.

Non-Linear Contrast Enhancement

In these methods, the input and output data values follow a non-linear transformation. The general form of the non-linear contrast enhancement is defined by y = f(x), where x is the input data value and y is the output data value. Non-linear contrast enhancement techniques have been found to be useful for enhancing the colour contrast between nearby classes and subclasses of a main class. One type of non-linear contrast stretch involves scaling the input data logarithmically. This enhancement has the greatest impact on the brightness values found in the darker part of the histogram. It can be reversed to enhance values in the brighter part of the histogram by scaling the input data using an inverse log function. Histogram equalization is another non-linear contrast enhancement technique, in which the histogram of the original image is redistributed to produce a uniform population density. This is obtained by grouping certain adjacent grey values; thus the number of grey levels in the enhanced image is less than the number of grey levels in the original image. Sketches of these operations are given below.
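Minimal MATLAB sketches of the stretches described above, assuming a grayscale input (pout.tif is a low-contrast demo image shipped with the toolbox):

I = double(imread('pout.tif'));
lin = (I - min(I(:))) / (max(I(:)) - min(I(:)));    % linear stretch to the full [0, 1] range
lg = log(1 + I) / log(1 + max(I(:)));               % logarithmic stretch: expands dark values
heq = histeq(uint8(I));                             % histogram equalization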

SPATIAL FILTERING

A characteristic of remotely sensed images is a parameter called spatial frequency, defined as the number of changes in brightness value per unit distance for any particular part of an image. If there are very few changes in brightness value over a given area in an image, this is referred to as a low-frequency area. Conversely, if the brightness value changes dramatically over short distances, this is an area of high frequency. Spatial filtering is the process of dividing the image into its constituent spatial frequencies and selectively altering certain spatial frequencies to emphasize some image features. This technique increases the analyst's ability to discriminate detail. The three types of spatial filters used in remote sensor data processing are low-pass filters, band-pass filters, and high-pass filters.

Low-Frequency Filtering in the Spatial Domain

Image enhancements that de-emphasize or block the high spatial frequency detail are low-frequency or low-pass filters. The simplest low-frequency filter evaluates a particular input pixel brightness value, BVin, and the pixels surrounding the input pixel, and outputs a new brightness value, BVout, that is the mean of this convolution. The size of the neighbourhood convolution mask or kernel (n) is usually 3x3, 5x5, 7x7, or 9x9. The simple smoothing operation will, however, blur the image, especially at the edges of objects, and blurring becomes more severe as the size of the kernel increases. Using a 3x3 kernel can result in the low-pass image being two lines and two columns smaller than the original image. Techniques that can be applied to deal with this problem include (1) artificially extending the original image beyond its border by repeating the original border pixel brightness values, or (2) replicating the averaged brightness values near the borders, based on the image behaviour within a few pixels of the border. The most commonly used low-pass filters are mean, median, and mode filters. A sketch of the simplest case follows.
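A minimal MATLAB sketch of the simplest low-pass filter, a 3x3 mean (averaging) convolution:

I = double(imread('coins.png'));
kernel = ones(3) / 9;                 % 3x3 averaging mask
BVout = conv2(I, kernel, 'same');     % each output pixel is the mean of its 3x3 neighbourhood
% 'same' zero-pads the borders; repeating the border pixels, as discussed above,
% is the alternative (e.g., via padarray(I, [1 1], 'replicate')).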

High-Frequency Filtering in the Spatial Domain

High-pass filtering is applied to imagery to remove the slowly varying components and enhance the high-frequency local variations. Brightness values tend to be highly correlated in a nine-element window, so the high-frequency filtered image will have a relatively narrow intensity histogram. This suggests that the output from most high-frequency filtered images must be contrast stretched prior to visual analysis.

Edge Enhancement in the Spatial Domain

For many remote sensing earth science applications, the most valuable information that may be derived from an image is contained in the edges surrounding various objects of interest. Edge enhancement delineates these edges and makes the shapes and details comprising the image more conspicuous and perhaps easier to analyze. Generally, what the eye sees as pictorial edges are simply sharp changes in brightness value between two adjacent pixels. Edges may be enhanced using either linear or nonlinear edge enhancement techniques.

Linear Edge Enhancement

A straightforward method of extracting edges in remotely sensed imagery is the application of a directional first-difference algorithm, which approximates the first derivative between two adjacent pixels. The algorithm produces the first difference of the image input in the horizontal, vertical, and diagonal directions. The Laplacian operator generally highlights points, lines, and edges in the image and suppresses uniform and smoothly varying regions. Human vision physiological research suggests that we see objects in much the same way, and hence the result of this operation has a more natural look than many other edge-enhanced images.

Band Ratioing

Sometimes differences in brightness values from identical surface materials are caused by topographic slope and aspect, shadows, or seasonal changes in sunlight illumination angle and intensity. These conditions may hamper the ability of an interpreter or classification algorithm to correctly identify surface materials or land use in a remotely sensed image. Fortunately, ratio transformations of the remotely sensed data can, in certain instances, be applied to reduce the effects of such environmental conditions. In addition to minimizing the effects of environmental factors, ratios may also provide unique information, not available in any single band, that is useful for discriminating between soils and vegetation.
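A band-ratio sketch in MATLAB; the file names for the two co-registered bands are hypothetical placeholders:

nir = double(imread('band4.tif'));    % hypothetical near-infrared band
red = double(imread('band3.tif'));    % hypothetical red band
ratio = nir ./ max(red, eps);         % per-pixel ratio, guarded against division by zero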

Chapter 3 Literature Review and History

The first section in this chapter describes the work that is related to the topic. Many papers use the same image segmentation techniques for different problems. This section explains the methods, discussed in this thesis, used by researchers to solve similar problems. The subsequent section describes the workings of the common methods of image segmentation. These methods were investigated in this thesis and are also used in other papers. They include techniques like Active Shape Models, Active Contour/Snake Models, texture analysis, edge detection, and some methods that are only relevant for the X-ray data.

3.1 Previous Research

3.1.1 Summary of Previous Research

According to [14], compared to other areas in medical imaging, bone fracture detection is not well researched and published. Research has been done by the National University of Singapore to segment and detect fractures in femurs (the thigh bone).

[27] uses a modified Canny edge detector to detect the edges of femurs in order to separate them from the X-ray. The X-rays were also segmented using Snakes or Active Contour Models (discussed in 3.4) and Gradient Vector Flow. According to the experiments done by [27], their algorithm achieves a classification accuracy of 94.5%. Canny edge detectors and Gradient Vector Flow are also used by [29] to find bones in X-rays.

[31] proposes two methods to extract femur contours from X-rays. The first is a semi-automatic method which gives priority to reliability and accuracy; it tries to fit a model of the femur contour to a femur in the X-ray. The second method is automatic and uses active contour models. This method breaks down the shape of the femur into a couple of parallel, or roughly parallel, lines and a circle at the top representing the head of the femur. The method detects the strong edges in the circle and locates the turning point using the point of inflection in the second derivative of the image. Finally, it optimizes the femur contour by applying shape constraints to the model.

Hough and Radon transforms are used by [14] to approximate the edges of long bones. [14] also uses clustering-based algorithms, also known as bi-level or localized thresholding methods, and global segmentation algorithms to segment X-rays. Clustering-based algorithms categorize each pixel of the image as either a part of the background or a part of the object, based on a specified threshold, hence the name bi-level thresholding. Global segmentation algorithms take the whole image into consideration and sometimes work better than the clustering-based algorithms. Global segmentation algorithms include methods like edge detection, region extraction, and deformable models (discussed in 3.4).

Active Contour Models, initially proposed by [19], fall under the class of deformable models and are widely used as an image segmentation tool. Active Contour Models are used to extract femur contours in X-ray images by [31], after doing edge detection on the image using a modified Canny filter. Gradient Vector Flow is also used by [31] to extract contours, and the results are compared to those of the Active Contour Model. [3] uses an Active Contour Model with curvature constraints to detect femur fractures, as the original Active Contour Model is susceptible to noise and other undesired edges. This method successfully extracts the femur contour with a small restriction on the shape, size, and orientation of the image.

Active Shape Models (ASMs), introduced by Cootes and Taylor [9], are another widely used statistical model for image segmentation. Cootes and Taylor, and their colleagues

[5, 6, 7, 11, 12, 10], released a series of papers that completed the definition of the original ASMs, also called classical ASMs by [24], by modifying them. These papers investigated the performance of the model under gray-level variation and different resolutions, and made the model more flexible and adaptable.

ASMs are used by [24] to detect facial features. Some modifications to the original model were suggested and experimented with. The relationships between landmark points, computing time, and the number of images in the training data were observed for different sets of data. The results in this thesis are compared to the results in [24]. The work done in this thesis is similar to [24], as the same model is used for a different application.

[18] and [1] analyzed the performance of ASMs in terms of the definition of the shape and the gray-level analysis of grayscale images. The data used was facial data from a face database, and it was concluded that ASMs are an accurate way of modeling the shape and gray-level appearance. It was observed that the model allows for flexibility while being constrained to the shape of the object to be segmented. This is relevant for the problem of bone segmentation, as X-rays are grayscale and the structure and shape of bones can differ slightly. The flexibility of the model will be useful for separating bones from X-rays even though one tibia bone differs from another. The working mechanisms of the methods discussed above are explained in detail later in this chapter.

3.1.2 Common Limitations of the Previous Research

As mentioned in previous chapters, bone segmentation and fracture detection are both complicated problems. There are many limitations and problems in the segmentation methods used. Some methods and models are too limited or constrained to match the bone accurately, and accuracy of results and computing time are conflicting variables. It is observed in [14] that there is no automatic method of segmenting bones. [14] also recognizes the need for good initial conditions for Active Contour Models to produce a good segmentation of bones from X-rays. If the initial conditions are not good, the final results will be inaccurate. Manual definition of the initial conditions, such as the scaling or orientation of the contour, is needed, so the process is not automatic. [14] tries to detect fractures in long shaft bones using Computer Aided Design (CAD) techniques. The tradeoff between automating the algorithm and the accuracy of the results, using the Active Shape and Active Contour Models, is examined in [31]. If the model is made fully automatic,

by estimating the initial conditions, the accuracy will be lower than when the initial conditions of the model are defined by user inputs. [31] implements both manual and automatic approaches and identifies that automatically segmenting bone structures from noisy X-ray images is a complex problem. This thesis project tackles these limitations. The manual and automatic approaches are tried using Active Shape Models, and the relationships between the size of the training set, computation time, and error are studied.

3.2 Edge Detection

Edge detection falls under the category of feature detection in images, which includes other methods like ridge detection, blob detection, interest point detection, and scale-space models. In digital imaging, edges are defined as a set of connected pixels that lie on the boundary between two regions of an image where the image intensity changes, formally known as discontinuities [15]. The pixels, or sets of pixels, that form the edge are generally of the same, or close to the same, intensity. Edge detection can be used to segment images with respect to these edges and to display the edges separately [26][15]. Edge detection can be used in separating tibia bones from X-rays, as bones have strong boundaries or edges. Figure 3.1 is an example of basic edge detection in images.

3.2.1 Sobel Edge Detector

The Sobel operator, used to do the edge detection, calculates the gradient of the image intensity at each pixel. The gradient of a 2D image is a 2D vector with the partial horizontal and vertical derivatives as its components; it can also be expressed as a magnitude and an angle. If Dx and Dy are the derivatives in the x and y directions respectively, equations 3.1 and 3.2 show the magnitude and angle (direction) representation of the gradient vector ∇D. The gradient is a measure of the rate of change in an image, from light to dark pixels in the case of grayscale images, at every point. At each point in the image, the direction of the gradient vector shows the direction of the largest increase in the intensity of the image, while the magnitude of the gradient vector denotes the rate of change in that direction [15][26]. This implies that the result of the Sobel operator at an image point in a region of constant image intensity is a zero vector, and at a point on an edge it is a vector that points across the edge, from darker to brighter values.
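A MATLAB sketch of equations 3.1 and 3.2, computing the gradient magnitude and direction from the Sobel derivatives Dx and Dy (the demo image is illustrative):

A = double(imread('coins.png'));
Kx = [-1 0 1; -2 0 2; -1 0 1];       % Sobel kernel for the horizontal derivative
Ky = Kx';                            % Sobel kernel for the vertical derivative
Dx = conv2(A, Kx, 'same');           % horizontal derivative, as in equation 3.3
Dy = conv2(A, Ky, 'same');           % vertical derivative, as in equation 3.4
magnitude = hypot(Dx, Dy);           % |grad D| = sqrt(Dx.^2 + Dy.^2), equation 3.1
direction = atan2(Dy, Dx);           % angle of the gradient vector, equation 3.2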

Mathematically, Sobel edge detection is implemented using two 3x3 convolution masks or kernels, one for the horizontal direction and one for the vertical direction, that approximate the derivatives in those directions. The derivatives in the x and y directions are calculated by 2D convolution of the original image with the convolution masks. If A is the original image and Dx and Dy are the derivatives in the x and y directions respectively, equations 3.3 and 3.4 show how the directional derivatives are calculated [26]. The matrices are a representation of the convolution kernels that are used.

3.2.2 Prewitt Edge Detector

The Prewitt edge detector is similar to the Sobel detector because it also approximates the derivatives using convolution kernels to find the localized orientation of each pixel in an image. The convolution kernels used in Prewitt are different from those in Sobel: Prewitt is more prone to noise than Sobel, as it does not give extra weighting to the current pixel when calculating the directional derivative at that point [15][26]. This is the reason why Sobel has a weight of 2 in the middle column and Prewitt has a 1 [26]. Equations 3.5 and 3.6 show the difference between the Prewitt and Sobel detectors by giving the kernels for Prewitt. The same variables as in the Sobel case are used; only the kernels used to calculate the directional derivatives differ.

3.2.3 Roberts' Edge Detector

The Roberts edge detector, also known as the Roberts Cross operator, finds edges by calculating the square root of the sum of the squares of the differences between diagonally adjacent pixels [26][15]. In simple terms, it calculates the magnitude of the difference between the pixel in question and its diagonally adjacent pixels. It is one of the oldest methods of edge detection, and its performance decreases if the images are noisy, but the method is still used as it is simple, easy to implement, and faster than other methods. The implementation is done by convolving the input image with 2x2 kernels.

3.2.4 Canny Edge Detector

The Canny edge detector is considered a very effective edge detecting technique, as it detects faint edges even when the image is noisy. This is because, at the beginning of the process, the data is convolved with a Gaussian filter. The Gaussian filtering results in a blurred image, so the output of the filter does not depend on a single noisy pixel (an outlier). Then the gradient of the image is calculated, as in other filters like Sobel and Prewitt. Non-maximal suppression is applied after the gradient so that pixels that are below a certain threshold are suppressed. A multi-level thresholding technique involving two levels, like the example in 2.4, is then used on the data.

If the pixel value is less than the lower threshold, it is set to 0, and if it is greater than the higher threshold, it is set to 1. If a pixel falls between the two thresholds and is adjacent or diagonally adjacent to a high-valued pixel, it is set to 1; otherwise it is set to 0 [26]. Figure 3.5 shows the X-ray image and the image after Canny edge detection.
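A side-by-side sketch of the four detectors discussed in this section, using the toolbox's edge function with its default thresholds:

I = imread('coins.png');
Es = edge(I, 'sobel');
Ep = edge(I, 'prewitt');
Er = edge(I, 'roberts');
Ec = edge(I, 'canny');               % Gaussian smoothing, non-maximal suppression, hysteresis
subplot(2,2,1); imshow(Es); title('Sobel')
subplot(2,2,2); imshow(Ep); title('Prewitt')
subplot(2,2,3); imshow(Er); title('Roberts')
subplot(2,2,4); imshow(Ec); title('Canny')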

3.3 Image Segmentation

3.3.1 Texture Analysis

Texture analysis attempts to use the texture of the image to analyze it. It attempts to quantify the visual or other simple characteristics so that the image can be analyzed according to them [23]. For example, visible properties of an image like roughness or smoothness can be converted into numbers that describe the pixel layout or brightness intensity in the region in question. In the bone segmentation problem, texture-based processing can be used because bones are expected to have more texture than the mesh. Range filtering and standard deviation filtering were the texture analysis techniques used in this thesis. Range filtering calculates the local range of an image.
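Both texture measures are available as standard Image Processing Toolbox functions; a minimal sketch with their default 3x3 neighbourhoods:

I = imread('coins.png');
R = rangefilt(I);     % local range (max - min) of each 3x3 neighbourhood
S = stdfilt(I);       % local standard deviation of each 3x3 neighbourhood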

3. Principal Curvature-Based Region Detector

3.1. Principal Curvature Image

Two types of structures have high curvature in one direction and low curvature in the orthogonal direction: lines (i.e., straight or nearly straight curvilinear features) and edges. Viewing an image as an intensity surface, the curvilinear structures correspond to ridges and valleys of this surface. The local shape characteristics of the surface at a particular point can be described by the Hessian matrix,

    H(x, σ_D) = [ I_xx(x, σ_D)   I_xy(x, σ_D)
                  I_xy(x, σ_D)   I_yy(x, σ_D) ],    (1)

where I_xx, I_xy and I_yy are the second-order partial derivatives of the image evaluated at the point x, and σ_D is the Gaussian scale of the partial derivatives.

We note that both the Hessian matrix and the related second moment matrix have been applied in several other interest operators (e.g., the Harris [7], Harris-affine [19], and Hessian-affine [18] detectors) to find image positions where the local image geometry is changing in more than one direction. Likewise, Lowe's maximal difference-of-Gaussian (DoG) detector [13] also uses components of the Hessian matrix (or at least approximates the sum of the diagonal elements) to find points of interest. However, our PCBR detector is quite different from these other methods and is complementary to them. Rather than finding extremal points, our detector applies the watershed algorithm to ridges, valleys, and cliffs of the image principal-curvature surface to find regions. As with extremal points, the ridges, valleys, and cliffs can be detected over a range of viewpoints, scales, and appearance changes.

Many previous interest point detectors [7, 19, 18] apply the Harris measure (or a similar metric [13]) to determine a point's saliency. The Harris measure is given by

    det(A) − k · tr²(A) > threshold,

where det is the determinant, tr is the trace, and the matrix A is either the Hessian matrix or the second moment matrix. One advantage of the Harris metric is that it does not require explicit computation of the eigenvalues. However, computing the eigenvalues of a 2x2 matrix requires only a single Jacobi rotation to eliminate the off-diagonal term, I_xy, as noted by Steger [25]. The Harris measure produces low values for long structures that have a small first or second derivative in one particular direction. Our PCBR detector complements previous interest point detectors in that we abandon the Harris measure and exploit those very long structures as detection cues.

The principal curvature image is given by either

    P(x) = max(λ1(x), 0)    (2)

or

    P(x) = min(λ2(x), 0)    (3)

where λ1(x) and λ2(x) are the maximum and minimum eigenvalues, respectively, of H at x. Eq. 2 provides a high response only for dark lines on a light background (or on the dark side of edges), while Eq. 3 is used to detect light lines against a darker background.

Like SIFT [13] and other detectors, principal curvature images are calculated in scale space. We first double the size of the original image to produce our initial image, I11, and then produce increasingly Gaussian-smoothed images, I1j, with scales of σ = k^(j−1), where k = 2^(1/3) and j = 2..6. This set of images spans the first octave, consisting of six images, I11 to I16. Image I14 is downsampled to half its size to produce image I21, which becomes the first image in the second octave. We apply the same smoothing process to build the second octave, and continue to create a total of n = log2(min(w, h)) − 3 octaves, where w and h are the width and height of the doubled image, respectively.
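A MATLAB sketch of the principal curvature image of Eq. 2 at a single scale, using the closed-form eigenvalues of the 2x2 Hessian with finite-difference derivatives (the image and scale are illustrative; the incremental octave construction is omitted):

I = double(imread('coins.png'));
sigmaD = 2;                                % illustrative Gaussian scale
Ig = imgaussfilt(I, sigmaD);               % smooth at scale sigmaD
[Ix, Iy] = gradient(Ig);                   % first-order partial derivatives
[Ixx, Ixy] = gradient(Ix);                 % second-order partial derivatives
[~, Iyy] = gradient(Iy);
disc = sqrt(((Ixx - Iyy) / 2).^2 + Ixy.^2);
lambda1 = (Ixx + Iyy) / 2 + disc;          % maximum eigenvalue of the Hessian at each pixel
P = max(lambda1, 0);                       % principal curvature image, Eq. 2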

Finally, we calculate a principal curvature image, P_ij, for each smoothed image by computing the maximum eigenvalue (Eq. 2) of the Hessian matrix at each pixel. For computational efficiency, each smoothed image and its corresponding Hessian image is computed from the previous smoothed image using an incremental Gaussian scale.

Given the principal curvature scale space images, we calculate the maximum curvature over each set of three consecutive principal curvature images to form the following set of four images in each of the n octaves:

    MP12  MP13  MP14  MP15
    MP22  MP23  MP24  MP25
    ...
    MPn2  MPn3  MPn4  MPn5

where MP_ij = max(P_i,j−1, P_ij, P_i,j+1).    (4)

Figure 2(b) shows one of the maximum curvature images, MP, created by maximizing the principal curvature at each pixel over three consecutive principal curvature images. From these maximum principal curvature images we find the stable regions via our watershed algorithm.

3.2. Enhanced Watershed Region Detection

The watershed transform is an efficient technique that is widely employed for image segmentation. It is normally applied either to an intensity image directly or to the gradient magnitude of an image. We instead apply the watershed transform to the principal curvature image. However, the watershed transform is sensitive to noise (and other small perturbations) in the intensity image; small image variations form local minima that result in many small watershed regions. Figure 3(a) shows the over-segmentation that results when the watershed algorithm is applied directly to the principal curvature image in Figure 2(b). To achieve a more stable watershed segmentation, we first apply a grayscale morphological closing followed by hysteresis thresholding. The grayscale morphological closing operation is defined as

    f • b = (f ⊕ b) ⊖ b,

where f is the image MP from Eq. 4, b is a 5x5 disk-shaped structuring element, and ⊕ and ⊖ are grayscale dilation and erosion, respectively. The closing operation removes small potholes in the principal curvature terrain, thus eliminating many local minima that result from noise and that would otherwise produce watershed catchment basins. A sketch of this step appears below.

Beyond the small (in terms of area of influence) local minima, there are other variations that have larger zones of influence and that are not reclaimed by the morphological closing. To further eliminate spurious or unstable watershed regions, we threshold the principal curvature image to create a clean, binarized principal curvature image.
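A sketch of the closing step, assuming MP holds a maximum principal curvature image from Eq. 4 (strel('disk', 2) is MATLAB's 5x5 disk approximation):

se = strel('disk', 2);          % 5x5 disk-shaped structuring element
MPc = imclose(MP, se);          % grayscale dilation followed by erosion removes small potholes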

However, rather than apply a straight threshold, or even hysteresis thresholding (both of which can still miss weak image structures), we apply a more robust eigenvector-guided hysteresis thresholding to help link structural cues and remove perturbations. Since the eigenvalues of the Hessian matrix are directly related to the signal strength (i.e., the line or edge contrast), the principal curvature image may at times become weak due to low-contrast portions of an edge or curvilinear structure. These low-contrast segments may cause gaps in the thresholded principal curvature image, which in turn cause watershed regions to merge that should otherwise be separate. However, the directions of the eigenvectors provide a strong indication of where curvilinear structures appear, and they are more robust to these intensity perturbations than is the eigenvalue magnitude.

In eigenvector-flow hysteresis thresholding, there are two thresholds (high and low), just as in traditional hysteresis thresholding. The high threshold (set at 0.04) indicates a strong principal curvature response. Pixels with a strong response act as seeds that expand to include connected pixels that are above the low threshold. Unlike traditional hysteresis thresholding, our low threshold is a function of the support that each pixel's major eigenvector receives from neighboring pixels. Each pixel's low threshold is set by comparing the direction of its major (or minor) eigenvector to the directions of the major (or minor) eigenvectors of the 8 adjacent pixels. This can be done by taking the absolute value of the inner product of a pixel's normalized eigenvector with that of each neighbor. If the average dot product over all neighbors is high enough, we set the low-to-high threshold ratio to 0.2 (for a low threshold of 0.04 × 0.2 = 0.008); otherwise the low-to-high ratio is set to 0.7 (giving a low threshold of 0.028). The threshold values are based on visual inspection of detection results on many images.

Figure 4 illustrates how the eigenvector flow supports an otherwise weak region. The red arrows are the major eigenvectors, and the yellow arrows are the minor eigenvectors; to improve visibility, they are drawn at every fourth pixel. At the point indicated by the large white arrow, the eigenvalue magnitudes are small and the ridge there is almost invisible. Nonetheless, the directions of the eigenvectors are quite uniform. This eigenvector-based active thresholding process yields better performance in building continuous ridges and in handling perturbations, which results in more stable regions (Fig. 3(b)).

The final step is to perform the watershed transform on the clean binary image (Fig. 2(c)). Since the image is binary, all black (or 0-valued) pixels become catchment basins, and the midlines of the thresholded white ridge pixels become watershed lines if they separate two distinct catchment basins.
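A condensed MATLAB sketch of the eigenvector-flow hysteresis thresholding, continuing from the Hessian quantities (Ixx, Ixy, Iyy) and the closed image MPc above; the cutoff of 0.9 on the average neighbour agreement is an assumed value, since the text only says "high enough":

% Major eigenvector of the 2x2 Hessian at each pixel (up to sign)
lambda1 = (Ixx + Iyy)/2 + sqrt(((Ixx - Iyy)/2).^2 + Ixy.^2);
vx = Ixy;  vy = lambda1 - Ixx;
nrm = max(hypot(vx, vy), eps);
vx = vx ./ nrm;  vy = vy ./ nrm;            % normalized eigenvector components
% Average |dot product| with the 8 neighbouring eigenvectors
support = zeros(size(vx));
for dr = -1:1
    for dc = -1:1
        if dr == 0 && dc == 0, continue; end
        support = support + abs(vx .* circshift(vx, [dr dc]) + ...
                                vy .* circshift(vy, [dr dc]));
    end
end
support = support / 8;
% Per-pixel low-to-high ratio: 0.2 where the flow is uniform, 0.7 elsewhere
ratio = 0.7 * ones(size(support));
ratio(support > 0.9) = 0.2;                 % assumed "high enough" cutoff
high = 0.04;
seeds = MPc > high;                         % strong-response seed pixels
candidates = MPc > high .* ratio;           % pixels above their local low threshold
clean = imreconstruct(seeds, candidates);   % hysteresis: grow seeds through candidates
L = watershed(clean);                       % 0-valued pixels form the catchment basins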

To define the interest regions of the PCBR detector at one scale, the resulting segmented regions are fit with ellipses, via PCA, that have the same second moments as the watershed regions (Fig. 2(e)).

3.3. Stable Regions Across Scale

Computing the maximum principal curvature image (as in Eq. 4) is only one way to achieve stable region detections. To further improve robustness, we adopt a key idea from MSER and keep only those regions that can be detected in at least three consecutive scales. Similar to the process of selecting stable regions via thresholding in MSER, we select regions that are stable across local scale changes. To achieve this, we compute the overlap error of the detected regions across each triplet of consecutive scales in every octave. The overlap error is calculated in the same way as in [19]. Overlapping regions that are detected at different scales normally exhibit some variation. This variation is valuable for object recognition because it provides multiple descriptions of the same pattern. An object category normally exhibits large within-class variation in the same area. Since detectors have difficulty locating the interest area accurately, rather than attempting to detect the correct region and extract a single descriptor vector, it is better to extract multiple descriptors for several overlapping regions, provided that these descriptors are handled properly by the classifier.

2. BACKGROUND AND RELATED WORK

Consider an RGB image of a passage in a painting consisting of open brush strokes, that is, where lower-layer strokes are visible. The task of recovering layers of strokes involves mainly three steps: 1. partition the image into regions with consistent colors/shapes, corresponding to different layers of strokes; 2. identify the current top layer; 3. inpaint the regions of the top layer. The three steps are repeated while more than two layers remain.

2.1 De-pict Algorithm

Given an image as input, the De-pict algorithm starts by applying k-means and complete-linkage clustering to obtain chromatically consistent regions. Under the assumption that brush strokes of the same layer of the painting have similar colors, regions in different clusters are good representatives of brush strokes at different layers in the painting, as shown in Fig. 3c. Note that after the clustering step, each pixel of the image is assigned a label corresponding to its assigned cluster, and each label can be described by the mean chromatic feature vector. A minimal sketch of this clustering step follows.
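The sketch below uses the toolbox's k-means image segmentation; the input file name and the number of layers k are illustrative assumptions, and the complete-linkage refinement used by De-pict is omitted:

I = im2single(imread('painting.png'));    % hypothetical detail of a painting
k = 3;                                    % assumed number of stroke layers
[L, centers] = imsegkmeans(I, k);         % per-pixel labels and mean chromatic vectors
imshow(labeloverlay(I, L))                % visualize the chromatically consistent regions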

Then the top layer is identified by human experts, based on visual occlusion cues and the like. Ideally, this step should be fully automatic, but automating it is challenging and is not the focus of our current work. Lastly, the regions of the top layer are removed and inpainted by a k-nearest-neighbor algorithm.

3.1 Spatially Coherent Segmentation

We improve the layer segmentation by incorporating k-means and spatial coherence regularity in an iterative E-M fashion [10, 11]. We model the appearances of brush strokes of different layers by a set of feature centers (mean chromatic vectors, as in k-means). In other words, we assume that each layer is modeled as an independent Gaussian with the same covariance, differing only in the mean. Given the initial models, i.e., the k mean chromatic vectors, we refine the segmentation with spatially coherent priors by minimizing the following energy function (E-step):

    min_L  Σ_p ||f_p − c_{L_p}||_2^2  +  λ Σ_{{p,q} ∈ N} |e_{p,q}| · δ(L_p ≠ L_q),    (1)

where L_p ∈ {1, ..., k} is the cluster label of pixel p, f_p is the color feature of pixel p, c_i is the color model for cluster i, |e_{p,q}| is the edge length between p and q, and δ is the delta function. The first term in Eq. (1) measures the appearance similarity between the pixels and the clusters they are assigned to, and the second term penalizes the situation where pixels in the same neighborhood N belong to different clusters. Fixing the k appearance models, the minimization problem can be solved with a graph-cut algorithm [12]. The solution gives us the optimal labeling of pixels to clusters under spatial regularization. After the spatially coherent refinement, we re-estimate the k models as the mean chromatic vectors (M-step). We then iterate the E and M steps until convergence or until a predefined number of iterations is reached.

3.2 Curvature-Based Inpainting

Unlike exemplar-based inpainting methods, curvature-based inpainting methods focus on reconstructing the geometric structure of (chromatic) intensities, which is usually represented by level lines [5, 7]. Here, level lines can be contours that connect pixels of the same gray/chromatic intensity in an image. Such methods are therefore well suited for inpainting images with no or very little texture, because level lines concisely capture the structure and information of texture-less regions. In van Gogh's paintings, the brush strokes at each layer are close to textureless. Therefore, curvature-based inpainting can be superior to exemplar-based methods (for instance, those in De-pict and Criminisi et al. [3]) for recovering the structures of underlying brush strokes.

In this paper, we evaluate the recent method proposed by Schoenemann et al. [7] that formulates curvature-based inpainting as a linear program. Unlike other methods, this method is independent of initialization and can handle general inpainting regions, e.g., regions with holes. In the following, we briefly review Schoenemann et al.'s method.

To formulate the problem as a linear program, curvature is modeled in a discrete sense (Fig. 4 shows a possible reconstruction of the level line with intensity 100). Specifically, we impose a discrete grid of a certain connectivity (8-connectivity in Fig. 4) on the image. The edges constitute line segments, and pairs of line segments are used to represent level lines, while the basic regions represent the pixels. Then, for each potential discrete level line, the curvature is approximated by the sum of the angle changes at all vertices along the level line, with proper weighting by the edge length. To ensure that regions and level lines are consistent (for instance, level lines should be continuous), two sets of linear constraints, surface continuation constraints and boundary continuation constraints, are imposed on the variables. Finally, the boundary condition (the intensities of the boundary pixels) of the damaged region can also easily be formulated as linear constraints. With proper handling of all these constraints, the inpainting problem can be solved as a linear program. To handle color images, we simply formulate and solve a linear program for each chromatic channel independently.

II. MATERIALS AND METHODS

A. Data Retrieval

In this study, data was collected from the Dr. Siyami Ersek Thoracic and Cardiovascular Surgery Training and Research Hospital. All pulmonary computed tomographic angiography exams were performed with 16-detector CT equipment (Somatom Sensation 16, Siemens AG, Erlangen, Germany). Patients were informed about the examination and instructed on breath holding. Imaging was performed with a bolus tracking program. After the scanogram, a single slice is taken at the level of the pulmonary truncus.

the pulmonary truncus. A bolus-tracking region was placed at the pulmonary truncus and the trigger was set to 100 HU (Hounsfield units). 70 mL of nonionic contrast agent was delivered at a rate of 4 mL/s with an automated syringe (Optistat Contrast Delivery System, Liebel-Flarsheim, USA). When opacification reached the preset level, the exam was performed from the supraclavicular region to the diaphragm. Contrast was injected via an 18-20G intravenous cannula placed in the antecubital vein. Scanning parameters were 120 kV, 80-120 mA, slice thickness 1 mm, pitch 1.0-1.2. Images were reconstructed at 1 mm and 5 mm thickness and evaluated in the mediastinal window (WW 300, WL 50) on an advanced workstation (Wizard, Siemens AG, Erlangen, Germany) in coronal, sagittal, and axial planes. Oblique planes were used if needed. Each exam consists of 400-500 images at 512x512 resolution.

B. Method

The stages followed for lung segmentation from the CTA images in this work are shown in Figure 1.

The CTA images at hand number 250, each a 2D slice. The first step is thresholding the image. A thoracic CT contains two main groups of pixels: 1) high-intensity pixels located in the body (body pixels), and 2) low-intensity pixels in the lungs and the surrounding air (non-body pixels). Due to the large difference in intensity between these two groups, thresholding leads to a good separation. In this study, thresholding is applied first, keeping only the parts with values greater than 700. After thresholding, the new images are binary (logical) images:

Thresh = image > 700;
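As a minimal illustration of this step (the file name below is hypothetical; 700 is the threshold quoted above):

% Sketch of the thresholding step on one 2D CTA slice.
cta = dicomread('slice001.dcm');   % hypothetical input file
Thresh = cta > 700;                % logical mask of high-intensity body pixels
figure, imshow(Thresh);            % binary image: body vs. lung/air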

In each of these new binary images, subsegmental vessels remain inside the lung regions. In the second step, these vessels are removed as follows: each 2D image is considered one by one, the components in the image are labeled with a connected-component labeling algorithm, and then, looking at the size of each labeled piece, components containing fewer than 1000 pixels are removed from the image (Figure 3).
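In MATLAB this labeling-and-filtering step can be sketched in a single call; bwareaopen labels connected components internally and keeps only the large ones (the 1000-pixel threshold is the one quoted above):

% Remove labeled components smaller than 1000 pixels (the vessel cleanup).
Clean = bwareaopen(Thresh, 1000);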

Next, the image in Figure 3 is labeled with the connected-component labeling algorithm. The largest component of logical 1s is the patient's body; this largest component is kept and the other parts are removed from the image. The result is then inverted, so that every 0 becomes 1 and every 1 becomes 0 (Figure 4). Since the parts outside the body in the image shown in Figure 4 reach the 1st or 512th pixel (the image border), the components meeting this condition are removed, and the lungs and airway appear as in Figure 5 (segmentation of lung and airway). Because the airway in Figure 5 is very small compared to the lungs, each image is labeled once more with the connected-component labeling algorithm, and components with fewer than 1000 pixels are identified as airways and removed from the image. The final image is the segmented form of the target lung. Before the airways were removed, the edges of the image were found with the Sobel algorithm and overlaid on the original image, so that the edges of the lung and airway regions are shown on the original image (Figure 6(b)). Also, by multiplying the defined lung region with the original CTA image, the segmented lung with original intensities is obtained (Figure 6(c)).
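The whole chain described above can be sketched in MATLAB as follows. This is a schematic reconstruction under stated assumptions, not the study's code: the variable names and input file are ours, and the standard toolbox routines bwareafilt and imclearborder stand in for the labeling steps described in the text.

% Schematic sketch of the lung-segmentation pipeline (assumed names).
cta   = dicomread('slice001.dcm');        % hypothetical input slice
body  = bwareaopen(cta > 700, 1000);      % threshold, then drop small vessels
body  = bwareafilt(body, 1);              % keep the largest component: the body
cav   = ~body;                            % invert: lungs and outside air become 1
lungs = imclearborder(cav);               % remove parts touching the image border
lungs = bwareaopen(lungs, 1000);          % remove the small airway components
e     = edge(double(lungs), 'sobel');     % lung edges, cf. Figure 6(b)
seg   = double(lungs) .* double(cta);     % masked lung image, cf. Figure 6(c)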

MATLAB

MATLAB and the Image Processing Toolbox provide a wide range of advanced image processing functions and interactive tools for enhancing and analyzing digital images. The interactive tools allowed us to perform spatial image transformations, morphological operations, edge detection and noise removal, region-of-interest processing, filtering, basic statistics, curve fitting, FFT, DCT, and the Radon transform. Making graphics objects semi-transparent is a useful technique in 3-D visualization that conveys more information about the spatial relationships of different structures. MATLAB itself is a high-level technical language and interactive environment for data analysis and mathematical computing, with functions for signal processing, optimization, partial differential equation solving, etc. It provides interactive tools for thresholding, correlation, Fourier analysis, filtering, basic statistics, curve fitting, matrix analysis, and 2D and 3D plotting. The image processing operations allowed us to perform noise reduction and image enhancement, image transforms, colormap manipulation, colorspace conversions, region-of-interest processing, and geometric operations [4]. The toolbox functions, implemented in the open MATLAB language, were also used to develop the customized algorithms.
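As a small taste of the operations named above, a hedged sketch using a demo image that ships with MATLAB (not code from the original study):

I = imread('cameraman.tif');         % built-in demo image
J = medfilt2(I, [3 3]);              % noise removal with a median filter
E = edge(J, 'canny');                % edge detection
F = fftshift(abs(fft2(double(J))));  % FFT magnitude spectrum
figure, imshow(E);                   % display the detected edges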

An X-ray computed tomography (CT) image is composed of pixels whose brightness corresponds to the absorption of X-rays in a thin rectangular slab of the cross-section, called a voxel [1,3]. The Pixel Region tool provided by MATLAB 7.0.1 superimposes the pixel-region rectangle over the image displayed in the Image Tool, defining the group of pixels that are displayed, in extreme close-up view, in the Pixel Region tool window. The Pixel Region tool shows the pixels at high magnification, overlaying each pixel with its numeric value [2,5]. For RGB images, there are three numeric values, one for each band of the image. We can also determine the current position of the pixel region in the target image by using the pixel information given at the bottom of the tool. In this way we found the x- and y-coordinates of pixels in the target image coordinate system. The Adjust Contrast tool displays a histogram representing the dynamic range of the X-ray CT image (Figure 1).
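Both tools can also be launched programmatically; a minimal sketch (the image file name is a placeholder):

I = imread('ct_slice.png');   % placeholder CT image file
h = imshow(I);                % display the slice
impixelregion(h);             % open the Pixel Region tool on this image
imcontrast(h);                % open the Adjust Contrast tool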

Figure 1. Pixel Region of an X-ray CT scan and the Adjust Contrast tool

The Image Processing Toolbox provides reference-standard algorithms and graphical tools for image analysis tasks, including edge-detection and image-segmentation algorithms, image transformation, measurement of image features, and statistical functions such as mean, median, standard deviation, range, etc. (Figure 2).

3. PLOT TOOLS

MATLAB provides a collection of plotting tools to generate various types of graphs, such as displaying the image histogram or plotting the profile of intensity values (Figures 3a, b).
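A sketch of both plot types (reloading the placeholder image from above; the profile row is an arbitrary choice):

I = imread('ct_slice.png');                       % placeholder CT image
figure, imhist(I);                                % image histogram, as in Figure 3a
figure, improfile(I, [1 size(I,2)], [128 128]);   % intensity profile along one row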

Figure 3.a. - The histogram of the X-ray CT image and the plotted fits (significant digits: 2). A cubic function is the best-fit model for the histogram data; the fitted curve is plotted as a magenta line through the data. An area graph of the X-ray CT brain scan displays the elements in a variable as one or more curves and fills the area beneath each curve.
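The cubic fit can be reproduced along these lines (a sketch; the bin counts from imhist serve as the data to fit):

[counts, bins] = imhist(I);             % histogram data from the CT image
p = polyfit(bins, counts, 3);           % cubic (degree-3) least-squares fit
fitc = polyval(p, bins);                % evaluate the fitted curve
figure, bar(bins, counts); hold on;
plot(bins, fitc, 'm', 'LineWidth', 2);  % magenta fit line, as in Figure 3a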

Figure 3.b. - Area Graph of X-ray CT brain scan

The 3-D Surface Plot displays a matrix as a surface (Figures 4 a, b, c, d). We can also make the faces of a surface transparent to a varying degree. Transparency (referred to as the alpha value) can be specified for the whole 3D object or can be based on an alphamap, which behaves in a way analogous to colormaps (Figures 4 a, b).

Figure 4.a. 3D Surface Plot of x-ray CT brain scan generated with histogram values, alpha(.0)

Figure 4.b. - 3D Surface Plot of x-ray CT brain scan generated with histogram values, alpha(.4)

The meshgrid function is extremely useful for computing a function of two Cartesian coordinates. It transforms the domain specified by a single vector or two vectors x and y into matrices X and Y for use in evaluating functions of two variables. The rows of X are copies of the vector x, and the columns of Y are copies of the vector y (Figure 4.c).
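A generic illustration of meshgrid feeding a semi-transparent surface (a stand-in function, not the histogram data used in the figures):

[X, Y] = meshgrid(-2:0.1:2, -2:0.1:2);  % grid of two Cartesian coordinates
Z = X .* exp(-X.^2 - Y.^2);             % a sample function of two variables
figure, surf(X, Y, Z);                  % 3-D surface plot
alpha(0.4);                             % semi-transparent faces, cf. Figure 4b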

Figure 4.c. - 3D Surface Plot of x-ray CT brain scan generated with histogram values, mesh

3-D Surface Plot with Contour (surfc) displays a matrix as a surface with a contour plot below it. Lighting is the technique of illuminating an object with a directional light source, which can make subtle differences in surface shape easier to see and add realism to three-dimensional graphs. This example uses the same surface as the previous examples, but colors it yellow and removes the mesh lines (Figure 4.d).
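A sketch of surfc with directional lighting, reusing the X, Y, Z from the meshgrid sketch above (the yellow coloring mirrors the description):

h = surfc(X, Y, Z);                          % surface with a contour plot below it
set(h(1), 'FaceColor', [1 1 0], 'EdgeColor', 'none');  % yellow faces, no mesh lines
camlight headlight;                          % directional light source
lighting gouraud;                            % smooth lighting across the faces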

Figure 4.d. - Surface Plot of x-ray CT brain scan generated with histogram values, lighting

The image function creates an X-ray CT image graphics object by interpreting each element in a matrix as an index into the figure's colormap or directly as RGB values, depending on the data specified (Figure 5.a). Image with colormap scaling (the imagesc function) displays an X-ray CT image scaled to use the full colormap. MATLAB supports a number of colormaps. A colormap is an m-by-3 matrix of real numbers between 0.0 and 1.0; each row is an RGB vector that defines one color. 'Jet' ranges from blue to red, passing through cyan, yellow, and orange; it is a variation of the hsv (hue, saturation, value) colormap (Figure 5.b).
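A minimal sketch of scaled display with the jet colormap (reloading the placeholder image):

I = imread('ct_slice.png');   % placeholder CT image
figure, imagesc(I);           % display scaled to use the full colormap
colormap(jet);                % blue-to-red colormap, cf. Figure 5b
colorbar;                     % show the mapping from values to colors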

Contour plots are useful for delineating organ boundaries in images. A contour plot displays the isolines of a surface represented by a matrix (Figure 6).
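A sketch of contour, together with its 3-D variant contour3 described below (ten isolines is an arbitrary choice):

figure, contour(double(I), 10);    % ten isolines of the image intensity surface
figure, contour3(double(I), 10);   % the same isolines drawn in 3-D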

Figure 6. - Contour Plot of X-ray CT brain scan

The ezsurfc(f) function, the string-based counterpart of surfc, creates a combined surface and contour graph of f(x,y), where f is a string representing a mathematical function of two variables such as x and y (Figure 7).

Figure 7. - Surfc on X-ray CT brain scan

The contour3 function creates a three-dimensional contour plot of a surface defined on a rectangular grid (Figure 8.).

Figure 8. - Contour3 on X-ray CT brain scan

The 3-D Lit Surface Plot (a surface plot with colormap-based lighting, the surfl function) displays a shaded surface based on a combination of ambient, diffuse, and specular lighting models (Figure 9).

Figure 9. - 3D Lit Surface Plot of X-ray CT brain scan

The 3-D Ribbon Graph of a matrix displays the matrix by graphing its columns as segmented strips (Figure 10).
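A sketch of both plots; the subsampling below is our addition to keep the surfaces legible (these calls render generic views, not the exact figures):

figure, surfl(double(I(1:8:end, 1:8:end)));    % lit surface on a subsampled slice
shading interp; colormap(pink);                 % typical surfl display settings
figure, ribbon(double(I(1:64:end, 1:16:end))); % columns drawn as 3-D strips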

Figure 10. - The 3-D Ribbon Graph of X-ray CT brain scan

4. FILTER VISUALIZATION TOOL (FVTool)

The Filter Visualization Tool (FVTool) computes the magnitude response of a digital filter defined by its numerator coefficients b and denominator coefficients a. Using FVTool we can also display the phase response, group delay response, impulse response, step response, pole/zero plot, filter coefficients, and round-off noise power spectrum (Figures 11, 12, 13, 14, 15, 16 and 17).
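A minimal sketch, assuming the Signal Processing Toolbox's butter for a sample filter (the filter order and cutoff are arbitrary):

[b, a] = butter(4, 0.3);    % example 4th-order Butterworth low-pass filter
h = fvtool(b, a);           % magnitude response of the filter, cf. Figure 11
h.Analysis = 'grpdelay';    % switch the view to group delay, cf. Figure 12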

Figure 11. Magnitude and Phase Response - Frequency scale: a) linear, b) log

Figure 12. Group Delay Response - Frequency scale: a) linear, b) log

Figure 13. Phase Delay Response - Frequency scale: a) linear, b) log

Figure 14. (a) Impulse Response (b) Pole/Zero Plot

Figure 15. Step Response (a) Default (b) Specify Length: 50

Figure 16. Magnitude Response Estimate - Frequency scale: a) linear, b) log

Figure 17. Magnitude Response and Round-off Noise Power Spectrum - Frequency scale: a) linear, b) log

Chapter 4

This chapter describes the workings of a typical ASM. Although many extensions and modifications have been made, the basic ASM models work the same way. Cootes and Taylor [9] give a complete description of the classical ASM. Section 4.1 introduces shapes and shape models in general. Section 4.2 describes the workings and the components of the ASM. The parameters and variations that affect the performance of the ASM are explained in Section 4.3, along with the experiments performed in this thesis to improve the performance of the model. The problem of initializing the model in a test image is tackled in Section 4.4. Section 4.5 elaborates on the training of the ASM and the definition of an error function, according to which the performance of the ASM on bone X-rays will be judged.

4.1 Shape Models

A shape is a collection of points. As shown in Figure 4.1, a shape can be represented by a diagram showing the points, or as an n x 2 array whose n rows represent the points and whose two columns hold the x and y coordinates respectively. In this thesis and in the code used, a shape is defined as a 2n x 1 vector in which the y coordinates are listed after the x coordinates, as shown in Figure 4.1c. A shape is the basic building block of any ASM, as it stays the same even if it is scaled, rotated or translated. The lines connecting the points are not part of the shape, but they are shown to make the shape and the order of the points clearer [24].

Figure 4.1: Example of a shape
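A small sketch of this representation, together with the centroid and size defined below (the coordinates are made up for illustration):

% A toy 4-point shape as an n-by-2 array, then as a 2n-by-1 vector.
pts = [10 20; 30 22; 32 40; 12 38];   % n x 2: rows are points (x, y)
shape = [pts(:,1); pts(:,2)];         % 2n x 1: all x's, then all y's
centroid = mean(pts, 1);              % mean of the point positions
d = pts - centroid;                   % offsets from the centroid
sz = sqrt(mean(sum(d.^2, 2)));        % root mean distance: the shape "size"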

The distance between two points is the Euclidean distance between them. Equation 4.1 gives the formula for the Euclidean distance between two points (x1, y1) and (x2, y2):

d = sqrt((x2 - x1)^2 + (y2 - y1)^2)     (4.1)

The distance between two shapes can be defined as the distance between their corresponding points [24]. There are other ways of defining distances between two points, such as the Procrustes distance, but in this thesis distance means the Euclidean distance.

The centroid of a shape x can be defined as the mean of the point positions [24]. The centroid can be useful when aligning shapes or finding an automatic initialization technique (discussed in Section 4.4). The size of the shape is the root mean distance between the points and the centroid. This can be used to measure the size of the test image, which will help with automatic initialization (discussed in Section 4.4).

Algorithm 1 Aligning shapes
Input: set of unaligned shapes
1. Choose a reference shape (usually the 1st shape)
2. Translate each shape so that it is centered on the origin
3. Scale the reference shape to unit size. Call this shape x0, the initial mean shape.
4. repeat
   (a) Align all shapes to the mean shape
   (b) Recalculate the mean shape from the aligned shapes
   (c) Constrain the current mean shape (align to x0, scale to unit size)
5. until convergence (i.e. the mean shape does not change much)
Output: set of aligned shapes, and mean shape

A MATLAB sketch of this alignment loop is given below.
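The following minimal sketch implements Algorithm 1 for shapes stored as n x 2 point arrays in a cell array. It is illustrative only: shapes are aligned by translation and scaling, and the rotation that a full Procrustes alignment would also solve for is omitted for brevity.

function [shapes, meanShape] = align_shapes(shapes)
% shapes: cell array of n-by-2 point arrays with corresponding points
center = @(s) s - mean(s, 1);                   % translate to the origin
unit   = @(s) s / sqrt(mean(sum(s.^2, 2)));     % scale to unit size
for i = 1:numel(shapes), shapes{i} = center(shapes{i}); end
meanShape = unit(shapes{1});                    % reference: the initial mean shape
for it = 1:20                                   % fixed iteration budget
    for i = 1:numel(shapes)                     % align each shape to the mean
        s = shapes{i};                          % (best scale-only fit here)
        shapes{i} = s * (s(:)' * meanShape(:)) / (s(:)' * s(:));
    end
    m = mean(cat(3, shapes{:}), 3);             % recalculate the mean shape
    meanShape = unit(center(m));                % constrain the current mean
end
end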

4.2 Active Shape Models

The ASM has to be trained using training images. In this project, the tibia bone was separated from a full-body X-ray (as shown in Figure 1.2), and those images were re-sized to the same dimensions. This ensured uniformity in the quality of the data being used. The training on the images was done by manually selecting landmarks, placed at approximately equal intervals and distributed uniformly over the bone boundary. Such images are called hand-annotated or manually landmarked training images. Figure 4.3 shows the original image and the manually landmarked image used for training. When performing tests with different numbers of landmark points, a subset of these landmark points is chosen. After the training images have been landmarked, the ASM produces two types of sub-models [24]: the profile model and the shape model.

1. The profile model analyzes the landmark points and stores the behaviour of the image around them. During training, the algorithm learns the characteristics of the area around each landmark point and builds a profile model for it accordingly. When searching for the shape in the test image, the area near the tentative landmarks is examined, and the model moves the shape to an area that fits the profile model closely. The tentative locations of the landmarks are obtained from the suggested shape.

2. The shape model defines the permissible relative positions of the landmarks, which introduces a constraint on the shape. So while the profile model tries to find the area in the test image that fits the model, the shape model ensures that the mean shape is not distorted. The profile model acts on individual landmarks, whereas the shape model acts globally on the image, and the two models correct each other until no further improvements in matching are possible.

4.2.1 The ASM Model

The aim of the model is to convert the shape proposed by the individual profiles into an allowable shape: it tries to find the area in the image that closely matches the profiles of the individual landmarks, while keeping the overall shape consistent. The shape is learnt from manually landmarked training images. These images are aligned and a mean shape is formulated together with the permissible variations about it [24],

x̂ = x̄ + Φb     (4.3)

where x̂ is the shape vector generated by the model, x̄ is the mean shape (the average of the aligned training shapes xi), Φ is the matrix of shape eigenvectors, and b is a vector of shape parameters.

4.2.2 Generating shapes from the model

As seen in Equation 4.3, different shapes can be generated by changing the value of b. The model is varied in height and width, finding optimum values for the landmarks. Figure 4.4 shows the mean shape and its whisker profiles superimposed on the bone X-ray image. The lines perpendicular to the model are called whiskers, and they help the profile model analyze the area around the landmark points. The shape created by the landmark points is used for the shape model, and the whisker profiles around the landmark points are used for the profile model. A profile and a covariance matrix are built for each landmark. It is assumed that the profiles are distributed as a multivariate Gaussian, so they can be described by their mean profile ḡ and covariance matrix Sg.

4.2.3 Searching the test image

After training is over, the shape is searched for in the test image. The mean shape calculated from the training images is imposed on the image, and the profiles around the landmark points are searched and examined. The profiles are offset 3 pixels along the whisker, which is perpendicular to the shape, to find the area that most closely resembles the mean shape [24]. The distance between a test profile g and the mean profile ḡ is calculated using the Mahalanobis distance, given by

d² = (g − ḡ)ᵀ Sg⁻¹ (g − ḡ).

If the model is initialized correctly (discussed in Section 4.4), one of the profiles will have the lowest distance. This procedure is done for every landmark point, after which the shape model confirms that the shape remains consistent with the mean shape. The shape model ensures that the profile model has not changed the shape; if the shape model were not employed, the profile model might give the best profile matches yet produce a completely different shape. So, as mentioned before, the two models restrict each other.

A multi-resolution search is done to make the model more robust. This enables the model to be more accurate, as it can lock on to the shape from further away: the model searches over a series of different resolutions of the same image, called an image pyramid. The resolutions of the images can be set and changed in the algorithm [17, 24]. Figure 4.5 shows a sample image pyramid.
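A compact MATLAB sketch of these two pieces, under assumed variable names: X is a 2n x N matrix of aligned training shape vectors (one shape per column), and g, gbar, Sg are a test profile, mean profile, and profile covariance already built during training.

% Shape model: PCA on the aligned training shapes.
t = 5;                                 % number of shape modes kept (illustrative)
xbar = mean(X, 2);                     % mean shape
[Phi, D] = eig(cov((X - xbar)'));      % eigen-decomposition of shape covariance
[lam, ord] = sort(diag(D), 'descend');
Phi = Phi(:, ord(1:t));                % keep the t largest variation modes
b = zeros(t, 1);
b(1) = 2 * sqrt(lam(1));               % perturb the first mode by +2 s.d.
xhat = xbar + Phi * b;                 % a generated shape, as in Eq. (4.3)

% Profile model: Mahalanobis distance of a test profile g to the mean gbar.
d2 = (g - gbar)' * (Sg \ (g - gbar));  % d^2 = (g - gbar)' inv(Sg) (g - gbar)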

The sizes of the images are given relative to the first image; a general picture, rather than a bone X-ray, is shown for illustration.

4.3 Parameters and Variations

The performance of the ASM can be enhanced by optimizing the parameters it depends on. The number of landmark points and the number of training images are investigated in this thesis. The number of landmark points is an important variable that affects the ASM. The profile model of the ASM works with these landmark points to create profiles, so the position of the landmark points is as important as their number. In the training images, landmark points are equally spaced along the boundary of the bone. Images are landmarked with 60 points, and subsets of these points are chosen to conduct experiments. The impact of the number of landmark points on computing time and on the mean error (defined in Section 4.5) is tested by running the algorithm with different numbers of landmarks. As the number of landmark points increases, the computing time is expected to increase and the error to decrease. The results are explained in Chapter 5.

A training set of images is used to train the ASM. As the number of training images increases, the model becomes more robust and better informed. The computing time is expected to increase, as it takes time to train and create profile models for each image. However, with more training images the mean profile improves and the model performs better, so the error is expected to decrease. The model in this thesis uses 12 images: 11 to train the ASM and 1 as a test image. Figure 4.6 gives an overview of the ASM: Figure 4.6a shows the unaligned shapes learnt from the training images, and Figure 4.6b displays the aligned shapes.

4.4 Initialization Problem

The Active Shape Model locks on to the shape learnt from the training images in the test image. It creates a mean shape profile from all the training images using landmark points. The ASM starts off where the mean shape is located, which may not be near the bone in a test image, so the model needs to be initialized, or started somewhere close to the bone boundary, in the test image. Experiments were conducted to see the effect of initialization on the error and on the tracking of the shape. It was observed that if the initialization is poor, meaning that the mean shape starts away from the bone in the test X-ray, the model does not lock on to the bone. The shape and profile models fail to perform, as the profile model looks for regions similar to those of the training images in regions away from the bone, and so it is unable to find the bone

as it is looking in a different region altogether. The error increases considerably if the mean shape starts 40-50 pixels away from the bone in the test image. Figure 4.7a shows such an initialization: the pink contour is the mean shape, and it starts away from the bone, so the result is poor tracking of the bone.

Chapter 5

OUTPUT SCREENS

REFERENCES

[1] H. Asada and M. Brady. The curvature primal sketch. PAMI, 8(1):2–14, 1986.
[2] A. Baumberg. Reliable feature matching across widely separated views. CVPR, pages 774–781, 2000.
[3] P. Beaudet. Rotationally invariant image operators. ICPR, pages 579–583, 1978.
[4] J. Canny. A computational approach to edge detection. PAMI, 8:679–698, 1986.
[5] R. Deriche and G. Giraudon. A computational approach for corner and vertex detection. IJCV, 10(2):101–124, 1992.

[6] T. G. Dietterich. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10(7):1895–1924, 1998.
[7] C. Harris and M. Stephens. A combined corner and edge detector. Alvey Vision Conf., pages 147–151, 1988.
[8] F. Jurie and C. Schmid. Scale-invariant shape features for recognition of object categories. CVPR, 2:90–96, 2004.
[9] T. Kadir and M. Brady. Scale, saliency and image description. IJCV, 45(2):83–105, 2001.
[10] N. Landwehr, M. Hall, and E. Frank. Logistic model trees. Machine Learning, 59(1–2):161–205, 2005.
[11] T. Lindeberg. Feature detection with automatic scale selection. IJCV, 30(2):79–116, 1998.
[12] T. Lindeberg and J. Garding. Shape-adapted smoothing in estimation of 3-D shape cues from affine deformations of local 2-D brightness structure. Image and Vision Computing, pages 415–434, 1997.
[13] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004.
[14] G. Loy and J.-O. Eklundh. Detecting symmetry and symmetric constellations of features. ECCV, pages 508–521, 2006.
[15] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 22(10):761–767, 2004.
[16] G. Medioni and Y. Yasumoto. Corner detection and curve representation using cubic B-splines. CVGIP, 39:267–278, 1987.
[17] K. Mikolajczyk and C. Schmid. An affine invariant interest point detector. ECCV, 1(1):128–142, 2002.
[18] K. Mikolajczyk and C. Schmid. Scale and affine invariant interest point detectors. IJCV, 60(1):63–86, 2004.
[19] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. V. Gool. A comparison of affine region detectors. IJCV, 2005.
[20] F. Mokhtarian and R. Suomela. Robust image corner detection through curvature scale space. PAMI, 20(12):1376–1381, 1998.
[21] H. Moravec. Towards automatic visual obstacle avoidance. International Joint Conf. on Artificial Intelligence, page 584, 1977.
[22] A. Opelt, M. Fussenegger, A. Pinz, and P. Auer. Weak hypotheses and boosting for generic object detection and recognition. ECCV, pages 71–84, 2004.

[23] E. Shilat, M. Werman, and Y. Gdalyahu. Ridge's corner detection and correspondence. CVPR, pages 976–981, 1997.
[24] S. Smith and J. M. Brady. SUSAN - a new approach to low level image processing. IJCV, 23(1):45–78, 1997.
[25] C. Steger. An unbiased detector of curvilinear structures. PAMI, 20(2):113–125, 1998.
[26] T. Tuytelaars and L. V. Gool. Wide baseline stereo matching based on local, affinely invariant regions. BMVC, pages 412–425, 2000.
