# Untangling Fibers by Quotient Appearance Manifold Mapping for

Download Untangling Fibers by Quotient Appearance Manifold Mapping for

## Preview text

Untangling Fibers by Quotient Appearance Manifold Mapping for Grayscale Shape Classiﬁcation

Yoshihisa Shinagawa Computer Aided Diagnosis Group, Siemens Healthcare

Siemens Medical Solutions USA, Inc. 51 Valley Stream Pkwy, Malvern, PA, 19355

Yuping Lin Computer Science Department University of Southern California

Abstract

Appearance manifolds have been one of the most powerful methods for object recognition. However, they could not be used for grayscale shape classiﬁcation, particularly in three dimensions, such as classifying medical lesion volumes or galaxy images. The main cause of the difﬁculty is that the appearance manifolds of shape classes have entangled ﬁbers in their embedded Euclidean space. This paper proposes a novel appearance-based method called the quotient appearance manifold mapping to untangle the ﬁbers of the appearance manifolds. First, the quotient manifold is constructed to untangle the ﬁber bundles of appearance manifolds. The mapping from each point of the manifold to the quotient submanifold is then proposed to classify grayscale shapes. We show the effectiveness in grayscale 3D shape recognition using medical images.

1. Introduction

Appearance manifolds have been one of the most powerful methods for object recognition. It is used, for example, to identify a person given a face image (e.g., [26, 13]) or a video sequence (e.g., [11]).

A big merit of appearance manifolds combined with machine learning is that they do not require customized feature development. A face image may be identiﬁed with a person in the database by measuring various distances and feature values among the characteristic points such as the eyes, nose, and the mouth endpoints (e.g., [16, 23]). However, such customized features may not be applicable to different object recognition such as automobiles. In contrast, appearance manifold-based approaches are mostly independent of objects. For example, [11] formulated the problem of identifying a person k∗in a test image I as ﬁnding k∗ = arg mink dH (I, Mk) where dH is the L2-Hausdorff distance between the image and the appearance manifold Mk. The manifold can be easily replaced by the appearance manifold of another object learned from a different video sequence. Another example is to classify whether a clinical

lesion image has a smooth margin or an irregular margin. In this case, we may have to develop features that measure the smoothness of the margin. When the image is grayscale, we have to ﬁrst detect margins, then apply various methods such as Fourier descriptors (e.g., [2]) and wavelet analysis (e.g., [15]). In comparison, appearance manifolds use all the pixels as features and there is no need for development of customized features. Moreover, the clinical meanings of smoothness and irregularity are different from how we deﬁne them in our daily life as can be seen in Figure 3 (A) and (B) where there is a large overlap between the two categories regarding the smoothness in the ordinary sense. Smoothness of a lesion margin is related to benignity while irregularity to malignancy. Development of customized features that capture such clinical meanings requires medical domain knowledge and long term training.

Despite the aforementioned merits, computerized classiﬁers are not used, e.g., in a worldwide project called Galaxy Zoo that aims to understand how galaxies are formed; thousands of volunteers manually classify grayscale galaxy photographs according to the shapes and margins whether they are smooth, round, elliptic, spiral or barred-spiral [1]. A reason may lie in the fact that it has been difﬁcult to apply appearance manifold-based approaches to grayscale shape classiﬁcation. The essential difﬁculty is that the appearance manifold of a shape belonging to a class can be more distant from that of another shape in the same class than the shapes in different classes, which will be discussed in detail in the next section. That is, the appearance manifolds are entangled in the Euclidean space where they are embedded.

Due to this difﬁculty, for binary images, there has been another avenue of shape classiﬁcation that is totally different from appearance manifolds. Various methods have been proposed to classify 2D binary shapes based on, to name a few, shape context [4], segment sets [22], modebased approaches[25, 29] and shock graphs [21]. Typically, the contours of the binary images are parameterized by normalized arc length a to yield a pair of univariate functions (x(a), y(a)). It facilitates the use of many convenient tools such as Fourier analysis, wavelet decomposition, Principal

2006

2009 IEEE 12th International Conference on Computer Vision (ICCV) 978-1-4244-4419-9/09/$25.00 ©2009 IEEE

Component Analysis (PCA) and histogram analysis. It is difﬁcult, however, to employ similar strategies for 3D images, particularly for grayscale images. The difﬁculty for 3D images is in parameterizing grayscale volumes consistently; Cartesian coordinates are useless because it depends on the origins and postures. Even for a 3D binary image, the polar coordinate system or SPHARMs are not sufﬁcient to represent 3D boundary surfaces when there are multiple points with the same coordinates. For a 3D grayscale volume, the boundary itself is not well deﬁned.

In contrast, the contours of binary 2D images can be consistently parameterized by arc length with only one rotational Degree Of Freedom (DOF) left. In most of the methods, they ﬁx the origin to circumvent the problem of handling the rotational DOF. Such parameterizations do not exist for 3D grayscale volumes. PCA or shape context were not used for grayscale shape classiﬁcation unless the parameterization can be naturally introduced such as face recognition where the images are usually normalized so that the eyes are at the canonical positions. Shotton et al. proposed a method that enables the object recognition from 2D grayscale (or color) images based on matching of the contour fragments [20]. However, the method is designed for recognizing objects with well-deﬁned contours such as horses, faces and airplanes.

This paper proposes a method that enables the appearance manifold-based approach to classify 3D grayscale volumes. First, quotient manifolds are constructed to untangle the ﬁbers of appearance manifolds. The mapping from each point of the appearance manifold to the quotient submanifold is then proposed to classify grayscale shapes. We show the effectiveness in grayscale 3D shape recognition using medical images.

1.1. Appearance manifolds

An appearance manifold is constructed by ﬁrst interpreting a stacked vector

X = (I(0, 0), I(1, 0), ..., I(0, 1), I(1, 1), ..., I(m, n)) (1)

of an object image I(x, y) of size m × n as a point in Euclidean space Rmn. If the object changes slightly, the point changes its location slightly. The dimension of the set of such points is expected to be much lower than mn if the images are generated by taking photographs of the same object. The set is regarded as the embedding emb(M ) in Rmn of M called the appearance manifold. The appearance manifold of an object such as a face of a person is expected to be separable from the appearance manifold of another object. The image has a relatively smaller number of DOFs such as the camera angle and illumination [12]. If a face of a person is viewed in a slightly different direction, the image changes slightly. Such slight variations will form a curved line in Rmn in which the manifold is embedded. It was treated as a vector bundle in [12]. We further gener-

alize the trajectory as a ﬁber of a ﬁber bundle explained in

what follows.

M and the ﬁber F constitute a ﬁber bundle (E, M, π, F )

whose base space is M and total space is E, and π is a

projection (see, for example, [5]). Here, the projection π :

E → M should satisfy the following condition. For any

point p ∈ M , there exists a neighborhood U ⊂ M of p

such that π−1(U ) is homeomorphic to U × F by φ in the

following way:

π(φ(p, f )) = p

(2)

where f ∈ F is a ﬁber. In other words, in the neighborhood of p, E looks like M × F . In the case of an appearance manifold M , the trajectory of the stacked vector x can be regarded as the embedding of the ﬁber bundle. Roughly speaking, E can be regarded as the manifold M with the ﬁber sticking out at every point p of it. In what follows, we call the ﬁber bundle of the appearance manifold deﬁned in this way as the appearance ﬁber bundle.

In contrast to an appearance ﬁber bundle of a single object, the appearance ﬁber bundle of a shape class, however, is entangled with the appearance ﬁber bundle of other shape classes in the embedded space, which has not been studied much in computer vision. Let us take a simple example of object translation. Suppose that we want to classify an object image according to whether it belongs to Shape Class A or Shape Class B, and that Object A1, A2, A3, .. belong to the Shape Class A as shown in Figure 1. When the objects are translated in the images, they form in the embedded space of the manifold the trajectories that correspond to ﬁbers. The problem is that their trajectories are entangled with the trajectories of the objects belonging to another Shape Class B. An image on the trajectory of Object B1 may be closer to Shape A objects than an image on the trajectory of Object A1, depending on the locations on their ﬁbers. Under such circumstances where ﬁber bundles are entangled in the embedded space, it is difﬁcult to reduce the dimensionality of the manifolds. When the transformation is a nonlinear deformation instead of translation, the entanglement becomes more complicated.

The difﬁculty lies in the fact that we can only observe as images the total space E embedded in Rmn, not the appearance manifolds themselves. Hence, it is necessary to decompose the embedded total space to the ﬁber F and the appearance manifold M . The remedy is not very complicated when an object-centric coordinate system can be uniquely chosen like the faces; i.e., we can manage to get rid of most of the ﬁbers and infer M in the embedded space. However, when we need to classify objects into more ambiguous groups according to the shapes or margin of grayscale objects, the choice of a unique manifold is very often impossible.

Let us describe this situation using matrices. Suppose an image I of size m × n is converted to a stacked image

vector X0 belonging to an appearance manifold. When it is is translated by (dx, dy) where dx and dy are positive inte-

2007

gers, the image vector X1 after the translation is represented

by X1 = P X0. Here, the matrix P = {pi,j} (i = 1, ..., m,

j = 1, ..., n) is the permutation matrix whose elements are

such that

pjn+i,jn+i+dx = 1,

pjpnj+n+i,(ij++ddxy,j)nn++ii == 11,, (3)

p(j+dy)n+i,jn+i = 1,

for i = 1, ..., m − dx and j = 1, ..., n − dy, and all the other elements are zero. 1 That is, the new image vector is obtained by exchanging the axes of the coordinate system. The change of the vector by this operation hinders us from reducing the dimensionality by PCA [26], Linear Discriminant Analysis (LDA)[3, 28, 14], Independent Component Analysis (ICA) [6, 10], kernel PCA [13] etc. The operation is not limited to translation; e.g., rotation can be also approximated by the exchanges of axes obeying much more complicated rules. The ﬁber bundles of different objects are entangled by such operations and are interleaved. Section 4.4 shows an example later in Figure 9.

To remedy this problem, we need a mathematical tool to untangle them. This paper proposes a quotient space of the appearance manifold and the mapping from manifold points to the quotient submanifolds.

object B3

object B2 object B1

object A2 object A1

object A3

shape B

shape A

Figure 1. Entanglement of ﬁbers

Before going to the details in the next section, as a toy example, let us consider a one-dimensional image of width n that contains an object whose appearance manifold is M ; i.e., emb(M ) = X ∈ Rn. When we translate the object by t ∈ R, we can construct trivial total space E = M × R. Let us further simplify the toy example such that n = 4 and consider a shape class whose objects are one-dimensional line segments of length 2 generated by translating A1 = (1, 1, 0, 0) to A2 = (0, 1, 1, 0) and A3 = (0, 0, 1, 1). The Euclidean distance between A1 and A3 is 2. If we consider another shape class of one-dimensional line segments of

1For simplicity, we assume that the intensities of the background pixels are zero and the object stays inside the image when translated.

length 1 comprised of B1 = (1, 0, 0, 0), B2 = (0, 1, 0, 0), B3 = (0, 0, 1, 0) and B4 = (0, 0, 0, 1), the Euclidean distance between A1 and B1 is only 1. This is a toy example of the situation in Figure 1.

2. The Proposed Quotient Appearance Manifold Mapping

To decompose the embedded total space E to an appearance manifold M and a ﬁber F , we will identify embedded appearance manifolds emb((M, f1)) and emb((M, f2)); i.e., emb((M, f1)) emb((M, f2)) under the equivalence relation . The space we should analyze is the embedded quotient total space emb(E)/ in Euclidean space. In what follows, we call emb(E)/ the quotient appearance total space. Quotient space is used, for example, to eliminate Mo¨bius ambiguity for 2D binary images in [19]. The equivalence relation is not limited to translation of images represented by Eq (3). It includes afﬁne transformations such as rotation and scaling. When the purpose of using appearance manifolds is to classify shapes into categories, however, it is not sufﬁcient to use afﬁne transformations of images as the equivalence relation. For example, suppose that we want to classify an object image according to whether it has a smooth margin or an irregular margin. A portion of an image of Object A1 in Figure 1 may be locally deformed, but it still belongs to a class of the smooth margin. This situation cannot be handled by equivalence relation based on afﬁne transformations. For this reason, we propose a mapping called the quotient manifold mapping from each point of the manifold to the quotient subspace using the local afﬁne transformations as the equivalence relation. We ﬁrst subdivide an image X to subimages represented by stacked vectors xi (i = 1, 2, ...). The subimages may overlap. The subimages are normalized so that their norms are one: |xi| = 1 (i = 1, 2, ...). Then, we apply the equivalence relation to the subimages to obtain a quotient appearance total space for each of them. The quotient appearance manifold mapping of the image is a set of quotient appearance total space of its subimages. We can generalize this way of constructing a quotient appearance manifold mapping as follows. We ﬁrst associate a neighborhood U (p) to each pixel p of the image. When we denote the quotient appearance total space of U (p) by M (p), the quotient appearance manifold mapping of the image is obtained as a set of M (p); i.e., S = {M (p)|p ∈ domain(I)} where domain(I) ⊂ [1, m] × [1, n]. We regard S as the approximation of the appearance manifold.

We can further construct a mapping from the original image to the set S of M (p) such that F : I → S. This situation is analogous to a Gauss map where a surface normal (nx, ny, nz) is associated with each point of a surface.

2008

2.1. Dimension reduction of quotient appearance manifold mappings

This section discusses the dimension reduction of the quotient appearance manifold mapping. Assuming that M (p) forms a subspace of low dimensionality, we can calculate the subspace in a similar way to ordinary dimension reduction such as PCA, ICA and LDA. In our case, the numbers of classes are small for LDA to be robustly computable. In this paper, we use PCA for simplicity; i.e., given points of a manifold that are represented by stacked vectors, we calculate the mean μ and covariance matrix C of the vectors obtained in Section 3 to get the principal components as the eigenvectors e1, e2, ..., em of C. The distance between two appearance manifold mappings S1 = {M1(p1)|p1 ∈ domain(I1)} and S2 = {M2(p2)|p2 ∈ domain(I2)} is deﬁned as the sum of the distances

dist =

dist(M1(p1), M2(p2))dp1 (4)

p1 ∈domain(I1 )

between each corresponding pair of M1(p) and M2(q) where p2 = g(p1). The correspondence g between p1 and p2 is discussed in Section 2.2. The distance dist(M (p1), M (p2)) is given by k wi(xi − yi)˙ek where ek (k = 1, 2, ...) are the eigenvectors, and xi and yi represent the subimages of M (p1) and M (p2), respectively. The calculation of xi and yi is described in Section 3.

The principal axes can be regarded as part of the spanning vectors of the tangent hyperplane of the quotient appearance manifolds when the quotient appearance total spaces are thin. Suppose that the variance of the space is very small along some axes; i.e., the manifold is thin along the axes. The principal axes are orthogonal to them and approximate the tangent hyperplane.

When the manifolds cannot be approximated by linear structures, we need nonlinear manifold learning such as ISOmetric MAPping (ISOMAP) [24], Locally Linear Embedding (LLE) [17], Laplacianfaces [8] and Riemannian manifold learning [12]. ISOMAP ﬂattens manifolds while trying to preserve the geodesic distances approximated by shortest paths. LLE calculates the locally linear patches approximating the manifolds. Laplacian faces map face images into face subspace by Locality Preserving Projections. Riemannian manifold learning calculates the Riemannian normal coordinates that preserve the geodesic metric well. In order that the methods work robustly, we need a great number of data points of the manifolds.

2.2. Correspondence between two quotient appearance manifolds

As can be seen above, we need to establish the correspondence g : M1 → M2 between M1(p) and M2(p2) to calculate the distance between two appearance manifolds. We can use automatic 3D volume matching methods such as [18, 7]. For the sake of simplicity, we subdivide the two

images I1 and I2 into equal numbers of subimages I1,l and I2,l (l = 1, 2, ...), respectively, and pair up I1,l and I2,l. In our implementation, we subdivide the images along the Cartesian axes equally.

3. Actual Calculation of Quotient Manifolds

To implement the quotient manifold of a subimage Ii represented by a stacked vector xi relative to the equivalence relation , we transform the subimage to the one with a canonical position, size and posture in order to account for translation, scaling and rotation; i.e., we calculate another image Ji of a ﬁxed size such that Ii Ji holds. (For simplicity, we neglect shears.) For example, if the transformation is translation, we translate the subimage to obtain Ji. Denoting the transformation Ψ, we look for Ji = Ψ(Ii) that is closest to Ii. However, the deﬁnition of the canonical position and rotation is itself a difﬁcult task. We propose to obtain a canonical one as follows. We ﬁrst choose an image Ki. In our implementation, we have chosen an image from the training dataset randomly. Alternatively, we can choose the mean of the training dataset. We can deﬁne the closeness by the mean squared distance. Finally, we deﬁne Ji = argminIi |Ψ(Ii) − Ki|2. To account for rotation, we rotate Ii to generate its rotational variations, and calculate Ji for each of the rotated versions.

3.1. Implementation details

We ﬁrst resample each grayscale lesion volume to a volume I of m × n × l in size. Next, we subdivided each volume to eight disjoint subvolumes Ii,j,k (i, j, k = 0, 1) of size m/2 × n/2 × l/2 ;

Ii,j,k(x, y, z) = I(x + mi/2, y + nj/2, z + kl/2). (5)

The image I is then rotated by 45n (n = 0, 1, .., 7) degrees on the xy-plane, and by 180 degrees to ﬂip the z-axis to generate sixteen rotational variations. (Fewer variations were needed for the z-axis because our data resolution was smaller in the z-direction.) The subimages are generated in the same way for each rotation.

Given a dataset of images, we chose one of the lesion images as the canonical image K. The best translational position of each subimage is calculated such that the mean squared difference between it and Ki,j,k = K(x+mi/2, y+ nj/2, z + kl/2) is minimized (in a similar manner to block matching for MPEG video compression). The resulting set of subimages are regarded as the quotient appearance manifold mapping {Mi,j,k}.

4. Experiments

We apply the quotient appearance manifold mapping to classify a medical grayscale volume. In particular, we used Breast Magnetic Resonance Imaging (MRI) for the experiments. In Dynamic Contrast-Enhanced Breast MRI, Con-

2009

trast Agent (CA) is injected to a patient, and the CA concentration in the lesions is measured by MRI as the grayscale volumes. A lesion can be a mass (consisting of one connected component) or a non-mass (consisting of scattered connected components). We focus on the mass lesions because the shape and margin attributes are only available for mass lesions. (We cannot describe the shape or margin of a non-mass lesion.) The shape of a mass lesion volume is clinically classiﬁed to round, oval, lobulated, and irregular shapes. In a similar manner, the margin (boundary of the lesion) is classiﬁed to smooth, irregular, and spiculated (margin with spikes). These categories and example images are described in the BI-RADS (Breast Imaging-Reporting and Data System) [9] guide book. Round or oval shapes are mostly related to benign lesions such as round for lipid and oval for ﬁbroadenoma, while irregular shapes with spiculated margin are often related to malignant lesions (cancers). The number of lesions we used is 109 in the dataset consisting of 59 patients. We asked an experienced radiologist to read each lesion image and to describe the ground truth shape and margin attributes.

The instances belonging to the same category have very different appearances, which makes it challenging for a classiﬁer to allow for intra-class differences while distinguishing the inter-class differences. Moreover, the inputs are grayscale 3D volumes of MR images that are computationally heavy. Figures 2 and 3 illustrate several lesions with different shapes and margins, respectively.

A

B

C

D

Figure 2. The middle slices of lesions of different types of shapes. A: round, B: oval, C: lobulated, D: irregular

A

B

C

Figure 3. The middle slices of lesions of different types of margins. A: smooth, B: irregular margin and C: spiculated

A

B

C

D

E

F

G

Figure 4. The middle slices of the volume from the axial view of the 10 highest ranked principal components. A: round B: oval, C: lobulated, D: irregular, E: smooth, F: irregular margin, and G: spiculated.

4.1. Training

We used 76 lesions from 39 patients as the training set. There are 31 mass lesions (5 round, 13 oval, 8 lobulated and 5 irregular shapes, and 23 smooth, 5 irregular and 3 spiculated margins), 26 non-mass lesions, and 19 other tissues that are treated as negative samples. Regarding the shape attribute, for each group of round, oval, lobulated and irregular lesions, respectively, we calculated the mean and covariance of subimages Ji,j,k with rotational variations, and obtained the principal components as the eigenvectors ei (i = 1, 2, ...) of the covariance matrix. The principal components whose eigenvalues are smaller than a threshold are discarded. Finally, we have projected the subimages to the principal component axes by ai = xi,j,k · ei (i = 1, 2, ...) where xi,j,k is the stacked vector of a subimage. We used the projected values ai as the feature values for machine learning. The three classes for the margin description of a mass lesion, smooth, irregular and spiculated, are handled in the same manner.

Figure 4 shows the top principal components of the quotient appearance manifold mappings for each class. In comparison to Figure 9, they contain high frequencies that enable to distinguish the class differences, which will be discussed in Section 4.4.

4.2. Classiﬁer

We used Quadratic Discriminant Analysis (QDA) as the classiﬁer for the ﬁrst experiment. We chose QDA because it performs well with a relatively small number of features (i.e., principal components). For each description, the best quadratic surface that separates the positive and negative sets has been calculated. We have chosen the combination of the principal axes that gives the best result for the training set. The same axes thus selected are used in the testing phase, too. The threshold of the QDA is chosen so that the predetermined speciﬁcity (0.8 in our experiments)

2010

is attained for the training set.

4.3. Testing

The test set is disjoint from the training set. There are 33 lesions from 20 patients. There are 18 mass lesions, (1 round, 5 oval, 3 lobulated and 9 irregular shapes, and 9 smooth, 3 irregular and 6 spiculated margins), 11 non-mass lesions, and ﬁve other tissues. Each lesion is subdivided to subimages and rotated in the same way as the training set. Figure 5 shows the ROC curves of classiﬁcation of mass shapes. Figure 6 shows the ROC curves of mass margins. Comparing the ROC curves with the ROC curves without quotient appearance manifold mappings in Figures 7 and 8, we can see signiﬁcant improvement in all categories.

(A)

(B)

ﬁcity #true neg#attriuveesn+e#gafatilvseespositives attained using the threshold of the QDA determined at the time of training.

4.4. Additional experiments on rotational variations without quotient manifold mappings

In this section, we will investigate the effect of rotational variations, as well as the performance difference in terms of feature dimensions. We ﬁrst show two experimental results. We do not use the quotient manifold by translation in order to understand the pure effect of rotational variations. We have used AdaBoost in a similar way to [27] to avoid the extra work for feature selection, and plotted the ROC after 100 iterations. We mix the training and testing datasets, and employ leave-one-out cross validation in order not to be affected by the small number of samples.

Figure 7 shows the result with 332 samples (four rotational variants: with 0, 90, 180 and 270 degree) and 664 samples (eight rotational variants: with additional 45, 135, 225 and 315 degree). There is a large difference in the performance particularly for the irregular shapes because the number of samples was increased. Figure 8 shows the ROC curves for margin categories with eight rotational variants.

(C)

(d)

Figure 5. ROC curves of masses with (A): round (AUC=0.796),

(B): oval (AUC=0.735), (C): lobulated (AUC=0.313), and (D): ir-

regular shapes (AUC=0.942)

664

664

664

332

332

332

(A)round+oval

(B)lobular

(C)irregular

Figure 7. ROCs of different numbers of rotational variations

(A)

(B)

(C)

Figure 6. ROC curves of masses with (A): smooth (AUC=0.813), (B): irregular (AUC=0.707), and (C): spiculated margins (AUC=0.856)

The dashed line shows the operating point; the sensitivity #true pos#ittirvuees+po#sfiatilvseesnegatives and the speci-

(A)

(B)

Figure 8. ROCs of (A): smooth and (B): irregular or spiculated

margin

Figure 9 shows the central slices of the volumes from the axial view of the 20 highest ranked principal components for each class with eight rotational variants, respectively. We can see that the patterns of their top principal components down to the 8th are mostly low frequencies and are very similar for all categories with slightly different orders; i.e., the classes are very similarly interpreted under the PCA space and hence, it is difﬁcult to separate the classes. It means that the ﬁbers of the appearance manifolds of the classes are entangled, which shows why we need the quotient manifold mapping. In comparison, the top principal

2011

components of the proposed quotient manifold mapping in Figure 4 contain high frequencies that distinguish the class differences.

A

B

C

Figure 9. The middle slices of the volume from the axial view of the 20 highest ranked principal components. A: round or oval, B: lobulated, C: irregular. The images show resemblance to one another, which means that it is difﬁcult to separate the classes by PCA.

In the second experiment, the accuracy differences by using different numbers of highest ranked dimensions for training are compared. We use four different numbers of dimensions: 10, 25, 50 and 100 with eight rotational variations. The results are shown in Figure 10. From the ﬁgure we see that for the round or oval shape, the accuracy does not improve much by using more dimensions. Whereas for the irregular shape, the accuracy improves substantially by using 100 dimensions over fewer dimensions. This is reasonable since irregular shapes have a higher variety than oval shapes, which requires more dimensions for training its classiﬁer. In the above experiments, the irregular shape classiﬁer performs better than oval and lobular shape classiﬁers.

10

10

10

25

25

25

50

50

50

100

100

100

(A)round+oval

(C)lobular

(B)irregular(shape)

Figure 10. ROCs of different number of highest ranked principal components used

4.4.1 Comparison with customized features

We developed customized features to compare with the performance of our quotient appearance manifold mapping and to show that the shape and margin in the clinical sense is not a trivial image processing task. The ﬁrst feature is the histogram of angles between adjacent voxels on the lesion

boundary. When the boundary is smooth, the angle histogram has smaller variations than when the boundary is irregular. The second feature captures how the lesion shape deviates from a sphere; it calculates the number of lesion voxels inside spheres centered at the lesion centroid. When the radius r is varied, the ratio of the numbers of such voxels to the sphere volume can be regarded as a function of r. The feature is expected to capture whether the shape is round, oval, or irregular. We also use other features such as the ratio of th√e volume V and the volume of the circumscribed sphere, S/V 1/3 where S is the surface area, and the aspect ratios along x, y and z axes. The results for the testing set are plotted by blue dots in Figures 5 and 6. It can be seen that that quotient appearance manifold mappings outperform the customized features in most categories. For example, the sensitivities of round and lobulated shapes are zero for the customized features. At the same time, we can see that the clinical smoothness or irregularity of a lesion is not a simple geometric smoothness of a volume. For example, the difference between round and oval shapes is not just measuring the aspect ratios of width, height and depth; oval objects are more likely to be ﬁbroadenomas, so the look of a ﬁbroadenoma affects the shape description.

5. Conclusions and Future Work

We have proposed the quotient appearance manifold mapping to untangle the ﬁbers of the appearance manifolds. In this paper, we assumed that each quotient appearance manifold mapping forms thin subspace so that the hyperplane obtained by PCA approximates the quotient appearance total space well. The use of ISOMAP, Riemannian manifold learning or other methods when the subspace is curved is left as future work. We also need to increase the dataset for robust classiﬁcation.

References

[1] http://www.galaxyzoo.org.

[2] K. Arbter, W. Snyder, H. Burkhardt, and G. Hirzinger. Application of afﬁne-invariant fourier descriptors to recognition of 3-D objects. IEEE Trans. Pattern Analysis and Machine Intelligence, 12(7):640–647, 1990.

[3] P. Belhumeur, J. Hespanha, and D. Kriegman. Eigenfaces vs. ﬁsherfaces: Recognition using class speciﬁc linear projection. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(7):711–720, 1997.

[4] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Analysis and Machine Intelligence, 24(24):509–522, 2002.

[5] G. E. Bredon. Topology and Geometry. SpringerVerlag, 1993.

2012

[6] B. A. Draper, K. Baek, and M. S. Bartlett. Recognizing faces with PCA and ICA. Computer Vision and Image Understanding, 91:115–137, 2003.

[7] G. Guetat, M. Maitre, L. Joly, S.-L. Lai, T. Lee, and Y. Shinagawa. Automatic 3-D grayscale volume matching and shape analysis. IEEE Trans. Information Technology in Biomedicine, 10(2):362–376, 2006.

[8] X. He, S. Yan, Y. Hu, P. Niyogi, and H. J. Zhang. Face recognition using laplacianfaces. IEEE Trans. Pattern Analysis and Machine Intelligence, 27(3):328– 340, 2005.

[9] D. Ikeda, C. K. Kuhl, and et al. Development, standardization, and testing of lexicon for reporting contrast-enhanced breast magnetic resonance imaging studies. J. Mag. Res.Im, 13:889–895, 2001.

[10] J. Kim, J. Choi, J. Yi, and M. Turk. Effective representation using ICA for face recognition robust to local distortion and partial occlusion. IEEE Trans. Pattern Analysis and Machine Intelligence, 27(12):1977– 1981, 2005.

[11] K.-C. Lee, J. Ho, M.-H. Yang, and D. Kriegman. Visual tracking and recognition using probabilistic appearance manifolds. Computer Vision and Image Understanding, 99(3):303–331, 2005.

[12] T. Lin and H. Zha. Riemannian manifold learning. IEEE Trans. Pattern Analysis and Machine Intelligence, 30(7):1270–1281, 2008.

[13] C. Liu. Gabor-based kernel PCA with fractional power polynomial models for face recognition. IEEE Trans. Pattern Analysis and Machine Intelligence, 26(5):572–580, 2004.

[14] X. Liu, T. Chen, and B. V. K. V. Kumar. Face authentication for multiple subjects using eigenﬂow. Pattern Recognition, 36(2):313–328, 2003.

[15] S. G. Mallat. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Analysis and Machine Intelligence, 11(7):674–693, 1989.

[16] K. Okada, J. Steffans, T. Maurer, H. Hong, E. Elagin, H. Neven, and C. Malsburg. The Bochum/USC face recognition system and how it fared in the FERRET phase III test. In H. Wechsler, P. J. Phillips, V. Bruce, F. F. Soulie, and T. S. Huang, editors, Face Recognition: From Theory to Applications, pages 186–205, 1998.

[17] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, December 2000.

[18] D. Rueckert and A. F. Frangi. Automatic construction of 3-D statistical deformation models of the brain using nonrigid registration. IEEE Trans. Medical Imaging, 22(8):1014–1025, 2003.

[19] E. Sharon and D. Mumford. 2D-shape analysis using conformal mapping. In Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 2004.

[20] J. Shotton, A. Blake, and R. Cipolla. Multi-scale categorical object recognition using contour fragments. IEEE Trans. Pattern Analysis and Machine Intelligence, 30(5):796–809, 2008.

[21] K. Siddiqi, A. Shokoufandeh, S. J. Dickinson, and S. W. Zucker. Shock graphs and shape matching. International Journal of Computer Vision, 35(1):13–32, 1999.

[22] K. B. Sun and B. Super. Classiﬁcaiton of countour shapes using class segement sets. In Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 2005.

[23] A. Tefas, C. Kotropoulos, and I. Pitas. Using support vector machines to enhance the performance of elastic graph matching for frontal face authentiﬁcation. IEEE Trans. Pattern Analysis and Machine Intelligence, 23(7):735–746, 2001.

[24] J. B. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2323, December 2000.

[25] A. Tsai, A. Yezzi, W. Wells, C. Tempany, D. Tucker, A. Fan, and W. E. Grimson. A shape-based approach to the segmentation of medical imagery using level sets. IEEE Trans. Medical Imaging, 22(2):137–154, 2003.

[26] M. Turk and A. Pentland. Face recognition using eigenfaces. In IEEE International Conference on Computer Vision and Pattern Recognition, pages 586– 591. 1991.

[27] P. Viola and M. J. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2):137–154, 2004.

[28] M.-H. Yang. Kernel eignfaces vs. kernel ﬁsherfaces: Face recognition using kernel methods. In IEEE International Conference on Automatic Face and Gesture Recognition, pages 215–220. 2002.

[29] S. K. Zhou, J. Shao, B. Georgescu, and D. Comaniciu. Boostmotion: Boosting a discriminative similarity function for motion estimation. In Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 2005.

2013

Yoshihisa Shinagawa Computer Aided Diagnosis Group, Siemens Healthcare

Siemens Medical Solutions USA, Inc. 51 Valley Stream Pkwy, Malvern, PA, 19355

Yuping Lin Computer Science Department University of Southern California

Abstract

Appearance manifolds have been one of the most powerful methods for object recognition. However, they could not be used for grayscale shape classiﬁcation, particularly in three dimensions, such as classifying medical lesion volumes or galaxy images. The main cause of the difﬁculty is that the appearance manifolds of shape classes have entangled ﬁbers in their embedded Euclidean space. This paper proposes a novel appearance-based method called the quotient appearance manifold mapping to untangle the ﬁbers of the appearance manifolds. First, the quotient manifold is constructed to untangle the ﬁber bundles of appearance manifolds. The mapping from each point of the manifold to the quotient submanifold is then proposed to classify grayscale shapes. We show the effectiveness in grayscale 3D shape recognition using medical images.

1. Introduction

Appearance manifolds have been one of the most powerful methods for object recognition. It is used, for example, to identify a person given a face image (e.g., [26, 13]) or a video sequence (e.g., [11]).

A big merit of appearance manifolds combined with machine learning is that they do not require customized feature development. A face image may be identiﬁed with a person in the database by measuring various distances and feature values among the characteristic points such as the eyes, nose, and the mouth endpoints (e.g., [16, 23]). However, such customized features may not be applicable to different object recognition such as automobiles. In contrast, appearance manifold-based approaches are mostly independent of objects. For example, [11] formulated the problem of identifying a person k∗in a test image I as ﬁnding k∗ = arg mink dH (I, Mk) where dH is the L2-Hausdorff distance between the image and the appearance manifold Mk. The manifold can be easily replaced by the appearance manifold of another object learned from a different video sequence. Another example is to classify whether a clinical

lesion image has a smooth margin or an irregular margin. In this case, we may have to develop features that measure the smoothness of the margin. When the image is grayscale, we have to ﬁrst detect margins, then apply various methods such as Fourier descriptors (e.g., [2]) and wavelet analysis (e.g., [15]). In comparison, appearance manifolds use all the pixels as features and there is no need for development of customized features. Moreover, the clinical meanings of smoothness and irregularity are different from how we deﬁne them in our daily life as can be seen in Figure 3 (A) and (B) where there is a large overlap between the two categories regarding the smoothness in the ordinary sense. Smoothness of a lesion margin is related to benignity while irregularity to malignancy. Development of customized features that capture such clinical meanings requires medical domain knowledge and long term training.

Despite the aforementioned merits, computerized classiﬁers are not used, e.g., in a worldwide project called Galaxy Zoo that aims to understand how galaxies are formed; thousands of volunteers manually classify grayscale galaxy photographs according to the shapes and margins whether they are smooth, round, elliptic, spiral or barred-spiral [1]. A reason may lie in the fact that it has been difﬁcult to apply appearance manifold-based approaches to grayscale shape classiﬁcation. The essential difﬁculty is that the appearance manifold of a shape belonging to a class can be more distant from that of another shape in the same class than the shapes in different classes, which will be discussed in detail in the next section. That is, the appearance manifolds are entangled in the Euclidean space where they are embedded.

Due to this difﬁculty, for binary images, there has been another avenue of shape classiﬁcation that is totally different from appearance manifolds. Various methods have been proposed to classify 2D binary shapes based on, to name a few, shape context [4], segment sets [22], modebased approaches[25, 29] and shock graphs [21]. Typically, the contours of the binary images are parameterized by normalized arc length a to yield a pair of univariate functions (x(a), y(a)). It facilitates the use of many convenient tools such as Fourier analysis, wavelet decomposition, Principal

2006

2009 IEEE 12th International Conference on Computer Vision (ICCV) 978-1-4244-4419-9/09/$25.00 ©2009 IEEE

Component Analysis (PCA) and histogram analysis. It is difﬁcult, however, to employ similar strategies for 3D images, particularly for grayscale images. The difﬁculty for 3D images is in parameterizing grayscale volumes consistently; Cartesian coordinates are useless because it depends on the origins and postures. Even for a 3D binary image, the polar coordinate system or SPHARMs are not sufﬁcient to represent 3D boundary surfaces when there are multiple points with the same coordinates. For a 3D grayscale volume, the boundary itself is not well deﬁned.

In contrast, the contours of binary 2D images can be consistently parameterized by arc length with only one rotational Degree Of Freedom (DOF) left. In most of the methods, they ﬁx the origin to circumvent the problem of handling the rotational DOF. Such parameterizations do not exist for 3D grayscale volumes. PCA or shape context were not used for grayscale shape classiﬁcation unless the parameterization can be naturally introduced such as face recognition where the images are usually normalized so that the eyes are at the canonical positions. Shotton et al. proposed a method that enables the object recognition from 2D grayscale (or color) images based on matching of the contour fragments [20]. However, the method is designed for recognizing objects with well-deﬁned contours such as horses, faces and airplanes.

This paper proposes a method that enables the appearance manifold-based approach to classify 3D grayscale volumes. First, quotient manifolds are constructed to untangle the ﬁbers of appearance manifolds. The mapping from each point of the appearance manifold to the quotient submanifold is then proposed to classify grayscale shapes. We show the effectiveness in grayscale 3D shape recognition using medical images.

1.1. Appearance manifolds

An appearance manifold is constructed by ﬁrst interpreting a stacked vector

X = (I(0, 0), I(1, 0), ..., I(0, 1), I(1, 1), ..., I(m, n)) (1)

of an object image I(x, y) of size m × n as a point in Euclidean space Rmn. If the object changes slightly, the point changes its location slightly. The dimension of the set of such points is expected to be much lower than mn if the images are generated by taking photographs of the same object. The set is regarded as the embedding emb(M ) in Rmn of M called the appearance manifold. The appearance manifold of an object such as a face of a person is expected to be separable from the appearance manifold of another object. The image has a relatively smaller number of DOFs such as the camera angle and illumination [12]. If a face of a person is viewed in a slightly different direction, the image changes slightly. Such slight variations will form a curved line in Rmn in which the manifold is embedded. It was treated as a vector bundle in [12]. We further gener-

alize the trajectory as a ﬁber of a ﬁber bundle explained in

what follows.

M and the ﬁber F constitute a ﬁber bundle (E, M, π, F )

whose base space is M and total space is E, and π is a

projection (see, for example, [5]). Here, the projection π :

E → M should satisfy the following condition. For any

point p ∈ M , there exists a neighborhood U ⊂ M of p

such that π−1(U ) is homeomorphic to U × F by φ in the

following way:

π(φ(p, f )) = p

(2)

where f ∈ F is a ﬁber. In other words, in the neighborhood of p, E looks like M × F . In the case of an appearance manifold M , the trajectory of the stacked vector x can be regarded as the embedding of the ﬁber bundle. Roughly speaking, E can be regarded as the manifold M with the ﬁber sticking out at every point p of it. In what follows, we call the ﬁber bundle of the appearance manifold deﬁned in this way as the appearance ﬁber bundle.

In contrast to an appearance ﬁber bundle of a single object, the appearance ﬁber bundle of a shape class, however, is entangled with the appearance ﬁber bundle of other shape classes in the embedded space, which has not been studied much in computer vision. Let us take a simple example of object translation. Suppose that we want to classify an object image according to whether it belongs to Shape Class A or Shape Class B, and that Object A1, A2, A3, .. belong to the Shape Class A as shown in Figure 1. When the objects are translated in the images, they form in the embedded space of the manifold the trajectories that correspond to ﬁbers. The problem is that their trajectories are entangled with the trajectories of the objects belonging to another Shape Class B. An image on the trajectory of Object B1 may be closer to Shape A objects than an image on the trajectory of Object A1, depending on the locations on their ﬁbers. Under such circumstances where ﬁber bundles are entangled in the embedded space, it is difﬁcult to reduce the dimensionality of the manifolds. When the transformation is a nonlinear deformation instead of translation, the entanglement becomes more complicated.

The difﬁculty lies in the fact that we can only observe as images the total space E embedded in Rmn, not the appearance manifolds themselves. Hence, it is necessary to decompose the embedded total space to the ﬁber F and the appearance manifold M . The remedy is not very complicated when an object-centric coordinate system can be uniquely chosen like the faces; i.e., we can manage to get rid of most of the ﬁbers and infer M in the embedded space. However, when we need to classify objects into more ambiguous groups according to the shapes or margin of grayscale objects, the choice of a unique manifold is very often impossible.

Let us describe this situation using matrices. Suppose an image I of size m × n is converted to a stacked image

vector X0 belonging to an appearance manifold. When it is is translated by (dx, dy) where dx and dy are positive inte-

2007

gers, the image vector X1 after the translation is represented

by X1 = P X0. Here, the matrix P = {pi,j} (i = 1, ..., m,

j = 1, ..., n) is the permutation matrix whose elements are

such that

pjn+i,jn+i+dx = 1,

pjpnj+n+i,(ij++ddxy,j)nn++ii == 11,, (3)

p(j+dy)n+i,jn+i = 1,

for i = 1, ..., m − dx and j = 1, ..., n − dy, and all the other elements are zero. 1 That is, the new image vector is obtained by exchanging the axes of the coordinate system. The change of the vector by this operation hinders us from reducing the dimensionality by PCA [26], Linear Discriminant Analysis (LDA)[3, 28, 14], Independent Component Analysis (ICA) [6, 10], kernel PCA [13] etc. The operation is not limited to translation; e.g., rotation can be also approximated by the exchanges of axes obeying much more complicated rules. The ﬁber bundles of different objects are entangled by such operations and are interleaved. Section 4.4 shows an example later in Figure 9.

To remedy this problem, we need a mathematical tool to untangle them. This paper proposes a quotient space of the appearance manifold and the mapping from manifold points to the quotient submanifolds.

object B3

object B2 object B1

object A2 object A1

object A3

shape B

shape A

Figure 1. Entanglement of ﬁbers

Before going to the details in the next section, as a toy example, let us consider a one-dimensional image of width n that contains an object whose appearance manifold is M ; i.e., emb(M ) = X ∈ Rn. When we translate the object by t ∈ R, we can construct trivial total space E = M × R. Let us further simplify the toy example such that n = 4 and consider a shape class whose objects are one-dimensional line segments of length 2 generated by translating A1 = (1, 1, 0, 0) to A2 = (0, 1, 1, 0) and A3 = (0, 0, 1, 1). The Euclidean distance between A1 and A3 is 2. If we consider another shape class of one-dimensional line segments of

1For simplicity, we assume that the intensities of the background pixels are zero and the object stays inside the image when translated.

length 1 comprised of B1 = (1, 0, 0, 0), B2 = (0, 1, 0, 0), B3 = (0, 0, 1, 0) and B4 = (0, 0, 0, 1), the Euclidean distance between A1 and B1 is only 1. This is a toy example of the situation in Figure 1.

2. The Proposed Quotient Appearance Manifold Mapping

To decompose the embedded total space E to an appearance manifold M and a ﬁber F , we will identify embedded appearance manifolds emb((M, f1)) and emb((M, f2)); i.e., emb((M, f1)) emb((M, f2)) under the equivalence relation . The space we should analyze is the embedded quotient total space emb(E)/ in Euclidean space. In what follows, we call emb(E)/ the quotient appearance total space. Quotient space is used, for example, to eliminate Mo¨bius ambiguity for 2D binary images in [19]. The equivalence relation is not limited to translation of images represented by Eq (3). It includes afﬁne transformations such as rotation and scaling. When the purpose of using appearance manifolds is to classify shapes into categories, however, it is not sufﬁcient to use afﬁne transformations of images as the equivalence relation. For example, suppose that we want to classify an object image according to whether it has a smooth margin or an irregular margin. A portion of an image of Object A1 in Figure 1 may be locally deformed, but it still belongs to a class of the smooth margin. This situation cannot be handled by equivalence relation based on afﬁne transformations. For this reason, we propose a mapping called the quotient manifold mapping from each point of the manifold to the quotient subspace using the local afﬁne transformations as the equivalence relation. We ﬁrst subdivide an image X to subimages represented by stacked vectors xi (i = 1, 2, ...). The subimages may overlap. The subimages are normalized so that their norms are one: |xi| = 1 (i = 1, 2, ...). Then, we apply the equivalence relation to the subimages to obtain a quotient appearance total space for each of them. The quotient appearance manifold mapping of the image is a set of quotient appearance total space of its subimages. We can generalize this way of constructing a quotient appearance manifold mapping as follows. We ﬁrst associate a neighborhood U (p) to each pixel p of the image. When we denote the quotient appearance total space of U (p) by M (p), the quotient appearance manifold mapping of the image is obtained as a set of M (p); i.e., S = {M (p)|p ∈ domain(I)} where domain(I) ⊂ [1, m] × [1, n]. We regard S as the approximation of the appearance manifold.

We can further construct a mapping from the original image to the set S of M (p) such that F : I → S. This situation is analogous to a Gauss map where a surface normal (nx, ny, nz) is associated with each point of a surface.

2008

2.1. Dimension reduction of quotient appearance manifold mappings

This section discusses the dimension reduction of the quotient appearance manifold mapping. Assuming that M (p) forms a subspace of low dimensionality, we can calculate the subspace in a similar way to ordinary dimension reduction such as PCA, ICA and LDA. In our case, the numbers of classes are small for LDA to be robustly computable. In this paper, we use PCA for simplicity; i.e., given points of a manifold that are represented by stacked vectors, we calculate the mean μ and covariance matrix C of the vectors obtained in Section 3 to get the principal components as the eigenvectors e1, e2, ..., em of C. The distance between two appearance manifold mappings S1 = {M1(p1)|p1 ∈ domain(I1)} and S2 = {M2(p2)|p2 ∈ domain(I2)} is deﬁned as the sum of the distances

dist =

dist(M1(p1), M2(p2))dp1 (4)

p1 ∈domain(I1 )

between each corresponding pair of M1(p) and M2(q) where p2 = g(p1). The correspondence g between p1 and p2 is discussed in Section 2.2. The distance dist(M (p1), M (p2)) is given by k wi(xi − yi)˙ek where ek (k = 1, 2, ...) are the eigenvectors, and xi and yi represent the subimages of M (p1) and M (p2), respectively. The calculation of xi and yi is described in Section 3.

The principal axes can be regarded as part of the spanning vectors of the tangent hyperplane of the quotient appearance manifolds when the quotient appearance total spaces are thin. Suppose that the variance of the space is very small along some axes; i.e., the manifold is thin along the axes. The principal axes are orthogonal to them and approximate the tangent hyperplane.

When the manifolds cannot be approximated by linear structures, we need nonlinear manifold learning such as ISOmetric MAPping (ISOMAP) [24], Locally Linear Embedding (LLE) [17], Laplacianfaces [8] and Riemannian manifold learning [12]. ISOMAP ﬂattens manifolds while trying to preserve the geodesic distances approximated by shortest paths. LLE calculates the locally linear patches approximating the manifolds. Laplacian faces map face images into face subspace by Locality Preserving Projections. Riemannian manifold learning calculates the Riemannian normal coordinates that preserve the geodesic metric well. In order that the methods work robustly, we need a great number of data points of the manifolds.

2.2. Correspondence between two quotient appearance manifolds

As can be seen above, we need to establish the correspondence g : M1 → M2 between M1(p) and M2(p2) to calculate the distance between two appearance manifolds. We can use automatic 3D volume matching methods such as [18, 7]. For the sake of simplicity, we subdivide the two

images I1 and I2 into equal numbers of subimages I1,l and I2,l (l = 1, 2, ...), respectively, and pair up I1,l and I2,l. In our implementation, we subdivide the images along the Cartesian axes equally.

3. Actual Calculation of Quotient Manifolds

To implement the quotient manifold of a subimage Ii represented by a stacked vector xi relative to the equivalence relation , we transform the subimage to the one with a canonical position, size and posture in order to account for translation, scaling and rotation; i.e., we calculate another image Ji of a ﬁxed size such that Ii Ji holds. (For simplicity, we neglect shears.) For example, if the transformation is translation, we translate the subimage to obtain Ji. Denoting the transformation Ψ, we look for Ji = Ψ(Ii) that is closest to Ii. However, the deﬁnition of the canonical position and rotation is itself a difﬁcult task. We propose to obtain a canonical one as follows. We ﬁrst choose an image Ki. In our implementation, we have chosen an image from the training dataset randomly. Alternatively, we can choose the mean of the training dataset. We can deﬁne the closeness by the mean squared distance. Finally, we deﬁne Ji = argminIi |Ψ(Ii) − Ki|2. To account for rotation, we rotate Ii to generate its rotational variations, and calculate Ji for each of the rotated versions.

3.1. Implementation details

We ﬁrst resample each grayscale lesion volume to a volume I of m × n × l in size. Next, we subdivided each volume to eight disjoint subvolumes Ii,j,k (i, j, k = 0, 1) of size m/2 × n/2 × l/2 ;

Ii,j,k(x, y, z) = I(x + mi/2, y + nj/2, z + kl/2). (5)

The image I is then rotated by 45n (n = 0, 1, .., 7) degrees on the xy-plane, and by 180 degrees to ﬂip the z-axis to generate sixteen rotational variations. (Fewer variations were needed for the z-axis because our data resolution was smaller in the z-direction.) The subimages are generated in the same way for each rotation.

Given a dataset of images, we chose one of the lesion images as the canonical image K. The best translational position of each subimage is calculated such that the mean squared difference between it and Ki,j,k = K(x+mi/2, y+ nj/2, z + kl/2) is minimized (in a similar manner to block matching for MPEG video compression). The resulting set of subimages are regarded as the quotient appearance manifold mapping {Mi,j,k}.

4. Experiments

We apply the quotient appearance manifold mapping to classify a medical grayscale volume. In particular, we used Breast Magnetic Resonance Imaging (MRI) for the experiments. In Dynamic Contrast-Enhanced Breast MRI, Con-

2009

trast Agent (CA) is injected to a patient, and the CA concentration in the lesions is measured by MRI as the grayscale volumes. A lesion can be a mass (consisting of one connected component) or a non-mass (consisting of scattered connected components). We focus on the mass lesions because the shape and margin attributes are only available for mass lesions. (We cannot describe the shape or margin of a non-mass lesion.) The shape of a mass lesion volume is clinically classiﬁed to round, oval, lobulated, and irregular shapes. In a similar manner, the margin (boundary of the lesion) is classiﬁed to smooth, irregular, and spiculated (margin with spikes). These categories and example images are described in the BI-RADS (Breast Imaging-Reporting and Data System) [9] guide book. Round or oval shapes are mostly related to benign lesions such as round for lipid and oval for ﬁbroadenoma, while irregular shapes with spiculated margin are often related to malignant lesions (cancers). The number of lesions we used is 109 in the dataset consisting of 59 patients. We asked an experienced radiologist to read each lesion image and to describe the ground truth shape and margin attributes.

The instances belonging to the same category have very different appearances, which makes it challenging for a classiﬁer to allow for intra-class differences while distinguishing the inter-class differences. Moreover, the inputs are grayscale 3D volumes of MR images that are computationally heavy. Figures 2 and 3 illustrate several lesions with different shapes and margins, respectively.

A

B

C

D

Figure 2. The middle slices of lesions of different types of shapes. A: round, B: oval, C: lobulated, D: irregular

A

B

C

Figure 3. The middle slices of lesions of different types of margins. A: smooth, B: irregular margin and C: spiculated

A

B

C

D

E

F

G

Figure 4. The middle slices of the volume from the axial view of the 10 highest ranked principal components. A: round B: oval, C: lobulated, D: irregular, E: smooth, F: irregular margin, and G: spiculated.

4.1. Training

We used 76 lesions from 39 patients as the training set. There are 31 mass lesions (5 round, 13 oval, 8 lobulated and 5 irregular shapes, and 23 smooth, 5 irregular and 3 spiculated margins), 26 non-mass lesions, and 19 other tissues that are treated as negative samples. Regarding the shape attribute, for each group of round, oval, lobulated and irregular lesions, respectively, we calculated the mean and covariance of subimages Ji,j,k with rotational variations, and obtained the principal components as the eigenvectors ei (i = 1, 2, ...) of the covariance matrix. The principal components whose eigenvalues are smaller than a threshold are discarded. Finally, we have projected the subimages to the principal component axes by ai = xi,j,k · ei (i = 1, 2, ...) where xi,j,k is the stacked vector of a subimage. We used the projected values ai as the feature values for machine learning. The three classes for the margin description of a mass lesion, smooth, irregular and spiculated, are handled in the same manner.

Figure 4 shows the top principal components of the quotient appearance manifold mappings for each class. In comparison to Figure 9, they contain high frequencies that enable to distinguish the class differences, which will be discussed in Section 4.4.

4.2. Classiﬁer

We used Quadratic Discriminant Analysis (QDA) as the classiﬁer for the ﬁrst experiment. We chose QDA because it performs well with a relatively small number of features (i.e., principal components). For each description, the best quadratic surface that separates the positive and negative sets has been calculated. We have chosen the combination of the principal axes that gives the best result for the training set. The same axes thus selected are used in the testing phase, too. The threshold of the QDA is chosen so that the predetermined speciﬁcity (0.8 in our experiments)

2010

is attained for the training set.

4.3. Testing

The test set is disjoint from the training set. There are 33 lesions from 20 patients. There are 18 mass lesions, (1 round, 5 oval, 3 lobulated and 9 irregular shapes, and 9 smooth, 3 irregular and 6 spiculated margins), 11 non-mass lesions, and ﬁve other tissues. Each lesion is subdivided to subimages and rotated in the same way as the training set. Figure 5 shows the ROC curves of classiﬁcation of mass shapes. Figure 6 shows the ROC curves of mass margins. Comparing the ROC curves with the ROC curves without quotient appearance manifold mappings in Figures 7 and 8, we can see signiﬁcant improvement in all categories.

(A)

(B)

ﬁcity #true neg#attriuveesn+e#gafatilvseespositives attained using the threshold of the QDA determined at the time of training.

4.4. Additional experiments on rotational variations without quotient manifold mappings

In this section, we will investigate the effect of rotational variations, as well as the performance difference in terms of feature dimensions. We ﬁrst show two experimental results. We do not use the quotient manifold by translation in order to understand the pure effect of rotational variations. We have used AdaBoost in a similar way to [27] to avoid the extra work for feature selection, and plotted the ROC after 100 iterations. We mix the training and testing datasets, and employ leave-one-out cross validation in order not to be affected by the small number of samples.

Figure 7 shows the result with 332 samples (four rotational variants: with 0, 90, 180 and 270 degree) and 664 samples (eight rotational variants: with additional 45, 135, 225 and 315 degree). There is a large difference in the performance particularly for the irregular shapes because the number of samples was increased. Figure 8 shows the ROC curves for margin categories with eight rotational variants.

(C)

(d)

Figure 5. ROC curves of masses with (A): round (AUC=0.796),

(B): oval (AUC=0.735), (C): lobulated (AUC=0.313), and (D): ir-

regular shapes (AUC=0.942)

664

664

664

332

332

332

(A)round+oval

(B)lobular

(C)irregular

Figure 7. ROCs of different numbers of rotational variations

(A)

(B)

(C)

Figure 6. ROC curves of masses with (A): smooth (AUC=0.813), (B): irregular (AUC=0.707), and (C): spiculated margins (AUC=0.856)

The dashed line shows the operating point; the sensitivity #true pos#ittirvuees+po#sfiatilvseesnegatives and the speci-

(A)

(B)

Figure 8. ROCs of (A): smooth and (B): irregular or spiculated

margin

Figure 9 shows the central slices of the volumes from the axial view of the 20 highest ranked principal components for each class with eight rotational variants, respectively. We can see that the patterns of their top principal components down to the 8th are mostly low frequencies and are very similar for all categories with slightly different orders; i.e., the classes are very similarly interpreted under the PCA space and hence, it is difﬁcult to separate the classes. It means that the ﬁbers of the appearance manifolds of the classes are entangled, which shows why we need the quotient manifold mapping. In comparison, the top principal

2011

components of the proposed quotient manifold mapping in Figure 4 contain high frequencies that distinguish the class differences.

A

B

C

Figure 9. The middle slices of the volume from the axial view of the 20 highest ranked principal components. A: round or oval, B: lobulated, C: irregular. The images show resemblance to one another, which means that it is difﬁcult to separate the classes by PCA.

In the second experiment, the accuracy differences by using different numbers of highest ranked dimensions for training are compared. We use four different numbers of dimensions: 10, 25, 50 and 100 with eight rotational variations. The results are shown in Figure 10. From the ﬁgure we see that for the round or oval shape, the accuracy does not improve much by using more dimensions. Whereas for the irregular shape, the accuracy improves substantially by using 100 dimensions over fewer dimensions. This is reasonable since irregular shapes have a higher variety than oval shapes, which requires more dimensions for training its classiﬁer. In the above experiments, the irregular shape classiﬁer performs better than oval and lobular shape classiﬁers.

10

10

10

25

25

25

50

50

50

100

100

100

(A)round+oval

(C)lobular

(B)irregular(shape)

Figure 10. ROCs of different number of highest ranked principal components used

4.4.1 Comparison with customized features

We developed customized features to compare with the performance of our quotient appearance manifold mapping and to show that the shape and margin in the clinical sense is not a trivial image processing task. The ﬁrst feature is the histogram of angles between adjacent voxels on the lesion

boundary. When the boundary is smooth, the angle histogram has smaller variations than when the boundary is irregular. The second feature captures how the lesion shape deviates from a sphere; it calculates the number of lesion voxels inside spheres centered at the lesion centroid. When the radius r is varied, the ratio of the numbers of such voxels to the sphere volume can be regarded as a function of r. The feature is expected to capture whether the shape is round, oval, or irregular. We also use other features such as the ratio of th√e volume V and the volume of the circumscribed sphere, S/V 1/3 where S is the surface area, and the aspect ratios along x, y and z axes. The results for the testing set are plotted by blue dots in Figures 5 and 6. It can be seen that that quotient appearance manifold mappings outperform the customized features in most categories. For example, the sensitivities of round and lobulated shapes are zero for the customized features. At the same time, we can see that the clinical smoothness or irregularity of a lesion is not a simple geometric smoothness of a volume. For example, the difference between round and oval shapes is not just measuring the aspect ratios of width, height and depth; oval objects are more likely to be ﬁbroadenomas, so the look of a ﬁbroadenoma affects the shape description.

5. Conclusions and Future Work

We have proposed the quotient appearance manifold mapping to untangle the ﬁbers of the appearance manifolds. In this paper, we assumed that each quotient appearance manifold mapping forms thin subspace so that the hyperplane obtained by PCA approximates the quotient appearance total space well. The use of ISOMAP, Riemannian manifold learning or other methods when the subspace is curved is left as future work. We also need to increase the dataset for robust classiﬁcation.

References

[1] http://www.galaxyzoo.org.

[2] K. Arbter, W. Snyder, H. Burkhardt, and G. Hirzinger. Application of afﬁne-invariant fourier descriptors to recognition of 3-D objects. IEEE Trans. Pattern Analysis and Machine Intelligence, 12(7):640–647, 1990.

[3] P. Belhumeur, J. Hespanha, and D. Kriegman. Eigenfaces vs. ﬁsherfaces: Recognition using class speciﬁc linear projection. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(7):711–720, 1997.

[4] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Analysis and Machine Intelligence, 24(24):509–522, 2002.

[5] G. E. Bredon. Topology and Geometry. SpringerVerlag, 1993.

2012

[6] B. A. Draper, K. Baek, and M. S. Bartlett. Recognizing faces with PCA and ICA. Computer Vision and Image Understanding, 91:115–137, 2003.

[7] G. Guetat, M. Maitre, L. Joly, S.-L. Lai, T. Lee, and Y. Shinagawa. Automatic 3-D grayscale volume matching and shape analysis. IEEE Trans. Information Technology in Biomedicine, 10(2):362–376, 2006.

[8] X. He, S. Yan, Y. Hu, P. Niyogi, and H. J. Zhang. Face recognition using laplacianfaces. IEEE Trans. Pattern Analysis and Machine Intelligence, 27(3):328– 340, 2005.

[9] D. Ikeda, C. K. Kuhl, and et al. Development, standardization, and testing of lexicon for reporting contrast-enhanced breast magnetic resonance imaging studies. J. Mag. Res.Im, 13:889–895, 2001.

[10] J. Kim, J. Choi, J. Yi, and M. Turk. Effective representation using ICA for face recognition robust to local distortion and partial occlusion. IEEE Trans. Pattern Analysis and Machine Intelligence, 27(12):1977– 1981, 2005.

[11] K.-C. Lee, J. Ho, M.-H. Yang, and D. Kriegman. Visual tracking and recognition using probabilistic appearance manifolds. Computer Vision and Image Understanding, 99(3):303–331, 2005.

[12] T. Lin and H. Zha. Riemannian manifold learning. IEEE Trans. Pattern Analysis and Machine Intelligence, 30(7):1270–1281, 2008.

[13] C. Liu. Gabor-based kernel PCA with fractional power polynomial models for face recognition. IEEE Trans. Pattern Analysis and Machine Intelligence, 26(5):572–580, 2004.

[14] X. Liu, T. Chen, and B. V. K. V. Kumar. Face authentication for multiple subjects using eigenﬂow. Pattern Recognition, 36(2):313–328, 2003.

[15] S. G. Mallat. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Analysis and Machine Intelligence, 11(7):674–693, 1989.

[16] K. Okada, J. Steffans, T. Maurer, H. Hong, E. Elagin, H. Neven, and C. Malsburg. The Bochum/USC face recognition system and how it fared in the FERRET phase III test. In H. Wechsler, P. J. Phillips, V. Bruce, F. F. Soulie, and T. S. Huang, editors, Face Recognition: From Theory to Applications, pages 186–205, 1998.

[17] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, December 2000.

[18] D. Rueckert and A. F. Frangi. Automatic construction of 3-D statistical deformation models of the brain using nonrigid registration. IEEE Trans. Medical Imaging, 22(8):1014–1025, 2003.

[19] E. Sharon and D. Mumford. 2D-shape analysis using conformal mapping. In Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 2004.

[20] J. Shotton, A. Blake, and R. Cipolla. Multi-scale categorical object recognition using contour fragments. IEEE Trans. Pattern Analysis and Machine Intelligence, 30(5):796–809, 2008.

[21] K. Siddiqi, A. Shokoufandeh, S. J. Dickinson, and S. W. Zucker. Shock graphs and shape matching. International Journal of Computer Vision, 35(1):13–32, 1999.

[22] K. B. Sun and B. Super. Classiﬁcaiton of countour shapes using class segement sets. In Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 2005.

[23] A. Tefas, C. Kotropoulos, and I. Pitas. Using support vector machines to enhance the performance of elastic graph matching for frontal face authentiﬁcation. IEEE Trans. Pattern Analysis and Machine Intelligence, 23(7):735–746, 2001.

[24] J. B. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2323, December 2000.

[25] A. Tsai, A. Yezzi, W. Wells, C. Tempany, D. Tucker, A. Fan, and W. E. Grimson. A shape-based approach to the segmentation of medical imagery using level sets. IEEE Trans. Medical Imaging, 22(2):137–154, 2003.

[26] M. Turk and A. Pentland. Face recognition using eigenfaces. In IEEE International Conference on Computer Vision and Pattern Recognition, pages 586– 591. 1991.

[27] P. Viola and M. J. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2):137–154, 2004.

[28] M.-H. Yang. Kernel eignfaces vs. kernel ﬁsherfaces: Face recognition using kernel methods. In IEEE International Conference on Automatic Face and Gesture Recognition, pages 215–220. 2002.

[29] S. K. Zhou, J. Shao, B. Georgescu, and D. Comaniciu. Boostmotion: Boosting a discriminative similarity function for motion estimation. In Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 2005.

2013

## Categories

## You my also like

### 2019 Trans Am Nationals Winners List

548.4 KB5.9K1.5K### IEEE Abbreviations for Transactions, Journals, Letters

188 KB98888### Similarity and Dissimilarity Measures

1.6 MB58.6K20.5K### Preprocessing Challenges in Document Image Analysis

490.5 KB34.8K11.5K### Journal of Molecular Structure

4.2 MB22K7.3K### Skin Again Ointment

336.5 KB23.2K6K### DistillHash: Unsupervised Deep Hashing by Distilling Data Pairs

732.8 KB18.9K8.7K### Circumventing Short Channel Effects in FETs: Review

1003.9 KB58.3K13.4K### Analysis Of Capacitor Voltage Balance In Multilevel Inverter

834 KB59.7K28.6K