| Breast cancer is one of the most frequently diagnosed malignancies and a leading cause of cancer death in women worldwide. Numerous studies have shown that early detection is critical for saving lives and widening the treatment options. Mammography is the dominant method for early detection of breast cancer: it can reveal early signs before they can be felt by a health care professional. A computer-aided diagnosis (CAD) system for mammography aims to assist radiologists in detecting and analyzing suspicious regions, in order to improve the accuracy of early diagnosis of breast cancer. Masses are one of the major abnormalities on mammograms and have complex characteristics; some are embedded or hidden in the surrounding tissue. Computer-aided detection and diagnosis of masses on mammograms is therefore both an active research direction and a major difficulty in mammographic CAD. Traditional mammographic CAD systems focused on lesion detection and pathology classification without giving radiologists the specific basis for a diagnosis. This "black-box" strategy reduced radiologists’ confidence in, and reliance on, CAD systems. In recent years, mammographic CAD systems based on content-based image retrieval (CBIR) have emerged to overcome this drawback. This thesis focused on several key techniques in a mammographic CBIR-CAD system, in order to improve computer-aided detection and diagnosis of masses and to provide a valuable "second opinion" for radiologists. The main contents of this thesis are mass segmentation, feature extraction and optimization, classification of mass subtypes, and CBIR for similar masses. (1) Mass segmentation: Two mass segmentation methods based on the random walks (RW) algorithm were proposed in this thesis. The first method used the isocontour map of the mass to automatically label the seeds required by the RW algorithm. 
The background seeds were labeled as a closed contour surrounding the mass to suppress the peripheral tissues. In addition, by exploiting the nested pattern of the isocontour map, the mass seeds were extended from the interior toward the exterior of the mass, so that the RW algorithm produced a series of segmented regions. The final segmentation result was selected by a criterion function that combined the size, gradient and intensity information of the mass. This method overcame the practical limitation of the original RW algorithm, namely that it is only semi-automatic. The second method exploited the complementary nature of the RW algorithm and the Chan-Vese (CV) active contour model. First, an initial RW segmentation was performed using seeds labeled automatically by the isocontour method. Second, the two probability matrices produced by the initial RW segmentation were used to modify the energy function of the CV model in order to prevent contour leakage. Finally, the segmentation result was obtained by contour evolution, during which the probability matrices were updated. The experimental results showed that the two proposed methods were more accurate, adaptive and robust than three existing methods. (2) Feature extraction and optimization: On the basis of the Breast Imaging Reporting and Data System (BI-RADS), intensity, shape and margin features of masses were extracted, including six new features derived from the topographic transformation of the mass isocontour. Then, five feature evaluation metrics were combined to evaluate the performance of the features. For each classification objective, the filter method was adopted for feature optimization. In the experiments, the feature subsets and the classification performance before and after feature optimization were evaluated and compared. 
The six proposed features showed clear advantages in the classification of the five margin subtypes of masses. Moreover, feature optimization effectively reduced the feature dimensionality while improving classification performance and efficiency. (3) Classification of mass subtypes: A dynamic multi-class classifier combining the support vector machine (SVM) and the binary decision tree (BDT) was proposed for classifying four shape subtypes and five margin subtypes of masses. This dynamic SVM-BDT method took advantage of both the efficient computation of the BDT and the high classification accuracy of the SVM. Moreover, it adjusted the SVM classifiers at the nodes and the testing strategy according to the characteristics of each query mass. It thereby prevented the cumulative error of the traditional SVM-BDT method and improved classification accuracy and efficiency. (4) CBIR for similar masses: First, a similarity measure function that integrated the probability outputs of the subtype classification with the Euclidean distance was used to perform the first round of query-by-example (QBE) retrieval. Then, in the relevance feedback (RFb) mode, the system allowed users to interactively mark the returned masses as relevant or irrelevant. In practical applications, the number of labeled masses was sometimes insufficient. To deal with this problem, the Euclidean distance between unlabeled and labeled samples in the kernel space was computed to enlarge the set of feedback samples. A new SVM model was then trained on these feedback samples to determine the relevance of the masses in the database. Leave-one-out cross validation (LOOCV) and the precision-recall curve (PRC) were used to evaluate the retrieval performance of the QBE and RFb modes. The experimental results indicated that the proposed similarity measure function improved retrieval performance in the QBE mode. 
Moreover, with the same number of labeled samples and RFb rounds, the proposed method showed a more significant improvement in retrieval performance in the RFb mode. |
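The QBE similarity measure described in part (4) can be illustrated with a minimal sketch. The abstract does not give the exact form of the function, so everything below is an assumption made for illustration: the function name `rank_by_similarity`, the mixing weight `alpha`, the max-scaling of the Euclidean distance, and the use of total variation distance to compare subtype probability vectors are all hypothetical choices, not the thesis's method.

```python
import numpy as np


def rank_by_similarity(query_feat, query_prob, db_feats, db_probs, alpha=0.5):
    """Rank database masses by a combined distance to the query.

    Smaller combined distance means more similar. `alpha` is a
    hypothetical mixing weight and is not taken from the thesis.
    """
    db_feats = np.asarray(db_feats, dtype=float)
    db_probs = np.asarray(db_probs, dtype=float)
    query_feat = np.asarray(query_feat, dtype=float)
    query_prob = np.asarray(query_prob, dtype=float)

    # Euclidean distance in feature space, scaled to [0, 1] (assumed scaling).
    feat_dist = np.linalg.norm(db_feats - query_feat, axis=1)
    max_dist = feat_dist.max()
    if max_dist > 0:
        feat_dist = feat_dist / max_dist

    # Disagreement between subtype probability vectors, measured here as
    # total variation distance in [0, 1] (an assumed choice).
    prob_dist = 0.5 * np.abs(db_probs - query_prob).sum(axis=1)

    combined = alpha * feat_dist + (1.0 - alpha) * prob_dist
    order = np.argsort(combined, kind="stable")
    return order, combined[order]


# Toy example: three database masses, two features, two subtypes.
query_feat = [0.0, 0.0]
query_prob = [0.9, 0.1]
db_feats = [[1.0, 0.0], [1.0, 0.0], [4.0, 0.0]]
db_probs = [[0.1, 0.9], [0.9, 0.1], [0.9, 0.1]]
order, dists = rank_by_similarity(query_feat, query_prob, db_feats, db_probs)
print(order, dists)
```

In this toy example the first two database masses are equally close to the query in feature space, but the second shares the query's predicted subtype and is therefore ranked first. This is the kind of effect that integrating classifier probabilities with the Euclidean distance is meant to produce.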