Extracting and Selecting Meaningful Features from Mammogram Digital Image Using Factor Analysis Technique

Breast cancer is the most common cancer in women around the world. Various countries, including the United Arab Emirates (UAE), offer asymptomatic screening for the disease. Mammography is considered a significant method for detecting breast tumors, but interpreting mammograms is a very challenging task and is subject to human error. Hence, finding an accurate and effective diagnostic method is very important for increasing the survival rate and reducing mortality among women. Not all features of an image convey its characteristics; only some extracted features express enough information about the image. In this research, statistical feature selection methods have been developed in association with statistical techniques. A total of 141 ROIs extracted from the Digital Database for Screening Mammography (DDSM) were used to conduct this research. Our experiment was divided into two stages in order to reduce the image features extracted from the ROIs. In the first stage we applied statistical techniques to reduce the features while keeping a high accuracy rate; in the second stage we applied a graph-based method and Bayesian inference. Our method achieved high accuracy compared to the original selected features.


-INTRODUCTION
Mammographic screening has turned out to be an effective solution for reducing the death rate of women suffering from breast cancer. The screening procedure yields a huge volume of mammograms, which can only be interpreted by radiologists. The diagnosis of malignancies from medical images is significantly improved by using computer-aided diagnosis [31].
A few recently developed digital computerized techniques can be used by radiologists to facilitate mass detection in mammograms. Mass detection on mammograms is a difficult task. Masses are associated with a higher chance of cancer, as are some other mammographic abnormalities, and they may result in invasive cancer too.
Recently developed computerized techniques for detecting masses involve the following key steps. The first isolates the breast region on the digital mammogram and aligns the two corresponding breast images. The second identifies suspicious regions in which masses could develop. The third analyzes the features of all suspicious regions, which helps lower the total number of false positive detections; detecting malignant regions depends on the quality of the image features [21]. Some visual features of each suspicious region are used as additional measures to differentiate normal from abnormal tissue. There are two main directions in breast cancer detection research. The first uses image processing approaches to enhance the image and to find the best region of interest (ROI). The other applies the best classifiers, using machine learning in conjunction with image processing. A computer-aided diagnosis (CAD) algorithm has previously been developed to assist radiologists in the diagnosis of mammographic cluster of classifications [4].
Because breast cancer is the second leading cause of cancer-related mortality in women, it is crucial that it be detected in its early stages of development. Mammography has been used as a screening and diagnostic tool for the early detection of breast cancer, and it has proven to be effective for women 50-75 years of age [17]. A recent study showed that in women aged 40-49 years, screening mammography reduces breast cancer mortality by 16-18%.
Detecting breast cancer at an early stage through screening mammography is a reliable way to reduce the mortality rate among women and to increase survival. However, radiologists are not able to accurately classify all lesions detected at mammography as positive or negative. Although image quality has improved over the years, the interpretation of mammographic abnormalities remains the weakest link in the diagnostic process. Up to 20-30% of early cancers are missed because an abnormality is misinterpreted as normal [16]. The cause of these false negative reports is unclear, but it probably reflects misinterpretation of an abnormality on the mammogram rather than the abnormality being overlooked. Computer-aided diagnosis (CAD) [27] [10] systems prompt a radiologist to focus on a suspicious region of the breast. Most CAD systems have high sensitivity but only medium specificity, so radiologists working with CAD have to ignore most CAD prompts of abnormality. In a large prospective study, [7] increased the sensitivity of detecting mammographic abnormalities by 19.5% by using a CAD system. The key research objective that has yet to be achieved is to develop CAD systems with specificity high enough to distinguish between benign and malignant lesions with a high level of confidence [16].
Computer-aided diagnosis (CAD) has been investigated as a method of providing radiologists with actual information about the breast, for example estimates of the probability of a tumor, to aid in classifying abnormalities detected at screening and so improve the specificity of mammogram investigation [22].
Most commercially available CAD systems that have been studied increase the detection rate of mammographic abnormalities compared with a single reading by an experienced radiologist, but they do not assist the radiologist in deciding whether a detected abnormality is benign or malignant [19]. Despite significant recent improvement, detecting suspicious irregular shapes in digital mammograms remains a complex task, for several reasons. First, mammography provides relatively low-contrast images, especially in the case of dense or heavy breasts. The visual manifestation of the shape and border of a lesion in the mammogram depends not only on the tangible characteristics of the lesion but also on the image acquisition technique and the projection considered [2]. Second, signs of abnormal tissue may remain quite subtle. Important abnormality markers, the micro-calcification clusters, are easier to detect. However, in both cases one has to decide whether the detected lesion is benign or malignant under a significant level of uncertainty. Third, the automated detection of masses can be hampered by the wide diversity of their shape, size and subtlety. For these reasons, we need robust techniques for mammogram contrast enhancement, segmentation, detection of micro-calcifications and malignancy assessment [2]. In medical images, especially mammograms, segmentation between normal tissue and tumor has been impossible because the difference between them is very small [7].
Feature extraction and selection is considered the most important stage in CAD, because classification depends entirely on it [6]. The number of features selected and extracted for breast tumor detection reported in the literature varies with the CAD approach employed. Using meaningful features and an optimal number of features makes the classification more accurate [1] and valuable, while a large number of features increases computational needs and makes it difficult to define accurate decision boundaries in a high-dimensional space [15][20][18]. The main objective of appropriate feature selection is to enhance the prediction performance of the predictors, to provide faster and more cost-efficient predictors, and to provide a better understanding of the processes that generated the data [13].
Oct 25, 2013

-RELATED WORK
In medical image classification the features are extracted first, but the significant features lie in a high-dimensional data space [33]. This raises many technical challenges, including computational complexity, sparsity and redundancy. Many studies have been conducted on dimensionality reduction [14]. High-dimensional data can be transformed into a low-dimensional representation while minimizing the loss of the information it contains. Many methods for feature extraction from digitized mammograms are derived from limited gray-level scale information [12][4].

-MAMMOGRAM FEATURES
CAD image features are theoretical descriptions needed for image processing and for studying an image's meaning and content. Image features are represented as data structures that can be extracted directly from images, such as colors, together with higher-level mathematical quantities derived from this basic information, such as histograms, edges, and Fourier descriptors. However, using more features brings longer computation and higher costs, both in gathering the features and in applying them for prediction. Adding more features to the system therefore does not improve it efficiently, because a newly added feature may not automatically contribute to more precise results.
The quality of the feature extraction process can be enhanced through two distinct strategies: extracting new features, and examining the procedure for feature pruning [23] [29]. Researchers have investigated features extracted from mammograms that did not greatly enhance classifier performance. [8] used only 6 features with the graph-based method and obtained a true positive rate of 82.83% and a false positive rate of 0.08%. In addition, [9] used the sequential forward search (SFS) technique and selected only 25 features, with a Mean Square Error (MSE) of 0.02994 using General Regression Neural Networks (GRNN). Applying the support vector machine (SVM) yielded 11 features with an MSE of 0.0283. Several methods exist for discarding non-significant features from a system, such as sequential backward search (SBS), sequential forward search (SFS) and stepwise regression. SBS and SFS focus on reducing the MSE of detection, whereas stepwise regression is concerned with both the MSE value and feature interaction. Stepwise logistic regression is an expensive technique, mainly because it depends on the total number of experiments carried out over all likely permutations of the features present in the prediction model.
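The greedy idea behind sequential forward search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sequential_forward_search`, the `score` callback (which returns an error such as MSE for a candidate subset) and the toy error function are hypothetical and are not taken from [9].

```python
from typing import Callable, List, Sequence


def sequential_forward_search(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    max_features: int,
) -> List[str]:
    """Greedy SFS: repeatedly add the feature that lowers the error most."""
    selected: List[str] = []
    remaining = list(features)
    best_error = float("inf")
    while remaining and len(selected) < max_features:
        # Evaluate every remaining feature when added to the current subset.
        candidate_errors = [(score(selected + [f]), f) for f in remaining]
        error, feature = min(candidate_errors)
        if error >= best_error:
            break  # no candidate improves the error, so stop early
        selected.append(feature)
        remaining.remove(feature)
        best_error = error
    return selected


# Toy error function (hypothetical): only "mean" and "entropy" are informative.
def toy_error(subset: List[str]) -> float:
    useful = {"mean": 0.5, "entropy": 0.3}
    return 1.0 - sum(useful.get(f, 0.0) for f in subset)


chosen = sequential_forward_search(
    ["mean", "skewness", "entropy", "kurtosis"], toy_error, max_features=3
)
```

Sequential backward search would run the same loop in reverse, starting from the full set and removing one feature per step, which is one reason the two procedures can disagree about a feature's importance.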
For feature extraction investigation, the following four important problems are noted:
1. The feature extraction analysis should remove multi-collinearity.
2. If many features are extracted, the image processing cost may be high.
3. Prediction precision does not depend on the number of features.
4. It might be difficult to build the feature extraction so that it speeds up the classifiers.
To clarify the meaning and content of the image, features are considered theoretical descriptions of the image [4]. Features are represented as data structures sourced directly from the image, such as color, and from quantities that require more complex mathematical calculation, such as histograms, Fourier descriptors and edges. Therefore, a different algorithm must be established to process the data pertaining to each feature. It is also vital to select only those features that carry adequate information about the image, so that the feature extraction procedure remains practical and easily quantifiable. A large amount of research has pursued more elaborate analysis in order to obtain sufficient data, which has increased both the cost and the time of image analysis. In mammograms, the analysis is restricted to gray-scale information only, which yields low-contrast images, and as a result distinguishing malignant from normal tissue has been very difficult [7]. This has made the gray-scale method inefficient at obtaining sufficient data for feature extraction [4].
By and large, the features of medical images can be divided into three domains: spatial, spectral and texture. In gray-scale imaging, the spatial domain is limited to gray-level information only; this includes foreground and background information as well as other statistics such as shape. Texture represents the configuration or surface of an object in both weighty and transmissive mammograms. Texture analysis is a vital part of various computer applications for segmentation, grouping and identification of images with spatial variations. Spectral density is a positive real-valued function of frequency associated with a stationary stochastic process, and it has dimensions of power or energy. It is essential to obtain all the useful features in a quantifiable form. Various past studies, such as [10], found that the vast majority of extractions were done on the assumption that a detection system progresses with enhanced features. Therefore, although many extractions were done by adjusting old features, some used more familiar extraction from syntactic images [25] and from knowledge bases [24]. The resulting precision was not very promising, and the complexity of the detection step and its time consumption increased. With the growing number of features, feature selection has become a vital task for tackling the CAD problem quickly. In essence, theory is a proper means of investigating the ideal features, and it must confront both the extraction problem and the selection problem. Extensive contemporary research in medical analysis has raised the hope of finding the best features, or a combination of them, that would yield the highest classification rate when a suitable classifier is used. Feature selection and extraction have been approached from many angles, as listed below.
[9] used 61 features to select the best feature subset for micro-calcification identification, using sequential backward search (SBS) and sequential forward search (SFS) reduction followed by a Support Vector Machine (SVM) and a General Regression Neural Network (GRNN). The two selection methods were found to be highly inconsistent: one feature that ranked in the top five in significance under SFS was discarded by SBS. [30] extended feature selection with a neural-genetic technique in which each individual in the population represents a candidate feature subset. The test involved 14 features and roughly 2^14 feature subsets, and the results showed that a small number (5) of feature subsets produced the highest classification rate, which was 85%. The neural-genetic approach to feature selection is costly, especially when the number of features is large, as in mammography.
[3] and [32] separately used simple statistical features of gray-scale intensity, whereas [34] used sphericity, volume, gray-level standard deviation, mean gray level, maximum eccentricity, gray-level threshold, maximum circularity, maximum compactness and radius of the mass sphere in their CAD system. [24] used standard deviation, average gray scale, maximum and minimum gray scale, skewness, gray-level histogram and kurtosis in the identification and detection of lung cancer. [27] conducted a study on 150 images obtained from the database of the Japanese Society of Radiological Technology (JSRT), using background image, patient age, scope of irregularity, RMS of the power spectrum, and the full width at half maximum within the segmented region. [5] studied region of interest properties, personal profile, shape, and nodule size. [26] extended the feature set with new modified features: average gray level, number of pixels in the ROI, energy, adjusted standard deviation, entropy, modified entropy, modified energy, standard deviation, average boundary gray level, modified skewness, skewness, and contrast.
Beyond medical image analysis, more features were used in a study by [28], which addressed fault diagnosis in induction motors and enhanced the feature extraction process by suggesting a new kernel trick. In that study, a total of 76 features were calculated from 10 time-domain categories: mean, shape factor, RMS, skewness, crest factor, kurtosis, entropy error, histogram lower, entropy estimation and histogram upper. We did not find a common method used for feature selection in that research, and we therefore concluded that the authors added more features with the aim of increasing their method's efficiency.
Given such medical image data and abundant features, the main problem encountered in CAD image processing is high processing cost. Among the many studies conducted in this field, a variety of feature categorization and extraction approaches have been used in medical image analysis; Table 3.1 summarizes these studies.

.1 DATABASE
The Digital Database for Screening Mammography (DDSM) is one of the mammographic image analysis databases used by the University of South Florida research community. It is a collaborative effort between Massachusetts General Hospital, Sandia National Laboratories and the University of South Florida Computer Science and Engineering Department. The database contains approximately 2,500 studies. Each study consists of two mammograms of each breast, along with associated patient information (age of the patient at the time of the study, breast mass evaluation, subtlety evaluation for abnormalities, and explanation of suspicious regions) and mammogram information (scanner, spatial resolution). Each mammogram containing a cancerous region has associated pixel-level information about the locations and types of the cancerous regions. The DDSM is structured according to "cases" and "volumes". A "case" is a group of digital mammograms containing all the information related to the diagnosed patient [35]. A "volume" is simply a group of cases collected together for ease of distribution. All volumes are accessible on 8mm tape.

DETERMINATION OF SAMPLE SIZE
According to the Digital Database for Screening Mammography (DDSM), there are 2500 cases available in 43 volumes. Because of this large population, we decided to draw a simple random sample (SRS) for our experiment, with its size determined by a statistical sample size equation in which:
Z is the z-score for the chosen confidence interval; the z-scores for 90%, 95%, and 99% are 1.645, 1.96, and 2.58 respectively.
E is the error.
P is the probability that the selected woman has a true positive (TP) result for the disease.
Q = (1 - P) is the probability that the selected woman has a true negative (TN) result for the disease.
In our experiment we use a 95% confidence interval, which gives a critical z-score of 1.96, and an absolute error E equal to 0.0002. According to the cancer incidence report of the United Arab Emirates (UAE) for 2012, the incidence is 19.4 per 100,000 of the UAE population. From the above information, the sample size can be calculated as follows:
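The sample size equation itself is not reproduced in the text. A form that is consistent with the symbols Z, E, P and Q defined above is the standard proportion-based formula n = Z^2 P Q / E^2; treating it as the equation used here is an assumption. The short Python sketch below evaluates that formula; the function name `sample_size` and the example inputs are illustrative only and are not the values used in this study.

```python
import math


def sample_size(z: float, p: float, e: float) -> int:
    """Sample size for estimating a proportion: n = Z^2 * P * Q / E^2.

    Assumed form of the equation; z is the z-score, p the expected
    proportion, e the absolute error. The result is rounded up.
    """
    q = 1.0 - p
    return math.ceil(z ** 2 * p * q / e ** 2)


# Illustrative inputs only: 95% confidence, P = 0.5, error of 0.05.
n_example = sample_size(z=1.96, p=0.5, e=0.05)
```

Rounding up is a conservative choice, since a fractional sample cannot be drawn.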

-EXPERIMENTAL AND METHOD
Our research consists of five stages, and the process for each stage is outlined in Figure 3 and Table 7. Stage 5: Evaluation of the three proposed methods.

CROSS VALIDATION
To test feature stability across two statistical classifiers, Linear Discriminant Analysis (LDA) and Binary Logistic Regression (BLR), our experiment was set up as follows. Features were selected from the 141 ROIs using each of three different selection algorithms: Graph Based Analysis, Factor Analysis and Analysis of Variance. For each algorithm the ROIs were divided into 10 equal subgroups, so that with the two classifiers there were 60 subgroups available for subsequent testing. To each set of 10 subgroups we applied LDA [36] [11] and BLR using 10-fold cross validation: in each round one subgroup was held out as the testing set while the remaining subgroups were used as the training set, and the average error was obtained.
The splitting of the data shows that the two selected classifiers are good enough to warrant further use, mainly because their absolute errors and other indicators are small. More details of the cross-validation results are shown in Tables 4-9.
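The fold construction used in 10-fold cross validation can be sketched as follows. This is a minimal illustration that assumes the standard arrangement in which each fold is held out once for testing; the function name `k_fold_splits`, the shuffling seed and the way the 141 ROIs are assigned to folds are assumptions, and the classifier fitting (LDA or BLR) is deliberately left out.

```python
import random
from typing import List, Tuple


def k_fold_splits(
    n_samples: int, k: int = 10, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs covering every sample once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split repeatable
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def average_error(fold_errors: List[float]) -> float:
    """Average the per-fold error rates into one cross-validation estimate."""
    return sum(fold_errors) / len(fold_errors)


splits = k_fold_splits(141, k=10)
```

Because 141 is not divisible by 10, one fold holds 15 ROIs and the other nine hold 14 each.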

-RESULT AND DISCUSSION
The performance of a binary classifier, whose output is either positive or negative, is assessed with statistical measures based on sensitivity and specificity. Sensitivity is the probability of correctly identifying actual positives, that is, tumorous growths; it is given in equation 4 and table 10. Specificity is the proportion of actual negatives, that is, benign masses, that are correctly identified. These two measures are closely related to Type I and Type II errors. A sensitivity of 100% is the theoretical optimum for predicting tumorous growth, and a specificity of 100% is the theoretical optimum for predicting non-tumorous growth.
Here a_i is the actual value (supervised data) and p_i is the predicted value. The experimental results of the three proposed methods with different statistical classifiers are reported for the ROIs extracted from DDSM; the contribution and measurements of the proposed techniques are discussed at the end of this chapter. Our research has been divided into three parts, one for each method and its experiment.
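The measures discussed above can be computed directly from the actual and predicted labels. The sketch below uses the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The error measure involving a_i and p_i is not reproduced in the text, so the mean squared error shown here is an assumption, as are the function names and the toy labels (1 = malignant, 0 = benign).

```python
from typing import Sequence, Tuple


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count true positives, true negatives, false positives, false negatives."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(actual: Sequence[int], predicted: Sequence[int]) -> float:
    tp, _, _, fn = confusion_counts(actual, predicted)
    return tp / (tp + fn)  # share of actual positives that were detected


def specificity(actual: Sequence[int], predicted: Sequence[int]) -> float:
    _, tn, fp, _ = confusion_counts(actual, predicted)
    return tn / (tn + fp)  # share of actual negatives that were cleared


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    # Assumed error measure: average of (a_i - p_i)^2 over all samples.
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Toy labels for illustration only.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
```

A false positive corresponds to a Type I error and a false negative to a Type II error, which is why the two measures are tied to those error types.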

Reduction Method using Factor Analysis
In this section we discuss the experimental results for the three proposed techniques. The first part presents the experiment on feature clustering with the number of factors varying from two to eight, together with the contributed sum of squared loadings. The second part presents the experimental results of the second proposed technique, and the third part presents the experimental results of the third proposed technique, which uses PCA and ANOVA.

2 Experimental Results
Our experiment is based on 141 ROIs extracted from DDSM, which were manually segmented by expert radiologists. In the first stage, after the ROIs had been segmented, we extracted the original features from the 141 ROIs using the three feature domains (texture, spatial, and spectral). The feature set is introduced in table 8. The factor analysis was run with the number of clusters varying from two to eight, and the results are shown in table 11. Table 11 shows the sum of squared loadings and the number of features assigned to each cluster. The data are distributed across the groups (factors); for example, with 2 factors the explained sum of squares is 78.78%, with cluster 1 containing 27 features, and so on. From the table we can conclude that the feasible solution might be 5, 6, or 7 factors; we select the 5-factor case because no difference among them is seen in the processing step. The 5 factors are shown in table 12.
To reduce the processing cost and to fulfill the assumptions of the statistical methods, we ran two experiments to evaluate two statistical prediction models: logistic regression with 5 factors and logistic regression with 50 features. The results are shown in table 13.
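Once a loading matrix is available, the two quantities reported for each factor solution (the explained sum of squared loadings and the features belonging to each cluster) are simple to compute. The sketch below assumes that each feature is assigned to the factor on which it has the largest absolute loading and that the features are standardized, so the total variance equals the number of features; both are assumptions, and the loading values are made up for illustration.

```python
from typing import Dict, List


def assign_features_to_factors(loadings: List[List[float]]) -> Dict[int, List[int]]:
    """Group feature indices by the factor with the largest absolute loading."""
    groups: Dict[int, List[int]] = {}
    for feature_idx, row in enumerate(loadings):
        best_factor = max(range(len(row)), key=lambda j: abs(row[j]))
        groups.setdefault(best_factor, []).append(feature_idx)
    return groups


def explained_ssl_percent(loadings: List[List[float]]) -> float:
    """Sum of squared loadings as a percentage of total variance.

    Assumes standardized features, so total variance = number of features.
    """
    n_features = len(loadings)
    ssl = sum(value ** 2 for row in loadings for value in row)
    return 100.0 * ssl / n_features


# Hypothetical loadings: 3 features (rows) on 2 factors (columns).
example_loadings = [
    [0.9, 0.1],
    [0.8, -0.2],
    [0.1, -0.7],
]
groups = assign_features_to_factors(example_loadings)
percent = explained_ssl_percent(example_loadings)
```

Comparing this percentage across solutions with two to eight factors is one way to read a table of the kind described above.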

. 3 Feature Reduction in Graph Based Analysis
This section describes feature reduction in Graph Based Analysis, including training data and feature extraction.

. 3 . 1 Training Data and Feature Extraction
In this section we introduce the set of 141 data items extracted from the ROIs in DDSM. Our objective is to categorize consistent features into 12 feature sets by using the correlation coefficient; table 15 lists the extracted features in each group or set. The features that remain after this step are those that are significant to Y. In conclusion, our experiment has shown that the number of features has been reduced to 13. Table 18 shows the distribution of the reduced features. According to our experiment, the two groups of features, the 50 selected features and the 13 selected features, show no statistically significant difference when the learning models LR and ANN are used. However, our proposed feature selection method can play a crucial role in improving Computer Aided Detection (CAD) systems.
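One way to group consistent features with the correlation coefficient is to treat features as graph nodes, connect two features when the absolute value of their Pearson correlation reaches a threshold, and take the connected components as feature sets. The sketch below illustrates that idea only; the threshold of 0.8, the function names and the toy columns are assumptions, and this is not necessarily the exact graph-based procedure used in this study.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_groups(columns: Dict[str, List[float]], threshold: float = 0.8) -> List[List[str]]:
    """Connected components of the graph linking strongly correlated features."""
    names = sorted(columns)
    neighbours = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                neighbours[a].add(b)
                neighbours[b].add(a)
    groups, seen = [], set()
    for name in names:
        if name in seen:
            continue
        seen.add(name)
        stack, component = [name], []
        while stack:  # depth-first walk over one component
            current = stack.pop()
            component.append(current)
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(component))
    return groups


# Toy feature columns (hypothetical): "a" and "b" move together, "c" does not.
toy_columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.5], "c": [4, 1, 3, 2]}
feature_groups = correlation_groups(toy_columns)
```

Keeping one representative feature per group is then a natural way to cut redundancy before classification.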

. 4 Feature Selection under Feature Interaction
This section introduces feature selection under feature interaction, based on the training data sets, by applying two well-known statistical techniques: one-way Analysis of Variance (ANOVA) and Principal Component Analysis (PCA). Our testing starts by applying ANOVA to reduce the primary features, under the assumption that the features in the two classes are independent, and then keeps the remaining features for classification. To reduce the initially selected features, they are first segmented into sub-groups using PCA. Table 21 shows the distribution of features across the sub-groups. We then reduce each sub-group i using ANOVA. Figure 13 shows the F-test values of the 9 subgroup features, as stated in Table 22. For the two methods, with 40 features and 27 features and the LDA classifier, the sensitivity and specificity results are given in Table 23.
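The F-test value for a single feature can be computed with the standard one-way ANOVA statistic, the ratio of the between-class mean square to the within-class mean square. The sketch below applies it to one feature split by class; the function name `one_way_anova_f` and the toy values are illustrative assumptions, and the cut-off used to discard a feature is not shown because it is not stated in the text.

```python
from typing import List


def one_way_anova_f(groups: List[List[float]]) -> float:
    """One-way ANOVA F statistic for one feature measured in several classes."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-class sum of squares: spread of the class means.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-class sum of squares: spread of values around their class mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_within = ss_within / df_within
    if ms_within == 0:
        return float("inf")  # classes are perfectly separated by this feature
    return (ss_between / df_between) / ms_within


# Toy feature values for benign and malignant ROIs (illustration only).
benign_values = [1.0, 2.0, 3.0]
malignant_values = [4.0, 5.0, 6.0]
f_value = one_way_anova_f([benign_values, malignant_values])
```

A larger F value means the feature separates the two classes more strongly, so features with small F values are candidates for removal.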

6-CONCLUSION
The sensitivity and specificity of our experiment on the two feature sets, with 40 and 27 features respectively, using the two classifiers LDA and LR, differ from each other. The method with 40 features has more features and less accuracy than the method with 27 features. From this we can accept our hypothesis that the 40 features exhibit feature interaction effects during outcome detection.
Our research uses Analysis of Variance (ANOVA) and Linear Discriminant Analysis (LDA) to tackle the problem of feature relations in order to increase detection accuracy and minimize the cost of image processing. The proposed method reduces the selected features from 50 to 27 and gives good accuracy with respect to specificity and sensitivity.
The objective of this research is to find techniques that enable the extraction and selection of the most appropriate, meaningful features for medical image analysis, in order to ensure accuracy and minimize computational time. The number of patient-focused diagnostic procedures has increased over the years. The use of Decision Support Systems (DSS) has supported computer-aided diagnosis and detection (CAD) where expert mammogram advice is lacking. The mammogram data analyzed come from the Digital Database for Screening Mammography (DDSM). The three main steps in the analysis are pre-processing, feature extraction, and detection and classification.
The feature selection step aims to identify the features that are important for detection. Feature selection is costly, but it is geared to improving the efficiency of the system. Extraction has two main aspects: retaining the main features that are useful for explaining patient indicators, and selecting the important features.
In each situation there are different parameters for the diagnosis and detection of cancer, such as bone marrow, blood fluid, profile features of the patient, treatment of the blood chemicals and so on. These parameters are useful in recognizing and detecting malignancy and cancer once the feature selection steps have been carried out.
To maintain the main features, the technique of factor analysis is applied so as to keep all features with their properties, and the non-significant features are removed. To remove the non-significant features we applied other statistical techniques such as Analysis of Variance (ANOVA), Bayes inference, and correlation analysis. Factor Analysis proved to be the best technique for feature selection because of the characteristics of the cases and the dependency among the features. It is an efficient method for constructing a small number of factors, like the group of hidden-layer nodes in an ANN; the feature space is reduced, and the technique is independent of the classifier.
With only a small number of factors, proficient detection is achieved. The approach also opens the way to exploring new kernel functions, and the process is very simple.
Two further techniques are based on discarding features: the first is Analysis of Variance (ANOVA) and the second is the Graph Based method. The graph-based technique performs better than ANOVA: feature selection in Graph Based analysis selects 13 features, whereas ANOVA retains 23 features. Applying the Graph Based method therefore gives a significant accuracy rate, while ANOVA makes the same contribution with less processing cost and is more powerful than SFS.