Pairwise Fuzzy Ordered Weighted Average Algorithm-Gaussian Mixture Model for Feature Reduction

Feature reduction is a form of dimensionality reduction of the feature space. A number of approaches are used to identify significant features, but they do not use a weighting approach. Weighting is useful for obtaining the significant features and removing the insignificant and irrelevant ones through the OWA formulation. The aim of this approach is to obtain the significant features and remove the insignificant ones using a pairwise approach. Computing the weights of paired features simultaneously makes it possible to remove insignificant features from the feature space using OWA. The significance of the OWA formulation is that the paired features are identified a priori and their weights sum to 1. An OWA criterion is introduced to obtain the significant features that are useful for improving the clustering accuracy of the GMM.


Introduction
Feature reduction is the challenging problem of finding the significant features for data mining analysis. A number of techniques exist for feature selection, extraction and reduction. Real-world datasets contain insignificant and irrelevant features that are not useful for the data mining process. This problem is addressed using the pairwise feature selection method, which finds the significant features in the feature space. Model-based clustering is one approach to finding significant features in a "one-in-all-out" manner. A paired feature is selected if at least one pair of clusters is separable by that feature; insignificant and irrelevant features, which do not separate any pair of clusters, are removed. The Pairwise Mixture Model is able to account for the significant features in model-based clustering. In this approach, a new pairwise penalty is employed to capture the interdependence between states and between observations. The penalty-based method penalizes the difference between all pairs of cluster centers for each feature and shrinks the centroids of non-separable clusters towards each other. The cluster centroids associated with such observations, along with other insignificant information, are removed from the model. In this context, feature reduction is carried out with the pairwise penalty approach, which finds the significant features for the Gaussian Mixture Model by removing insignificant features from the feature space. Model-based clustering forms Gaussian mixtures of clusters with different shapes and helps to improve the clustering accuracy.
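For reference, the penalty idea sketched above can be written, in the spirit of the pairwise fusion penalty of Guo et al. [6], roughly as the penalized log-likelihood below. The symbols (λ for the penalty weight, πk for the mixing proportions, μkj for the mean of cluster k on feature j) are standard notation, not notation taken from this paper, and the formula is only an illustrative reconstruction.

```latex
% Penalized log-likelihood with a pairwise fusion penalty (sketch).
% When \mu_{kj} = \mu_{k'j} for every pair (k, k'), feature j separates no
% pair of clusters and can be removed from the model.
\max_{\pi,\mu,\Sigma}\;
  \sum_{n=1}^{N}\log\Big(\sum_{k=1}^{K}\pi_k\,\phi(y_n;\mu_k,\Sigma_k)\Big)
  \;-\;\lambda\sum_{j=1}^{p}\sum_{1\le k<k'\le K}\bigl|\mu_{kj}-\mu_{k'j}\bigr|
```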

Motivation
Feature selection in clustering, also known as subspace clustering, is an open issue. Friedman et al. [1] proposed a technique for finding subsets of features using hierarchical clustering, which uncovers the cluster structures. Tadesse et al. [2] introduced a Bayesian approach to feature reduction that searches over models to find different clusters and subsets of features.
Raftery et al. [3] also employed a regularization approach for identifying the relevant features and removing the irrelevant and insignificant ones, determining which features are included in or excluded from the model.
Pan et al. [4] proposed a penalized Gaussian Mixture Model that imposes a penalty on the cluster means. For an insignificant feature, the means of all clusters shrink to zero, and the method thereby eradicates insignificant features. In this approach, the Bayesian Information Criterion (BIC) was used to identify the insignificant features in the feature space. The drawback of this approach was that BIC identified the number of non-zero components by maximum likelihood, and some insignificant features were also clustered as non-zero components. The model is therefore less accurate because insignificant features remain in the winning mixture component.
Wang et al. [5] introduced two methods for removing insignificant features from the feature space: the Adaptive L∞-norm Penalized Gaussian Mixture Model (ALP-GMM) and the Adaptive Hierarchically Penalized Gaussian Mixture Model (AHP-GMM). In these methods, a feature that is significant for clustering but has a small weight is lightly penalized, whereas a feature that is insignificant for clustering but has a large weight is heavily penalized. Some insignificant features were still selected by the GMM, which affects the accuracy of the model. Jian Guo et al. [6] proposed a new feature reduction method for obtaining significant features from the feature space. A pairwise fusion penalty was introduced to differentiate the pairs of clusters for each feature and shrink the centroids of non-separable clusters towards each other. When all clusters associated with a feature are fused, that feature is insignificant and is removed from the GMM. This method does not identify insignificant features a priori, so it is not well suited to identifying the significant features a priori, and the accuracy of the model is lower.
Sen et al. [7] proposed a convex method that penalizes the pairwise L∞-norm of regression/classification coefficients to obtain pairwise features from the feature space by simultaneous feature selection. In that study, the analysis was made with synthetic and real-world datasets, and some of the significant features were not recovered from the feature space by the convex approach.

Pairwise Mixture Model
Let y = {y1, y2, …, yN} be a set of N observed data points (yn ∈ ℝ) and let x = {x1, x2, …, xN} be the classification of the y-data into a finite set of classes (Ω = {1, 2, …, K}). In the classic probabilistic model, the data yn are realizations of mutually independent random variables (Yn) with the same mixture distribution
P(yn) = Σ_{k=1}^{K} πk f(yn | θk),  with πk ≥ 0 and Σ_{k=1}^{K} πk = 1.    (2)

Here, yn is used to estimate one classification xn only. The pairwise mixture is the product of two independent mixture models, one for each feature of the selected pair:

P(yn) = [ Σ_{k=1}^{K} πk f(yn(1) | θk) ] · [ Σ_{l=1}^{K} π′l f(yn(2) | θ′l) ],

where yn(1) and yn(2) denote the two features of the pair.
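To make the model concrete, a minimal numerical sketch is given below. It assumes, purely for illustration, that each member of a feature pair is modelled by its own one-dimensional Gaussian mixture and that the pairwise density is their product; the function names and parameter values are hypothetical, not taken from this paper.

```python
from scipy.stats import norm

def mixture_pdf(y, weights, means, stds):
    """One-dimensional Gaussian mixture density: sum_k pi_k * N(y; mu_k, sigma_k)."""
    return sum(w * norm.pdf(y, loc=m, scale=s) for w, m, s in zip(weights, means, stds))

def pairwise_mixture_pdf(y1, y2, params1, params2):
    """Pairwise density modelled as the product of two independent 1-D mixtures."""
    return mixture_pdf(y1, *params1) * mixture_pdf(y2, *params2)

# Hypothetical two-component parameters (weights, means, stds) for each feature of a pair.
params_f1 = ([0.4, 0.6], [0.0, 3.0], [1.0, 0.8])
params_f2 = ([0.5, 0.5], [-1.0, 2.0], [0.7, 1.2])

print(pairwise_mixture_pdf(0.5, 1.0, params_f1, params_f2))
```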

Ordered Weighted Average (OWA)
OWA aggregation orders the arguments and weights them according to their relevance. It differs from other weighting approaches in that maximum, minimum and average aggregates can all be obtained by choosing the weights appropriately. The main feature of this operator is its decision-making capability: it recognizes patterns and decides whether they are relevant or not, spanning the range between the maximum, the arithmetic mean and the minimum of the values [8,10,11].
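As a minimal illustration of the standard OWA operator and the orness measure used as the selection criterion in the later experiments, the sketch below follows Yager's usual definitions; the function names and the example weight vectors are illustrative only and are not taken from this paper.

```python
import numpy as np

def owa(values, weights):
    """OWA aggregation: the weights are applied to the values sorted in descending order."""
    ordered = np.sort(np.asarray(values, dtype=float))[::-1]
    w = np.asarray(weights, dtype=float)
    assert np.isclose(w.sum(), 1.0), "OWA weights must sum to 1"
    return float(np.dot(w, ordered))

def orness(weights):
    """Orness of a weight vector: 1 for the maximum, 0.5 for the mean, 0 for the minimum."""
    w = np.asarray(weights, dtype=float)
    n = len(w)
    return float(sum((n - j - 1) * w[j] for j in range(n)) / (n - 1))

values = [0.7, 0.2, 0.9]
print(owa(values, [1.0, 0.0, 0.0]), orness([1.0, 0.0, 0.0]))  # maximum, orness = 1
print(owa(values, [1/3, 1/3, 1/3]), orness([1/3, 1/3, 1/3]))  # mean, orness = 0.5
print(owa(values, [0.0, 0.0, 1.0]), orness([0.0, 0.0, 1.0]))  # minimum, orness = 0
```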

PFOWA-GMM algorithm
The algorithm for the proposed PFOWA is given below.
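A condensed, runnable sketch of the overall flow described in this paper (OWA-based weighting of feature pairs, orness/threshold-driven selection, then a Gaussian mixture fit on the retained features) follows. It is an interpretation under stated assumptions, not the authors' exact procedure: the variance-based relevance proxy, the median cutoff and the use of scikit-learn's GaussianMixture in place of RPEM are all assumptions made for illustration.

```python
import numpy as np
from itertools import combinations
from sklearn.mixture import GaussianMixture

def owa(values, weights):
    """OWA aggregation: the weights are applied to the values sorted in descending order."""
    return float(np.dot(weights, np.sort(values)[::-1]))

def pfowa_gmm(X, n_components=2, owa_weights=(0.6, 0.4)):
    """Sketch of a PFOWA-style pipeline:
    1. give every feature a relevance score (variance is used as a crude proxy here),
    2. aggregate each feature pair's scores with an OWA weight vector that sums to 1,
    3. keep the pairs that score above the median aggregate,
    4. fit a Gaussian mixture on the union of the retained features."""
    scores = X.var(axis=0)
    pair_scores = {(i, j): owa(scores[[i, j]], owa_weights)
                   for i, j in combinations(range(X.shape[1]), 2)}
    cutoff = np.median(list(pair_scores.values()))
    selected = sorted({f for pair, s in pair_scores.items() if s > cutoff for f in pair})
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(X[:, selected])
    return selected, gmm

# Toy usage with random data; the paper's experiments use the Wine, Ionosphere,
# Wdbc and Sonar datasets and evaluate the selected pairs with RPEM.
X = np.random.default_rng(0).normal(size=(100, 6))
features, model = pfowa_gmm(X)
print(features, model.converged_)
```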

Analysis with Wine Dataset
The Wine dataset contains 178 data points with 13 features, which are used in this experiment. The dataset is divided into a training and a test set for checking the accuracy of the selected features. The orness criterion is used to select the pairwise features, and the selected pairs are used to form the Gaussian mixture.
Table 1 lists the Wine model parameters used to draw the mixtures. Table 2 shows the results for the Wine dataset based on the model and on sampling. The accuracy of PFOWA is compared with FWOWA, FOWA and IRRFS-RPEM, and its model and sampling error rate index values are shown in Figures 3 and 4.
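For readers who want to reproduce the general protocol (train/test split, a GMM fitted on a selected feature pair, and an error rate on the held-out data), a minimal sketch using scikit-learn's Wine loader and GaussianMixture is shown below. The chosen feature pair and the component-to-class mapping are illustrative assumptions, and RPEM itself is not reproduced here.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

# Load the Wine data (178 samples, 13 features, 3 classes) and split it.
X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Hypothetical selected feature pair; the paper selects pairs via the orness criterion.
pair = [0, 6]
gmm = GaussianMixture(n_components=3, random_state=0).fit(X_tr[:, pair])

# Map each mixture component to the majority training class it captures.
comp_tr = gmm.predict(X_tr[:, pair])
mapping = {c: np.bincount(y_tr[comp_tr == c]).argmax() for c in range(3)}

# Error rate on the held-out test set (a stand-in for the paper's error rate index).
pred = np.array([mapping[c] for c in gmm.predict(X_te[:, pair])])
print("test error rate:", np.mean(pred != y_te))
```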

Analysis with Ionosphere Dataset
In this experiment, the Ionosphere dataset, which has 351 data points with 32 features, is used for finding the pairwise feature reduct. Each subset is obtained using the orness criterion, and the selected pairwise features are used to construct the GMM. The selected subsets are evaluated, and their accuracy is tested with the RPEM algorithm. In the comparison plot, the algorithms are indexed along the x-axis and the y-axis denotes the mean and standard deviation values; the accuracy of FOWA(2) is the best compared with IRRFS-RPEM(1).

Analysis with Wdbc Dataset
In the Wdbc dataset, 569 data points with two classes are used in this experiment. The pairwise model and sampling are formed to build the Gaussian mixture. The clusters are formed from the pairwise features using the orness criterion. The pairwise features obtained from the experiment are tabulated in Table 5. The orness criterion value is the measure used to find the pairwise mixture, and the weights of each pair sum to 1. Highly significant and relevant features are obtained, and insignificant features are eradicated using the OWA formulation with the joint probability. The mean vectors are estimated for each pairwise mixture, and the pairwise mixtures are used to form the Gaussian mixture shown in Figure 9, whose data points correspond to highly significant and relevant features. Table 6 shows the accuracy values of the different algorithms IRRFS-RPEM, FOWA, FWOWA and PFOWA. The accuracy of PFOWA is lower than that of FWOWA but higher than that of FOWA, owing to the pairwise approach.

Analysis with Sonar Dataset
In the Sonar dataset, 1000 data points with two classes are used in this experiment. The pairwise features obtained from the feature space are the significant and relevant features. These feature pairs are tested and estimated by RPEM (Table 7). The Sonar pairwise mixture model is shown in Figure 13. Various pairwise features are selected for building the Gaussian Mixture Model. In this model, the blue color denotes the data points selected for the pairwise mixture in the winning component. The mixture components contain the highly significant and relevant feature pairs, which are selected using the penalty. The posterior probability is represented by the color bar in Figure 13, and the mixture density values lie between 0 and 3.5.
Table 8 lists the various algorithms used to test the Sonar dataset, with their model and sampling error rate index values (see also Figure 14). Each algorithm has its own specification, and their working principles differ. In this experiment, the PFOWA error rate index value is higher than that of FWOWA but lower than those of FOWA and IRRFS-RPEM. Table 9 shows the proportions of features selected by IRRFS-RPEM and FOWA on the real-world datasets. The relevant features are selected, but their clustering accuracy is lower than that of FWOWA. For the Wine, Ionosphere, Wdbc and Sonar datasets, the accuracy decreases slightly compared with FWOWA. PFOWA finds the significant and relevant features in the datasets and its predictive power is high compared with FOWA, while its predictive accuracy is lower by a mean of 1.03% compared with FWOWA.

Conclusion
Feature reduction techniques produce a representation of the dataset that is much smaller in volume in the feature space. The objective of this study is achieved through the Pairwise Fuzzy Ordered Weighted Average approach, which identifies the significant and relevant features in the feature space by removing the insignificant features. The strength of this approach is that the OWA operator and the penalty method are combined to obtain the most significant and relevant features in real-world datasets.
The low-weighted features are identified by the algorithm, and the penalty is applied to enhance the clustering accuracy. In this penalty-based method, the prior and posterior probabilities are incorporated to penalize the low-weighted features in the feature reduction process. Using RPEM within the PFOWA algorithm is an added advantage for estimating the parameter values and evaluating the features when constructing the cluster structures and the component mixture. The analysis reveals an improvement in accuracy from employing the penalty and OWA compared with FOWA. The experiments also show that the efficiency of PFOWA is 1.03% lower than that of FWOWA on real-world datasets.