Reducing the Impurity of an Object-Oriented Database Through the Gini Index

Databases are currently growing in size because of audio and video files. Irregularities arise when data are duplicated in many places, so the database needs to be restructured. The present work deals with reducing impurity through the well-known Gini index technique. Since many software applications use object-oriented databases, an object-oriented database is considered: a real object-oriented database for an Electricity Bill Deposit System. A sample of 15 records is used; however, the technique can also be applied to large or even complex databases. A decision tree is constructed, sample queries are performed to verify the result, and the Gini index is computed to minimize the impurity in the presented object-oriented database.


INTRODUCTION
Classification rules describe predefined data in the form of groups and classes. A real case study of a Customer Deposit Electrical Bill System (CDEBS) using the Gini index and gain ratio is considered. Classification rules are useful in data mining because they arrange data group-wise. UML provides a graphical representation of any database problem. Classification rules have been successfully used to propose a new system for customers' electricity bills. The results establish the relationship between a customer's income and the bill payments received. It is observed that the performance of the object-oriented database is higher than that of the relational database. If a customer's income is low and the billed amount is very high, the probability that the customer does not deposit the bill is high, and vice versa. In the present work, a decision tree construction technique is also used; it forms a tree-like structure in which each branch connects the nodes involved in a decision process.
Watanabe [1] described object-oriented query languages, which are more complex than relational query languages. The author showed that object-oriented databases deal with complex objects and object-specific methods, and proposed a formal model of object-oriented databases together with a query language based on that model. Karlapalem and Vieweg [2] described object-oriented database systems, which are becoming popular and are used in a large number of application domains. Khoshgoftaar [3] described decision trees as attractive for the software quality classification problem, in which the quality of a program module is predicted in terms of risk-based classes. Yin et al. [4] stated that multi-relational classification is the procedure of building a classifier based on information stored in multiple relations and making predictions with it. Existing Inductive Logic Programming (ILP) approaches have proven effective, with high accuracy, in multi-relational classification. Alsaadi [5] described the class diagram as the most important diagrammatic representation of object-oriented software systems, covering both static and behavioral aspects. It can serve at the same time as a pattern for a persistent collection of objects, as a schema for a database system, and as a set of communication diagrams.
Ali et al. [6] described the Unified Modelling Language (UML) as the most widely known and used notation for object-oriented analysis and design. UML consists of various graphical notations that capture static system structures, system component behaviour and system component interactions. UML notations can be produced with CASE (Computer-Aided Software Engineering) tools such as Rational Rose. Kwak and Moon [7] described Query Graph (QG) as an easy-to-use visual query language that facilitates query formulation. Unlike in relational databases, in object-oriented databases (OODBs) the basic entity of QG, i.e. a class, may consist of several entities to which the operations of a query actually apply, which increases query complexity and reduces expressiveness. They therefore proposed a visual query language, Object Query Diagram (OQD), for OODBs, in which a class is specialized into a number of object sets that are the primitive entities of designation. Park et al. [8] proposed a new, complete Gini-index text feature selection algorithm for text classification. The algorithm obtains unbiased feature values from the feature subsets, eliminates many irrelevant and redundant features, and retains many representative features. They also compared the new algorithm with the original versions and demonstrated its classification performance. Zhongyang et al. [9] proposed that the decision tree-based classification method is better than other traditional statistical classification methods, since it can deal with noise and missing information and does not require the data to follow a normal distribution.
It has been shown that the decision tree-based classification method has obvious advantages, such as exact classification, efficiency, definite classification criteria, an intuitive classification structure, controllable classification precision and automated classification. Tsang et al. [10] noted that traditional decision tree classifiers work with data values that are known and precise, and extended such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process; example sources include measurement/quantization errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by a single value but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives, they found that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item is utilized. One Versus All (OVA) decision trees learn k individual binary classifiers, each of which distinguishes the instances of a single class from the instances of all other classes. OVA thus differs from most existing data stream classification schemes, which use multiclass classifiers that each discriminate among all classes. Basheer et al. [11] noted that the goal of data mining is to discover knowledge from huge volumes of data. Rule mining is one of the most beneficial and widely used mining methods for obtaining valuable knowledge from data stored in a database.

Classification of Object-Oriented Database
The main objective of an object-oriented database system is to provide encapsulation, abstraction, polymorphism and data hiding, so that the real-world environment can be implemented in the data storage structure. Classes are defined and then accessed through objects created from them. A class can be reorganized without affecting its use in any application. Classification rules describe a predetermined set of data and classes. In the present work, the data are stored in table format with the fields Id, Name, Unit, Amount, and Decision. Classification rules are applied to table elecbill.t1 according to the prevailing (standard) conditions. The classification process divides the compiled data into two parts: training data and test data. The training data are analyzed by the classification algorithm; the class labels consist of attributes such as Name, Unit, Amount and Decision, and these take the form of classification rules. The test data, on the other hand, are used to estimate the accuracy of the classification rules. When the accuracy of the rules is acceptable, they can also be applied to classify new data added to the database. A bill showing high usage may indicate a risky decision in bill submission, and vice versa. A table is created in an object-oriented database in SQL Server 2008, which supports object-oriented database features. The object-oriented database query is performed as follows: the user first creates the database and then creates the table. The output is recorded in Table 1.

September 30, 2014
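The train/test procedure described above can be sketched as follows. This is a minimal Python sketch under stated assumptions: the records are hypothetical (they are not Table 1), the Decision labels "risky"/"safe" and the single-threshold rule on Unit are illustrative choices, and the function names are hypothetical. The training set is used to choose the rule's threshold, and the held-out test set estimates the rule's accuracy.

```python
import random

# Hypothetical records (Id, Name, Unit, Amount, Decision).
# Illustrative values only; these are NOT the records of Table 1.
RECORDS = [
    (1, "C1", 80, 360.0, "safe"),
    (2, "C2", 150, 675.0, "safe"),
    (3, "C3", 420, 1890.0, "risky"),
    (4, "C4", 90, 405.0, "safe"),
    (5, "C5", 380, 1710.0, "risky"),
    (6, "C6", 210, 945.0, "safe"),
    (7, "C7", 500, 2250.0, "risky"),
    (8, "C8", 330, 1485.0, "safe"),
    (9, "C9", 60, 270.0, "safe"),
    (10, "C10", 460, 2070.0, "risky"),
]


def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split them into training and test sets."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def classify(unit, threshold):
    """Hypothetical rule: usage above the threshold indicates a risky bill submission."""
    return "risky" if unit > threshold else "safe"


def accuracy(records, threshold):
    """Fraction of records whose Decision label the rule predicts correctly."""
    correct = sum(classify(r[2], threshold) == r[4] for r in records)
    return correct / len(records)


def learn_threshold(training):
    """Pick the Unit threshold (from the training values) with the best training accuracy."""
    candidates = sorted({r[2] for r in training})
    return max(candidates, key=lambda t: accuracy(training, t))


train, test = train_test_split(RECORDS)
threshold = learn_threshold(train)
print(threshold, accuracy(test, threshold))
```

The rule is deliberately simple so that the two roles of the data stay visible: only `train` influences the threshold, and only `test` is used to judge it.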

UML Class Diagram
A UML class diagram provides a graphical representation of any database problem. UML is a standard visual modeling language and a very useful technology in the software field. Each class in the diagram is divided into three parts: the first shows the class name, the second the attribute names, and the third the operations. Fig. 1 shows the one-to-one relationship in the UML class diagram, in which the customer is linked to the Electical_office. Fig. 2 shows the associations of the customer who deposits the electricity bill at the electrical office. Having defined these classes, we now compute the Gini index to reduce the impurity in the object-oriented database.
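The three compartments of a UML class (name, attributes, operations) and the link between a customer and the electrical office can be mirrored directly in code. The following is a minimal Python sketch, not a reproduction of Fig. 1 or Fig. 2: the class names, the attributes (borrowed from fields mentioned in this paper such as Cust_Id, Unit and Amount) and the operations `deposit_bill` and `receive_payment` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ElectricalOffice:
    # Attributes compartment (hypothetical names)
    office_id: int
    name: str
    # Payments received so far, keyed by customer id
    payments: dict[int, float] = field(default_factory=dict)

    # Operations compartment
    def receive_payment(self, cust_id: int, amount: float) -> None:
        """Record a bill payment made by a customer."""
        self.payments[cust_id] = self.payments.get(cust_id, 0.0) + amount


@dataclass
class Customer:
    # Attributes compartment (hypothetical names)
    cust_id: int              # primary key supplied by the electrical office
    name: str
    unit: int                 # units of electricity consumed
    amount: float             # billed amount
    office: ElectricalOffice  # association: each customer is linked to one office

    # Operations compartment
    def deposit_bill(self) -> None:
        """Deposit the billed amount at the associated electrical office."""
        self.office.receive_payment(self.cust_id, self.amount)


office = ElectricalOffice(1, "City office")
customer = Customer(101, "Customer A", unit=120, amount=540.0, office=office)
customer.deposit_bill()
print(office.payments)  # {101: 540.0}
```

Holding the office as an attribute of the customer is one way to express the association; a many-customer view would let the office keep a collection of customers instead.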

Impurity of Object-Oriented Database
A large amount of data can be stored in object-oriented form, and it may contain a lot of useless data. Reducing impurity removes the useless data from the stored object-oriented database. Impurity is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it were randomly labeled according to the distribution of labels in the subset. Table 1 stores the data fields ID, Income, Age, Deposit_bill, Deposit_decision and Deposit_ways. Some fields, namely Age and Income, are not required for the customer's electricity bill deposit. The necessary data are stored in the object-oriented database, and impurity reduction removes the useless data from it. When impurity reduction is applied, the problem of data redundancy arises; removing redundancy eliminates duplicate data from the object-oriented database. The electrical office provides the Cust_Id, which is the primary key of the object-oriented database table. The data partition is defined by the following formula.
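The impurity measure described above can be computed as in the minimal Python sketch below, assuming the standard Gini index of a partition D, Gini(D) = 1 − Σ p_i², where p_i is the fraction of records in D with class label i, and the weighted index of a split of D into D_1, …, D_k, Gini_split(D) = Σ (|D_j| / |D|) · Gini(D_j). The labels are hypothetical Deposit_decision values for 15 records, not the values of Table 1.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a partition: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0  # an empty partition contributes no impurity
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(partitions):
    """Weighted Gini index of a split: sum of |D_j|/|D| * Gini(D_j)."""
    total = sum(len(p) for p in partitions)
    return sum(len(p) / total * gini(p) for p in partitions)


# Hypothetical Deposit_decision labels for 15 records (not Table 1).
D = ["yes"] * 9 + ["no"] * 6
# Hypothetical split of the same 15 records by a two-valued attribute.
D_low = ["no"] * 5 + ["yes"] * 1
D_high = ["yes"] * 8 + ["no"] * 1

print(round(gini(D), 4))                      # 0.48   (impurity before the split)
print(round(gini_split([D_low, D_high]), 4))  # 0.2296 (impurity after the split)
```

A pure partition (all labels equal) has Gini index 0, so the attribute whose split yields the lowest weighted index removes the most impurity.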

Decision Tree
A decision tree is similar to a flow chart but has a tree-like structure. It is a graphical representation of a decision process. A decision tree consists of a root node, internal nodes and leaf nodes. An internal node is also called a non-leaf node, and a leaf node is also called a terminal node. The topmost node in the tree is the root node. Internal nodes are drawn as rectangles and leaf nodes as ovals. The principal advantage of a decision tree is that it can handle multidimensional data. A decision tree for the data considered here is shown in Fig. 3. In this diagram, customer is the root node, whereas unit and income are internal nodes; low and high are non-leaf nodes. The diagram shows that whether the customer deposits the electricity bill depends on the units consumed and the customer's income. Low income coupled with high units is the typical condition in which the customer does not deposit the bill.
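A tree of the kind shown in Fig. 3 can be grown automatically by repeatedly splitting on the attribute whose split has the lowest weighted Gini index. The following is a minimal Python sketch of a simplified, multiway-split variant of Gini-based tree induction; the rows are hypothetical categorical Income and Unit levels (not the records of Table 1), and the function names are illustrative.

```python
from collections import Counter

# Hypothetical training rows (NOT Table 1): Income and Unit levels plus the
# observed deposit decision for each customer.
ROWS = [
    {"income": "low", "unit": "high", "deposit": "no"},
    {"income": "low", "unit": "high", "deposit": "no"},
    {"income": "low", "unit": "low", "deposit": "yes"},
    {"income": "high", "unit": "high", "deposit": "yes"},
    {"income": "high", "unit": "low", "deposit": "yes"},
    {"income": "low", "unit": "low", "deposit": "yes"},
    {"income": "high", "unit": "low", "deposit": "yes"},
]


def gini(rows, target):
    """Gini impurity of the target labels in a non-empty set of rows."""
    n = len(rows)
    counts = Counter(r[target] for r in rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def partition(rows, attr):
    """Group rows by their value of one categorical attribute."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r)
    return groups


def weighted_gini(rows, attr, target):
    """Weighted Gini index after splitting rows on attr (one branch per value)."""
    n = len(rows)
    return sum(len(g) / n * gini(g, target) for g in partition(rows, attr).values())


def build_tree(rows, attrs, target="deposit"):
    """Recursively split on the attribute with the lowest weighted Gini index."""
    if gini(rows, target) == 0.0 or not attrs:
        # Leaf node: pure partition or no attributes left; use the majority label.
        return Counter(r[target] for r in rows).most_common(1)[0][0]
    best = min(attrs, key=lambda a: weighted_gini(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {value: build_tree(group, rest, target)
                   for value, group in partition(rows, best).items()}}


def classify(tree, row):
    """Follow the branches matching the row's attribute values down to a leaf."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[row[attr]]  # raises KeyError for a value unseen in training
    return tree


TREE = build_tree(ROWS, ["income", "unit"])
print(TREE)
# {'unit': {'high': {'income': {'low': 'no', 'high': 'yes'}}, 'low': 'yes'}}
print(classify(TREE, {"income": "low", "unit": "high"}))  # no
```

With these hypothetical rows, Unit gives the lower weighted Gini index at the root, and the low-income, high-unit branch ends in the "no deposit" leaf, matching the condition described for Fig. 3.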

Conclusions
From the above work, it is concluded that UML class diagrams can be used to represent an object-oriented database. Entropy and the Gini index are computed to minimize the impurity in the object-oriented database. As the database grows, its impurity tends to grow as well, and the above technique can be used to reduce it. A graphical representation of the object-oriented database is also presented through the decision tree. Since UML is independent of any programming language, the above work can also be implemented in any higher-level object-oriented programming language.