Ontology Mapping using Fuzzy Decision Tree and Formal Concept Analysis

An ontology describes and defines the terms used to represent an area of knowledge. Different people and organizations build their own ontologies, each reflecting its own view of the domain, so for systems to interoperate it becomes necessary to map these heterogeneous ontologies. This paper discusses state-of-the-art methods and outlines a new approach with improved precision and recall. In addition, the proposed system finds relationships other than 1:1.


INTRODUCTION
The current web (WWW) has billions of pages, most of which are in human-readable format only. As a consequence, software agents cannot understand and process this information, and much of the potential of the web has so far remained untapped. Problems with this web include the non-availability of collective information, keyword-based search, irrelevant and excessive results, and semi-structured information representation. In response, researchers have created the vision of the Semantic Web [Berners-Lee et al. 2001], in which data has structure and semantics. Ontologies describe the semantics of the data. The term ontology is borrowed from philosophy, where it refers to a systematic account of what can exist or "be" in the world. In the fields of artificial intelligence and knowledge representation, the term refers to the construction of knowledge models that specify a set of concepts, their attributes, and the relationships between them. An ontology explicitly specifies a domain of knowledge, which makes it possible to access and reason about agent knowledge, to incorporate semantics into data, and to promote its exchange in an explicit and understandable form. An ontology is commonly defined as a "formal explicit specification of a shared conceptualization". To share information and knowledge between heterogeneous systems, a key issue is the mapping of their ontologies for interoperability. Given two ontologies O1 and O2, mapping one onto the other means that for each entity (concept C, relation R, or instance I) in O1, we try to find a corresponding entity with the same intended meaning in O2. Several mapping approaches have been developed so far, but many of them find only 1:1 mappings, and their precision and recall are not up to the mark on real data. In this paper, we outline a mapping method that has improved precision and recall and that finds mapping relations other than 1:1.

RELATED WORK
There have been several approaches to ontology mapping. A few of them are discussed in this section.

YAM++ (Yet Another Matcher)
YAM++ is an automatic, flexible, self-configuring ontology matching system for discovering semantic correspondences between entities. The input ontologies are loaded and parsed by an ontology parser. Entity information in each ontology is indexed by two indexing systems: annotation indexing and structure indexing. Candidates with maximum similarity are then filtered to reduce the search space. Next, a terminological matcher and an instance matcher propose mappings. The results of these matchers are aggregated and passed to a structural matching system to find further mappings. All of these results are then combined and selected by a combiner-and-selector component. Finally, the result is subjected to semantic verification to refine the discovered mappings.

MapSSS
This is an OWL ontology alignment algorithm designed to explore what can be accomplished using simple similarity measures. The input ontologies are treated as directed graphs, with nodes corresponding to concepts, properties, and individuals, and edges corresponding to relationships. The algorithm consists of syntactic, structural, and semantic metrics. These metrics are applied one after another, and a positive result from any one of them is treated as a match.

Similarity measures used
First, we studied the factors that influence the mapping of entities. This study revealed the following heuristics:

- If labels are the same, the entities are likely to be the same.
- If properties are equal, the concepts are likely to be equal.
- If the domain and range of two properties are equal, the properties are likely to be equal.
- If the super-concepts of c1 and c2 are the same, then c1 and c2 are likely to be the same.
- If sub-concepts are the same, their super-concepts are likely to be the same.
- If concepts have similar siblings, they are likely to be similar.
- If super-properties are the same, so are the sub-properties; if sub-properties are the same, so are the super-properties.
- If instances are the same, the concepts are likely to be the same; instances that have the same mother concept are the same.
- If concepts have a similarly low/high fraction of the instances, they are likely to be the same.
- If two instances are linked to another instance via the same property, the two original instances are the same.
- If two properties connect the same two instances, the properties can be similar.
- If the OWL file itself declares equality, such entities are equal.

Figure 1. Components of proposed system

To cover these facts, we used the following similarity measures [Shvaiko et al. 2007] [Resnik 1999].

1) String equality checking [1, 0]: if two labels are equal, their similarity measure is taken as 1, otherwise 0.
Here s and t are the entities to be matched.

3) Levenshtein distance: the minimum number of insertions, deletions, and substitutions of characters required to transform one string into the other.
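As an illustration of this measure, the following is a minimal sketch of the standard dynamic-programming Levenshtein distance; the normalization into [0, 1] via the longer string's length is an assumption for combining it with the other measures, not a formula given in the paper.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to transform s into t (standard DP, two rows)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def levenshtein_similarity(s: str, t: str) -> float:
    # Assumed normalization: 1 - distance / length of the longer string.
    if not s and not t:
        return 1.0
    return 1.0 - levenshtein(s, t) / max(len(s), len(t))
```

For example, `levenshtein("kitten", "sitting")` is 3 (two substitutions and one insertion).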

4) Substring similarity: here t is the longest common substring of x and y.
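The measure above can be sketched as follows. The paper defines only t (the longest common substring); the normalization 2|t| / (|x| + |y|) is an assumption, taken from the commonly used substring similarity in the ontology-matching literature.

```python
def longest_common_substring(x: str, y: str) -> str:
    """Naive O(n^3) search for the longest contiguous common substring."""
    best = ""
    for i in range(len(x)):
        for j in range(i + 1, len(x) + 1):
            if j - i > len(best) and x[i:j] in y:
                best = x[i:j]
    return best

def substring_similarity(x: str, y: str) -> float:
    # Assumed form: sim(x, y) = 2*|t| / (|x| + |y|).
    if not x and not y:
        return 1.0
    t = longest_common_substring(x, y)
    return 2 * len(t) / (len(x) + len(y))
```

For example, comparing the labels "author" and "authority" gives t = "author" and a similarity of 12/15 = 0.8.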

5) Cosine similarity: based on the comments attached to each entity, we define a vector per entity and use the cosine formula to find their similarity.

6) Path comparison: enumerate all paths from the root concept to the entities to be matched and then compare the entities according to these paths. Given two hierarchies of strings, their path distance is defined by comparing the strings at corresponding positions along the paths, using a semantic similarity measure based on WordNet.
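The comment-based measure can be sketched as below: each entity's comment is turned into a bag-of-words vector and compared with the standard cosine formula. The whitespace tokenization is a simplification, as the paper does not specify one.

```python
from collections import Counter
import math

def cosine_similarity(comment1: str, comment2: str) -> float:
    """Cosine similarity of bag-of-words vectors built from two comments."""
    v1 = Counter(comment1.lower().split())
    v2 = Counter(comment2.lower().split())
    dot = sum(v1[w] * v2[w] for w in v1)
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0
```

Identical comments score 1.0 and comments sharing no words score 0.0, so the measure fits the same [0, 1] scale as the string measures.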

Fuzzy decision tree construction
The measures listed above are grouped into five distinct groups. An aggregate similarity measure per group is computed as a weighted sum of the similarity measures in the group. Computing the similarity measures results in a table whose columns correspond to the aggregate similarity measures and whose rows correspond to entity pairs. To construct the decision tree, we use a table of manually mapped entity pairs and their aggregate similarity measures, drawn from ontologies of the same domain as the ontologies to be mapped. The decision tree is called fuzzy because we apply a fuzzy membership function to the numerical attributes (the similarity measures). The membership function used is triangular and is explained below. Suppose we have a table of numerical attribute values as shown in Table I; after applying the fuzzy membership function explained in Figure 1, the table becomes Table II.

Let U = {u1, u2, ..., us} be the set of data samples, C = {c1, c2, ..., cn} the set of n similarity measures (condition attributes), and D = {d} the class-label attribute. Suppose the class-label attribute takes m distinct values di (i = 1 to m), and let si be the number of samples of class di in U. The expected information (entropy) needed to classify a given sample is

I(s1, s2, ..., sm) = - sum(i = 1 to m) pi log2 pi

where pi = si / s is the probability that an arbitrary sample belongs to class di. If attribute Ci takes v distinct values, it partitions U into subsets S1, ..., Sv. The expected entropy after branching on Ci is

E(Ci) = sum(j = 1 to v) ((s1j + ... + smj) / s) * I(s1j, ..., smj)

where sij is the number of samples of class di in subset Sj, and the factor (s1j + ... + smj) / s acts as the weight of the jth subset (the number of samples in the subset divided by the total number of samples). The smaller the entropy value, the greater the purity of the subset partition. Thus the attribute that leads to the largest information gain,

Gain(Ci) = I(s1, ..., sm) - E(Ci),

is selected as the branching attribute.

Example: for Sim1, U can be partitioned into three subsets S1, S2 and S3, since Sim1 has three distinct values (v = 3).
For Sim1 = high: s11 = number of samples with Sim1 = high and in class d1 = 3; s21 = number of samples with Sim1 = high and in class d2 = 0.
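The entropy and information-gain computation described above can be sketched as follows. The fuzzified attribute values and class labels here are illustrative, not the paper's actual training table.

```python
from collections import Counter
import math

def entropy(labels) -> float:
    """I(s1, ..., sm) = -sum(pi * log2(pi)) over the class distribution."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(values, labels) -> float:
    """Gain(C) = I(s1..sm) - E(C), where values holds the fuzzified
    attribute value per sample and labels the class label per sample."""
    total = len(labels)
    expected = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        expected += len(subset) / total * entropy(subset)
    return entropy(labels) - expected
```

For instance, if Sim1 perfectly separates "mapped" from "not mapped" samples, its gain equals the full entropy of the class labels, so it is chosen as the branching attribute.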

Formal concept analysis for finding other than 1:1 relationships
To find relationships other than 1:1, we use the method of formal concept analysis. This requires that the input ontologies contain instances. We build a formal context table from each common instance belonging to concepts of the two ontologies, as follows: the rows of the formal context table are the matched instances, and the columns are the concepts of ontology 1 containing each matched instance plus the concepts of ontology 2 containing it. Example: the concept lattice shown in Figure 3 is built for the formal context defined in Table IV by noting the following.
Figure 3. Concept lattice

Attribute 'a' is common to all the instances, so it becomes the root concept. The next common pattern is b, e for instances 1 and 2, so it becomes the next node. Among instances 3 and 4, h is common, so it becomes another node for 3 and 4. The forest is constructed similarly. From the resulting lattice it is clear that Book is the root concept, that Science and Essay are equivalent concepts and sub-concepts of Book, and so on.
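The lattice construction can be sketched by enumerating the formal concepts of a small cross-ontology context. Equal extents indicate equivalent ontology concepts; nested extents indicate sub/super-concept relations. The instance and concept names below are illustrative, not the paper's Table IV.

```python
from itertools import combinations

# Formal context: rows are shared instances, columns are the ontology
# concepts (from both ontologies) to which each instance belongs.
context = {
    "inst1": {"Book", "Science"},
    "inst2": {"Book", "Science", "Essay"},
    "inst3": {"Book", "Novel"},
}

def common_attributes(instances):
    """Attributes shared by every instance in the given extent."""
    sets = [context[i] for i in instances]
    return set.intersection(*sets) if sets else set()

def formal_concepts():
    """All (extent, intent) pairs closed under the derivation operators."""
    concepts = set()
    insts = sorted(context)
    for r in range(1, len(insts) + 1):
        for extent in combinations(insts, r):
            intent = common_attributes(extent)
            # Close the extent: every instance having all attrs in intent.
            closed = tuple(i for i in insts if intent <= context[i])
            concepts.add((closed, frozenset(intent)))
    return concepts
```

Here the concept with intent {Book} covers all instances (Book is the root), while {Book, Science} covers only inst1 and inst2, making Science a sub-concept of Book.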

Result evaluation
The goal of the benchmark data set is to provide a stable and detailed picture of each algorithm. For that purpose, algorithms are run on systematically generated test cases. The systematic benchmark test set is built around a seed ontology and many variations of it. The variations are artificially generated and focus on characterizing the behavior of the tools rather than having them compete on real-life problems. They are organized in three groups:

- Simple tests (1xx), such as comparing the reference ontology with itself;
- Systematic tests (2xx), obtained by discarding or modifying features of the reference ontology; the considered features are names of entities, comments, the specialization hierarchy, instances, properties, and classes;
- Real-life ontologies (3xx) found on the web.
The results of the approach on the benchmark data sets of OAEI 2011 are shown in Table V. The approach is also tested on the benchmark data sets of 2012. The seed ontology concerns bibliographic references and is freely inspired by BibTeX. It contains 33 named classes, 24 object properties, 40 data properties, 56 named individuals, and 20 anonymous individuals. The finance data set is about a finance ontology, which contains 322 classes, 247 object properties, 64 data properties, and 1113 named individuals. Among recent ontology mapping methods, the results of MapSSS, YAM++, and AROMA are compared with the proposed method, and the proposed method gave good results with respect to them. Table VI shows the resulting precision and recall; a plot of the same is shown in Figure 4. Precision and recall have improved, owing to the consideration of both the instance and the metadata measures present in the ontologies.
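The evaluation above uses the standard alignment metrics, which can be sketched as follows: precision is the fraction of found correspondences that are correct, recall the fraction of reference correspondences that were found, and F-measure their harmonic mean. The alignment pairs are illustrative.

```python
def precision_recall(found: set, reference: set):
    """Standard alignment evaluation: alignments are sets of
    (entity_from_O1, entity_from_O2) pairs."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```

For example, an alignment that finds two correspondences of which one is in the two-pair reference alignment scores precision 0.5, recall 0.5, and F-measure 0.5.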

CONCLUSION
In this paper we have outlined a method to map ontologies using a fuzzy decision tree approach. The proposed method gives good precision and recall compared with the leading systems in this area. Future work is to compute precision and recall for the other data sets of the OM-2012 workshop.