Semantic data mapping technology to solve semantic data problem on heterogeneity aspect

a Faculty of Computing and Information Technology, King Abdul Aziz University, Rabigh, Saudi Arabia
b Faculty of Computer Science and Information Technology, Mulawarman University, Indonesia
c Faculty of Computing, Universiti Teknologi Malaysia, Malaysia
1 ayunianta@kau.edu.sa *; obarukab@gmail.com; 3 norazah64@gmail.com; 4 ndengen@gmail.com; 5 haviluddin@gmail.com; 6 shahizan@utm.my
* corresponding author


I. Introduction
Applications and systems are being developed rapidly to facilitate digital and online activities [1][2][3][4][5]. Every application is developed for a specific purpose, determined by the functions and features it includes [6]. This diversity of applications, built with different programming languages, application and data architectures, database systems and representations of data and information, leads to the heterogeneity problem [5][7][8][9][10][11][12]. Among the many aspects of heterogeneity, the semantic heterogeneity of data representation has become an increasingly pressing challenge [13,14].
One of the best ways to address data heterogeneity in its semantic aspect is a semantic data mapping process [10][19][20][21][22]. The main step in this process is to derive the representation of the data format from the data sources and transform it into an XML data format from a semantic perspective [23][24][25]. Semantic data mapping is also an important step in implementing data integration technology [26]. It is a standardization and mapping process that produces uniformity among data with varied representations, heterogeneous formats and different semantics across applications and data sources [27][28][29]. Many semantic data mapping technologies and tools are now available, and this research compares and analyzes them against several criteria [23,24][30][31][32][33][34][35][36][37][38][39][40][41][42][43][44][45][46][47][48][49].
This paper is organized in three parts. The first part compares and analyzes existing semantic data mapping tools and identifies the tool best suited to this research. The second part gives a detailed overview of the selected tool. The third part implements the selected tool on a specific application as a case study.

II. Method
Semantic data mapping technology covers both the techniques and the tools that implement them. Its aim is a standardization and mapping process that produces uniformity among data with varied representations, heterogeneous formats and different semantics across applications and data sources. Many techniques and tools for semantic data mapping have been proposed in recent years [23,24][30][31][32][33][34][35][36][37][38][39][40][41][42][43][44][45][46][47][48][49]. This section proceeds in two steps. The first step compares existing semantic data mapping tools and technologies. The second step analyzes which tool is most suitable for the implementation described in the next section.

A. Comparison of Semantic Data Mapping Technology
1) Virtuoso RDF Views
Virtuoso RDF Views [44] extracts data through ODBC/JDBC and exposes the results as a DAV repository and through SOAP and WS-* protocol endpoints. Virtuoso also supports SPARQL embedded in SQL queries, an approach comparable to the Oracle RDF_MATCH function. The tool is available in a free open-source version and a commercial version. The licensed version can combine a hybrid database engine with RDF triple storage. The tool offers a graphical user interface for declaring mappings. However, this feature works only with the Virtuoso database and cannot be used with other relational DBMSs.

2) R2O
R2O aims to extract the information in a database into RDF files or into an ontology expressed in OWL. The extraction produces a standard format from different database representation formats for use in the ontology language. The R2O mapping architecture has three main layers: implementation, formalism and modeling. The implementation layer concerns SQL, the database system and the ontology implementation. The formalism layer handles the relational model and the formal ontology model. The modeling layer concerns the entity-relationship model and the conceptual ontology model [30,36].

3) ODEMapster
ODEMapster maps the instances in a relational database to semantic web instances according to the descriptions in an R2O document. ODEMapster has two main execution modes: a query-driven upgrade (on-the-fly query translation) and a massive upgrade, a batch process that generates all semantic web individuals from the data sources [36,50]. The main difference between the two modes lies in the semantic repository. In the first mode, clients access the data source directly through a query processor. In the second mode, clients access the data through a semantic repository that is generated from the data source.

4) Dartgrid
Dartgrid is a semantic query system that supports building large-scale ontology-based database virtual organizations (DB-VO) on a grid platform. As a reference implementation of the OntoVO model, Dartgrid has three main technical characteristics [32,33,35]. First, it is developed on Globus 3.0 to construct virtual organizations (VO) in the grid computing research area. Second, it uses RDF, the standard data model of the semantic web, for defining protocols. Third, it uses ontologies that comply with the semantics and syntax of OWL, the standard ontology description language produced by the W3C. Dartgrid helps data providers map the relational schema of a data source onto the shared ontology.

5) RDB2Onto
RDB2Onto addresses the conversion of RDB data into ontology data for users who want to create web content based on semantic web technologies such as OWL files. It is a simplified solution that extracts data from a relational database and produces RDF/OWL XML from a template using SQL queries [38]. RDB2Onto takes two inputs, the relational database and an RDF/OWL SQL query, and processes them in three steps: executing the SQL query, filling the template with data from the RDB, and storing the resulting RDF/OWL data. The output of this process is the ontology data.
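The three steps described for RDB2Onto (execute the SQL query, fill a template with each row, store the resulting RDF/OWL data) can be sketched roughly as follows. This is only an illustration and not RDB2Onto's actual code: the table `course`, its columns, the template text and the namespace `http://example.org/onto#` are assumptions made for the example, and an in-memory SQLite database stands in for the relational source.

```python
import sqlite3
from string import Template
from xml.sax.saxutils import escape

# Hypothetical RDF/XML template with one placeholder per selected column.
INDIVIDUAL_TEMPLATE = Template(
    '  <ex:Course rdf:about="http://example.org/onto#course_$id">\n'
    '    <ex:title>$title</ex:title>\n'
    '  </ex:Course>'
)


def rows_to_rdf(conn, query):
    """Step 1: run the SQL query. Step 2: fill the template once per row."""
    cursor = conn.execute(query)
    columns = [d[0] for d in cursor.description]
    individuals = []
    for row in cursor:
        # Escape XML special characters in every value before substitution.
        values = {c: escape(str(v)) for c, v in zip(columns, row)}
        individuals.append(INDIVIDUAL_TEMPLATE.substitute(values))
    header = (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
        '         xmlns:ex="http://example.org/onto#">'
    )
    return "\n".join([header, *individuals, "</rdf:RDF>"])


def demo():
    """Build a toy database (assumed schema) and return the RDF/XML text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO course VALUES (?, ?)",
        [(1, "Databases"), (2, "Logic & Proof")],
    )
    # Step 3 would write the returned string to a file or a triple store.
    return rows_to_rdf(conn, "SELECT id, title FROM course ORDER BY id")


if __name__ == "__main__":
    print(demo())
```

The template keeps the ontology structure separate from the query, so the same query result could be rendered into a different ontology by swapping the template.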

6) DB2OWL
DB2OWL generates an OWL-DL ontology from a relational database. Its semantic data mapping process has two main steps [37]. The first step reads and extracts the database schemas in the data sources, including table names, column structures and all constraints. The second step converts this schema information directly into the ontology language; the resulting ontology contains class names, data properties, object properties and semantic relationships.
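A minimal sketch of this two-step idea is given below. It is not DB2OWL's implementation: the conversion rules used here (table to class, foreign-key column to object property, every other column to datatype property) are a deliberate simplification, and the tables `dept` and `course` as well as the base namespace are assumptions for the example. SQLite's catalog pragmas stand in for the schema-reading step.

```python
import sqlite3


def schema_to_owl(conn, base="http://example.org/schema#"):
    """Read table, column and foreign-key metadata and emit OWL as Turtle."""
    lines = [
        f"@prefix : <{base}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    for table in tables:
        lines.append(f":{table} a owl:Class .")
        # Step 1: extract constraints (foreign keys: column -> target table).
        fks = {fk[3]: fk[2] for fk in
               conn.execute(f"PRAGMA foreign_key_list('{table}')")}
        for _cid, col, _type, _notnull, _default, _pk in conn.execute(
                f"PRAGMA table_info('{table}')"):
            # Step 2: foreign keys become object properties,
            # all other columns become datatype properties.
            if col in fks:
                lines.append(
                    f":{table}_{col} a owl:ObjectProperty ; "
                    f"rdfs:domain :{table} ; rdfs:range :{fks[col]} .")
            else:
                lines.append(
                    f":{table}_{col} a owl:DatatypeProperty ; "
                    f"rdfs:domain :{table} .")
    return "\n".join(lines)


def demo():
    """Create a toy schema (assumed) and return the generated ontology text."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE course (
            id INTEGER PRIMARY KEY,
            title TEXT,
            dept_id INTEGER REFERENCES dept(id)
        );
    """)
    return schema_to_owl(conn)


if __name__ == "__main__":
    print(demo())
```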

7) Asio SBRD
The core component of Asio SBRD is the automapper, which extracts data from a relational database [41]. The automapper takes an ontology mapping (OWL) as an additional input and produces two results: a data source ontology (OWL) and the mapping instance data and rules. SBRD itself is the semantic bridge for relational databases, which communicates with the relational database and with the mapping instance data and rules. To retrieve specific results, the tool uses SPARQL queries handled by the semantic query decomposition (SQD) component.

8) Triplify
Triplify is a semantic data mapping tool that extracts linked data from relational databases by mapping HTTP-URI requests onto relational database queries [42]. Triplify sits between the data source and the web server. Its main purpose is to retrieve valuable information from the data source and convert the query results into RDF, linked data and JSON formats. Triplify can be used with any relational database and targets web applications written in PHP.
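The idea of mapping HTTP-URI requests onto database queries can be illustrated with the short sketch below. It is written in Python rather than PHP and does not reproduce Triplify's configuration format: the `QUERIES` table, the base URI, the vocabulary namespace and the `course` table are all assumptions for the example.

```python
import sqlite3

BASE = "http://example.org/"          # hypothetical base URI
VOCAB = "http://example.org/vocab#"   # hypothetical vocabulary namespace

# Hypothetical configuration: URL path segment -> SQL query.
# In this sketch the first selected column identifies the resource.
QUERIES = {
    "course": "SELECT id, title FROM course ORDER BY id",
}


def literal(value):
    """Render a value as a quoted literal with minimal escaping."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def handle_request(conn, path):
    """Map a path such as '/course' or '/course/2' to SQL, then to triples."""
    parts = [p for p in path.strip("/").split("/") if p]
    if not parts or parts[0] not in QUERIES:
        return []
    sql, params = QUERIES[parts[0]], ()
    if len(parts) > 1:
        if not parts[1].isdigit():
            return []
        # Narrow the configured query to a single resource.
        sql = f"SELECT * FROM ({sql}) WHERE id = ?"
        params = (int(parts[1]),)
    cursor = conn.execute(sql, params)
    columns = [d[0] for d in cursor.description]
    triples = []
    for row in cursor:
        subject = f"<{BASE}{parts[0]}/{row[0]}>"
        for col, value in zip(columns[1:], row[1:]):
            triples.append(f"{subject} <{VOCAB}{col}> {literal(value)} .")
    return triples


def demo_connection():
    """Toy database standing in for the web application's data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO course VALUES (?, ?)",
                     [(1, "Databases"), (2, "Logic")])
    return conn


if __name__ == "__main__":
    for triple in handle_request(demo_connection(), "/course"):
        print(triple)
```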

9) METAmorphosis
METAmorphosis has two main layers: a mapping layer and a template layer [31]. It handles two kinds of data in the data sources. The first is data stored as RDF triples, which can be queried directly with RDF query mechanisms. The second is data in a classic relational schema, for which a mapping to the RDF format is defined.

10) Iconomy
Iconomy extracts data from a relational database and synchronizes it with an ontology schema to create a new ontology [40,48]. Its motto is to be a simple and powerful tool for extracting relational data into semantic entities through a user-friendly interface. Its capabilities include consistency checking, decrypting scrambled data, loading any ontology, creating simple SPARQL queries and configuring the built-in reasoner. Iconomy also provides advanced options to create and synchronize an ontology to and from any relational database.

11) D2RQ Platform
D2RQ is one of the most popular semantic data mapping tools, partly because it is free. D2RQ can extract and integrate data from more than one data source, supports Jena and RDF dumps, produces semantic data mapping files in Turtle format, and works over HTTP through the D2R Server [23][24][25]. D2RQ also provides both automatic and manual mapping, so users can customize a semantic data mapping file to align it with other semantic data mapping files.

12) Ultrawrap
Ultrawrap was created in 2009 to synthesize an ontology from the SQL schema of a database system and to answer SPARQL queries [46]. Ultrawrap extracts data from data sources exposed as a triple view or an SQL schema. In 2013, Ultrawrap was enhanced and evaluated using two existing benchmark suites [51]. Ultrawrap has four main components: the translation of the SQL schema, the creation of an intensional triple table, the translation of SPARQL queries, and the native SQL query optimizer.
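The notion of an intensional triple table can be illustrated with an SQL view that presents relational rows as (subject, predicate, object) triples without materializing them. The sketch below is a toy under stated assumptions, not Ultrawrap's actual view definition or query translator: the `course` table, the identifier scheme and the `match` helper are hypothetical, and only a single triple pattern is translated.

```python
import sqlite3


def create_triple_view(conn):
    """Create a toy table and a view that exposes its rows as triples."""
    conn.executescript("""
        CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT, credits INTEGER);
        INSERT INTO course VALUES (1, 'Databases', 3), (2, 'Logic', 4);

        -- The view is intensional: no triples are stored, they are
        -- computed from the base table whenever the view is queried.
        CREATE VIEW tripleview AS
            SELECT 'course/' || id AS s, 'rdf:type' AS p, 'Course' AS o
              FROM course
            UNION ALL
            SELECT 'course/' || id, 'course#title', title FROM course
            UNION ALL
            SELECT 'course/' || id, 'course#credits', CAST(credits AS TEXT)
              FROM course;
    """)


def match(conn, s=None, p=None, o=None):
    """Translate one triple pattern (None = variable) into SQL over the view."""
    conditions, params = [], []
    for column, value in (("s", s), ("p", p), ("o", o)):
        if value is not None:
            conditions.append(f"{column} = ?")
            params.append(value)
    where = (" WHERE " + " AND ".join(conditions)) if conditions else ""
    # The database engine's own optimizer plans the query against the view.
    return conn.execute(
        f"SELECT s, p, o FROM tripleview{where} ORDER BY s, p", params
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_triple_view(connection)
    for triple in match(connection, p="course#title"):
        print(triple)
```

Because the pattern is answered by ordinary SQL over the view, the work of choosing an execution plan is left to the database engine, which is the role the text assigns to the native SQL query optimizer.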

13) Owlifier
Owlifier creates ontology knowledge from spreadsheet data such as Microsoft Excel, Apple Numbers and OpenOffice spreadsheets [52]. Owlifier has four main components for converting spreadsheet data into ontology knowledge. The first is a text file of ontology definitions (blocks) that receives the spreadsheet data. The second is the OQL ontology provider, which converts the spreadsheet data into the ontology language. The third is the ontology reasoner, which facilitates evaluation of the ontology knowledge. The fourth is the OWL import, which generates and imports other ontologies to integrate with the spreadsheet data.

14) RDOTE
RDOTE is a semantic data mapping tool that converts data from multiple relational databases into separate ontologies and integrates them into a single ontology. RDOTE has two main purposes. The first is to quickly instantiate an ontology with real data, which eases implementation with large ontology datasets. The second is to transform datasets currently residing in relational databases into semantic web data through a graphical user interface [53]. In 2013, RDOTE became more complete with the addition of several capabilities and processes, such as an ontology reader, an RDB reader, a mapping process and an ontology writer [49].

B. Analyze The Suitable Semantic Data Mapping Tool
Following the comparison, the next step is to analyze which semantic data mapping tool is most suitable for this research. The fourteen tools are evaluated against five criteria: supported data sources, ease of use, mapping process, licensing (paid or free), and multi-source support. The detailed analysis and comparison of the fourteen tools against the five criteria is given in Table 1. From this comparison and analysis, this research concludes that D2RQ is the most suitable semantic data mapping tool, for five reasons. First, D2RQ supports data mapping from all SQL databases. Second, it is easy to use and requires only a few simple steps. Third, it supports both automatic and manual semantic data mapping, so the person performing the mapping can edit, customize and adjust the mapping to integrate it with other files and sources. Fourth, it is free of charge. Fifth, it supports multiple sources.

III. Results and Discussion
D2RQ can produce four different kinds of result. First, D2RQ provides web access over HTTP through the D2R Server. Second, it provides a triple store through RDF dumps [23][24][25]. Third, it communicates with local Java applications through the Jena/Sesame libraries. Fourth, it produces D2RQ mapping files that can be combined with and used in the ontology language. The D2RQ architecture is shown in Fig. 1.
Fig. 1. D2RQ Architecture [16]
This research aims to produce a D2RQ semantic data mapping file that can be integrated with other D2RQ mapping files and with the ontology language. Semantic data mapping with D2RQ takes two simple steps: the first prepares the data source to be mapped, and the second uses D2RQ to produce the semantic data mapping file. The semantic data mapping file is an RDF document written in Turtle syntax. The mapping is expressed using terms in the D2RQ namespace; a namespace is a domain that guarantees the uniqueness of identifiers. A namespace is written like a uniform resource locator (URL), such as "http://www.semanticmapping.edu/exercise/qbs". The remainder of this section discusses the implementation of the semantic data mapping tool on the Question Bank System (QBS).
A. Prepare the Question Bank System (QBS) Data
The Question Bank System contains information about learning outcomes. It uses Oracle as the database system to implement the data schemas. Learning outcome information is part of the course outline of a subject course. The course outline contains guideline information for the subject course, including course learning outcomes, programme learning outcomes, assessment tasks, student learning time, weekly schedule and grading method. Information about the weekly schedule is stored in the

B. Semantic Data Mapping Result Using D2RQ
Of the four results produced by D2RQ, this research focuses on the semantic data mapping file, which is adjusted and integrated with other semantic data mapping files and then used in the ontology language to carry out semantic data integration. The resulting data mapping file is in Turtle format (.ttl) and contains all information about the data source and the schema of the data inside it. The D2RQ Mapping Language is a declarative language for describing the relationship between a relational database schema and RDFS vocabularies or OWL ontologies. A D2RQ mapping is itself an RDF document written in Turtle syntax. The mapping is expressed using terms in the D2RQ namespace. A namespace is a domain that guarantees the uniqueness of identifiers and is written like a uniform resource locator (URL). The namespace used in this research is "http://www.semanticmapping.edu/exercise/qbs#". This namespace is also used in other data mapping files and in the ontology language as the unique name for this process. The terms in the D2RQ namespace are formally defined in the D2RQ RDF schema (available in Turtle and RDF/XML versions). An example of the D2RQ mapping language is the Turtle file named qbs.ttl. A snippet of qbs.ttl is shown in Fig. 3.

@prefix map: <#> .
@prefix db: <> .
@prefix vocab: <vocab/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
@prefix jdbc: <http://d2rq.org/terms/jdbc/> .
@prefix learning: <http://www.semanticmapping.edu/exercise/qbs#> .

map:database a d2rq:Database;
    d2rq:jdbcDriver "com.mysql.jdbc.Driver";
    d2rq:jdbcDSN "jdbc:oracle:thin:@localhost:1521:qbs";
    d2rq:username "qbs";
    d2rq:password "";
    jdbc:autoReconnect "true";
    jdbc:zeroDateTimeBehavior "convertToNull";
    .

Fig. 3. Snippet Contents of qbs.ttl
An RDF document follows a standard format, beginning with prefix declarations that carry the details of the RDF file. Two prefixes must be identified in every D2RQ mapping file. The first is the d2rq prefix, which contains the address of the D2RQ namespace. The second is the name of the mapping, together with its specific unique address. This information must be declared in the mapping file because it is also used in the other mapping files and in the ontology language. The next part of the mapping file describes the data schema of the database system: the database system name, database name, database driver and DSN, database username, database password and any other specifications. The body of the data mapping file contains all the table names and column names. The table names and field/column names are customized and adjusted against other data mapping files (the manual mapping process) to create a semantic data mapping between data schemas. In this process, every column name that has a semantic conflict is aligned to a suitable column name in a different table, database, database system or location [3].
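To make the body of a mapping file more concrete, the sketch below generates a `d2rq:ClassMap` and its `d2rq:PropertyBridge` entries for one table and applies a manual alignment so that columns with the same meaning share one property. The D2RQ terms used (`d2rq:ClassMap`, `d2rq:dataStorage`, `d2rq:uriPattern`, `d2rq:class`, `d2rq:PropertyBridge`, `d2rq:belongsToClassMap`, `d2rq:property`, `d2rq:column`) belong to the D2RQ mapping language, and the table names come from the QBS schema. The column names (`id`, `description`, `methodDesc`), the class `learning:LearningOutcome`, the property `learning:description` and the helper functions are assumptions made for illustration, not the content of the actual qbs.ttl.

```python
# Hypothetical manual alignment: columns that carry the same meaning under
# different names are bridged to one shared property in the learning: namespace.
ALIGNMENT = {
    ("learningOutcomes", "description"): "learning:description",
    ("assessmentMethod", "methodDesc"): "learning:description",
}


def class_map(table, key, rdf_class):
    """Emit a d2rq:ClassMap that turns each row of `table` into a resource."""
    return (
        f"map:{table} a d2rq:ClassMap;\n"
        f"    d2rq:dataStorage map:database;\n"
        f'    d2rq:uriPattern "{table}/@@{table}.{key}@@";\n'
        f"    d2rq:class {rdf_class};\n"
        f"    ."
    )


def property_bridge(table, column, rdf_property):
    """Emit a d2rq:PropertyBridge that maps one column to one RDF property."""
    return (
        f"map:{table}_{column} a d2rq:PropertyBridge;\n"
        f"    d2rq:belongsToClassMap map:{table};\n"
        f"    d2rq:property {rdf_property};\n"
        f'    d2rq:column "{table}.{column}";\n'
        f"    ."
    )


def build_mapping(table, key, columns, rdf_class):
    """Combine the class map with one property bridge per column."""
    blocks = [class_map(table, key, rdf_class)]
    for column in columns:
        # Fall back to a table-specific term when no alignment is declared.
        rdf_property = ALIGNMENT.get((table, column), f"vocab:{table}_{column}")
        blocks.append(property_bridge(table, column, rdf_property))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(build_mapping("learningOutcomes", "id", ["id", "description"],
                        "learning:LearningOutcome"))
```

Keeping the alignment in a single table makes the manual mapping step explicit: adding a second schema only requires new entries that point its columns at the shared properties.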

IV. Conclusions
Semantic heterogeneity is currently a major issue in data management. Two different applications may hold the same data under different representations in their data sources. This situation needs to be resolved with a semantic data mapping process using a suitable tool or technology. This research compared and analyzed fourteen semantic data mapping technologies against five criteria. The result of the analysis is a recommended semantic data mapping technology that compares favorably with the others. The recommended technology was then implemented using real data from a specific application. The result of this implementation is a semantic data mapping file that can be used to map, share and integrate with other semantic data mapping files, and that can also be integrated with the ontology language. Future research will continue this work by mapping and integrating further semantic data mapping files for use in the ontology language.

Table 1. Analysis and Comparison of Existing Semantic Data Mapping Tools

table weeklySchedule, table weeks and table wsWeeks. Information about learning outcomes, programme learning outcomes and assessment methods is stored in tables learningOutcomes, LOAssessmentMethod and assessmentMethod. Information about student learning time is stored in tables groupLearningTime and learningTime. All these tables are related to tables tbUser, tbDept, fac, tbLevel and subjectCourse. The detailed schema of the Question Bank System (QBS) is shown in Fig. 2.