Data Mapping Process to Handle the Semantic Data Problem in a Student Grading System

Many applications have been developed in the education domain. The information and data of each application are stored in distributed locations, with a different data representation in each database. This leads to heterogeneity at the data integration level, and data heterogeneity may cause many problems. One major issue concerns the semantic relationships between data across applications in the education domain: learning data may have the same name but a different meaning, or different names but the same meaning. This paper discusses a semantic data mapping process to handle this semantic relationship problem in the education domain. The process has two main parts. The first part is the semantic data mapping engine, which produces a mapping file in Turtle (.ttl) format, a standard RDF serialization, that can be used by local Java applications through the Jena library and by a triple store. The Turtle file contains detailed information about the data schema of each application's database. The second part provides a D2R Server that can be accessed from outside environments over HTTP using SPARQL clients, Linked Data clients (RDF formats), and HTML browsers. To implement the semantic data mapping process, this paper focuses on the student grading system in the learning environment of the education domain. Following the proposed process, a Turtle file is produced as the result of the first part. Finally, this file can be combined and integrated with other Turtle files to map and link the student grading data with the data representations of other applications.


I. Introduction
Today, many applications support the learning process in the learning environment [1]. Each application provides specific learning information for a particular purpose. In the context of the learning environment, information comes from various types of application systems, such as the online teaching and learning application called the e-learning system, the academic information management system, the student management and payment system, the subject course evaluation system, the student registration system, the student grading system (SGS), the library application system, and many others. These are collectively called learning applications, and they are heterogeneous in various aspects [2]. The heterogeneity of the learning environment is becoming broader and more varied [3,4].
There are many aspects of heterogeneity [3,5,6,7]; this research focuses on semantic heterogeneity in the learning environment. One important step in solving the heterogeneity problem, specifically its semantic aspect, is the semantic data mapping process [6,8,9,10,11]. Semantic data mapping is an important phase in integrating data and information between applications with different representations and sources [12]. Its purpose is to standardize and map data with various representations, heterogeneous formats, and different semantics between applications in different data sources [13,14,15]. The semantic aspect concerns data that have the same name but different meanings, or different names but the same meaning. Each application in the learning environment has specific purposes and uses different data to represent information about the learning process. However, two applications may still hold the same learning data. For example, an e-learning application and a student grading system both store information about students, but their database systems may represent it under different names. In the e-learning application, a student's name is represented as a learner name, whereas in the student grading system it is represented as a pupil name.
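The learner name / pupil name example can be made concrete with a small lookup table that maps each application's local column to a shared vocabulary term. The sketch below is a minimal illustration under stated assumptions: the application keys, the column names (`learner_name`, `pupil_name`, `grade`), and the `http://example.org/learning#` terms are hypothetical, and a real system would express these correspondences in an ontology rather than a dictionary. It covers both cases from the text: different names with the same meaning, and the same name (a hypothetical `grade` column in a registration system) with different meanings.

```python
# Hypothetical shared vocabulary; these URIs are illustrative only.
VOCAB = "http://example.org/learning#"

# (application, local column) -> shared vocabulary term.
SEMANTIC_MAP = {
    # Different names, same meaning: both columns become studentName.
    ("elearning", "learner_name"): VOCAB + "studentName",
    ("sgs", "pupil_name"): VOCAB + "studentName",
    # Same name, different meaning: each "grade" gets its own term.
    ("sgs", "grade"): VOCAB + "courseGrade",
    ("registration", "grade"): VOCAB + "yearLevel",
}


def to_shared(app, record):
    """Rename a record's local fields to shared terms; unmapped fields keep their local name."""
    return {SEMANTIC_MAP.get((app, field), field): value
            for field, value in record.items()}


def same_meaning(a, b):
    """True only when both (application, column) pairs map to the same shared term."""
    term = SEMANTIC_MAP.get(a)
    return term is not None and term == SEMANTIC_MAP.get(b)


if __name__ == "__main__":
    print(same_meaning(("elearning", "learner_name"), ("sgs", "pupil_name")))  # True
    print(same_meaning(("sgs", "grade"), ("registration", "grade")))           # False
```

Because comparison happens on shared terms rather than local names, the e-learning and SGS records for the same student compare equal after `to_shared`, while two columns that merely share the name `grade` do not.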
The semantic data mapping process depends on the technique and the tool used to implement it. We claim that semantic data mapping can improve data utilization. We also argue that it can handle heterogeneous data whose representations differ but whose meaning is the same. Many techniques and tools for semantic data mapping currently exist; this research implements the process using D2RQ. D2RQ was chosen for several reasons: it is free software, it supports both automatic and manual mapping, it supports any SQL-92 compatible database, and it supports data integration [23,36,37,38]. D2RQ has five structural elements: the D2RQ engine, Jena/Sesame, RDF Dump, the D2RQ mapping file, and D2R Server [36,39].
To integrate learning data between the applications in a learning environment, semantic data mapping is an important process for handling data heterogeneity and resolving inconsistencies in how learning data are represented [39]. Many applications are involved in data integration in the learning environment, such as the e-learning system, the academic information management system, the student management and payment system, the subject course evaluation system, the student registration system, the library application system, and many others. This research, however, focuses on the semantic data mapping process for the student grading system. The student grading system is the application in the learning environment that handles all operations related to student grades. It uses PostgreSQL as the database system that implements its data schema. The system contains many tables that store information associated with student grades, but only five of them are selected for this research.
In this paper, we perform semantic data mapping on the Student Grading System (SGS) in two stages. The first stage builds the semantic data mapping architecture and describes the structure of, and relationships within, the SGS data source. The second stage creates the data mapping file in Turtle format, sets up a D2R Server that communicates and integrates with other systems from the outside environment over HTTP, and uses the D2RQ engine and the Jena library to communicate with local applications written in Java.

A. Existing Semantic Data Mapping Methods
Semantic data mapping is a process that standardizes the various heterogeneous formats found in data sources. It focuses on semantic problems in data representation, namely data that have the same name but different meanings, or different names but the same meaning [12]. Many semantic data mapping techniques exist; some are free tools and some are licensed software. This section evaluates existing semantic data mapping tools to determine the most suitable one for this research. Five existing tools are analyzed and compared against six criteria, each a feature that a tool may offer. The first two criteria are support for automatic and for manual mapping. The third criterion is the representation language the tool produces. The fourth is whether the tool supports data integration. The fifth is which database systems the tool supports, and the last is whether the tool is free software. The detailed analysis and comparison are given in Table 1.
D2RQ is one of the best-known semantic data mapping tools, in part because it is free. D2RQ uses a declarative mapping to rewrite calls to the Jena/Sesame API into SQL queries, and passes the query results as RDF triples up to the higher layers of the Jena or Sesame frameworks [38]. In this way, D2RQ gives access to the relational database as a virtual RDF graph, so SPARQL queries can be formulated against it and their results produced in RDF format. One purpose of D2RQ is to offer a declarative language for mapping relational databases to ontologies. However, D2RQ lacks a graphical user interface, so users must invest more effort to learn and exploit all of its features.
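The virtual RDF idea can be illustrated with a heavily simplified sketch: instead of converting the database to RDF in advance, a request for all triples with a given property is rewritten into an SQL `SELECT`, and each returned row is turned into a triple on the fly. This is not D2RQ's implementation, which is driven by the Turtle mapping file and handles full SPARQL; the `MAPPING` dictionary, the `http://example.org/sgs#` URIs, and the function name are assumptions for illustration, and SQLite stands in for the relational database.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping, standing in for what D2RQ reads from its Turtle file:
# class URI -> (table, key column, {property URI: column}).
EX = "http://example.org/sgs#"
MAPPING = {
    EX + "Faculty": ("faculty", "f_id", {
        EX + "facultyCode": "facultyCd",
        EX + "facultyName": "facultyNm",
    }),
}


def triples_for_property(conn, prop):
    """Answer the triple pattern (?s, prop, ?o) by rewriting it into SQL.

    No RDF is stored: each call issues a SELECT against the relational
    table and turns every row into an (s, p, o) triple.
    """
    for cls, (table, key, columns) in MAPPING.items():
        # Identifiers come from the trusted mapping, never from user input.
        if prop == RDF_TYPE:
            for (key_value,) in conn.execute(f"SELECT {key} FROM {table} ORDER BY {key}"):
                yield (f"{EX}{table}/{key_value}", RDF_TYPE, cls)
            continue
        column = columns.get(prop)
        if column is None:
            continue  # this class map has no bridge for the property
        sql = f"SELECT {key}, {column} FROM {table} ORDER BY {key}"
        for key_value, value in conn.execute(sql):
            yield (f"{EX}{table}/{key_value}", prop, value)  # URI pattern: table/key


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE faculty (f_id INTEGER PRIMARY KEY, facultyCd TEXT, facultyNm TEXT)")
    conn.execute("INSERT INTO faculty VALUES (1, 'FC', 'Faculty of Computing')")  # sample row
    for triple in triples_for_property(conn, EX + "facultyName"):
        print(triple)
```

Adding rows to `faculty` adds triples to the answer without any RDF being materialized, which is the property that makes the graph "virtual".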
R2O is an extensible, declarative mapping language for describing mappings between relational database schemas and ontologies implemented in RDF(S) or OWL. It is an RDBMS-independent, high-level language that works with any database system that uses SQL as its standard access language [16]. R2O offers many features: it can create instances in the ontology, express mappings, perform self-verification, and verify the integrity of parts of a database.
Dartgrid is a licensed semantic data mapping tool. Its main purpose is to provide a semantics-based approach that supports the global sharing of database resources, using the grid as a platform. Dartgrid works in two main phases: it represents database resources as grid services and organizes them as an ontology-based virtual organization [18,19,21]. In addition, a set of semantic tools raises the level of interaction with the system to a semantic level. Dartgrid provides a visual mapping feature that helps data providers define semantic mappings from relational schemas to a shared ontology, and a visual query tool that automatically generates queries from ontology definitions and helps users specify complex semantic queries. Various other tools that provide semantic access to relational data have been developed, such as the RDB2Onto system, the METAmorphosis framework, the DB2OWL system, the SBRD tool, the Triplify system, the Ultrawrap system, and the SPASQL language.
RDB2Onto is a free semantic mapping tool that converts selected data from a relational database into an RDF/OWL ontology based on a defined template [25]. However, RDB2Onto supports only automatic data mapping, so it is not a good choice when manual mapping is needed. RDB2Onto queries the relational database with SQL, fills an RDF/OWL XML template with the results, and passes the resulting OWL data to the ontology. It is implemented in Java and uses the Jena or Sesame library to work with and manipulate the ontology. On the database side, RDB2Onto can work with relational databases such as MySQL, Oracle, and SQL Server through a JDBC connector. RDB2Onto performs semantic data mapping in three main steps: SQL query execution, filling the template with data from the relational database, and storage of the RDF/OWL data.
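RDB2Onto's three steps can be sketched as three small functions: run an SQL query, fill an RDF/XML template with the rows, and store the result. This is a minimal Python sketch, not RDB2Onto's code (which is written in Java): the template text, the `ex:` vocabulary, and the use of SQLite with a sample `faculty` row are all assumptions. Values are XML-escaped before insertion so that characters such as `&` in a faculty name do not break the output.

```python
import io
import sqlite3
from string import Template
from xml.sax.saxutils import escape

# Hypothetical RDF/XML templates; RDB2Onto's own template format is not reproduced here.
DOC_TEMPLATE = Template(
    '<?xml version="1.0"?>\n'
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
    '         xmlns:ex="http://example.org/sgs#">\n'
    '$items</rdf:RDF>\n'
)
ITEM_TEMPLATE = Template(
    '  <ex:Faculty rdf:about="http://example.org/sgs#faculty/$id">\n'
    '    <ex:facultyName>$name</ex:facultyName>\n'
    '  </ex:Faculty>\n'
)


def run_query(conn):
    """Step 1: SQL query execution."""
    return conn.execute("SELECT f_id, facultyNm FROM faculty ORDER BY f_id").fetchall()


def fill_template(rows):
    """Step 2: fill the template with data from the RDB, escaping XML special characters."""
    items = "".join(ITEM_TEMPLATE.substitute(id=f_id, name=escape(name))
                    for f_id, name in rows)
    return DOC_TEMPLATE.substitute(items=items)


def store(rdf_xml, stream):
    """Step 3: storage of the RDF/XML data (any writable text stream)."""
    stream.write(rdf_xml)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE faculty (f_id INTEGER PRIMARY KEY, facultyCd TEXT, facultyNm TEXT)")
    conn.execute("INSERT INTO faculty VALUES (1, 'FC', 'Computing & Informatics')")  # sample row
    out = io.StringIO()
    store(fill_template(run_query(conn)), out)
    print(out.getvalue())
```

Passing a stream to `store` keeps the storage step swappable: an open file, a network upload, or a triple-store loader could replace the in-memory buffer.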
DB2OWL was introduced by Cullot et al. in 2007, who describe it as a tool for automatic database-to-ontology mapping. Its main purpose is to automatically generate an ontology from database schemas [24]. The mapping process starts by detecting particular cases for tables in the database schema. According to these cases, each database component (table, column, constraint) is converted to a corresponding ontology component (class, property, relation). The set of correspondences between database components and ontology components is preserved as the mapping result for later use. DB2OWL supports only automatic data mapping, not manual mapping. A limitation of the tool is that it works only with Oracle and MySQL databases, because these provide specific views of the database metadata. Extension of the tool to other DBMSs that provide such views is underway. In addition, DB2OWL is to be developed further to map several databases into one ontology, and to map databases from other models, such as object-oriented and relational database models.
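The table-to-class and column-to-property correspondences described above can be sketched by reading a database's own metadata. The Python sketch below is a simplification under stated assumptions: it uses SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list` instead of the Oracle/MySQL metadata views that DB2OWL relies on, ignores DB2OWL's particular table cases, and uses hypothetical naming rules (`Faculty`, `faculty_facultyNm`). Foreign-key columns become object properties whose range is the referenced class; other non-key columns become datatype properties.

```python
import sqlite3


def class_name(table):
    """faculty -> Faculty, subject_course -> SubjectCourse."""
    return "".join(part.capitalize() for part in table.split("_"))


def schema_to_ontology(conn):
    """Return (db_component, ontology_component, kind, range) correspondences.

    table -> owl:Class; ordinary column -> owl:DatatypeProperty;
    foreign-key column -> owl:ObjectProperty whose range is the referenced class.
    Primary-key columns are skipped because they identify instances instead.
    """
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    result = []
    for table in tables:
        result.append((table, class_name(table), "owl:Class", None))
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = {row[3]: row[2] for row in conn.execute(f"PRAGMA foreign_key_list({table})")}
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _, column, _, _, _, pk in conn.execute(f"PRAGMA table_info({table})"):
            if pk:
                continue
            prop = f"{table}_{column}"
            if column in fks:
                result.append((f"{table}.{column}", prop, "owl:ObjectProperty",
                               class_name(fks[column])))
            else:
                result.append((f"{table}.{column}", prop, "owl:DatatypeProperty", None))
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical two-table schema; sc_id and courseName are illustrative names.
    conn.executescript("""
        CREATE TABLE faculty (f_id INTEGER PRIMARY KEY, facultyCd TEXT, facultyNm TEXT);
        CREATE TABLE subject_course (sc_id INTEGER PRIMARY KEY,
            faculty_id INTEGER REFERENCES faculty(f_id), courseName TEXT);
    """)
    for row in schema_to_ontology(conn):
        print(row)
```

The returned list plays the role of the preserved set of correspondences: it can be stored and reused later without re-reading the schema.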

B. D2RQ Platform
Based on the literature review and analysis of existing semantic data mapping tools in the previous subsection, this research uses the D2RQ tool. There are several reasons for this choice. The main reason is that D2RQ is a free semantic data mapping tool. The second is that D2RQ supports both automatic and manual data mapping. The third is that D2RQ can be used with any SQL relational database management system and in the implementation of data integration. The D2RQ architecture has four main parts that collaborate and communicate with external applications and systems. The first part is the D2RQ engine, which communicates with SQL relational database management systems, obtains the structure of the data representation in each database, and converts it into several kinds of output. The second part consists of Jena/Sesame and RDF Dump: Jena/Sesame is used by local applications written in Java, and RDF Dump is used to dump the data for a triple store. The third part is the D2RQ mapping file, written in Turtle format (.ttl). This file can be edited to align the mapping with an ontology for data integration. The fourth part is D2R Server, which is used for network communication: it communicates over HTTP, serving SPARQL, RDF, and HTML formats. The D2RQ architecture is shown in Fig. 1.
The D2RQ Mapping Language is a declarative language for describing the relation between a relational database schema and RDFS vocabularies or OWL ontologies. A D2RQ mapping is itself an RDF document written in Turtle syntax, expressed using terms in the D2RQ namespace. A namespace is a domain that guarantees the uniqueness of identifiers and is written like a uniform resource locator (URL); the D2RQ namespace, for example, is http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#. The terms in this namespace are formally defined in the D2RQ RDF schema, which is available in Turtle and RDF/XML versions.
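Because D2R Server communicates over HTTP, any client can query it by sending a SPARQL query as the `query` parameter of a GET request, as defined by the SPARQL Protocol. The sketch below uses only the Python standard library. The endpoint URL `http://localhost:2020/sparql` is an assumption (D2R Server's usual local default; adjust it to the actual deployment), and the code also assumes the server returns SPARQL JSON results when asked through the `Accept` header.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: D2R Server's usual local default; change for a real deployment.
ENDPOINT = "http://localhost:2020/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def run_select(query, endpoint=ENDPOINT, timeout=10):
    """Send a SELECT query and return its result bindings (needs a running server)."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]
```

With a D2R Server running, `run_select("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5")` returns a list of result bindings; `build_request` can be inspected without any server, which keeps the HTTP details testable.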

III. Implementation of D2RQ in the Student Grading System (SGS)
A. Preparing the Student Grading System Data
The Student Grading System (SGS) is the learning application that contains data about students' learning results, and its data are stored in a PostgreSQL database system. In the learning environment, the SGS is integrated with many other applications, such as the e-learning system, the academic information management system, the student management and payment system, the subject course evaluation system, the student registration system, the library application system, and many others. To integrate SGS data with these learning applications, the SGS data must go through semantic data mapping to resolve semantic problems. In this research, the semantic data mapping process is implemented with the D2RQ mapping tool.
Each application represents information differently, because each learning application is built by different developers. This produces heterogeneity in the data schemas, especially in their semantics: data with the same name but different meanings, or different names but the same meaning. The first step in this part is to describe the data representation in the SGS in detail. The SGS database contains many tables, but the semantic mapping process in this research focuses on the data relevant to data integration. Details of the SGS data representation are given in Table 2. Of all the tables in the Student Grading System, this research uses only five. The first is the faculty table, which has three fields: f_id, facultyCd, and facultyNm. The f_id field is the primary key and stores a unique ID for each record. The main information in this table is the faculty code, stored in the facultyCd field, and the faculty name, stored in the facultyNm field.

B. Semantic Data Mapping Result Using D2RQ
The semantic data mapping process with D2RQ can produce several results. This research focuses on the data mapping file, which can be aligned with an ontology for semantic data integration. The D2RQ mapping file is a plain-text file in Turtle format (.ttl) that maps the local data source to ontology-based terms. An example of a namespace used for these terms is http://www.utm.my/exercise/sgs#. The D2RQ mapping for the SGS is a Turtle file named sgs.ttl; a snippet of its contents is shown in Fig. 2.
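A mapping file of this kind can also be generated programmatically. The Python sketch below emits a small D2RQ mapping for the faculty table described in Table 2, using D2RQ Mapping Language terms (`d2rq:Database`, `d2rq:jdbcDriver`, `d2rq:jdbcDSN`, `d2rq:ClassMap`, `d2rq:PropertyBridge`, `d2rq:uriPattern`, `d2rq:column`). It is not the content of sgs.ttl: the JDBC DSN `jdbc:postgresql://localhost/sgs`, the class name `sgs:Faculty`, and the property names `sgs:facultyCode` and `sgs:facultyName` are hypothetical, and the `sgs:` prefix simply reuses the example namespace http://www.utm.my/exercise/sgs#.

```python
D2RQ_NS = "http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#"
SGS_NS = "http://www.utm.my/exercise/sgs#"  # example namespace from the text


def database_block(dsn, driver="org.postgresql.Driver"):
    """Prefixes plus the d2rq:Database resource describing the JDBC connection."""
    return "\n".join([
        "@prefix map: <#> .",
        f"@prefix d2rq: <{D2RQ_NS}> .",
        f"@prefix sgs: <{SGS_NS}> .",
        "",
        "map:database a d2rq:Database ;",
        f'    d2rq:jdbcDriver "{driver}" ;',
        f'    d2rq:jdbcDSN "{dsn}" .',
        "",
    ])


def class_map(table, key, cls, columns):
    """One d2rq:ClassMap for a table plus one d2rq:PropertyBridge per column."""
    lines = [
        "",
        f"map:{table} a d2rq:ClassMap ;",
        "    d2rq:dataStorage map:database ;",
        f'    d2rq:uriPattern "{table}/@@{table}.{key}@@" ;',
        f"    d2rq:class sgs:{cls} .",
    ]
    for column, prop in columns.items():
        lines += [
            "",
            f"map:{table}_{column} a d2rq:PropertyBridge ;",
            f"    d2rq:belongsToClassMap map:{table} ;",
            f"    d2rq:property sgs:{prop} ;",
            f'    d2rq:column "{table}.{column}" .',
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical DSN and property names; the faculty fields follow Table 2.
    print(database_block("jdbc:postgresql://localhost/sgs")
          + class_map("faculty", "f_id", "Faculty",
                      {"facultyCd": "facultyCode", "facultyNm": "facultyName"}))
```

Generating the file keeps the class-map and property-bridge boilerplate consistent across tables; the output can then be edited by hand to align property names with a shared ontology, as described above.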

Table 1. Analysis and Comparison of Existing Semantic Data Mapping Tools

Table 2. Information from Student Grading System
The second table is the lecturer_course table, which contains four fields: lc_id, user_id, subject_course_id, and semester. The lc_id field is the primary key and stores a unique ID for each record. This table is linked to two other tables: the subject_course table, referenced through the subject_course_id field, and the user table, referenced through the user_id field. Its main field is semester, which stores the semester. The third table is the user table, which stores all user attributes. A user in this context is a member of the learning community, such as a student or a lecturer. The table contains nine fields: u_id, faculty_id, name, idNumb, isNumb, passwd, email, address, and phoneNumb. It is linked to two tables: the user table uses the faculty table to obtain each user's faculty name, and the grading table uses the user table to obtain each user's name as an extra attribute. The fourth table is the grading table, which contains eight fields: g_id, user_id, idNumber, subjectCourse_id, courseCode, midScore, finalScore, and grade. The grading table is linked to the user table and the subject_course table, from which it takes user information and subject course information, respectively. Its main information is the mid score, stored in the midScore field, the final score, stored in the finalScore field, and the student's final grade, stored in the grade field. The last table in the Student Grading System is the subject_course table, which contains subject course attributes such as course credit, course hour, course code, and course name. It is linked to two other tables: it uses faculty information from the faculty table, and its own information is used by the grading table.
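The five tables and the links between them can be summarized as DDL. The sketch below is illustrative only: field names follow the description above, but column types are assumed, and the subject_course columns (`sc_id`, `faculty_id`, `courseCode`, `courseName`, `credit`, `hour`) are hypothetical because the text names only their meanings. SQLite stands in for PostgreSQL so the example runs without a server, and `"user"` is quoted because `user` is a reserved word in PostgreSQL. The `relationships` helper lists the declared foreign keys, which correspond to the table links described in the text.

```python
import sqlite3

# Sketch of the five SGS tables; types and subject_course columns are assumptions.
SGS_DDL = """
CREATE TABLE faculty (
    f_id INTEGER PRIMARY KEY, facultyCd TEXT, facultyNm TEXT);
CREATE TABLE subject_course (
    sc_id INTEGER PRIMARY KEY, faculty_id INTEGER REFERENCES faculty(f_id),
    courseCode TEXT, courseName TEXT, credit INTEGER, hour INTEGER);
CREATE TABLE "user" (
    u_id INTEGER PRIMARY KEY, faculty_id INTEGER REFERENCES faculty(f_id),
    name TEXT, idNumb TEXT, isNumb TEXT, passwd TEXT, email TEXT,
    address TEXT, phoneNumb TEXT);
CREATE TABLE lecturer_course (
    lc_id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES "user"(u_id),
    subject_course_id INTEGER REFERENCES subject_course(sc_id), semester TEXT);
CREATE TABLE grading (
    g_id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES "user"(u_id),
    idNumber TEXT, subjectCourse_id INTEGER REFERENCES subject_course(sc_id),
    courseCode TEXT, midScore REAL, finalScore REAL, grade TEXT);
"""

TABLES = ["faculty", "subject_course", "user", "lecturer_course", "grading"]


def create_schema():
    """Create the sketched schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SGS_DDL)
    return conn


def relationships(conn):
    """List (table, column, referenced_table) links declared as foreign keys."""
    links = []
    for table in TABLES:
        rows = conn.execute(
            'SELECT "from", "table" FROM pragma_foreign_key_list(?)', (table,))
        for column, target in rows:
            links.append((table, column, target.strip('"')))
    return sorted(links)


if __name__ == "__main__":
    for link in relationships(create_schema()):
        print(link)
```

Listing the links from the schema itself, rather than from prose, gives a quick check that the mapping later covers every relationship between the five tables.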