Variable precision rough set model for attribute selection on environment impact dataset

a Department of Geology Engineering, STTNAS, Yogyakarta, Indonesia b Department of Information System, Universitas Ahmad Dahlan, Yogyakarta, Indonesia c Department of Urban and Regional Planning, STTNAS, Yogyakarta, Indonesia d Department of Mathematics, Universitas Gajah Mada, Yogyakarta, Indonesia 1 aniapriani@sttnas.ac.id; 2 yanto.itr@is.uad.ac.id; 3 septiana@sttnas.ac.id; 4 s.kartiko@yahoo.com; 5 danardono@ugm.ac.id


Introduction
Development is a process of social change. It aims to improve the livelihood of society without jeopardizing environmental and cultural sustainability. Development may enable people within a society to decide their own future; in other words, it has to be participative. From this standpoint, development should address economic, social, and environmental aspects together. These aspects are central to the concept of sustainable development. Nevertheless, they are also the aspects most vulnerable to the side effects of development. An imbalanced development process may have negative effects on the economic, social, or environmental aspects of a society [1]-[3].
Development is often understood as physical development in a certain area. It is often marked by the enhancement of infrastructure and facilities in order to fulfill the social and economic needs of a society. One obvious sign of physical development is land conversion: either converting unused land into buildings and structures, or changing land function, such as from residential to commercial use. Physical development may cause economic, social, and environmental shifts [4], [5].

Abstract
The investigation of environmental impact plays an important role in the development of a city. Artificial intelligence in the form of computational models can be used to analyze the data. One such model is rough set theory. The data clustering method, which is a part of rough set theory, can make a meaningful contribution to the decision-making process, here in the form of selecting attributes of environmental impact. This paper examines the application of the variable precision rough set model for selecting attributes of environmental impact. The approach, based on the mean of the minimum error classification, is applied to a survey dataset by utilizing the variable precision of attributes. The paper demonstrates the use of the variable precision rough set model to select the most important impacts of regional development. Based on the experiment, the availability of public open space, social organization and culture, migration, and rate of employment are selected as the dominant attributes. The results can contribute to the policy design process, in terms of formulating a proper intervention for enhancing the quality of the social environment.
Physical development may have both positive and negative impacts on society. Increasing commercial activity, for instance, could provide financial benefits for the society [6]-[9]. However, it may also lead to a higher socio-economic disparity within the society. Therefore, this research aims to investigate the social, economic, and environmental impacts of physical development. A strategic program can be well planned by evaluating the environmental impact during the study period in an institution [10]-[12].
An effective way to detect the principal impacts is to use data mining techniques [13]. Data mining, in general, is the process of finding and analyzing new information that may exist in data and summarizing the results as useful information. There are many outstanding studies on data mining in areas such as clustering, association rules, classification, and conflict analysis [14]-[16].
To achieve the research objective, this research uses the variable precision rough set (VPRS) model to perform attribute selection on an environment dataset. The method is based on variable precision rough set approximation using the minimum error classification of attributes [17]. VPRS was introduced by Ziarko [18]. This extension of rough set theory can handle uncertain data without any functional relationship between attributes by using its error-tolerance capability [19]. By setting the tolerance, VPRS can also handle noisy data. The contribution of this paper is the selection of the most influential attribute by ordering the attributes according to their relevance, measured by the minimum error classification of each attribute in the dataset. Selecting and identifying the most influential attribute of the environment dataset at an early phase could contribute to a better policy design process for enhancing the social environment. This paper consists of four sections. The first section is the introduction. Section 2 presents the method, which includes a theoretical review of the concepts of information system, rough set theory, and minimum error classification. Section 3 describes the characteristics of the dataset and discusses the results and evaluation of the experiment. Section 4 concludes the paper.

Set approximations of information system
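The set approximations of an information system can be defined in the following terms. Let $V_a$ denote the domain of attribute $a$. An information system can be defined as a mapping from pairs of the universe $U$ and the attribute set $A$ to the value set $V$ [19], [20], as in (1).

$$f : U \times A \to V, \qquad V = \bigcup_{a \in A} V_a \tag{1}$$

The accuracy of approximation of any subset $X \subseteq U$ with respect to $B \subseteq A$ is defined as in (3).

$$\alpha_B(X) = \frac{|\underline{B}(X)|}{|\overline{B}(X)|} \tag{3}$$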

Variable Precision Rough Set (VPRS)
The variable precision rough set is an extension of rough set theory, established by relaxing the subset operator. It is used to analyze and identify patterns in data that represent statistical trends rather than functional ones. VPRS classifies objects whose classification error is smaller than a certain predefined level. The threshold introduced in this method does not require any information besides what is already in the data. VPRS first introduces the error classification in order to define the lower and upper approximations [18]. The error classification in VPRS is defined as in (4).

$$c(X, Y) = 1 - \frac{|X \cap Y|}{|X|} \tag{4}$$

for every $X, Y \subseteq U$, where $X, Y \neq \emptyset$; $c(X, Y)$ is called the error classification rate of $X$ relative to $Y$.
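As a small worked illustration, assume the error classification takes Ziarko's usual form $c(X, Y) = 1 - |X \cap Y| / |X|$ for non-empty $X$. The two sets below are hypothetical and are not taken from the survey data.

```latex
\begin{align*}
X &= \{x_1, x_2, x_3, x_4\}, \qquad Y = \{x_1, x_2, x_3, x_5\},\\
c(X, Y) &= 1 - \frac{|X \cap Y|}{|X|} = 1 - \frac{3}{4} = 0.25 .
\end{align*}
```

Under a tolerance of 0.3, this $X$ would count as included in $Y$; under the classical (zero-tolerance) subset relation it would not.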
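Definition 4. Let $U$ be a universe and $X \subseteq U$. A real number $0 \le \beta \le 0.5$ is given as a threshold. The lower and upper approximations of $X$ are defined as in (5) and (6), respectively.

$$\underline{B}_\beta(X) = \bigcup \{\, [x]_B : c([x]_B, X) \le \beta \,\} \tag{5}$$

$$\overline{B}_\beta(X) = \bigcup \{\, [x]_B : c([x]_B, X) < 1 - \beta \,\} \tag{6}$$

The set in (5) is also called the positive region of $X$, that is, the set of objects of $U$ which can be classified into $X$ with an error classification rate not greater than $\beta$. Then we have $\underline{B}_\beta(X) \subseteq \overline{B}_\beta(X) \iff 0 \le \beta \le 0.5$, where $\beta$ is restricted to the interval $[0, 0.5]$ to keep the meaning of the upper and lower approximations.
The accuracy of VPRS with a given threshold $\beta$ is presented in (7).

$$\alpha_{B_\beta}(X) = \frac{|\underline{B}_\beta(X)|}{|\overline{B}_\beta(X)|} \tag{7}$$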
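The threshold-based approximations can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the equivalence classes are already available as sets, that the error classification has Ziarko's form 1 - |E ∩ X| / |E|, and that a class enters the lower approximation when its error is at most β and the upper approximation when its error is below 1 - β. The function names and the toy classes are hypothetical.

```python
def error_rate(x, y):
    """Error classification rate of non-empty set x relative to y (assumed Ziarko form)."""
    return 1 - len(x & y) / len(x)


def vprs_approximations(classes, target, beta):
    """Return the beta-lower and beta-upper approximations of `target`."""
    lower, upper = set(), set()
    for eq in classes:
        c = error_rate(eq, target)
        if c <= beta:          # class is "mostly inside" the target
            lower |= eq
        if c < 1 - beta:       # class is not "mostly outside" the target
            upper |= eq
    return lower, upper


def vprs_accuracy(classes, target, beta):
    """Ratio of the sizes of the beta-lower and beta-upper approximations."""
    lower, upper = vprs_approximations(classes, target, beta)
    return len(lower) / len(upper) if upper else 1.0


# Hypothetical toy partition of ten objects and a target set X.
classes = [{1, 2, 3, 4}, {5, 6}, {7, 8, 9, 10}]
X = {1, 2, 3, 5, 7}

acc_classical = vprs_accuracy(classes, X, beta=0.0)   # no class lies fully inside X
acc_relaxed = vprs_accuracy(classes, X, beta=0.25)    # first class is now accepted
```

On this toy partition, raising β from 0 to 0.25 moves the first class into the lower approximation and drops the third class from the upper approximation, so the accuracy rises from 0 to 4/6.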
The problem that arises is how to choose the threshold $\beta$ so as to increase the approximation accuracy while keeping the classification error minimal. Three cases of $\beta$ are derived from proposition (8):
Case 1. If $\beta \ge 0.5$, it is clear that the accuracy will be out of range.
Case 2. If ..., the accuracy is equal to that of the traditional rough set.
Case 3. If ..., the accuracy of VPRS is greater than that of the traditional rough set.
Based on the cases above, $\beta$ can be formulated as a positive number less than 0.5. The threshold $\beta > 0$ can be selected as the minimum error classification, which is denoted as in (9).
The attribute with the minimum $\beta$ is selected as the clustering decision. The pseudo code of the MECC algorithm is shown in Fig. 1.
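The selection procedure can be sketched in Python as follows. This is a hedged illustration, not the authors' code, and it rests on two assumptions that the text does not spell out: the error of attribute a_i with respect to a_j is taken as the average, over the classes of a_i, of the smallest error rate against any class of a_j, and the threshold of a_i is the mean of those errors over all other attributes. All function names and the toy records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def partition(rows, attr):
    """Equivalence classes (sets of row indices) induced by a single attribute."""
    classes = defaultdict(set)
    for idx, row in enumerate(rows):
        classes[row[attr]].add(idx)
    return list(classes.values())


def error_rate(x, y):
    """Error classification rate of non-empty set x relative to y (assumed Ziarko form)."""
    return 1 - len(x & y) / len(x)


def attribute_error(rows, a_i, a_j):
    """Assumed error of a_i w.r.t. a_j: mean best-match error of each class of a_i."""
    targets = partition(rows, a_j)
    return mean(min(error_rate(e, t) for t in targets)
                for e in partition(rows, a_i))


def mecc_rank(rows, attrs):
    """Rank attributes by their mean error against all other attributes (ascending)."""
    betas = {
        a_i: mean(attribute_error(rows, a_i, a_j) for a_j in attrs if a_j != a_i)
        for a_i in attrs
    }
    return sorted(betas.items(), key=lambda kv: kv[1])


# Hypothetical toy records: p and q induce the same partition, r cuts across it.
rows = [
    {"p": 1, "q": 1, "r": 1},
    {"p": 1, "q": 1, "r": 2},
    {"p": 2, "q": 2, "r": 1},
    {"p": 2, "q": 2, "r": 2},
]
ranking = mecc_rank(rows, ["p", "q", "r"])
selected = ranking[0][0]  # attribute with the smallest threshold
```

In the toy data, `p` and `q` predict each other perfectly and so share the lowest threshold, while `r` is unrelated to both and ranks last. Ties are broken here by input order, which is a design choice of this sketch only.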

Results and Discussion
This research aims to identify the most influential impact in the environment dataset by ranking the relevant attributes according to their minimum error classification. Selecting and identifying the most influential attribute of environmental impact at an early stage helps policy makers to design a proper intervention and take immediate action to improve the quality of the social environment.
The dataset was established by conducting a survey in Yogyakarta, Indonesia. A total of 400 respondents were involved in the survey: 176 male and 224 female. A reliability test conducted on the dataset gave an alpha score of 0.953. The data collected from the survey were aggregated. Descriptive statistics of the dataset (mean and standard deviation), calculated with SPSS software, are presented in Table 1. The physical, socio-cultural, and economic aspects are independent of one another, and the means of their impacts are 25.505, 12.7175, and 16.7475, respectively. Meanwhile, the dispersion of the impacts is practically homogeneous: the standard deviations are quite similar, at 2.954, 1.579, and 2.667, respectively. The average classification error values for each attribute of the three aspects are described in the following parts.

2) Impact on Socio-Cultural Aspects
There are five attributes of the socio-cultural aspects, namely: social organization (SB1), social interaction (SB2), culture (SB3), social practice (SB4), and livelihood quality (SB5). The average minimum error classification values are shown in Table 3. The selected attributes are social organization (SB1) and culture (SB3), each with a minimum error classification of 0.6.

3) Impact on Economic Aspects
There are ten attributes of the economic aspects, namely: migration (E1), rate of employment (E2), economic structure development (E3), revenue (E4), expenditure (E5), shift of occupation (E6), public health (E7), increasing number of educational facilities (E8), increasing number of religious facilities (E9), and increasing number of health care facilities (E10). The average minimum error classification values are shown in Table 4. The selected attributes are migration (E1) and rate of employment (E2), both of which have a minimum error classification of 0.49.

Conclusion
This paper demonstrates the use of the variable precision rough set for attribute selection on environmental impact data. The technique is based on the mean of the minimum error classification using the variable precision of attributes. It was used to examine three groups of environmental impact, namely physical and chemical aspects, socio-cultural aspects, and economic aspects. The paper has demonstrated the usefulness of this technique for selecting the most influential environmental impacts. The selected attributes are availability of public open space, social organization and culture, migration, and rate of employment. The results of the experiment presented in this paper can serve as a basis for the policy design process and for formulating a proper treatment to improve the quality of the social environment.

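Definition 2. Two elements $x, y \in U$ are said to be $B$-indiscernible $\iff f(x, a) = f(y, a), \ \forall a \in B$. A unique indiscernibility relation can be induced by every subset of $A$. Let $B \subset A$; $IND(B)$ is the indiscernibility relation induced by the set of attributes $B$, and it is an equivalence relation. The partition of $U$ induced by $IND(B)$ is denoted by $U/B$, and $[x]_B$ is the equivalence class in the partition $U/B$ containing $x \in U$.

Definition 3. The lower and upper approximations of $X$ induced by $B$ are defined as in (2).

$$\underline{B}(X) = \{\, x \in U : [x]_B \subseteq X \,\}, \qquad \overline{B}(X) = \{\, x \in U : [x]_B \cap X \neq \emptyset \,\} \tag{2}$$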
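The indiscernibility partition and the classical approximations can be illustrated with a short Python sketch. It assumes the standard rough set definitions (a class belongs to the lower approximation when it is contained in the target set, and to the upper approximation when it intersects it) and takes the accuracy as the ratio of their sizes. The table, the attribute names and the function names are hypothetical.

```python
from collections import defaultdict


def partition(table, attrs):
    """Equivalence classes of IND(B): objects with equal values on every attribute in attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attrs)
        classes[key].add(obj)
    return list(classes.values())


def lower_upper(table, attrs, target):
    """Classical lower and upper approximations of `target` induced by attrs."""
    lower, upper = set(), set()
    for eq in partition(table, attrs):
        if eq <= target:       # class entirely inside the target
            lower |= eq
        if eq & target:        # class overlaps the target
            upper |= eq
    return lower, upper


def accuracy(table, attrs, target):
    """Accuracy of approximation: |lower| / |upper| (1.0 if the upper set is empty)."""
    lower, upper = lower_upper(table, attrs, target)
    return len(lower) / len(upper) if upper else 1.0


# Hypothetical information system with five objects and two attributes.
table = {
    1: {"a": "low", "b": "yes"},
    2: {"a": "low", "b": "yes"},
    3: {"a": "high", "b": "no"},
    4: {"a": "high", "b": "yes"},
    5: {"a": "mid", "b": "no"},
}
X = {1, 2, 3}
lower, upper = lower_upper(table, ["a"], X)
acc = accuracy(table, ["a"], X)
```

Here attribute `a` cannot separate objects 3 and 4, so object 3 is only possibly in X: it appears in the upper approximation but not in the lower one.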

Fig. 1. The MECC algorithm
Step 1. Compute the equivalence classes using the indiscernibility relation on each attribute.
Step 2. Determine the error classification of attribute $a_i$ with respect to all $a_j$, where $j \neq i$.
Step 3. Select the mean error classification from Step 2 to be $\beta$.
Step 4. Select an attribute based on the minimum of $\beta$.
End

... level (L7), land use (L8), and availability of public open space (L9). The average minimum error classification values are shown in Table 2. The selected attribute, availability of public open space (L9), has a minimum error classification of 0.5.

Table 1. Mean and standard deviation of variables

Table 2. The average of classification error value of each attribute of environmental aspect

Table 3. The average of classification error value of each attribute of socio-cultural aspects

Table 4. The average of classification error value of each attribute of economic aspect