Generated rules for AIDS and e-learning classifier using rough set approach

Web mining extracts information from the World Wide Web (WWW) by using data mining techniques. Data mining discovers hidden patterns, predictive information, and useful knowledge from huge databases. It is one part of Knowledge Discovery in Databases (KDD), the process used to transform data into knowledge. Data mining can be described as the analysis of enormous datasets to discover hidden information or unsuspected relationships and to summarize the data in novel ways that are useful and meaningful to the owner of the data [1]. It can also be defined as a method of automatically extracting implicit and useful patterns from databases [2]. It encompasses many different techniques and algorithms, including classification, clustering, and association rules. Over the years, Rough Set Theory (RST) has attracted the interest of researchers and has been applied to many domains, such as data classification, data clustering, and association rule mining.


I. Introduction
Web mining extracts information from the World Wide Web (WWW) by using data mining techniques. Data mining discovers hidden patterns, predictive information, and useful knowledge from huge databases. It is one part of Knowledge Discovery in Databases (KDD), the process used to transform data into knowledge. Data mining can be described as the analysis of enormous datasets to discover hidden information or unsuspected relationships and to summarize the data in novel ways that are useful and meaningful to the owner of the data [1]. It can also be defined as a method of automatically extracting implicit and useful patterns from databases [2]. It encompasses many different techniques and algorithms, including classification, clustering, and association rules. Over the years, Rough Set Theory (RST) has attracted the interest of researchers and has been applied to many domains, such as data classification, data clustering, and association rule mining.
Rough Set (RS) analysis handles uncertainty in a dataset; it is used to determine the crucial attributes of objects and to build the upper and lower approximations of object sets [3], [4]. The main advantage of using RST instead of fuzzy sets in data analysis is that it needs no preliminary or additional information about the data, such as probability in statistics or the grade of membership or value of possibility in fuzzy set theory [5], [6]. Real-world data varies in size and complexity, and is difficult to analyze and hard to manage from a computational viewpoint. The major objectives of RS analysis are to reduce data size and to handle inconsistency in data [4]. Moreover, it is used for the extraction of rules from databases. Decision rules extracted by RS algorithms are valuable and concise, and can be beneficial by revealing hidden knowledge in the data [7]. Another study raised the problem of large log datasets: how to remove messy data in a timely manner at low cost and find useful information in a huge dataset [8]. RST, a mathematical tool that can handle uncertain and incomplete information, is therefore used to address the problem of incomplete datasets. A principal goal of RST analysis is to synthesize or construct approximations (upper and lower) of sets of concepts from the acquired data [9]. RS has been applied for rule generation and rule extraction for better classification in Web usage mining using Web log datasets, since RS can deal with uncertain data [10], [11], [12]. The generated rules are used as a guideline to query a large dataset and obtain accurate relationships among the parameters in the database.
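The lower and upper approximations mentioned above can be illustrated with a minimal sketch. The decision table below is a hypothetical toy example (its attribute names and values are not from the datasets in this paper): rows that share the same condition-attribute values but disagree on the decision end up in the upper approximation only, forming the uncertain boundary region.

```python
from collections import defaultdict

# Toy decision table: (condition attributes, decision). The attribute
# names and values are hypothetical, purely for illustration.
table = [
    ({"fever": "yes", "cough": "yes"}, "flu"),
    ({"fever": "yes", "cough": "no"},  "flu"),
    ({"fever": "yes", "cough": "yes"}, "cold"),    # indiscernible from row 0
    ({"fever": "no",  "cough": "no"},  "healthy"),
]

def approximations(table, attrs, target):
    """Lower/upper approximation of the concept {rows with decision == target}."""
    # Partition rows into indiscernibility classes over the chosen attributes.
    classes = defaultdict(set)
    for i, (cond, _) in enumerate(table):
        classes[tuple(cond[a] for a in attrs)].add(i)
    concept = {i for i, (_, dec) in enumerate(table) if dec == target}
    lower, upper = set(), set()
    for members in classes.values():
        if members <= concept:       # class certainly belongs to the concept
            lower |= members
        if members & concept:        # class possibly belongs to the concept
            upper |= members
    return lower, upper

lower, upper = approximations(table, ["fever", "cough"], "flu")
```

Here rows 0 and 2 are indiscernible but have different decisions, so only row 1 lies in the lower approximation of "flu", while rows 0, 1, and 2 lie in the upper approximation.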
The rest of this paper is organized as follows. Section II reviews related works; Section III presents the experimental design; Section IV provides the experimental results and analysis. Finally, Section V concludes the paper with a summary of the research.

II. Related Works
With the enormous growth of data, especially large datasets, mixed data types, changing data, and incomplete and uncertain data, an information system may contain redundancies that do not assist knowledge discovery and may in fact mislead the process. One method for dealing with these issues is RST, proposed by [13]: a mathematical tool for dealing with imperfect knowledge and discovering patterns hidden in data. RST deals with uncertainty and vagueness, allowing the generation of sets of decision rules from data. A reduct set can be generated, or the core of the attribute set constructed, by eliminating redundant attributes [14].
This simple idea leads to many competent applications of RS in data mining, machine learning, and granular computing. RS has also been applied in many real-life applications such as web transactions [15], [16], web search clustering [17], medicine [7], [18], [19], [20], e-learning [2], [3], and marketing [21]-[23]. Real-world data varies in size and complexity, which makes it difficult to analyze and hard to manage from a computational viewpoint. The major objectives of RS are to reduce data size and to handle inconsistency or redundancy in data [4]. Hidden patterns, information, and relationships can be identified from large datasets. Therefore, RS is used in this research to generate rules.

A. Reduct and Rules Generation
Reduct computation is conducted to determine the minimal set of attributes that represents the patterns of knowledge in the data. Irrelevant attributes are eliminated through the reduction process, and rules are produced from the reduced set of attributes. Thus, unimportant and redundant knowledge needs to be eliminated in order to generate an effective reduct set and a more reliable model. Johnson's algorithm (Johnson) and the Genetic algorithm (GA) are two reduction methods that can be used to generate rules; both are provided in the ROSETTA software.
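The greedy idea behind Johnson-style reduction can be sketched on a toy decision table. ROSETTA's actual implementation works on the discernibility function with additional options, so the sketch below is only an illustration of the principle, with hypothetical attributes: collect, for each pair of rows with different decisions, the attributes that distinguish them, then greedily pick the attribute covering the most pairs until all pairs are covered.

```python
from itertools import combinations
from collections import Counter

# Hypothetical decision table: rows of attribute values plus a decision.
rows = [
    {"a": 1, "b": 0, "c": 1, "dec": "yes"},
    {"a": 1, "b": 1, "c": 1, "dec": "no"},
    {"a": 0, "b": 0, "c": 0, "dec": "no"},
]
attrs = ["a", "b", "c"]

def johnson_reduct(rows, attrs):
    """Greedy (Johnson-style) cover of the discernibility matrix."""
    # One entry per pair of rows with different decisions: the attributes
    # on which the pair differs (sorted for deterministic tie-breaking).
    entries = []
    for r, s in combinations(rows, 2):
        if r["dec"] != s["dec"]:
            diff = tuple(sorted(a for a in attrs if r[a] != s[a]))
            if diff:
                entries.append(diff)
    reduct = []
    while entries:
        counts = Counter(a for e in entries for a in e)
        best = counts.most_common(1)[0][0]   # attribute covering most pairs
        reduct.append(best)
        entries = [e for e in entries if best not in e]
    return reduct

reduct = johnson_reduct(rows, attrs)
```

On this table the greedy cover returns a two-attribute reduct; Johnson's heuristic finds a single short reduct rather than all reducts, which is why it tends to produce fewer rules than GA-based reduction.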
ROSETTA is a toolkit designed to support the overall data mining and knowledge discovery process and to analyze tabular data within the framework of RST; it can be applied to the original dataset to compute the reduced set without loss of the knowledge of the original set [24]. The whole RS process can be carried out in ROSETTA: from initial browsing and pre-processing of the data, via computation of minimal attribute sets and generation of if-then rules or descriptive patterns, up to the validation and analysis of the induced rules or patterns [25].
Liu [26] stated that system performance is more effective when there are fewer rules. Performing reduction on a dataset is one mechanism to decrease the number of rules, and the reducts provided by RS generate comprehensible rules compared to other methods [27]. Liang et al. [27] used RS and Rough Set-based Inductive Learning to help instructors and students with WebCT learning. Rough Set-based Inductive Learning was used to obtain decision rules that explain the lack of success of students. As a result, the Web learning system improved and the effectiveness of WebCT increased.
Back in 2001, RST was used for the analysis of diabetic databases [12]. The authors applied RS to the Pima Indian Diabetic Database (PIDD) using the ROSETTA software. The 392 complete cases in the PIDD were randomly divided into a training set (n=300) and a testing set (n=92). The training set was then discretized using Equal Frequency Binning (EFB) with k=5 bins. Next, the Johnson reducer algorithm was applied to create the reducts, and classification was applied to the testing set using the batch classifier with the standard/tuned voting method (RSES). The generated rules were applied to the testing set, and the results showed that prediction accuracy increased. A workflow of the main steps in conducting rough set analysis was proposed by [28]; the workflow in Fig. 1 is the same as the process conducted by [12].

Fig. 1. Main steps of rough set analysis by [28]

On the other hand, RST was applied to feature granularity of cardiac datasets with a 70% training and 30% testing split [7]. The Standard Voting Classifier (SVC) was used for classification. Fig. 2 shows the RS classification modeling by [29]. The EFB discretization technique with k=3 was used to obtain the same number of data points in each interval. A new decision table was constructed based on the core attributes and the minimal cardinality in the generated reducts. The highest support values, shortest length, and highest percentage of Rule Importance Measure (RIM) were the parameters used to analyze the generated rules.

Fig. 2. Rough Set classification modeling by [29]

Regarding reducts, the experiment performed by [30] using the k-fold method resulted in more rules being produced by GA compared to Johnson, which led to lower accuracy. Their results also showed that k=10 was convenient for model validation. Fig. 3 shows the general steps to develop the performance prediction model proposed by [30].
In 2012, the concept of cross validation with k=10 was also applied [3]. The generated rules enhanced the prediction performance of Web pre-caching, and the rules were then used to construct queries for the datasets using Social Network Analysis (SNA). Fig. 4 illustrates the RS classification procedure using the ROSETTA system.

Fig. 3. General steps for development of the prediction model [30]

In recent years, a new technique to solve the issue of unorganized, large multimedia data was proposed by [31]. The authors used RST and web services technology in the proposed model to classify and analyze data. The proposed technique, which used 50% testing data and 50% training data, proved the effectiveness of RST in classifying data into the respective clusters. RST was also applied to customer classification [23]. The generated rules presented the factors that influenced clients' purchases. The authors claimed that RST had no information loss and was extendable and flexible compared to other data mining technologies. The generated rules helped to improve their products and categorize customers accurately. Both of these studies used the ROSETTA software for validation and data processing. A year later, decision rules were used by [32] to classify real-world Web services and improve classification accuracy. Fig. 5 shows the RS steps they proposed.

A new approach combining a Karnaugh map for attribute reduction with RST for rule generation was proposed by [14]. The authors claimed that the major objectives of RS analysis were to reduce data size and to handle inconsistency in data. They dealt with uncertainty and extracted useful information from the database. The proposed work used the Flu dataset, where the data was discretized using RST and the K-map. Data about six patients was used as training data. Using the K-map and RS approach, the data was analyzed, redundant data was eliminated, attributes were reduced, and a set of rules was developed.

The cross validation technique is used to define a validation dataset to test the model after the training phase. In k-fold cross validation, the dataset is divided into k subsets of equal size. For each of the k experiments, a single subset is used as the testing set and the remaining k-1 subsets are used as the training set, as shown in Fig. 6. The advantage of k-fold cross validation is that all the data are eventually used for both training and testing. The technique ensures that each data subset is tested exactly once, with the same proportion of data, reducing bias in the model evaluation. The k-fold technique also allows the accuracy of each fold to be calculated, so the fold with the highest prediction accuracy can be identified. Based on previous studies such as [34], the value of k is often 5 or 10, but there is no specific requirement.
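The k-fold splitting procedure described above can be sketched as follows (a minimal version without shuffling or stratification):

```python
def k_fold_splits(n_records, k=10):
    """Yield (test_indices, train_indices) for k-fold cross validation.
    Each record appears in exactly one test fold."""
    indices = list(range(n_records))
    fold_size = n_records // k
    for i in range(k):
        start = i * fold_size
        # The last fold absorbs any remainder when n_records % k != 0.
        end = start + fold_size if i < k - 1 else n_records
        test = indices[start:end]
        train = indices[:start] + indices[end:]
        yield test, train

folds = list(k_fold_splits(20, k=5))
# Across the 5 folds, every record is used for testing exactly once
# and for training exactly k-1 times.
```

With k=5 each fold trains on 80% and tests on 20% of the data; with k=10 the split is 90%/10%, matching the splits used later in this paper.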
In ROSETTA, rules are constructed as IF-THEN rules. The terms used are LHS (Left-Hand Side), which refers to the IF-part of the rule, and RHS (Right-Hand Side), which refers to the THEN-part. Rules are evaluated according to how general they are, e.g. coverage, the fraction of objects from the decision class of the THEN-part that match the IF-part; and how specific they are, e.g. accuracy, the fraction of objects matching the IF-part that are from the decision class of the THEN-part [35]. ROSETTA lists the rules and provides statistics for them: support, accuracy, coverage, stability, and length. The rule statistics are defined as follows: "i) the LHS support of a rule is the number of records in the training data that match the IF condition; ii) the RHS support is the number of records in the training data that match the THEN condition; iii) the RHS accuracy is the RHS support divided by the LHS support; iv) the LHS coverage is the fraction of records that satisfy the IF conditions of the rule, obtained by dividing the support of the rule by the total number of records in the training sample; v) the RHS coverage is the fraction of training records that satisfy the THEN conditions, obtained by dividing the support of the rule by the number of training records that satisfy the THEN condition; vi) the LHS length is the number of conditional elements in the IF part; vii) the RHS length is the number of conditional elements in the THEN part" [7], [35].
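These statistics can be computed directly from a dataset. The sketch below interprets RHS support as the number of records matching both the IF and THEN parts, which is the reading consistent with accuracy = RHS support / LHS support in the definitions above; the records and the rule are hypothetical examples, not ROSETTA output.

```python
def rule_stats(records, lhs, rhs):
    """LHS/RHS support, accuracy, and coverage for an IF-THEN rule.
    `lhs` and `rhs` are boolean predicates over a record (a dict)."""
    n = len(records)
    lhs_support = sum(1 for r in records if lhs(r))
    both = sum(1 for r in records if lhs(r) and rhs(r))
    rhs_total = sum(1 for r in records if rhs(r))
    return {
        "lhs_support": lhs_support,
        "rhs_support": both,            # records matching IF and THEN
        "accuracy": both / lhs_support if lhs_support else 0.0,
        "lhs_coverage": lhs_support / n,
        "rhs_coverage": both / rhs_total if rhs_total else 0.0,
    }

# Hypothetical records and rule: IF response >= 143 THEN value == 1
records = [{"response": 150, "value": 1}, {"response": 150, "value": 0},
           {"response": 100, "value": 1}, {"response": 90,  "value": 0}]
stats = rule_stats(records,
                   lhs=lambda r: r["response"] >= 143,
                   rhs=lambda r: r["value"] == 1)
```

A rule with accuracy 1.0 is deterministic on the training data; lower accuracy signals the inconsistency that rough sets model through the boundary region.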

Most of these previous researchers used ROSETTA for the entire RS process, from data pre-processing to the data classification stage. Some of the studies used the traditional technique of splitting the data into a training and a testing set, while others used the k-fold cross validation technique. Discretization, one of the pre-processing techniques, is applied to both the training and the testing set: continuous attributes involve huge storage, misinterpretation, and long rules, so discretization is needed to change continuous attributes into discrete attributes in order to increase prediction accuracy [36]. Then, reduction is performed using GA or Johnson to generate rules. Johnson was used by [12], [3] applied GA, and [30] applied both reduction methods for comparison. The generated rules were then used to classify the testing set, from which the classification accuracy was obtained.
Rough Set is therefore considered in this research as a popular approach to generate rules. The generated rules are then selected based on LHS support in order to query the dataset in the Social Network Analysis part. The reason for using LHS support to select the significant rules is discussed in the next section.

B. Significant Rules
A reduct can generate a large number of rules, which may be important or unimportant. Therefore, many analyses use approaches to identify the significant rules. Reference [37] suggested sorting the rules based on the support value in order to find the most important rules for each set: since the value of length does not differ much between rules, support is used as the criterion to rank the rules. Furthermore, reference [7] claimed that rules with shorter length were not effective for measuring the significance of rules.
On the other hand, [30] mentioned that the rules with the highest LHS support of objects were the most significant. Earlier, reference [38] proposed a measure called the Rule Importance Measure (RIM) to evaluate association rules based on rough set theory. Rules from different reduct sets may contain dissimilar representative information; thus, important information might be excluded if only one reduct set is examined for rule generation. Multiple reducts will generate the rules many times, and rules that occur more frequently are considered more important: if a rule is generated more frequently across different rule sets, this rule is said to be more important than other rules [32].
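The frequency-based idea behind RIM can be sketched as follows: generate a rule set from each reduct, then score each rule by the fraction of rule sets in which it appears. The rule strings below are hypothetical placeholders, not rules from the datasets in this paper.

```python
from collections import Counter

def rule_importance(rule_sets):
    """Rule Importance Measure sketch: the frequency of a rule across the
    rule sets generated from multiple reducts, as a fraction of the
    number of rule sets."""
    counts = Counter(rule for rs in rule_sets for rule in set(rs))
    total = len(rule_sets)
    return {rule: count / total for rule, count in counts.items()}

# Hypothetical rule sets generated from three different reducts.
rule_sets = [
    {"IF a=1 THEN d=yes", "IF b=0 THEN d=no"},
    {"IF a=1 THEN d=yes"},
    {"IF a=1 THEN d=yes", "IF c=2 THEN d=no"},
]
rim = rule_importance(rule_sets)
# "IF a=1 THEN d=yes" appears in all three rule sets, so it receives the
# highest importance score.
```

Rules with importance 1.0 are generated by every reduct and are therefore the strongest candidates for the significant-rule set.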

III. Experimental Design
The proposed model of Rough Set Rules Generation (RSRG), which generates the rough sets and the sets of rules, is illustrated in Fig. 7. The model involves two components: data pre-processing, in which data is converted into a format suitable for the experiments; and rough set rules generation, which generates rules and selects them based on high and low LHS support.

A. Data Pre-processing
This phase is the corresponding activity of component one as depicted in Fig. 7. It includes two sub-steps, data cleaning and data transformation. Data collection and data analysis were involved at the beginning of this phase. In this research, two different datasets, the AIDS and EL log datasets, were used. The raw datasets underwent the pre-processing process, which involved manipulating the input data into a suitable form and finalizing the data into a format suitable for the experiments.
The first step of pre-processing involved identifying incorrect and unused records and removing unnecessary attributes. The second step involved formatting the data to be amenable for the experiments. The pre-processing output was then passed on as input for the next process.

B. Rough Set Rules Generation
Next, the filtered data underwent the process in the second component, as illustrated in Fig. 7. The generated rules were selected based on the highest and lowest support values, which were used as the queries to cluster the data. Fig. 8 illustrates the RS procedure using the ROSETTA system. The procedure involved data splitting, data discretization, data reduction, classification, and selection of significant rules based on LHS support.

1) Data Splitting
In this research, k-fold cross validation was used to split the data into testing and training sets. The aim of using this technique was to validate the dataset and to ensure the consistency of the results.
In fact, according to [28], the main advantage of k-fold cross validation is that it reduces bias by repeating the experiment ten times. Even though this methodology is rather time consuming, it is a viable option for small datasets. That work clearly showed that 10-fold cross validation does not require more data than the conventional single split. Furthermore, 10-fold cross validation has been common practice; in the data mining community, k-fold cross validation is recommended for method-comparison studies with relatively small datasets [39].

2) Discretization
Next, each training and testing set underwent the discretization process, which converts continuous values into categories or classes. Reference [40] claimed that Naïve and Boolean Reasoning were ranked first as the most suitable discretization methods in the medical area, providing better accuracy; similarly, for engineering data with a specific class distribution, Naïve, semi-Naïve, and entropy methods gave better results than other methods. Therefore, in this research, two discretization techniques provided by the ROSETTA toolkit, the Naïve algorithm [10], [38], [41] and the Boolean Reasoning algorithm [42]-[44], were tested to establish the technique that gives the highest classification accuracy. This research also compared the accuracy of the non-discretization technique with the discretization techniques. The end result of this process is data transformed into several categories.
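As an illustration of the discretization step, here is a minimal Equal Frequency Binning (EFB) sketch, the technique used by [12] in the related work. ROSETTA's Naïve and Boolean Reasoning algorithms choose cut points differently, so this is only a generic example of how continuous values become categories; the data column is hypothetical.

```python
def equal_frequency_bins(values, k=3):
    """Equal Frequency Binning: choose k-1 cut points so each bin holds
    roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]

def discretize(value, cuts):
    """Map a continuous value to a bin index given the cut points."""
    return sum(value >= c for c in cuts)

values = [1, 5, 2, 8, 9, 3, 7, 4, 6]   # hypothetical attribute column
cuts = equal_frequency_bins(values, k=3)
binned = [discretize(v, cuts) for v in values]
# Each of the 3 bins receives 3 of the 9 values.
```

After this step each continuous attribute is replaced by a small set of interval labels, which is what makes short, readable IF-THEN rules possible.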

3) Reduct
Subsequently, the training sets went through the reduction process, and rules were generated from this data. Two reduct techniques were compared: the Genetic algorithm (GA) and Johnson's algorithm (Johnson). Reduct generation has two options: full object reduction and object-related reduction. Full object reduction produces a set of minimal attribute subsets that define functional dependencies, while object-related reduction produces a set of decision rules or general patterns through minimal attribute subsets that discern on a per-object basis. The classification accuracy for object-related reducts is higher than for full reducts [45]. Hence, object-related reduction was preferred in this research due to its ability to generate reducts based on the discernibility function of each object.

4) Classification
Lastly, the testing sets were used to verify the rules generated from the training sets. Classification was implemented using the Standard Voting Classifier (SVC). The performance of SVC was found to be more optimal and more accurate than that of the batch classifier [46], whose authors concluded that SVC was the better classifier in ROSETTA. Reference [47] claimed that SVC is an efficient algorithm under RS. Therefore, in this research, SVC was used to enhance classification accuracy; the generated rules were used to classify the testing dataset.
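The voting idea behind this kind of classifier can be sketched as follows. ROSETTA's SVC additionally normalizes votes into certainty coefficients, so this simplified, support-weighted sketch (with hypothetical rules) only illustrates the principle: every rule whose IF-part matches the record casts votes for its decision, and the decision with the most votes wins.

```python
def standard_voting(record, rules):
    """Simplified voting classification: each matching rule votes for its
    decision, weighted by the rule's LHS support."""
    votes = {}
    for lhs, decision, support in rules:
        if lhs(record):
            votes[decision] = votes.get(decision, 0) + support
    if not votes:
        return None          # no rule fires: the record stays unclassified
    return max(votes, key=votes.get)

# Hypothetical rule set: (IF-predicate, decision, LHS support).
rules = [
    (lambda r: r["response"] >= 143, "VALUE(1)", 27),
    (lambda r: r["response"] < 143,  "VALUE(0)", 11),
    (lambda r: r["age"] > 40,        "VALUE(1)", 5),
]
pred = standard_voting({"response": 150, "age": 30}, rules)
```

Records matched by no rule are left unclassified here; a fallback such as predicting the majority class is a common design choice in that case.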

IV. Experimental Results and Analysis
In this research, the classification accuracy of the non-discretized datasets was compared with that of the discretized datasets. The following sections discuss the results of the non-discretization technique, followed by the results of the discretization techniques, and the results of reduct and rule generation, which present the numbers of reducts and rules.

A. Non-Discretization
The aim of this process is to compare the accuracy of non-discretized and discretized datasets with different reduct methods, Johnson's algorithm (Johnson) and the Genetic algorithm (GA). Both 10-fold and 5-fold cross validation were used in this research since, according to Omar et al. (2011), it is possible to use k=5 or k=10 depending on the data size. The k folds are labeled 1, 2, 3, ..., 10.
Table 3 shows the prediction accuracy of each fold using k=10 for the AIDS dataset. Both reduct methods obtained the same accuracy. The highest prediction accuracy was 84.21% and the lowest was 66.67%; the average accuracy for 10-fold cross validation was 75.32%. GA and Johnson also obtained the same accuracy with 5-fold cross validation, as depicted in Table 4: the highest prediction accuracy was 81.08%, the lowest was 65.79%, and the average was 72.75%. This indicates that cross validation with k=10 produced higher accuracy than k=5 for the non-discretized AIDS data. However, for the EL dataset, GA outperformed Johnson when using 10-fold cross validation, with an average accuracy of 97.86% against 97.74% for Johnson, as shown in Table 5. The highest prediction accuracy was 99.31% (fold 6), obtained by both reduct techniques, and the lowest was 95.80% (fold 1), obtained by Johnson. Table 6 shows the 5-fold cross validation results, where the average accuracy of GA was 0.02% above Johnson's. The highest prediction accuracy was 98.25% (fold 2), obtained by GA, and the lowest was 96.88% (fold 4), obtained by both reduct techniques.

Table 11 summarizes the results for the non-discretized and discretized datasets. For the AIDS dataset, BR outperformed Naïve and the non-discretized data when using k=5, while with k=10 the non-discretized data outperformed Naïve and BR. For the discretized data, the average accuracy of Naïve and BR increased when using k=5 compared to k=10; thus, k=5 was well suited for the discretized AIDS data. For the EL dataset, the Naïve algorithm obtained the same average accuracy for k=10 and k=5 and outperformed BR with 99.98%. Moreover, EL data discretized by the Naïve algorithm also yielded higher accuracy than the non-discretized EL data. Thus, the best classification accuracy for the EL dataset was generated by the Naïve algorithm, and k=10 was well suited for cross validation of the EL dataset since the highest prediction accuracy at k=10 was 100%. In terms of generated rules, the AIDS data had 1 reduct, with 8 rules for the Naïve algorithm and 5 rules for BR. BR obtained higher prediction accuracy with fewer rules, while Naïve produced lower accuracy with more rules. Both findings have their own advantages and drawbacks: although BR gives the best accuracy, the shorter rules it generates may contribute to a loss of knowledge [36], while Naïve showed comparable performance to BR with more rules. Therefore, the rules generated from the AIDS data discretized by the Naïve algorithm using k=5 were selected for use in this research.
For the EL dataset, rules were generated from EL data discretized by the Naïve algorithm using k=10, because it produced higher accuracy than BR and the non-discretized EL data. Both reduct algorithms also produced the same prediction accuracy for each fold, as depicted in Table 13. However, for the non-discretized EL dataset (refer to Table 5), GA obtained higher accuracy than Johnson, with a small difference of only 0.12%. The EL dataset had 3 reducts and generated 18 rules for each fold. Based on the results of the reduct process, Johnson and GA produced the same prediction accuracy for each fold for both datasets. This pattern, where Johnson produced the same accuracy as GA, matches the results obtained by [25] and [48], in whose research GA and Johnson produced the same accuracy for the same dataset. Moreover, both reduct algorithms also generated the same numbers of reducts and rules.
The most significant and least significant rules were then selected from the 8 rules generated for the AIDS dataset and the 18 rules generated for the EL dataset. According to [30], the most significant rules have the highest Left-Hand-Side (LHS) support value. Thus, in order to find the most and least significant rules to be visualized in SNA, the high and low LHS support values for each fold were compared, as shown in Tables 14 and 15. For the AIDS data, the highest LHS support was 36, obtained by fold 4, and the lowest was 11, obtained by folds 4 and 1. For the EL data, the highest LHS support was 10597, acquired by fold 1, and the lowest was 13, acquired by folds 2 and 5.

Tables 16 and 17 demonstrate samples of the rules derived from the AIDS and EL datasets, respectively; the AIDS dataset yielded eight rules and the EL dataset eighteen. For the AIDS dataset, the LHS support value showed the total number of supports, including VALUE(1) and VALUE(0), while the RHS showed the number of supports for VALUE(1) or VALUE(0) separately. The generated rule RESPONSE([*, 10)) => VALUE(1) OR VALUE(0) was considered the most significant rule: it was supported by 36 LHS support values, with 33 RHS support values for VALUE(1) and 3 for VALUE(0). The RHS support values had two different values, depending on the number of records in the training dataset described by the THEN condition, VALUE(0) or VALUE(1). The RHS stability and LHS length were equal to one for all rules. There were two groups of rules by RHS length: rules with length less than or equal to 1, and rules with length greater than 1. According to Sulaiman (2011), rules with length greater than 1 contribute to better classification than rules with length less than or equal to 1.
The most significant rules based on high support value are normally the ones considered for querying the dataset. Nonetheless, in this case, the generated rule RESPONSE([*, 10)) => VALUE(1) OR VALUE(0) could not be considered the most significant rule for the query statement, because the rule had an infinite (*) value for its 'from, including' bound, and a rule of the form 'from * (including *)' is not valid as a query statement. Therefore, other rules with high support values were chosen to be used as query statements to cluster the dataset. Tables 18 and 19 sort the rules according to their support value: the higher the support value, the more significant the rule.
From Table 18, RESPONSE([143, *)) => VALUE(1) OR VALUE(0) had the highest LHS support value. The first rule was supported by 36 LHS support values, and there were 27 LHS support values for the second rule. Although the second rule contained an infinite (*) value, its interval did not include *; for instance, * was not included in the range from 143 to *. Therefore, the rule statement to be used for the query was taken as RESPONSE>=143 and RESPONSE<=143. The rule with the lowest support value was also selected to cluster the dataset, in order to determine the relationship between rules and LHS support value in the visualization. Table 18 presents the generated rule RESPONSE([55, 96)) => VALUE(0), which was considered a rule with low support value since it was supported by only 11 LHS support values. For the EL dataset, the generated rule SIZE([0.00008, *)) => CACHE(1) was considered the most significant rule, supported by 10480 support values for both RHS and LHS. The LHS and RHS support affected the total LHS and RHS coverage. The RHS accuracy and stability were equal to one for all rules. The rule with the highest LHS and RHS support values also obtained the highest coverage values; despite that, the same LHS and RHS support values did not always produce the same RHS and LHS coverage values. The highest coverage was 0.503967 for LHS and 0.857962 for RHS.
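The translation from interval rules to query predicates described above can be sketched with a small, hypothetical helper. It parses a ROSETTA-style interval condition and rejects conditions whose lower bound is * (unbounded below), mirroring the selection criterion applied to the RESPONSE([*, 10)) rule; the right-open interval [lo, hi) becomes the pair of comparisons lo <= attr < hi.

```python
import re

def interval_rule_to_query(condition):
    """Translate an interval condition such as 'RESPONSE([143, *))' into a
    query predicate string, or None if the condition cannot be queried.
    A hypothetical helper for illustration only."""
    m = re.match(r"(\w+)\(\[([^,]+),\s*([^)]+)\)\)", condition)
    if not m:
        return None
    attr, lo, hi = m.group(1), m.group(2).strip(), m.group(3).strip()
    if lo == "*":
        return None                      # 'from *' is not a valid query bound
    parts = [f"{attr} >= {lo}"]
    if hi != "*":
        parts.append(f"{attr} < {hi}")   # the interval is right-open: [lo, hi)
    return " AND ".join(parts)

q1 = interval_rule_to_query("RESPONSE([143, *))")   # upper bound * dropped
q2 = interval_rule_to_query("RESPONSE([*, 10))")    # rejected: lower bound is *
q3 = interval_rule_to_query("RESPONSE([55, 96))")   # both bounds finite
```

An unbounded upper limit simply drops the upper comparison, while an unbounded lower limit invalidates the whole predicate, which is exactly the distinction drawn between the two high-support RESPONSE rules.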
The most significant rules based on high support value were to be considered as the rules to query the dataset. Nonetheless, in this scenario, the generated rule SIZE([0.00008, *)) => CACHE(1) could not be used as the query statement on its own. Although the rule statement did not include *, the dataset consisted of three reducts; thus, the NUM_OF_HITS attribute also needed to be used in the

V. Conclusion
This paper discusses the analysis and experimental results of rough sets. The generated rough set rules were analyzed based on high LHS support value to identify the significant rules, and the selected significant rules were used to query the dataset. Furthermore, rules with low LHS support value will also be selected for use in the SNA part, in order to compare the visualization of data based on rules with high and low LHS support values. Some limitations of this research may serve as a guide for future work. Instead of only using k-fold cross validation and various discretization algorithms, analyses with various training/testing percentages should be performed to compare classification results, since different training/testing splits and different discretization techniques contribute to different classification accuracies. Moreover, the available literature on RS opens a promising domain for future research and more intensive experiments in other complex areas such as big data analysis.

Fig. 7 .
Fig. 7. The proposed model of RSRG

The data pre-processing involved the following two steps: filtering the data to remove unnecessary fields.

Fig. 8 .
Fig. 8. Procedures of RS

Nevertheless, as discussed in Section II.A, it is possible to use k=5 or k=10 depending on the size of the data. Hence, in this research, 5-fold and 10-fold cross validation were applied to both datasets to test different values of k on different sizes of datasets. The data was divided into 5 folds (80% training, 20% testing) and 10 folds (90% training, 10% testing), as shown in Tables 1 and 2, respectively.

Table 1 .
5-fold cross validation of AIDS and EL datasets

Table 2 .
10-fold cross validation of AIDS and EL datasets
Sarina Sulaiman et al. (Generated rules for AIDS and e-learning classifier using rough set approach)

Table 3 .
Classification accuracy for non-discretization technique for AIDS dataset 10-fold

Table 4 .
Classification accuracy for non-discretization technique for AIDS dataset 5-fold

Table 5 .
Classification accuracy for non-discretization technique for EL dataset -10-fold

Table 6 .
Classification accuracy of non-discretization technique for EL dataset -5-fold

Table 10 .
Classification accuracy for discretization technique for EL dataset -5-fold

Table 12 .
Number of reduct and rules for AIDS dataset -5-fold

Table 13 .
Number of reduct and rules for EL dataset - 10-fold

Table 14 .
High and low LHS support value for AIDS dataset

Table 15 .
High and low LHS support value for EL dataset

Table 16 .
Sample rules of AIDS dataset

Table 17 .
Sample rules of EL dataset

Table 18 .
Rules sorted by highest support value for AIDS dataset