(2) Nor Amalina Abdul Rahim (Faculty of Computing, Universiti Teknologi Malaysia, Malaysia)
(3) * Andri Pranolo (Informatics Department, Universitas Ahmad Dahlan, Indonesia)
*corresponding author
Abstract: The growth of internet usage has produced an extensive amount of data. These data contain a wealth of undiscovered, valuable information, but incomplete data sets may lead to observation errors. This research explored a technique for transforming meaningless data into meaningful information. The work focused on Rough Set (RS) theory to deal with incomplete data and to derive rules. Rules generated by RS with high and low left-hand-side (LHS) support values were used as query statements to form clusters of data. The model was tested on an AIDS blog data set consisting of 146 bloggers and an E-Learning@UTM (EL) log data set comprising 23,105 URLs. The data were split using 5-fold and 10-fold cross validation. To compare results, the Naïve and Boolean algorithms were employed as discretization techniques, and Johnson’s algorithm (Johnson) and the Genetic algorithm (GA) as reduction techniques. 5-fold cross validation tended to suit the AIDS data, while 10-fold cross validation was best for the EL data set. Johnson and GA yielded the same number of rules for both data sets. These findings provide evidence of the accuracy achieved using the proposed model.
Keywords: Rough Set; AIDS blog data; E-Learning log data; Rules derivation; Cross validation
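The two mechanical steps named in the abstract, splitting the data into k folds and using a rule's left-hand side as a query to form a cluster, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy records, the attribute names (`age`, `posts`, `role`) and the helper names are hypothetical, the contiguous fold assignment is an assumption, and LHS support is taken here to mean the number of records matching every condition in the rule's antecedent.

```python
from typing import Dict, List

Record = Dict[str, str]


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most 1."""
    base, extra = divmod(n, k)
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # the first `extra` folds get one more item
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def match_lhs(rows: List[Record], antecedent: Record) -> List[int]:
    """Return the indices of rows that satisfy every condition in a rule's LHS."""
    return [
        i
        for i, row in enumerate(rows)
        if all(row.get(attr) == value for attr, value in antecedent.items())
    ]


# Hypothetical, already-discretized records (not taken from the paper's data sets).
rows: List[Record] = [
    {"age": "young", "posts": "high", "role": "active"},
    {"age": "young", "posts": "low", "role": "passive"},
    {"age": "old", "posts": "high", "role": "active"},
    {"age": "young", "posts": "high", "role": "active"},
]

# Hypothetical rule LHS: IF age = young AND posts = high.
rule_lhs: Record = {"age": "young", "posts": "high"}

cluster = match_lhs(rows, rule_lhs)   # records selected by the rule used as a query
lhs_support = len(cluster)            # LHS support of the rule on this toy table

# Fold sizes for a data set of 146 objects (the size of the AIDS blog data set).
sizes_5 = [len(f) for f in k_fold_indices(146, 5)]
sizes_10 = [len(f) for f in k_fold_indices(146, 10)]
```

In each cross-validation round, one fold would serve as the test set and the remaining folds as the training set from which rules are derived. Shuffling the indices before splitting is a common variation that this sketch leaves out.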
DOI: https://doi.org/10.26555/ijain.v2i2.74
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
andri.pranolo.id@ieee.org (publication issues)