Enhanced personalized learning exercise question recommendation model based on knowledge tracing

Personalized exercise question recommendation is a crucial aspect of smart education, used to customize educational exercises and questions to individual students' distinct abilities and learning progress. Integrating cognitive diagnosis with deep learning has shown promising results in personalized exercise recommendation. However, the black-box nature of deep learning models hinders their interpretability. This makes it challenging for educators and students to understand the reasons behind the model's predictions for the next problem, and it limits their opportunity to take an active role in improving the learning process. To address this limitation, this article presents a novel personalized exercise question recommendation model based on knowledge tracing. The approach incorporates graph convolutional neural networks to model the student's abilities, thus enhancing the interpretability of the model. By employing a bidirectional gated recurrent unit (Bi-GRU), the model effectively traces fluctuations in students' abilities over time and predicts their responses to exercise questions. Experimental results demonstrate the effectiveness of this model, which achieves accuracies of 90.8% and 92.6% on the ASSISTments2009 and ASSISTments2017 datasets, containing 4218 and 1709 student records, respectively. An experiment was also conducted to validate the model's exercise difficulty setting. The results indicate an acceptable level of effectiveness in generating recommendations at an appropriate difficulty level for individual students. The proposed model contributes to advancing personalized exercise recommendation by offering valuable insights that can lead to more efficient and effective student learning experiences.

Introduction
A student's academic performance is not only a standard for certification and evaluation but is also essential in preparing students for their future endeavors [1]. Personalized learning, which customizes learning resources and methods to individual learners' distinct abilities and learning progress, has emerged as a promising approach to complement online learning and achieve better development. Researchers usually consider the issue from the perspective of educational psychology and propose cognitive diagnosis models to discover students' knowledge proficiency [2]. Cognitive diagnosis is widely applied in the field of education. However, most models are linear and are therefore limited in learning the complex interactions between students and exercise questions. Considerable progress in personalized learning has been achieved through methods based on cognitive psychology and related theories; however, personalized learning in online scenarios still faces the issue of information overload [3]. In addition, when provided with numerous exercise resources, students find it difficult to select exercises suited to their abilities. To address these challenges, the development of personalized exercise recommendation models has drawn inspiration from e-commerce recommendation systems. Such a model leverages students' historical answering records to build individual ability models and predict future performance, ultimately recommending exercises at an appropriate ability level.
In response to the above problems, we review relevant literature in the fields of cognitive diagnosis and knowledge tracing. The work in [4] provides the HELP-DKT model, which embeds students' ability through a Q-matrix and achieves better prediction. The work in [5] proposed a novel model that incorporates three embeddings (students, exercises and skills) to reach better prediction performance. It aims to reduce the impact of subjective labeling by calibrating the skill relation matrix and the Q-matrix, and it updates the heterogeneous interactions between students, exercises, and skills with a graph convolutional network. The work in [6] pointed out that existing research still falls short in performance because it neglects complete content-based exercising, fine-grained knowledge concepts, and cognitive labels for specific requirements. Meanwhile, cognitive diagnosis is frequently used as a psychological measurement in educational psychology, where it is considered an evaluation method for students' mastery level of various knowledge concepts. The DINA model (Deterministic Inputs, Noisy "And" gate model) proposed by Torre [7] is a classic cognitive diagnosis model in the field of education. This model uses the Q matrix, which relates exercises to knowledge points, to enhance the interpretability of cognitive diagnosis results. Qi Bin [8], Shan Ruiting [9] and other researchers use the characteristics of the Q matrix, in combination with collaborative filtering, to ensure the accuracy and interpretability of recommendation results. In response to the insufficient predictive ability of the DINA model, some researchers have proposed improved versions of DINA. Latore [10], Tu Dongbo [11] and others proposed a multilevel diagnostic structure with higher precision to address the fact that DINA supports only 0-1 scoring. Liu [12], Li Youxi [13] and others proposed a fuzzy cognitive diagnostic method that considers the importance of knowledge points. Jiang Peichao [14] constructed students' knowledge state by quantifying their potential cognitive level towards learning materials and proposed a cognitive diagnostic method that incorporates students' reading materials. He Xiangnan [15] and other researchers proposed a general framework of neural collaborative filtering (NCF), which introduces neural networks to learn the interaction information between users and items; its predictions are better than those of linear methods such as matrix factorization. Wang [16] and other researchers proposed the NeuralCD method for education based on NCF. This framework uses neural networks to learn the complex interaction between students and test question vectors and draws on the monotonicity assumption and the Q matrix from educational psychology to ensure the interpretability of the student ability model. DKT-LCIRT [17] emphasizes reflecting intrinsic differences between students through several kinds of capability vectors, so that the model can present its interpretability clearly.
Knowledge tracing [18] was first proposed in 1994 and, with the development of recurrent neural networks (RNNs) [19], has become a hot research topic in smart education in recent years. The task can be described as follows: given a student's record of problem-solving, track the student's level of mastery of knowledge points and predict the outcome on the next test question. Chris Piech [20] and other researchers proposed deep knowledge tracing (DKT), which applies an RNN to process input student sequences, tracks the dynamic changes in students' abilities over time, and ultimately predicts students' problem-solving performance. The predictive ability of DKT significantly exceeds that of Bayesian knowledge tracing (BKT), but because it does not model the relationships between knowledge points and exercise questions, the model has interpretability issues. Yeung [21] and other researchers proposed the Deep-IRT model, which combines item response theory (IRT) with the dynamic key-value memory network model and uses IRT to simulate the state transformation.
This paper makes notable contributions to this field of study in two key areas:
• A cognitive diagnostic model based on graph convolutional networks, which constructs a student ability model and models the student-exercise interaction process nonlinearly. The use of the Q matrix ensures the interpretability of the diagnostic results.
• A knowledge tracing model oriented towards personalized recommendation, which accurately predicts students' exercise performance and generates a list of recommended exercises based on their difficulty levels. Experimental verification demonstrates the effectiveness of setting the recommended difficulty level.

Method
To address the challenges mentioned above, we propose a personalized exercise recommendation model based on knowledge tracing (PERKT). A graph convolutional neural network is used to construct an aggregated student ability vector with deep interaction features, which improves the performance of the model, while constraints from the educational domain are applied during training to ensure the interpretability of the student ability model. A gated recurrent unit (GRU) then tracks the dynamic changes of the student ability model, which carry order and time characteristics, and predicts student responses. Finally, a recommendation list is generated based on the predicted student responses and the difficulty range of the exercise questions.

Architecture
The PERKT architecture is shown in Fig. 1. The process begins by reading the annotated input data, namely students' logs from online learning (the top-left box in Fig. 1). The Q matrix is provided to represent the correlation between test questions and knowledge topics.

Cognitive Diagnosis Based on Graph Convolutional Neural Network
The graph convolutional network (GCN) [22] has become a leading technique for collaborative filtering since it was first proposed at ICLR 2017. When a GCN is used for collaborative filtering, there are two basic kinds of nodes: users and items. Based on their associations, a user-item bipartite graph is constructed, and the representations of these nodes are learned by smoothing features over the graph [23]. The GCN performs graph convolution iteratively, where the new representation of a target node is aggregated from the features of its neighbors. Cognitive diagnosis makes use of diagnosis vectors to model the interaction process between students and exercises [16]. In this study, we consider four vectors: the knowledge point correlation vector $Q_e$, the student ability vector, the student vector $v_s$ and the exercise vector $v_e$.

• Knowledge point correlation vector $Q_e$
This vector represents the relationship between an exercise and knowledge concepts. It is combined with the student ability vector and the exercise differentiation degree as the input of the neural network for prediction, and it improves the interpretability of cognitive diagnosis by tying the diagnosis to the student's level on each knowledge point. $Q_e$ is obtained as the product of the exercise's one-hot encoding and the knowledge-related Q matrix.
• Student vector $v_s$
The student vector $v_s$ is calculated by multiplying the student's one-hot encoding by a parameter matrix A.
• Exercise vector $v_e$
The exercise vector $v_e$ assists the graph convolution computation of the student vector. It has the same dimension as the student vector so that the two can be aggregated.
• Student ability vector
Both the student vector and the exercise vector serve to model the student ability vector, a quantitative representation of students' proficiency in knowledge points and concepts. In this paper, proficiency in each knowledge point is expressed as a continuous value. This more refined representation helps to improve the accuracy of grade prediction when modeling student-exercise interactions, and it also helps students to conduct self-assessment based on the cognitive diagnosis results. The dimension of the student ability vector is the same as that of $Q_e$, and each of its components lies in [0,1], representing the student's mastery of the corresponding knowledge point. Combining the student ability vector with the knowledge point correlation vector gives the model good interpretability, and the aggregation of neighbor information during graph convolution yields more accurate diagnosis results.
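The two one-hot products described in the list above can be sketched in plain Python. This is only an illustration under stated assumptions: the matrices `Q` and `A` hold made-up toy values, the sizes are arbitrary, and the helper names are hypothetical rather than taken from the original implementation.

```python
# Toy Q matrix (assumed values): rows are exercises, columns are knowledge points.
Q = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
]

# Toy parameter matrix A (assumed values): one row per student.
# In a trained model these entries would be learned.
A = [
    [0.2, -0.1, 0.5],
    [0.7, 0.3, -0.4],
]


def one_hot(index, size):
    """Return a one-hot row vector of the given size."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vec_mat(vec, mat):
    """Multiply a row vector by a matrix stored as a list of rows."""
    cols = len(mat[0])
    return [sum(vec[r] * mat[r][c] for r in range(len(mat))) for c in range(cols)]


def knowledge_vector(exercise_id):
    """Q_e: one-hot exercise encoding times the Q matrix (selects one row of Q)."""
    return vec_mat(one_hot(exercise_id, len(Q)), Q)


def student_vector(student_id):
    """v_s: one-hot student encoding times the parameter matrix A."""
    return vec_mat(one_hot(student_id, len(A)), A)
```

Because the encoding is one-hot, each product simply selects one row of the matrix, which is why such products are usually implemented as an embedding lookup in practice.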

• Student ability modeling
Traditional cognitive diagnostic methods model students' abilities only from the exercise questions they interact with directly, ignoring the problem-solving characteristics shared by similar students. Collaborative filtering extracts the commonality of similar students by calculating student similarity, but it ignores the influence of high-order connectivity in similar students' exercise answering [8]. To address these issues, this article proposes a graph convolution algorithm for modeling students' abilities, which captures exercising features from high-order neighbor nodes and constructs students' ability models more accurately. When students answer exercises, multiple students who have interacted directly with the same test question usually have similar exercise-solving characteristics, which can be used to discover commonalities among students. Indirect, high-order connections between students and exercise questions are difficult to explore through direct interaction alone, and exploiting high-order connectivity reveals the commonalities between students more accurately. In this algorithm, students and exercises are taken as the basic nodes of the graph convolutional neural network for multi-layer convolution, so that each student node and exercise node aggregates information from the nodes it interacts with at high order. The aggregate function computes the normalized sum over the directly interacting student and exercise nodes. The student ability vector is represented by fusing the node representations from every graph convolution layer, and a student ability vector that aggregates multi-layer collaborative information reflects the student's ability more accurately. The student-exercise interaction is regarded as a dynamic evolution process, and the problems done by similar students can be regarded as characteristics those students share. Through the graph convolution operation, the problem-solving characteristics of higher-order neighbors are encoded into the student's ability vector. Fig. 2 shows the calculation process of the ability vector of student 2. From top to bottom, the first layer aggregates students who have done the same exercise questions, forming a representation of the test question node in the second layer. This test question node contains the common problem-solving features of these students, and the nodes in each subsequent layer are calculated in the same way. The number of graph convolution layers determines how far a student node reaches when aggregating information from other students or exercise questions. After the multi-layer graph convolution, the student representations of all layers are fused, so that the problem-solving features of high-order neighbor students in the graph are encoded into the student's ability vector. Neighbors on each layer can be considered similar students (or exercises), so aggregating higher-order neighbor features can be seen as a form of collaborative filtering.
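The propagation and fusion equations are not reproduced in the text. A form consistent with the description (a linear aggregation with symmetric normalization, followed by a weighted fusion of the per-layer student vectors) is sketched here as an assumption, in the style of LightGCN-like propagation. The symbols $v_s^{(l)}$, $v_e^{(l)}$, the neighbor sets $N_s$ and $N_e$, the layer weights $\alpha_l$ and the fused vector $a_s$ are illustrative names, and any squashing of the fused vector into [0,1] is omitted.

```latex
\begin{aligned}
v_s^{(l+1)} &= \sum_{e \in N_s} \frac{1}{\sqrt{|N_s|}\,\sqrt{|N_e|}}\; v_e^{(l)}, \\
v_e^{(l+1)} &= \sum_{s \in N_e} \frac{1}{\sqrt{|N_e|}\,\sqrt{|N_s|}}\; v_s^{(l)}, \\
a_s &= \sum_{l=0}^{L} \alpha_l\, v_s^{(l)}, \qquad L = 3 .
\end{aligned}
```

Here $N_s$ is the set of exercises answered by student $s$ and $N_e$ is the set of students who answered exercise $e$.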
In the propagation rule, the denominator is the symmetric normalization term, which prevents the magnitude of the student and exercise vectors from growing with successive graph convolutions. $l$ denotes the graph convolution layer, and $l = 0$ corresponds to the initial student vector and exercise vector. To avoid overfitting, a three-layer graph convolution network is used in this model to ensure its performance [24]. In addition, each layer uses a linear computation to reduce the difficulty of training. After three layers of convolution, the student vectors obtained from all layers are fused into the student ability vector, with a weighting parameter for each layer of the graph convolutional network.
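A minimal pure-Python sketch of such a layered aggregation is given below. It assumes a linear, symmetrically normalized propagation and a weighted sum over layers; the function name `propagate`, the uniform default weights and the data layout are illustrative assumptions, not the authors' implementation.

```python
import math


def propagate(interactions, student_vecs, exercise_vecs, num_layers=3, alphas=None):
    """Fuse per-layer student vectors after linear graph propagation.

    interactions: iterable of (student_id, exercise_id) pairs.
    student_vecs / exercise_vecs: dict id -> list of floats (layer 0).
    Every id used in interactions must be present in the two dicts.
    alphas: one fusion weight per layer 0..num_layers (uniform if None).
    """
    s_nb, e_nb = {}, {}
    for s, e in interactions:
        s_nb.setdefault(s, set()).add(e)
        e_nb.setdefault(e, set()).add(s)
    if alphas is None:
        alphas = [1.0 / (num_layers + 1)] * (num_layers + 1)
    dim = len(next(iter(student_vecs.values())))

    s_layers = [dict(student_vecs)]
    e_cur = dict(exercise_vecs)
    for _ in range(num_layers):
        s_prev = s_layers[-1]
        s_next, e_next = {}, {}
        for s in s_prev:
            acc = [0.0] * dim
            for e in s_nb.get(s, ()):
                # Symmetric normalization: 1 / sqrt(|N_s| * |N_e|).
                w = 1.0 / math.sqrt(len(s_nb[s]) * len(e_nb[e]))
                acc = [a + w * x for a, x in zip(acc, e_cur[e])]
            s_next[s] = acc
        for e in e_cur:
            acc = [0.0] * dim
            for s in e_nb.get(e, ()):
                w = 1.0 / math.sqrt(len(e_nb[e]) * len(s_nb[s]))
                acc = [a + w * x for a, x in zip(acc, s_prev[s])]
            e_next[e] = acc
        s_layers.append(s_next)
        e_cur = e_next

    # Weighted fusion of the student vectors from every layer.
    return {
        s: [sum(alphas[l] * s_layers[l][s][d] for l in range(num_layers + 1))
            for d in range(dim)]
        for s in student_vecs
    }
```

With two students who answered the same exercise, two layers are enough for each student's vector to carry information from the other, which is the collaborative effect the text describes.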

Prediction based on knowledge tracing
The goal of knowledge tracing (KT) is to estimate students' knowledge mastery based on their historical performance on related exercises. However, current deep learning models are limited in that (a) they focus on the details of the nodes rather than on high-level interactive information; (b) they struggle to establish complex structures over the nodes effectively; and (c) they represent either students or exercises only, without integrating them [25]. To predict students' performance accurately, we propose a bidirectional GRU model to trace students' historical ability levels. The model is shown in Fig. 3.

Fig. 3. BiGRU model
The input of this model is the interaction sequence $\{x_1, x_2, \ldots, x_t\}$ of a student's exercise questions. Each interaction tuple $x_t$ pairs the student's ability level, obtained from the graph convolution cognitive diagnosis process, with the corresponding answer to the test question, and thus represents the student's problem-solving outcome at that level of mastery of the knowledge points. By learning the characteristics of the student's historical ability level sequence with the bidirectional GRU, the student's problem-solving behavior can be modeled effectively. The hidden state $h$ is then fused with the student's ability level at time $t+1$ to predict the student's answer at time $t+1$.
In modeling students' problem-solving behavior, the specific student ability levels are used as inputs. Let $h$ denote the number of hidden units, $\overrightarrow{h}_t$ the forward hidden state at time step $t$, and $\overleftarrow{h}_t$ the backward hidden state. Two GRU networks running in opposite directions are trained simultaneously, and their hidden layers are connected to the same output layer, which therefore receives the information of both directional states. In the hidden-state computation, $W_0$, $W_z$, $W_{\tilde{h}}$ and $W_r$ are the corresponding trainable weight matrices, $x_t$ is the ability level tuple at the current time, $h_{t-1}$ is the output of the network at the previous time, $z_t$ is the update gate, $r_t$ is the reset gate, $[\,]$ and $*$ denote concatenation and element-wise multiplication respectively, and $y_{t+1}$ is the probability that the student answers the exercise question correctly at time $t+1$.
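The hidden-vector formula itself is missing from the text. For reference, the standard GRU update written in the concatenation notation the paragraph describes is given below. This is the textbook form and an assumption about what the authors used; in particular, the output layer shown here acts on the concatenated forward and backward states only, and the fusion with the ability level at time $t+1$ is left out because its exact form is not stated.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z \cdot [h_{t-1}, x_t]\right), \\
r_t &= \sigma\left(W_r \cdot [h_{t-1}, x_t]\right), \\
\tilde{h}_t &= \tanh\left(W_{\tilde{h}} \cdot [r_t * h_{t-1}, x_t]\right), \\
h_t &= (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t, \\
y_{t+1} &= \sigma\left(W_0 \cdot [\overrightarrow{h}_t, \overleftarrow{h}_t]\right).
\end{aligned}
```

The forward state $\overrightarrow{h}_t$ is obtained by running the first four equations over the sequence from left to right, and the backward state $\overleftarrow{h}_t$ by running a second GRU with its own weights from right to left.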
During model optimization, the difference between the true values and the predicted values is computed and the loss function is minimized. The parameters to be updated comprise two parts: the parameter matrices {A, B} of the cognitive diagnosis process and the parameter matrices $\{W_0, W_z, W_{\tilde{h}}, W_r\}$ of the knowledge tracing prediction process. The model is trained with the binary cross-entropy loss, where T is the length of the input sequence and the loss compares the actual student performance at each time step with the predicted performance. The Adam algorithm [26] is used for optimization, which updates the network parameters more effectively than the plain gradient descent algorithm.
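As an illustration of the loss described above, the sketch below computes the binary cross-entropy over a sequence of T steps in plain Python. Averaging over T (rather than summing) and the clipping constant `eps` are assumptions made for this example, since the original formula is not reproduced in the text.

```python
import math


def binary_cross_entropy(actual, predicted, eps=1e-12):
    """Mean binary cross-entropy over a sequence.

    actual: list of 0/1 responses, one per time step.
    predicted: list of predicted probabilities of a correct answer.
    eps: clipping constant (assumed) that keeps log() finite.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    total = 0.0
    for r, p in zip(actual, predicted):
        p = min(max(p, eps), 1.0 - eps)
        total += r * math.log(p) + (1 - r) * math.log(1.0 - p)
    return -total / len(actual)
```

A prediction of 0.5 at every step gives a loss of ln 2, which is a convenient sanity check when training a model of this kind.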

Recommendation of exercise questions based on difficulty range
The final step of the model is to provide personalized exercise recommendations for different students based on the predicted student responses described above. In an educational context, the exercise questions recommended to a student should have difficulty levels that match the student's ability, and this difficulty should fall within a specific range determined by teaching needs. For this purpose, we adopt a recommendation method in which a difficulty range can be set. The difficulty of an exercise reflects how hard the test question is for a given student: the difficulty of exercise $e$ for student $s$ is set to the probability that student $s$ answers exercise $e$ correctly. To make the difficulty boundary of the recommended exercises explicit, a lower and an upper limit of the difficulty range are set, with the lower limit smaller than the upper limit. We predict the student's answer $y$ for each exercise in the exercise set $E$ and recommend those exercises whose predicted value falls between the two limits, which are thus the lower and upper bounds on the probability of the student answering correctly. After the difficulty range is set, a personalized recommendation list at the chosen difficulty can be generated for each student. For example, to give students a certain challenge, the difficulty range can be set from 0.1 to 0.2; the model then selects from the test question list the exercises with a predicted probability of a correct answer between 0.1 and 0.2, that is, $y \in [0.1, 0.2]$.
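The range-based selection can be sketched as a simple filter over predicted probabilities. The function name `recommend_in_range`, the dictionary layout and the ordering of the result (hardest first) are assumptions made for illustration.

```python
def recommend_in_range(predicted, lower, upper):
    """Return exercise ids whose predicted correctness lies in [lower, upper].

    predicted: dict exercise_id -> predicted probability of a correct answer.
    lower, upper: bounds of the difficulty range, with lower < upper.
    The result is ordered from lowest to highest predicted probability.
    """
    if not 0.0 <= lower < upper <= 1.0:
        raise ValueError("expected 0 <= lower < upper <= 1")
    chosen = [(e, p) for e, p in predicted.items() if lower <= p <= upper]
    chosen.sort(key=lambda item: item[1])
    return [e for e, _ in chosen]


# Example with made-up predictions for one student.
predictions = {"e1": 0.15, "e2": 0.55, "e3": 0.10, "e4": 0.90}
challenging = recommend_in_range(predictions, 0.1, 0.2)
```

In this toy example only the two exercises with predicted probabilities of 0.10 and 0.15 fall within the range from 0.1 to 0.2.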

Experimental Setup
Datasets and preprocessing. The experiments were conducted on the ASSISTments2009 [27] and ASSISTments2017 datasets. Following previous work [28], we exclude duplicated records. Table 1 summarizes the basic statistics of the datasets.
For ASSISTments2009, we use the skill builder subset. It includes 4218 students, 346860 student-question interactions, 17726 exercise questions, and 123 knowledge points, with an average of 1.20 knowledge points per question. In the data, id is the question number of the test question, and 0 and 1 represent the student's answer status. The ASSISTments2017 dataset includes 1709 students, 3162 exercise questions, 102 knowledge points, and 942816 student-question interactions, with an average of 1.94 knowledge points per question. The experimental environment is the macOS Big Sur operating system with an Apple M1 processor and 8 GB of memory. The model is built with the Python language and the PyTorch framework. The students' problem-solving data are divided into two sets: 80% is used for training and the rest for testing. The experiment adopts five-fold cross-validation.
We selected several state-of-the-art models for comparison, namely IRT, BKT, PMF and DKT:
- The IRT [29] model is a cognitive diagnostic model that describes students' problem-solving process through logistic functions.
- The BKT [30] model is a classic model proposed in the 1990s that assumes each student's knowledge state is a set of binary variables and uses hidden Markov models to track these variables separately. The works [31], [32] have made some progress in this area in recent years.
- The PMF [33] model is a factorization model that maps students and test questions to latent vectors.
- The DKT [17] model uses an RNN- or LSTM-based neural network to model the interactive process of student question answering for prediction.
We use accuracy (ACC), precision, and the area under the ROC curve (AUC) as performance indicators. ACC is the percentage of correct predictions among all results, and a higher ACC value indicates a stronger expressive ability of the model. Precision is the percentage of correctly predicted positive samples among all samples predicted as positive. The AUC value ranges from 0 to 1, and a random guess yields an AUC of 0.5. A higher AUC value indicates higher predictive performance of the model.
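The three indicators can be computed from labels and predicted probabilities as in the sketch below. The 0.5 decision threshold and the pairwise (rank-based) way of computing AUC are assumptions chosen for clarity; the original evaluation code is not given.

```python
def accuracy(labels, scores, threshold=0.5):
    """Share of predictions that match the 0/1 labels."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)


def precision(labels, scores, threshold=0.5):
    """Share of predicted positives that are actually positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    predicted_pos = sum(preds)
    if predicted_pos == 0:
        return 0.0
    true_pos = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    return true_pos / predicted_pos


def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of samples, so it is suited to small checks; on full datasets a sorted-rank implementation from a standard library is the usual choice.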

Difficulty range parameter experiment
This experiment was designed to verify the validity of the parameter setting for the recommendation results, that is, whether the recommended exercises actually fall within the difficulty range set by the model parameters. The correct response rate SR was adopted to evaluate the true difficulty that students experience when answering the recommended exercises. It represents the probability of a correct response on the exercise set recommended under the given difficulty range parameters and is defined as $SR = \frac{correct}{total}$, where total is the number of exercises in the recommended exercise set and correct is the number of those exercises that the student answers correctly. SR thus indicates the real difficulty of the recommended exercises for students. When the difficulty range is set low, the recommended exercise list generated by the model poses more of a challenge to students. When the difficulty range is set high, the recommended exercises are easier for students to answer, and the true correct rate should be high.
To verify this model, the DKT model is used for comparison. In this experiment, the difficulty of exercises was divided at intervals of 0.1, giving 11 difficulty parameters. For each difficulty parameter, the interval extending 0.1 below and 0.1 above it was taken as the difficulty range of exercises for the experiment. The experimental results are shown in Fig. 4.
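The evaluation procedure can be sketched as follows: for each of the 11 difficulty parameters, collect the exercises whose predicted probability falls within 0.1 of the parameter and compute the share that was actually answered correctly. The function name `sr_by_difficulty` and the record layout are hypothetical.

```python
def sr_by_difficulty(records, step=0.1, half_width=0.1):
    """Correct response rate SR for each difficulty parameter.

    records: list of (predicted_probability, actual_response) pairs,
             where actual_response is 1 for correct and 0 for incorrect.
    Returns a dict mapping each parameter 0.0, 0.1, ..., 1.0 to SR,
    or to None when no exercise falls into the corresponding range.
    """
    results = {}
    n_steps = int(round(1.0 / step))
    for i in range(n_steps + 1):
        center = round(i * step, 10)
        low, high = center - half_width, center + half_width
        answers = [c for p, c in records if low <= p <= high]
        results[center] = sum(answers) / len(answers) if answers else None
    return results


# Made-up records: (predicted probability, actual response).
toy_records = [(0.55, 1), (0.62, 0), (0.65, 1), (0.95, 1)]
toy_sr = sr_by_difficulty(toy_records)
```

If the model's difficulty setting is effective, the SR obtained for a parameter should stay close to the parameter itself, which is the property examined in Fig. 4.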

Fig. 4. SR for different difficulty ranges
As can be seen from the figure, the correct response rate SR of students on the recommended exercises increases steadily as the recommended exercises become easier, that is, as the difficulty range parameter (the target probability of a correct answer) increases, indicating that PERKT can effectively recommend exercises that meet the difficulty requirements. When the parameter is set to 0.6, PERKT has an SR value close to 0.6 on both datasets. DKT, however, lacks detailed modeling of students' mastery of knowledge points, so its degree of individualization is insufficient, and its SR values are 0.63 and 0.66 respectively. Moreover, on the dataset with more student-exercise interaction data, the SR values of PERKT are more stable. The experimental results show that PERKT uses the graph convolutional neural network to build a more accurate student ability model, which allows it to recommend exercises of different difficulty to students in a personalized way with better interpretability.

Interpretability of predictions
To further verify the interpretability of the PERKT model, consistency experiments were conducted in this section. Consistency measures the degree of correlation between the predicted academic scores and the diagnosed mastery of knowledge points. This section compares the consistency of BKT, DKT and PERKT on the ASSISTments2009 and ASSISTments2017 datasets, with random results included for reference. As shown in Fig. 5, because PERKT uses the Q matrix and the monotonicity assumption to update parameters when building the student ability model with the graph convolutional neural network, its DOA (Degree of Agreement) value is higher than those of the comparison models. The experiment shows that the prediction results of PERKT have good interpretability.
It is generally accepted in the field of education that if student x has a higher level of mastery of a knowledge point than student y, then student x has a higher probability than student y of correctly answering exam questions that contain this knowledge point [34]. The consistency DOA(c) of knowledge point c is built on this assumption. It uses one indicator that signals that student x masters the knowledge point better than student y, and a second indicator over an exercise and the two students that equals 1 when student x did the exercise but student y did not, and 0 otherwise. Finally, the mean of the consistency over all knowledge points is taken as the evaluation indicator DOA ∈ [0,1]. DOA = 0.5 indicates that the cognitive diagnostic results of the model are not correlated with the predicted student performance, and a higher DOA value indicates a stronger correlation between the two.
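The DOA formula is not reproduced in the text. A form along the lines of the degree of agreement used with NeuralCD [16] is sketched below as an assumption. The symbols are illustrative: $F_{xc}$ is the diagnosed proficiency of student $x$ on knowledge point $c$, $r_{xe}$ is the response of student $x$ to exercise $e$, $I_{ec}$ is 1 when exercise $e$ involves knowledge point $c$, $J(e, x, y)$ is the participation indicator described above, $\delta(a, b)$ is 1 when $a > b$ and 0 otherwise, and $N$ and $M$ are the numbers of students and exercises.

```latex
DOA(c) = \frac{1}{Z} \sum_{x=1}^{N} \sum_{y=1}^{N}
  \delta\left(F_{xc}, F_{yc}\right)
  \frac{\sum_{e=1}^{M} I_{ec}\, J(e, x, y)\, \delta\left(r_{xe}, r_{ye}\right)}
       {\sum_{e=1}^{M} I_{ec}\, J(e, x, y)},
\qquad
Z = \sum_{x=1}^{N} \sum_{y=1}^{N} \delta\left(F_{xc}, F_{yc}\right)
```

Read this way, DOA(c) is the fraction of student pairs ordered by diagnosed proficiency whose observed responses on the relevant exercises are ordered the same way.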

Experimental Results and Discussion
Table 2 presents the comparison between the PERKT model and the other models on the relevant performance indicators. The experimental results show that the PERKT model outperforms the comparison models in terms of ACC, precision, and AUC on the ASSISTments2009 and ASSISTments2017 datasets. The BKT model performs poorly on both datasets, indicating that the hidden Markov model it uses has limited ability to model knowledge points and cannot capture complex student problem-solving interactions. The IRT and PMF models do not utilize the time series characteristics of students' answering, resulting in poor performance. The DKT model uses recurrent neural networks to trace students' knowledge levels and has good predictive performance. However, because it uses student encodings directly as input, it does not learn high-order features among students, exercises, and knowledge points, and its predictive performance is therefore inferior to that of PERKT. Overall, our model predicts students' response performance better than the comparison models.
To compare the exercise recommendation results of different models, we also select two baseline models based on student-exercise collaborative filtering:
• Student-based collaborative filtering [35] (SB-CF): This model follows the collaborative filtering idea of finding content of interest for specific users and recommends exercises based on student similarity. It builds a similarity matrix between students from their answer records and identifies the top 10 students whose answers are most similar to those of the target student. It then selects questions of suitable difficulty from the answer records of each similar student for recommendation.
• Exercise-based collaborative filtering [36] (EB-CF): This model recommends according to the similarity between items, using the intrinsic quality or inherent attributes of the items. For exercise recommendation, it sets a difficulty weight for each exercise according to students' answers, calculates the exercise similarity matrix, extracts exercises similar to those already done, and then recommends according to the desired difficulty weight.
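The neighbor-selection step of the student-based baseline can be sketched as follows. Cosine similarity over answer records encoded as +1 (correct) and -1 (incorrect) is an assumption made here for illustration, as the similarity measure used in [35] is not specified in the text; the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse answer records (dict exercise -> +1/-1)."""
    shared = set(a) & set(b)
    dot = sum(a[e] * b[e] for e in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_students(target, records, k=10):
    """Return the ids of the k students most similar to the target student."""
    sims = [
        (cosine_similarity(records[target], rec), sid)
        for sid, rec in records.items()
        if sid != target
    ]
    # Highest similarity first; ties are broken by student id.
    sims.sort(key=lambda item: (-item[0], item[1]))
    return [sid for _, sid in sims[:k]]


# Made-up answer records for four students.
toy = {
    "a": {"e1": 1, "e2": -1},
    "b": {"e1": 1, "e2": -1},
    "c": {"e1": -1, "e2": 1},
    "d": {"e3": 1},
}
neighbors = most_similar_students("a", toy, k=2)
```

Candidate exercises would then be drawn from the answer records of these neighbors and filtered by the desired difficulty.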
The accuracy of the exercise recommendations is shown in Table 3.

Conclusion
In this paper, we proposed an enhanced personalized learning exercise question recommendation model based on knowledge tracing. We incorporated the Q matrix to improve the accuracy of modeling students' abilities and the interpretability of the diagnosis. We also built on the success of the Bi-GRU to learn the characteristics of students' historical ability level sequences, trace the dynamic changes in their abilities over time, and predict their performance. We further conducted a DOA experiment to validate the rationality of the cognitive diagnosis and designed a difficulty range parameter experiment to verify the effectiveness of the recommended exercises. Experimental results on two datasets demonstrated the effectiveness of our model. In future work, we are interested in incorporating attention mechanisms to improve model performance. Psychological characteristics are also worth exploring; for example, a student may make mistakes in exercises due to nervousness during the answering process.

Fig. 5. DOA of different models
In the DOA definition, the indicator δ takes a non-zero value when one student has higher proficiency in the knowledge point than the other and is 0 otherwise, and the normalization term Z is the double sum of δ over all pairs of students.

Table 1. Description of datasets

Table 2. Prediction performance of different models on students' answers over 5 trials

Table 3. Accuracy of exercise recommendation