Do fuzzy-logic non-linear models provide a benefit for the modelling of algebraic competency?

Statistical models used in mathematics education are often linear, and latent variables are often assumed to be normally distributed. The present paper argues that by relaxing these constraints one may use models that fit the data better than linear ones and provide more insight into the domain. It combines research on statistical methodology with research on the competence structure within algebra. The methodological innovation is that models with latent variables from the unit interval are considered, which makes it possible to model relations by means of fuzzy logic. Estimation techniques for such models are discussed to the extent necessary for the present study. To assess the benefit of this modelling technique, data from an algebra test are re-analyzed. It is shown that non-linear models have greater explanatory power and give interesting didactical insights. Moreover, model comparison makes it possible to differentiate between different theoretical constructs related to algebraic understanding. Finally, a research program is outlined that aims at the development of a universal algebra competence model that can be applied to test data from various algebra tests.


Introduction
It is potentially beneficial for teaching to have an evidence-based understanding of the different components of algebraic competency. There are many theoretical constructs that aim at structuring school algebra: Usiskin (1988) defined four conceptions of algebra; Kieran (2004) distinguished generational activities, transformational activities, and global/meta-level activities. Even more specific are the many classifications of what variables are and how they are used. Küchemann (1979) described six ways to treat letters in algebraic formulas, Arcavi (1994) related the understanding of variables and expressions, Epp (2011) investigated the usefulness of different understandings of variables, and Oldenburg (2019) proposed a classification approach for variables. The same holds true for other algebraic concepts: e.g., Bardini et al. (2013) distinguish different uses, and thus different understandings, of the equal sign, and Mason and Sutherland (2002) introduce key aspects of school algebra. This list could easily be extended, but it should suffice to support the following observation: There are many conceptual approaches to structuring algebraic competencies, but statistical models are usually restricted to special cases, i.e. tests are constructed to explore the constructs of one particular conception, and it is usually not possible to apply one model to data from other tests. For example, an item that was designed to measure a latent construct from one theory may load on two latent constructs, or even their combination, from another theory. The present paper is the first in a planned series that aims at improving the situation by enhancing the flexibility of modelling. The full program will be outlined at the end of this paper.
Many statistical models used in algebra education research come from the class of structural equation models (SEM, see Hoyle 2012). Traditionally, these models have been linear, and non-linear versions are gaining support only slowly and are often restricted to very special model classes (Dijkstra & Schermelleh-Engel, 2014; Kelava et al., 2011; Umbach et al., 2017). The key innovation of the present paper is a more flexible estimation technique that makes it possible to fit models even to data sets that provide no perfect measurement models for the constructs. The paper first describes this method, then the context from algebra education, and finally applies the method to a test data set. The purpose is twofold: First, the algebraic test data serve as a benchmark for judging the usefulness of the estimation method on real-world data. Second, the outcomes from fitting the model allow some conclusions that are of interest for algebra education. To this end, data from an algebra test are re-evaluated.
Hence, the research questions are the following:
• RQ1: Does a non-linear model fit the data from the algebra test better than a linear model?
• RQ2: Does the non-linear model give interesting and plausible insights into algebraic thinking?
• RQ3: Can non-linear, cross-loading-rich models be used to analyze tests that don't fit the model well?

Materials and Methods
This section describes the new method that will be applied to algebra test data in later sections. The motivation is to have the greatest possible flexibility in modelling. In particular, standard linear structural models and their non-linear extensions are not capable of expressing the models that will be applied here.
This section first describes the general model class (further details are given in Oldenburg (2020) and Oldenburg (2021)) and then comments briefly on model estimation and fit evaluation.
While most latent variable models use techniques to eliminate the values of latent variables for individual cases, the approach taken here is a weighted least square method that estimates values of parameters such as path loadings together with values of latent variables for each case. This approach is computationally demanding but feasible even for relatively large data sets.
The model consists of a set of equations f_k(x, p, c) = 0, k = 1…K, that are fitted to a data matrix x_ij ∈ ℝ, i = 1…n, j = 1…m. A weighted least squares approach is then used to minimize the objective function F(p, c) = Σ_i Σ_k w_k·f_k(x_i, p, c_i)² involving weights w_k > 0 with Σ_k w_k = 1. This minimization problem may be restricted by constraints on parameters and/or latent variables. The minimizer of this function gives estimates both for the latent variables and for the model parameters.
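To make the estimation idea concrete, here is a minimal, hypothetical sketch in Python (the toy data and all names are assumptions for illustration, not the paper's actual implementation): a single model equation with one latent competence per case, minimized jointly over the parameters and the per-case latent values.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data: one observed item score per case, generated
# from one latent competence c in [0, 1].
rng = np.random.default_rng(0)
n = 40
c_true = rng.uniform(0, 1, n)
x = np.clip(0.1 + 0.8 * c_true + rng.normal(0, 0.03, n), 0, 1)

w = 1.0  # weight of the single model equation

def objective(z):
    o, a = z[0], z[1]        # offset and path coefficient
    c = z[2:]                # latent values, one per case
    resid = x - (o + a * c)  # residual of the model equation f = 0
    return w * np.sum(resid ** 2)

# Start values; constraints: a >= 0 and latent values in [0, 1].
z0 = np.concatenate([[0.0, 0.5], np.full(n, 0.5)])
bounds = [(None, None), (0.0, None)] + [(0.0, 1.0)] * n
res = minimize(objective, z0, bounds=bounds, method="L-BFGS-B")
o_hat, a_hat, c_hat = res.x[0], res.x[1], res.x[2:]
```

With only one equation the offset and the path coefficient are not separately identifiable; the real models use several equations per case, which is what makes the joint estimation of parameters and latent values informative.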
There are several strategies to choose the weights in the above objective function. If one knew the error variances σ_k², it would be sensible to set w_k := 1/σ_k², because this, together with some further assumptions, allows one to prove that the estimate is the maximum likelihood estimate (see Oldenburg, 2021).
The following strategies have been investigated elsewhere (Oldenburg, 2021): Strategy 1: unweighted least squares, i.e. the weights w_k are chosen to be the same for all equations.
Strategy 2: a two-step strategy. After a first estimation round with strategy 1 one has residuals, and these give estimates of the error variances that can be used as weights in a second round.
Strategy 3: a self-consistency strategy. The weights, with the constraints w_k > 0 and Σ_k w_k = 1, together with a proportionality factor λ > 0, are included among the parameters to be estimated by the minimization, with a large penalty number enforcing the constraints.
Strategy 4: again a self-consistency strategy, but with a different algorithm. The weights are not part of the optimization process but are adjusted between optimization rounds. Starting from the uniform weight vector w^(0), the target function is minimized and a-posteriori error variance estimates (σ_k^(0))² are obtained. If the weights are optimal, then the sequence w_k^(0)·(σ_k^(0))², k = 1…K, will have minimal variance. Hence, for the next iterative step, w^(1) is obtained from w^(0) by changing the weights in the direction indicated by the deviation of w_k^(0)·(σ_k^(0))² from its mean value. With this new weight vector w^(1), another minimization step yields new estimates. The process iterates until no further reduction of variance can be found.
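The reweighting step of this strategy can be sketched as follows (a hypothetical helper, assuming per-equation error variance estimates from the previous minimization round are available):

```python
import numpy as np

def reweight(w, sigma2, step=0.5):
    """One iteration of the self-consistent reweighting idea.

    w      : current weights (positive, summing to 1)
    sigma2 : a-posteriori error variance estimates from the last fit
    If the weights were optimal (w_k proportional to 1/sigma_k^2),
    the products w_k * sigma_k^2 would all be equal, so each weight
    is nudged against the deviation of its product from the mean.
    """
    prod = w * sigma2
    w_new = w * (1.0 - step * (prod - prod.mean()) / prod.mean())
    w_new = np.clip(w_new, 1e-9, None)  # keep weights positive
    return w_new / w_new.sum()          # renormalize to sum 1

# Example: the second equation has a larger error variance,
# so its weight should shrink relative to the first.
w = reweight(np.array([0.5, 0.5]), sigma2=np.array([1.0, 4.0]))
```

After one step the weight of the noisier equation drops, moving the weight vector toward the inverse-variance weighting discussed above.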
For fit comparison, the mean error residual per case and per equation, v := F/(n·K), can be used. A second, similar measure is the data fit measure d, which is calculated in the following way: after all estimates for parameters and latent variables have been obtained, they are plugged into the model equations and the data are predicted from them as if they were unknown. Then the Euclidean distance between the data vector and the predicted data vector is calculated and averaged. The resulting number should be as close as possible to 0.
Now this general estimation framework will be specialized to the situation needed in this paper. The latent variables in this model are certain algebraic competences c_j, j = 1…J, e.g. c_1 may be the ability to perform substitutions and c_2 the ability to transform algebraic expressions correctly. The values of all latent variables are constrained to the unit interval, i.e. c_j ∈ [0,1] for each case. This makes it possible to write down model equations, e.g. of the following kind:
A = o_1 + a_1·c_1 + a_2·c_1·c_2 + ε
A = o_2 + a_3·c_2 + a_4·c_2² + ε
The parameters o_i, a_k ∈ ℝ have the following interpretations: the o_i are offsets, i.e. the expected score on the item without having any of the competences appearing in the equation, averaging over all latent variables not in the equation. Hence, o_1 ≠ o_2 can occur, but experience shows that in most cases they differ only slightly. a_1 is the path coefficient that measures the influence of competence c_1 on the item; this is the same interpretation as in linear SEM. Given that the latent variable values are from the unit interval, their product c_1·c_2 can be interpreted in the sense of fuzzy logic (Zadeh, 1965) as a measure of having both competences, and a_2 is the path weight of the influence of this combination. Similarly, c_2² can be interpreted as a measure of having a particularly strong competence c_2, and a_4 is its corresponding path weight. In the models estimated below all path coefficients are restricted to be non-negative, i.e. a_k ≥ 0 for all k ≥ 1.
This reflects the idea that having a competence should not have negative effects. A further pragmatic reason is that these constraints eliminate certain special cases of under-determined systems. A similar argument could be made for the offsets too, but allowing negative offsets gives the freedom to model situations in which a little bit of a competence is not enough to predict any chance of solving the item, and a higher competence value is needed to push the estimate above 0.
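As an illustration of such a model equation, with made-up coefficient values (these numbers are not estimates from the present study):

```python
# Made-up coefficients for a model equation of the kind discussed above:
# a linear term, a fuzzy conjunction c1*c2, and a squared term c2**2.
def predicted_score(c1, c2, o=0.1, a1=0.2, a2=0.5, a4=0.2):
    # c1 * c2 : fuzzy-logic AND ("has competence 1 and competence 2")
    # c2 ** 2 : having a particularly strong competence 2
    return o + a1 * c1 + a2 * (c1 * c2) + a4 * c2 ** 2

full = predicted_score(1.0, 1.0)  # fully competent case
weak = predicted_score(0.5, 0.5)  # moderate competence in both
```

Note how the conjunction and the squared term make moderate competence in both areas predict far less than the fully competent case: the score rises super-linearly as the competences grow.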
For better interpretation of path coefficients, one may normalize them by rescaling with the standard deviations of the involved variables. In the above example, e.g., one has the standardized coefficient â_3 := a_3·σ(c_2)/σ(A). Standardized variables ĉ := c/σ(c) will also be used, so that A = a·c is equivalent to Â = â·ĉ.
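In code, this rescaling reads as follows (a sketch with simulated values; the variable names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
c = rng.uniform(0, 1, 500)  # a latent competence
a = 0.6                     # raw path coefficient
A = a * c                   # noise-free single-path model A = a * c

a_std = a * c.std() / A.std()            # standardized coefficient
A_hat, c_hat = A / A.std(), c / c.std()  # standardized variables
```

In this noise-free single-path case the standardized coefficient equals 1, and A_hat equals a_std * c_hat, which is exactly the equivalence stated above; with noise or several paths the standardized coefficients become informative about relative influence.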

Empirical input: Test and data
A somewhat improved version of the test described in Oldenburg (2009) was used with 171 10th-grade students from a German school. This is a full cross-section of this school, but maybe not representative of Germany as a whole, although the school is quite typical for Germany. The purpose of the data analysis is not to make claims or conjectures about these students but to test the method described above and to derive some insights into algebraic thinking. The test was written for diagnostic purposes and administered by the teachers. The anonymized data used for this analysis contain no personal information, so the sample cannot be characterized further by means of the data; however, one can assume an equal sex distribution and an average age of about 16-17 years. By the time of the test, instruction in elementary algebra was complete, and in the next school year the students would start with (informal) calculus. The test was originally constructed with the algebra model of Oldenburg (2009) in mind. It contains 21 tasks, some of which have sub-items, so that there is a total of 45 items. Most items were coded as incorrect/correct (0, 1) by the teachers, while some were coded with partial credit rescaled to the unit interval, so that all observed variables are from [0,1]. Several items are shared with the test designed by Küchemann (1979). There are no multiple-choice items; all require a free-form answer, but in most cases the correct answer (e.g., an expression or equation) is unambiguously defined.

The model dimensions
The model that will be investigated here has six latent dimensions of algebraic competencies. This model has also been fitted successfully to other algebra test data, as will be reported elsewhere. The dimensions, together with example items that load directly on the latent variables (i.e. there is an equation of the type observed = o + a·latent + ε), are given in table 1. For each item in the test I judged which of these six latent variables may explain, individually or in combination, success on the item, and from this a model equation was generated. The set of these model equations is the non-linear model. By omitting all non-linear terms from the model equations, a linear version of it was derived. Both models were fit using all strategies described above.
Note that the original test was not constructed for this model, so many items are not made to measure exactly one of these dimensions. Hence, many possible cross-loadings are included in the model equations.

Evaluating the estimation method
To assess model fit, the remaining error variance per case, as defined above, was calculated. Moreover, the data fit measure d was determined. All results are given in table 2. They indicate that, by all measures and strategies, the non-linear model fits the data better than the linear model. Note that for a completely uninformative model both fit measures would equal 0.25. Hence, both models explain a large part of the data variance, and the additional reduction of residual errors by more than 7% for the non-linear models can be considered substantial. Also note that strategy 2 gives very good fit values for the non-linear model, but this is likely a result of overfitting.
In principle one could try to fit the linear model with standard SEM software such as lavaan. However, with the many cross-loadings resulting from the fact that the test was not designed exactly for these constructs, none of the usual SEM estimation methods converges. This shows that the new method can deliver sensible results where traditional methods are not applicable.
Summarizing these observations, RQ1 can be answered affirmatively.

Insights for algebra education
For the detailed analysis I restrict attention to the results from two strategies: strategy 1 and one of the self-consistent weight-adjusting strategies. One reason to include the uniform strategy 1 is that it best reflects what the modeler wrote down: the other methods adjust weights by estimated error variances, and this may not be necessary here, because manifest and latent variables are from [0,1], so that the error variances can be expected to lie in a similar range. Another reason for including strategy 1 is that it shows the best data fit. Among the weight-adjusting methods I choose the self-consistent one because it also produces a good data fit and, under some distributional assumptions, can be expected to converge to the optimal weights (this conjecture is not proved but is plausible from simulations).
From the estimates of the six latent variables, correlations can be calculated; the results are given in table 3. They indicate that there is a substantial difference between linear and non-linear models, but also that the estimation methods differ to some extent.
First, one can observe that the linear model cannot separate two of the latent constructs; for the non-linear model, however, they are clearly distinct (although the estimate of the correlation coefficient differs between the two estimation methods).

Now let's look at some items in detail:
Item A13: It is known that x = 6 is a solution of (x + 1)³ + x = 349. Use this to find a solution of (5x + 1)³ + 5x = 349.
For this item two equations were set up for the non-linear model; they are shown, together with the estimated coefficients, in table 4. An alternative modelling with additional equations would be sensible and interesting as well, but it has the disadvantage that comparison with the linear model is not so easy: one can, of course, restrict the comparison to the subset of linear equations, but then the linear and the non-linear model differ in the number of equations, and this makes model comparison tricky. Hence, the two equations were used. An approach with a third equation in which a further competence enters linearly produced a bad model fit for this additional equation and was hence discarded.
This completes the justification of why these equations are used. The results of the estimation process are given in table 4. Now for the interpretation of these results: for the linear model, the normalized equations indicate that both competencies, substitution and relational thinking, have explanatory power. However, in the linear model substitution seems to be more important, as its normalized path coefficient is larger than the one for relational thinking. The two estimation methods give somewhat different estimates, but this conclusion is independent of the method.
The results for the non-linear model emphasize the importance of substitution as well, but in addition show that combinations of two competencies have much stronger path coefficients than single competencies. In particular, the fuzzy conjunction of substitution and relational thinking, i.e. being good both at substitutions and at relational thinking, explains success best. Again, this conclusion is independent of the estimation method. Some further terms have additional explanatory power, but their path weights are small, in fact much smaller than that of the conjunction. Hence, one may conclude that substitution in combination with other abilities is an important part of algebraic competency. Now that the formation and interpretation of model equations have been explained in detail for item A13, some of the other items are discussed more briefly.
Item 10: Assume x, y are positive real numbers and that 1/x + 1/y = 2 always holds. What can be said about y if x decreases?
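A quick numeric check of the item's mathematics, assuming the constraint is 1/x + 1/y = 2 with x, y > 0:

```python
# Solving 1/x + 1/y = 2 for y gives y = 1/(2 - 1/x) (defined for x > 1/2),
# so y must increase when x decreases: the relational/covariational insight
# the item asks for.
def y_of(x):
    return 1.0 / (2.0 - 1.0 / x)

y1, y2 = y_of(1.0), y_of(0.8)  # x decreases from 1.0 to 0.8
```

The check confirms the intended answer: as x decreases, 1/x grows, so 1/y must shrink and y grows.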
The proposed model equations for this item follow the same pattern as for item A13: each of the two equations contains an offset, a linear term in one competence and a fuzzy conjunction of two competences. The findings from fitting the model are contained in table 5. The interpretation is that in the linear model one latent variable explains most, while a second one contributes as well. For the first equation the conjunction term is not useful, but for the second equation the conjunction turns out to be the strongest predictor. The large offsets together with the large error variances indicate, however, that some additional explanatory variable may be needed.
For a further item, the results are given in table 6. The interpretation is straightforward: the involved competencies are important in the linear model, but they predict most strongly when, in addition, a strong competency in quantification is available.
As a last example, let's look at an item where a quadratic term in one competency gives a good estimate. Item 17c: Given the function f(x) = x³ − x², calculate f(x + 1).
The main difficulty in this item is to do the substitution correctly. Hence, the proposed model explains the item score by the substitution competence Subs, with a quadratic term Subs² added in the non-linear version. The linear model gives the normalized estimate A17c = -0.08 + 0.53 Subs (error variance 0.17), while the non-linear estimate is A17c = -0.43 + 0.92 Subs² (error variance 0.04). This shows that getting this substitution right requires a higher level of Subs than most other tasks: only if you are really good at substitution can you tackle this item (which confused many students because of the double role of x). Also note the substantial drop in error variance, and that the linear term received an almost zero loading in the non-linear model.
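To see what the two reported estimates imply, one can compare them numerically; treating the normalized coefficients as if they applied directly on the raw [0,1] scale is a simplification for illustration only:

```python
# Fitted (normalized) equations as reported in the text for item A17c.
def linear_model(subs):
    return -0.08 + 0.53 * subs

def nonlinear_model(subs):
    return -0.43 + 0.92 * subs ** 2

# Moderate vs. very high substitution competence:
mid_lin, mid_non = linear_model(0.5), nonlinear_model(0.5)
top_lin, top_non = linear_model(1.0), nonlinear_model(1.0)
```

The quadratic model predicts essentially no chance of success at moderate competence but overtakes the linear prediction near the top of the scale, matching the interpretation that only very good substitution skills suffice for this item.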
Of course, many more insights can be drawn from the fitted model, but for the purpose of this publication four items should suffice to underpin the positive answer to research questions 2 and 3.

Conclusions and Outlook
The results given above allow all three research questions to be answered affirmatively. The new estimation method made it possible to estimate linear models with many cross-loadings that cannot be estimated by traditional SEM techniques. Moreover, the non-linear model fits the data much better than the linear one and supports the hypothesis that, for many items, the best prediction of success is given by the fuzzy-logic conjunction of two competencies, or by having one competency to a very high degree (modelled by a higher power). It is especially interesting to note that substitution is an important algebraic competency, and that it is especially important when combined with abilities in other areas of algebra such as quantification, calculation and relational thinking.
However, it must be stated clearly that the results given above must be read with caution. First, the method is new and this is its first application to real-world data. It may be that the good performance seen in simulation studies does not carry over to such data, although the fact that the results are quite reasonable indicates that this worry may be unnecessary. Thus, several further methodological research goals arise:
• Deeper theoretical underpinning of the method
• Development of a Bayesian approach to estimating unit-interval-constrained latent variable models, as a second method to increase trust in the results
• Application of the method to more data sets
Regarding the results about algebraic competency, one should additionally consider that the data set was only of moderate size and that the test was not developed for the model tested. Thus, the following research goals arise:
• Check other models; especially, find out whether some items are best explained by products of three or even more latent variables
• Check whether the observed importance of substitution can also be supported by other data sets
• Test other models proposed in the literature that can be translated into such a non-linear structure

Perspectives: A research program
If the critical questions raised in the preceding section can be answered satisfactorily, the following research possibilities open up. As cross-loadings are less problematic than in traditional SEM, and as logical conjunction can be modelled, it is relatively easy to adapt the measurement model of the above six-dimensional model to the items of other existing algebra tests. The central idea of the research program is therefore to find a model of algebraic competency (maybe a modification of the above model) that can be fitted to the data from as many algebra tests as possible. The idea behind this is, of course, that the structure of algebraic competency should be a construct that is the same for many people educated in a similar algebraic culture. So while we currently often have "one study: one model: one test", the idea is to have one general model, i.e. "several studies and tests: one model". Of course, not all data sets will be useful in the end. For example, the PISA data set is too sparse to be estimated by my method (full data is not required, but stable numerical results require a certain density). By following this research program it is hoped that a general model can be constructed that is supported by a great variety of data and that, vice versa, explains a large variety of data by a few simple principles.

Data Availability
Data, test items (in German) and Mathematica programs to perform the analysis are available at https://myweb.rz.uni-augsburg.de/~oldenbre/AlgebraTestData.zip .

Conflicts of Interest
I declare that there is no conflict of interest.

Funding Statement
The work was done without external funding as part of my employment at the University of Augsburg.