Methods and Rules of Thumb in the Determination of Minimum Sample Size When Applying Structural Equation Modelling: A Review

Determining the minimum sample size when using Structural Equation Modeling (SEM) in a research project is one of the crucial problems faced by researchers, since there has been some controversy among scholars regarding the methods and rules of thumb involved. Therefore, this paper reviews the methods and rules of thumb used to determine the sample size for SEM in order to identify the more suitable methods. The paper collected research articles related to sample size determination for SEM and reviewed the methods and rules of thumb employed by different scholars. The study found that a large number of methods and rules of thumb have been employed. The paper evaluated the surface mechanisms and rules of thumb of more than twelve previous methods, each of which has its own advantages and limitations. Finally, the study identified two methods that are more suitable both methodologically and technically. They were proposed by scholars who addressed in depth all aspects of the techniques for determining the minimum sample size for SEM analysis, examined the nature of the problem more deeply, and demonstrated more accurate and practical solutions to it. The paper therefore recommends these two methods to resolve the issue of minimum sample size determination when using SEM in a research project. The first method is simpler and more attractive in nature, and the second method is much more complex than the first. Researchers without strong methodological expertise can apply these two methods easily, with small computer applications, to both normal and non-normal data.
This paper presents more than twelve sample size determination methods from the past literature, together with their contributions to the problem, the lessons learned, and their advantages and disadvantages, as a largely nontechnical review. This is one of the contributions of the review. Most researchers who use SEM as their analysis method without the necessary methodological and technical knowledge face this fundamental and critical problem, and this review offers them a way to find out how they should determine the sample size for their SEM-based analysis.


Introduction
Structural Equation Modeling (SEM) is one of the most extensively used quantitative multivariate data analysis techniques. It is currently employed to examine the relationships between observed and latent variables in exploratory and confirmatory hypothesis testing approaches as well as in various types of predictive analysis models. This modeling technique is particularly suitable in the social sciences, where the key concepts are mostly not directly observable and are inherently latent, and are therefore generally defined as latent variables (Kline, 1998; Kock & Lynn, 2012). SEM is the prominent approach for analysing path models with such latent variables in order to draw final conclusions about how the underlying theories combine. SEM has its roots in path analysis, which was invented by the geneticist Sewall Wright in 1921 (as cited by Hox & Bechger, 1999). As mentioned by Westland (2010), this modeling technique has developed in three different streams: equation regression methods, iterative maximum likelihood algorithms for path analysis, and iterative least-squares fit algorithms for path analysis. Researchers have suggested and implemented different procedures as rules of thumb for deciding the sample size for their SEM-based research (MacCallum & Austin, 2000; Westland, 2010). Therefore, this paper reviews the basic concepts involved in the determination of the minimum sample size for SEM and identifies the more suitable methods for determining it.
The paper is organized as follows. The first section introduces the research problem and objectives, and the second briefly explains the theoretical background of the study. The third section describes the methodology used, and the fourth presents the results and discussion based on the review of the past literature. Finally, a summary and conclusions are included.

Theoretical Background
Researchers use sampling because they are unable to study the whole population as required. Inadequate or unnecessarily large sample sizes affect the quality and accuracy of research. Hence, it is very important that the studied sample represents the characteristics of the population. One of the most important factors is the number of cases selected from the population, which should represent all the population characteristics. Three criteria need to be specified to determine the appropriate sample size: the level of precision, the level of confidence, and the degree of variability in the attributes being measured (Miaoulis & Michener, 1976). There are three bases for choosing a sample size: the cost base, the variance base, and the statistical power base (Singh & Masuku, 2014). Sample size determination on the statistical power base sets a target for the power of a statistical test to be applied once the sample is collected, so that the sample size is judged by the quality of the resulting estimates, assessed through the power of a hypothesis test (Singh & Masuku, 2014).
The difference between the calculated sample estimates and the actual population parameters is the error. According to Muthén & Muthén (2002), precision reflects this: the sample estimates should be close to the population parameters with a narrow margin of error, which means high precision. Statistical power is related to the type II error: it equals 1-β, where β is the probability of a type II error, and so it is the probability of rejecting a false null hypothesis (Cohen, 1988). In social science research, the power is mostly set at 0.8, that is, an 80% probability of rejecting a false null hypothesis (Cohen, 1988). The decision taken by the researcher directly affects the validity of the study, the suitability of parametric or nonparametric methods, and the precision and power of the model's parameter estimates. The American Psychological Association (2009) asks authors to report "how this intended sample size was determined (e.g., analysis of power or precision). If interim analysis and stopping rules were used to modify the desired sample size, describe the methodology and results". Wilkinson (1999) stated that researchers should describe the sampling process and the sample size, and should document the effect size and the analytic procedure used for the power calculation.

Methodology
The minimum sample size determination problem is not clearly defined, and researchers are still defining it. Thus, this study is an exploratory study based on secondary information, an approach that can be used to understand an existing problem more precisely. The study started with a general idea and used it as a medium to identify issues that can be the focus of future research. The research fundamentally used the grounded theory approach, and it is based on more than fifty articles published in journals relevant to the determination of minimum sample size for SEM analysis.

Review of Minimum Sample Size Determination
Although determining the minimum sample size for SEM is problematic, various rules of thumb have been suggested in the SEM literature. Nunnally (1967) noted that the sampling error in a weight is a function of all the variables used in the regression analysis, and he pointed to two sources of error. The first is sampling error, which is a function of the sample size and the intercorrelations among the variables; the second is systematic differences between the characteristics of two samples. In his view, regression weights may be robust across samples that have quite different means and variances, but this should not be taken for granted. Further, he noted that "as a rule of thumb, but not a magical number, you should have 10 subjects per predictor in order to even hope for a stable prediction equation" (Nunnally, 1967; Wolf et al., 2013). With this proposal, the debate on minimum sample size determination in SEM evolved significantly.
A minimum sample size of 100 or 400 was suggested by Boomsma (1982, 1985). These studies further suggested using the ratio of indicators to latent variables, r = p ⁄ k. According to this rule, r = 4 requires a sample size of at least 100 and r = 2 requires a sample size of at least 400. The number of free parameters in the model has also been considered in determining the sample size (Raykov, 2006). According to this rule, the minimum sample size should be ten times the number of free parameters of the model. If the model has 20 free parameters, then the number of observations should be 200. Bentler (1990) suggested a 5:1 ratio of sample size to the number of free parameters. Further, Velicer & Fava (1998) reviewed the recommendations of the past literature on minimum sample size determination and concluded that the minimum sample size is not a function of the number of indicators. According to them, goodness of fit and a proper solution are achieved through two things: a greater number of indicators per latent variable and higher factor loadings at the given sample size. Subsequently, MacCallum et al. (1999) argued and demonstrated that model characteristics such as the level of communality across the variables, the sample size, and the degree of factor determinacy may influence the parameter estimates and model fit statistics, which casts doubt on applying the above sample size rules of thumb to a particular SEM analysis.
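The ratio-based rules in this paragraph reduce to simple arithmetic, and a minimal sketch may make them easier to compare. The function names below are hypothetical and chosen only for illustration. The indicator-ratio function covers only the two ratios stated in the text (r = 4 and r = 2) and returns nothing for other ratios, because the text gives no values for them.

```python
# Sketch of the ratio-based rules of thumb described above.
# Function names are illustrative assumptions, not part of any library.

# Minimum N for the two indicator-to-latent-variable ratios given in the text.
BOOMSMA_MIN_N = {4: 100, 2: 400}


def n_from_free_parameters(free_parameters, ratio=10):
    """Minimum N as `ratio` observations per free parameter.

    ratio=10 is the ten-times rule; ratio=5 is the 5:1 suggestion.
    """
    return ratio * free_parameters


def n_from_indicator_ratio(indicators, latent_variables):
    """Minimum N from r = p / k, for the ratios stated in the text only.

    Returns None when r is neither 4 nor 2.
    """
    r = indicators / latent_variables
    return BOOMSMA_MIN_N.get(r)


if __name__ == "__main__":
    print(n_from_free_parameters(20))           # ten observations per parameter
    print(n_from_free_parameters(20, ratio=5))  # five observations per parameter
    print(n_from_indicator_ratio(12, 3))        # r = 4
```

The same model can therefore receive quite different recommendations depending on which rule is applied, which is consistent with the doubts raised by MacCallum et al. (1999).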
If the model is complex, PLS-SEM works efficiently with a smaller sample size (Fornell & Bookstein, 1982). Goodhue, Lewis, & Thompson (2006, 2007) examined the rule of ten subjects using Monte Carlo simulation and compared sample sizes of 40, 90, 150, and 200 under 'large', 'medium', 'small', and 'no effect' effect sizes. They concluded that "for simple SEM models with normally distributed data and relatively reliable measures, none of the techniques have adequate power to detect small or medium effects at small sample size" (Goodhue et al., 2006). Tanaka (1987) suggested that the sample size for an SEM model should depend on the number of estimated parameters rather than the total number of indicators. However, Westland (2010) claimed that, since present SEM models are typically estimated in their entirety and the number of unique entries in the covariance matrix is p(p+1)/2, where p is the number of indicators, the sample size should be taken as proportional to p(p+1)/2 rather than to p. He further noted that the minimum sample size determination problem is more complex than this, as shown by the Monte Carlo simulation studies done in the 1980s and 1990s. The Monte Carlo procedure has been described as follows: "... generate a number of samples (e.g., 1000) for each sample size point, calculate the percentages of samples in which significant effects (e.g., for which P < .05) were found for each sample size point (the power associated with each sample size), and estimate via interpolation the minimum sample size at which power reaches the desired threshold (i.e., .8)". The authors further mentioned that although the Monte Carlo simulation method is a prominent method for determining the minimum sample size, it is difficult, requires both technical and methodological expertise together with good computer programming skills, and is time-consuming (Kock & Hadaya, 2018).
Wolf et al. (2013) also concluded that "the final lesson learned is that determining sample size requirements for SEM necessitates careful, deliberate evaluation of the specific model at hand".
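The Monte Carlo procedure quoted above can be sketched in a few lines. The sketch below is deliberately simplified and rests on several assumptions that are not in the text: it simulates a single standardized path between two observed variables rather than a full SEM, it tests significance with a two-tailed normal critical value, and all function names are hypothetical. It also includes Westland's count of unique covariance entries, p(p+1)/2.

```python
import math
import random
from statistics import NormalDist


def unique_covariance_entries(p):
    """Number of unique entries in a p x p covariance matrix: p(p+1)/2."""
    return p * (p + 1) // 2


def simulated_power(n, beta, replications=500, alpha=0.05, rng=None):
    """Share of simulated samples of size n in which the path is significant.

    Assumption: one standardized path y = beta * x + noise with |beta| < 1,
    tested two-tailed against a normal critical value.
    """
    rng = rng or random.Random(0)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    noise_sd = math.sqrt(1 - beta ** 2)
    hits = 0
    for _ in range(replications):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [beta * x + noise_sd * rng.gauss(0, 1) for x in xs]
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        r = sxy / math.sqrt(sxx * syy)
        t_ratio = r * math.sqrt((n - 2) / (1 - r * r))
        if abs(t_ratio) > z_crit:
            hits += 1
    return hits / replications


def interpolate_minimum_n(sizes, powers, target=0.8):
    """Linear interpolation of the sample size at which power reaches target.

    Returns None if the target is never reached within the given sizes.
    """
    if powers[0] >= target:
        return sizes[0]
    points = list(zip(sizes, powers))
    for (n0, p0), (n1, p1) in zip(points, points[1:]):
        if p0 < target <= p1:
            return math.ceil(n0 + (target - p0) * (n1 - n0) / (p1 - p0))
    return None


if __name__ == "__main__":
    rng = random.Random(42)
    sizes = [40, 90, 150, 200]  # the sample size points compared in the text
    powers = [simulated_power(n, beta=0.2, rng=rng) for n in sizes]
    print(list(zip(sizes, powers)))
    print(interpolate_minimum_n(sizes, powers))
```

Even this reduced version shows why the approach is considered demanding: a realistic study would have to simulate the full measurement and structural model, which requires considerably more programming effort and computing time.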
Hair et al. (2014) discussed an alternative to the "10-times rule" for minimum sample size estimation, and Kock & Hadaya (2018) referred to it as the "minimum R-squared method", since it uses the minimum R² in the model to estimate the minimum sample size. This method builds on the power tables of Cohen (1988) for least squares regression, and it requires three elements to determine the sample size. The first element is the maximum number of arrows pointing at a latent variable in the model, the second is the significance level used, and the third is the minimum R² in the model. Table 01 shows a reduced version of the table presented by Hair et al. (2014). It is based on a significance level of 0.05, which is the most commonly used significance level, and assumes that the power is set at 0.8. This method appears to be an improvement over the 10-times rule method, as it takes as an input at least one additional element beyond the network of links in the model.

Table 01 - Reduced version of the table presented by Hair et al. (2014)

Maximum number of arrows          Minimum R² in the model
pointing at a latent variable     0.10    0.25    0.50    0.75
2                                 110     52      33      26
3                                 124     59      38      30
4                                 137     65      42      33
5                                 147     70      45      36
6                                 157     75      48      39
7                                 166     80      51      41
8                                 174     84      54      44
9                                 181     88      67      46
10                                189     91      59      48

Although the 10-times rule is simple for researchers to apply, it has been shown to produce inaccurate estimates (Goodhue et al., 2012). A smaller sample can be used with PLS-SEM when other methods fail to complete the analysis. However, the nature of the population directly affects the legitimacy of such an analysis, which depends on the heterogeneity of the population (Sarstedt, Ringle, & Hair, 2017). Hence, a badly designed sample will lead PLS-SEM to a wrong analysis (Sarstedt et al., 2017). As mentioned by Marcoulides & Chin (2013), a power analysis that includes the model structure, the expected effect sizes, and the significance level should be applied to determine the necessary sample size.
Kock & Hadaya (2018) suggested two related new methods for determining the minimum sample size in PLS-SEM applications. Both are based on mathematical equations, and neither relies on Monte Carlo simulation, with the disadvantages mentioned above, or on the elements that make up the 10-times rule or the minimum R-squared method. The first method is called the "inverse square root method", which uses the inverse square root of a sample's size for standard error estimation. The second is called the "gamma-exponential method", which applies gamma and exponential smoothing function corrections to the standard error estimation employed in the first method.
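Both the 10-times rule and the minimum R-squared method amount to a lookup or a multiplication, as the following minimal sketch illustrates. The numbers are copied from Table 01 as printed in this paper. The column labels (minimum R² of 0.10, 0.25, 0.50, and 0.75) are an assumption, since the table header is not reproduced in the text, and the function names are hypothetical. When the observed minimum R² falls between two columns, the sketch takes the nearest lower column, which is a conservative choice made here for illustration only.

```python
# Assumed column labels for Table 01 (minimum R-squared in the model).
MIN_R2_LEVELS = (0.10, 0.25, 0.50, 0.75)

# Rows of Table 01 as printed in this paper, keyed by the maximum number
# of arrows pointing at a latent variable (significance 0.05, power 0.8).
MIN_N_TABLE = {
    2: (110, 52, 33, 26),
    3: (124, 59, 38, 30),
    4: (137, 65, 42, 33),
    5: (147, 70, 45, 36),
    6: (157, 75, 48, 39),
    7: (166, 80, 51, 41),
    8: (174, 84, 54, 44),
    9: (181, 88, 67, 46),
    10: (189, 91, 59, 48),
}


def min_n_r_squared(max_arrows, min_r2):
    """Look up the minimum N for the minimum R-squared method.

    Uses the largest tabulated R-squared level that does not exceed min_r2.
    """
    row = MIN_N_TABLE[max_arrows]
    eligible = [i for i, level in enumerate(MIN_R2_LEVELS) if level <= min_r2]
    if not eligible:
        raise ValueError("min_r2 is below the smallest tabulated level")
    return row[eligible[-1]]


def min_n_ten_times(max_links):
    """10-times rule: ten cases per link pointing at any latent variable."""
    return 10 * max_links


if __name__ == "__main__":
    print(min_n_ten_times(4))        # 10-times rule for four links
    print(min_n_r_squared(4, 0.25))  # same model, minimum R-squared of 0.25
```

Comparing the two functions for the same model shows how the additional input, the minimum R², changes the recommendation relative to the 10-times rule.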

Inverse Square Root Method:
When researchers analyse samples from a population with PLS-SEM, the analysis generates path coefficients, denoted β. Each path coefficient has a standard error, denoted S. As mentioned by Kock (2015) and Weakliem (2016), when the distribution of the ratio β/S is plotted, the critical T ratio for a specific significance level can be located on it. Cohen (1988), Goodhue et al. (2012), and Kock (2015) explain that the power of the test is the probability that the ratio |β|/S is greater than the critical T ratio for the chosen significance level. Here |β| is the absolute value of the path coefficient, which denotes its strength and influences the power. The significance level normally used in research is 0.05 (P < .05), so the critical T ratio can be denoted T.05. Researchers also generally assume that the path coefficients are normally distributed and require the power to be greater than 0.8. Using these properties, the inverse square root method estimates the minimum sample size. However, the true standard error S is estimated by Ŝ, and according to Kock & Hadaya (2018), this estimate underestimates the true value at very small sample sizes (i.e., 1 < N ≤ 10) and overestimates it at greater sample sizes (i.e., N > 10). They therefore suggested the gamma-exponential method as a refinement of the inverse square root method, which is expressed by formula (2):

$$|\beta|_{\min}\,\sqrt{\hat{N}}\;e^{\left(\frac{e\,|\beta|_{\min}}{\sqrt{\hat{N}}}\right)} > 2.486 \qquad (2)$$

As with the gamma function correction equation, this equation can be solved with a computer program that starts with N̂ = 1 and progressively increments its value to 2, 3, etc. until the smallest positive integer that satisfies the inequality is obtained.
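The incremental search described above is short enough to sketch directly. The sketch makes several assumptions. The inverse square root inequality is not shown in the text; it is assumed here to be |β|min √N̂ > 2.486, obtained by using the inverse square root of the sample size as the standard error estimate. Formula (2) is read as |β|min √N̂ e^(e|β|min/√N̂) > 2.486. The constant 2.486 is taken from the text; it is close to 1.645 + 0.842, which would correspond to a one-tailed test at the 0.05 level with power 0.8, although that reading is also an assumption. The function names and the path coefficient used in the example are illustrative, and the gamma function correction for very small samples is not covered.

```python
import math

THRESHOLD = 2.486  # constant taken from formula (2) in the text


def inverse_square_root_n(beta_min, threshold=THRESHOLD):
    """Smallest N with |beta| * sqrt(N) > threshold (assumed inequality)."""
    b = abs(beta_min)
    if b == 0:
        raise ValueError("beta_min must be non-zero")
    n = 1
    while b * math.sqrt(n) <= threshold:
        n += 1
    return n


def exponential_correction_n(beta_min, threshold=THRESHOLD):
    """Smallest N satisfying formula (2), found by incrementing N from 1.

    Formula (2) as read here:
        |beta| * sqrt(N) * exp(e * |beta| / sqrt(N)) > threshold
    """
    b = abs(beta_min)
    if b == 0:
        raise ValueError("beta_min must be non-zero")
    n = 1
    while b * math.sqrt(n) * math.exp(math.e * b / math.sqrt(n)) <= threshold:
        n += 1
    return n


if __name__ == "__main__":
    beta_min = 0.197  # illustrative minimum path coefficient
    print(inverse_square_root_n(beta_min))     # 160
    print(exponential_correction_n(beta_min))  # 146
```

For this illustrative coefficient, the corrected equation yields a somewhat smaller minimum sample size than the uncorrected one, which agrees with the statement that the uncorrected standard error estimate overestimates the true value when N > 10.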

Summary and Conclusions
Sample size determination for SEM analysis is one of the most fundamental and crucial problems. The above review of minimum sample size determination in the prior literature has covered more than twelve methods that have been employed by past researchers. It began with the rule of 10 subjects per predictor mentioned by Nunnally (1967) ... (Robert & Casella, 1999). The "10-times rule" uses a simple calculation, "10 times the maximum number of inner or outer links pointing at any latent variable" (Hair et al., 2011), and it has been the favourite method of many researchers. The minimum R² method, employed by Hair et al. (2014), is also popular and has later been criticized in the literature. Finally, Kock & Hadaya (2018) introduced the inverse square root method and the gamma-exponential method, and showed, on the basis of three Monte Carlo experiments, that these two methods are fairly accurate.
According to the past literature, a large number of methods and rules of thumb have been employed to solve the fundamental issue of minimum sample size determination for SEM analysis. However, each of these methods has its own limitations when applied to different models, and hence each has been criticized in the literature. The weakness of most methods lies in the foundation used to address the problem, because sample size determination depends critically on several factors that influence the final goodness of fit of the SEM analysis. Kock & Hadaya (2018) addressed the nature of the problem more deeply and demonstrated more accurate and practical solutions to it. The first method is simpler and more attractive in nature, and the second method is much more complex than the first. Researchers without strong methodological expertise can apply these two methods easily, with small computer applications, to both normal and non-normal data.
This paper presents more than twelve sample size determination methods from the past literature, together with their contributions to the problem, the lessons learned, and their advantages and disadvantages, as a largely nontechnical review. This is one of the contributions of the review. Most researchers who use SEM as their analysis method without the necessary methodological and technical knowledge face this fundamental and critical problem, and this review offers them a way to find out how they should determine the sample size for their SEM-based analysis.