Factorial Economic Planning Applied to Agricultural Experimentation

Innovation may be limited by the scarcity of resources, such as funding, homogeneous area, skilled labor or other research needs, for example the difficulty of experimental control over large areas in the field. Research areas such as chemistry and physics use designs that require fewer experimental units than agronomic designs; in this work these are called economic designs. Thus, the objective of this study was to identify significant factors and effects (p-value


I. INTRODUCTION
The Response Surface methodology consists of a set of mathematical and statistical methods used to study the relation between factors and responses [1,2,3]. This relation may be described by a first-order polynomial or, for more complex interactions, by polynomials of a higher order [4], and may be stated by equation (1):

ŷ = f(x1, x2) + ϵ (1)

where ŷ is the response as a function of the variables x1 and x2, added to the experimental error ϵ [5,6].

The Rotational Central Composite Design (RCCD) and the 3^k design are second-order designs, that is, the regression equation that describes the behavior of the variables also includes the quadratic terms. They feature replication at the central point, are flexible and require fewer experimental units [7,8] than the full factorial applied to investigate the effect of many treatments. For example, a full factorial with two factors, each with five levels, comprises twenty-five treatments which, repeated four times, result in one hundred experimental units. In the 3^k design, the same analysis requires nine treatments repeated four times. In the RCCD, the addition of the vertices (±1.41) that form the star increases the coding matrix to eleven treatments repeated four times [9]. Both the 3^k and the RCCD belong to DOE (Design of Experiments), which includes an important framework of designs that support scientific findings [11].

The major difficulty is to adapt the economic factorials (3^k and RCCD) to agronomic experimentation while observing the basic principles of agricultural experimentation [10,12]. Field research is a different scenario from controlled-environment research, which presents a low coefficient of variation (less than 3%) [13]; statistical tests have been shown to be sensitive to higher coefficients of variation (above 6%) [14], which in field experiments are influenced by uncontrolled factors.
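The treatment and experimental-unit counts quoted above can be reproduced with a short sketch. It is only illustrative: the function names are hypothetical, and the use of three central points in the RCCD is an assumption chosen so that the design reaches the eleven treatments mentioned in the text (four corners, four axial points, three centers).

```python
from itertools import product
from math import sqrt

def full_factorial(levels_per_factor):
    """All combinations of the given levels, one tuple per treatment."""
    return list(product(*levels_per_factor))

def rccd_two_factors(n_center=3):
    """Coded points of a two-factor rotatable central composite design.

    n_center=3 is an assumption, chosen so that the design has eleven runs.
    """
    alpha = sqrt(2)  # axial distance, about 1.41 for two factors
    corners = list(product((-1.0, 1.0), repeat=2))
    axial = [(-alpha, 0.0), (alpha, 0.0), (0.0, -alpha), (0.0, alpha)]
    center = [(0.0, 0.0)] * n_center
    return corners + axial + center

replicates = 4
designs = {
    "5x5 full factorial": full_factorial([range(5), range(5)]),
    "3^2 factorial": full_factorial([(-1, 0, 1), (-1, 0, 1)]),
    "RCCD": rccd_two_factors(),
}
# Experimental units = treatments x repetitions.
units = {name: len(points) * replicates for name, points in designs.items()}
for name, n_units in units.items():
    print(f"{name}: {len(designs[name])} treatments x {replicates} = {n_units} units")
```

Under these assumptions the sketch gives 100, 36 and 44 experimental units, matching the figures in the text.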
Thus, this study aimed to identify significant factors and effects (p-value < 0.05) through the application of economic factorial planning and response surface methods to field experiments. The studies show that, for both linear and nonlinear models, it was possible to identify the same significant effects while reducing the number of experimental units.

II. EXPERIMENTAL PROCEDURES
The research was divided into two studies. The first compared the regression models generated by the 3^k design and by the control (full factorial), keeping the same number of experimental units and repetitions. The second compared the regression models generated by the 3^k design and the RCCD with the control, keeping the number of repetitions but reducing the number of experimental units. The data for study I are part of Zimmermann's research [15] and consist of a 3x3 factorial whose factors were soil density (factor 1: 1.12, 1.26 and 1.39 g cm^-3) and doses of a nutritional compound called "FTE-BR12 microelement" (factor 2: 0, 1 and 2 g), with the logarithm of rice dry mass as the response variable. A Randomized Block Design (RBD) was used to lay out the field experiment, which resulted in nine treatments repeated three times.
Study II consisted of an experiment organized as a full factorial (5x5) in RBD with four repetitions of the treatments, where factor 1 was the days after emergence (dae) of corn, factor 2 was the days after application (daa) of a corn defensive agent, and the response variable was the leaf width of the corncob in millimeters [16]. The data were coded according to Montgomery's methodology [4], as presented in Table 1, resulting in a different number of treatments for the full factorial, 3^k and RCCD.

Table 1: Encoding matrix, with treatment numbers for the 5x5 full factorial (two factors with five levels each), 3^2 (two factors with three levels each) and Rotational Central Composite Design, RCCD (two factors with three levels + vertex), without repetition.
ŷ = β0 + b + β1xi + β2xj + β11xi² + β22xj² + β12xixj + ϵ (2)

where ŷ is the estimated response, β0 is the intercept, b is the block, β1 is the linear coefficient of factor 1, xi is variable 1 at level i, β2 is the linear coefficient of factor 2, xj is variable 2 at level j, β11 is the quadratic coefficient of factor 1, β22 is the quadratic coefficient of factor 2, β12xixj is the interaction between factors 1 and 2, and ϵ is the experimental error. With the factors considered qualitative, the assumptions of homogeneity of variance (Bartlett and Levene tests), normal distribution of residuals (Shapiro-Wilk and Kolmogorov-Smirnov tests) and block additivity (Tukey test) were checked and confirmed by visual inspection of the residuals.
The regression models were fitted (quantitative variables) after validation by the F test, starting from the most complete model and removing terms, with the significance of the coefficients (p-value ≤ 0.05) assessed by the t-test.
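As an illustration of the starting point of this procedure, the sketch below fits the most complete second-order model (intercept, linear, quadratic and interaction terms) by ordinary least squares on the coded 3^2 design. It is a minimal sketch under stated assumptions: the block term is omitted, the coefficients in true_beta are hypothetical values used only to generate noise-free toy responses, the helper names are invented for this example, and the t-tests used to remove terms are not shown. The study itself used R.

```python
from itertools import product

def quadratic_terms(x1, x2):
    """Row of the design matrix: intercept, linear, quadratic, interaction."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_least_squares(rows, y):
    """Ordinary least squares through the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical coefficients, used only to generate toy data (not from the study).
true_beta = [10.0, 2.0, -1.0, 0.5, -0.25, 0.75]
points = list(product((-1, 0, 1), repeat=2))  # coded 3^2 design, nine treatments
rows = [quadratic_terms(x1, x2) for x1, x2 in points]
y = [sum(b * t for b, t in zip(true_beta, row)) for row in rows]

beta_hat = fit_least_squares(rows, y)
print([round(b, 6) for b in beta_hat])
```

Because the toy responses carry no error, the nine coded treatments are enough to recover all six coefficients; with real field data, the replicates provide the error estimate needed for the t-tests.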

International Journal of Advanced Engineering Research and Science (IJAERS)
In selecting the most adequate model to describe the response, the following were observed:
1. The absence of non-significant coefficients (p-value > 0.05) in all the regression models tested;
2. Analysis of variance (ANOVA) for the hypotheses H0: the regression models do not differ significantly (p-value > 0.05); H1: the regression models differ significantly (p-value ≤ 0.05);
3. Comparison of the adjusted R², the best-fit regression model being the one with the highest adjusted R²;

4. Comparison of the Akaike Information Criterion (AIC), the best-fit regression model being the one with the lowest AIC [17];
5. In the absence of differences between the models, the "law of parsimony" was observed.

The AIC was chosen as a fit criterion because it penalizes both lack of fit and model complexity, in agreement with the law of parsimony [18,19]. The coefficients of determination (R²) were not compared, since R² is influenced by the number of terms in the regression model: according to Adair and Silva [20], adding terms increases the regression sum of squares and, consequently, R².

In the development of the 3^k design, following Montgomery's methodology [4], the data were coded (-1, 0 and +1); for the RCCD, following the same author's methodology, the vertices (±1.41) were added. For both designs we checked the assumptions, tested the need to work with pure error (ϵp) (ANOVA) and repeated the process of selecting the best-fit model, as described for the full factorial. The pure error (ϵp) is not correlated with the model [21,22]; it therefore does not depend on the estimated responses (ŷ) and reflects only the dispersion, at each factor level, of the repeated responses (y) around their mean (ȳ). It is calculated by ∑(y − ȳ)², the sum of the squared differences between the original response (y) and its mean (ȳ), which yields the estimate of the variance for the model and influences the values of the F test, whether the model is adjusted or not [23]. Since the methodologies come from different areas, each factorial has its own particularities.

The R software was chosen for this study because it offers a wide range of packages [24]. For didactic purposes, the functions and packages used in this study are detailed in the sources listed in Table 2.

[...] not evidence to reject the hypothesis H0, satisfying the assumptions of homogeneity of variance and normal distribution of residuals.
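The adjusted R² and AIC criteria used in the model selection of Section II can be sketched as follows. This is a hedged illustration, not the study's R code: the function names are hypothetical, the AIC is written in its usual form for a least-squares fit with normal errors (counting the intercept and the error variance as estimated parameters), and the sums of squares are invented numbers chosen only to show how a simpler model can be preferred.

```python
from math import log, pi

def adjusted_r2(rss, tss, n, n_terms):
    """Adjusted R^2; n_terms counts regression terms excluding the intercept."""
    r2 = 1.0 - rss / tss
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_terms - 1)

def aic_gaussian(rss, n, n_terms):
    """AIC of a least-squares fit with normal errors.

    Counts the intercept and the error variance as estimated parameters.
    """
    k = n_terms + 2
    log_lik = -0.5 * n * (log(2 * pi) + log(rss / n) + 1)
    return 2 * k - 2 * log_lik

# Hypothetical sums of squares for two nested models fitted to n = 36 units.
n, tss = 36, 100.0
full = {"rss": 10.0, "n_terms": 5}     # linear, quadratic and interaction terms
reduced = {"rss": 10.5, "n_terms": 2}  # linear terms only

for name, m in (("full", full), ("reduced", reduced)):
    print(name,
          round(adjusted_r2(m["rss"], tss, n, m["n_terms"]), 4),
          round(aic_gaussian(m["rss"], n, m["n_terms"]), 2))
```

In this toy comparison the reduced model loses very little fit, so both criteria favor it, which is the behavior the law of parsimony relies on.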
For the full factorial, the effect of the blocks was not significant. For the 3^k design, it was not necessary to work with pure error. All assumptions were confirmed by visual inspection. For the two designs tested, the linear regression effect was significant (p-value ≤ 0.05), in agreement with the ANOVA for the regression coefficients (t-test), in which only the linear coefficient of the soil density factor had a significant effect on the response variable. However, before excluding the other terms from the model, fit tests are needed to rule out a collinearity effect, since a variable may not have a significant effect in isolation but may still influence the total effect of the model, and its exclusion would make the model less explanatory or less well fitted [20].
In the ANOVA between the models there was no evidence against H0; therefore, the models do not differ significantly (Table 3).

Where: xi is the variable of factor 1 at level i; xj is the variable of factor 2 at level j; DF is the degrees of freedom; p-value is the probability of the calculated F, significant at p ≤ 0.05; R² is the coefficient of determination and adjusted R² is the adjusted coefficient of determination, both for the regression; AIC is Akaike's Information Criterion. Source: Self-elaboration.
Confirming the model selection (Table 3), the highest adjusted R² and the lowest AIC belong to Models 5. The 3^k design identified the same factor and significant effects as the control (full factorial); however, the results estimated by the 3^k model were 2.91% higher. The comparison between the models revealed homogeneous variances (0.01654), with the difference between the residuals quantified by R² = 0.9623. This result is in agreement with Konishi and Kitagawa [25]: models with small variability adapt well to the reduction of experimental units. However, to validate an experiment subject to environmental variation, a greater number of experimental units and repetitions is recommended to minimize the experimental error [15,22,23]. The Response Surface Methodology is an efficient tool for optimizing the properties of processed foods [26]; using mathematical and statistical techniques, experimental results indicate a combination of factor levels within an optimal region [4].

Study II involved regression models with more complex interactions. In the analysis of variance (ANOVA) performed for the control (full factorial), composed of one hundred experimental units, the interaction between the (qualitative) factors was significant (p-value ≤ 0.05); therefore, the interaction was used in the Bartlett and Levene tests, both with p-value > 0.05, meeting the assumption of homogeneity of variance. The Shapiro-Wilk and Kolmogorov-Smirnov tests indicated a normal distribution of residuals, and the Tukey test confirmed that the blocks were not additive to the model. In the development of the 3^k design, thirty-six experimental units were used; the assumptions were met, but the need to work with pure error was also identified. In the RCCD, composed of forty-four experimental units, the assumptions were not met and the model also indicated the need to work with pure error. In all the designs the quadratic regression effect was significant (p-value ≤ 0.05).
All regression models tested (Table 4) showed significant coefficients (t-test with normal error for the full factorial and t-test with pure error for 3^k and RCCD). According to Faraway [27], identifying the need for pure error in the t-test points to the option with the lowest variance and therefore increases the accuracy of the test. The ANOVA between the analyzed regression models revealed that they differed significantly (p-value ≤ 0.05) (Table 4).

To identify the models that best describe the experiment, we looked for the highest adjusted R² and the lowest AIC, found in Models 1 (Table 4), composed of intercept, linear terms, quadratic terms and interaction for the full factorial, 3^k and RCCD. All effects are significant in the model, but the nonlinear model is not well adjusted [18]. Other models were also tested but showed poorer fits and, for didactic purposes, were not included in Tables 2 and 3.

The representation of the three selected models (Figure 1) revealed that, despite the reduction of 64 experimental units, the regression model for the 3^k design, estimated from thirty-six experimental units, identified the same factors and significant effects as the control (full factorial, one hundred experimental units). Robust models are more reliable for the researcher and are not affected by the loss of information [28]. The RCCD (forty-four experimental units) also did so, but showed greater distance from the control. According to Mateus et al. [13], RCCD did not fit well in agronomic data simulation experiments with a coefficient of variation (CV) greater than 6%. The CV of this experiment was 1.8%, and the precision of the RCCD was lower than that of the 3^k. There are indications that estimating the vertex for the RCCD from the regression model of the full factorial may have interfered with the precision [29].
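The pure error used in the t-tests for the 3^k and RCCD designs is, as described in Section II, the sum of squared deviations of the repeated responses around the mean of their own treatment. A minimal sketch of that calculation follows; the function name and the toy responses are hypothetical, and the block effect of the RBD is ignored for simplicity.

```python
from collections import defaultdict

def pure_error(observations):
    """Pure-error sum of squares and degrees of freedom.

    observations: iterable of (treatment, response) pairs; replicates share
    the same treatment key.
    """
    groups = defaultdict(list)
    for treatment, y in observations:
        groups[treatment].append(y)
    ss = 0.0
    df = 0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        ss += sum((y - mean) ** 2 for y in ys)  # sum of (y - ȳ)² within the treatment
        df += len(ys) - 1
    return ss, df

# Toy responses (hypothetical, not the study's data):
# two coded treatments with three replicates each.
data = [((-1, -1), 4.0), ((-1, -1), 5.0), ((-1, -1), 6.0),
        ((1, 1), 10.0), ((1, 1), 10.0), ((1, 1), 13.0)]
ss_pe, df_pe = pure_error(data)
print(ss_pe, df_pe, ss_pe / df_pe)  # sum of squares, degrees of freedom, mean square
```

The mean square (sum of squares divided by degrees of freedom) estimates the error variance without reference to any fitted model, which is why it does not depend on the estimated responses ŷ.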
For Mendonça [14], who used data simulation, the loss of fit of the economic models can be compensated by increasing the number of repetitions of the treatments. In this experiment the four replicates (full factorial, 3^k and RCCD) were maintained for purposes of comparison, with the objective of making large experiments achievable by reducing the number of experimental units. According to [29], all of this is a source of variability within the experiment, and circumventing these problems in the planning and conduct phases of the experiments is fundamental to keeping the experimental error low. Furthermore, knowledge of the statistical tests and of the assumptions for their application is fundamental for the research to be statistically valid. Therefore, economic models are efficient and indicated for identifying significant results in initial or probing tests, especially in experiments without prior knowledge [14]. Besides the basic requirements of the tests and the observation of agronomic assumptions, the particularities of the experiment should be analyzed [10].

Source: Self-elaboration

IV. CONCLUSION
The 3^k design proved to be an economic factorial capable of identifying the same factors and significant effects in the agronomic experiments, reducing the number of experimental units and contributing to the technical and economic viability of larger experiments, as long as the number of repetitions of the treatments is maintained. Data simulation was not used, in order to reflect actual practice.