Multivariate Analysis Applied to Forestry Agricultural Sciences: The Model-Directed Study

This literature review sought articles that exemplify and describe the use of multivariate analysis in different fields of the Forest Agricultural Sciences, focusing on effective practices that apply multivariate statistical techniques to process data simultaneously. For data collection, 70 technical articles were selected for meta-analysis, of which 54 were used in the study of multivariate techniques applied to the agricultural sciences. The results showed that studies in certain areas of the Forest Agricultural Sciences exhibit some regularity in their use of multivariate analysis, and that the most common analyses were Cluster Analysis (AA) and Principal Component Analysis (PCA). Thus, the use of multivariate analysis in studies and experimental evaluations in the Agricultural Sciences proved to be of great value, allowing greater clarity and better interpretation when dealing with complex phenomena.


INTRODUCTION
Statistical data analysis is classified as univariate or multivariate, i.e., it treats variables individually or jointly, respectively. According to VICINI (2005), until the advent of computers data were treated only in isolation, so that when a phenomenon depended on many variables, analyzing them jointly was unfeasible.
Multivariate analysis comprises a large number of methods and techniques that use all variables simultaneously in the theoretical interpretation of the set of data obtained (Neto, 2004). According to Hair et al. (2009), multivariate techniques are popular because they allow organizations to create knowledge and thereby improve their decision-making. Multivariate analysis refers to all statistical techniques that simultaneously analyze multiple measurements on the individuals or objects under investigation.
For Gerhardt et al. (2001), multivariate analysis approaches data through a set of statistical techniques that consider measurements of many variables simultaneously.
To obtain such results, different multivariate methods are applied to the data depending on the research objectives; exploratory data analysis aims to generate hypotheses, which is precisely the goal of multivariate analysis (VICINI, 2005).
Multivariate analysis is a vast field in which even experienced statisticians move carefully, because it is a relatively new area of science in which much remains to be discovered. The art of using multivariate analysis lies in choosing the most appropriate options to detect the expected patterns in the data (MAGNUSSON, 2003).
The purpose of applying these methods may be data reduction or structural simplification, sorting and grouping, investigating the dependence between variables, prediction, and constructing and testing hypotheses (JOHNSON; WICHERN, 1992).
Multivariate techniques can meet the specific needs of a forestry company or a research institution, whether the interest lies in a single property or in a set of properties. Thus, this study aims to quantify and clarify which of the main multivariate analysis tools are used in the various areas of study of the forest agricultural sciences, and how they are used, by reviewing a body of literature.

II. REVIEW
Multivariate analysis combines multiple pieces of information recorded on each experimental unit, so that selection is based on a complete set of important variables that discriminate the most promising materials (Maeda et al., 2001). Because multivariate techniques have numerous applications, it is necessary to know the main ones applied in the Forest Agricultural Sciences, together with their functions and objectives. As the main topics and techniques of multivariate analysis, we list the multivariate normal distribution, matrices and vectors, quadratic forms, eigenvalues and eigenvectors, multivariate analysis of variance (MANOVA), multivariate linear regression models, simultaneous tests on several variables, multivariate distances, principal component analysis, factor analysis, cluster and discriminant analysis, and canonical correlation analysis.

Factor analysis
Factor Analysis (FA) aims to reduce the number of initial variables with the least possible loss of information, using a set of statistical techniques (VICINI, 2005). According to CARVALHO (2013), whenever variables are strongly correlated with one another it is conceivable to combine them into a single group, provided that variables in different groups are weakly correlated.
Factor analysis is applied when there is a large number of correlated variables. It includes principal component analysis and common factor analysis, and its aim is to identify a smaller number of new, uncorrelated variables that in some way summarize the main information in the original variables, i.e., to find factors or latent variables (Mingoti, 2005).
According to Carvalho (2013), the generic factor analysis model is defined by:
$$X - \mu = \Lambda F + \varepsilon \qquad (1)$$
where $X = [X_1\ X_2\ \cdots\ X_p]^T$ is a real random vector of dimension $p$, with mean vector $\mu = [\mu_1\ \mu_2\ \cdots\ \mu_p]^T$ and positive-definite covariance matrix $\Sigma$. In the factor analysis model, each observable variable $X_i$ is expressed as a linear function of $m$ random variables $F_1, F_2, \ldots, F_m$ ($m < p$), called common factors, plus a specific factor or error $\varepsilon_i$, $i = 1, 2, \ldots, p$, which is also a random variable and explains the part of the variance of the respective variable not explained by the common factors. $\Lambda$ is the $(p \times m)$ matrix of factor loadings; the $m$ common factors and the $p$ specific factors are not observable.
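To make equation (1) concrete, the Python sketch below simulates a hypothetical data set from the factor model and compares the sample covariance with the covariance the model implies. It assumes the standard orthogonal factor model, which the text above does not spell out: the common factors $F$ have zero mean and identity covariance, the specific factors $\varepsilon$ have a diagonal covariance matrix $\Psi$ and are uncorrelated with $F$, so that $\Sigma = \Lambda\Lambda^T + \Psi$. The loadings, specific variances and the function name `simulate_factor_model` are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical example: p = 4 observed variables driven by m = 2 common factors.
# Assumed orthogonal factor model: F ~ (0, I), eps ~ (0, Psi) with Psi diagonal,
# and F uncorrelated with eps. Values below are illustrative, not from the source.
rng = np.random.default_rng(0)

mu = np.array([10.0, 5.0, 2.0, 8.0])        # mean vector (length p)
Lam = np.array([[0.9, 0.0],                  # loading matrix Lambda (p x m)
                [0.8, 0.1],
                [0.1, 0.7],
                [0.0, 0.9]])
psi = np.array([0.19, 0.35, 0.50, 0.19])     # specific variances (diagonal of Psi)


def simulate_factor_model(n):
    """Draw n observations of X = mu + Lambda F + eps, i.e. equation (1)."""
    F = rng.standard_normal((n, Lam.shape[1]))               # common factors
    eps = rng.standard_normal((n, len(mu))) * np.sqrt(psi)   # specific factors
    return mu + F @ Lam.T + eps


# Covariance implied by the assumed model: Sigma = Lambda Lambda^T + Psi
Sigma_model = Lam @ Lam.T + np.diag(psi)

X = simulate_factor_model(200_000)
Sigma_sample = np.cov(X, rowvar=False)
print(np.round(Sigma_model, 2))
print(np.round(Sigma_sample, 2))
```

Because the diagonal of $\Lambda\Lambda^T$ holds the communalities, each variable's variance splits into a part explained by the common factors and a specific part, which is the decomposition described above.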

Kaiser-Meyer-Olkin (KMO) method
An important measure of the adequacy of data for factor analysis is the one proposed by Kaiser, Meyer and Olkin (KMO). The KMO test is based on the principle that the inverse of the correlation matrix approaches a diagonal matrix; it therefore compares the correlations between the observed variables (Solomon et al., 2012).
According to VICINI (2005), KMO can be obtained by the following equation:
$$\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^2}{\sum_{i \neq j} r_{ij}^2 + \sum_{i \neq j} a_{ij}^2} \qquad (2)$$
That is, the sum of the squares of the correlations between all variables is divided by that same sum plus the sum of the squares of the partial correlations between all variables, where $r_{ij}$ is the correlation coefficient between the observed variables $i$ and $j$, and $a_{ij}$ is the partial correlation coefficient between the same variables. The $a_{ij}$ should be close to zero, because the factors are orthogonal to each other.
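Equation (2) can be computed directly from a correlation matrix. In this minimal Python/NumPy sketch, the partial correlations $a_{ij}$ are obtained from the inverse correlation matrix $P = R^{-1}$ as $a_{ij} = -p_{ij}/\sqrt{p_{ii}p_{jj}}$, a standard identity that the text does not state; the function name `kmo` and the equicorrelated example matrix are hypothetical.

```python
import numpy as np


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure from a correlation matrix R, equation (2)."""
    R = np.asarray(R, dtype=float)
    P = np.linalg.inv(R)                      # inverse correlation matrix
    d = np.sqrt(np.diag(P))
    A = -P / np.outer(d, d)                   # partial correlations a_ij (off-diagonal)
    off = ~np.eye(R.shape[0], dtype=bool)     # mask keeping only pairs i != j
    r2 = np.sum(R[off] ** 2)                  # sum of squared correlations
    a2 = np.sum(A[off] ** 2)                  # sum of squared partial correlations
    return r2 / (r2 + a2)


# Hypothetical example: 4 variables, every pair correlated at 0.64
p, rho = 4, 0.64
R = np.full((p, p), rho)
np.fill_diagonal(R, 1.0)
print(round(kmo(R), 3))   # prints 0.839
```

In this example each partial correlation equals $0.64/2.28 \approx 0.28$, much smaller than the simple correlations, so the KMO value is high.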
For the data to be suitable for factor analysis, the following should be noted regarding the value obtained from Kaiser's equation:

Bartlett's sphericity test
Another test widely used in factor analysis is Bartlett's test of sphericity (BTS), which tests the hypothesis that the correlation matrix is an identity matrix, i.e., the values on the main diagonal are equal to 1 and all other entries are zero, so that its determinant is equal to 1. This would mean that the variables are uncorrelated; at a significance level of α = 0.05 (5%), the null hypothesis is rejected when the p-value found is less than α (Pereira, 2001).
Bartlett's test evaluates the overall significance of the correlation matrix, i.e., it tests the null hypothesis that the correlation matrix is an identity matrix (Solomon et al., 2012).
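A minimal sketch of Bartlett's test of sphericity, assuming the commonly used chi-square approximation $\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln|R|$ with $p(p-1)/2$ degrees of freedom, where $n$ is the number of observations, $p$ the number of variables and $|R|$ the determinant of the sample correlation matrix. This formula is not given in the text above, and the function name and simulated data are illustrative only. When $|R|$ is close to 1 (uncorrelated variables), $\ln|R|$ is close to 0 and the statistic is small.

```python
import numpy as np


def bartlett_sphericity(X):
    """Bartlett's test of sphericity for an (n x p) data matrix X.

    Returns the chi-square statistic and its degrees of freedom.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)          # sample correlation matrix
    _, logdet = np.linalg.slogdet(R)          # ln|R|, computed stably
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return stat, df


# Hypothetical data: 4 independent variables vs. 4 variables sharing one factor
rng = np.random.default_rng(1)
independent = rng.standard_normal((500, 4))
factor = rng.standard_normal((500, 1))
correlated = factor + 0.5 * rng.standard_normal((500, 4))
print(bartlett_sphericity(independent))
print(bartlett_sphericity(correlated))
```

The p-value can then be obtained from the chi-square distribution, for example with `scipy.stats.chi2.sf(stat, df)` if SciPy is available, and compared with α.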

Principal Component Analysis
The goal of principal component analysis (PCA) is to address issues such as the generation, selection and interpretation of the components under investigation, thereby determining which components are most important. The principal components are estimated from the covariance matrix of the data. Before the analysis is applied, the data must be standardized so that all series have the same order of magnitude. The eigenvectors are then obtained: their values represent the weight of each component on each variable, range from -1 to 1, and function like correlation coefficients, while the eigenvalues represent the contribution of each component to explaining the total variation of the data (Ruhoff et al., 2009).
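The PCA steps described above (standardize the data, obtain eigenvalues and eigenvectors, measure each component's share of the total variation) can be sketched in Python with NumPy as follows. The function name, the simulated variables and the use of `numpy.linalg.eigh` (suited to symmetric matrices) are assumptions for illustration, not the procedure used in the reviewed studies.

```python
import numpy as np


def pca_standardized(X):
    """PCA on standardized data via eigen-decomposition of the correlation matrix."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each variable
    R = np.cov(Z, rowvar=False)                        # equals the correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # ascending order for symmetric R
    order = np.argsort(eigvals)[::-1]                  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()                # share of total variation
    scores = Z @ eigvecs                               # component scores
    return eigvals, eigvecs, explained, scores


# Hypothetical data: x1 and x2 strongly correlated, x3 independent
rng = np.random.default_rng(2)
x1 = rng.normal(50, 10, 100)
x2 = 0.5 * x1 + rng.normal(0, 2, 100)
x3 = rng.normal(0, 1, 100)
data = np.column_stack([x1, x2, x3])
eigvals, eigvecs, explained, scores = pca_standardized(data)
print(np.round(explained, 3))
```

Since the data are standardized, the eigenvalues sum to the number of variables, so each eigenvalue divided by that total gives the proportion of variation explained by its component.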

Cluster analysis
Cluster Analysis (AA) identifies groups of objects in multivariate data. Its goal is to form groups with homogeneous properties from large heterogeneous samples: the groups should be as homogeneous as possible internally, and the differences between them as large as possible (Hair et al., 2005).
AA encompasses a variety of techniques and algorithms whose goal is to find similar data and place them in the same group, distinct from the data in the other groups (VICINI, 2005).
According to Ruhoff et al. (2009), AA seeks to group the data elements that are most similar to each other. The groups are determined so as to obtain homogeneity among the elements within each group and heterogeneity between groups.

Dendrogram
As a result of AA, we obtain the dendrogram, or phenogram, also known as a tree diagram: a graph that summarizes the groups obtained by the analysis. The dendrogram in Figure 1 shows that genetic materials 11 and 8 have the greatest similarity, since they have the smallest Euclidean distance, and therefore form the first group. They are followed by materials 10 and 9, then 1 and 5, and so on; the materials are grouped in descending order of similarity, with material 12 being the last to join. The Euclidean distance between two units A and B, measured on p variables, is:
$$d_{AB} = \sqrt{\sum_{j=1}^{p} (x_{jA} - x_{jB})^2} \qquad (3)$$
In matrix form, this distance is given by:
$$d_{AB} = \sqrt{(\mathbf{x}_A - \mathbf{x}_B)^T (\mathbf{x}_A - \mathbf{x}_B)}$$
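To illustrate how equation (3) feeds a dendrogram, the sketch below computes the Euclidean distance matrix and then merges groups step by step, always joining the two closest groups, which is the order a dendrogram displays. The source does not state which linkage criterion was used; single linkage (nearest neighbor) is assumed here, and the function names and the five hypothetical units are illustrative.

```python
import numpy as np


def euclidean_matrix(X):
    """Pairwise Euclidean distances between the rows of X, equation (3)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def single_linkage_order(X):
    """Naive agglomerative clustering with single linkage (assumed criterion).

    Returns the merges as (cluster_a, cluster_b, distance) in the order they occur;
    each cluster is a frozenset of row indices.
    """
    D = euclidean_matrix(X)
    clusters = [frozenset([i]) for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the closest members of two groups
                d = min(D[a, b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[2]:
                    best = (i, j, d)
        i, j, d = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Five hypothetical units measured on two variables
units = np.array([[0, 0], [0, 1], [5, 5], [5, 6.5], [10, 0]], dtype=float)
for a, b, d in single_linkage_order(units):
    print(sorted(a), sorted(b), round(float(d), 2))
```

In practice, `scipy.cluster.hierarchy.linkage(X, method="single")` together with `scipy.cluster.hierarchy.dendrogram` performs the same grouping and draws the tree; the naive loop above is only meant to show the logic.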

Mahalanobis distance
When assessing the similarity between samples (treatments, individuals, populations) characterized by a set of correlated traits, the distance between any pair of sampling units must take into account the degree of dependence between the variables. To quantify the distance between two populations when there are replicated data, the Mahalanobis distance (d²) is recommended (VICINI, 2005).
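A minimal sketch of the Mahalanobis distance between two populations, assuming the usual definition $D^2 = (\bar{x}_A - \bar{x}_B)^T S^{-1} (\bar{x}_A - \bar{x}_B)$, where $S$ is the pooled within-population covariance matrix estimated from the replications. The formula and the pooling step are standard but not stated in the text; the function names and data are hypothetical. Unlike the Euclidean distance, $S^{-1}$ discounts differences along directions in which the variables are strongly correlated.

```python
import numpy as np


def pooled_covariance(groups):
    """Pooled within-group covariance matrix from replicated observations."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    dof = sum(len(g) - 1 for g in groups)
    return sum((len(g) - 1) * np.cov(g, rowvar=False) for g in groups) / dof


def mahalanobis_d2(mean_a, mean_b, S):
    """Squared Mahalanobis distance between two mean vectors given covariance S."""
    diff = np.asarray(mean_a, dtype=float) - np.asarray(mean_b, dtype=float)
    return float(diff @ np.linalg.solve(S, diff))   # diff^T S^{-1} diff


# Hypothetical populations, 4 replications each, two correlated traits
rng = np.random.default_rng(3)
cov = [[1.0, 0.8], [0.8, 1.0]]
pop_a = rng.multivariate_normal([10.0, 20.0], cov, size=4)
pop_b = rng.multivariate_normal([11.0, 21.0], cov, size=4)
S_pooled = pooled_covariance([pop_a, pop_b])
print(mahalanobis_d2(pop_a.mean(axis=0), pop_b.mean(axis=0), S_pooled))
```

With correlation 0.9 between two traits, for example, a difference of (1, 1) gives $D^2 \approx 1.05$, while (1, -1) gives $D^2 = 20$, even though both have the same Euclidean length.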

Canonical Correlation Analysis
Canonical Correlation Analysis (CCA) has as its main objective the study of the linear relationship between two sets of variables. The analysis summarizes the information in each set of response variables as linear combinations chosen to maximize the correlation between the two sets (Mingoti, 2005). This multivariate model makes it possible to discover the relationship between two groups or sets of variables by maximizing the correlation between the vectors of independent and dependent variables (Burt, 2015).
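The canonical correlations can be computed as the singular values of $S_{xx}^{-1/2} S_{xy} S_{yy}^{-1/2}$, where $S_{xx}$, $S_{yy}$ and $S_{xy}$ are the sample covariance matrices within and between the two sets. This standard formulation is assumed here (the text does not give one), and the Python/NumPy sketch below, with hypothetical function names and simulated data, returns only the correlations, not the canonical weights.

```python
import numpy as np


def _inv_sqrt(S):
    """Inverse symmetric square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X (n x p) and Y (n x q)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1)
    Syy = Y.T @ Y / (n - 1)
    Sxy = X.T @ Y / (n - 1)
    K = _inv_sqrt(Sxx) @ Sxy @ _inv_sqrt(Syy)
    # The singular values of K are the canonical correlations, largest first
    return np.linalg.svd(K, compute_uv=False)


# Hypothetical sets: Y_exact is an exact linear function of X, Y_noise is unrelated
rng = np.random.default_rng(4)
X = rng.standard_normal((200, 3))
B = np.array([[1.0, 0.0], [0.5, 2.0], [0.0, -1.0]])
Y_exact = X @ B
Y_noise = rng.standard_normal((200, 2))
print(canonical_correlations(X, Y_exact))
print(canonical_correlations(X, Y_noise))
```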

Multiple Regression Analysis
Multiple regression describes the changes in the dependent variable that accompany changes in the independent variables. The method is suitable when a single metric dependent variable is related to two or more independent variables (Hair et al., 2005).
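A minimal least-squares sketch of multiple regression, assuming the linear model $y = b_0 + b_1x_1 + \dots + b_kx_k + e$ fitted by ordinary least squares with `numpy.linalg.lstsq`. The function name and the synthetic data (generated without noise so that the true coefficients are recovered) are illustrative.

```python
import numpy as np


def fit_multiple_regression(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bk*xk + error.

    Returns the coefficients [b0, b1, ..., bk] and the coefficient of determination R^2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])   # intercept column + predictors
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    r2 = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return coef, r2


rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(30, 2))            # two hypothetical predictors
y = 2.0 + 3.0 * X[:, 0] - 1.5 * X[:, 1]         # noise-free response
coef, r2 = fit_multiple_regression(X, y)
print(np.round(coef, 3), round(float(r2), 3))
```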

Discriminant analysis
Multiple discriminant analysis consists of a set of methods and tools used to distinguish groups of populations and to classify new observations into those groups; it is used when the groups are known a priori (Mingoti, 2005).
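For two groups known a priori, one classical discriminant method is Fisher's linear discriminant function, sketched below in Python/NumPy. The text does not specify a method, so this choice is an assumption, as are equal covariance matrices in both groups, equal prior probabilities and equal misclassification costs; the function name and simulated groups are hypothetical. A new observation is assigned according to which side of the midpoint between the projected group means it falls on.

```python
import numpy as np


def fisher_lda(group_a, group_b):
    """Fisher's linear discriminant for two groups known a priori.

    Assumes equal covariance matrices, equal priors and equal costs.
    Returns the discriminant coefficients and a classification function.
    """
    A = np.asarray(group_a, dtype=float)
    B = np.asarray(group_b, dtype=float)
    mean_a, mean_b = A.mean(axis=0), B.mean(axis=0)
    na, nb = len(A), len(B)
    # Pooled within-group covariance matrix
    S = ((na - 1) * np.cov(A, rowvar=False)
         + (nb - 1) * np.cov(B, rowvar=False)) / (na + nb - 2)
    w = np.linalg.solve(S, mean_a - mean_b)   # discriminant coefficients
    cutoff = w @ (mean_a + mean_b) / 2         # midpoint of the projected means

    def classify(x):
        # w^T(mean_a - mean_b) > 0, so group A lies above the cutoff
        return "A" if w @ np.asarray(x, dtype=float) >= cutoff else "B"

    return w, classify


rng = np.random.default_rng(6)
group_a = rng.normal(0.0, 1.0, size=(50, 2))   # hypothetical group A
group_b = rng.normal(4.0, 1.0, size=(50, 2))   # hypothetical group B
w, classify = fisher_lda(group_a, group_b)
print(classify([0.2, -0.1]), classify([4.1, 3.9]))
```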

MANOVA / MANCOVA
Multivariate analysis of variance and covariance, also known as MANOVA (multivariate analysis of variance) and MANCOVA (multivariate analysis of covariance), aims to assess the similarity between groups across multiple variables, simultaneously exploring the relationship between several independent variables and two or more metric dependent variables (Hair et al., 2005).
Among the multivariate analyses most used in the selected works, the most important were cluster analysis, used 25 times, followed by Principal Component Analysis (20 times), factor analysis (10 times), Canonical Correspondence Analysis (9 times) and Discriminant Analysis (8 times).
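For the MANOVA described in this section, one common test statistic is Wilks' Lambda $= |W| / |B + W|$, where $W$ and $B$ are the within-group and between-group sums-of-squares-and-cross-products matrices. The sketch below assumes a one-way design and only computes the statistic (no p-value); Wilks' Lambda is one of several MANOVA criteria, and the function name and simulated treatments are hypothetical. Values near 1 indicate similar group mean vectors, and values near 0 indicate differences.

```python
import numpy as np


def wilks_lambda(groups):
    """Wilks' Lambda = |W| / |B + W| for a one-way MANOVA.

    groups: list of (n_k x p) arrays, one per treatment or group.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    grand_mean = np.vstack(groups).mean(axis=0)
    p = groups[0].shape[1]
    W = np.zeros((p, p))   # within-group SSCP matrix
    B = np.zeros((p, p))   # between-group SSCP matrix
    for g in groups:
        centered = g - g.mean(axis=0)
        W += centered.T @ centered
        d = (g.mean(axis=0) - grand_mean)[:, None]
        B += len(g) * (d @ d.T)
    return np.linalg.det(W) / np.linalg.det(B + W)


# Three hypothetical treatments, 8 plots each, two response variables
rng = np.random.default_rng(7)
t1 = rng.normal([10.0, 20.0], 1.0, size=(8, 2))
t2 = rng.normal([10.5, 20.5], 1.0, size=(8, 2))
t3 = rng.normal([13.0, 24.0], 1.0, size=(8, 2))
print(round(float(wilks_lambda([t1, t2, t3])), 3))
```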

III. RESULTS AND DISCUSSIONS
Having identified the multivariate analysis used in each study, we observed that a pattern had been established between the multivariate method used and certain lines of research in the area. Knowing this, we sought to verify this pattern by separating the search by subject and quantifying which multivariate methods were used most.

Multivariate analysis in studies involving soil management
In the 11 works found in this area, a certain regularity can be seen in the use of multivariate analysis, with the same techniques being used to examine the data. Because the use of multivariate analysis presupposes some knowledge, very complex methods were rarely used in the articles examined, in favor of simpler analyses that proved most useful, such as Cluster Analysis (AA) and Principal Components Analysis (PCA), which were the most widely used. However, the determining factor in choosing which multivariate analysis to apply is the purpose of the analysis, which generally involves the simultaneous analysis of multiple sets of factors when one needs to reduce data, identify relationships between variables, or separate groups of similar factors, among other goals. The use of multivariate analysis in studies and experimental evaluations in the Agricultural Sciences proved to be of great value, allowing greater clarity and better interpretation when dealing with complex phenomena.