Evaluation of PPP/GNSS obtained Coordinates Accuracy using a Decision Tree

Abstract: Point positioning on the Earth's surface has become simpler since the advent of positioning systems based on artificial satellites. Today, GPS and GLONASS are the most structured GNSS constellations, although other systems have been built to join the GNSS in recent years. There are different methods for precise positioning with the data transmitted by GNSS satellites, and PPP is one of them. Like the others, PPP uses the observables to produce coordinates and their precisions. Precision, however, is different from accuracy: while precision describes the quality of the data set, accuracy tells us how close the coordinate is to its real position on the ground. Although the correlation between precision and accuracy is implicit in the observables, the processing methods cannot recover it. The purpose of this study was to identify this relationship using the data mining tool known as the Decision Tree. A large set of coordinates with known precision and accuracy had to be created for the recursive training of the Decision Tree, which then became able to predict the accuracy of the coordinates using only their precision.


I. INTRODUCTION
The global satellite navigation systems that make up the Global Navigation Satellite Systems (GNSS) have evolved regularly and steadily over the past decade, which suggests that positioning with artificial satellites will soon reach a new stage. Among these systems, the Global Positioning System (GPS) is at the most advanced stage, completing its modernization with the planned launch of Block III satellites and other investments in ground infrastructure. The Russian Global Navigation Satellite System (GLONASS) is in the final stages of completing its constellation, while the European Galileo system and the Chinese Compass/BeiDou system are in intermediate stages of deployment. Amid these developments, some questions still deserve research because they concern the fundamental positioning technique rather than any particular system. This paper addresses one of them: the relationship between the precision and the accuracy of a position, investigated with data observed by dual-frequency GNSS receivers. The objective of this paper was to study the accuracy of coordinates obtained by the Precise Point Positioning (PPP) method and the feasibility of using them in engineering works that require good accuracy. To understand the behavior of accuracy, PPP processing results obtained over a period of six months were analyzed, taking into account the different sources of error that act on the propagated signal and cause deviations beyond the limits acceptable for engineering purposes. A machine learning technique, the Decision Tree, was applied for this purpose.
This technique uses a database populated with the known accuracies and precisions of a set of previously measured points to induce, through computational training, a Decision Tree capable of estimating the accuracy of a new positioning solution for which only the precision is known.
Different observation methods can be used with receivers of the signals transmitted by the satellites of the GNSS constellations. These methods produce geodetic coordinates of points, with different precisions, practically anywhere on the Earth's physical surface. Among them, the absolute method known as Precise Point Positioning (PPP) allows precise positioning with a single receiver, which records the carrier phase data transmitted by the satellites; these data are then processed together with the precise ephemerides provided by the International GNSS Service (IGS). This makes PPP very useful for determining the coordinates of points that are far from a terrestrial reference network.
Because it is an absolute method, PPP is not connected to the existing terrestrial geodetic networks in the studied region; therefore, the coordinates it determines do not share the adjustment residuals of those networks. In other words, each point determined with PPP is an independent point with its own accuracy. However, any work that uses the coordinates of such a point will certainly connect them to the existing terrestrial geodetic networks, which can be a problem if their accuracy is inadequate. The Brazilian Institute of Geography and Statistics (IBGE) offers an online PPP service (IBGE-PPP) that processes GNSS data collected with dual-frequency receivers and returns the coordinates of the measured point. These coordinates are referred to the Geocentric Reference System for the Americas (SIRGAS2000) and to the International Terrestrial Reference Frame (ITRF). According to HOFMANN et al. (2007) [1], the PPP method determines the coordinates through a least squares adjustment and provides statistical indicators of the precision of the adjusted solution. Accuracy, however, is different from precision, so there is some risk in accepting the coordinates resulting from PPP processing based only on their precision. In many situations, coordinates determined with very high precision do not have good accuracy and therefore do not represent the true position of the point on the Earth's physical surface. This happens when the raw data acquired by the receiver contain some kind of perturbation, such as multipath, which, according to MONICO (2008) [2], is a local interference capable of degrading the phase and code observables and of producing coordinates for a point that are far from its real position on the ground.
Thus, the study presented here was developed to find an indirect way of estimating how much the precision and the accuracy of a PPP-GNSS positioning solution differ.
The main hypothesis of this paper is that, once the correlation between accuracy and precision in a significant set of GNSS data is known, the accuracy of a new measurement can be predicted from its precision using the Machine Learning technique known as the Decision Tree.
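The gap between precision and accuracy that motivates this hypothesis can be illustrated with a toy least squares case. The sketch below is a minimal illustration, not the IBGE-PPP processing: it estimates a single coordinate from repeated direct observations that share a hypothetical systematic offset, standing in for an unmodelled effect such as multipath. All numbers are invented. The formal standard deviation, which is what an adjustment reports as precision, stays at the millimetre level, while the true error, computable only because the reference value is assumed known, stays at the size of the offset.

```python
import math


def ls_single_parameter(observations):
    """Least squares estimate of one unknown observed directly n times.

    For equally weighted direct observations the least squares solution is the
    arithmetic mean; its a-posteriori standard deviation is s0 / sqrt(n), where
    s0 is the standard deviation of unit weight computed from the residuals.
    """
    n = len(observations)
    estimate = sum(observations) / n
    residuals = [obs - estimate for obs in observations]
    s0 = math.sqrt(sum(v * v for v in residuals) / (n - 1))  # n - 1 degrees of freedom
    return estimate, s0 / math.sqrt(n)


# Invented values for illustration only (metres).
TRUE_VALUE = 100.000        # assumed known, as at a reference station
SYSTEMATIC_OFFSET = 0.350   # hypothetical common bias, e.g. unmodelled multipath
NOISE = [0.004, -0.004, 0.002, -0.002, 0.000]  # small random scatter

observations = [TRUE_VALUE + SYSTEMATIC_OFFSET + e for e in NOISE]
estimate, sigma = ls_single_parameter(observations)

precision = sigma                             # what the adjustment reports
accuracy_error = abs(estimate - TRUE_VALUE)   # requires the known true value

print(f"precision (formal sigma): {precision:.4f} m")
print(f"true error (accuracy):    {accuracy_error:.4f} m")
```

Here the formal sigma is about 1.4 mm while the true error is 0.35 m: the adjustment alone gives no hint of the bias.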

The Precise Point Positioning Technique (PPP)
PPP is a method in which the coordinates of the receiver are calculated directly from the coordinates of the satellites. It is an absolute positioning method, and for this reason PPP is also known as Precise Absolute Positioning. The georeferenced coordinates obtained with this method are not associated with any existing planimetric or altimetric network on the Earth's surface; therefore, according to IBGE (2013) [3], PPP coordinates can present significant differences with respect to the vertices of these terrestrial networks. In other words, coordinates determined with PPP may have unacceptable accuracy. PPP is similar to simple absolute positioning, but there are some fundamental differences. One remarkable difference is that in PPP the receiver coordinates are calculated from the precise ephemerides made available by the IGS or a similar institution, whereas simple absolute positioning uses the broadcast ephemerides transmitted by the satellites. In addition, the PPP calculation accounts for the movement of tectonic plates, solid Earth tides, satellite and receiver clock errors, and the phase center offsets of the satellite and receiver antennas in order to obtain coordinates with good accuracy. Another important difference is that PPP uses, in addition to C/A code data, the L1 and L2 carrier phase data, which requires a dual-frequency receiver. Only this type of receiver provides the data needed to model the ionosphere and to form the combination known as ionosphere-free, or ionofree, which, according to XU (2016) [4], eliminates the effects of the ionosphere by combining the code and carrier phase equations.
International Journal of Advanced Engineering Research and Science (IJAERS)

It is a linear combination of the observations that is extremely useful for eliminating the (first-order) errors produced by ionospheric refraction as the signals cross the Earth's atmosphere on their way to the receiver. SILVA and SEGANTINE (2015) [5] estimate the precision of the PPP method at 5 to 10 cm, although some tests show that it can reach 2 to 5 cm, especially when data are collected for more than two hours and the solution converges. The Brazilian Institute of Geography and Statistics (IBGE) began offering the PPP method in Brazil around the year 2000, through the link http://www.ppp.ibge.gov.br/ppp.htm. Strictly speaking, the PPP method can also be applied to data collected with single-frequency receivers, which acquire data from a single carrier. In this case, other mathematical resources are used to model the ionosphere, since the carrier phases cannot be combined. This case was not addressed in this study.
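The ionosphere-free combination mentioned above can be written, for GPS pseudoranges P1 and P2 on L1 and L2, as P_IF = (f1² P1 − f2² P2) / (f1² − f2²). Because the first-order ionospheric delay scales with 1/f², it cancels in this combination while the geometric range is preserved. The sketch below checks this numerically with an invented range and delay; the frequencies are the standard GPS L1/L2 values, and the same coefficients apply to carrier phases expressed in metres (where the ionospheric term has the opposite sign). Higher-order ionospheric terms, ambiguities and all other error sources are deliberately ignored.

```python
F_L1 = 1575.42e6  # GPS L1 carrier frequency, Hz
F_L2 = 1227.60e6  # GPS L2 carrier frequency, Hz


def iono_free(p1, p2, f1=F_L1, f2=F_L2):
    """First-order ionosphere-free combination of two observations in metres."""
    return (f1**2 * p1 - f2**2 * p2) / (f1**2 - f2**2)


def iono_free_coefficients(f1=F_L1, f2=F_L2):
    """Coefficients (a, b) such that IF = a * P1 - b * P2."""
    d = f1**2 - f2**2
    return f1**2 / d, f2**2 / d


# Invented example: geometric range plus a first-order ionospheric delay.
rho = 22_000_000.0                      # metres (illustrative)
iono_l1 = 5.0                           # delay on L1, metres (illustrative)
iono_l2 = iono_l1 * (F_L1 / F_L2) ** 2  # same electron content, scaled by 1/f^2

p1 = rho + iono_l1
p2 = rho + iono_l2
print(iono_free(p1, p2) - rho)    # close to zero: the delay cancels
print(iono_free_coefficients())   # roughly (2.546, 1.546)
```

The price of removing the ionosphere is noise amplification: with these coefficients, independent noise of equal size on P1 and P2 grows by a factor of about three in P_IF.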

Decision Tree
Machine Learning is the ability of a computer system, trained on a large amount of data, to learn how to execute a certain task and then to execute it with better performance. WITTEN and FRANK (2005) [6] understand that the system modifies itself and automatically learns about a certain event, so that a task from the same group of tasks can be performed more effectively the next time. It is a process that automatically or semi-automatically identifies the patterns implicit in large amounts of data. Because of this capacity, Machine Learning techniques are increasingly used to deal with highly complex problems that are difficult to conceptualize in areas such as Mathematics, Medicine, Biology, and Engineering. ZHAN-LI et al. (2015) [7] demonstrated that this process can identify and synthetically recover three-dimensional points lost during the capture of a sequence of video images, a process conceptually very close to the classification of points determined by the PPP method. Current Machine Learning techniques include:
1) The Neural Network, or Multi Layer Perceptron network, suited to multiclass classification of events, in which the number of learning examples is typically large.
2) Support Vector Machines, which are extremely fast but have the disadvantage of natively solving only binary problems, involving only two classes.
3) The Decision Tree, designed to work with a large number of multivariate examples that serve as training examples. It can also interpret the rules implicit in this data set and then use them in a prediction process that assigns a new event to a class according to the similarity of its characteristics to those of the examples used in the training stage.
The accuracy of coordinates obtained with GNSS, especially with the PPP method, can be regarded as a complex problem, since PPP is dissociated from the existing geodetic networks on the Earth's surface. For this reason, the Decision Tree is an appropriate tool to make explicit the positioning accuracy implicit in the observed data. According to [8], a Decision Tree is induced (created) from a reliable database as a recursively constituted data structure made up of decision nodes, which correspond to a test on a variable, and leaf nodes, which correspond to the resulting classes, as shown in Figure 1.
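The structure described above, with decision nodes that test one variable and leaves that hold a class, can be sketched in a few lines. The example is a hypothetical, hand-built tree: the attribute names (pct_rejected_gps, sigma_h_m), thresholds and classes are invented for illustration and are not the tree induced in this study. It classifies an instance by walking from the root to a leaf and lists the root-to-leaf rules that represent the tree.

```python
from dataclasses import dataclass
from typing import List, Mapping, Tuple, Union


@dataclass
class Leaf:
    label: str  # resulting accuracy class


@dataclass
class Node:
    attribute: str   # variable tested at this decision node
    threshold: float
    left: "Tree"     # branch taken when value <= threshold
    right: "Tree"    # branch taken when value > threshold


Tree = Union[Leaf, Node]


def classify(tree: Tree, instance: Mapping[str, float]) -> str:
    """Walk from the root, testing one variable per node, until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.left if instance[tree.attribute] <= tree.threshold else tree.right
    return tree.label


def rules(tree: Tree, path: Tuple[str, ...] = ()) -> List[Tuple[Tuple[str, ...], str]]:
    """One rule per leaf: the conjunction of tests on the path from the root."""
    if isinstance(tree, Leaf):
        return [(path, tree.label)]
    return (rules(tree.left, path + (f"{tree.attribute} <= {tree.threshold}",))
            + rules(tree.right, path + (f"{tree.attribute} > {tree.threshold}",)))


# Hypothetical tree: attribute names, thresholds and classes are invented.
EXAMPLE_TREE = Node("pct_rejected_gps", 5.0,
                    Node("sigma_h_m", 0.02, Leaf("A"), Leaf("B")),
                    Leaf("C"))

print(classify(EXAMPLE_TREE, {"pct_rejected_gps": 2.0, "sigma_h_m": 0.01}))
for conditions, label in rules(EXAMPLE_TREE):
    print(" AND ".join(conditions), "=>", label)
```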
To classify a measurement consisting of GNSS observables, the process starts at the root and follows the test nodes until a leaf is reached, at which point the classification takes place. Each Decision Tree can therefore be represented by a set of rules, each of which starts at the root of the tree and follows a path to one of its leaves. Like any other automated and repetitive procedure, the Decision Tree has advantages and disadvantages. Among its advantages, the following can be highlighted:
1) The Decision Tree is easy to create and to interpret.
2) It does not require "a priori" definitions for any parameter of the data under analysis.
3) Decision tree induction algorithms are considered unstable and sensitive to variations in the training data; however, a sufficient number of examples, a good-quality database, and careful control of the training minimize weak results at the decision points of the tree (decision nodes) and prevent inference errors from spreading to all subsequent branches.
4) The Decision Tree can classify alphabetic, numerical, and alphanumeric data simultaneously, provided that the output attribute is always an alphabetic (nominal) class.
After being recursively trained, a Decision Tree produces as its result a stratification of the data into classes. According to RICH & KNIGHT (1991) [9], classification is an important component in solving many problems and, in its simplest form, can be regarded as a direct recognition task. From the machine learning point of view, classifying is the process of assigning to a given datum the name of the class to which it belongs. Before the Decision Tree induced in this study could classify the accuracy of the solutions for new positioning points, some tasks had to be carried out. First, a set of coordinates with known precision and accuracy was organized, and the Decision Tree was intensively trained on these data until it established the inference rules intrinsic to them; in that way, the tree became able to perform the classification.
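The recursive training described above can be illustrated with a simplified induction routine. This is a sketch under stated assumptions, not WEKA's implementation: it splits numeric attributes at midpoints between consecutive sorted values and chooses the split with the largest information gain (the criterion of the ID3/C4.5 family), whereas C4.5 and WEKA's J48 use the gain ratio and also prune the tree. The training rows and attribute names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(rows, labels, attributes):
    """Return (gain, attribute, threshold) of the best binary split, or None."""
    base = entropy(labels)
    best = None
    for attr in attributes:
        values = sorted({row[attr] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2  # candidate cut between consecutive values
            left = [y for row, y in zip(rows, labels) if row[attr] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[attr] > threshold]
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = base - remainder
            if best is None or gain > best[0]:
                best = (gain, attr, threshold)
    return best


def induce(rows, labels, attributes, depth=0, max_depth=5):
    """Recursively grow a tree of dict nodes; leaves hold the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return {"leaf": majority}
    split = best_split(rows, labels, attributes)
    if split is None or split[0] <= 0:
        return {"leaf": majority}
    _, attr, threshold = split
    left_idx = [i for i, row in enumerate(rows) if row[attr] <= threshold]
    right_idx = [i for i, row in enumerate(rows) if row[attr] > threshold]
    return {
        "attr": attr,
        "thr": threshold,
        "le": induce([rows[i] for i in left_idx], [labels[i] for i in left_idx],
                     attributes, depth + 1, max_depth),
        "gt": induce([rows[i] for i in right_idx], [labels[i] for i in right_idx],
                     attributes, depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the tests from the root to a leaf and return its class."""
    while "leaf" not in tree:
        tree = tree["le"] if row[tree["attr"]] <= tree["thr"] else tree["gt"]
    return tree["leaf"]


# Invented training examples (not the study's data).
ROWS = [
    {"pct_rejected": 1.0, "sigma_h_m": 0.01},
    {"pct_rejected": 2.0, "sigma_h_m": 0.02},
    {"pct_rejected": 8.0, "sigma_h_m": 0.01},
    {"pct_rejected": 9.0, "sigma_h_m": 0.03},
]
LABELS = ["A", "A", "C", "C"]

tree = induce(ROWS, LABELS, ["pct_rejected", "sigma_h_m"])
print(tree)
print(predict(tree, {"pct_rejected": 3.0, "sigma_h_m": 0.05}))
```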

II. MATERIALS: STUDY AREA AND DATA SET
In this paper, a reference database composed of a multivariate dataset was prepared. This dataset was used to create the Decision Tree and then to enable it to predict the accuracy of the results. The data of the reference bank were acquired from three geodetic stations (EESC, SPBO and SPC1) located in the state of São Paulo, as shown in Figure 2. According to IBGE (2013) [3], the result of a PPP positioning converges after two hours of stored data, and one of the objectives of this study was to analyze one hour of data with the same convergence pattern. Therefore, each data file spans three hours: the first file of the day contains data from 4:00 a.m. to 7:00 a.m., the second from 5:00 a.m. to 8:00 a.m., and so on, up to the last file of the day, which contains data from 3:00 p.m. to 6:00 p.m. In this way, 12 files were prepared each day, covering the daytime period from 4:00 a.m. to 6:00 p.m., which corresponds to business hours, when most companies engaged in georeferencing acquire the data they use in engineering services. Strictly speaking, PPP-IBGE processing provides the first eleven variables, and the last six variables are obtained by crossing the data. The accuracy of the coordinates, for instance, was obtained by comparing the measured coordinates with the known coordinates of each geodetic station. The crossed variables were created to highlight the points that matter most in the Decision Tree training, namely the percentages of rejected GPS and GLONASS epochs, whose proportion is directly related to the precision of the positioning result.
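The session scheme described above (twelve overlapping three-hour windows per day, starting hourly from 4:00 a.m. to 3:00 p.m.) and two kinds of crossed variables can be sketched as follows. The function names, the use of Cartesian (X, Y, Z) coordinates with a 3D Euclidean distance as the accuracy measure, and the way rejected epochs are counted are assumptions for illustration; the paper does not specify them. The date used in the example is arbitrary.

```python
import math
from datetime import datetime, timedelta


def session_windows(day, first_start_hour=4, last_end_hour=18, length_h=3, step_h=1):
    """Overlapping observation windows for one day (defaults follow the scheme in the text)."""
    start = day.replace(hour=first_start_hour, minute=0, second=0, microsecond=0)
    limit = day.replace(hour=last_end_hour, minute=0, second=0, microsecond=0)
    windows = []
    while start + timedelta(hours=length_h) <= limit:
        windows.append((start, start + timedelta(hours=length_h)))
        start += timedelta(hours=step_h)
    return windows


def positional_error(measured_xyz, known_xyz):
    """Assumed accuracy measure: 3D distance (m) between PPP and known coordinates."""
    return math.dist(measured_xyz, known_xyz)


def rejected_pct(rejected_epochs, total_epochs):
    """Assumed crossed variable: percentage of rejected epochs for one constellation."""
    return 100.0 * rejected_epochs / total_epochs if total_epochs else 0.0


windows = session_windows(datetime(2016, 1, 4))
print(len(windows), windows[0], windows[-1])
```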

Decision Tree Induction Software
To interpret the 17 variables produced in the PPP processing and to identify how they are related, we used the open-source software developed by Professors Ian H. Witten and Eibe Frank of the University of Waikato, New Zealand, known as WEKA (Waikato Environment for Knowledge Analysis), version 3.8.

Fig.3: Example of a Decision Tree.
This software was chosen because it can work with large volumes of data and offers different Machine Learning techniques, including Decision Trees. It made it easy to build several decision trees for this study, such as the example in Fig. 3, until a version suitable for the classification was reached.
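WEKA reads data in its ARFF text format, with a header declaring each attribute and a nominal class attribute listing its allowed values. The helper below is a hypothetical sketch of how the reference bank could be exported for WEKA; the relation name, attribute names and example row are invented, only a handful of the 17 input attributes are shown, and only the class values A, B, C, D and Z come from this paper. Missing values are written as '?', as ARFF expects.

```python
def to_arff(relation, numeric_attributes, class_name, class_values, rows):
    """Build ARFF text with numeric input attributes and one nominal class attribute.

    Each row lists the numeric values in attribute order (None = missing),
    followed by its class label.
    """
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in numeric_attributes]
    lines.append(f"@attribute {class_name} " + "{" + ",".join(class_values) + "}")
    lines += ["", "@data"]
    for *values, label in rows:
        fields = ["?" if v is None else str(v) for v in values]  # '?' marks a missing value
        lines.append(",".join(fields + [label]))
    return "\n".join(lines) + "\n"


# Hypothetical attribute names; the study used 17 inputs plus the class.
arff = to_arff(
    "ppp_reference_bank",
    ["sigma_lat_m", "sigma_lon_m", "sigma_h_m", "pct_rejected_gps", "pct_rejected_glonass"],
    "accuracy_class",
    ["A", "B", "C", "D", "Z"],
    [(0.004, 0.006, 0.012, 1.5, None, "A")],
)
print(arff)
```

A file like this can be opened in the WEKA Explorer and used to train a tree learner such as J48; the paper does not state which tree algorithm was selected.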

Accuracy Classes
During the computational training of a Decision Tree, the system creates classification rules from known situations in order to predict new events. For this reason, the Decision Tree must be told the interval of each class to be considered. When working with geodetic stations that have known coordinates, it is always possible to assess the quality of a PPP positioning. Comparing the coordinates determined by PPP with the known coordinates makes it possible to establish the classes and their amplitudes, which must be respected in the prediction process. In this study, the following accuracy classes were defined for the training of the trees:
dx.doi.org/10.22161/ijaers.5.12.16  ISSN: 2349-6495(P) | 2456-1908(O) www.ijaers.com

The reference bank used for the Decision Tree training contained only the known information from the three geodetic stations, which were thus the known reference in the process. It was populated with 396 measurement sessions, each containing the 17 attributes mentioned above. The known accuracy class of each session was assigned by the researcher and became the 18th attribute in the database. The following figure shows the accuracy implicit in the Decision Tree training data. It also shows that a user who relies only on precision is not aware of the accuracy of the result and is therefore exposed to the risk of accepting as reliable some sets of coordinates that are very distant from the real position of the point on the ground. The Decision Tree interprets what really matters in a positioning, which is the accuracy of the result.
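Assigning the known accuracy class to each training session, as described above, amounts to mapping the measured positional error onto class intervals and appending the result as the 18th attribute. The class intervals are not listed in this text, so the limits below are hypothetical placeholders, not the intervals used in the study; only the labels A, B, C, D and Z come from the paper, and Z is assumed here to collect every error beyond the last limit.

```python
# HYPOTHETICAL upper limits in metres; replace with the study's actual class intervals.
PLACEHOLDER_LIMITS_M = (("A", 0.05), ("B", 0.10), ("C", 0.20), ("D", 0.50))


def accuracy_class(error_m, limits=PLACEHOLDER_LIMITS_M, overflow_label="Z"):
    """Return the first class whose upper limit is not exceeded by the error."""
    if error_m < 0:
        raise ValueError("positional error must be non-negative")
    for label, upper in limits:
        if error_m <= upper:
            return label
    return overflow_label  # assumed catch-all class


def label_session(attributes, error_m):
    """Append the class as the last attribute of a session record (17 inputs -> 18 fields)."""
    return list(attributes) + [accuracy_class(error_m)]


print(accuracy_class(0.03), accuracy_class(0.3), accuracy_class(1.2))
```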

Fig.4: Precisions and Accuracies in GNSS data.
III. VALIDATION TEST
Whenever the Decision Tree is triggered to classify the accuracy of a new PPP positioning solution for which only the precision is known, it follows the rules implicit in the reference bank, identifies the links that may exist among the 17 variables of this new measurement, and predicts the 18th variable, the still-unknown accuracy of the new solution, as described by WITTEN and FRANK (2005) [6]. As this is an inferential method, there is always a probability of error, directly associated with the quality of the reference bank.
To verify the quality of the Decision Tree's predictions, a specific validation stage was carried out.
To carry out the validation test, a new set of GNSS data could be acquired at any randomly chosen location inside the triangle formed by the EESC, SPBO and SPC1 geodetic stations, which had already been used in the training stage. However, if the chosen location had no control, we would have to accept the prediction made by the Decision Tree without any means of verifying its quality, which is precisely the object of the validation. To know exactly how well the Decision Tree classified the new data, we decided to use a fourth RBMC geodetic station, located inside the area formed by the three initial ones. The classification made by the Decision Tree for the data collected at this fourth station was used to validate the quality of its predictions. The station chosen for the validation test was SPPI. The data acquired at this station were used to organize 33 observation sessions spread from January to May 2016, on days different from those used to build the reference bank. The data files of each session were organized in the same way as the training files, that is, with three hours each, a period sufficient for the convergence of the PPP processing. These 33 files were submitted online for PPP processing on the IBGE website. The table below shows the differences (m) between the coordinates calculated in each session and the known coordinates of the SPPI station; these differences give the actual accuracy of each session. The last column on the right presents the accuracy prediction made by the Decision Tree through the alphabetic characters A, B, C, D and Z, which represent the class estimated for each session, according to item 2.2.

At this point, it should be remembered that in the training stage 18 attributes were used for each measurement session, the last one being precisely the known accuracy class of the coordinates. This point is emphasized because, in the validation step, each instance representing a measurement session was organized with only 17 attributes, leaving the 18th attribute, the accuracy, for the Decision Tree to predict.
All new instances could be validated because the SPPI station has known coordinates. The validation test produced 29 correct predictions out of 33, a success rate of 88% (29/33 ≈ 87.9%), slightly above the initial expectation, which had given the Decision Tree a confidence level of 86%.
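The success rate reported above is simply the proportion of correct class predictions. The short sketch below computes that rate and a confusion count from lists of predicted and actual classes; the helper names are illustrative, and the two short lists in the example are invented, since the per-session table is not reproduced here. Only the 29/33 figure comes from the text.

```python
from collections import Counter


def hit_rate(predicted, actual):
    """Fraction of sessions whose predicted class equals the actual class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def confusion_counts(predicted, actual):
    """Counter keyed by (actual, predicted) class pairs."""
    return Counter(zip(actual, predicted))


# Counts reported in the text: 29 correct predictions out of 33 sessions.
reported_rate = 29 / 33
print(f"{reported_rate:.1%}")  # 87.9%

# Invented toy example of the two helpers.
pred = ["A", "B", "B", "C"]
true = ["A", "B", "C", "C"]
print(hit_rate(pred, true), confusion_counts(pred, true)[("C", "B")])
```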

IV. CONCLUSIONS
As predicted by WITTEN and FRANK (2005) [6], the results obtained in creating the Decision Tree improved as crossed data were introduced in addition to the initial data. Variables 12 and 13 were introduced to express, respectively, the proportion of rejected GPS epochs and the proportion of rejected GLONASS epochs; they also make evident the degree of participation of each positioning system in the final result. In addition, these variables show the proportion of each system's data that was actually used, which helped condition the Decision Tree for future interpretations. It was confirmed that the precision of GNSS measurements is indeed very different from their accuracy, and Figure 4 shows this difference very clearly. The figure also shows that the relationship between accuracy and precision is not deterministic; therefore, each positioning result has to be checked individually, otherwise a bad result may be accepted as good. The Decision Tree is a tool that allows the user to anticipate the correlation between accuracy and precision. In this study, PPP-GNSS data processing reached decimetric-order accuracy, as already estimated by SILVA and SEGANTINE (2015) [5]. This level of quality puts the method on an equal footing with other precise positioning methods and in a much better position than was initially assumed. The results obtained are satisfactory and entirely within the expected range, since both the set of precisions and the set of accuracies behaved very similarly across sessions. Only 4 accuracy values in the 396 measurement sessions did not follow the behavior of the group, although they were better than 2 centimeters, which is not significant for the study, according to LINOFF and BERRY (2011) [10].
From the results of this study, which used six months of data to show that the accuracy of coordinates is a more important parameter than their precision, it can be concluded that:
It was clearly demonstrated that accuracy is different from the precision that accompanies the coordinates calculated by any GNSS positioning method, including the PPP-GNSS method, as shown by the distance between them in Figure 4.
The 396 measurement sessions used to create and train the Decision Tree showed a correlation between precision and accuracy in the GNSS data, suggesting that there may