An algorithm for three-dimensional indoor positioning based on Bayesian inference, Fingerprinting method and Wi-Fi technology

Wireless indoor positioning systems have been shown to be very useful in many applications and have been the subject of a considerable amount of research, mainly concerning the two-dimensional (2D) case. However, in many practical situations it is necessary to determine the three-dimensional (3D) coordinates of an object or user. In this paper, a hybrid algorithm for implementation in a 3D indoor positioning system is proposed. This algorithm is implemented using a fingerprinting technique based on both the k-means and naive Bayes methods, and uses the received signal strength (RSS) as an input parameter. In addition, the proposed algorithm is compared with the main algorithms discussed in previous research papers. Indoor positioning experiments were conducted in a typical two-floor building (180 m²) with four access points (APs). The proposed algorithm performed better than the other algorithms, with a mean error of around 1.80 m.


I. INTRODUCTION
Technological development has made the location of both people and objects in outdoor environments possible by means of tools such as the Global Positioning System (GPS). However, these tools are not efficient for indoor location, because the variability of indoor environments is high compared with that of outdoor environments. This variability depends not only on the characteristics of the antennas, but also on the type of construction and on internal structures such as walls, floors, and partition walls. Because of the numerous applications of indoor positioning systems (IPSs), including emergency systems, localization of mobile robots, and navigation assistance in malls, schools, university campuses, airports, and hospitals, several studies have been conducted on location estimation in this type of environment. A survey of the main implementation techniques shows that they can be classified into three typical location estimation schemes: triangulation, scene analysis (fingerprinting), and proximity. These schemes are discussed in [1] and [2], whereas [3] and [4] proposed the application of artificial neural networks (ANNs) to the problem. The k-nearest neighbours (k-NN) algorithm and the Bayes method were addressed in [5], [6], [7], [8] and [9]. In this paper, we propose a three-dimensional (3D) solution to this problem through a hybrid algorithm. This algorithm is implemented using a fingerprinting technique based on both the k-means and naive Bayes methods, and uses the received signal strength (RSS) as an input parameter. Furthermore, we compare the proposed method with the ANN multilayer perceptron (MLP) and radial basis function (RBF) algorithms, with the k-NN and Bayes (histogram and kernel) methods, and with triangulation (lateration), one of the most commonly used methods for outdoor location.
The experiments were conducted in two distinct scenarios, both in a typical building with a total area of 180 m² (first floor 131 m², second floor 49 m²) and four APs. In the first scenario, actual measurements were performed and the RSS values were obtained in predefined regions. In the second scenario, the signal behavior was simulated with the COST 231 multi-wall model (MWM). The rest of this paper is structured as follows: Section II presents an overview of the main techniques and technologies related to indoor positioning; the proposed algorithm is discussed in Section III; Section IV presents the main discussions and computational results; and Section V summarizes the results and presents suggestions for future work.

Triangulation [2] is classified into two categories:

• Lateration: estimates the user location by calculating the distance between the mobile unit and a set of APs with known coordinates. 3D positioning requires at least four APs. A sphere whose radius r equals the estimated distance is centered on each AP, and the user location is defined by the intersection of these spheres. Figure 1 shows this concept.

Fig. 1: Indoor Positioning System Based on Lateration
• Angulation: the location of the target is estimated from the intersection of several pairs of lines, each drawn from one of a set of APs along the measured direction (angle) of the incoming signal. Figure 2 shows this concept.

Fig. 2: Indoor Positioning System Based on Angulation
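As a minimal sketch of 3D lateration, assuming the AP-to-target distances have already been estimated (for example, from RSS attenuation), the sphere equations can be linearized by subtracting the first one from the others and then solved by least squares. The function names, AP coordinates, and target used here are hypothetical. With four APs that are not coplanar the linear system has a unique solution; four APs mounted at the same height leave the z coordinate undetermined, which the sketch reports as an error.

```python
from typing import Sequence

Point3D = tuple[float, float, float]


def solve3(m: list[list[float]], v: list[float]) -> Point3D:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [val] for row, val in zip(m, v)]  # augmented matrix
    n = 3
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("APs are (nearly) coplanar: position not uniquely determined")
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return (x[0], x[1], x[2])


def lateration_3d(aps: Sequence[Point3D], dists: Sequence[float]) -> Point3D:
    """Estimate (x, y, z) from sphere equations (x-xi)^2+(y-yi)^2+(z-zi)^2 = di^2."""
    (x1, y1, z1), d1 = aps[0], dists[0]
    k1 = x1 * x1 + y1 * y1 + z1 * z1
    rows, rhs = [], []
    # Subtracting sphere 1 from sphere i removes the quadratic terms in x, y, z
    for (xi, yi, zi), di in zip(aps[1:], dists[1:]):
        rows.append([2 * (xi - x1), 2 * (yi - y1), 2 * (zi - z1)])
        rhs.append(d1 * d1 - di * di + xi * xi + yi * yi + zi * zi - k1)
    # Normal equations (A^T A) p = A^T b; exact when there are exactly four APs
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(3)]
    return solve3(ata, atb)
```

With noisy, RSS-derived distances the spheres rarely meet at a single point, so the least-squares solution gives the point that best fits all of them; additional APs simply add rows to the system.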
The most common methods used to estimate distance in triangulation are [1]:

• Time of Arrival (TOA): the distance between the mobile unit and the measuring unit is directly proportional to the signal propagation time. For 3D positioning, TOA measurements must be made with respect to the signals of at least four APs.
• Time Difference of Arrival (TDOA): a variation of TOA in which the position of the mobile unit is estimated from multiple measurements of the differences between the arrival times of the signals at the receiver.
• Received Signal Strength (RSS): this technique is based on the attenuation of the signal between multiple transmitters and the receiver. The difference between the transmitted and received signal strength is converted into a distance by means of empirical and theoretical propagation models, some of which are discussed in [10].

A comparison of these methods, including parameters such as cost and accuracy, is presented in [11].

2.1.2 FINGERPRINT
Fingerprinting is a technique based on pattern recognition that divides the operation of the location system into two phases: offline and online. In the offline phase, the RSS vectors of all detected Wi-Fi signals from different access points are collected at many known locations (reference points, RPs) [8]. In the online phase, the RSS values are read and a classification algorithm compares them with the values stored in the database during the offline phase, thus obtaining the location. Figure 3 summarizes this procedure.

position is given sectorally, that is, the system returns the room in which the user may be at a given time. Technologies used by this method include radio frequency identification and Bluetooth [1].
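The two-phase fingerprinting procedure of Section 2.1.2 can be sketched with k-NN matching, one of the classifiers compared in this paper (the experiments use k = 2 and k = 3 with the Euclidean distance). This is a minimal sketch, not the authors' implementation: the database layout, the function name `knn_locate`, and the unweighted averaging of the neighbours' coordinates are assumptions.

```python
import math
from typing import Sequence

Point3D = tuple[float, float, float]
# One offline record: (RSS vector over all APs, 3D coordinate of the RP)
Fingerprint = tuple[Sequence[float], Point3D]


def knn_locate(database: Sequence[Fingerprint], online_rss: Sequence[float], k: int = 3) -> Point3D:
    """Online phase: match an RSS reading against the offline database.

    The k RPs whose stored RSS vectors are closest to the reading
    (Euclidean distance in signal space) are selected, and their
    coordinates are averaged to give the position estimate.
    """
    ranked = sorted(database, key=lambda fp: math.dist(fp[0], online_rss))
    nearest = ranked[:k]
    return tuple(sum(coord[i] for _, coord in nearest) / len(nearest) for i in range(3))


# Hypothetical offline database with four RPs and four APs (RSS in dBm)
database = [
    ((-40.0, -70.0, -60.0, -80.0), (1.0, 1.0, 0.2)),
    ((-45.0, -65.0, -62.0, -78.0), (3.0, 1.0, 0.2)),
    ((-70.0, -40.0, -75.0, -55.0), (8.0, 5.0, 3.5)),
    ((-72.0, -42.0, -70.0, -50.0), (9.0, 6.0, 3.5)),
]
print(knn_locate(database, (-42.0, -68.0, -61.0, -79.0), k=2))
```

Averaging neighbours lets the estimate fall between RPs; a weighted variant (for example, inverse distance) is a common alternative but is not what the text describes.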

2.2 TECHNOLOGIES FOR INDOOR POSITIONING

2.2.1 Radio Frequency Identification (RFID)
RFID is a technology that uses radio waves to identify people or objects. Identification is obtained by reading information stored in a tag attached to an object. Works that use this technology include [13], [14] and [15].

2.2.2 Wi-Fi
Wi-Fi is a technology that uses radio frequency to transmit data between devices based on one of the IEEE 802.11 standards. These devices include, for example, notebooks, cell phones, cameras, and TVs; this ubiquity makes the technology well suited for implementing an IPS [16].

2.2.3 Bluetooth
Bluetooth is usually classified as a wireless personal area network (PAN) technology, with a transmission power and range much smaller than those of a Wi-Fi network, covering a limited space around the user (usually 10 m). It is used to connect wireless devices over short distances, such as cell phones, TVs, notebooks, and stereos. For more detailed discussions of the application of this technology to indoor positioning, see [17], [18], [19], [20] and [21].

III. PROPOSED SOLUTION AND GENERAL DESCRIPTION OF THE SYSTEM
The proposed algorithm is based on Bayesian inference and uses the fingerprint technique as the basic structure for estimating the location. The algorithm combines the k-means method and naive Bayes, a simplified version of Bayes' theorem discussed in Subsection 3.3. Bayes' theorem is given by [22]:

$$P(l_i \mid s) = \frac{P(s \mid l_i)\,P(l_i)}{P(s)} \qquad (1)$$

The central idea of the proposed method is to determine the most probable position of the target based on the RSS vector s measured with respect to all n APs. This is done through the following rule:

$$\hat{l} = \arg\max_{l_i \in \mathcal{L}} P(l_i \mid s) \qquad (2)$$

where $\mathcal{L} = \{l_1, l_2, \dots, l_N\}$ is a collection of subsets, called sectors or rooms, each $l_i$ being a candidate for the target location and consisting of a subset of the RPs, with $l_i \cap l_j = \emptyset$ for $i \neq j$. Once the sector with maximum probability is selected, the 3D location of the target is defined as the centroid of the RPs belonging to that sector.
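The following sketch illustrates (1) and (2) numerically, together with the centroid step, for three sectors. It is a minimal illustration: the function names, the likelihood values P(s | l_i), and the RP coordinates are hypothetical, and in the actual system the likelihoods come from the estimators of Section 3.2.

```python
from typing import Sequence

Point3D = tuple[float, float, float]


def posterior(priors: Sequence[float], likelihoods: Sequence[float]) -> list[float]:
    """Bayes' theorem (1) evaluated for every sector l_i."""
    # Numerators P(s | l_i) * P(l_i)
    joint = [lk * pr for lk, pr in zip(likelihoods, priors)]
    # Evidence P(s): the same normalising constant for all sectors
    evidence = sum(joint)
    return [j / evidence for j in joint]


def map_sector(priors: Sequence[float], likelihoods: Sequence[float]) -> int:
    """Decision rule (2): index of the sector with the largest posterior."""
    post = posterior(priors, likelihoods)
    return max(range(len(post)), key=lambda i: post[i])


def centroid(points: Sequence[Point3D]) -> Point3D:
    """3D centroid of the RPs that belong to the selected sector."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


priors = [1 / 3, 1 / 3, 1 / 3]          # equiprobable sectors
likelihoods = [0.02, 0.05, 0.01]        # hypothetical values of P(s | l_i)
print(posterior(priors, likelihoods))   # approximately [0.25, 0.625, 0.125]
print(map_sector(priors, likelihoods))  # 1
```

Because P(s) is common to all sectors, it only rescales the posteriors and never changes which sector wins; a strongly non-uniform prior, on the other hand, can change the decision.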

ESTIMATION OF THE LIKELIHOOD FUNCTION
Let $x = \{x_1, x_2, \dots, x_N\}$ be an RSS sample obtained, in the offline phase of the location system, from an unknown probability distribution with density $f_X(x)$; the objective is to estimate this density. There are several methods for this estimation; the most common are presented below:

• Histogram: this method subdivides the support [0, 1] of $f_X(x)$ into n bins of equal width 1/n. To estimate $P(s \mid l_i)$, the frequency distribution of the signals in the environment is obtained; the goal is to count the frequency of each RSS interval for all locations.
• Kernel density estimation (KDE): in probability and statistics, KDE is a non-parametric method for estimating a probability density function $f_X(x)$, given by [9]:

$$\hat{f}_X(x) = \frac{1}{\eta h} \sum_{i=1}^{\eta} K\!\left(\frac{x - x_i}{h}\right)$$
where η is the sample size and h > 0 is a smoothing parameter called the bandwidth. Similar to the histogram method, kernel estimation establishes a function representing the probability distribution, with the difference that the data are not allocated to discrete regions (bins); instead, a continuous function is defined. K(·) is the kernel function, which satisfies:

$$\int_{-\infty}^{\infty} K(u)\,du = 1$$

A widely used function in this case is the Gaussian kernel [9], $K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}$. Thus, the term $P(s \mid l_i)$ can be modeled by a normal distribution:

$$P(s \mid l_i) = \frac{1}{\sigma_i \sqrt{2\pi}} \exp\!\left(-\frac{(s - \mu_i)^2}{2\sigma_i^2}\right) \qquad (6)$$

where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the RSS values collected in sector $l_i$.

The implementation of an IPS through this algorithm divides the offline phase into two steps. The first step starts with reading the RSS vectors at n RPs with predetermined 3D positions that make up the vector r. For each RP, measurements are taken for all APs. In addition, these measurements are performed at different heights and with the mobile device pointing north, south, east, and west. This results in the following matrix:

PROPOSED SOLUTION (KMEANS-BAYES)
This procedure aims to make the database robust, thus maximizing the accuracy of the algorithm. When all measurements are completed, in the second step, the k-means algorithm partitions the indoor space into P clusters that represent the sectors to which the RPs will be allocated. First, a value is defined for P. In the first iteration, a centroid with randomly chosen Cartesian coordinates is defined for each cluster; from the second iteration onward, each centroid is the mean coordinate of the RPs assigned to it. The distance between each RP and the centroids is calculated, and each RP is assigned to the cluster with the nearest centroid. This procedure is repeated until no RP changes cluster. It minimizes an objective function called the squared error function, defined as

$$J = \sum_{j=1}^{P} \sum_{r_k \in l_j} \left\lVert r_k - c_j \right\rVert^2 \qquad (8)$$

where $\lVert r_k - c_j \rVert$ is a 3D measure of the distance between reference point $r_k$ and centroid $c_j$. In general, the Euclidean distance is applied in (8). This and other similarity measures, including the Mahalanobis, Minkowski, and cosine distances, are discussed in [23].
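A minimal k-means sketch over the 3D RP coordinates, assuming the Euclidean distance in (8). The function and parameter names are hypothetical, and initializing the centroids with randomly chosen RPs (rather than arbitrary random coordinates, as the text describes) is a common variant adopted here so that no cluster starts empty.

```python
import math
import random
from typing import Sequence

Point3D = tuple[float, float, float]


def kmeans_3d(rps: Sequence[Point3D], p: int, seed: int = 0, max_iter: int = 100):
    """Partition RP coordinates into p clusters (sectors) with k-means.

    Returns (labels, centroids, sse), where sse is the squared-error
    objective (8) computed with the Euclidean distance.
    """
    rng = random.Random(seed)
    # Initialisation (assumption): p distinct RPs picked at random act as
    # the first centroids
    centroids = [tuple(c) for c in rng.sample(list(rps), p)]
    labels = [-1] * len(rps)
    for _ in range(max_iter):
        # Assignment step: each RP goes to the cluster with the nearest centroid
        new_labels = [
            min(range(p), key=lambda j: math.dist(r, centroids[j])) for r in rps
        ]
        if new_labels == labels:  # stop when no RP changes cluster
            break
        labels = new_labels
        # Update step: each centroid becomes the mean coordinate of its RPs
        for j in range(p):
            members = [r for r, lab in zip(rps, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    sse = sum(math.dist(r, centroids[lab]) ** 2 for r, lab in zip(rps, labels))
    return labels, centroids, sse


# Hypothetical RPs: three on the first floor, three on the second
rps = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0),
       (10.0, 10.0, 3.0), (10.0, 11.0, 3.0), (11.0, 10.0, 3.0)]
labels, centroids, sse = kmeans_3d(rps, p=2)
print(labels, centroids, sse)
```

k-means only reaches a local minimum of (8), so in practice it is often run from several seeds and the partition with the lowest squared error is kept.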
When the clustering process is completed, the following configuration is obtained:
• Each RP has a defined 3D coordinate (x, y, z) and an RSS vector associated with this coordinate.
• Probabilities are assigned to each cluster, either according to the known frequency of users in each cluster or uniformly, that is, considering that the $P(l_i)$, $i = 1, \dots, N$, are equiprobable.

After this procedure, the offline phase is complete. In the online phase, the vector $s = \{RSS_1, RSS_2, \dots, RSS_n\}$ is read and the target position is estimated based on equation (2), resulting in:

$$\hat{l} = \arg\max_{l_i} \frac{P(RSS_1, RSS_2, \dots, RSS_n \mid l_i)\,P(l_i)}{P(RSS_1, RSS_2, \dots, RSS_n)} \qquad (9)$$

Applying the general multiplication rule [24] to the likelihood in (9) gives:

$$P(RSS_1, \dots, RSS_n \mid l_i) = P(RSS_1 \mid l_i)\,P(RSS_2 \mid RSS_1, l_i) \cdots P(RSS_n \mid RSS_1, \dots, RSS_{n-1}, l_i) \qquad (10)$$

To reduce the computational cost incurred by equation (9) when a large volume of data must be acquired, we assume that the attributes of vector s are conditionally independent of each other given the location. This means that $P(RSS_1, RSS_2, \dots, RSS_n \mid l_i)$ can be simplified to:

$$P(RSS_1, RSS_2, \dots, RSS_n \mid l_i) = \prod_{j=1}^{n} P(RSS_j \mid l_i) \qquad (11)$$

Since the denominator of (9) does not depend on $l_i$, the target position given s simplifies to:

$$\hat{l} = \arg\max_{l_i} P(l_i) \prod_{j=1}^{n} P(RSS_j \mid l_i) \qquad (12)$$

Equation (12) is a simplified version of Bayes' theorem known as naive Bayes. More details on this classifier can be found in [22]. After the cluster with maximum probability has been identified, the target position is estimated as the centroid coordinate of its RPs. Algorithm 1 and Figure 4 summarize the proposed method.

://dx.doi.org/10.22161/ijaers.4.10.26  ISSN: 2349-6495(P) | 2456-1908(O) www.ijaers.com Page | 170
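A minimal sketch of the offline and online phases of the k-means/naive Bayes combination, under stated assumptions: the RSS of each AP within a sector is modeled by the normal density (6) fitted from the offline samples, priors are equiprobable unless given, rule (12) is evaluated as a sum of logarithms to avoid numerical underflow, and a floor of 1 dB on σ (hypothetical) prevents zero variance. All names are hypothetical; this is an illustration, not the authors' Algorithm 1, and a histogram estimate could replace the normal density.

```python
import math
from collections import defaultdict


def fit_sector_models(sample_sectors, rss_samples, min_std=1.0):
    """Offline phase: fit one normal density (6) per sector and per AP.

    sample_sectors[k] is the k-means cluster (sector) of the RP where
    rss_samples[k] (one RSS value per AP, in dBm) was recorded.
    """
    groups = defaultdict(list)
    for sector, vec in zip(sample_sectors, rss_samples):
        groups[sector].append(vec)
    models = {}
    for sector, vecs in groups.items():
        params = []
        for ap_values in zip(*vecs):  # iterate AP by AP (column-wise)
            mu = sum(ap_values) / len(ap_values)
            var = sum((v - mu) ** 2 for v in ap_values) / len(ap_values)
            # Hypothetical floor on sigma so a constant column cannot give zero variance
            params.append((mu, max(math.sqrt(var), min_std)))
        models[sector] = params
    return models


def log_normal_pdf(x, mu, sigma):
    # Logarithm of the normal density in (6)
    return -math.log(sigma * math.sqrt(2.0 * math.pi)) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def classify(s, models, priors=None):
    """Online phase: rule (12), evaluated as a sum of logs."""
    best_sector, best_score = None, -math.inf
    for sector, params in models.items():
        prior = priors[sector] if priors else 1.0 / len(models)  # equiprobable by default
        score = math.log(prior) + sum(
            log_normal_pdf(x, mu, sigma) for x, (mu, sigma) in zip(s, params)
        )
        if score > best_score:
            best_sector, best_score = sector, score
    return best_sector


def estimate_position(s, models, rp_coords, rp_sectors, priors=None):
    """Return the winning sector and the centroid of its RPs' 3D coordinates."""
    sector = classify(s, models, priors)
    members = [c for c, lab in zip(rp_coords, rp_sectors) if lab == sector]
    return sector, tuple(sum(axis) / len(members) for axis in zip(*members))
```

For example, with hypothetical two-AP samples `[(-40, -70), (-42, -72), (-41, -71)]` in sector 0 and `[(-70, -40), (-72, -42), (-71, -41)]` in sector 1, a reading of `(-43, -69)` is assigned to sector 0 and the position returned is the centroid of the RPs labeled 0.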

INTRODUCTION
The experiments were conducted in two distinct scenarios, both in a typical building with a total area of 180 m² (first floor 131 m², second floor 49 m²) and four APs. In the first scenario, actual measurements were performed and the RSS values were obtained in predefined regions. Figure 5 shows the floor plans of the two floors used in the experiments. To make the evaluation of the developed IPS more realistic, all experiments were performed while people carried out their normal activities in the environment.

Fig. 5: Floor Plan used in experiments and simulations
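The second scenario, described in the next paragraph, replaces measured RSS values with values simulated by the COST 231 multi-wall model. The following is a minimal sketch of that model, assuming the free-space loss in dB with distance in metres and frequency in MHz, a constant loss L_c of 0 dB by default, and the hypothetical function names shown; it is not the simulation code used in this work.

```python
import math


def free_space_loss_db(d_m, f_mhz):
    """Free-space path loss in dB (distance in metres, frequency in MHz)."""
    return 20.0 * math.log10(d_m) + 20.0 * math.log10(f_mhz) - 27.55


def mwm_loss_db(d_m, f_mhz, walls, n_floors, lf_db, b, lc_db=0.0):
    """COST 231 multi-wall model.

    walls    -- list of (k_wi, L_wi): number of penetrated walls of type i
                and the loss per wall of that type, in dB
    n_floors -- number of penetrated floors k_f
    lf_db, b -- floor loss L_f and empirical factor b
    lc_db    -- constant loss L_c (0 dB by default, an assumption)
    """
    loss = free_space_loss_db(d_m, f_mhz) + lc_db
    loss += sum(k * lw for k, lw in walls)
    if n_floors > 0:
        # Non-linear floor term: k_f ** ((k_f + 2) / (k_f + 1) - b) * L_f
        loss += n_floors ** ((n_floors + 2) / (n_floors + 1) - b) * lf_db
    return loss


def simulated_rss_dbm(tx_power_dbm, loss_db):
    # Received power = transmitted power minus total path loss
    return tx_power_dbm - loss_db


# Hypothetical link at 2.4 GHz: 10 m, two light walls, one heavy wall, one floor
loss = mwm_loss_db(10.0, 2400.0, walls=[(2, 3.4), (1, 6.9)], n_floors=1, lf_db=18.3, b=0.46)
print(round(loss, 2), round(simulated_rss_dbm(20.0, loss), 2))  # 92.05 -72.05
```

The values Lw1 ≈ 3.4 dB, Lw2 ≈ 6.9 dB, Lf ≈ 18.3 dB and b ≈ 0.46 are commonly quoted for COST 231 and are used here only as an example; the values adopted in this work are those of Table 2, and the transmit power of 20 dBm is likewise hypothetical.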
In the second scenario, the same building was considered, taking into account all the characteristics of this environment (for example, the quantity, position, and transmit power of the APs), with the difference that no real measurements were performed. Instead, the COST 231 multi-wall model (MWM) was used to simulate the behavior of the signal in this residence. This model treats the signal attenuation as the free-space loss plus the losses caused by the walls and floors penetrated between the transmitter (Tx) and the receiver (Rx). The model is given by [10]:

$$L = L_{FS} + L_c + \sum_{i=1}^{I} k_{wi} L_{wi} + k_f^{\left[\frac{k_f + 2}{k_f + 1} - b\right]} L_f$$

where $L_{FS}$ is the free-space loss between Tx and Rx, $L_c$ is a constant loss, $k_{wi}$ is the number of penetrated walls of type i, $k_f$ is the number of penetrated floors, $L_{wi}$ is the loss of a wall of type i, $L_f$ is the loss between adjacent floors, and I is the number of wall types. The total floor loss is a non-linear function of the number of penetrated floors [10]; this is taken into account by the empirical factor b. For practical reasons, the number of distinct wall types should be kept low, because the difference in attenuation between wall types is small and its meaning in the model is not clear [10]. The wall types considered are listed in Table 1 [10], and the typical values of Lw1, Lw2, Lf, and b are shown in Table 2 [10].

the intersection of four spheres centered on four APs with known coordinates $(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$, $(x_3, y_3, z_3)$ and $(x_4, y_4, z_4)$, whose squared distances to the target, $d_i^2$ with $i = 1, 2, \dots, 4$, are estimated from the attenuation of the RSS, the location of the target is obtained from the following relation:

$$(x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2 = d_i^2, \qquad i = 1, 2, 3, 4$$

• k-NN: for this algorithm, k = 2 and k = 3 were considered, where k is the number of nearest neighbors; the Euclidean distance was used as the similarity measure.
• Artificial neural networks (MLP and RBF): the parameters adopted are listed in Table 3.

Table 3: Parameters Adopted for the Neural Networks
• Probabilistic model (pure version of Bayes' theorem): in this case, the IPS was developed through equation (2), using the histogram-based and kernel-based versions. The kernel method was modeled by a normal distribution, according to equation (6).

4.3 RESULTS OF SCENARIO I
For this scenario, 150 RPs distributed over the two floors were considered. For each RP, 300 RSS values were collected according to the following rule: the notebook was placed at heights of 0.2 m, 1.0 m, and 1.80 m, and at each height 25 measurements were performed with the notebook facing each of the north, east, south, and west directions (3 heights × 4 directions × 25 measurements = 300 values). The Vistumbler software [25] was used to acquire the RSS. After reading the RSS at all RPs, an experiment was conducted to verify the variation of the signal in the environment. At a fixed point (15.2, 0.7 and 5.72 m), the signal strength with respect to the four APs was observed for 300 s. The results are shown in Figure 6. Note that even at a fixed position, the fluctuations in the signal level reach 10 dB, which is the expected variability for indoor environments. With respect to the performance of the algorithms, the lateration method was not efficient compared with the others. This is an expected result, since methods based on traditional outdoor location are not suitable for indoor environments, which have much greater variability [11]. The system implemented with this method showed a mean error $e_m$ of 2.23 m from the actual location, with a precision of 68% and 80.1% for absolute maximum errors $e_a$ of 1.5 m and 3.5 m, respectively.

V. CONCLUSION
In this paper, we proposed a three-dimensional (3D) solution to the indoor positioning problem through a hybrid algorithm. In addition, a comparison was made with the main algorithms discussed in previous research papers, including the lateration method, one of the most used in the outdoor context.
With the exception of this method, all algorithms used a fingerprinting technique. The lower variation in signal levels in Scenario II had a direct influence on the performance of the lateration method: in this scenario, lateration performed better than in Scenario I (by around 23 cm in terms of $e_m$). Even with this improvement, the lateration method was less accurate than the others, which also performed better than in Scenario I, although not significantly. As discussed, indoor environments have high variability. An IPS developed only from simulations based on propagation models may not capture all the information regarding the signal variations in this type of environment and may therefore overestimate the results, as in the case of the lateration method implemented in this work. The proposed algorithm performed better than the other algorithms, with $e_m$ = 1.80 m and 1.789 m for Scenarios I and II, respectively. Because the proposed algorithm is based on Bayesian inference, it integrates well with the fingerprint technique, since it efficiently relates current and past information, which leads to good classification results. However, one of the limitations of these techniques is the quality of the information; that is, a robust database is necessary for an IPS based on this algorithm to present good results. As future work, we intend to develop the following solutions in order to maximize the precision of the location estimation:
• IPS based on angulation;
• Hybrid IPS based on triangulation and fingerprint;
• IPS based on multiple discriminant analysis.