Variability analysis of hierarchical clustering algorithms and its implications for consensus clustering

Clustering is one of the most important unsupervised learning tools when no prior knowledge about the data set is available. Clustering algorithms aim to find the underlying structure of a data set, taking into account clustering criteria, properties of the data and a specific way of comparing data. Many clustering algorithms have been proposed in the literature, all sharing a common goal: given a set of objects, to group similar objects in the same cluster and dissimilar objects in different clusters. Hierarchical clustering algorithms are of great importance in data analysis, since the graphical representation of the resulting partitions, through a dendrogram, may give more information than the clusterings obtained by non-hierarchical algorithms. The use of different clustering methods on the same data set, or of the same clustering method with different initializations (different parameters), can produce different clusterings. Several studies have therefore been concerned with validating the resulting clusterings by analyzing them in terms of stability/variability, and there has also been an increasing interest in the problem of determining a consensus clustering. This work empirically analyzes the variability of the clusterings delivered by hierarchical algorithms, and some consensus clustering techniques are also investigated. Based on the variability of the hierarchical clusterings, we select the most suitable consensus clustering technique among those existing in the literature. Results on a range of synthetic and real data sets reveal significant differences in the variability of hierarchical clusterings, as well as different performances of the consensus clustering techniques.


I. INTRODUCTION
Clustering algorithms are widely applied in Data Mining and in solving real problems from various fields such as Medicine, Psychology, Botany, Sociology, Biology, Archeology and Marketing [28].
They are unsupervised learning algorithms that aim to find a clustering of a given data set such that similar elements belong to the same cluster and distinct elements belong to different clusters. Among the various clustering algorithms, hierarchical clustering algorithms are often applied, owing to their easy implementation and to the inherent advantage of visualizing the clustering through a dendrogram. Different hierarchical clustering algorithms are suited to differently shaped clusters, so they may produce different clusterings. This raises the problem of choosing one of these clusterings (which is not a trivial task), or of determining a clustering that represents the consensus among them. The difficult task of choosing one clustering can be based on evaluating clustering quality; however, the analysis of compactness and separation of clusters does not always find the real clusters [3]. Furthermore, properties such as variability or stability enable us to find more stable solutions and to make inferences about clustering quality. On the other hand, many works have sought to combine the different clusterings obtained by different algorithms in order to obtain the best data clustering, namely a consensus clustering, where a better clustering often means a more stable, more robust and more consistent clustering. Several approaches to produce consensus clusterings have been proposed and carried out in various ways, which may lead to different consensus clusterings for the same set of base clusterings. Furthermore, some works on evaluating and selecting the best consensus clustering have been proposed in the literature. For instance, in [14] a diversity measure of the base clusterings and its relation to the consensus clustering quality are proposed. Also, in [5] the authors propose measures to select the best consensus, based on the consistency between the base clusterings and the consensus clustering.
In this work, in order to select the best consensus clustering, we propose to analyze the variability of the base clusterings and its relation to the consensus quality. The quality of a consensus clustering algorithm is measured by the match between the clustering obtained and the known true clustering of the data set, using matching indices suggested in the literature.

II. RELATED WORK
In this section we outline some subjects related to this work, namely the differences between hierarchical clustering algorithms, the main approaches to consensus clustering, and the clustering validation issue. In the latter context, we discuss works concerned with the selection of the consensus clustering through the application of clustering algorithms and validation indices.

A. Hierarchical clustering algorithms
Clustering algorithms can be classified into two main categories: hierarchical and partitional. Partitional algorithms generate a single data partition, while hierarchical algorithms organize the data into a nested sequence of partitions [18].
A hierarchical clustering method generates a hierarchy, a structure carrying more information than the clustering obtained by partitional algorithms. Moreover, there is no need to specify the number of clusters in advance, and most hierarchical clustering algorithms are deterministic. On the other hand, hierarchical clustering algorithms have a higher cost than traditional partitional algorithms such as K-means or Expectation-Maximization: they do not scale well, having at least time complexity O(n²), where n is the number of elements [30], [6]. Hierarchical clustering algorithms produce a set of nested clusters organized in a hierarchy, represented by a dendrogram. These algorithms can be divisive (top-down) or agglomerative (bottom-up). An agglomerative algorithm initially considers each element of the data set as a cluster and then successively, according to the distances between clusters, joins pairs of clusters until all clusters are combined into a single cluster containing all the elements. A divisive clustering algorithm starts with a single cluster containing all the elements and then divides the clusters recursively until clusters with individual elements are obtained [30], [26]. Because agglomerative algorithms are more often used than divisive ones, this work addresses the former, and henceforth we refer only to these algorithms. As the dendrogram usually contains more than one partition with different numbers of clusters, in our studies we decided to fix the cut level of the dendrogram, i.e., to fix the number of clusters according to the data sets and their known structure. Different hierarchical clustering algorithms differ in the definition of the distance between clusters and hence may lead to different resulting clusterings. The Single Linkage (SL) method computes the distance between two clusters as the minimum distance over all pairs of elements, one from each cluster.
For the Complete Linkage (CL) method, the distance between two clusters is the maximum distance over all pairs of elements, one from each cluster. For the Average Linkage (AL) method, the distance between two clusters is the average distance over all pairs of elements, one in each cluster. Ward's method (W), also known as the minimum variance method, differs from the above methods in not using distances between clusters to aggregate them. The objective of W is to seek the slightest deviation between the cluster centroid and the other elements of the cluster, i.e., the smallest variance of the cluster. At each step, all the possibilities of joining two clusters are checked, and the one causing the smallest increase of the sum of squared errors, SSE, of the aggregate cluster is chosen, where $SSE = \sum_{i=1}^{K}\sum_{j=1}^{n_i} \| x_{ij} - \bar{x}_i \|^2$, K is the number of clusters, $x_{ij}$ is the j-th element of the i-th cluster having centroid $\bar{x}_i$, and $n_i$ is its number of elements. The distances between clusters are computed from distances between pairs of elements, which can be, for instance, the Euclidean, Manhattan or Mahalanobis distance. In this work we chose the Euclidean distance because, in our preliminary experiments, it was found to be preferable to the Mahalanobis metric: although the latter takes the correlations in the data into account, the covariance matrices can be difficult to estimate, and memory and computation time grow quadratically with the number of features [2]. Since they use different definitions of the distance between clusters, the hierarchical clustering algorithms may produce different partitions for the same data set. SL establishes a local aggregation strategy, i.e., it takes into account only the region where two clusters are closest to one another. The other parts of the clusters, as well as the overall structure of the clustering, are not taken into account. Consequently, SL can produce straggly, elongated and poorly compact clusters [30].
On the other hand, CL avoids this chaining effect of SL: the aggregation of clusters is not local, and the whole structure of the clustering can affect the aggregation decisions. CL produces compact clusters with approximately the same size (number of elements) and small diameters. It is, however, sensitive to outliers: a single element far from the center can dramatically increase the diameters of candidate clusters to be joined and completely change the final clustering [30]. SL is more versatile than CL and works well on data sets containing non-isotropic clusters, including well-separated and concentric clusters, while CL works well on data sets with clusters that may not be well separated [18]. The drawbacks of SL and CL stem from the way they calculate the similarity between clusters, namely from a single pair of elements. AL, by contrast, evaluates similarities between clusters based on all their elements. Thus, AL overcomes the sensitivity of CL to outliers and the tendency of SL to form long chains that do not correspond to the intuitive notion of compact clusters with spherical shapes [30]. W, in turn, seeks to minimize the deviations between a cluster's elements and the cluster's mean, which is an indication of homogeneity; the distance between two clusters is defined as the consequent increase in SSE if both clusters were joined into a single cluster. The W algorithm is attractive because it is based on a measure with a strong statistical foundation and, like CL, generates clusters with a high internal consistency. It also performs better than other hierarchical methods, especially when the cluster proportions are approximately equal [7]. The principal characteristics of the SL, CL, AL and W algorithms are summarized in Table 1.
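As a concrete illustration, the four linkage criteria above can be compared with SciPy's hierarchical clustering routines (SciPy is our choice here; the paper does not name an implementation). On two well-separated Gaussian blobs, all four criteria should recover the same two clusters once the dendrogram is cut at k = 2:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs of 20 points each
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(3.0, 0.3, size=(20, 2))])

labels_by_method = {}
for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method)          # agglomerative merge tree (dendrogram)
    # Cut the dendrogram at a fixed number of clusters, as done in this paper
    labels_by_method[method] = fcluster(Z, t=2, criterion="maxclust")
```

On harder data sets (elongated, noisy or overlapping clusters) the four methods start to disagree, which is precisely the variability this paper studies.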

B. Consensus clustering algorithms
As each hierarchical clustering algorithm has its own characteristics, the application of different clustering algorithms may generate a wide variety of solutions for a given data set. Faced with the existence of different clustering algorithms, some authors initially focused on searching for a particular algorithm producing the clustering configuration that best fits the data set, but lately the investigation has turned to the problem of how to combine the different clusterings delivered by different algorithms. Several contributions to this problem have emerged in the literature, in which the combination of different clusterings aims at obtaining a "better" data clustering that represents the consensus among them [10]. The various consensus clustering techniques consist of two principal steps: Generation, which defines how to produce the set of individual clusterings, and the Consensus Function, which describes how to combine them to find the consensus clustering. Thus, different ways of obtaining and combining clusterings lead to different consensus clustering techniques. Furthermore, each technique considers that certain properties should be fulfilled by the consensus clustering: i) Stability, lower sensitivity to noise or outliers; ii) Consistency, similarity to all the individual clusterings; iii) Robustness, better performance than the individual clusterings; and iv) Novelty, a clustering different from the individual ones [11]. In the Generation step, there are no constraints on how the clusterings must be obtained; different clustering algorithms, or the same algorithm with different parameter initializations, can be applied. A common idea across the techniques is that the clusterings to be combined must have a certain diversity among them, so that they provide more information for the consensus process [14].
In the second step, the Consensus Function defines the methodology for combining these individual clusterings to obtain the consensus clustering. The Consensus Function is the main step of any consensus clustering algorithm and can be based, for instance, on Voting, a Co-association Matrix, Graph and Hypergraph Partitioning, Information Theory, Finite Mixture Models or Genetic Algorithms; some consensus functions are based on more than one of these approaches [11]. Among the important contributions to the consensus clustering framework, one should note the works of Fred [8], Fred and Jain [9] and Strehl and Ghosh [33][34], which pioneered the traditional consensus clustering approaches and are perhaps the most cited in the literature. For that reason, we chose these consensus clustering techniques for our studies. In [8], the Consensus Function is based on Voting and a Co-association Matrix, and the objective is to find a consistent and robust consensus clustering. The individual clusterings are generated using the K-means algorithm. Given the clusterings obtained, each pair of elements receives a vote to be in the same cluster of the consensus clustering every time the two elements belong to the same cluster in one of the individual clusterings. The number of times each pair of elements appears in the same cluster is counted and stored in a matrix, the co-association matrix. This matrix can be viewed as a similarity measure between elements, and the consensus clustering is achieved by joining in the same cluster pairs of elements with a co-association value higher than 0.5 (a pre-defined threshold), i.e., pairs of elements that are in the same cluster in more than 50% of the individual clusterings. EAC (Evidence Accumulation Clustering) is a modification of [8] in which the co-association matrix is represented as a graph [9]. The idea is to cut weak links between nodes of the graph by a threshold called the "highest lifetime", which corresponds to the minimum weight among the edges.
This is analogous to cutting the dendrogram produced by the SL algorithm, the lifetime being the range of thresholds given by the distance between two consecutive levels of the dendrogram. Each level yields a clustering with k clusters, and the range with the highest value is selected as the consensus clustering [11]. In order to build robust consensus clusterings, in [33][34] the authors propose a technique where the consensus clustering is obtained by solving an optimization problem consisting of the maximization of the Consensus Function. The process is carried out by applying Mutual Information and a hypergraph representation. Mutual Information, a concept from Information Theory [4], is used to measure the information shared between pairs of clusterings; the consensus clustering is the clustering that shares the most information with all the individual clusterings. Finding a clustering that maximizes the Mutual Information by an exhaustive search over pairs of clusterings is computationally intractable. To address this, three algorithms based on a hypergraph representation and on partitioning algorithms are proposed: CSPA (Cluster-based Similarity Partitioning Algorithm), HGPA (Hypergraph Partitioning Algorithm) and MCLA (Meta-Clustering Algorithm). The result of each of these algorithms is a consensus clustering. The three algorithms start by representing the individual clusterings as a hypergraph, where each cluster is represented by a hyperedge. The CSPA algorithm constructs a co-association matrix whose values are weights associated with each pair of elements (nodes), corresponding in the graph representation to the edge between the elements. It then applies the graph partitioning algorithm METIS, which reduces the size of the graph by collapsing vertices and edges and, after obtaining a partition of the smaller graph, uncoarsens it to construct a partition of the original graph [20].
The greater the weight of an edge, the greater the similarity between the elements; in the first phase of METIS this is the criterion used to merge vertices, starting with the edges of highest weight. The partition of the smaller graph is obtained by an algorithm based on similarities. The HGPA algorithm also applies a partitioning algorithm, HMETIS, designed for hypergraphs [21]; it eliminates the minimal number of hyperedges (all hyperedges having the same weight), which corresponds to removing the relationships that occur less often. In the MCLA algorithm, a similarity matrix between clusters is constructed in terms of the number of elements shared by the respective clusters. In the hypergraph representation the clusters are nodes, and the edge between two nodes is weighted by the similarity between the clusters. Using the partitioning algorithm METIS, one obtains clusters of clusters, called meta-clusters, and the number of times each element appears in a meta-cluster is computed; each element is then assigned to the meta-cluster in which it appears most often [11]. From these consensus clusterings (one per algorithm) it is possible to search for the final consensus clustering, the one which maximizes the shared Mutual Information. These authors, unlike the previous ones, use different algorithms to obtain the individual clusterings, and also pre-define the desired number of clusters in the consensus clustering.
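The voting/co-association idea of [8] is simple enough to sketch directly; the helper names below are ours, and the 0.5 threshold is the one described above. Pairs co-clustered in more than half of the base clusterings are merged (transitively, via union-find):

```python
import numpy as np

def coassociation(labelings):
    """Fraction of base clusterings in which each pair of elements co-occurs."""
    n = len(labelings[0])
    C = np.zeros((n, n))
    for labels in labelings:
        labels = np.asarray(labels)
        C += (labels[:, None] == labels[None, :]).astype(float)
    return C / len(labelings)

def consensus_over_half(C):
    """Join pairs with co-association > 0.5; union-find yields the clusters."""
    n = C.shape[0]
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if C[i, j] > 0.5:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
    return np.array([find(i) for i in range(n)])

# Three base clusterings of four elements: elements 2 and 3 always co-occur,
# elements 0 and 1 co-occur in two of the three clusterings.
base = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
labels = consensus_over_half(coassociation(base))
```

This is only the voting step; EAC [9] and the hypergraph methods [33][34] then refine how the co-association evidence is cut into final clusters.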

C. Clustering validation indices
Cluster validity provides a quantitative answer, through validation indices, to the need of validating the output of a clustering algorithm. A validity index can be seen as a factor assessing the goodness of a clustering [25]. Validation indices are applied according to the criterion employed, which can be classified as external or internal. Under the external criterion, a clustering is evaluated against a known true clustering of the data, and the usual indices applied are, for instance, the Adjusted Rand index [16] and the Normalized Mutual Information [33][34]. The Adjusted Rand index (ARI) and the Normalized Mutual Information (NMI) are perhaps the most popular measures of agreement between clusterings. The ARI is based on agreements and disagreements of pairs of elements of two clusterings and is computed by Equation (1):

$$ARI(U,V) = \frac{\sum_{i,j}\binom{n_{ij}}{2} - \left[\sum_{i}\binom{n_{i.}}{2}\sum_{j}\binom{n_{.j}}{2}\right]\Big/\binom{n}{2}}{\frac{1}{2}\left[\sum_{i}\binom{n_{i.}}{2} + \sum_{j}\binom{n_{.j}}{2}\right] - \left[\sum_{i}\binom{n_{i.}}{2}\sum_{j}\binom{n_{.j}}{2}\right]\Big/\binom{n}{2}} \quad (1)$$

where U and V are two different clusterings of the data set, n is the number of elements, the clustering U has R clusters and the clustering V has C clusters, $n_{ij}$ is the number of elements that are in cluster i of U and in cluster j of V, $n_{i.}$ is the total number of elements in cluster i of U, and $n_{.j}$ is the total number of elements in cluster j of V.
In Information Theory, the Normalized Mutual Information (NMI) is a symmetric measure quantifying the statistical information shared between two distributions [33][34]. Considering the two clusterings U and V and the same notation as in the ARI expression above, the NMI is given by Equation (2):

$$NMI(U,V) = \frac{\sum_{i=1}^{R}\sum_{j=1}^{C} n_{ij}\,\log\!\left(\frac{n\, n_{ij}}{n_{i.}\, n_{.j}}\right)}{\sqrt{\left(\sum_{i=1}^{R} n_{i.}\log\frac{n_{i.}}{n}\right)\left(\sum_{j=1}^{C} n_{.j}\log\frac{n_{.j}}{n}\right)}} \quad (2)$$

ARI and NMI take values in the interval [0,1] (the ARI can even be slightly negative). A value equal to 1 means perfect agreement between the two clusterings, whereas values close to 0 indicate agreement no better than chance.
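To make the notation of Equation (1) concrete, here is a direct transcription in Python (the helper name is ours). On two identical partitions with permuted labels the index is 1; on an unrelated partition it falls to about zero or below:

```python
from math import comb
from collections import Counter

def adjusted_rand_index(u, v):
    """ARI of two label sequences, following Equation (1)."""
    n = len(u)
    nij = Counter(zip(u, v))   # contingency counts n_ij
    ni = Counter(u)            # row totals n_i.
    nj = Counter(v)            # column totals n_.j
    sum_ij = sum(comb(c, 2) for c in nij.values())
    sum_i = sum(comb(c, 2) for c in ni.values())
    sum_j = sum(comb(c, 2) for c in nj.values())
    expected = sum_i * sum_j / comb(n, 2)
    max_index = (sum_i + sum_j) / 2
    return (sum_ij - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0]))  # 1.0
```

Note the first example: the label names differ but the partition is the same, and the ARI is still 1, which is exactly the invariance required of an external index.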

D. The combination of clustering algorithms, consensus clustering algorithms and clustering validation indices
Faced with the existence of different techniques to build a consensus clustering, some works have addressed the problem of validating the resulting consensus clustering. We describe below some experiments proposed to compare the performance of different consensus clusterings, taking into account some measure identifying the base clusterings that lead to the best consensus clustering.
Let Z be a data set with n elements and let P = {C_1, …, C_K} be a clustering of Z into K clusters. A base clusterings set 𝒫 is a set of N clusterings of Z, 𝒫 = {P_1, …, P_N}. Let P* be a consensus clustering and P⁰ be the true clustering of the data. In [14], the authors propose four diversity measures for the base clusterings and the consensus clustering, based on the ARI. The various base clusterings are obtained by the K-means algorithm with different initializations, and the consensus clustering is obtained by the EAC technique. The accuracy of a consensus clustering is measured with respect to a known true clustering of the data. Formally, the first diversity measure, D_1(𝒫, P*), is defined as the average diversity between each clustering P_i ∈ 𝒫 and the consensus clustering P*, as in Equation (3):

$$D_1(\mathcal{P}, P^*) = \frac{1}{N}\sum_{i=1}^{N}\big(1 - ARI(P_i, P^*)\big) \quad (3)$$

The second diversity measure is given in Equation (4), and the third and fourth diversity measures are defined along similar lines. All these measures are compared, and the authors conclude that only the first and the third present some relation with the consensus clustering quality, and that one should select the base clusterings with median values of D_1(𝒫, P*) or D_3(𝒫, P*) to get the best consensus clustering.
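Under our reading of [14], the diversity between two clusterings is 1 − ARI, so the first measure can be sketched as follows (scikit-learn's ARI is used for brevity; the function name is ours):

```python
from sklearn.metrics import adjusted_rand_score

def d1(base_clusterings, consensus):
    """Average ARI-based diversity between each base clustering and the consensus."""
    return sum(1 - adjusted_rand_score(p, consensus)
               for p in base_clusterings) / len(base_clusterings)
```

A base set identical to the consensus gives D1 = 0 (no diversity); the authors' point is that neither extreme is best, and a median value of D1 should be preferred.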
In another work [13], the authors evaluate the accuracy of the consensus clustering in 24 different scenarios, each describing the base clustering algorithms and the consensus function applied. The base clustering algorithms used are K-means, SL and AL, also applied to subsamples of the data. The consensus functions derive from the CSPA and HGPA algorithms, from the co-association matrix, and from a matrix interpreted as representing the data rather than similarities. The accuracy of the consensus clustering is measured as in [14]. After performing a set of experiments comparing the different scenarios, they conclude that the best results are obtained using base clusterings produced by K-means and the consensus function that interprets the consensus matrix of the base clusterings as data instead of similarities.
In [5] the authors propose a new measure to select the best consensus clustering among a variety of them. This measure is based on the concept of average cluster consistency, ACC(𝒫, P*), which measures the average similarity between each clustering of the base clusterings set and a consensus clustering P*. The definitions are given by Equations (7) and (8), where K_i ≥ K*, K_i and K* being the numbers of clusters of the clusterings P_i and P*, respectively, and |C_j ∩ C*_l| is the cardinality of the set of data common to the j-th and l-th clusters of P_i and P*, respectively. The quality of the consensus clustering P* is calculated by the Consistency index, Ci(P⁰, P*) [8], which measures the amount of data shared by matching clusters of the real clustering and the consensus clustering; it is defined by Equation (9), where K⁰ is the number of clusters of the true clustering.
In the experiments, the base clusterings are obtained by, among other algorithms, K-means, SL, AL and CL, and also by joining clusterings obtained by these algorithms. The number of clusters is randomly chosen between 10 and 30. The consensus clustering is obtained by the EAC technique and also by two variants of the WEACS technique, an extension of EAC that weights the co-association matrix and uses subsampling of the data. The accuracy of a consensus clustering is measured with respect to a known true clustering of the data. The authors conclude that the best consensus clustering is the one achieving the highest ACC(𝒫, P*) value.

A. Clustering variability and stability
For the purpose of validating clusterings, many authors analyze the stability/variability/diversity of the clusterings obtained by data resampling. The different works differ on the following issues: i) the methodology for resampling the data, such as bootstrap [22], [25] or cross-validation [23], [24], [35], [3], [32]; ii) the clustering algorithm applied to the samples, such as K-means and hierarchical [23], K-means and EM [3], K-means, EM and hierarchical [25], [32], or K-means, KNN and hierarchical [27]; iii) the validation criterion, internal [22][23] or external [15]; iv) the validation indices, such as Gap [24], Adjusted Rand [23], [15], [3] or indices based on Information Theory [3], [32]. As the interest of this paper is in clustering algorithm variability, one can mention some works in the literature concerned with it. For instance, in [25] the authors interpret a clustering algorithm as a statistical estimator and examine the variability of this estimator. This variability can be described as follows: given a data set Y of size n, obtain k samples by resampling, Y_1, …, Y_k, each of the same size n, and apply to each sample a clustering algorithm A; the variability is then measured by averaging a distance d between the resulting clusterings over all pairs of samples,

$$V = \frac{2}{k(k-1)}\sum_{i=1}^{k-1}\sum_{j=i+1}^{k} d\big(A(Y_i), A(Y_j)\big).$$

Another work [3] analyzes the variability of a clustering by data resampling based on a weighted cross-validation procedure. From 20 weighted samples plus the original sample, and using a clustering algorithm such as K-means, one obtains clusterings for the original sample and for the weighted samples. The agreement between the clustering of the original sample and each clustering of the weighted samples is measured by the Adjusted Rand index. Having the 20 values of the Adjusted Rand index, their standard deviation is used to measure the clustering variability.
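The resampling scheme of [3] can be sketched as follows; as a simplification, we use Ward linkage and unweighted 2/3 subsamples, whereas [3] uses weighted cross-validation, and SciPy/scikit-learn supply the clustering and the ARI (the function name is ours):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import adjusted_rand_score

def clustering_variability(X, k, n_samples=20, frac=2 / 3, method="ward", seed=0):
    """Std. deviation of ARI between the full-data clustering and subsample clusterings."""
    rng = np.random.default_rng(seed)
    n = len(X)
    full = fcluster(linkage(X, method=method), t=k, criterion="maxclust")
    aris = []
    for _ in range(n_samples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        sub = fcluster(linkage(X[idx], method=method), t=k, criterion="maxclust")
        # Compare only on the data shared by both clusterings
        aris.append(adjusted_rand_score(full[idx], sub))
    return float(np.std(aris))
```

On data with a clear cluster structure the ARI values barely move across subsamples, so the standard deviation, i.e. the variability, stays near zero.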

B. Our work
In this study, considering the hierarchical algorithms, we propose to evaluate the clustering variability by an external criterion and, from this, its implications on the performance of three consensus clustering techniques. The comparison between the clusterings obtained is made with the ARI, and the measure of clustering variability is the standard deviation of the ARI [3]. From these clusterings, the consensus clustering techniques referred to above are applied, and to evaluate their accuracy the ARI and NMI, which have very similar behavior, are applied. Intending to analyze the variability of the clusterings delivered by hierarchical algorithms, the first hypothesis under study is whether the different processing forms of hierarchical clustering affect the respective variability. Regarding consensus clustering, we perform some studies to analyze the performance of several consensus clustering techniques taking into account the variability of the hierarchical base clusterings set; therefore, the second hypothesis under study is whether the performance of the consensus techniques depends on the variability of the base clusterings set. To test these hypotheses, a set of experiments is carried out.

IV. EXPERIMENTAL DESIGN
The following subsections report the experiments designed to validate the hypotheses under study.

A. Data sets
In order to cover a variety of situations, different simulated and real data sets are considered. The differences concern cardinality, number of clusters and the shape of the clusters, including well-separated clusters, quite close clusters and clusters with distinct densities. Data sets with added noise and with overlapping clusters are also considered. A description of each data set is given below. Figs. 1 to 7 show the 2-dimensional simulated data sets used in our experiments, and Table 2 gives the details of those data. The data sets contain random data (according to their partition into clusters) following a Normal distribution; some of them have been used in other papers, and to some of them uniformly distributed random noise is added. There are seven data sets: D1-4g, D2-3g, D2-3gr10 (data set D2-3g with 10% noise), D3-3g, D3-3gr10 (data set D3-3g with 10% noise), D4-10g [12] (a data set with overlapping clusters) and D4-10gSS [12] (data set D4-10g without overlapping clusters).

Real data sets
In the experiments we apply seven real data sets taken from the UCI Machine Learning Repository [19]. Besides different cardinalities, numbers of clusters and cluster shapes, these data sets also have different dimensionalities, and some of them are used in medical studies. They are described below and summarized in Table 3.
• Iris: Refers to types of iris flowers. The four attributes are sepal length, sepal width, petal length and petal width. The clusters of iris plants are Setosa, Versicolour and Virginica.
• Wine: Consists of the chemical analysis of thirteen constituents found in wines grown in the same region. The data clusters correspond to the origin of the wine, which can be from three different cultivars.
• Blood (Blood Transfusion Service Center): The attributes are Recency (months since last donation), Frequency (total number of donations), Monetary (total blood donated) and Time (months since first donation). The data are divided into two clusters representing whether or not the donor donated blood in March 2007 [17].
• WDBC (Wisconsin Diagnostic Breast Cancer): Contains 30 variables computed from digitized images of a fine needle aspirate of a breast mass, describing characteristics of the cell nuclei present. There are two clusters, corresponding to the diagnosis, benign or malignant [29].

B. Generation of the base clusterings
To produce the base clusterings sets, the clustering algorithms SL, CL, AL and W (with the Euclidean distance) are applied to each data set. For each data set, resampling without replacement is performed, yielding 50 data samples of size (2/3)N, where N is the cardinality of the data set. For the real data sets, the data are first normalized to mean 0 and standard deviation 1 before resampling. Each clustering algorithm is applied to the 50 samples, producing the corresponding base clusterings set. As the hierarchical algorithms produce a hierarchy of partitions, cutting the dendrogram according to the pre-established number of clusters yields a clustering; thus, each base clusterings set has the same number of clusters as the known data partition. To analyze the variability of a base clusterings set, the clusterings are compared with each other only on the data they share. Since all the base clusterings must contain the same data in order to compute the consensus clustering, the data of the data set not selected in a sample are added to the corresponding clustering.
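The generation protocol above can be sketched as follows (SciPy's `linkage`/`fcluster` stand in for the hierarchical algorithms; function and variable names are ours):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def base_clusterings(X, k, method, n_samples=50, seed=0):
    """Cluster n_samples subsamples of size 2N/3 and cut each dendrogram at k."""
    rng = np.random.default_rng(seed)
    X = (X - X.mean(axis=0)) / X.std(axis=0)     # normalize to mean 0, std 1
    n = len(X)
    m = (2 * n) // 3
    result = []
    for _ in range(n_samples):
        idx = rng.choice(n, size=m, replace=False)   # resampling w/o replacement
        Z = linkage(X[idx], method=method)
        result.append((idx, fcluster(Z, t=k, criterion="maxclust")))
    return result
```

Each entry keeps the sample indices alongside the labels, so that two base clusterings can later be compared on their shared data, as described above.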

C. Obtaining the consensus clustering
For each base clusterings set, three consensus clustering techniques are applied to generate the consensus clustering: one based on a Voting scheme [8] (TEC.1), Evidence Accumulation Clustering [9] (TEC.2), and one based on Mutual Information and Hypergraphs [33], [34] (TEC.3).

D. Results and discussion
1. Variability of hierarchical clustering algorithms
Given a data set and a clustering algorithm, the ARI between the 50 base clusterings obtained is calculated, from which the average ARI and the measure of clustering variability (the standard deviation of the ARI values) are derived. These results are stated in Table 4. In order to compare the variability of the hierarchical clustering algorithms, a one-sided hypothesis test of equality of variances, the Snedecor F test, is applied, which allows us to draw statistical conclusions about the relations between the clustering algorithms' variances. These relations are displayed in Table 5.
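The variance-equality test can be sketched with SciPy (an assumption on our part; the paper names no software). For two samples of ARI values, the one-sided F statistic and its p-value are:

```python
import numpy as np
from scipy.stats import f

def f_test_one_sided(a, b):
    """One-sided F test, H1: var(a) > var(b), for two samples of ARI values."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    F = np.var(a, ddof=1) / np.var(b, ddof=1)   # ratio of sample variances
    p = f.sf(F, len(a) - 1, len(b) - 1)         # upper-tail p-value
    return F, p
```

A small p-value supports the conclusion that the first algorithm's clusterings are significantly more variable than the second's.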

International Journal of Advanced Engineering Research and Science (IJAERS), Vol-4, Issue-6, Jun-2017, https://dx.doi.org/10.22161/ijaers.4.6.14, ISSN: 2349-6495(P) | 2456-1908(O)

Analyzing the variability results in Tables 4 and 5, for almost all the data sets the clustering algorithm presenting the greatest average ARI also presents the lowest variability, the exceptions being the simulated data set D4-10g and the real data set Blood.

(O)
Considering both the simulated and the real data sets, W and AL present the lowest variability on almost all data sets; in one case, W achieves variability equal to 0 and average ARI equal to 1. On the other hand, SL presents the greatest variability on almost all the data sets, with the exception of the D2-3gr10, Iris and Ecoli data sets. For some data sets, several clustering algorithms present equal variability, smaller than that of the remaining algorithms: for instance, SL and AL for the Ecoli data set, and CL, AL and W for the Haberman's Survival and Breast Tissue data sets.
Observing the effect of data noise on variability, it is noted that for data sets D2-3gr10 and D3-3gr10 the CL clustering algorithm shows the greatest relative sensitivity to noise. Regarding data sets D4-10g and D4-10gSS, all the clustering algorithms are affected by the overlapping clusters. From the experimental results, we can state that, for each data set, the clustering algorithms present different variability. Now, analyzing the graphical representations of the characteristics of the simulated data sets, and taking into account the differences between the hierarchical algorithms as well as their variability results, we can make the following statements.
• …the remaining cluster, so they are neither compact nor elongated (see Table 2 and Fig. 1). It is somewhat expected that SL and CL produce less stable clusterings, as reflected in their higher variability in relation to AL and W.
• For data set D2-3g, where all clusters have the same cardinality, C1 and C2 have smaller variance than the remaining cluster; they are thus more compact, small, spherical in shape and close to each other (see Table 2 and Fig. 2). Accordingly, CL and W are expected to produce more stable clusterings, in agreement with their lower variability relative to SL and AL.
• With regard to data set D3-3g, where all the clusters have the same cardinality and spherical shapes, two of them (C1 and C2) are less compact than the remaining one, being slightly apart and having larger diameters (see Table 2 and Fig. 4). SL is expected to be less stable, and indeed presents a higher variability compared with the other clustering algorithms.

•
Taking into account the data set D4-10gSS (without overlapped clusters), where the clusters differ from each other, have different cardinalities and are in general compact, some of them being slightly separated (see Table 2 and Fig. 7), SL is expected to be the least stable, resulting in higher variability with regard to the remaining clustering algorithms.

•
Regarding the data set D4-10g, which has overlapped clusters (see Table 2 and Fig. 6), the variability values of all the clustering algorithms increase relative to those for the corresponding data set without overlapped clusters.

•
Since the CL clustering algorithm is more sensitive to outliers and noisy data, the variability values obtained for data sets D2-3gr10 and D3-3gr10 (see Table 2 and Figs. 4, 6) are as expected. In view of these results, we can confirm the hypothesis under consideration: the different processing performed by the hierarchical clustering algorithms influences their variability.
2. Impact on consensus
In order to compare the consensus clusterings obtained by the three techniques with the known clustering of each data set, the ARI and the NMI are calculated. For each data set and each base clusterings set derived by the hierarchical algorithms, Table 6 reports these results.

V. CONCLUSIONS
In this paper we proposed to empirically analyze the clustering variability produced by hierarchical algorithms, namely Single Linkage, Complete Linkage, Average Linkage and the Ward method, and from it to gain knowledge about the performance of three consensus clustering techniques: the Voting algorithm [8], Evidence Accumulation Clustering [9] and one based on Mutual Information and hypergraphs [13,14]. Several synthetic and real data sets were used for this purpose. The performances were quantified by external criteria, applying the Adjusted Rand Index and the Normalized Mutual Information. Through this research we sought to define profiles of the clusterings achieved by the hierarchical algorithms according to their variability and, from that, to decide which consensus clustering strategy to apply. These studies experimentally verify the two hypotheses under consideration: one concerns the difference in variability of the hierarchical clusterings, where the analysis of their known properties led to the identification of a new property of these algorithms based on their variability; the other concerns the possibility of choosing the most appropriate consensus strategy according to a particular type of clustering variance. In fact, when the consensus clustering techniques present different performances, in most cases the consensus technique based on Mutual Information and hypergraphs outperforms the others for hierarchical clustering algorithms with relatively higher variances.