MultiDendrograms
A hierarchical clustering tool
Description
MultiDendrograms is a simple yet powerful program for the hierarchical clustering of real data, distributed under an Open Source license. Starting from a distance (or weight) matrix, MultiDendrograms calculates its dendrogram using the most common Agglomerative Hierarchical Clustering algorithms, allows many parameters of the graphical representation to be tuned, and makes it easy to export the results to a file. A summary of its characteristics:
 Multiplatform: developed in Java, runs on any operating system with a Java runtime (e.g. Windows, Linux and MacOS).
 Graphical user interface: data selection, hierarchical clustering options, dendrogram representation parameters, navigation across the dendrogram, deviation measures.
 Hierarchical Clustering algorithms implemented: variable-group Single Linkage, Complete Linkage, Unweighted average, Weighted average, Unweighted centroid, Weighted centroid and Ward.
 Representation parameters: size, orientation, labels, axis, etc.
 Deviation measures: Cophenetic Correlation Coefficient, Normalized Mean Squared Error and Normalized Mean Absolute Error.
 Export: ultrametric matrix, dendrogram details in text and Newick tree formats.
 Plot: dendrogram image in JPG, PNG and EPS formats.
 Command line: hierarchical clustering can be calculated directly from the command line, without using the graphical interface.
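The deviation measures in the list above compare the original distances with the ultrametric distances read from the dendrogram. The following is a minimal sketch of how they could be computed, not the program's own code. The Cophenetic Correlation Coefficient is the Pearson correlation between the two sets of distances; the normalizations used here for the other two measures (sum of squared errors over sum of squared distances, and sum of absolute errors over sum of absolute distances) are assumptions made for illustration. The function names and the example matrices are hypothetical.

```python
import math

def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def deviation_measures(dist, ultra):
    """Return (ccc, nmse, nmae) for a distance matrix and its ultrametric matrix.

    The normalizations of nmse and nmae are assumed for illustration.
    The correlation is undefined if either set of distances is constant.
    """
    d = upper_triangle(dist)
    u = upper_triangle(ultra)
    mean_d = sum(d) / len(d)
    mean_u = sum(u) / len(u)
    cov = sum((x - mean_d) * (y - mean_u) for x, y in zip(d, u))
    var_d = sum((x - mean_d) ** 2 for x in d)
    var_u = sum((y - mean_u) ** 2 for y in u)
    ccc = cov / math.sqrt(var_d * var_u)
    nmse = sum((y - x) ** 2 for x, y in zip(d, u)) / sum(x ** 2 for x in d)
    nmae = sum(abs(y - x) for x, y in zip(d, u)) / sum(abs(x) for x in d)
    return ccc, nmse, nmae

# Hypothetical example: three items clustered with the unweighted average,
# so the last fusion happens at height (6 + 5) / 2 = 5.5.
dist = [[0, 2, 6],
        [2, 0, 5],
        [6, 5, 0]]
ultra = [[0, 2, 5.5],
         [2, 0, 5.5],
         [5.5, 5.5, 0]]
ccc, nmse, nmae = deviation_measures(dist, ultra)
print(ccc, nmse, nmae)
```

A correlation close to 1 and errors close to 0 indicate that the dendrogram distorts the original distances very little.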
MultiDendrograms implements the variable-group algorithms in [1] to solve the non-uniqueness problem found in the standard pair-group algorithms and implementations. This problem arises when two or more minimum distances between different clusters are equal during the amalgamation process. The standard approach is to break the tie by choosing a single pair, and to proceed in the same way until the final hierarchical classification is obtained. However, different clusterings are possible depending on the criterion used to break the ties (usually a pair is simply chosen at random), and the user is unaware of this problem.
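The effect of tie-breaking can be seen on a very small example. The sketch below is an illustrative pair-group Complete Linkage written for this page, not taken from any of the packages discussed here; the function name, the `pick` parameter and the data are hypothetical. Three items lie on a line at positions 0, 1 and 2, so the distances A-B and B-C are tied.

```python
from itertools import combinations

def complete_linkage(dist, labels, pick):
    """Pair-group Complete Linkage; `pick` chooses one pair among tied minima."""
    clusters = [(name,) for name in labels]
    index = {name: i for i, name in enumerate(labels)}

    def d(a, b):
        # Complete Linkage: largest distance between members of the two clusters.
        return max(dist[index[x]][index[y]] for x in a for y in b)

    merges = []
    while len(clusters) > 1:
        pairs = list(combinations(clusters, 2))
        lowest = min(d(a, b) for a, b in pairs)
        # Exact comparison is fine for this integer example; real-valued data
        # would need a tolerance.
        tied = [(a, b) for a, b in pairs if d(a, b) == lowest]
        a, b = pick(tied)
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, lowest))
    return merges

labels = ["A", "B", "C"]
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]

first = complete_linkage(dist, labels, pick=lambda tied: tied[0])
last = complete_linkage(dist, labels, pick=lambda tied: tied[-1])
print(first)  # A and B are joined first
print(last)   # B and C are joined first
```

Both runs are valid pair-group results, yet one groups A with B and the other groups B with C. Nothing in either output signals that an arbitrary choice was made.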
The variable-group algorithms group more than two clusters at the same time when ties occur, giving rise to a graphical representation called a multidendrogram. Their main properties are:
 When there are no ties, the variable-group algorithms give the same results as the pair-group ones.
 They always give a uniquely determined solution.
 In the multidendrogram representation for the results one can explicitly observe the occurrence of ties during the agglomerative process. Furthermore, the height of any fusion interval (the bands in the program) indicates the degree of heterogeneity inside the corresponding cluster.
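The grouping idea behind the properties above can be sketched as a single agglomeration step: find every pair of clusters at the minimum distance and merge all clusters that are connected through such pairs. This is a simplified illustration under stated assumptions, not the implementation in MultiDendrograms; the function name, the distance function passed in, and the data are hypothetical, and the computation of fusion intervals described in [1] is not reproduced here.

```python
from itertools import combinations

def variable_group_step(clusters, d):
    """Merge, in one step, every group of clusters linked by tied minimum distances."""
    pairs = list(combinations(range(len(clusters)), 2))
    lowest = min(d(clusters[i], clusters[j]) for i, j in pairs)

    # Union-find over cluster indices: clusters joined by a minimum-distance
    # pair end up with the same root.
    parent = list(range(len(clusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in pairs:
        # Exact comparison is fine for integers; real data would need a tolerance.
        if d(clusters[i], clusters[j]) == lowest:
            parent[find(i)] = find(j)

    groups = {}
    for i, cluster in enumerate(clusters):
        groups.setdefault(find(i), []).append(cluster)
    merged = [tuple(x for c in group for x in c) for group in groups.values()]
    return merged, lowest

# Hypothetical data: four items on a line, with A-B and B-C tied at distance 1.
positions = {"A": 0, "B": 1, "C": 2, "D": 5}

def complete(a, b):
    # Complete Linkage distance between two clusters of labelled points.
    return max(abs(positions[x] - positions[y]) for x in a for y in b)

clusters = [("A",), ("B",), ("C",), ("D",)]
merged, height = variable_group_step(clusters, complete)
print(merged, height)
```

Here A, B and C are merged together in one step, so no arbitrary choice between A-B and B-C is needed. When only one pair is at the minimum distance, the step reduces to an ordinary pair-group merge.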
Comparison with other applications
How do other applications deal with ties?

Ignore ties, and do not mention the issue in their respective manuals:
 Mathematica: Agglomerate and DirectAgglomerate functions in Hierarchical Clustering Package
 MATLAB: linkage function in the Statistics Toolbox
 R: hclust function in the stats package, and agnes function in the cluster package
 Stata: cluster and clustermat commands

Report the existence of ties, and break them using the order of the observations in the input file:
 SAS: CLUSTER procedure

Break ties using the order of cases in the input file, and recommend the comparison with cases sorted in different random orders:
 PASW Statistics, formerly SPSS Statistics: Hierarchical Clustering Analysis procedure
How do I know if there are ties in my data?
 Most people would say "I do not have problems with tied distances"; however, you cannot be sure unless the software you use explicitly tells you so.
 In MultiDendrograms tied distances can be easily noticed in the dendrogram plots, in the dendrogram navigation window, and in the exported tree and Newick files.
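Outside MultiDendrograms, a quick preliminary check is to look for repeated values in the input matrix. The sketch below is only an illustration; the function name and the `precision` parameter are hypothetical. Repeated input values are a warning sign rather than a full diagnosis, because ties can also appear later among the distances recomputed between clusters during the agglomerative process.

```python
from collections import defaultdict

def repeated_distances(dist, labels, precision=6):
    """Map each distance value shared by several pairs to the list of those pairs.

    Values are rounded to `precision` decimals before comparison, so that
    distances differing only by numerical noise count as equal.
    """
    by_value = defaultdict(list)
    n = len(dist)
    for i in range(n):
        for j in range(i + 1, n):
            by_value[round(dist[i][j], precision)].append((labels[i], labels[j]))
    return {value: pairs for value, pairs in by_value.items() if len(pairs) > 1}

# Hypothetical data: three items on a line at positions 0, 1 and 2.
labels = ["A", "B", "C"]
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
ties = repeated_distances(dist, labels)
print(ties)
```

An empty result means that no two pairs share the same input distance at the chosen precision.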
Reference
[1] Solving Non-uniqueness in Agglomerative Hierarchical Clustering Using Multidendrograms
Alberto Fernández and Sergio Gómez, Journal of Classification 25 (2008) 43-65.
Download
Please cite [1] if you use MultiDendrograms in your publications.
 Program (manual included): multidendrograms3.2.0.zip
 Manual: multidendrograms3.2manual.pdf
 Source code: multidendrograms3.2.0src.zip
Installation
No installation is needed: just unzip multidendrogramsxxx.zip and run multidendrograms.bat (Windows), multidendrograms.sh (Linux) or multidendrograms.jar (all OS). Java version 6 (also known as Java 1.6) or higher is required.
Gallery
History
MultiDendrograms 3.2:
 New format for dendrogram navigation and for saving as a text file
MultiDendrograms 3.1:
 Data in triangular form
MultiDendrograms 3.0:
 Scrollbars in dendrograms panel
 Command-line direct calculation of multidendrograms
 Ward hierarchical clustering
 Check if new version is available
 Confirmation before closing
 Improved performance
 Major source code refactoring
MultiDendrograms 2.1:
 Export dendrograms to Newick format
 Show calculation progress
 Improved GUI
 Improved performance
MultiDendrograms 2.0:
 Completely new multiplatform (Windows, Linux, MacOS, etc.) application
 Added Graphical User Interface (GUI)
 Control of the dendrogram appearance
 Navigation through the dendrogram details
 Accepts distance and similarity matrices
 Export dendrograms to JPG, PNG and EPS
 Calculation of ultrametric deviation measures
MultiDendrograms 1.0:
 Windows command-line application to compute multidendrograms
 Windows command-line application to compute ultrametric matrices
 Windows command-line application to generate EPS plots