=================================================================== == Radatools 2.0 == == == == Copyright (c) 2010 by == == == == Sergio Gomez (sergio.gomez@urv.cat) == == Alberto Fernandez (alberto.fernandez@urv.cat) == == Javier Borge-Holthoefer (javier.borge@urv.cat) == == Alex Arenas (alexandre.arenas@urv.cat) == == == == 04/04/2010 == == == == All rights reserved, see LICENSE.txt == =================================================================== ----------- Description ----------- Radatools is a set of freely distributed applications to analyze Complex Networks. In particular, it includes very useful programs for Communities Detection and Mesoscales Search. The program 'Communities_Detection' takes a complex network in Pajek format (*.net) and outputs the best partition found. Since probably you will not have the network in Pajek format, we have added programs to create Pajek network files from text files with lists of links, 'List_To_Net', and from adjacency (or weights) matrices, 'Matrix_To_Net'. Another useful program is 'Connected_Subgraphs', which checks if a network is connected and, if not, creates separate network files for each connected component. The output of 'Communities_Detection' is a partition in a text format we call 'Lol format' (usually *-lol.txt). It contains information about the number of elements in the partition, the number of communities (or lists of elements), the size of each community, and the indices of the elements which belong to each community. The program 'Convert_Lol_To_Clu' converts a partition in Lol format to Pajek (*.clu) format. And 'Reformat_Partitions' replaces the indices of the elements by their respective names in the network Pajek file, allowing also to group elements in a way more appropriate for the analysis of the communities, or in a more readable form. 'Mesoscales_Search' makes use of the same algorithms implemented in 'Communities_Detection' for the determination of the whole mesoscale of a complex network. Adding a common resistance (in the form of a self-loop) to all nodes, and running its value between 'r_asymp' and 'r_max', it is possible to find the community structure at all resolution levels, from the macroscale (all nodes in one community) to the microscale (every node in a different community). The output is a table with the values of the resistance, modularity and number of communities, and the list of different partitions found (*lols.txt). The program 'Mesoscales_Fine_Tuning' improves the previous results by checking all known partitions for each value of the resistance, and allows the combination of different executions of 'Mesoscales_Search'. Is is also possible to use 'Reformat_Partitions' with the files *lols.txt. -------- Contents -------- The detailed list of programs available in this version of Radatools and its organization in folders is as follows: 01-Prepare_Network: - List_To_Net: Convert a file with the list of links of a graph into a network file in Pajek format (*.net) - Matrix_To_Net: Convert a file with a graph in matrix form into a network file in Pajek format (*.net) - Connected_Subgraphs: Split a network in Pajek format into its connected components 02-Find_Communities: - Communities_Detection: Community detection in complex networks by optimization of modularity - Mesoscales_Search: Mesoscales search in complex networks by optimization of modularity for variable common self-loops - Mesoscales_Fine_Tuning: Mesoscales fine-tuning 03-Reformat_Results: - Convert_Lol_To_Clu: Convert a file with a partition in Lol format into a file with a partition in Pajek format (*.clu) - Reformat_Partitions: Reformat partitions in Pajek and Lol formats changing nodes' indices by nodes' names --------------------- Communities detection --------------------- The program 'Communities_Detection' implements several algorithms for the optimization of modularity [1]. It is prepared to work with - unweighted [1] and weighted [2] networks - undirected [1] and directed [3] networks - positive [1] and signed [4] networks The main types of (directed and undirected) modularity definitions optimized are: - UN: unweighted modularity [1,3] - WN: weighted modularity [2,3] - WS: weighted modularity with positive and negative links [4] The optimization algorithms implemented are: - (h): exhaustive search - (t): tabu search [5] - (e): extremal optimization [6] - (s): spectral optimization [7] - (f): fast algorithm [8] - (r): fine-tuning by reposition - (b): fine-tuning by bootstrapping based on tabu search [5] It is possible to combine different optimization algorithms in a single run of the communities detection program. For instance, a possible heuristic is 'trfr', which consists in a tabu search, followed by a reposition fine-tuning, followed by the fast-algorithm, and finally another reposition fine-tuning. Some heuristics have a stochastic behavior, thus several executions could lead to different optimal partitions. In 'Communities_Detection' you can specify the number of repetitions to execute each of the heuristics, before proceeding to the next one. If the partition file exists before the execution, it is taken as the initial partition. Thus, running the program twice with heuristics 'trfr' is equivalent to running it once with heuristics 'trfrtrfr'. There are four levels of verbosity: - (n): none - (s): summary - (p): progress - (v): verbose For long runs, 'p' or 'v' should be preferred since they save in temporary files the best intermediate partitions. ---------- Heuristics ---------- A short description of each heuristic follows: - Exhaustive search (h): only possible for very small networks. - Tabu search (t): usually the most accurate heuristic for medium sized networks, however it cannot deal with large networks. If you use it, please cite [5]. - Extremal optimization (e): good tradeoff between accuracy and execution time, and useful for networks too large for tabu search. If you use it, please cite [6]. - Spectral optimization (s): very fast heuristic and useful for large networks, but with low accuracy. The implementation does not cope with directed networks. If you use it, please cite [7]. - Fast algorithm (f): very fast heuristic and useful for large networks, but with low accuracy. It may be used as a fine-tuning algorithm when combined with others. If you use it, please cite [8]. - Reposition algorithm (r): very fast fine-tuning algorithm. - Bootstrapping algorithm (f): fine-tuning algorithm based on tabu search. If you use it, please cite [5]. The recommended strategy is the following: - For networks with less than 1000 nodes, try "trfr" with 1 repetition - If succeeds in a reasonable time, erase the previous result and increment the number of repetitions to 10, 30 or a larger number - To check if the partition may be improved, try "esrfrtrfr" with some repetitions, and without erasing the previous partition - For larger networks, do the same but starting with "esrfr" or "esbrfbr" ----------------- Mesoscales search ----------------- The program 'Mesoscales_Search' implements the strategy in [5] for the determination of the community structure of complex networks at different resolution levels, thus finding the whole mesoscale, from all nodes in one community (macroscale) to every node forming its own community (microscale). Based on the addition of a common self-loop to all nodes, and the optimization of modularity [1,2,3] using the same heuristics explained before, the output of this program is formed by two files: *table.txt: contains four columns: 'r': the resistance (or self-loop) 'r - r_asymp': average strength, used in the plots to determine the most stable partitions 'Q_r': modularity of the best partition found for the given resistance 'N_r': number of modules of the best partition found for the given resistance *lols.txt: list of different partitions found while scanning the mesoscale (in Lol format), with the value of the resistance at which each partition becomes optimal This algorithm of mesoscale determination works better for weighted and unweighted undirected networks. If the network is directed, it is possible that the partition with all nodes in one community may not appear near 'r_asymp', as explained in [5]. In this case, some of the first partitions found should be discarded, namely those with more modules than the first partition with the minimum number of communities. Another alternative is the symmetrization of the network. The recommended strategy to succeed in the finding of the mesoscale structure is the following: - First, try 'Communities_Detection' to choose the most adequate optimization heuristic and the number of repetitions, and to estimate the time needed; if you want a mesoscale with a granularity of 100 values of the resistance, then 'Mesoscales_Search' will take about 100 times the execution time of 'Communities_Detection' - If possible, use Tabu search [5], since it has the property of starting the optimization from the best partition found in the previous resistance step; the similarity between these partitions allows an important speed-up, and a higher quality of the mesoscale found - If 'r_asymp' and 'r_max' differ in several orders of magnitude, use a non-uniform scanning of the resistance, e.g. by setting the parameter 'max_delta_loop_ratio' to 10.0 or higher; in this way you will have a larger resolution for smaller values of 'r - r_asymp', which is usually the most interesting part of the mesoscale It is possible to improve the mesoscales found using 'Mesoscales_Fine_Tuning'. This program checks all known partitions at every value of the resistance. The idea is that, during the mesoscales determination, it is common that the changes from one partition to the next one are found at values of resistance slightly higher than optimal, due to the similarity of modularity between both partitions. Moreover, if you run 'Mesoscales_Search' several times, you can merge all partitions in a single *lols.txt file (or in an additional file with name *lols-extra.txt), and let 'Mesoscales_Fine_Tuning' look for the best possible mesoscales structure. ----- Hints ----- - Each folder contains program files (*.exe), sample script files (*.bat or *.sh) for each program, and some test files. - It is highly recommended to have a look at the sample script and test files (e.g. with Notepad, vi or nano) before proceeding with your data. - Each folder has a "results' subfolder with the outputs obtained after running the sample scripts with the given test files. These result files may differ from the ones obtained running the sample scripts, due to the stochastic behavior of some of the implemented algorithms (e.g. 'Communities_Detection'). - The programs may be run directly from command line, but it is more convenient to create script files (*.bat or *.sh) which call the programs with the necessary lists of parameters. Thus, it is recommended that you make a copy of the companion script file (*.bat or *.sh) of the program you want to use, edit it with a text editor (e.g. Notepad, vi or nano), and run it. - When you run a program without parameters, an 'Usage' message shows the list of expected parameters. - Read 'LICENSE.txt' to know the conditions to use Radatools. - Read 'CHANGES.txt' to know the evolution of the versions of Radatools. -------- Programs -------- 01-Prepare_Network: - List_To_Net: Convert a file with the list of links of a graph into a network file in Pajek format (*.net) > List_To_Net.exe list_input_file net_output_file [ network_type ] list_input_file : text file containing a list of links inspect the sample test files 'test-list??.txt' net_output_file : name of the output network file in Pajek format (*.net) network_type : A | D | U also lowercase symbols also case-insensitive full names (Auto, Directed, Undirected) A = Auto D = Directed U = Undirected default => Auto in Auto, if the Graph is Symmetric, the output is Undirected for repeated Edges, only the last one is stored - Matrix_To_Net: Convert a file with a graph in matrix form into a network file in Pajek format (*.net) > Matrix_To_Net.exe matrix_input_file net_output_file [ no_link_string ] matrix_input_file : text file containing a list of links inspect the sample test files 'test-matrix??.txt' net_output_file : name of the output network file in Pajek format (*.net) no_link_string : string used to identify inexistent links within the matrix file default => 0 - Connected_Subgraphs: Split a network in Pajek format into its connected components > Connected_Subgraphs.exe net_name_without_ext net_name_without_ext : name (without the '.net' suffix) of the network file in Pajek format 02-Find_Communities: - Communities_Detection: Community detection in complex networks by optimization of modularity > Communities_Detection.exe log_level modularity_type heuristics repetitions [self_loop] net_name best_partition_name log_level : N | S | P | V also lowercase symbols also case-insensitive full names (None, ...) N = None S = Summary P = Progress V = Verbose modularity_type : UN | UUN | WN | WUN | WLA | WULA | WLUN | WS | WNN also lowercase symbols also case-insensitive full names (Unweighted_Newman, ...) UN = Unweighted_Newman UUN = Unweighted_Uniform_Nullcase WN = Weighted_Newman WUN = Weighted_Uniform_Nullcase WLA = Weighted_Local_Average WULA = Weighted_Uniform_Local_Average WLUN = Weighted_Links_Unweighted_Nonlinks WS = Weighted_Signed WNN = Weighted_No_Nullcase heuristics : [htsefrb]+ also uppercase symbols also single case-insensitive full names (Exhaustive, ...) h = Exhaustive t = Tabu s = Spectral e = Extremal f = Fast r = Reposition b = Bootstrapping repetitions : number of repetitions of the heuristics positive integer does not apply to [hfr] algorithms net_name : name of the network file in Pajek format (*.net) self_loop : if any, adds the given self-loop value to all nodes best_partition_name : name of the file to save best partition found if file exists, the partition in the file becomes the initial partition - Mesoscales_Search: Mesoscales search in complex networks by optimization of modularity using common self-loops > Mesoscales_Search.exe net_name heuristics repetitions [ num_steps max_delta_loop_ratio ] net_name : name of the network file in Pajek format (*.net) heuristics : [htsefrb]+ also uppercase symbols also single case-insensitive full names (Exhaustive, ...) h = Exhaustive t = Tabu s = Spectral e = Extremal f = Fast r = Reposition b = Bootstrapping repetitions : number of repetitions of the heuristics positive integer does not apply to [hfr] algorithms num_steps : number of intervals between r_asymp and r_max default => 100 max_delta_loop_ratio : ratio between the last and the first resistance intervals if 1, the sampling of r is uniform default => 1.0 - Mesoscales_Fine_Tuning: Mesoscales fine-tuning > Mesoscales_Fine_Tuning.exe net_name_without_ext net_name_without_ext : name (without the '.net' suffix) of the network file in Pajek format the following files should exist, as generated by Mesoscales_Search: - name.net - name-lols.txt - name-table.txt - name-lols-extra.txt (optional, with additional partitions to check) 03-Reformat_Results: - Convert_Lol_To_Clu: Convert a file with a partition in Lol format into a file with a partition in Pajek format (*.clu) > Convert_Lol_To_Clu.exe lol_file_name clu_file_name [ number_of_lines_to_skip ] lol_file_name : name of the input partition file in Lol format clu_file_name : name of the output partition file in Pajek format (*.clu) number_of_lines_to_skip : number of lines to skip at the beginning of the Lol file non-negative integer default => 0 - Reformat_Partitions: Reformat a partition in Lol format changing nodes' indices by nodes' names > Reformat_Partition.exe net_name clu_or_lol_name lol_out_name [ header_lines header_mode ] [ group_by justify_width skip_size ] net_name : name of the network file in Pajek format (*.net) clu_or_lol_name : name of the partitions file in Pajek format (*.clu) or Lol format in Lol format, the file may contain many partitions, e.g. those describing mesoscales lol_out_name : name of the reformatted partition file header_lines : number of lines of the header before a partition in Lol format non-negative integer default => 0 ignored for partitions in Pajek format (*.clu) header_mode : CH | NH | SH also lowercase symbols also case-insensitive full names (Copy_Header, ...) CH = Copy_Header NH = No_Header SH = Separator_Header default => No_Header group_by : number of columns for the nodes' names in the reformatted partition file positive integer default => 1 justify_width : width of the columns for the nodes' names positive integer default => 1 skip_size : modules smaller or equal to this size are skipped non-negative integer default => 0 ---------- References ---------- [1] M.E.J. Newman and M. Girvan Finding and evaluating community structure in networks Physical Review E 69 (2004) 026113 [2] M.E.J. Newman Analysis of weighted networks Physical Review E 70 (2004) 056131 [3] Alex Arenas, Jordi Duch, Alberto Fernández and Sergio Gómez Size reduction of complex networks preserving modularity New Journal of Physics 9 (2007) 176 http://dx.doi.org/10.1088/1367-2630/9/6/176 [4] Sergio Gómez, Pablo Jensen and Alex Arenas Analysis of community structure in networks of correlated data Physical Review E 80 (2009) 016114 http://dx.doi.org/10.1103/PhysRevE.80.016114 [5] Alex Arenas, Alberto Fernández and Sergio Gómez Analysis of the structure of complex networks at different resolution levels New Journal of Physics 10 (2008) 053039 http://dx.doi.org/10.1088/1367-2630/10/5/053039 [6] Jordi Duch and Alex Arenas Community detection in complex networks using extremal optimization Phys. Rev. E 72 (2005) 027104 http://dx.doi.org/10.1103/PhysRevE.72.027104 [7] M.E.J. Newman Modularity and community structure in networks Proc. Nat. Acad. Sci. USA 103 (2006) 8577 [8] M.E.J. Newman Fast algorithm for detecting community structure in networks Physical Review E 69 (2004) 066133