Useful links
Click on one of the categories below to go quickly to the corresponding link section.
Categories:- Benchmarking and data sets,
- Algorithms for Graph Matching and Graph Edit Distance ,
- Graph Drawing,
- Combinatorial maps,
- Graph kernels,
- Other TCs,
- Various links
Benchmarking and data sets
- The Graph Database < http://mivia.unisa.it/datasets/graph-database>
- IAM Graph Database Repository for Graph Based Pattern Recognition and Machine Learning
- Chemistry datasets provided either by the GREYC laboratory .
- The LEMS database of shapes
- ILPIso datasets
- The Graph Data Repository For Graph Edit Distance (GDR4GED)
- Results of ICPR 2016 contest on Graph edit distance
- Challenge 1: computation of the exact or an approximate graph edit distance
- Challenge 2: computation of a dissimilarity measure for graph classification
- ICPR 2016 Competition on Subgraph Spotting in Graph Representations of Comic Book Images (SSGCI)
- A Benchmark Data Sets for Graph Kernels
This page contains collected benchmark data sets for the evaluation of graph kernels. The data sets were collected by Kristian Kersting, Nils M. Kriege, Christopher Morris, Petra Mutzel, and Marion Neumann with partial support of the German Science Foundation (DFG) within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Data Analysis”, project A6 “Resource-efficient Graph Mining”.
The /ARG Database/is a huge collection of labeled and unlabeled graphs realized by the MIVIA Labof the University of Salerno (Italy). The aim of this collection is to provide the graph research community with a standard test ground for the benchmarking of graph matching algorithms.The database is organized in two section: labeled and unlabeled graphs.
Both labeled and unlabeled graphs have been randomly generated according to different generation models (2D, 3D and 4D regular meshes, 2D, 3D and 4D nearly-regular meshes, bounded valence, nearly-bounded valence, randomly generated) regular, each involving different possible parameter settings.
As a result, 168 diverse kinds of graphs are contained in the database. Each type of unlabeled graph is represented by thousands of pairs of graphs for which an isomorphism or a graph-subgraph isomorphism relation holds, for a total of 143,600 graphs. Furthermore, each type of labeled graph is represented by thousands of pairs of graphs holding a not trivial common subgraph, for a total of 166,000 graphs. Currently graphs have size ranging from a few up to thousands of nodes.
The above website aims at introducing a repository of graph data sets
and corresponding benchmarks, covering a wide spectrum of different
applications. We expect that the graph repository provides a major
contribution towards promoting the use of graph based representations
and making graph based pattern recognition and machine learning
algorithms better comparable against each other. In future work we
want to further expand the graph set repository. Towards this end, we
highly encourage the community not only to use the available sets for
developing and testing their algorithms, but also to integrate their
own graph sets into our repository.
This repository is well documented in the following paper:
Riesen, K., Bunke, H.: IAM graph database repository for graph
based pattern recognition and machine learning, in N. da Vitoria Lobo et
al. (Eds.): Structural, Syntactic, and Statistical Pattern Recognition,
Springer, LNCS 5342, 2008, 287 - 297
Several databases of graphs corresponding to
classification or regression problems in
chemistry.
The Vision Group at LEMS proposes several shape databases composed of 99, 216 and 1032 shapes. This database is essential for people dealing with shape recongition. Interested readers may also have a look at this small database of 33 holed shapes. This database may (and should) be used in conjunction with one of the LEMS database.
Four synthetic datasets are provided. Each of them is composed of several triplets (pattern graph, target graph and groundtruth). The graphs have been randomly generated thanks to the Erdos-Renyi model with or without the constraint of producing connected graphs. For each option, one version of the dataset has an exact mapping (equal labels between matched vertices/edges) whereas an other version includes noise on label values. The groundtruth information gives the one-to-one vertex mapping involving the minimal cost assignment.
An additionnal dataset is composed of graphs generated from a real application. Target graphs are region adjacency graphs representing architectural plans. Pattern graphs are structural representations of graphical symbols. The groundtruth information gives the instances of patterns in the target, which correspond to instances of symbols on the plans.
To precisely evaluate the accuracy of Graph edit distance (GED) methods, Ground-Truth with information about the quality of the matching between pairs of graphs (low level information) is required in addition to higher level information (label of the object represented by the graph, classification rates) usually provided and used as GT to evaluate the methods. Most of the publicly available Graph repositories with associated ground-truths are dedicated to evaluating graph classification or exact graph matching methods and so the matching correspondences as well as the distance between each pair of graphs are not directly evaluated.
This Dataset called GDR4GED consists of a graph database repository annotated with low level information like graph edit distances values (optimal and sub-optimal) and also their corresponding node to node matching (Edit Paths). A set of performance evaluation metrics that can be computed using this dataset to assess the performance of GED methods is also proposed in the conference paper (GbR2015) describing this repository (available with the repository).
Measuring the dissimilarity between graphs is a key step in data analysis based on graph representations. This contest focusses on the definition and the computation of general dissimilarity measures through two challenges:
SSGCI is focused on the challenge of searching a query attributed graph in a database of attributed graphs and to provide the node correspondences between query and result graphs.
The graph datasets for the competition have been constructed by representing the contents of scanned comic book images by attributed region adjacency graphs (RAGs) containing attributes on nodes. The graphs are provided in GraphML format.
Algorithms for Graph Matching and Graph Edit Distance
- Software for factorized graph matching
This page contains software and instructions for factorized graph matching (FGM):
- Deformable Graph Matching: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013 F. Zhou and F. De la Torre/li>
- Factorized Graph Matching: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012 F. Zhou and F. De la Torre
- spectral matching (SM) :
A Spectral Technique for Correspondence Problems Using Pairwise Constraints International Conference on Computer Vision (ICCV), 2005 M. Leordeanu and M. Hebert,
- spectral matching with affine constraints (SMAC):
Balanced Graph Matching Advances in Neural Information Processing Systems (NIPS), 2006 T. Cour, P. Srinivasan and J. Shi,
- graduated assignment (GA):
A Graduated Assignment Algorithm for Graph Matching IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 1996 S. Gold and A. Rangarajan,
- probabilistic matching (PM):
Probabilistic Graph and Hypergraph Matching IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008 R. Zass and A. Shashua,
- integer projected fixed point method (IPFP):
An Integer Projected Fixed Point Method for Graph Matching and MAP Inference Advances in Neural Information Processing Systems (NIPS), 2009 M. Leordeanu, M. Hebert and R. Sukthankar,
- re-weighted random walk matching (RRWM):
Reweighted Random Walks for Graph Matching European Conference on Computer Vision (ECCV), 2010 M. Cho, J. Lee and K. Lee.
- Several Efficient Methods for Graph Matching and MAP Inference by Marius Leordeanu
- Solvers for QAP problems, .
See also the publications of Paul Swoboda.
- LAD Software <a href="http://liris.cnrs.fr/csolnon/LAD.html">
- VF Library
- Graphm package <http://cbio.ensmp.fr/graphm/>
- Reweighted Random Walks for Graph Matching
Paper, presentation and code. Abstract follows:
In this paper, we introduce a random walk view on the problem and propose a robust graph matching algorithm against outliers and deformation. Matching between two graphs is formulated as node selection on an association graph whose nodes represent candidate correspondences between the two graphs. The solution is obtained by simulating random walks with reweighting jumps enforcing the matching constraints on the association graph. Our algorithm achieves noise-robust graph matching by iteratively updating and exploiting the confidences of candidate correspondences. In a practical sense, our work is of particular importance since the real-world matching problem is made difficult by the presence of noise and outliers. - Bipartite Matching
An easy-to-use C++/Matlab toolbox for finding optimal correspondences between the elements of two sets, i.e. solving bipartite matching problems (aka linear sum assignment problems), together with their dual (aka labeling problems). Equivalently, the toolbox allows to design dissimilarity measures between two sets, based on optimal correspondences. The following problems are solved:
- symmetric MIN-LSAP / minimal-cost perfect bipartite matching
- asymmetric MIN-LSAP / minimal-cost maximum bipartite matching
- MIN-LSAP with Error-correction (LSAPE) / minimal-cost error-correcting bipartite matching
- Graph Edit distance libraries
- A software toolkit for graph edit distance computation by Kaspar Riesen
- A* algorithm with bipartite Heuristic (exact graph edit distance)
- Beam search
- Bipartite Graph edit distance using Assignment Algorithms
- GEDLIB by David Blumenthal
GEDLIB is a C++ library for (suboptimally) computing edit distances between graphs using various state-of-the-art methods. GEDLIB allows you to build your graphs via the C++ API or load them from GXL files. Several benchmark datasets are distributed with GEDLIB. For these datasets, GEDLIB provides predefined edit cost functions. You can easily extend GEDLIB by implementing new edit cost functions or new methods for computing the graph edit distance.
A python binding of this library may be found here. This binding is a side product of the project AGAC
- Quadratic (QAP) and linear (LSAP) based approaches to solve the graph edit distance
- ILPIso
- GEDEvo
GEDEVO, is a software tool for solving the network alignment problem. GEDEVO stands for Graph Edit Distance + EVOlution and it utilizes the evolutionary computing strategies for solving the so-called Graph Edit Distance problem.
Graph Matching and MAP Inference in Markov Random Fields are important problems in computer vision that arise in many current applications. This page presents several efficient methods for graph and hyper-graph matching, MAP inference and parameter learning. It also provides links to the publications, code and the datasets, on which experiments have been performed and comparisons with other current approaches.
LAD is a software (written in C) for solving the subgraph isomorphism problem, the goal of which is to decide if there exists a copy of a pattern graph in a target graph. It is distributed under the CeCILL-B FREE SOFTWARE LICENSE.
This is a graph matching library,developed by the Mivia Lab of the University of Salerno (Italy) which provides several algorithms for graph isomorphism, graph-subgraph isomorphism and graph monomorphism. The library is an efficient implementation of the VF2 graph matching algorithm, currently among the fastest ones, able to process very huge graphs due to its linear memory complexity.
The library is written in C++, and can be easily adapted to use different formats for the graphs, and different types of attributes. The web page contains the library documentation, including some usage examples, and the source files of the library.
GraphM is a tool for approximate solving of large scale graph matching problems. The package was developed in the Centres for computational biology and for mathematical morphology at Mines ParisTech.
Graph edit distance is one of the most flexible mechanisms for error tolerant graph matching. We present here some links to several software computing this distance.
The present software provides both a library and a graphical interface to compute the edit distance between sets of graphs using various heuristics. Namely:
This software provides a set of tools allowing to compute fast sub optimal estimation of the edit distance using LSAP formulations and more precise ones using an exact formulation of the GED problem based on quadratic minimization procedures (QAP). See also the specific page for LSAPE problems (E for Edition)
The QAP component of this software is based on the following paper while the LSAP component is based on this one.
ILIPIso is a software that implements the approach decribed in [Le Bodic et al. 2012] for substitution-tolerant subgraph isomorphism. It aims at finding assignments between a pattern graph and a target graph. Every vertex/edge of the pattern graph is matched to a different vertex/node in the target graph. Differences between attributes are allowed but penalized by a cost function. The matching is done so that the global assignment cost is minimal. It allows to process graphs whose vertices and/or edges can be labeled with numerical vector of any dimension.
The software can be configured in order to search for several instances of the matching, and return them by non-decreasing cost order. Several settings are available to discard solutions previously found, when searching for a new one.
Currently, only an online version and a compiled version are available (other architectures on demand), but source code should be provided soon.
Graph Drawing
- Graph Drawing <http://www.graphdrawing.org>
A site about graph drawing, mantained by some of the organizers of the International Symposium on Graph Drawing. The site contains links to several graph file formats, most of them based on XML.
Combinatorial Maps
- Descartes
- Girl Library
- An implementation of Combinatorial Pyramids
Descartes is a segmentation software based on a combinatorial map encoding of the partitions. It can import/export segmentation and is designed for an interactive use: The user select the regions to split or to merge.
A library which allows to structurate a partition thanks to combinatorial maps. Multiple adjacency and inclusions are encoded. The data structure may be efficiently updated after both split and merge operations.
A library which allows to design hierarchical segmentation algorithms using pyramids of combinatorial maps (combinatorial pyramids). Multiple adjacency are explicitely encoded and inside/outside (i.e. inclusion) relationships may be effciently computed. This implementation is based on an implicit encoding which allows to retreive any partition of the pyramid at a low memory cost.
Graph kernels
-
CHEMCPP
ChemCpp is a C++ toolbox for chemoinformatics focusing on the computation of kernel functions between chemical compounds.
Other TCs
- STRUCTURAL & SYNTACTICAL PATTERN RECOGNITION (TC2)
- Graphics Recognition (TC10)
- Reading systems (TC11)
IAPR's Technical Committee 2 on SSPR promotes interaction among researchers working on Structural and Syntactical Pattern Recognition (SSPR).
Graphics Recognition is an exciting field of pattern recognition. Along with optical character recognition (OCR) and document layout analysis, it forms the broader area of document image analysis and recognition. IAPR's Technical Committee 10 on Graphics Recognition promotes interaction among researchers working in document image analysis in general, and graphics recognition in particular.
Technical Committee 11 is concerned with theory and application of Reading Systems. The goal is to understand and develop systems which are able to analyze media containing character symbols and will return an encoded representation of the text content and structure.
This area covers research in the off-line recognition of printed texts and handwritten material, research on the automatic on-line recognition of handwriting as in pen-computing applications, as well as research in the analysis of electronic documents such as web pages. Applications include digital libraries, pen-based computing, check and mail reading, signature verification, web mining and content repurposing, web security using human interactive proofs, textual content analysis in videos, and historical document preservation and archiving.
The research topics cover all levels of processing, starting with the image or planar coordinates delivered by sensory technology, the necessary image and signal-processing stages, the subsequent segmentation and normalization stages, to feature extraction and selection, classification methods, statistical and syntactical pattern recognition and machine learning techniques.
Various Links
- Simbad FP7 European Projet
- GbR'2011 report
- Machine learning open source software
This project aims at undertaking a thorough study of several aspects of purely similarity-based pattern analysis and recognition methods, from the theoretical, computational, and applicative perspective. It aims at covering a wide range of problems and perspectives. It will consider both supervised and unsupervised learning paradigms, generative and discriminative models, and our interest will range from purely theoretical problems to real-world practical applications.
This report documents the discussion about challenges in graph-based representations held at the 8th IAPR - TC-15 workshop on graph-based representations in pattern recognition. The workshop took place in Münster (Germany) from the 18th to the 20th of May 2011. The discussion was initiated by four presenters (Milan Sonka, Luc Brun, Edwin Hancock, Pasquale Foggia) and moderated by Horst Bunke.
Open source tools have recently reached a level of maturity which makes them suitable for building large-scale real-world systems. At the same time, the field of machine learning has developed a large body of powerful learning algorithms for a wide range of applications. Inspired by similar efforts in bioinformatics (BOSC) or statistics (useR), the aim of mloss community is to build a forum for open source software in machine learning.